The A-Z of AI

The ultimate glossary of AI terms

What you say to your customers has never been more important – or more closely scrutinized – than it is now. AI is here to help you say the right thing in your brand voice at scale, and some of the most trailblazing brands in the world are using it. In our Marketing’s Missing Millions report, our research shows that 92% of marketers are willing to trust AI – but only if it can be proven effective. Sounds like there’s still some skepticism about AI that we need to clear up so that marketers can make the most of this technology.

“Is it robots? Do the robots have lasers? Are the robots angry because we programmed them incorrectly? How far can the laser beams shoot?”

Relax. Brew up a nice relaxing cup of chamomile tea and we’ll explain everything. In fact, we’ve pledged to make AI easy to understand and, dare we say, fun? Read on to learn the terms behind AI from A to Z, and you’ll be well on your way to sounding like the smartest person in the room (or Zoom) in your next marketing meeting, or the nerdiest person at your next dinner party.



Accuracy

The fraction of predictions that a classification model got right. This is a metric used to assess the performance of an AI algorithm.
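To make accuracy concrete, here is a minimal sketch of how it could be calculated. The `accuracy` function and the yes/no example data are made up for illustration.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Hypothetical example: did each subject line beat its open-rate target?
predicted = ["yes", "no", "yes", "yes"]
actual = ["yes", "no", "no", "yes"]
print(accuracy(predicted, actual))  # 3 of 4 correct -> 0.75
```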


Agents

Also known as bots, droids or intelligent agents. Agents are autonomous software programs that respond to their environment and act on behalf of humans to accomplish a system’s target function. When multiple agents are used together in a system, they interact with one another to achieve the goals of the overall system.


Diagram to show how agents work

AI-Powered Copywriting

A process that leverages artificial intelligence and performance data to produce highly engaging text for the purpose of advertising or other forms of marketing. This is the service provided by Phrasee!



Algorithm

A specific set of instructions that details how a task is meant to be carried out. Computer programs and baking recipes are both examples of algorithms.



Artificial intelligence

Artificial Intelligence (AI) is the theory and development of computer systems able to perform tasks that normally require human intelligence – e.g. visual perception or language processing.



Attention

A mechanism that allows a deep neural network to work out which parts of an input it thinks are most important, and then put more emphasis on those parts when making a prediction. Attention rose to prominence in machine learning during the 2010s and has revolutionized NLP.
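Here is a heavily simplified sketch of the idea behind attention: turn importance scores into weights that add up to 1, then use those weights to blend the inputs. In a real network the scores are learned; the scores and values below are invented purely for illustration.

```python
import math


def softmax(scores):
    """Turn raw importance scores into weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(scores, values):
    """Return the attention weights and the weighted sum of the values."""
    weights = softmax(scores)
    blended = sum(w * v for w, v in zip(weights, values))
    return weights, blended


# Hypothetical scores for the words "free", "shipping", "today"
word_scores = [2.0, 0.5, 0.1]
word_values = [1.0, 0.2, 0.1]
weights, blended = attend(word_scores, word_values)
print([round(w, 2) for w in weights])  # the first word gets the most weight
```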

Bayesian network

A model that represents and calculates the probabilistic relationships between a set of random variables and an uncertain domain via a directed acyclic graph. The nodes on the graph represent the random variables and the links between them represent their conditional dependencies.

For example, a Bayesian network can be used to calculate the probabilities of various diseases being present (the uncertain domain) based on the given symptoms (the variables).
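The disease example can be sketched with the smallest possible Bayesian network: one "disease" node pointing at one "symptom" node. All the probabilities below are made-up numbers for illustration, not medical data.

```python
def posterior(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """P(disease | symptom) for a two-node network, using Bayes' rule."""
    # Total probability of seeing the symptom at all
    p_symptom = (p_symptom_given_disease * prior
                 + p_symptom_given_healthy * (1 - prior))
    return p_symptom_given_disease * prior / p_symptom


# Hypothetical numbers: 1% of people have the disease, 90% of them show
# the symptom, and 5% of healthy people show it too.
p = posterior(prior=0.01,
              p_symptom_given_disease=0.9,
              p_symptom_given_healthy=0.05)
print(round(p, 3))  # 0.154
```

Even with a symptom that shows up in 90% of cases, the disease is still unlikely here because it is rare to begin with.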


Chatbot

A computer program that conducts conversations with human users by simulating how humans would behave as conversational partners.



Clustering

A method of unsupervised learning and a common statistical data analysis technique. In this method, observations that show similarities to each other are organized into groups (called clusters).



Combinatorial explosion

A fundamental problem in computing whereby the number of combinations that a computer has to examine grows exponentially as the problem gets bigger. The number of combinations can become so large that even the fastest computers aren’t able to examine them all in a reasonable time frame (we are talking hundreds of thousands of years here 🤯).
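A quick back-of-the-envelope sketch shows how fast combinatorial explosion kicks in. It assumes a made-up task where each of several "slots" can be filled with one of 10 options, and a hypothetical computer that checks a billion combinations per second.

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365


def count_combinations(options_per_slot, slots):
    """Number of ways to fill every slot with one of the options."""
    return options_per_slot ** slots


def years_to_examine(combinations, checks_per_second=1_000_000_000):
    """Years needed to check every combination at the assumed speed."""
    return combinations / checks_per_second / SECONDS_PER_YEAR


for slots in (5, 10, 20, 30):
    total = count_combinations(10, slots)
    print(f"{slots} slots: {total} combinations, "
          f"{years_to_examine(total):.3g} years")
```

Going from 10 slots to 30 slots is only three times the input, but the waiting time jumps from seconds to far longer than hundreds of thousands of years.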


Computational creativity

A multidisciplinary research area that draws on the fields of art, science, philosophy and AI to engineer computational systems that are able to model, simulate and replicate human creativity. For example, IBM researchers have explored how computational creativity can be used in the food industry to make recipes for dishes that have never been imagined before by humans.




Data

Any collection of information converted into a digital form.


Data mining

The process of combing through a data set to identify patterns and extract information. Often such patterns and information are only clear when a large enough data set is analyzed. For this reason, AI and machine learning are extremely helpful in such a process.

Decision tree

A model that uses prescriptive analytics to establish the best course of action for a given situation. The model assesses the relationships between the elements of a decision to recommend one or more possible courses of action. It may also predict what should happen if a certain action is taken.


Deep Blue (chess-playing AI)

A computer developed by IBM that is able to play chess without human guidance. It became the first computer chess-playing system to win a game (in 1996) and then a full match (in 1997) against a reigning world champion, Garry Kasparov.


Deep learning

A subset of AI and machine learning in which neural networks are “layered” (stacked many layers deep), combined with plenty of computing power, and given a large volume of training data. The result is extremely powerful learning models capable of processing data in new and exciting ways, e.g. advancing the fields of computer vision and natural language processing.


Descriptive model

A summary of a data set that describes its main features and quantifies relationships in the data. Some common measures used to describe a data set are measures of central tendency (mean, median and mode).
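As a small sketch, the three measures of central tendency can be computed with Python's built-in `statistics` module. The click counts below are invented for illustration.

```python
import statistics


def describe(values):
    """Summarize a data set with its mean, median and mode."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


# Hypothetical data: clicks on five different emails
clicks = [2, 3, 3, 5, 7]
summary = describe(clicks)
print(summary)
```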



Epoch

When training a neural network, one epoch is one cycle through the full training set. Neural networks are usually trained using multiple epochs.


Feature

An input variable used in making predictions. For example, features used to predict whether a subject line will be high performing could include its length, the number of emojis it contains, etc.

Feature engineering

The process of taking raw data, determining useful features to be included in a statistical model, and converting these features into a machine-readable format. 
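Here is a minimal sketch of feature engineering for a subject line: raw text goes in, a handful of machine-readable numbers come out. The feature names are illustrative, and the emoji check is a rough assumption (it counts any character at or above code point U+1F300) rather than a complete emoji detector.

```python
def extract_features(subject_line):
    """Convert a raw subject line into simple numeric features."""
    # Rough assumption: treat high code points as emojis
    emoji_count = sum(1 for ch in subject_line if ord(ch) >= 0x1F300)
    return {
        "length": len(subject_line),
        "word_count": len(subject_line.split()),
        "emoji_count": emoji_count,
        "has_question": int("?" in subject_line),
    }


# "\U0001F389" is the party popper emoji, written as an escape code
features = extract_features("Big sale today \U0001F389\U0001F389")
print(features)
```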

Genetic algorithm

A method for solving optimization problems by mimicking the process of natural selection and biological evolution. The algorithm randomly selects pairs of individuals from the population to be used as parents. These are then crossed over to create a new generation of two individuals, or children. This process is repeated until the optimization problem is solved.
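The loop of selection, crossover and repetition can be sketched on a toy problem: evolve a string of bits until it is all 1s. This is one simple variant among many. It assumes parents are picked at random from the fitter half of the population, adds a small mutation step, and always carries the best individual forward; the function names and settings are illustrative.

```python
import random


def fitness(individual):
    """Toy goal: the more 1s in the bit string, the fitter it is."""
    return sum(individual)


def crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents at a random point to make two children."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])


def mutate(individual, rng, rate=0.02):
    """Flip each bit with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in individual]


def evolve(length=20, population_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(population_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == length:
            break  # problem solved
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:population_size // 2]  # fitter half breeds
        next_generation = [best]  # keep the best individual unchanged
        while len(next_generation) < population_size:
            parent_a, parent_b = rng.sample(parents, 2)
            for child in crossover(parent_a, parent_b, rng):
                next_generation.append(mutate(child, rng))
        population = next_generation[:population_size]
    return max(population, key=fitness)


winner = evolve()
print(winner, fitness(winner))
```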



GPT-3

A deep neural network built by OpenAI. It is a language model that can be used to generate text. With 175 billion parameters, it was the largest neural network ever trained on human language at the time of its release.


GPU

Graphics processing unit. This is a piece of hardware found in many computers which primarily handles the rendering of images on screens. However, data scientists have repurposed this technology to train large neural networks, since it is able to do math very quickly.


Heuristic

A simple rule-based algorithm used to solve a problem. For example, an algorithm that determines if an animal is a dog using a list of questions about its appearance, e.g. does it have four legs, a tail, fur? If the questions are all answered as “yes”, then the algorithm classifies the animal as a “dog”.
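The dog-spotting heuristic could be sketched like this. The question names and example animals are made up for illustration.

```python
def looks_like_dog(animal):
    """Classify an animal as a dog if every question is answered 'yes'."""
    questions = ("has_four_legs", "has_tail", "has_fur")
    return all(animal.get(question, False) for question in questions)


dog = {"has_four_legs": True, "has_tail": True, "has_fur": True}
goldfish = {"has_four_legs": False, "has_tail": True, "has_fur": False}
cat = {"has_four_legs": True, "has_tail": True, "has_fur": True}

print(looks_like_dog(dog))       # True
print(looks_like_dog(goldfish))  # False
print(looks_like_dog(cat))       # True
```

The cat getting through shows why simple rules are quick to write but easy to fool.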


Hyperparameter

A variable of a model that the machine learning system cannot learn on its own through successive training iterations and that must be set before training begins, e.g. the number of layers, the learning rate, the number of epochs, etc.

Inductive logic programming (ILP)

An approach to machine learning whereby hypothesized logic is developed based on known background knowledge and a set of both positive and negative examples of what is and isn’t true.


Inductive reasoning

The ability to derive key generalized conclusions or theories by analyzing patterns in a large data set.



Jacobian

The Jacobian of a set of functions is a matrix of partial derivatives of those functions. This is often used in machine learning to find the gradient (rate of change) of your loss (the function you want to minimize) with respect to your parameters – this is how a neural network learns.
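For the curious, a Jacobian can be approximated numerically by nudging each input a tiny amount and measuring how every output changes. This is a sketch using central differences; machine learning libraries typically compute derivatives with automatic differentiation instead. The `example` function is invented for illustration.

```python
import math


def numerical_jacobian(func, point, step=1e-6):
    """Approximate the matrix of partial derivatives of func at point."""
    n_outputs = len(func(point))
    jacobian = [[0.0] * len(point) for _ in range(n_outputs)]
    for j in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[j] += step
        backward[j] -= step
        f_plus = func(forward)
        f_minus = func(backward)
        for i in range(n_outputs):
            # Row i, column j: how output i changes with input j
            jacobian[i][j] = (f_plus[i] - f_minus[i]) / (2 * step)
    return jacobian


def example(v):
    """Two functions of two inputs: x^2 * y and 5x + sin(y)."""
    x, y = v
    return [x * x * y, 5 * x + math.sin(y)]


J = numerical_jacobian(example, [1.0, 2.0])
print(J)  # close to [[4, 1], [5, cos(2)]]
```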


K-means

A popular clustering algorithm – this algorithm uses an iterative approach to find the centroids of k clusters of data points. All the data points in the set are then assigned to the cluster whose centroid is closest to them.
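The k-means loop (assign each point to its nearest centroid, move each centroid to the middle of its points, repeat) can be sketched in plain Python. The starting centroids are picked at random from the data, which is one common choice among several, and the six points are made up for illustration.

```python
import math
import random


def assign(points, centroids):
    """Group each point with the centroid closest to it."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters


def k_means(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization
    for _ in range(iterations):
        clusters = assign(points, centroids)
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                # Move the centroid to the average of its points
                new_centroids.append(
                    tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # nothing moved, so we are done
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9.5), (8.5, 9)]
centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```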


Label

In supervised learning (where the correct result is known), the label is the answer or results portion of an example.

Loebner prize

An annual competition that awards a prize to the computer program that performs best in a standard Turing test. The contest was launched in 1990 by Hugh Loebner, an inventor and outspoken social activist.

Machine learning

A subfield of AI in which algorithms “learn” how to complete a specified task. The traditional approach to computer programming relies on explicit instructions written by a human. In contrast, machine learning uses statistical pattern recognition and inference to derive its own mapping between input data and output data.



Model

A representation of what a machine learning system has learned during training. This could be a simple set of rules or a very complex, multi-dimensional neural network.

Natural language generation (NLG)

A subfield of AI and natural language processing in which algorithms attempt to generate language that is comprehensible and human sounding. Some NLG systems are based on rules and templates. Other NLG systems are statistical in nature. For example, generative deep learning models can be used to predict the next word in a sentence by considering all possible words and choosing the word with highest probability.
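The "pick the most probable next word" idea can be sketched with a tiny statistical model. To be clear about the assumption: this is a simple word-pair counter, not a deep learning model, and the three training sentences are invented. The principle of choosing the highest-probability word is the same.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count which word follows which across the training sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def predict_next(model, word):
    """Return the word most often seen after `word`, or None if unseen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


corpus = [
    "shop the sale today",
    "shop the sale now",
    "shop the new range",
]
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "sale" follows "the" twice, "new" once
```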



Natural language processing (NLP)

Natural language processing is an interdisciplinary branch of study between computer science and linguistics that focuses on giving computers the ability to read, write, listen to and understand the day-to-day language used by humans.


Neural network

An approach to machine learning that is loosely based on biological neural networks in the brain. Neural networks learn nonlinear relationships between input data and output data.



Overfitting

A situation where a machine learning model is tuned to the training data too well. In other words, the model fits patterns in the data that are due to noise, as opposed to the true relationship between the input and output. Overfit models perform poorly when tested on unseen, real-world data.



Optical character recognition (OCR)

A computer system that takes images of typed, handwritten, or printed text and converts them into machine-readable text. For example, when you deposit a check at an ATM (or on your phone), OCR software is used to recognize the information written on the check.


Parameter

A variable of a model that the machine learning system learns through successive training iterations on its own, e.g. weights in a neural network.

Predictive analysis

The act of analyzing current and past data to look for patterns that can help make predictions about future events or performance.

Predictive model

A model that uses observations measured in a sample to predict the probability that a different sample or remainder of the population will exhibit the same behavior or have the same outcome.

QA model

A ‘question answering (QA)’ model is a statistical model which automatically responds to questions posed by humans with human-sounding answers. Most QA models build a knowledge base which they are able to query in order to find the correct response to any given question. 

Recurrent neural network

A type of neural network in which recorded data and outcomes are fed back through the network, forming a cycle. This process allows the network to use its internal memory to sort through data as it goes.


Regression

A statistical method used to determine the relationships between input (independent) and output (dependent) variables.
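The simplest flavor of regression fits a straight line through the data. Here is a minimal sketch using ordinary least squares with one input variable; the data points are made up and happen to sit exactly on a line.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: input on the x axis, output on the y axis
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```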



Reinforcement learning

A type of machine learning in which machines are “taught” to achieve their target function through a process of experimentation and reward. In reinforcement learning, the machine receives positive reinforcement when its processes produce the desired result and negative reinforcement when they do not.


Supervised learning

A type of machine learning in which examples of input/output pairs are provided to the machine learning algorithm. The goal of the algorithm is to learn the relationship between the input and output.

Swarm intelligence

An approach to artificial intelligence that is based on the idea that when individual agents come together, the interactions between them lead to the emergence of a more “intelligent” collective behavior. It stems from the natural behavior of animals such as bees, which combine into swarms to work more intelligently.


Test data set

In machine learning, the test data set is the data given to the machine after the training and validation phases have been completed. The test data set is used to check the performance characteristics of the algorithms produced after the completion of the first two phases when presented with unknown data. This will give a good indication of the accuracy, sensitivity and specificity of the algorithm’s predictive powers.

Training data set

In machine learning, the training data set is the data given to the machine during the initial training or “learning” phase. From this data set, the machine is meant to gain some insight into options for the efficient completion of its assigned task through identifying relationships in the data.



Transformer

A neural network architecture for natural language models which leverages multiple attention mechanisms to build an understanding of unstructured text. This model architecture was published in 2017 by researchers at Google and sparked a wave of NLP research which we are still riding today!

Turing test

A test developed by Alan Turing in 1950. It is a test of a machine’s ability to exhibit behavior that is indistinguishable from that of a human. The test is based on a process in which a series of judges attempt to discern interactions with a control (human) from interactions with the machine (computer) being tested.


Unsupervised learning

A type of machine learning in which no examples are provided of the desired output. In unsupervised learning, the machine is left to identify patterns and draw its own conclusions from the data sets it is given.

Validation data set

In machine learning, the validation data set is the data given to the machine after the initial learning phase has been completed. The validation data is used to identify which of the relationships identified during the learning phase will be the most effective to use in predicting future performance.


Weights

The learned parameters of a neuron in a neural network. These tell the neurons how responsive they should be to any specific feature at the input. The neural network learns by iteratively updating these weights in order to minimize a target function (often called a loss function).
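To show what "iteratively updating weights to minimize a loss" looks like, here is a sketch with a single weight trained by gradient descent on a squared-error loss. It is a one-neuron toy with no bias or activation function, and the data and learning rate are illustrative assumptions.

```python
def train_weight(xs, ys, learning_rate=0.05, epochs=100):
    """Learn a weight w so that w * x is as close as possible to y."""
    weight = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight
        gradient = sum(2 * x * (weight * x - y)
                       for x, y in zip(xs, ys)) / len(xs)
        # Step in the direction that reduces the loss
        weight -= learning_rate * gradient
    return weight


# Hypothetical data where the output is always double the input
xs = [1, 2, 3]
ys = [2, 4, 6]
learned = train_weight(xs, ys)
print(round(learned, 4))  # 2.0
```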


XGBoost

A popular and powerful open-source library that implements gradient boosting, a machine learning technique based on an ensemble of decision trees. It is often a strong alternative to neural networks.

Yann LeCun

A prominent deep learning researcher who is the Chief AI Scientist at Facebook. He is also one of very few options for “Y” in this glossary. 


Z-score

A common measurement in statistics. It is the number of standard deviations that a raw score is above or below the mean value of all samples in a set.
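As a quick sketch, a z-score is the raw score minus the mean, divided by the standard deviation. This version assumes the samples are the whole set being described, so it uses the population standard deviation; the scores are made up.

```python
import statistics


def z_score(value, samples):
    """Number of standard deviations `value` sits above or below the mean."""
    mean = statistics.mean(samples)
    spread = statistics.pstdev(samples)  # population standard deviation
    if spread == 0:
        raise ValueError("all samples are identical, so the z-score is undefined")
    return (value - mean) / spread


# Hypothetical scores with a mean of 5 and a standard deviation of 2
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(9, scores))  # 2.0
```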

Discover how applying actual AI to language can deliver big results!