AI inference is the process by which an AI model produces predictions or conclusions. AI training is the process that enables AI models to make accurate inferences.
In the field of artificial intelligence (AI), inference is the process that a trained machine learning model* uses to draw conclusions from brand-new data. An AI model capable of making inferences can do so without examples of the desired result. In other words, inference is an AI model in action.
An example of AI inference would be a self-driving car that is capable of recognizing a stop sign, even on a road it has never driven on before. The process of identifying this stop sign in a new context is inference.
Another example: A machine learning model trained on the past performance of professional sports players may be able to make predictions about the future performance of a given sports player before they are signed to a contract. Such a prediction is an inference.
*Machine learning is a type of AI.
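As a minimal sketch of the idea, the Python snippet below (using scikit-learn and toy, made-up feature data, not anything from the examples above) trains a tiny classifier and then runs inference on an input it has never seen. The fit() call is training; the predict() call on new data is the inference.

```python
from sklearn.linear_model import LogisticRegression

# Toy labeled training data (hypothetical features; 1 = stop sign, 0 = not).
X_train = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y_train = [1, 1, 0, 0]

# Training: the model learns from examples with known answers.
model = LogisticRegression().fit(X_train, y_train)

# Inference: the trained model draws a conclusion from brand-new data.
new_example = [[0.85, 0.75]]
print(model.predict(new_example))  # e.g. [1], i.e. "stop sign"
```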
To get to the point of being able to identify stop signs in new locations (or predict a professional athlete's performance), machine learning models go through a process of training. In the case of the autonomous vehicle, developers showed the model thousands or millions of images of stop signs. A vehicle running the model may even have been driven on roads (with a human driver as backup), enabling it to learn from trial and error. Eventually, after enough training, the model was able to identify stop signs on its own.
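The effect of "enough training" can be sketched with synthetic data standing in for labeled stop-sign photos. This is purely illustrative, under assumed toy data, not an actual self-driving pipeline; the point is that accuracy on unseen data generally improves as the model sees more examples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for labeled stop-sign images.
X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# More training examples generally yield better inferences on unseen data.
for n in (100, 1000, len(X_train)):
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"trained on {n} examples -> held-out accuracy {acc:.2f}")
```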
Almost any real-world application of AI, from autonomous vehicles to sports predictions, relies on AI inference.
At its essence, AI training involves feeding AI models large data sets. Those data sets can be structured or unstructured, labeled or unlabeled. Some types of models may need specific examples of inputs and their desired outputs. Other models — such as deep learning models — may only need raw data. Eventually the models learn to recognize patterns or correlations, and they can then make inferences based on new inputs.
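To illustrate the labeled-versus-unlabeled distinction, here is a hedged sketch: one model is given inputs along with the desired outputs, while the other receives only raw inputs and finds patterns (clusters) on its own. The data is a toy stand-in and scikit-learn's API is assumed.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Toy data set with two natural groupings.
X, y = make_blobs(n_samples=300, centers=2, random_state=0)

# Labeled data: the model sees inputs *and* their desired outputs.
supervised = LogisticRegression().fit(X, y)

# Unlabeled data: the model finds structure in the raw inputs alone.
unsupervised = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(supervised.predict(X[:3]))    # inferences from the labeled model
print(unsupervised.predict(X[:3]))  # cluster assignments from raw data
```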
As training progresses, developers may need to fine-tune a model. One approach is to have the model make inferences right after the initial training process, then correct its outputs. Imagine an AI model tasked with identifying photos of dogs in a data set of pet photographs. If the model instead identifies photos of cats, it needs further tuning.
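A loose sketch of that tuning loop is below, with deliberately mislabeled toy data playing the role of the confused dog/cat model. All names and data here are hypothetical, and retraining on corrected labels is a crude stand-in for real fine-tuning.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y_desired = (X[:, 0] > 0).astype(int)  # desired output: 1 = "dog", 0 = "cat"

# Initial training on imperfect labels produces some wrong inferences.
y_noisy = y_desired.copy()
y_noisy[:40] = 1 - y_noisy[:40]  # some examples were mislabeled
model = LogisticRegression().fit(X, y_noisy)
print("errors before tuning:", (model.predict(X) != y_desired).sum())

# Developers correct the outputs and retrain.
model = LogisticRegression().fit(X, y_desired)
print("errors after tuning:", (model.predict(X) != y_desired).sum())
```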
AI programs extend the capabilities of computers far beyond what they could do previously. But this comes at the cost of using much more processing power than traditional computer programs, just as, for a person, solving a complex mathematical equation requires more focus and concentration than solving "2 + 2."
Training an AI model can be very expensive in terms of compute power. But it is more or less a one-time expense. Once a model is properly trained, it ideally does not need to be trained further.
Inference, however, is ongoing. If a model is actively in use, it is constantly applying its training to new data and making additional inferences. This takes quite a bit of compute power and can be very expensive.
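One rough way to see this asymmetry is to time a single training run against many repeated inferences. The sketch below uses a small toy model; the absolute numbers are meaningless and depend entirely on hardware and model size, but the structure (fit once, predict over and over) mirrors the cost pattern described above.

```python
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=20000, n_features=50, random_state=0)

start = time.perf_counter()
model = LogisticRegression(max_iter=1000).fit(X, y)  # one-time expense
print(f"training took {time.perf_counter() - start:.2f}s, once")

start = time.perf_counter()
for _ in range(10000):                               # ongoing expense
    model.predict(X[:1])
print(f"10,000 inferences took {time.perf_counter() - start:.2f}s")
```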
Cloudflare Workers AI offers developers access to GPUs all over the globe for running AI tasks. This pairs with Vectorize, a service for generating and storing embeddings for machine learning models. Cloudflare also offers cost-effective object storage for maintaining collections of training data — R2, a zero-egress-fee storage platform.
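As a sketch of what running inference on Workers AI can look like from outside a Worker, the snippet below calls the Workers AI REST API. The account ID, API token, and model name are placeholders, and Cloudflare's documentation should be consulted for current model names and request formats.

```python
import requests

ACCOUNT_ID = "your-account-id"   # placeholder
API_TOKEN = "your-api-token"     # placeholder
MODEL = "@cf/meta/llama-3.1-8b-instruct"  # example model name

# POST a prompt to the Workers AI REST API and print the inference result.
url = f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/{MODEL}"
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json={"prompt": "What is AI inference?"},
)
print(resp.json())
```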
Learn more about how Cloudflare enables developers to run AI inference at the edge.