Run inference on region: Earth

Build and deploy ambitious AI applications to Cloudflare's global network

Full-stack AI Building Blocks

Serverless AI on GPUs

Run generative AI tasks on our global network of NVIDIA GPUs with no extra setup.

Models Included

Choose from popular models in our catalog, including Llama 2, Whisper, and ResNet-50.

Available everywhere

Run AI models from Workers, Pages, or anywhere via our REST API.
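To call a model from outside Cloudflare, you send a plain HTTPS request. The sketch below only assembles that request in the standard `fetch` shape and does not send it. It assumes the endpoint pattern `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{model}` with a bearer API token. `buildRunRequest` is a hypothetical helper, and the account ID, token, and model ID are placeholders.

```typescript
// Hypothetical helper: assembles (but does not send) a Workers AI REST call.
// Assumed endpoint: POST /client/v4/accounts/{account_id}/ai/run/{model}
interface RunRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildRunRequest(
  accountId: string,
  apiToken: string,
  model: string,
  input: Record<string, unknown>,
): RunRequest {
  return {
    // Model IDs such as "@cf/..." are placed in the path as-is.
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`, // API token with Workers AI access
        "Content-Type": "application/json",
      },
      body: JSON.stringify(input), // model-specific input, e.g. { prompt }
    },
  };
}

// Example with placeholder values (not real credentials):
const req = buildRunRequest("ACCOUNT_ID", "API_TOKEN", "@cf/meta/llama-2-7b-chat-int8", {
  prompt: "Explain edge computing in one sentence.",
});
// In any runtime with fetch: const res = await fetch(req.url, req.init);
```

Because the helper returns a URL and `RequestInit`-style object, it runs unchanged in Node 18+, Deno, browsers, or a Worker.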

Supercharge with Vectorize

Generate and store embeddings in a globally distributed vector database.

AI Gateway

Improve reliability and scalability with caching, rate limiting, and analytics.

Train with R2

Build multi-cloud training architectures with free egress.

Zero to production in minutes

Less boilerplate. More fun.

Choose a template from our curated catalog of off-the-shelf models that let you perform tasks such as image classification, sentiment analysis, speech recognition, text generation, and translation.
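Each task maps to a catalog model that takes a task-specific input. The sketch below covers two of them, sentiment analysis and translation, through a Workers AI binding. The model IDs, the input fields (`text`, `source_lang`, `target_lang`), and the output shapes are assumptions based on the catalog and should be checked against the current model pages. `AiBinding` is a minimal stand-in for the real binding type.

```typescript
// Minimal stand-in for the Workers AI binding (the full type ships in
// @cloudflare/workers-types); only run() is declared here.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Illustrative catalog entries, one per task; verify IDs before use.
const MODELS = {
  imageClassification: "@cf/microsoft/resnet-50",
  sentiment: "@cf/huggingface/distilbert-sst-2-int8",
  speechRecognition: "@cf/openai/whisper",
  textGeneration: "@cf/meta/llama-2-7b-chat-int8",
  translation: "@cf/meta/m2m100-1.2b",
} as const;

// Assumed sentiment output: a list of labels with confidence scores.
type LabelScore = { label: string; score: number };

async function sentiment(ai: AiBinding, text: string): Promise<string> {
  const scores = (await ai.run(MODELS.sentiment, { text })) as LabelScore[];
  if (scores.length === 0) return "UNKNOWN";
  // Keep the label with the highest score.
  return scores.reduce((a, b) => (b.score > a.score ? b : a)).label;
}

// Assumed translation output: { translated_text }.
async function translate(ai: AiBinding, text: string, from: string, to: string): Promise<string> {
  const out = (await ai.run(MODELS.translation, {
    text,
    source_lang: from,
    target_lang: to,
  })) as { translated_text: string };
  return out.translated_text;
}
```

Keeping the model IDs in one table makes it easy to swap a template's model without touching the calling code.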

Add a vector database without breaking the bank

Speed up and scale your AI workflows with Vectorize. Generate embeddings for new or existing data, store them once, and search over your own content whenever your machine learning models need it.
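The embed-store-search loop can be sketched with a Workers AI binding and a Vectorize index binding. This is a minimal sketch under stated assumptions: the embedding model ID and its `{ shape, data }` output are taken from the catalog as we understand it, and `AiBinding`/`VectorIndex` are stand-ins that declare only the calls used here, not the full binding types.

```typescript
// Minimal stand-ins for the Workers AI and Vectorize bindings; the real types
// live in @cloudflare/workers-types. Only the calls used below are declared.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}
interface VectorRecord {
  id: string;
  values: number[];
}
interface VectorIndex {
  insert(vectors: VectorRecord[]): Promise<unknown>;
  query(
    vector: number[],
    options: { topK: number },
  ): Promise<{ matches: { id: string; score: number }[] }>;
}

// Assumed embedding model; its output is taken to be { shape, data }.
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

async function embed(ai: AiBinding, texts: string[]): Promise<number[][]> {
  const out = (await ai.run(EMBEDDING_MODEL, { text: texts })) as { data: number[][] };
  return out.data; // one vector per input text, in input order
}

// Embed a batch of documents once and store each vector under its document ID.
async function indexDocuments(
  ai: AiBinding,
  index: VectorIndex,
  docs: { id: string; text: string }[],
): Promise<void> {
  const vectors = await embed(ai, docs.map((d) => d.text));
  await index.insert(docs.map((d, i) => ({ id: d.id, values: vectors[i] })));
}

// Embed the query and return the IDs of the closest stored documents.
async function search(ai: AiBinding, index: VectorIndex, query: string, topK = 3): Promise<string[]> {
  const [vector] = await embed(ai, [query]);
  const { matches } = await index.query(vector, { topK });
  return matches.map((m) => m.id);
}
```

Indexing happens once per document, while `search` costs one embedding call per query, which is what makes stored embeddings reusable.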

Grab your model and go

A few lines of code with Workers AI and Vectorize are all it takes to run an AI inference task from Workers, from Pages with your favorite framework, or from any stack via the REST API. Pick your model and go.
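Inside a Worker, one inference call is a single `run` on the AI binding. In the sketch below, the binding name `AI`, the model ID, and the `{ response }` output shape are assumptions, and `AiBinding` is a minimal stand-in for the real binding type.

```typescript
// Minimal stand-in for the Workers AI binding type from @cloudflare/workers-types.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Model ID chosen for illustration; other text-generation models are called the same way.
const TEXT_MODEL = "@cf/meta/llama-2-7b-chat-int8";

// One inference call: send chat messages, read back the generated text.
async function ask(ai: AiBinding, question: string): Promise<string> {
  const result = (await ai.run(TEXT_MODEL, {
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: question },
    ],
  })) as { response?: string };
  return result.response ?? "";
}

// Wiring inside a Worker, assuming a binding named AI is configured for the project:
// export default {
//   async fetch(request, env) {
//     return new Response(await ask(env.AI, "What is a vector database?"));
//   },
// };
```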

Cloudflare powers millions of Internet properties

Enhance and protect your AI applications

Build reliable, secure, cost-effective AI architectures

No more surprise bills from your AI vendors

The AI Gateway adds a layer of control and protection to LLM applications:
• Apply rate-limits and caching to protect back-end infrastructure and avoid surprise bills.
• Gain visibility into how many people are using the service.
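Putting an application behind the gateway mostly means pointing its provider requests at a gateway URL instead of the provider's own host. The sketch below assumes the URL pattern `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}` and an OpenAI chat-completions call. The helper names, gateway ID, and model name are placeholders. Caching and rate limits are assumed to be configured in the gateway's settings rather than in the request.

```typescript
// Hypothetical helper: base URL for a provider routed through an AI Gateway.
// Assumed pattern: https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}
function gatewayBaseUrl(accountId: string, gatewayId: string, provider: string): string {
  return `https://gateway.ai.cloudflare.com/v1/${accountId}/${gatewayId}/${provider}`;
}

// Builds an OpenAI chat-completions request that goes through the gateway.
// Only the host and path change; headers and body match a direct provider call.
function buildChatRequest(accountId: string, gatewayId: string, openAiKey: string, prompt: string) {
  return {
    url: `${gatewayBaseUrl(accountId, gatewayId, "openai")}/chat/completions`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${openAiKey}`, // provider key, passed through unchanged
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "gpt-3.5-turbo", // placeholder model name
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}
```

Because the body is unchanged, switching an existing app to the gateway is typically a base-URL change, which is what lets the gateway count, cache, and rate-limit traffic in one place.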

Train where it's cheapest with egress-free data

Cost-effective storage with R2 for model training data and AI-generated assets:
• Egress-free storage makes multi-cloud architectures for training LLMs affordable.
• Limitless storage for the ever-growing assets generated by users.
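Storing a generated asset from a Worker is a single `put` on an R2 bucket binding. In the sketch below, `BucketLike` is a minimal stand-in for the real `R2Bucket` type and declares only what is used. The `generated/{user}/{timestamp}.png` key layout is a hypothetical convention, and the image bytes are assumed to come from an earlier model call.

```typescript
// Minimal stand-in for the R2 bucket binding; the real R2Bucket type (in
// @cloudflare/workers-types) accepts more value types and options.
interface BucketLike {
  put(
    key: string,
    value: ArrayBuffer | Uint8Array | string,
    options?: { httpMetadata?: { contentType?: string } },
  ): Promise<unknown>;
}

// Hypothetical key layout: one prefix per user, timestamped file names.
function assetKey(userId: string, now: number, extension: string): string {
  return `generated/${userId}/${now}.${extension}`;
}

// Store a generated image and return the key it was written under.
async function saveGeneratedImage(
  bucket: BucketLike,
  userId: string,
  png: Uint8Array,
  now: number = Date.now(),
): Promise<string> {
  const key = assetKey(userId, now, "png");
  // Content type is stored with the object so it is served correctly later.
  await bucket.put(key, png, { httpMetadata: { contentType: "image/png" } });
  return key;
}
```

Per-user prefixes make it straightforward to list or clean up one user's assets later.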

Get started with a template

Workers AI + Vectorize Tutorial

Build a retrieval-augmented generation (RAG) app with Workers AI and Vectorize.
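A RAG app follows three steps: retrieve relevant passages, add them to the prompt, and generate an answer. The sketch below chains those steps with a Workers AI binding and a Vectorize index. It is not the tutorial's code. The model IDs are assumptions, `AiBinding` and `VectorIndex` are minimal stand-ins for the real binding types, and `PassageLookup` is a hypothetical callback that maps a vector ID back to its text (for example, a database query).

```typescript
// Minimal stand-ins for the bindings; only the calls used here are declared.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}
interface VectorIndex {
  query(
    vector: number[],
    options: { topK: number },
  ): Promise<{ matches: { id: string; score: number }[] }>;
}

// Assumed model IDs; check the catalog for current names.
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";
const TEXT_MODEL = "@cf/meta/llama-2-7b-chat-int8";

// Hypothetical lookup from a vector ID back to the stored passage text.
type PassageLookup = (id: string) => Promise<string | undefined>;

async function answerWithRag(
  ai: AiBinding,
  index: VectorIndex,
  lookup: PassageLookup,
  question: string,
  topK = 3,
): Promise<string> {
  // 1. Retrieve: embed the question and find the nearest stored passages.
  const emb = (await ai.run(EMBEDDING_MODEL, { text: [question] })) as { data: number[][] };
  const { matches } = await index.query(emb.data[0], { topK });
  const passages = (await Promise.all(matches.map((m) => lookup(m.id)))).filter(
    (p): p is string => p !== undefined, // drop IDs with no stored text
  );

  // 2. Augment: list the passages as context ahead of the question.
  const context = passages.map((p) => `- ${p}`).join("\n");

  // 3. Generate: ask the text model to answer from that context.
  const out = (await ai.run(TEXT_MODEL, {
    messages: [
      { role: "system", content: `Answer using only this context:\n${context}` },
      { role: "user", content: question },
    ],
  })) as { response?: string };
  return out.response ?? "";
}
```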

Workers + ChatGPT

Build a ChatGPT search plugin with Notion and Pinecone.

Workers + LangChain

Build an LLM search app powered by Workers and LangChain.

SiteGPT

"We use Cloudflare for everything – storage, cache, queues, and most importantly for training data and deploying the app on the edge, so I can ensure the product is reliable and fast. It's also been the most affordable option, with competitors costing more for a single day's worth of requests than Cloudflare costs in a month."

- Bhanu Teja Pachipulusu
Founder

Get started with Cloudflare AI today