Run inference on region: Earth
Build and deploy ambitious AI applications to Cloudflare's global network
Full-stack AI Building Blocks
Serverless AI on GPUs
Run generative AI tasks on our global network of NVIDIA GPUs with no extra setup.
Models Included
Choose from a variety of popular models in our catalog, including Llama 2, Whisper, and ResNet-50.
Available everywhere
Run AI models from Workers, Pages, or anywhere via our REST API.
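As a rough illustration of the "anywhere via our REST API" option, here is a minimal TypeScript sketch that calls a Workers AI model with plain `fetch`. It assumes the Cloudflare API v4 endpoint shape `/accounts/{account_id}/ai/run/{model}` with a bearer API token. The account ID, token, and the helper names `buildInferenceRequest` and `runModel` are placeholders introduced for this example.

```typescript
// Sketch: call a Workers AI model over the REST API from any environment with fetch.
// Assumed endpoint shape: /client/v4/accounts/{account_id}/ai/run/{model}.
// Account ID and API token are placeholders you supply.
const API_BASE = "https://api.cloudflare.com/client/v4/accounts";

// Build the POST request without sending it, so it is easy to inspect or log.
function buildInferenceRequest(
  accountId: string,
  apiToken: string,
  model: string,
  inputs: Record<string, unknown>,
): Request {
  return new Request(`${API_BASE}/${accountId}/ai/run/${model}`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(inputs),
  });
}

// Send the request and unwrap the model output from the API envelope's `result` field.
async function runModel(
  accountId: string,
  apiToken: string,
  model: string,
  inputs: Record<string, unknown>,
): Promise<unknown> {
  const res = await fetch(buildInferenceRequest(accountId, apiToken, model, inputs));
  if (!res.ok) {
    throw new Error(`Workers AI request failed with status ${res.status}`);
  }
  const body = (await res.json()) as { success: boolean; result: unknown };
  return body.result;
}
```

For example, `runModel(ACCOUNT_ID, API_TOKEN, "@cf/meta/llama-2-7b-chat-int8", { prompt: "Hello" })` would return the model's output object. Separating request construction from sending keeps the URL and headers easy to check without network access.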
Supercharge with Vectorize
Generate and store embeddings in a globally distributed vector database.
AI Gateway
Improve reliability and scalability with caching, rate limiting, and analytics.
Train with R2
Build multi-cloud training architectures with free egress.
Zero to production in minutes
Less boilerplate. More fun.
Choose a template from our curated catalog of off-the-shelf models for tasks such as image classification, sentiment analysis, speech recognition, text generation, and translation.
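To make one of these tasks concrete, the sketch below runs sentiment analysis from a Worker. It assumes an AI binding exposed as `env.AI` with a `run(model, inputs)` method, and uses `@cf/huggingface/distilbert-sst-2-int8` as the example classifier, which is assumed to return a list of `{ label, score }` pairs. The `AiBinding` interface and `classifySentiment` helper are hypothetical names written for this example, not part of any SDK.

```typescript
// Minimal binding shape used here; in a real Worker, env.AI comes from an AI binding in wrangler.toml.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface SentimentLabel {
  label: string; // e.g. "POSITIVE" or "NEGATIVE"
  score: number; // model confidence in [0, 1]
}

const SENTIMENT_MODEL = "@cf/huggingface/distilbert-sst-2-int8";

// Run the text-classification model and return the highest-scoring label.
async function classifySentiment(ai: AiBinding, text: string): Promise<SentimentLabel> {
  const labels = (await ai.run(SENTIMENT_MODEL, { text })) as SentimentLabel[];
  if (labels.length === 0) {
    throw new Error("model returned no labels");
  }
  return labels.reduce((best, current) => (current.score > best.score ? current : best));
}
```

Typing the binding as a small interface keeps the helper testable with a stub in place of the real `env.AI`.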
Add a vector database without breaking the bank
Speed up and scale your AI workflows with Vectorize. Generate and store embeddings for new or existing content, so you can search over your own data and reuse those embeddings with machine learning models.
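A minimal sketch of the "generate and store embeddings" step, assuming a Worker with an AI binding and a Vectorize index binding. It embeds a batch of documents with `@cf/baai/bge-base-en-v1.5` (assumed to return `{ shape, data }`, with one 768-dimensional vector per input) and upserts them by id. The interfaces and the `indexDocuments` helper are illustrative stand-ins for the real bindings.

```typescript
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Subset of a Vectorize vector record: an id and its embedding values.
interface VectorRecord {
  id: string;
  values: number[];
}

// Only the method this sketch needs from the Vectorize index binding.
interface VectorIndex {
  upsert(vectors: VectorRecord[]): Promise<unknown>;
}

interface SourceDocument {
  id: string;
  text: string;
}

// 768-dimensional text embeddings; the index must be created with matching dimensions.
const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";

// Embed all documents in one call, then upsert so re-indexing an id replaces the old vector.
async function indexDocuments(
  ai: AiBinding,
  index: VectorIndex,
  docs: SourceDocument[],
): Promise<number> {
  const { data } = (await ai.run(EMBEDDING_MODEL, {
    text: docs.map((doc) => doc.text),
  })) as { shape: number[]; data: number[][] };

  // Embeddings come back in input order, so pair them with documents by position.
  const vectors: VectorRecord[] = docs.map((doc, i) => ({
    id: doc.id,
    values: data[i],
  }));
  await index.upsert(vectors);
  return vectors.length;
}
```

Using `upsert` rather than `insert` makes the ingestion step safe to rerun when documents change.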
Grab your model and go
With Workers AI and Vectorize, a few lines of code are all it takes to run an AI inference task from Workers, from Pages with your favorite framework, or from any stack via the REST API. Pick your model and go.
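Here is what "a few lines of code" might look like in a Worker, as a sketch. It assumes an AI binding named `AI` configured in `wrangler.toml`, and uses the Llama 2 chat model `@cf/meta/llama-2-7b-chat-int8`, whose output is assumed to be an object with a `response` string. The local `AiBinding` and `Env` interfaces stand in for the real Workers types.

```typescript
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Bindings available to the Worker; `AI` is an assumed binding name from wrangler.toml.
interface Env {
  AI: AiBinding;
}

const TEXT_MODEL = "@cf/meta/llama-2-7b-chat-int8";

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Read the prompt from the query string, falling back to a default.
    const prompt = new URL(request.url).searchParams.get("prompt") ?? "What is Cloudflare Workers AI?";
    // Text-generation models return an object with a `response` string.
    const { response } = (await env.AI.run(TEXT_MODEL, { prompt })) as { response: string };
    return new Response(JSON.stringify({ response }), {
      headers: { "content-type": "application/json" },
    });
  },
};

export default worker;
```

Requesting `/?prompt=...` on the deployed Worker would return the generated text as JSON.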
Cloudflare powers millions of Internet properties
Enhance and protect your AI applications
Build reliable, secure, cost-effective AI architectures
No more surprise bills from your AI vendors
The AI Gateway adds a layer of control and protection to LLM applications
• Apply rate limits and caching to protect back-end infrastructure and avoid surprise bills.
• Gain visibility into how many people are using the service.
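One way to picture the gateway: the application keeps its provider request unchanged and only swaps the base URL, while caching, rate limits, and analytics are configured on the gateway itself. The sketch below assumes the URL shape `https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/{provider}/{path}`. The account and gateway IDs, the OpenAI key, the `gpt-3.5-turbo` model name, and the helpers `gatewayUrl` and `buildChatRequest` are placeholders for illustration.

```typescript
// Sketch: route provider calls through AI Gateway by swapping the base URL.
// Account ID and gateway ID are placeholders; caching and rate limits live in the gateway settings.
const GATEWAY_BASE = "https://gateway.ai.cloudflare.com/v1";

// Build a gateway URL for a provider ("openai", "workers-ai", ...) and that provider's own path.
function gatewayUrl(accountId: string, gatewayId: string, provider: string, path: string): string {
  const cleanPath = path.replace(/^\/+/, ""); // tolerate a leading slash
  return `${GATEWAY_BASE}/${accountId}/${gatewayId}/${provider}/${cleanPath}`;
}

// Example: an OpenAI chat completion that would normally go to
// https://api.openai.com/v1/chat/completions, sent through the gateway instead.
function buildChatRequest(
  accountId: string,
  gatewayId: string,
  openAiKey: string,
  userMessage: string,
): Request {
  return new Request(gatewayUrl(accountId, gatewayId, "openai", "chat/completions"), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${openAiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo", // placeholder model name
      messages: [{ role: "user", content: userMessage }],
    }),
  });
}
```

Because only the URL changes, the same gateway can sit in front of several providers, and its analytics then cover all of that traffic.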
Train where it's cheapest with egress-free data
Cost-effective R2 storage for model training data and AI-generated assets
• Egress-free storage makes multi-cloud architectures for training LLMs affordable.
• Limitless storage for the ever-growing assets generated by users.
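As a sketch of storing user-generated assets, the example below generates an image and writes it to an R2 bucket binding with a content type. It assumes an AI binding and an R2 binding whose `put(key, value, options)` accepts `httpMetadata.contentType`, and uses `@cf/stabilityai/stable-diffusion-xl-base-1.0` as the example text-to-image model, assumed to return PNG bytes. The `generated/{userId}/{timestamp}.png` key layout and the `generateAndStore` helper are hypothetical choices for this example.

```typescript
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

// Subset of the R2 bucket binding used here: put(key, value, options).
interface BucketLike {
  put(
    key: string,
    value: ReadableStream | ArrayBuffer | ArrayBufferView | string,
    options?: { httpMetadata?: { contentType?: string } },
  ): Promise<unknown>;
}

const IMAGE_MODEL = "@cf/stabilityai/stable-diffusion-xl-base-1.0";

// Generate an image and persist it to R2; returns the object key for later retrieval.
async function generateAndStore(
  ai: AiBinding,
  bucket: BucketLike,
  userId: string,
  prompt: string,
  timestamp: number = Date.now(),
): Promise<string> {
  // Assumed: the text-to-image model returns PNG bytes (a stream or a byte array).
  const image = (await ai.run(IMAGE_MODEL, { prompt })) as ReadableStream | Uint8Array;
  // Hypothetical key layout: one prefix per user keeps each user's assets listable together.
  const key = `generated/${userId}/${timestamp}.png`;
  await bucket.put(key, image, { httpMetadata: { contentType: "image/png" } });
  return key;
}
```

Setting the content type at write time means the object can later be served directly to browsers as an image.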
Get started with a template
Workers AI + Vectorize Tutorial
Build a retrieval-augmented generation (RAG) app with Workers AI and Vectorize. View GitHub Resources
Workers + ChatGPT
Build a ChatGPT search plugin with Notion and Pinecone. View GitHub Resources
Workers + LangChain
Build an LLM search app powered by Workers and LangChain. View GitHub Resources
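For orientation, the retrieval-augmented generation flow behind the Workers AI + Vectorize template can be sketched roughly as follows; this is not the template's actual code. It assumes an AI binding, a Vectorize index whose `query(vector, { topK })` returns `{ matches: [{ id, score }] }`, and a hypothetical `TextLookup` function that maps vector ids back to source text (for example, rows in a database). Model names are example choices.

```typescript
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface VectorMatch {
  id: string;
  score: number;
}

// Only the query method of the Vectorize binding is needed for retrieval.
interface VectorIndex {
  query(vector: number[], options: { topK: number }): Promise<{ matches: VectorMatch[] }>;
}

// Hypothetical lookup from vector id to the original text.
type TextLookup = (id: string) => Promise<string | undefined>;

const EMBEDDING_MODEL = "@cf/baai/bge-base-en-v1.5";
const CHAT_MODEL = "@cf/meta/llama-2-7b-chat-int8";

// Put retrieved passages into a system prompt that asks the model to use them.
function buildContextPrompt(passages: string[]): string {
  if (passages.length === 0) {
    return "Answer the question. No extra context was found.";
  }
  return `Answer using the context below when it is relevant.\nContext:\n${passages
    .map((p) => `- ${p}`)
    .join("\n")}`;
}

async function answerWithRag(
  ai: AiBinding,
  index: VectorIndex,
  getText: TextLookup,
  question: string,
  topK = 3,
): Promise<string> {
  // 1. Embed the question with the same model used when the documents were indexed.
  const { data } = (await ai.run(EMBEDDING_MODEL, { text: [question] })) as { data: number[][] };
  // 2. Retrieve the ids of the nearest stored vectors.
  const { matches } = await index.query(data[0], { topK });
  // 3. Resolve ids to text, dropping any that no longer exist.
  const passages = (await Promise.all(matches.map((m) => getText(m.id)))).filter(
    (text): text is string => text !== undefined,
  );
  // 4. Ask the chat model, with the retrieved passages as system context.
  const { response } = (await ai.run(CHAT_MODEL, {
    messages: [
      { role: "system", content: buildContextPrompt(passages) },
      { role: "user", content: question },
    ],
  })) as { response: string };
  return response;
}
```

The key design constraint is step 1: questions must be embedded with the same model as the stored documents, or nearest-neighbor scores are meaningless.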
SiteGPT
"We use Cloudflare for everything – storage, cache, queues, and most importantly for training data and deploying the app on the edge, so I can ensure the product is reliable and fast. It's also been the most affordable option, with competitors costing more for a single day's worth of requests than Cloudflare costs in a month."
- Bhanu Teja Pachipulusu
Founder