Cache-control is an HTTP header that dictates browser caching behavior. In a nutshell, when someone visits a website, their browser will save certain resources, such as images and website data, in a store called the cache. When that user revisits the same website, cache-control sets the rules which determine whether that user will have those resources loaded from their local cache, or whether the browser will have to send a request to the server for fresh resources. In order to understand cache-control in greater depth, a basic understanding of browser caching and HTTP headers is required.
As explained above, browser caching is when a web browser saves website resources so it doesn’t have to fetch them again from a server. For example, a background image on a website might be saved locally in cache so that when a user visits that page for the second time, the image will load from the user’s local files and the page will load much faster.
Browsers will only store these resources for a specified period of time, known as the Time To Live (TTL). If a user requests a cached resource after the TTL has expired, the browser will have to reach out to the server again and download a fresh copy of the resource. How do browsers and web servers know the TTL for each resource? This is where HTTP headers come into play.
The Hypertext Transfer Protocol (HTTP) outlines the syntax for communications on the World Wide Web, and this communication consists of requests from clients to servers and responses from servers back to clients. These HTTP requests and responses each come stamped with a series of key-value pairs called headers.
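These headers contain a lot of important information about each communication. For example, request headers usually carry details about the client and the content it is asking for, while response headers often include information on the server and on the resource being returned, such as its format.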
A cache-control header can appear in both HTTP requests and responses.
Headers consist of key-value pairs which are separated by a colon. For cache-control, the ‘key’, or the part to the left of the colon, is always ‘cache-control’. The ‘value’ is what’s found on the right of the colon, and there can be one or several comma-separated values for cache control.
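As a rough illustration of that key-value structure, the Python sketch below splits one header line at the colon and then breaks the value apart at the commas. The function name `parse_cache_control` and the sample header are made up for this example, and the parsing is deliberately simplified: it assumes no value contains a quoted string with a comma inside it.

```python
from typing import Dict, Optional, Tuple


def parse_cache_control(header_line: str) -> Tuple[str, Dict[str, Optional[str]]]:
    """Split a raw header line into its key and a dict of its values."""
    # Everything left of the first colon is the key, the rest is the value.
    key, _, value = header_line.partition(":")
    parsed: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        # Some values carry an argument (max-age=1800), others do not (public).
        name, sep, arg = part.partition("=")
        parsed[name.strip().lower()] = arg.strip().strip('"') if sep else None
    # Header names are case-insensitive, so normalise the key to lowercase.
    return key.strip().lower(), parsed


key, directives = parse_cache_control("Cache-Control: public, max-age=1800")
print(key)         # cache-control
print(directives)  # {'public': None, 'max-age': '1800'}
```

Values are kept as strings here; a caller that needs a number of seconds would convert them with `int()`.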
These values are called directives, and they dictate who can cache a resource as well as how long those resources can be cached before they must be updated. Below we go through some of the most common cache-control directives:
A response with a ‘private’ directive can only be cached by the client and never by an intermediary agent, such as a CDN or a proxy. These are often resources containing private data, such as a website displaying a user’s personal information.
Conversely, the ‘public’ directive means the resource can be stored by any cache.
A response with a ‘no-store’ directive cannot be cached anywhere, ever. This means that every time a user requests this data, a request must be sent to the origin server for a fresh copy. This directive is typically reserved for resources that contain extremely sensitive data, such as bank account information.
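The storage rules for ‘private’, ‘public’, and ‘no-store’ can be sketched as a single decision function. This is a simplified illustration, not how any particular browser or CDN is implemented: the name `may_store` and the `shared_cache` flag are assumptions made for the example, and real caches also weigh factors such as the request method and response status.

```python
from typing import Set


def may_store(directives: Set[str], shared_cache: bool) -> bool:
    """Decide whether a cache is allowed to keep a copy of a response.

    directives   -- lowercase directive names from the response header
    shared_cache -- True for an intermediary (CDN, proxy), False for a browser
    """
    # 'no-store' forbids caching everywhere.
    if "no-store" in directives:
        return False
    # 'private' allows only the end user's own cache to keep the response.
    if "private" in directives and shared_cache:
        return False
    # 'public', or no storage restriction at all: any cache may keep it.
    return True


print(may_store({"private"}, shared_cache=False))   # True  (browser)
print(may_store({"private"}, shared_cache=True))    # False (CDN)
print(may_store({"public"}, shared_cache=True))     # True
print(may_store({"no-store"}, shared_cache=False))  # False
```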
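The ‘no-cache’ directive means that cached versions of the requested resource cannot be used without first checking to see if there is an updated version. Despite its name, the resource can still be stored in cache; it simply has to be revalidated before each use. This is typically done using an ETag.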
An ETag is another HTTP header which contains a token unique to the version of the resource at the time it was requested. This token is changed on the origin server whenever the resource is updated.
When a user returns to a page with a ‘no-cache’ resource, the client will always have to contact the origin server, sending along the ETag of its cached copy so that it can be compared with the current one on the server. If the ETags are identical, the server confirms that nothing has changed and the cached resource will be provided to the user. If not, this means that the resource has been updated and the client will need to download a fresh version to provide to the user. This process ensures that the user is always getting the most up-to-date version of that resource without requiring unnecessary downloads.
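In HTTP, this comparison is carried out with a conditional request: the client sends its stored ETag in an ‘If-None-Match’ request header, and the server replies with status 304 (Not Modified) and no body if the tag still matches. The sketch below imitates the server side of that exchange. The function names and the hash-based ETag scheme are assumptions for illustration (servers are free to generate ETags however they like), and the comparison is a plain string match rather than the full matching rules.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    # Hypothetical scheme: a quoted, truncated hash of the current content.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(current_body: bytes,
               request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, body) for a request to one resource."""
    etag = make_etag(current_body)
    response_headers = {"ETag": etag, "Cache-Control": "no-cache"}
    # The client's copy is still current: skip the download.
    if request_headers.get("If-None-Match") == etag:
        return 304, response_headers, b""
    # First visit, or the resource changed: send the full body.
    return 200, response_headers, current_body


# First visit: nothing cached yet, so the full resource is downloaded.
first_status, first_headers, cached_body = handle_get(b"version 1", {})
cached_etag = first_headers["ETag"]

# Return visit, resource unchanged: the cached copy can be reused.
second_status, _, _ = handle_get(b"version 1", {"If-None-Match": cached_etag})

# Return visit after an update: a fresh copy is sent.
third_status, _, fresh_body = handle_get(b"version 2", {"If-None-Match": cached_etag})

print(first_status, second_status, third_status)  # 200 304 200
```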
The ‘max-age’ directive dictates the time to live, in other words how many seconds a resource can be served from cache after it's been downloaded. For example, if ‘max-age’ is set to 1800, this means that for 1,800 seconds (30 minutes) after the resource was first requested from the server, the user will be served a cached version of that resource on subsequent requests. If the user requests the resource again after those 30 minutes have expired, the client will have to request a fresh copy from the origin server.
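The 30-minute example can be expressed as a small freshness check. This is a minimal sketch: the function name and its timestamp parameters are assumptions for illustration, and it measures age only from the moment the copy was stored, whereas real caches also account for time the response spent in transit or in other caches.

```python
def is_fresh(stored_at: float, now: float, max_age: int) -> bool:
    """Return True while a cached copy may be served without contacting the server.

    stored_at and now are timestamps in seconds; max_age is the directive's value.
    """
    current_age = now - stored_at
    return current_age < max_age


MAX_AGE = 1800  # 30 minutes, as in the example above

print(is_fresh(stored_at=0, now=600, max_age=MAX_AGE))   # True:  10 minutes old
print(is_fresh(stored_at=0, now=1800, max_age=MAX_AGE))  # False: TTL has just expired
print(is_fresh(stored_at=0, now=2400, max_age=MAX_AGE))  # False: 40 minutes old
```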
The ‘s-maxage’ directive is specifically for shared caches such as CDNs, and it dictates how long those shared caches can keep serving up the resource from cache. Within a shared cache, it overrides the ‘max-age’ directive; individual clients such as browsers ignore ‘s-maxage’ and continue to follow ‘max-age’.
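To make the relationship between the two directives concrete, the sketch below picks the lifetime a given cache would apply: a shared cache prefers ‘s-maxage’ when it is present, while a browser's private cache only looks at ‘max-age’. The function name and the dictionary layout (directive name mapped to its string value) are assumptions for this example.

```python
from typing import Dict, Optional


def freshness_lifetime(directives: Dict[str, Optional[str]],
                       shared_cache: bool) -> Optional[int]:
    """Return the lifetime in seconds that applies to this kind of cache."""
    s_maxage = directives.get("s-maxage")
    max_age = directives.get("max-age")
    # Only shared caches (CDNs, proxies) honour s-maxage.
    if shared_cache and s_maxage is not None:
        return int(s_maxage)
    if max_age is not None:
        return int(max_age)
    # Neither directive is present: no explicit lifetime was given.
    return None


# Equivalent to: Cache-Control: max-age=600, s-maxage=3600
header = {"max-age": "600", "s-maxage": "3600"}

print(freshness_lifetime(header, shared_cache=True))   # 3600 (CDN)
print(freshness_lifetime(header, shared_cache=False))  # 600  (browser)
```

With a header like this one, a CDN could keep serving its copy for an hour while each browser rechecks after ten minutes.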
Browser caching is a great way to both preserve resources and improve user experience on the Internet, but without cache-control, it would be very brittle. Every resource on every site would be bound by the same caching rules, meaning that sensitive information would be cached the same way as public information, and frequently-updated resources would be cached for the same amount of time as ones that rarely change.
Cache-control adds the flexibility that makes browser caching truly useful, letting developers dictate how each resource will be cached. It also lets developers set special rules for intermediaries, which is a factor in why sites that use a CDN, like the Cloudflare CDN, tend to perform better than sites that don’t.