The Internet is borderless and decentralized. It allows information to cross the globe in a matter of milliseconds, making services and business models commonplace that would have been unimaginable a few decades ago.
But the legal and regulatory reality that organizations face concerning information and data is far more complicated. Data privacy concerns have prompted the creation and enforcement of stringent privacy regulations, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). Well over 100 countries have passed their own data regulation laws, each with its own framework for how data can cross its borders.
Data regulations aimed at protecting consumer privacy are sometimes hard to interpret, constantly changing, and difficult to comply with given that the globally interconnected network of the Internet does not recognize national borders. Data regulations vary across the world, and at times, by industry, making it difficult for organizations to adhere to the latest standards, necessary certifications, and physical data storage requirements.
Caught between these two realities, many organizations lean on data localization: the practice of keeping data within a given region, rather than allowing it to cross the world or leave a certain cloud region for processing and storage.
Yet data localization introduces its own set of challenges.
Modern organizations have four main choices for where they want to run their applications:
On-premise data center
Public cloud
Private cloud
Hybrid cloud infrastructure
Which model they choose has a major impact on both how their business will scale and on how data localization can be implemented.
1. On-premise data center: Storing data from in-region customers in an on-premise data center makes localization relatively simple. As long as the on-premise infrastructure is adequately protected, the data within remains local.
But for out-of-region customers, an on-premise approach makes data localization all but impossible. To serve those customers, their data must be brought into the internal data center, and out of the region of the data's origin.
2. Public cloud: In many ways, public cloud computing makes serving a global audience simpler compared to on-premise computing, since cloud-based applications can run on servers in a wide range of global regions. However, cloud computing also offers less visibility into where data is processed, creating a challenge for organizations that want control over where data goes.
Organizations that use public cloud computing and want to localize their data should consider where their public cloud vendor's cloud regions are located. A "cloud region" is the area where a cloud provider's servers are physically located in data centers. Restricting data to a given cloud region should make localization possible. However, not all public cloud providers will have data centers in the required regions, and not all can guarantee that the data will not leave the region.
3. Private cloud: Like the on-premise data center model, a private cloud model partially solves the problem of data localization: if the cloud is located within the required region, then the data within is localized as a matter of course. But customers outside the cloud region cannot have their data localized unless additional private clouds are configured within their region as well. Running a private cloud in every region where an organization’s customers reside can become expensive to maintain. (Private clouds typically cost more than public clouds, since the cost of the physical infrastructure is not shared among multiple cloud customers.)
4. Hybrid infrastructure: Hybrid models face the same data localization challenges described above. Organizations often struggle to ensure data goes to the right place in a hybrid cloud model, a challenge that is especially acute when synchronizing data across multiple cloud platforms and types of infrastructure.
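As a rough illustration of restricting data to a cloud region, as discussed for the public cloud model, the sketch below picks a storage region for a customer record and refuses to store it anywhere else. The country-to-region table, the region names, and the function name are all hypothetical and do not reflect any specific provider or regulation.

```python
# Hypothetical mapping from a customer's country code to the cloud region
# where that customer's data must stay. Region names are illustrative only.
RESIDENCY_RULES = {
    "DE": "eu-central",
    "FR": "eu-central",
    "US": "us-east",
    "IN": "ap-south",
}

# Regions in which the (hypothetical) cloud provider operates data centers.
AVAILABLE_REGIONS = {"eu-central", "us-east"}


class LocalizationError(Exception):
    """Raised when no compliant storage region exists for a record."""


def select_storage_region(country_code: str) -> str:
    """Return the region a record must be stored in, or raise if none fits."""
    required = RESIDENCY_RULES.get(country_code)
    if required is None:
        raise LocalizationError(f"no residency rule for {country_code}")
    if required not in AVAILABLE_REGIONS:
        # Fail closed: never fall back to an out-of-region data center.
        raise LocalizationError(f"provider has no data center in {required}")
    return required
```

The design choice worth noting is that the function fails closed: when the provider has no data center in the required region, it raises an error instead of silently choosing the nearest available region.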
Keeping all infrastructure within one region inhibits the ability to reach a global audience; conversely, maintaining infrastructure all around the world is untenable for most organizations.
The best approach is to partner with a global edge network — either a CDN vendor or a vendor that offers additional services along with CDN caching — that is infrastructure-agnostic. This allows websites and applications to scale up to global audiences, no matter whether they use a hybrid cloud, public cloud, private cloud, or on-premise model.
Without granular control over where data is processed, data localization is not possible. But without a widely distributed non-local presence, serving a global audience is not possible. Localizing and globalizing are two opposite abilities — but ideally, a data localization partner will be able to offer both simultaneously.
For organizations that need to localize data, the end goal is controlling where data is processed and stored. Organizations must evaluate edge network vendors to make sure they allow for localized control of where data goes and how it is processed.
Organizations that collect user data use encryption to protect that data both in transit and at rest, so only authorized parties can view, process, or alter it. For data that crosses networks, the encryption protocol in widest use today is Transport Layer Security (TLS). TLS establishes and authenticates connections using asymmetric cryptography, which requires two keys: a public key and a private key. While the public key is made available to the entire Internet, the private key is kept secret.
Where the private key is stored determines where encrypted data, including potentially sensitive data, is decrypted. This is important for localization because once data is decrypted, it becomes visible to any parties with access to the decrypted data.
When properly configured, TLS encryption is strong enough to withstand practical attempts to break it. This means that data encrypted with TLS can safely traverse areas outside of the localized region, as long as it remains encrypted. To ensure that decryption only takes place within a designated region, organizations require two crucial capabilities:
Capability 1: Local TLS key storage. Organizations need to be able to keep their private keys on servers within the localized region. This ensures that data encrypted with TLS can only be decrypted and viewed within that region. If an organization uses an outside vendor for TLS, that vendor needs to offer keyless SSL to make sure the key does not leave the organization's infrastructure.
Capability 2: Proxying encrypted connections. Organizations that want to implement data localization practically need to combine localized private key storage with a global network that can efficiently proxy encrypted connections from their customers to the places where private keys are stored and TLS can safely terminate.
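The two capabilities can be pictured as a routing decision made wherever a connection first enters the network. The following is a minimal sketch under stated assumptions: the `EdgeLocation` type, its fields, and the `route_connection` function are hypothetical names invented for illustration, and no real proxy or TLS library is involved. It shows only the decision logic, not the byte forwarding itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EdgeLocation:
    """A hypothetical point of presence on a global network."""
    name: str
    region: str
    holds_private_key: bool  # True only where the TLS private key is stored


def route_connection(
    ingress: EdgeLocation,
    locations: list[EdgeLocation],
    allowed_region: str,
) -> tuple[str, str]:
    """Decide what to do with a still-encrypted connection at the ingress.

    Returns ("terminate", name) if TLS may be terminated at the ingress, or
    ("forward", name) if the encrypted bytes should be proxied, undecrypted,
    to an in-region location that holds the private key.
    """
    if ingress.region == allowed_region and ingress.holds_private_key:
        return ("terminate", ingress.name)
    for candidate in locations:
        if candidate.region == allowed_region and candidate.holds_private_key:
            return ("forward", candidate.name)
    # Fail closed: without an in-region key holder, nothing is decrypted.
    raise RuntimeError(f"no location in {allowed_region} holds the private key")
```

For example, a connection arriving at an out-of-region location would be forwarded in encrypted form to the in-region location, while a connection arriving in-region would be terminated on the spot.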
Once data is localized, organizations must take precautions to ensure it remains in its localized region. Internal access control is extremely important for keeping data localized, especially for organizations with an international presence. If an employee outside the localized region accesses data from within the region, this undoes the work that was done to keep the data local.
Unfortunately, today many organizations have legacy authorization systems in place that trust anyone within the corporate network, regardless of their location. This setup, known as the castle-and-moat model (with the network perimeter being the moat), does not easily map onto a data localization approach. If anyone in the organization can access data, regardless of location, the data might as well not be localized.
Organizations can solve this by treating a user's location as an authorization factor for accessing data.
This is easier to implement when organizations adopt a Zero Trust model rather than a castle-and-moat model. In a Zero Trust model, no user or device is trusted by default, even from inside the corporate network. Several factors can be evaluated by the Zero Trust solution before it grants access: device posture, user identity and privileges, location, and more.
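A minimal sketch of such a check appears below, with location evaluated alongside device posture and user privileges. The `AccessRequest` and `Resource` types and their fields are hypothetical, and a real Zero Trust solution would evaluate many more signals; the point is only that every factor must pass and that being inside the corporate network grants nothing by itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Hypothetical signals collected about a user at request time."""
    user_id: str
    user_roles: frozenset
    device_compliant: bool  # result of a device posture check
    user_region: str        # region the request originates from


@dataclass(frozen=True)
class Resource:
    """Hypothetical description of a localized data set."""
    name: str
    data_region: str    # region the data must stay in
    required_role: str  # privilege needed to read the data


def is_access_allowed(request: AccessRequest, resource: Resource) -> bool:
    """Grant access only if every factor passes; nothing is trusted by default."""
    return (
        request.device_compliant
        and resource.required_role in request.user_roles
        and request.user_region == resource.data_region
    )
```

Under this policy, an analyst with the right role and a compliant device is still denied if the request originates outside the region where the data is localized.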
In edge computing, applications run on an edge network with many points of presence, rather than in a few isolated data centers. This offers the advantage of running code all around the world, simultaneously serving users more efficiently and processing data as close to those users as possible. This aspect of edge computing makes localization more feasible.
Another advantage of edge computing, from the localization and regulatory compliance standpoint, is that different code can run at different parts of the edge. This makes for both effective data localization and localized regulatory compliance: slightly different application functions can be deployed in different regions, depending on the regulations within those regions.
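One way to picture region-specific code is a dispatch table that selects a handler according to where the request is processed. The sketch below is illustrative only: the region names, the handler functions, and the rule that one region drops the `ip_address` field are assumptions made for the example, not a statement of what any regulation requires or how any edge platform exposes region information.

```python
from typing import Callable, Dict


def handle_default(record: dict) -> dict:
    """Baseline behavior: process the record unchanged (returns a copy)."""
    return dict(record)


def handle_eu(record: dict) -> dict:
    """Hypothetical stricter variant: drop the IP address before processing."""
    return {key: value for key, value in record.items() if key != "ip_address"}


# Hypothetical table of region-specific overrides; any region not listed
# here falls back to the default handler.
HANDLERS: Dict[str, Callable[[dict], dict]] = {
    "eu": handle_eu,
}


def process_at_edge(region: str, record: dict) -> dict:
    """Run the handler deployed for the region where the request is served."""
    handler = HANDLERS.get(region, handle_default)
    return handler(record)
```

Keeping the regional variants behind a single entry point means the rest of the application does not need to know which regulation applies where.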
Privacy on the Internet is critical to the safety and security of our personal and professional lives, yet the Internet was not built with privacy in mind. As a result, fear around how Internet-based technology companies handle data and privacy abounds.
Cloudflare’s mission to help build a better Internet includes a focus on fixing this fundamental design flaw by building privacy-enhancing products and technologies.
The Cloudflare Data Localization Suite ingests traffic at over 200 locations around the globe, then forwards all traffic for localization customers to data centers within the localized region. Traffic is neither inspected nor decrypted until it reaches an in-region data center. With Geo Key Manager, customers can keep their TLS keys within a specified region. This enables Cloudflare customers to combine the benefits of relying on a global network for performance, security, and availability with the need for localization.
This article is part of a series on the latest trends and topics impacting today’s technology decision-makers.