What is pseudonymization?

Pseudonymization replaces personal information with aliases to make data sets more private. Pseudonymous data cannot be matched to an identifiable person unless it is combined with a separate set of information.

Learning Objectives

After reading this article you will be able to:

  • Define pseudonymization
  • Explain how pseudonymization works
  • Describe why pseudonymization is not required by the GDPR
  • Contrast pseudonymization vs. anonymization

What is pseudonymization?

Pseudonymization is the process of removing personal identifiers from data and replacing those identifiers with placeholder values. It is sometimes used for protecting personal privacy or improving data security. In combination with other important privacy safeguards, such as encryption, pseudonymization can help maintain user privacy.

Generally, a "pseudonym" is a fake name used to conceal one's identity. For instance, many book authors use a pseudonym or "pen name." Data pseudonymization is somewhat like this concept, but the pseudonym values are not usually used publicly. It is also important to note that any personal information, not just a person's name, can be pseudonymized.

How does pseudonymization work?

Imagine Alice creates an account on a streaming service. As part of the signup process, the streaming service stores her name in its database. However, the service does not record her as "Alice" in its personal records database (let's call this Database 1). Instead, it uses pseudonymization to replace "Alice" with "Person 17332."

Database 1:

Name Account type
Person 17332 Full membership
Person 12348 Free trial
Person 74738 VIP membership
Person 78383 Full membership

A list of names and their corresponding pseudonyms is kept in a separate database (let's call this Database 2). Someone who only had access to Database 1 would be able to view the pseudonymous data but would be unable to match that data to a specific individual, such as Alice. To do so, they would also need access to Database 2, the list of names and pseudonyms.

Database 2:

Name Pseudonym
Alice Person 17332
Bob Person 12348
Carlos Person 74738
David Person 78383
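
The split between the two databases can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular service implements it: the function names `pseudonymize` and `reidentify` are hypothetical, pseudonyms are drawn as random five-digit numbers, and both "databases" are plain in-memory structures.

```python
import secrets


def pseudonymize(records):
    """Split (name, account_type) records into two separate tables."""
    database_1 = []  # pseudonym + account type, used day to day
    database_2 = {}  # name -> pseudonym, meant to be stored and secured separately
    used = set()
    for name, account_type in records:
        if name not in database_2:
            # Assumption: a random five-digit number, redrawn on collision
            while True:
                pseudonym = f"Person {secrets.randbelow(90000) + 10000}"
                if pseudonym not in used:
                    break
            used.add(pseudonym)
            database_2[name] = pseudonym
        database_1.append((database_2[name], account_type))
    return database_1, database_2


def reidentify(database_1, database_2):
    """Reverse the process; only possible with access to both tables."""
    pseudonym_to_name = {p: n for n, p in database_2.items()}
    return [(pseudonym_to_name[p], a) for p, a in database_1]


records = [
    ("Alice", "Full membership"),
    ("Bob", "Free trial"),
    ("Carlos", "VIP membership"),
    ("David", "Full membership"),
]
database_1, database_2 = pseudonymize(records)
```

Calling `reidentify(database_1, database_2)` returns the original records, which is why the lookup table has to be kept apart from the pseudonymous data.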

Now imagine Chuck, a rogue employee at the streaming service, steals Database 1. He analyzes the data, but cannot verify the identity of any user, because the list of pseudonyms is stored separately. He cannot do much with his stolen data unless he also steals Database 2.

In this way, pseudonymization helps protect privacy and enhance security. However, identifying someone is still possible in several ways. If the identifying data is not kept securely separate, individuals can be identified: for instance, if Chuck steals Database 2 as well, he can easily identify Alice by name. In addition, it is often possible to identify individuals in pseudonymized data by combining the data with other external data sources (imagine that Alice had posted on social media about having a full membership on the streaming service).
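
Combining pseudonymized data with outside information can be illustrated with a short sketch. The `external_facts` below are hypothetical: the post by Alice follows the example above, while the one by Carlos is an added assumption to show a case where the match is unique. The helper name `candidate_pseudonyms` is likewise made up for illustration.

```python
# Pseudonymous data, as it might look to someone holding only Database 1
database_1 = [
    ("Person 17332", "Full membership"),
    ("Person 12348", "Free trial"),
    ("Person 74738", "VIP membership"),
    ("Person 78383", "Full membership"),
]

# Hypothetical facts gathered from public sources such as social media
external_facts = {
    "Alice": "Full membership",
    "Carlos": "VIP membership",
}


def candidate_pseudonyms(database_1, account_type):
    """Return every pseudonym whose account type matches the known fact."""
    return [p for p, a in database_1 if a == account_type]


for name, account_type in external_facts.items():
    matches = candidate_pseudonyms(database_1, account_type)
    status = "re-identified" if len(matches) == 1 else "narrowed down"
    print(f"{name}: {status} -> {matches}")
```

In this toy data set, a single outside fact narrows Alice down to two candidates and pins Carlos to exactly one record. Real data sets usually hold more attributes per record, which tends to make unique matches more likely.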

Pseudonymization therefore needs to be combined with other processes and technologies in order to keep data private. For example, imagine that the streaming service uses encryption to protect Databases 1 and 2, not just pseudonymization. If Chuck steals both databases, all he can see is:

Database 1:

Name Account type
P0kOFAw20PHbOnT7oXXvlm4lfOkGbahX+1XCv1VECrE= nm+nauwi7eePi7ZKJH0sIeVLbxBJgixIdL1sOXvsUnw=
88X5ceFkvcYjG+WxROkAT6XLh8wuqc3NctBP7mkIAYM= w+1iufZv3OrLPb7sESpeNIu5kzX4IVaNYz7DhpSeFKo=
Zh3MZza5QM0Q+BtNGBx7eelMafyehzZBv5I2zdodp8E= CGDoLDA7X/poEyTI+UWa8muC9bjmbMfAmwhrNZbjUbc=
WbAJpSq+GRuaVK5Qogdfa2tWYQq2Ge2GiS1zJsmUOG8= nm+nauwi7eePi7ZKJH0sIeVLbxBJgixIdL1sOXvsUnw=

Database 2:

Name Pseudonym
lenaV3sVToJ8FdDHNwLIMed0AN5I+P7KSrN3nKj8WN8= P0kOFAw20PHbOnT7oXXvlm4lfOkGbahX+1XCv1VECrE=
srS9OH6GK4qa33jgZx+24ZJghF1BZE9Agc825l1c0lA= 88X5ceFkvcYjG+WxROkAT6XLh8wuqc3NctBP7mkIAYM=
ddbqSa7o561pBZzFHebo2LZvKrgWCKj7XM1n10/waw8= Zh3MZza5QM0Q+BtNGBx7eelMafyehzZBv5I2zdodp8E=
TKtTr4dDNRd+yb6f4DzUlrghC10OgUXlkR0X8wzkzJw= WbAJpSq+GRuaVK5Qogdfa2tWYQq2Ge2GiS1zJsmUOG8=

Because the encrypted values cannot be read without the decryption key, encryption provides stronger protection against snoopers like Chuck than pseudonymization alone. Learn more about privacy and encryption.

Is pseudonymization required by the GDPR?

The General Data Protection Regulation (GDPR) mentions pseudonymization as one method that can be used to protect personal data, but it does not require its use. Pseudonymization is no guarantee that privacy will be preserved, nor does it guarantee that an organization will avoid violating the GDPR.

In fact, the GDPR still considers pseudonymous data to be personal data because it can be associated with a person by using additional information. (In the example above, one could identify Alice's membership level in Database 1 by using the information from Database 2.) Recital 26 of the GDPR states that:

"Personal data which have undergone pseudonymisation, which could be attributed to a natural person by the use of additional information should be considered to be information on an identifiable natural person."

So while pseudonymization can be useful for protecting data, it is not sufficient on its own for maintaining privacy or for GDPR compliance.

What is the difference between pseudonymization and anonymization?

Anonymization strips identifying information from data altogether, and unlike pseudonymization, the process ideally cannot be reversed. If the data in the example above were anonymized, all information that could identify Alice, like her name, would be removed from the database instead of just being replaced with a pseudonym:

Name Account type
******** Full membership
******** Free trial
******** VIP membership
******** Full membership
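
As a rough sketch of the difference, the snippet below anonymizes the example records by discarding the name outright instead of masking or replacing it, and keeps no lookup table that could reverse the step. The function name `anonymize` and the aggregate count are illustrative assumptions, not a complete anonymization technique.

```python
from collections import Counter


def anonymize(rows):
    """Drop the identifying column entirely; no mapping is kept anywhere."""
    return [account_type for _name, account_type in rows]


rows = [
    ("Alice", "Full membership"),
    ("Bob", "Free trial"),
    ("Carlos", "VIP membership"),
    ("David", "Full membership"),
]

anonymized = anonymize(rows)

# Aggregates can still be computed from the anonymized data
counts = Counter(anonymized)
```

Here `counts` still shows how many accounts of each type exist, but nothing in `anonymized` ties an account to a person.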

Data anonymization helps with privacy but is not always practical or possible. If it were impossible for the example streaming service to associate accounts with specific people, it would not be able to provide its service at all.

However, there are cases when anonymization is preferable. For instance, medical researchers sometimes use aggregated healthcare data that has been anonymized in order to preserve privacy. Additionally, anonymous data can still provide valuable insights: some web analytics services anonymize their data, for instance.

But even anonymized data may not fully protect user privacy. By combining anonymized data with other data sets, by looking at the context for the data, or by using several other methods, it is sometimes possible to associate anonymous data with a specific person. Even anonymized personal data needs to be protected with encryption, access control, and other safeguards against privacy violations.