What is pseudonymization?

Pseudonymization replaces personal information with aliases to make data sets more private. Pseudonymous data cannot be matched to an identifiable person unless it is combined with a separate set of information.

Learning Objectives

After reading this article you will be able to:

  • Define pseudonymization
  • Explain how pseudonymization works
  • Contrast pseudonymization vs. anonymization

What is pseudonymization?

Pseudonymization is the process of removing personal identifiers from data and replacing those identifiers with placeholder values. It is sometimes used for protecting personal privacy or improving data security. In combination with other important privacy safeguards, such as encryption, pseudonymization can help maintain user privacy.

Generally, a "pseudonym" is a fake name used to conceal one's identity. For instance, many book authors use a pseudonym or "pen name." Data pseudonymization is somewhat like this concept, but the pseudonym values are not usually used publicly. It is also important to note that any personal information, not just a person's name, can be pseudonymized.

How does pseudonymization work?

Imagine Alice creates an account on a streaming service. As part of the signup process, the streaming service collects her name. However, the service does not record her as "Alice" in its personal records database (let's call this Database 1). Instead, it uses pseudonymization to change "Alice" to "Person 17332."

Database 1:

Name          Account type
Person 17332  Full membership
Person 12348  Free trial
Person 74738  VIP membership
Person 78383  Full membership

A list of names and their corresponding pseudonyms is kept in a separate database (let's call this Database 2). Someone who only had access to Database 1 would be able to view the pseudonymous data but would be unable to match that data to a specific individual, such as Alice. To do so, they would also need access to Database 2, the list of names and pseudonyms.

Database 2:

Name    Pseudonym
Alice   Person 17332
Bob     Person 12348
Carlos  Person 74738
David   Person 78383
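
The two-database arrangement described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular service implements it: the function names `new_pseudonym` and `pseudonymize`, the use of in-memory dictionaries in place of real databases, and the random five-digit numbering are all assumptions made for the example.

```python
import secrets

def new_pseudonym(taken):
    """Return a random 'Person NNNNN' label that is not already in use."""
    while True:
        # randbelow(90000) yields 0..89999, so the label has five digits
        candidate = f"Person {secrets.randbelow(90000) + 10000}"
        if candidate not in taken:
            return candidate

def pseudonymize(records):
    """Split (name, account_type) records into two separate tables.

    Assumes each name appears only once (hypothetical simplification).
    """
    database_1 = {}  # pseudonym -> account type (contains no real names)
    database_2 = {}  # real name -> pseudonym (meant to be stored separately)
    for name, account_type in records:
        pseudonym = new_pseudonym(database_1)
        database_1[pseudonym] = account_type
        database_2[name] = pseudonym
    return database_1, database_2

signups = [
    ("Alice", "Full membership"),
    ("Bob", "Free trial"),
    ("Carlos", "VIP membership"),
    ("David", "Full membership"),
]
database_1, database_2 = pseudonymize(signups)
```

In this sketch the pseudonyms are random, so they reveal nothing about the names they replace. The only link between the two tables is the lookup stored in `database_2`.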

Now imagine Chuck, a rogue employee at the streaming service, steals Database 1. He analyzes the data, but cannot verify the identity of any user, because the list of pseudonyms is stored separately. He cannot do much with his stolen data unless he also steals Database 2.

In this way, pseudonymization helps protect privacy and enhance security. However, identifying someone is still possible if the identifying data is not kept separate, or if it is compromised too: if Chuck steals Database 2 as well, he can identify Alice by name.
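
To make the point concrete, here is a small Python sketch using the example values from the two tables. The helper name `reidentify` is hypothetical, and plain dictionaries stand in for the databases. With Database 1 alone nothing can be tied to a name; with both databases, every record can.

```python
# Example values copied from Database 1 and Database 2 in this article
database_1 = {
    "Person 17332": "Full membership",
    "Person 12348": "Free trial",
    "Person 74738": "VIP membership",
    "Person 78383": "Full membership",
}
database_2 = {
    "Alice": "Person 17332",
    "Bob": "Person 12348",
    "Carlos": "Person 74738",
    "David": "Person 78383",
}

def reidentify(pseudonymous_records, name_to_pseudonym):
    """Attach real names to records for which a name-to-pseudonym entry is known."""
    pseudonym_to_name = {p: n for n, p in name_to_pseudonym.items()}
    return {
        pseudonym_to_name[pseudonym]: account_type
        for pseudonym, account_type in pseudonymous_records.items()
        if pseudonym in pseudonym_to_name
    }

# Chuck holds only Database 1: he has no lookup table, so nothing is identified
with_database_1_only = reidentify(database_1, {})

# Chuck holds both databases: every record maps back to a name
with_both_databases = reidentify(database_1, database_2)
```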

Pseudonymization therefore needs to be combined with other processes and technologies in order to keep data private. For example, imagine that the streaming service uses encryption, not just pseudonymization, to protect Databases 1 and 2. If Chuck steals both databases, all he can see is:

Database 1:

Name                                          Account type
P0kOFAw20PHbOnT7oXXvlm4lfOkGbahX+1XCv1VECrE=  nm+nauwi7eePi7ZKJH0sIeVLbxBJgixIdL1sOXvsUnw=
88X5ceFkvcYjG+WxROkAT6XLh8wuqc3NctBP7mkIAYM=  w+1iufZv3OrLPb7sESpeNIu5kzX4IVaNYz7DhpSeFKo=
Zh3MZza5QM0Q+BtNGBx7eelMafyehzZBv5I2zdodp8E=  CGDoLDA7X/poEyTI+UWa8muC9bjmbMfAmwhrNZbjUbc=
WbAJpSq+GRuaVK5Qogdfa2tWYQq2Ge2GiS1zJsmUOG8=  nm+nauwi7eePi7ZKJH0sIeVLbxBJgixIdL1sOXvsUnw=

Database 2:

Name                                          Pseudonym
lenaV3sVToJ8FdDHNwLIMed0AN5I+P7KSrN3nKj8WN8=  P0kOFAw20PHbOnT7oXXvlm4lfOkGbahX+1XCv1VECrE=
srS9OH6GK4qa33jgZx+24ZJghF1BZE9Agc825l1c0lA=  88X5ceFkvcYjG+WxROkAT6XLh8wuqc3NctBP7mkIAYM=
ddbqSa7o561pBZzFHebo2LZvKrgWCKj7XM1n10/waw8=  Zh3MZza5QM0Q+BtNGBx7eelMafyehzZBv5I2zdodp8E=
TKtTr4dDNRd+yb6f4DzUlrghC10OgUXlkR0X8wzkzJw=  WbAJpSq+GRuaVK5Qogdfa2tWYQq2Ge2GiS1zJsmUOG8=

Without the decryption key, Chuck cannot read the contents of either database, even though he holds both. For this reason, encryption provides stronger protection against snoopers like Chuck than pseudonymization alone.

What is the difference between pseudonymization and anonymization?

Anonymization is intended to make data completely anonymous. Identifying information is stripped away altogether, and unlike pseudonymization, the process usually cannot be reversed. If the data in the example above were anonymized, Alice's name would be removed from the database instead of being replaced with a pseudonym:

Name      Account type
********  Full membership
********  Free trial
********  VIP membership
********  Full membership

Data anonymization helps with privacy but is not always practical or possible. If it were impossible for the example streaming service to associate accounts with specific people, it would not be able to provide its service at all.

However, there are cases when anonymization is preferable. For instance, medical researchers sometimes use aggregated healthcare data that has been anonymized in order to preserve privacy. Additionally, anonymous data can still provide valuable insights — some web analytics services anonymize their data, for instance.
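
As a rough sketch of why anonymized data can still be useful, the Python below drops the name column from the example records and counts accounts by type. The variable names are illustrative assumptions. Note that no lookup table is kept, which is what makes the step irreversible in this simple example.

```python
from collections import Counter

# Example records from this article, before any privacy processing
records = [
    ("Alice", "Full membership"),
    ("Bob", "Free trial"),
    ("Carlos", "VIP membership"),
    ("David", "Full membership"),
]

# Anonymize: keep only the account type and discard the name entirely.
# Unlike pseudonymization, no name-to-placeholder mapping is stored.
anonymized = [account_type for _name, account_type in records]

# Aggregate insight that needs no identities: accounts per membership type
accounts_per_type = Counter(anonymized)
```

Here `accounts_per_type` reports two full memberships, one free trial, and one VIP membership, without recording who holds which.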

But even anonymized data may not fully protect user privacy. By combining anonymized data with other data sets, by looking at the context for the data, or by using several other methods, it is sometimes possible to associate anonymous data with a specific person. Even anonymized personal data needs to be protected with encryption, access control, and other safeguards against privacy violations.
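
One way that combining data sets can undo anonymization is sketched below in Python. Everything in it is hypothetical: the article's example does not include ZIP codes or birth years, so those fields, their values, the second "public" data set, and the function name `link_records` are invented purely to illustrate the idea. The sketch matches records on the attributes the two data sets share and treats a record as re-identified only when exactly one person fits.

```python
# Hypothetical anonymized data: names removed, other attributes retained
anonymized_records = [
    {"zip": "90210", "birth_year": 1985, "account_type": "Full membership"},
    {"zip": "10001", "birth_year": 1990, "account_type": "Free trial"},
    {"zip": "10001", "birth_year": 1990, "account_type": "VIP membership"},
]

# Hypothetical second data set that includes names and the same attributes
public_records = [
    {"name": "Alice", "zip": "90210", "birth_year": 1985},
    {"name": "Bob", "zip": "10001", "birth_year": 1990},
    {"name": "Carlos", "zip": "10001", "birth_year": 1990},
]

def link_records(anonymized, public, shared_fields=("zip", "birth_year")):
    """Return {name: account_type} for records that match exactly one person."""
    reidentified = {}
    for record in anonymized:
        candidates = [
            person["name"]
            for person in public
            if all(person[field] == record[field] for field in shared_fields)
        ]
        # A unique match ties the "anonymous" record to one individual
        if len(candidates) == 1:
            reidentified[candidates[0]] = record["account_type"]
    return reidentified

reidentified = link_records(anonymized_records, public_records)
```

In this made-up data, Alice is the only person with her combination of ZIP code and birth year, so her record is re-identified. Bob and Carlos share the same combination, so their records remain ambiguous.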