Pseudonymization replaces personal information with aliases to make data sets more private. Pseudonymous data cannot be matched to an identifiable person unless it is combined with a separate set of information.
Pseudonymization is the process of removing personal identifiers from data and replacing those identifiers with placeholder values. It is sometimes used for protecting personal privacy or improving data security. In combination with other important privacy safeguards, such as encryption, pseudonymization can help maintain user privacy.
Generally, a "pseudonym" is a fake name used to conceal one's identity. For instance, many book authors use a pseudonym or "pen name." Data pseudonymization is somewhat like this concept, but the pseudonym values are not usually used publicly. It is also important to note that any personal information, not just a person's name, can be pseudonymized.
Imagine Alice creates an account on a streaming service. As part of the signup process, the streaming service stores her name in its database. However, the service does not record her in its personal records database (let's call this Database 1) as "Alice," instead using pseudonymization to change "Alice" to "Person 17332."
A list of names and their corresponding pseudonyms is kept in a separate database (let's call this Database 2). Someone who only had access to Database 1 would be able to view the pseudonymous data but would be unable to match that data to a specific individual, such as Alice. To do so, they would also need access to Database 2, the list of names and pseudonyms.
Now imagine Chuck, a rogue employee at the streaming service, steals Database 1. He analyzes the data, but cannot verify the identity of any user, because the list of pseudonyms is stored separately. He cannot do much with his stolen data unless he also steals Database 2.
In this way, pseudonymization helps protect privacy and enhance security. However, identifying someone is still possible if the identifying data is not stored separately, or if the separate store is also compromised: if Chuck steals Database 2 as well, he can identify Alice by name.
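The Alice and Chuck scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two dictionaries stand in for Database 1 and Database 2, and the names `pseudonymize`, `reidentify`, and `new_pseudonym`, as well as the `plan` field, are hypothetical and not taken from any real service.

```python
import secrets

# Hypothetical in-memory stand-ins for the two separately stored databases.
database_1 = {}  # pseudonym -> account data, with no real names
database_2 = {}  # pseudonym -> real name (the lookup list)


def new_pseudonym(existing):
    """Pick a random, unused alias such as 'Person 17332'."""
    while True:
        candidate = f"Person {secrets.randbelow(100_000):05d}"
        if candidate not in existing:
            return candidate


def pseudonymize(name, account_data):
    """Store account data under an alias and keep the name-to-alias link separately."""
    alias = new_pseudonym(database_2)
    database_1[alias] = account_data
    database_2[alias] = name
    return alias


def reidentify(alias, lookup):
    """Recover the real name; returns None if the lookup list has no entry."""
    return lookup.get(alias)


alias = pseudonymize("Alice", {"plan": "premium"})
stolen = dict(database_1)             # Chuck copies Database 1 only
print(stolen)                         # aliases and account data, no names
print(reidentify(alias, {}))          # None: without the lookup list, no identity
print(reidentify(alias, database_2))  # Alice: both databases together identify her
```

The point of the design is that the link between alias and name lives in exactly one place, so the privacy benefit depends entirely on how well that second store is protected.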
Pseudonymization therefore needs to be combined with other processes and technologies in order to keep data private. For example, imagine that the streaming service uses encryption to protect Databases 1 and 2, not just pseudonymization. If Chuck steals both databases, all he can see is scrambled ciphertext that he cannot read without the decryption key.
For this reason, encryption provides stronger protections against snoopers like Chuck.
Anonymization makes data completely anonymous. Identifying information is stripped away altogether, and unlike pseudonymization, the process usually cannot be reversed. If the data in the example above were anonymized, Alice's name would be removed from the database instead of replaced with a pseudonym.
Data anonymization helps with privacy but is not always practical or possible. If it were impossible for the example streaming service to associate accounts with specific people, it would not be able to provide its service at all.
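As a rough sketch of the difference, anonymization can be pictured as dropping the identifying fields outright and keeping no lookup list at all. The record layout and the `anonymize` function below are assumptions made for illustration only.

```python
def anonymize(records, identifying_fields=("name", "email")):
    """Return copies of the records with the identifying fields removed entirely."""
    return [
        {key: value for key, value in record.items() if key not in identifying_fields}
        for record in records
    ]


# Hypothetical account records for the example streaming service.
records = [
    {"name": "Alice", "email": "alice@example.com", "plan": "premium", "hours_watched": 42},
]

anonymous = anonymize(records)
print(anonymous)  # [{'plan': 'premium', 'hours_watched': 42}]
```

Because no mapping from the remaining rows back to names is stored anywhere, there is nothing comparable to Database 2 that could be used to reverse the process.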
However, there are cases when anonymization is preferable. For instance, medical researchers sometimes use aggregated healthcare data that has been anonymized in order to preserve privacy. Additionally, anonymous data can still provide valuable insights — some web analytics services anonymize their data, for instance.
But even anonymized data may not fully protect user privacy. By combining anonymized data with other data sets, by looking at the context for the data, or by using several other methods, it is sometimes possible to associate anonymous data with a specific person. Even anonymized personal data needs to be protected with encryption, access control, and other safeguards against privacy violations.
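To make the risk of combining data sets concrete, here is a minimal sketch of matching anonymized rows against a second, named data set on shared attributes. All of the data, the field names (`zip_code`, `birth_year`, `favorite_genre`), and the `link_records` function are invented for illustration; real re-identification attempts vary widely in method.

```python
def link_records(anonymized, public, keys=("zip_code", "birth_year")):
    """Match anonymized rows to named public rows that share the same attribute values."""
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = []
    for row in anonymized:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # exactly one candidate: the row is re-identified
            matches.append((names[0], row))
    return matches


# Hypothetical anonymized viewing data: no names, but other attributes remain.
anonymized_views = [
    {"zip_code": "94107", "birth_year": 1990, "favorite_genre": "documentary"},
    {"zip_code": "10001", "birth_year": 1985, "favorite_genre": "comedy"},
]

# Hypothetical second data set that does include names.
public_profiles = [
    {"name": "Alice", "zip_code": "94107", "birth_year": 1990},
    {"name": "Bob", "zip_code": "10001", "birth_year": 1985},
    {"name": "Carol", "zip_code": "10001", "birth_year": 1985},
]

for name, row in link_records(anonymized_views, public_profiles):
    print(name, "->", row["favorite_genre"])  # Alice -> documentary
```

In this toy data the first row matches only one named person, so it is effectively no longer anonymous, while the second row stays ambiguous because two people share the same attributes.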