Pseudonymised Data

Pseudonymisation takes the most identifying fields within a database and replaces them with artificial identifiers, or pseudonyms. For example, a name is replaced with a unique number. The purpose is to make the data record less identifying and thereby reduce concerns about data sharing and data retention. Pseudonymised Data is typically used for analytics and data processing, often with the aim of improving processing efficiency. The process can also be used as part of a Data Fading policy.
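As an illustration of the replacement step described above, the following is a minimal Python sketch, not a definitive implementation. The function name pseudonymise, the record layout and the use of random hexadecimal tokens as pseudonyms are all assumptions made for the example.

```python
import secrets


def pseudonymise(records, fields):
    """Replace the values of `fields` with random pseudonyms.

    Returns the new records and the lookup table that maps each
    (field, original value) pair to its pseudonym. The input records
    are not modified.
    """
    lookup = {}
    output = []
    for record in records:
        new_record = dict(record)  # copy so the source data is untouched
        for field in fields:
            key = (field, record[field])
            if key not in lookup:
                # Assumption: a 16-character random hex token is the pseudonym.
                lookup[key] = secrets.token_hex(8)
            new_record[field] = lookup[key]
        output.append(new_record)
    return output, lookup


# Hypothetical sample data.
people = [
    {"name": "Alice Example", "age": 30},
    {"name": "Bob Example", "age": 41},
    {"name": "Alice Example", "age": 30},
]
pseudonymised, table = pseudonymise(people, ["name"])
```

The same original value always receives the same pseudonym, so records belonging to one person can still be grouped for analysis. The lookup table is what makes the process reversible, so in this sketch it would need to be stored separately from the pseudonymised records.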

Pseudonymised Data is not the same as Anonymised Data. Data that has been pseudonymised still retains enough detail in the replaced values for it to be traced back to its original state. With anonymised data the level of detail is reduced so that the original data can no longer be reconstructed.
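The difference can be sketched in a few lines of Python. This is an illustrative example only: the table pseudonym_to_name, the record fields and both function names are hypothetical.

```python
# Hypothetical lookup table, assumed to be held separately from the data set.
pseudonym_to_name = {
    "P-1001": "Alice Example",
    "P-1002": "Bob Example",
}


def reidentify(record, table):
    """Restore the original name of a pseudonymised record using the table."""
    restored = dict(record)
    restored["name"] = table[record["name"]]
    return restored


def anonymise(record):
    """Drop the identifying field entirely, so no table can restore it."""
    return {key: value for key, value in record.items() if key != "name"}


pseudonymised_record = {"name": "P-1001", "diagnosis": "asthma"}
original = reidentify(pseudonymised_record, pseudonym_to_name)
anonymous = anonymise(pseudonymised_record)
```

In this sketch the pseudonymised record can be traced back for as long as the table exists, while the anonymised record carries nothing that links it to a person. Removing a single field is a simplification; other fields may still identify someone.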

Care must be taken with personal data because patterns in the data may reveal information that allows reconstruction of the source data. For example, from a postcode you may infer the street name, and from a postcode together with the street number a specific property. It is prudent to protect Pseudonymised Data with encryption, for example using session keys agreed through an Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) key exchange, which provides Forward Secrecy, to safeguard sets of data. The GDPR encourages the use of pseudonymisation to reduce the risk to data subjects.
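The postcode example above can be made concrete with a small Python sketch of cross-referencing. Everything in it is hypothetical: the field names, the sample records and the function link are assumptions for illustration, and real attacks are usually less exact than this.

```python
def link(pseudonymised, public):
    """Map record ids to names where postcode and house number match
    exactly one entry in a public data set."""
    matches = {}
    for record in pseudonymised:
        candidates = [
            entry["name"]
            for entry in public
            if entry["postcode"] == record["postcode"]
            and entry["house_number"] == record["house_number"]
        ]
        # Only an unambiguous match re-identifies the record.
        if len(candidates) == 1:
            matches[record["id"]] = candidates[0]
    return matches


# Hypothetical pseudonymised records: the name is gone, the address detail is not.
pseudonymised_records = [
    {"id": "P-1001", "postcode": "AB1 2CD", "house_number": 7},
    {"id": "P-1002", "postcode": "EF3 4GH", "house_number": 12},
]

# Hypothetical public data set, such as a directory listing.
public_records = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "house_number": 7},
    {"name": "Bob Example", "postcode": "AB1 2CD", "house_number": 9},
]

reidentified = link(pseudonymised_records, public_records)
```

Here the first record is re-identified even though its name was replaced, because the remaining fields point to a single property.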

The choice of which data fields are to be pseudonymised is sometimes subjective. Pseudonymised Data should include all fields that are highly selective, for example a social security or national insurance number. Less selective fields, such as birth date, zip code or postcode, are often also included because they may retain sufficient detail to allow an Inference Attack, in which such data is cross-referenced with other data sets to reveal the replaced data. However, pseudonymising these less identifying fields can affect analysis, so coarser replacement fields are often inserted, such as region instead of address, or year of birth instead of birth date.
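The substitution of coarser fields can be sketched as follows. This is a minimal Python example under stated assumptions: the function generalise and the field names are hypothetical, and the first part of a UK-style postcode is used as a stand-in for a region.

```python
from datetime import date


def generalise(record):
    """Replace the birth date with the year of birth and the postcode
    with its first part, leaving all other fields unchanged."""
    out = dict(record)  # copy so the source record is untouched
    out["birth_year"] = out.pop("birth_date").year
    # Assumption: the part of the postcode before the space approximates a region.
    out["postcode_area"] = out.pop("postcode").split()[0]
    return out


# Hypothetical sample record.
patient = {"birth_date": date(1984, 3, 17), "postcode": "SW1A 1AA", "visits": 3}
coarse = generalise(patient)
```

The resulting record still supports analysis by age band or area, while many more people share the same combination of values, which makes cross-referencing against other data sets less precise.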