Anonymisation vs pseudonymisation: a distinction that changes everything for your AI use cases

Many organisations think they have resolved their compliance problem by “masking” data before passing it to a generative AI tool.
That is often not enough. In most cases, the reason is a confusion between anonymised data and pseudonymised data.

Note: This article was written jointly by the SerendipAI team and attorney Nina TOGOUNA of the law firm Togouna & Tome Avocats.

Anonymised vs. pseudonymised: the core distinction

Anonymised data no longer allows a person to be identified, and that process is irreversible. It falls outside the scope of the GDPR: no processing obligations, no consent requirements, and no retention periods apply.

Pseudonymised data involves replacing direct identifiers with a code, a number, or another substitute identifier, while the re-identification key is kept separately. It remains personal data in full, as long as the person concerned can still be identified, directly or indirectly, through additional information. All GDPR obligations continue to apply.

Pseudonymisation is a security measure recognised by the GDPR. It does not constitute anonymisation.
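To make the difference concrete, here is a minimal sketch of pseudonymisation as described above: direct identifiers are replaced with a code and the re-identification key is kept apart. The function name `pseudonymise`, the `P-` code format, and the sample records are assumptions made for illustration, not a reference implementation.

```python
import secrets


def pseudonymise(records, id_field):
    """Swap the direct identifier for a random code.

    Returns the pseudonymised records and the key table (code -> original).
    """
    key_table = {}  # re-identification key: would be stored separately
    code_for = {}   # original -> code, so one person always gets the same code
    pseudonymised = []
    for record in records:
        original = record[id_field]
        if original not in code_for:
            code = "P-" + secrets.token_hex(4)
            while code in key_table:  # regenerate on the unlikely collision
                code = "P-" + secrets.token_hex(4)
            code_for[original] = code
            key_table[code] = original
        pseudonymised.append({**record, id_field: code_for[original]})
    return pseudonymised, key_table


# Fictional sample data
records = [
    {"name": "Alice Martin", "diagnosis": "type 2 diabetes"},
    {"name": "Bob Weber", "diagnosis": "asthma"},
    {"name": "Alice Martin", "diagnosis": "hypertension"},
]
pseudo, key_table = pseudonymise(records, "name")

# Whoever holds key_table can undo the substitution entirely
reidentified = [key_table[r["name"]] for r in pseudo]
```

As long as `key_table` exists somewhere, the names can be restored in one line. That reversibility is why the output is still personal data.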

What personal data actually means

People often think of first and last names. The legal concept goes much further.

Personal data includes an email address, a phone number, an IP address, a customer or employee number, a photograph, a voice recording, location data, a browsing history, and a cookie identifier. It also includes something that tends to be underestimated: any combination of information that allows a person to be identified indirectly.

That last point is precisely what creates the most blind spots in current practice.

Example 1: The village of 80 residents

Imagine a health database published with no names. It contains age, sex, postcode, and a diagnosis.

In a village of 80 residents, that combination is often enough to identify someone with precision. “Female, 67 years old, postcode 55XXX, type 2 diabetes” may refer to only one person in that area.

The data is not anonymised. It remains personal data, because the person concerned can be re-identified.
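One way to spot this kind of exposure is to count how many rows share each combination of indirect identifiers. The sketch below assumes fictional rows and a hypothetical helper named `singled_out`; the choice of age, sex, and postcode as the combination to test follows the example above.

```python
from collections import Counter


def singled_out(rows, quasi_identifiers):
    """Return the rows whose quasi-identifier combination occurs exactly once."""
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# Fictional extract of a health database published with no names
rows = [
    {"sex": "F", "age": 67, "postcode": "55XXX", "diagnosis": "type 2 diabetes"},
    {"sex": "M", "age": 45, "postcode": "55XXX", "diagnosis": "asthma"},
    {"sex": "M", "age": 45, "postcode": "55XXX", "diagnosis": "hypertension"},
]

at_risk = singled_out(rows, ("sex", "age", "postcode"))
for row in at_risk:
    print("Unique combination:", row["sex"], row["age"], row["postcode"])
```

A row that is unique on these fields points to one person. The reverse does not hold: a group of two or three is not proof of anonymisation, since other data sources may narrow it further.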

What the GDPR says (Recital 26)

Recital 26 asks whether a person can be identified by any means reasonably likely to be used. That reasonable risk of re-identification is the criterion that separates pseudonymisation from anonymisation.

As long as that risk exists, even without certainty, the data remains personal.

Removing a name is not sufficient if cross-referencing with other data can still lead back to the individual.

Example 2: The postcode as an identifying marker

In Luxembourg, certain postcodes correspond to very specific neighbourhoods or streets. The geographic granularity there is finer than in most other countries.

Cross-referencing a postcode with sensitive information such as religious affiliation, health status, ethnic origin, or political views can make it possible to identify individuals within a minority in that area.

Processing these special categories of personal data, known as “sensitive data”, is in principle prohibited by the GDPR, except in cases explicitly provided for, such as the explicit consent of the data subject.

The issue in an LLM context

Transmitting a dataset that combines postcodes with sensitive characteristics to an external AI tool constitutes processing of sensitive data, which is in principle prohibited.

The absence of direct identifiers such as names is not sufficient to exclude the application of the GDPR, as long as the risk of re-identification has not been ruled out. Such processing requires a specific legal basis under the GDPR.

What this means for your LLM use cases

Generative AI tools, including their so-called “enterprise” versions, operate on infrastructure that is external to your information system. Any data transmitted to these tools leaves your sphere of control.

If the data is pseudonymised and not anonymised, it constitutes personal data outside the controlled environment of the data controller. This processing requires an appropriate legal basis, a signed Data Processing Agreement (DPA) with the provider, and registration in the records of processing activities.

Compliance does not depend solely on the service tier purchased from the AI tool provider. It depends on the nature of the data transmitted, the data controller's obligations under the GDPR, and the controller's ability to demonstrate compliance with those obligations.

The most common exposure vectors:

  • prompts containing extracts from contracts or client files that have been partially redacted
  • internal datasets transmitted for analysis without prior verification of the actual level of anonymisation
  • uncontrolled use of consumer-facing interfaces by employees (also known as shadow AI)
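For the first vector, a partially redacted prompt can be scanned for direct identifiers that survived the redaction. The sketch below is an assumption-laden illustration: the three regular expressions are deliberately simple, the helper name `residual_identifiers` is hypothetical, and the sample prompt is fictional.

```python
import re

# Simplified patterns, chosen for illustration; real data needs broader coverage
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def residual_identifiers(text):
    """Return the direct identifiers still present in text, grouped by kind."""
    found = {}
    for kind, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[kind] = matches
    return found


prompt = (
    "Contract for client [REDACTED]. "
    "Contact: j.doe@example.com, tel. +352 621 123 456."
)
found = residual_identifiers(prompt)
if found:
    print("Do not send; residual identifiers:", found)
```

A clean scan only shows that these particular patterns are absent. It says nothing about indirect identification through combined attributes, so it cannot establish that a prompt is anonymised.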

What we observe in practice

Compliance incidents related to LLMs do not generally result from sophisticated attacks.

They reflect the absence of a usage framework and poor qualification of the data being transmitted.

What the AI Act adds

The GDPR governs the obligations of the data controller, and where applicable its processors, in relation to personal data processing. The AI Act governs those of providers and deployers of AI systems, with reinforced requirements for uses classified as “high risk,” such as creditworthiness assessment, candidate selection or evaluation, and medical decision-making.

For these systems, the AI Act requires both the provider and the deployer to maintain continuous, traceable documentation of the data used:

  • The provider must document the nature, origin, and protection of training, validation, and test datasets in the CE marking technical documentation.
  • The deployer must document the origin, suitability, and protection of the data used, and retain that documentation throughout the period of use of the system.

Using pseudonymised personal data does not exempt either the data controller or the processor from these obligations. The intersection of the GDPR and the AI Act requires a two-level qualification, legal and technical, each giving rise to distinct obligations.

Anonymisation vs. pseudonymisation in practice: three questions to ask before any LLM use

Question 1: The actual nature of the data

  • Is the data genuinely anonymised, or only pseudonymised?
  • Does a correspondence key exist somewhere in your systems, or could cross-referencing with other variables allow re-identification?

Question 2: The scope of the processing

  • Does the tool operate on infrastructure under your control?
  • Have the provider’s processing conditions been verified?
  • Is a DPA in place?

Question 3: Documentation and traceability

  • Is this use case documented in your records of processing activities?
  • For systems subject to the AI Act, has the nature of the input data been formalised and versioned?
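The three questions can be recorded per use case so that open points are visible before any data leaves the organisation. The sketch below is a simplification under stated assumptions: the class `LLMUseCase`, its field names, and the rule that DPA and records checks are only raised when the data counts as personal are illustrative choices, not a legal test.

```python
from dataclasses import dataclass


@dataclass
class LLMUseCase:
    name: str
    correspondence_key_exists: bool     # Question 1
    cross_reference_possible: bool      # Question 1
    infrastructure_under_control: bool  # Question 2
    provider_conditions_verified: bool  # Question 2
    dpa_in_place: bool                  # Question 2
    in_processing_records: bool         # Question 3
    subject_to_ai_act: bool             # Question 3
    input_data_versioned: bool          # Question 3


def open_points(uc):
    """List the answers that still need attention before the tool is used."""
    points = []
    # Assumption: either answer means the data is at most pseudonymised
    personal = uc.correspondence_key_exists or uc.cross_reference_possible
    if personal:
        points.append("Q1: data is at most pseudonymised, treat as personal data")
    if personal and not uc.infrastructure_under_control:
        if not uc.provider_conditions_verified:
            points.append("Q2: provider processing conditions not verified")
        if not uc.dpa_in_place:
            points.append("Q2: no DPA in place")
    if personal and not uc.in_processing_records:
        points.append("Q3: use case missing from records of processing activities")
    if uc.subject_to_ai_act and not uc.input_data_versioned:
        points.append("Q3: input data not formalised and versioned")
    return points


# Fictional use case
use_case = LLMUseCase(
    name="contract summarisation",
    correspondence_key_exists=True,
    cross_reference_possible=False,
    infrastructure_under_control=False,
    provider_conditions_verified=True,
    dpa_in_place=False,
    in_processing_records=False,
    subject_to_ai_act=False,
    input_data_versioned=False,
)
for point in open_points(use_case):
    print(point)
```

An empty list does not certify compliance. It only means none of the recorded answers raised a flag.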

To conclude

Technology is not the risk. The absence of data qualification is.

This distinction is not a legal detail reserved for compliance teams. It directly conditions the legality of a use case, the organisation’s liability in the event of an incident, and the robustness of the AI governance framework put in place.

Organisations that have built structured data classification processes will be better equipped to get value from these tools. Those that haven’t will find that AI amplifies risks they already carry.

Digital Omnibus: towards a redefinition of the status of pseudonymised data?

The European Commission proposed, as part of the Digital Omnibus published on 19 November 2025, to clarify the definition of personal data.

The proposal provides that certain pseudonymised data may, in specific situations, no longer be treated as personal data with respect to a recipient who has no reasonable means of re-identifying the persons concerned, particularly in AI development contexts.

This proposal is currently under discussion at European level. In a joint opinion published in February 2026, the EDPB and the EDPS raised reservations about the scope of this change in light of the current GDPR framework and the case law of the CJEU (https://www.edpb.europa.eu/system/files/2026-02/edpb_edps_jointopinion_202602_digitalomnibus_en.pdf).

In the absence of a final text, pseudonymised data continues to be treated as personal data subject to the GDPR.

#GDPR #AIAct #DataGovernance #LLM #Compliance #AIGovernance #PrivacyByDesign
