Opinion of the European Data Protection Board: AI models fall under GDPR

Blogpost for the AI-MAPS project by Evert Stamhuis

On December 17, 2024, the European Data Protection Board (EDPB) adopted an opinion of great significance for the AI community: Opinion 28/2024. The EDPB is an institution, established by the GDPR, that represents all the Data Protection Authorities in the EU member states. Its opinions provide high-impact guidance on the enforcement of data protection rules and principles, and are therefore always very important for those who develop and deploy applications and devices that process personal data (and for those who do research in this area). What makes Opinion 28/2024 extra special is that it addresses a range of questions that arose out of the AI development community.

A link to the Opinion 28/2024 document is available below in the section 'More information'. 

The Irish Data Protection Authority had intervened in the practice of training Grok, an AI model of X, with user data. That intervention cast doubt on the legitimacy of this training practice under the GDPR, not only for X but also for Meta. Both companies warned against the effects of a potential ban: if this practice were prohibited, they stated, AI models would become significantly poorer at serving the nuances of European needs. So, in order to resolve this matter for the entire EU, the Irish DPA requested an opinion from the EDPB, which we now have.

Obviously, Opinion 28/2024 merits the attention of our AI MAPS consortium and the wider audience. In this blogpost I will address only one topic: the applicability of the rules and principles of the General Data Protection Regulation to AI models. A second blogpost - in the pipeline - will focus on the legal basis for processing personal data in the context of the development and deployment of AI models. The crucial legal point flowing from the application of the GDPR to AI model training is whether this training can be covered by one of the limited list of justifications that the regulation offers in Article 6. For now, we turn to the preliminary issue of the applicability of the GDPR.

The EDPB’s first and highly important finding is that AI models trained on personal data are not by nature exempted from the GDPR’s criteria for legitimate processing. Theoretically this is an important step, because it brings software configurations within the scope of legal mechanisms that are intended to protect not the software but the information in the data. What that would mean for AI on hardware (chips) is possibly a follow-up question, but that is not the issue now.

The EDPB refers to the ‘memory’ of the model, from which the data used for training could be extracted. In addition, the outputs of the model – the inferences – have to be considered. All this needs careful assessment when asking whether the GDPR applies to a concrete AI model. Since it is clear that these data are not by default anonymous, AI models are to be vetted against the GDPR norms.
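To make the ‘memory’ point concrete: data seen during training can surface again in a model’s output. Below is a minimal sketch (my illustration, not the EDPB’s analysis) of such an extraction probe against a language model, using the Hugging Face transformers library; the model name and the probe prefix are assumptions chosen purely for illustration.

```python
# Minimal sketch (illustrative only): probing a language model's "memory".
# Assumptions: GPT-2 as a stand-in for any causal language model, and a
# hypothetical probe prefix; neither is prescribed by Opinion 28/2024.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# A prefix that might have preceded personal data in the training corpus.
prefix = "Contact the author at"
inputs = tokenizer(prefix, return_tensors="pt")

# Greedy decoding returns the model's most confident continuation. If it
# reproduces a string seen during training, that training data is extractable.
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Research on training-data extraction has shown that probes of this kind can recover memorized strings verbatim, which is exactly why the EDPB does not treat AI models as anonymous by default.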

When the data retrieved from AI models are indeed anonymous, the model does not fall under the GDPR norms. Clearly, the anonymity of the data is the key factor as well as a key challenge. For this reason, additional clarification would have been welcome on a specific detail of the standard of anonymization under the GDPR. Data that cannot be traced back to a directly or indirectly identifiable person are anonymous. Particularly for sensitive data (biometrics, health data etc.), this standard is generally considered stricter than under US law. Arguably, the GDPR requires the exclusion of re-identification with the help of any technological tool, such as model inversion attacks. The relevant text states that all tools “reasonably likely to be used” must be taken into consideration. According to Opinion 28/2024, we have to take into account the status quo as well as technological developments.

So, the EDPB states that the future state of the technology must be included in the test. This part of the opinion reiterates the case law of the EU Court of Justice, e.g. a judgment of March 7, 2024. However, neither this case law nor the EDPB gives clear guidance on the scope of these “technological developments”. How deep do we have to look into the crystal ball of the future? How far ahead must we speculate on what AI models might bring as re-identification tools? In my opinion, the phrase “reasonably likely” also applies to what must be taken into account as “technological developments”. I suggest that only technological developments that can reasonably be expected to become widely available in the short or medium term, in view of the state of computer science, need to be considered, and no more. Otherwise, the EDPB’s phrase “reasonably likely to be used” for means of identification would effectively be meaningless, and an absolute exclusion of re-identification would indeed be the norm. A clear statement by the EDPB to this effect would clarify the status of the law for AI model developers as well as the data protection authorities. Removing doubt about the interpretation of the GDPR would be good for the national oversight authorities and the market actors alike. Unfortunately, Opinion 28/2024 falls short of removing this doubt.

Next time: more on the justification of AI model training under the GDPR.

More information

Link to document: Opinion 28/2024
