On December 17, 2024, the European Data Protection Board (EDPB) adopted an important guideline for the AI community: Opinion 28/2024. The EDPB is an institution, established by the GDPR, that represents all the Data Protection Authorities of the EU member states. Its opinions carry considerable weight in the enforcement of data protection rules and principles, and are therefore highly relevant for everyone who develops or deploys applications and devices that process personal data (and for those who do research in this area).
What makes Opinion 28/2024 interesting for our AI MAPS consortium and beyond is the fact that it addresses a range of questions that arose from the AI development community. In my previous blogpost, I highlighted the EDPB’s conclusion that AI models fall under the GDPR when data on identified or identifiable persons were or are processed in the training or the use of the model. In this post I turn to the consequential issue of whether these models can be covered by one of the limited set of justifications that art. 6 GDPR provides. In other words, under what conditions could models trained on personal data be justifiable? Spoiler alert: it depends.
In its Opinion, the EDPB does not deal with all six justifications on the GDPR’s list, which includes informed consent and the necessity of processing for fulfilling a legal obligation (as applies, for instance, to the police). The focus is solely on the justification of “the legitimate interest of the controller or a third party”. That is effectively the most logical justification for today’s commercial AI model development and deployment. The wording of this justificatory ground postulates a weighing of two opposing interests: the interests of the AI developer or a third party against the interests and fundamental rights and freedoms of the data subjects. Only when the latter do not override the former can the processing go forward. The GDPR thus draws AI developers, and accordingly AI users and supervisory authorities, into a concrete balancing exercise. It is what it is: the question of when AI model development or deployment using personal data can be lawful cannot be answered by straightforward box-ticking or a simple decision tree.
The Opinion meticulously describes the diverse facts and circumstances that should go into this balancing act, relying heavily on an earlier guideline on this topic. The first argumentative steps can be summarized in two simple statements. First: if the purpose of the model is not well defined, transparent and legally acceptable, you can forget it. Second: if you could also fulfill this purpose by using non-personal data or less intrusive alternatives, you can forget it. The EDPB keeps stressing that the justifications need to be understood in conjunction with the GDPR as a whole. That entails special attention to the principles consolidated in art. 5 GDPR: the processing of personal data must be lawful, transparent, minimal and for a specified and legitimate purpose. The business logic of choosing the cheapest alternative does not work here; it is transposed into choosing the least intrusive alternative at the lowest possible cost. Attention goes first to intrusion, then to cost, whether you as a developer like it or not.
Once you have the purpose and the necessity in good shape, the final balancing of interests has to happen. Does the impact on the data subjects’ fundamental rights override the interest in the lawful and purposeful processing? The EDPB urges that this impact assessment be holistic, considering the nature of the model, the intended deployment and the possible unintended consequences.
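To make the structure of this three-step test (purpose, necessity, balancing) easier to follow, here is a minimal Python sketch. Everything in it, from the class and field names to the numeric scales, is my own invention for illustration; the real assessment is a qualitative legal judgement, not a computation.

```python
from dataclasses import dataclass


@dataclass
class ProcessingScenario:
    """Hypothetical description of an AI training or deployment scenario."""
    purpose: str
    purpose_is_specific: bool        # well defined, not open-ended
    purpose_is_transparent: bool     # communicated to the data subjects
    purpose_is_lawful: bool          # not itself contrary to law
    less_intrusive_alternative: bool # could non-personal or less data suffice?
    impact_on_data_subjects: float   # residual impact after mitigations, 0..1 (invented scale)
    interest_weight: float           # weight of the controller's interest, 0..1 (invented scale)


def legitimate_interest_assessment(s: ProcessingScenario) -> bool:
    """Illustrative rendering of the three-step structure discussed above."""
    # Step 1: purpose test. If the purpose is not well defined, transparent
    # and legally acceptable, you can forget it.
    if not (s.purpose_is_specific and s.purpose_is_transparent and s.purpose_is_lawful):
        return False
    # Step 2: necessity test. A less intrusive alternative defeats the claim.
    if s.less_intrusive_alternative:
        return False
    # Step 3: balancing test. Processing may go forward only when the data
    # subjects' interests and rights do not override the controller's interest.
    return s.impact_on_data_subjects <= s.interest_weight
```

Note how the ordering mirrors the Opinion: a scenario with a vague purpose fails at step 1 no matter how light the impact is. Purpose and necessity act as gatekeepers before any balancing even begins.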
There are too many details in Opinion 28/2024 to summarize its full impact analysis in a blogpost. I can only present a couple of remarkable findings, hoping not to discourage any interested reader from consulting the full text. Measures that mitigate the potential impact are a relevant factor. Among these the EDPB counts the possibility to enforce a data subject’s right to have their data erased by making the model unlearn them. Web scraping as a data retrieval method requires specific mitigating measures, such as excluding from the scraping all risk-sensitive data as well as risk-sensitive content or websites, plus providing decent opt-out facilities.
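As an illustration of what such scraping mitigations could look like in practice, here is a hedged Python sketch of a pre-scrape check. The deny-lists, domain names and function are hypothetical; the Opinion names the kinds of measures, not any concrete implementation.

```python
import urllib.robotparser

# Hypothetical deny-lists; in practice these would be maintained, far larger
# registries (e.g. health forums, sites aimed at minors, opt-out requests).
SENSITIVE_DOMAINS = {"examplehealthforum.org", "example-kids.net"}
OPT_OUT_REGISTRY = {"opted-out-blog.example"}


def may_scrape(domain: str, url: str, user_agent: str = "research-crawler") -> bool:
    """Pre-scrape check reflecting the kind of mitigations the EDPB mentions:
    skip risk-sensitive sources, honour opt-outs, respect robots.txt."""
    if domain in SENSITIVE_DOMAINS or domain in OPT_OUT_REGISTRY:
        return False
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    try:
        rp.read()
    except OSError:
        return False  # if in doubt, do not scrape
    return rp.can_fetch(user_agent, url)
```

A filter like this only addresses the sourcing side, of course; it does nothing for risk-sensitive content that appears on otherwise unremarkable sites, which is why the Opinion also speaks of excluding sensitive data as such.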
One final point taken from the Opinion will be relevant for the traffic between developers and deployers: the extent to which unlawfulness travels downstream with the model. The EDPB argues that there is no watershed between development and deployment in this regard. And because unlawfulness may be glued to the model, deployers need to do their own balancing exercise, one that comprises what happened during the development of the model. Self-declarations by the model provider will not suffice to conclude that everything was fine. The deployer must independently watch the scales and include the balancing and mitigating factors of the development stage in their own balancing act. One could conclude that this becomes a bit overdone, balancing the balancing. The alternative, a strict separation between development and deployment, would however ignore that AI system construction is fluid and iterative in nature. The potential risks for fundamental rights flow from the accumulated impact of the activities. Only when the model is anonymized before actual deployment can any unlawfulness in the development stage be considered washed away, according to the EDPB.
It would have been convenient if the EDPB had provided a simple yes/no answer to the question from the AI community: should the tech companies now stop their model development using data scraped from the platforms they own? The answer that it depends may come as a disappointment to some. To me, however, it is already a gain of this Opinion that it makes clear you cannot use whatever you have extracted from your users. There are rules to comply with and fundamental rights to observe. Hopefully the companies take the point and set their extraordinary engineering and investment powers to the task of innovating their production processes. Let their competitiveness come from high compliance, more than from low cost.