Workshop reflections: training data and cocreation

Blogpost for the AI-MAPS project by Nanou van Iersel

On December 16, the work package ‘legal aspects’ hosted a workshop titled ‘training data and cocreation’, with participants from government, law enforcement, industry and academia. Our focus: how can organizations navigate the ethical, legal and technical complexities of sharing data to improve AI models? Imagine a public organization procures AI software: should it share its data with the company to improve that software?

The workshop began with a deep dive into the legal and technical challenges of personal data in AI training data sets. That this is a hot topic was illustrated last summer by a dispute between X and the Irish Data Protection Commission over the use of tweets to train X’s generative model, in response to which the European Data Protection Board is bringing out an opinion on AI models. In addition, dr. Francien Dechesne explained that the question ‘what counts as personal data?’ remains a subject of debate. She also highlighted the potential and limitations of synthetic data, as well as the risks of model inversion attacks, which aim to uncover an AI model’s training data.

Following this, we were joined by the data protection officer from the hospital Erasmus MC, dr. Hanneke Luth. She shared her experiences and challenges with data sharing in the medical domain. This talk set the stage for a reflective discussion, moderated by prof. dr. Evert Stamhuis, on the lessons that could be translated from the medical domain to the security domain, in which most participants work as researchers, practitioners, or policymakers.

The last part of the workshop was reserved for break-out sessions, in which participants reflected on three questions:

  1. Why is it important for public and private organizations to collaborate in AI development? What is your perspective on the urgency of this collaboration from your professional practice?
  2. How are ethical and legal concerns about data sharing addressed in your organization? What measures do you take (e.g., ethical discussions, contracts)?
  3. What does an ideal relationship between public and private organizations look like in the context of data sharing? What would be an ideal division of responsibilities?

The workshop highlighted that while data sharing is critical for developing effective AI systems, it is fraught with challenges. Legally, current frameworks make sharing difficult; technically, there are no universal fixes; ethically, participants raised privacy concerns but also pointed to the costs and impact of not improving AI systems. As we move forward, one thing is clear: the question of data sharing to improve AI models transcends individual organizations and sectors. It is a societal issue that requires public and political deliberation. This workshop was just the beginning of a larger conversation, one that we will continue through our research as work package ‘legal aspects’.

