The final days of 2024 were very eventful in the world of AI and data protection: the European Data Protection Board (EDPB) published its Article 64 General Data Protection Regulation (GDPR) opinion on training AI models using personal data (the EDPB Opinion). Two days later, the Italian Garante per la Protezione dei Dati Personali (Garante) announced a fine of €15 million and corrective measures imposed on OpenAI. OpenAI has said that it will appeal the fine.
Regulatory focus to date has, for the most part, been on the providers training the AI models. 2024 also saw the Irish Data Protection Commission (DPC) inquire into and engage with Google, Meta, and X on their use of personal data to train large language models (LLMs). However, the EDPB Opinion also emphasised the need for controllers deploying the models to carry out an appropriate assessment of whether the model was developed lawfully. The EDPB Opinion’s emphasis on checking whether any sanctions have been imposed on the model provider as part of that assessment now raises questions about GDPR-compliant use of third-party LLMs by deployers.
This post will focus on the implications of these developments for data controllers deploying LLMs and practical steps they can take to manage data protection risk.
The EDPB Opinion
The EDPB Opinion covers:
- when and how AI models can be considered ‘anonymous’
- demonstrating legitimate interests for development and deployment
- the consequences where an AI model has been found to have been created, updated, or developed using unlawfully processed personal data, including the impact on the lawfulness of the continued or subsequent processing or operation of the model.
1. Anonymity – is personal data processed in an AI model? The EDPB’s view is that anonymity must be assessed on a case-by-case basis. The bar for anonymity is set very high: for an AI model to be considered anonymous, the likelihood of extracting or obtaining personal data from the model should be insignificant, taking into account ‘all the means reasonably likely to be used’ by the controller or another person. In practice, LLMs are unlikely to be considered ‘anonymous’ in most cases. The EDPB Opinion does not refer to the discussion paper from the Hamburg Commissioner for Data Protection and Freedom of Information or to its view that LLMs do not store personal data. The EDPB has flagged that it plans to issue guidelines on anonymisation, pseudonymisation, and data scraping in the context of generative AI.
2. Relying on legitimate interests? The bulk of the discussion in the Opinion is on demonstrating a legitimate interest for the development phase. Many of the potential concerns to address arise in the context of development rather than deployment, for example, articulating a sufficiently clear and precise legitimate interest, and ensuring processing is within the data subject’s reasonable expectations to satisfy the balancing test.
3. Deploying a model developed unlawfully? The EDPB discusses a scenario where the model retains personal data, i.e. does not meet the high bar set for anonymisation under point 1. It emphasises that another controller deploying that model would need to carry out an appropriate assessment to ensure the AI model was not developed by unlawfully processing personal data. The level of detail of that assessment should be proportionate to the risks raised in the deployment phase, and it should look at the source of the personal data.
In the EDPB’s view, when looking at whether the controller conducted an appropriate assessment, supervisory authorities should consider “whether the controller has assessed some non-exhaustive criteria, such as the source of the data and whether the AI model is the result of an infringement of the GDPR, particularly if it was determined by a SA or a court, so that the controller deploying the model could not ignore that the initial processing was unlawful” (emphasis added).
If a controller is looking to rely on legitimate interests for its deployment of an LLM, the fact that initial processing was unlawful should be taken into account in its legitimate interests assessment. The lawfulness of the processing in the development phase may impact the lawfulness of the subsequent processing.
The Garante fine and corrective measures imposed on OpenAI
The Garante had previously imposed a temporary limitation on OpenAI’s processing on 30 March 2023, meaning that ChatGPT could not be used in Italy until the limitation was lifted on 28 April 2023. At the time, the Garante was satisfied that OpenAI had commenced the implementation of the measures required in an order of 11 April 2023, including making improvements to its privacy notice, identifying a lawful basis for its processing, and providing a tool for data subjects to exercise their right to object.
However, in parallel the Garante had opened an investigation into OpenAI’s processing. In its decision of 2 November 2024, published along with a press release on 20 December 2024, the Garante alleged that OpenAI had breached Articles 33, 5(1)(a), 5(2), 6, 12, 13, 24, and 25 GDPR, and had failed to comply with an order of the supervisory authority within the meaning of Article 83(5)(e).
The decision does not relate to ongoing breaches, as OpenAI established its European headquarters in Ireland during the course of the investigation, meaning that the Irish DPC now acts as its lead supervisory authority.
In terms of the breaches that the Garante considered were not ongoing:
Article 33 (breach notification): OpenAI was subject to a data breach on 20 March 2023. It had notified the DPC, believing that the DPC would communicate the information to other supervisory authorities. However, its establishment in Ireland took place a few days after the breach, so in the Garante’s view, the one-stop shop procedure was not applicable and the breach notification in Italy was still required.
Articles 5(2) and 6 (accountability and lawful basis): According to the Garante, OpenAI had not identified a lawful basis for training ChatGPT pre-launch, nor had it identified an appropriate lawful basis for processing from launch on 30 November 2022 until 30 March 2023. OpenAI argued that it was not subject to the GDPR at that time, but the Garante considered that the GDPR’s extra-territorial provisions applied from the launch of the ChatGPT services to EU users on 30 November 2022. The Garante focused on OpenAI not having identified an appropriate lawful basis or carried out a suitable legitimate interests assessment and Data Protection Impact Assessment at the time of launch. The Garante confirmed that it did not have jurisdiction to assess whether OpenAI’s ongoing reliance on legitimate interests for training ChatGPT is lawful, as ongoing processing should now be assessed by the DPC.
Articles 12 and 13 (communication of information to data subjects): Similarly, in the Garante’s view, the information communicated to data subjects from launch until 30 March 2023 was not sufficient for compliance with Articles 12 and 13 GDPR. In respect of ChatGPT users, the privacy notice was available only in English, and certain paragraphs, such as the description of the purposes of the processing, were too broad and unclear. As to non-users, no privacy notice on the processing of their data for the purpose of training ChatGPT was available.
Articles 24 and 25 (responsibility of the controller and data protection by design and default): As at 30 March 2023, OpenAI did not have systems in place to verify the age of data subjects and to ensure that 13 to 18 year olds had a parent or guardian’s permission to use the service. OpenAI now uses a third-party provider for age verification services.
Article 83(5)(e) (failure to comply with an order from the supervisory authority): As a condition for lifting the temporary ban imposed on 30 March 2023, the Garante had ordered OpenAI to carry out an information campaign in the main Italian mass media by 15 May 2023. This was to be a non-promotional campaign informing individuals of the probable collection of their personal data, accompanied by a privacy notice on OpenAI’s website and a tool allowing individuals to request deletion of their personal data. OpenAI did carry out an information campaign, but the Garante “expressed strong opposition” to the measures taken, as they were carried out without the Garante’s prior consultation and agreement. OpenAI offered to collaborate with the Garante to increase transparency and raise awareness of its processing, but the Garante still considered that its order had been breached.
In light of these findings, the Garante imposed the €15 million fine and required OpenAI to carry out a further six-month information campaign on radio, television, newspapers, and the internet. It did take into account, as mitigating factors, measures OpenAI had already taken, such as updating its privacy notice and adding age-gating mechanisms.
OpenAI has said that it believes the fine is disproportionate and will appeal.
Can legitimate interests be relied on as a lawful basis by providers to train LLMs?
As mentioned above, the Garante’s decision did not discuss whether OpenAI could rely on legitimate interests to train its models on an ongoing basis. The decision states that determination of this question has been transmitted to the Irish DPC, so we will need to await its further decisions on this point.
The DPC now has the benefit of the EDPB Opinion (which it requested) to help it form its views on this question. The EDPB Opinion does not rule out relying on legitimate interests as the lawful basis for training LLMs. It sets out specific considerations in relation to the necessity, purpose, and balancing tests. It also includes specific measures that can be adopted in the development phase to mitigate the risks to data subject rights and freedoms identified in the balancing test. These suggested mitigations include excluding data content from publications which might include data about vulnerable individuals, excluding certain sources or websites, and excluding collection from websites that clearly object to web scraping. The EDPB Opinion sets a high standard; the DPC’s decision will be very important and illuminating as to how it is applied.
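By way of illustration only (this does not appear in the EDPB Opinion), the last of those mitigations could be implemented in part by checking a site’s robots.txt before collecting any content for training. The minimal sketch below uses Python’s standard-library robots.txt parser; the crawler user-agent string and function name are hypothetical, and respecting robots.txt would only ever be one element of a broader exclusion policy.

```python
# Illustrative sketch only: honouring robots.txt as one signal that a website
# objects to scraping, before collecting any content for training purposes.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string for a training-data crawler.
CRAWLER_USER_AGENT = "ExampleTrainingBot"

def may_collect(url: str) -> bool:
    """Return True only if the site's robots.txt does not disallow this crawler."""
    parsed = urlparse(url)
    robots_url = f"{parsed.scheme}://{parsed.netloc}/robots.txt"
    parser = RobotFileParser()
    parser.set_url(robots_url)
    try:
        parser.read()  # fetch and parse the site's robots.txt
    except OSError:
        # If robots.txt cannot be retrieved, err on the side of not collecting.
        return False
    return parser.can_fetch(CRAWLER_USER_AGENT, url)

if __name__ == "__main__":
    for candidate in ["https://example.com/articles/1", "https://example.org/profile/jane"]:
        print(candidate, "->", "collect" if may_collect(candidate) else "skip")
```

A check of this kind only addresses websites that object to scraping; excluding certain sources or websites altogether, as the EDPB also suggests, would in practice rely on curated blocklists rather than robots.txt alone.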
Next steps for controllers deploying LLMs
The Garante’s decision did not comment on the lawfulness of other controllers’ use of ChatGPT. The EDPB Opinion noted that whether the deployer can use a model lawfully will be assessed on a case-by-case basis, taking account of the assessment carried out by the deployer. As highlighted above, sanctions are relevant, though in the context of the OpenAI fine, an appeal is ongoing.
Controllers wishing to deploy ChatGPT or any other LLM or multi-modal model must:
Carry out a comprehensive assessment. This must include a legitimate interests assessment if they intend to rely on legitimate interests, and should generally include a data protection impact assessment. The assessment should document the measures in place that help to demonstrate why the controller has a legitimate interest in deploying the model, as well as the risk mitigations deployed, for example additional data minimisation measures such as blocking personal data in outputs (see the sketch after this list). The EDPB Opinion includes some discussion of factors for deployers to take into account in legitimate interests assessments, such as considering the data subjects’ reasonable expectations within the context of the model’s specific capabilities.
Assessment processes should capture new LLM add-ons for existing tools, as well as new tools.
Use the enterprise version and ensure your organisation’s personal data is not used to train the model. Ensure the provider does not use prompts, data used for Retrieval Augmented Generation, or data you are using to fine-tune a model for its own training purposes. It is unlikely to be possible to demonstrate a legitimate interest in transferring data to a provider to train the base version of the model. Any such use by the provider is also likely to breach any professional secrecy or confidentiality obligations.
Combat shadow AI. The promise of productivity gains can tempt staff to try tools that have not been assessed. Ensure a clear policy is in place, and include technical measures such as blocking certain sites where appropriate. Ensuring that tools have been through the assessment process mentioned above and that the enterprise version has been rolled out will also be key, so that staff know which tools they can use and do not need to circumvent policy to access them.
Implement an AI governance programme. Underpinning all the measures above, controllers must implement a comprehensive AI governance programme to ensure that any AI use cases to be deployed (or developed) are triaged and directed to an appropriate assessment process, with a robust governance structure underpinning decision-making. This governance process will also be important in helping organisations navigate the other legal obligations that apply to their development and use of AI, including financial services regulation, employment law, and of course, the EU AI Act. The AI Act’s prohibitions will apply from 2 February 2025, with fines of up to 7% of worldwide annual turnover for non-compliance.
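For the data minimisation measure flagged in the assessment step above, the sketch below is purely illustrative and is not drawn from the EDPB Opinion or the Garante decision: it shows the kind of simple output filter a deployer might layer on top of an LLM to block obvious personal data in responses. The regular expressions, placeholder tokens, and function name are hypothetical; a production filter would need far more robust detection (for example, named-entity recognition) and a documented rationale in the assessment.

```python
# Illustrative sketch only: redact obvious personal data (email addresses and
# phone-number-like strings) from model responses before they are displayed
# or logged, as one possible data minimisation measure in the deployment phase.
import re

EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_personal_data(text: str) -> str:
    """Replace matched personal data with placeholder tokens."""
    text = EMAIL_PATTERN.sub("[EMAIL REDACTED]", text)
    text = PHONE_PATTERN.sub("[PHONE REDACTED]", text)
    return text

if __name__ == "__main__":
    sample = "Contact Maria at maria.rossi@example.com or +39 06 1234 5678."
    print(redact_personal_data(sample))
```

Pattern-based filtering of this kind is only one layer of mitigation; the assessment itself should record why the chosen measures are adequate for the specific deployment.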