After more than a year spent examining how the EU’s data protection rules apply to OpenAI’s viral chatbot, ChatGPT, a data protection task force shared its preliminary conclusions on Friday.
The primary takeaway is that the working group of privacy enforcers remains divided on crucial legal questions, including the lawfulness and fairness of OpenAI’s processing.
Penalties of up to 4% of global annual turnover may be imposed for confirmed violations of the bloc’s privacy regime, making this an issue of critical importance.
Moreover, watchdogs have the authority to halt non-compliant processing.
Thus, OpenAI faces substantial regulatory risk in the region at a time when dedicated AI laws are scant (and, in the EU’s case, still years away from being fully operational).
However, absent clear guidance from EU data protection regulators on how existing data protection law applies to ChatGPT, OpenAI is likely to carry on operating as usual, despite a growing number of complaints alleging that its technology violates various aspects of the bloc’s General Data Protection Regulation (GDPR).
For example, Poland’s data protection authority (DPA) opened an investigation following a complaint that the chatbot had made up information about an individual and then refused to correct the errors. A similar complaint was recently lodged in Austria.
Numerous GDPR complaints, but considerably less enforcement
In principle, the General Data Protection Regulation (GDPR) applies whenever personal data is collected and processed. That plainly covers large language models (LLMs) such as OpenAI’s GPT, the model that powers ChatGPT, which is trained on data scraped from the public internet, including people’s social media posts, at vast scale.
DPAs are also authorized by EU regulation to order the cessation of any non-compliant processing.
This could be a potent lever for GDPR enforcers to use to shape the operations of the AI behemoth responsible for ChatGPT in the region.
We caught a glimpse of this last year when Italy’s privacy authority temporarily prohibited OpenAI from processing the data of ChatGPT users in Italy.
The action, taken under the GDPR’s emergency powers, led the AI giant to temporarily suspend its service in the country. ChatGPT only resumed in Italy after OpenAI made changes to the information and controls it offers users, in response to a list of demands from the DPA.
However, the Italian investigation into the chatbot persists, encompassing crucial matters such as the legal justification put forth by OpenAI for utilizing individuals’ data for training its AI models.
As a result, the tool remains shrouded in legal uncertainty in the EU.
Under the GDPR, any organization wishing to process personal data must have a legal basis for doing so. The regulation lists six possible bases, though most are not available to OpenAI.
Furthermore, Italy’s DPA has already told the company that it cannot rely on contractual necessity to process people’s data to train its AIs.
This leaves the company with only two viable legal grounds: consent (which entails requesting users’ permission to use their data) or legitimate interests (LI), which necessitates a balancing test and mandates that the controller provide users with the ability to object to the processing.
Since Italy’s intervention, OpenAI has switched to claiming it has an LI for processing the personal data used in model training.
However, in January, the Italian DPA’s draft decision on its investigation found that OpenAI had violated the GDPR. Details of those preliminary findings have not been made public, so the authority’s full assessment of the legal basis question remains to be seen. A final decision on the complaint is still pending.
A “precision fix” for the legality of ChatGPT?
The task force’s report tackles this thorny question of lawfulness, stressing that ChatGPT needs a valid legal basis for every stage of personal data processing: collection of training data (including web scraping), pre-processing of the data (including filtering), training itself, handling of prompts and ChatGPT outputs, and any training of ChatGPT on those prompts.
The task force identifies “peculiar risks” for individuals’ fundamental rights in the first three phases, with the report emphasizing how the scale and automation of web scraping can result in the ingestion of vast quantities of personal data encompassing numerous facets of individuals’ lives.
It also specifies that scraped data may contain the most sensitive forms of personal information (referred to as “special category data” by the GDPR), including health information, sexual orientation, political views, and so forth, which require an even higher legal threshold for processing than general personal data.
Regarding special category data, the task force further contends that the mere fact of being public does not mean the data has been “manifestly” made public, which would trigger an exemption from the GDPR’s explicit consent requirement for processing such data.
(“To rely on the exception established in Article 9.2(e) GDPR, it is necessary to determine whether the data subject explicitly and unambiguously intended to make the personal data in question accessible to the general public through a clear affirmative action,” the document states.)
For OpenAI to rely on LI as its legal basis in general, it must demonstrate that it needs to process the data, that the processing is limited to what is necessary for that need, and that it has carried out a balancing test weighing its legitimate interests in the processing against the rights and freedoms of the data subjects (i.e., the people the data is about).
Here, the task force makes a further suggestion: “sufficient safeguards”—including “technical measures,” “establishing precise collection criteria,” and blocking access to certain data categories or sources (such as social media profiles)—could “alter the balancing test in favor of the controller” by allowing less data to be collected in the first place, thereby reducing the impact on individuals.
This approach could push AI companies to be more mindful about what data they collect, in order to limit privacy risks.
The task force also recommends that “measures should be in place to delete or anonymize personal data collected via web scraping before the training phase.”
OpenAI is also seeking to rely on LI to process ChatGPT users’ prompt data for model training. On this, the report stresses that users must be “clearly and demonstrably informed” that such content may be used for training purposes, one of the factors the LI balancing test would weigh.
It will fall to the individual DPAs assessing complaints to decide whether the AI giant has actually fulfilled the criteria for relying on LI.
If it cannot, ChatGPT’s maker would be left with only one legal option in the EU: asking citizens for consent. And given the volume of personal information likely contained in its training datasets, it is unclear how workable such a system would be.
Nor would the deals the AI giant is rapidly cutting with news publishers to license their journalism offer a template for licensing Europeans’ personal data, since the law requires consent to be freely given; it cannot be bought.
Transparency and fairness are not optional
The task force’s report also highlights a point about the GDPR’s fairness principle: privacy risk cannot be shifted onto the user, for example by burying a clause in the terms and conditions stating that “data subjects are responsible for their chat inputs.”
It further states: “OpenAI remains responsible for complying with the GDPR and should not argue that the input of certain personal data was prohibited in the first place.”
On transparency obligations, the task force appears to accept that OpenAI could rely on an exemption (Article 14(5)(b) GDPR) from individually notifying people about data collected on them, given the scale of web scraping involved in acquiring training datasets for LLMs.
However, the report stresses how important it is to notify users that their inputs may be used for training purposes.
The report also addresses the issue of ChatGPT “hallucinating” (making information up), cautioning that OpenAI must adhere to the GDPR “principle of data accuracy”—and must therefore provide “proper information” about the chatbot’s “probabilistic output” and its “limited level of reliability.”
Additionally, the task force recommends that OpenAI inform users “explicitly” that the generated text “may be fabricated or biased.”
Concerning data subject rights, such as the right to rectify personal data—an aspect that has garnered considerable attention in GDPR complaints regarding ChatGPT—the report emphasizes the “critical” nature of ensuring that individuals can exercise their rights effortlessly.
Furthermore, the report highlights shortcomings in OpenAI’s current approach, such as the fact that users cannot have inaccurate personal information generated about them corrected; they are merely offered the option to block the output instead.
Nevertheless, the task force’s recommendation regarding OpenAI’s “modalities” for users to exercise their data rights lacks specificity.
At best, it merely urges the organization to implement “necessary safeguards” and “appropriate measures designed to effectively implement data protection principles” to comply with the GDPR and protect the rights of data subjects.
This is similar to saying, “We, too, have no idea how to resolve this.”
Is the ChatGPT task force stalling GDPR enforcement?
Following Italy’s headline-grabbing intervention against OpenAI in April 2023, the ChatGPT task force was established to streamline enforcement of the bloc’s privacy rules on the emerging technology.
The regulatory body that oversees the implementation of EU law in this domain is the European Data Protection Board (EDPB), within which the task force operates.
GDPR enforcement is decentralized, though: DPAs remain independent and competent to enforce the law on their own turf. Yet despite that enduring independence, watchdogs are evidently nervous and risk-averse about how best to respond to a nascent technology like ChatGPT.
The Italian DPA explicitly stated in its draft decision announcement earlier this year that its procedure would “consider” the findings of the EDPB task force.
There are other signs, too, that watchdogs may be inclined to wait for the working group to deliver its final report, perhaps a year from now, before wading in with their own enforcement actions. So the task force’s mere existence may already be influencing GDPR enforcement against OpenAI’s chatbot, by delaying decisions and pushing investigations of complaints onto the slow track.
Poland’s data protection authority, for example, indicated in a recent interview with local media that its investigation into OpenAI would need to wait for the task force to complete its work.
The watchdog has yet to respond to our inquiry about whether the ChatGPT task force’s parallel workstream is delaying enforcement. An EDPB spokesperson informed us that the task force’s work “does not prejudge the analysis that each DPA will conduct in their ongoing investigations.” However, they added: “Although DPAs have the authority to enforce, the EDPB plays a crucial role in fostering cooperation among DPAs about enforcement.”
There is a wide range of opinions among DPAs regarding the urgency with which they should address concerns regarding ChatGPT.
In 2023, Helen Dixon, Ireland’s then data protection commissioner, told a Bloomberg conference that DPAs should not rush to ban ChatGPT, arguing that they must first figure out “how to regulate it properly.”
This contrasts with Italy, where the watchdog grabbed headlines last year for its swift intervention.
Last autumn’s decision by OpenAI to establish an EU operation in Ireland was probably not coincidental.
Subsequently, in December, the AI giant discreetly amended its terms and conditions to designate its newly established Irish entity, OpenAI Ireland Limited, as the regional provider of services, including ChatGPT.
This restructuring enabled the AI giant to apply to the Data Protection Commission (DPC) of Ireland to assume the role of its primary supervisor for GDPR oversight.
OpenAI’s regulatory-risk-driven legal restructuring appears to have paid off: the EDPB ChatGPT task force’s report notes that the company was granted main establishment status as of February 15 this year.
This grants OpenAI the ability to utilize the One-Stop Shop (OSS) mechanism in the GDPR, which directs any cross-border complaints to a lead DPA in the country of main establishment (i.e., Ireland, in this case).
That may sound like a technicality, but it essentially means the AI company can now avoid the risk of further decentralized GDPR enforcement, as occurred in Poland and Italy, because Ireland’s DPC will have the authority to decide which complaints get investigated, how, and when, going forward.
The Irish regulator has earned a reputation for taking a business-friendly approach to enforcing the GDPR on Big Tech. In other words, “Big AI” could be the next to benefit from Dublin’s generosity in interpreting the bloc’s data protection rulebook.