Two days after an open letter called for a moratorium on the development of more powerful generative AI models so that regulators can catch up with the likes of ChatGPT, Italy's data protection authority has issued a timely reminder that some countries do have laws that already apply to cutting-edge AI: it has ordered OpenAI to stop processing people's data locally with immediate effect.
The Italian data protection authority said it is concerned that ChatGPT’s creator is violating the European Union’s General Data Protection Regulation (GDPR).
In particular, the Garante said it has issued the order to block ChatGPT over concerns that OpenAI has unlawfully processed people's data, as well as over the lack of any system to prevent minors from accessing the technology.
The San Francisco-based company has 20 days to respond to the order, which is backed by the threat of some meaty penalties if it fails to comply. (Reminder: fines for breaching the EU's data protection regime can reach 4% of annual turnover or €20 million, whichever is higher.)
It's worth noting that since OpenAI does not have a legal entity established in the EU, any EU data protection authority is empowered under the GDPR to intervene if it sees risks to local users. (So where Italy steps in, others may follow.)
A raft of GDPR issues
The GDPR applies whenever the personal data of EU users is processed. And it's clear that OpenAI's large language model has been crunching this kind of information: it can, for example, produce biographies of named individuals in the region on demand (we know; we've tried). Although OpenAI has declined to provide details of the training data used for the latest iteration of the technology, GPT-4, it has disclosed that earlier models were trained on data scraped from the internet, including forums such as Reddit. So if you've been reasonably active online, chances are the bot knows your name.
On top of that, ChatGPT has been shown to produce completely false information about named individuals, apparently making up details its training data lacks. That may raise further GDPR concerns, since the regulation gives Europeans a suite of rights over their data, including the right to have errors rectified. And it's not clear how, or whether, people can ask OpenAI to correct erroneous statements the bot generates about them, to take just one example scenario.
The Garante's statement also highlights a data breach the service suffered earlier this month, when OpenAI admitted that a conversation history feature had been leaking users' chats and said the incident may have exposed some users' payment information.
Data breaches are another area regulated by the GDPR, with a focus on ensuring that entities processing personal data adequately protect the information. The pan-EU law also requires organizations to notify the relevant supervisory authority of significant breaches within tight timeframes (generally no later than 72 hours after becoming aware of them, where feasible).
Overarching all of this is the big(ger) question of what legal basis OpenAI relied upon to process Europeans' data in the first place. In other words, the lawfulness of this processing.
The GDPR allows for a number of possibilities, from consent to public interest, but the scale of processing required to train these large language models complicates the question of lawfulness, as the Garante notes (pointing to the "massive collection and storage of personal data"). Data minimization is another key focus of the regulation, which also contains principles requiring transparency and fairness. Yet, at the very least, the (now) for-profit company behind ChatGPT does not appear to have informed the people whose data it has repurposed to train its commercial AIs. That could be a pretty sticky problem for it.
If OpenAI has processed Europeans' data unlawfully, data protection authorities across the bloc could order the data to be deleted, although whether that would force it to retrain models built on unlawfully obtained data is an open question, as existing law grapples with cutting-edge technology.
Then again, Italy may have just banned all machine learning by, uh, accident… 😬
"[T]he Privacy Guarantor notes the lack of information to users and all interested parties whose data is collected by OpenAI, but above all the absence of a legal basis that justifies the massive collection and storage of personal data for the purpose of 'training' the algorithms underlying the operation of the platform," the DPA wrote in its statement today [which we've translated from Italian using AI].
"As shown by the checks carried out, the information provided by ChatGPT does not always correspond to the real data, thus resulting in inaccurate processing of personal data," it added.
The authority also said it is concerned about the risk of minors' data being processed by OpenAI, as the company does not actively prevent people under the age of 13 from signing up to use the chatbot, for example by applying age verification technology.
Risks to children's data are one area where the regulator has been very active: it recently ordered a similar ban on the virtual friendship AI chatbot Replika over child safety concerns. In recent years it has also gone after TikTok over underage use, forcing the company to purge more than half a million accounts it could not confirm did not belong to children.
So if OpenAI can't definitively confirm the age of users who have signed up in Italy, it could, at the very least, be forced to delete their accounts and start over with a more robust sign-up process.
OpenAI has been contacted for comment on the Garante's order.