New EU AI legislation will require companies to disclose the data used to train their systems, revealing one of the industry’s biggest secrets
Public engagement and investment in generative AI, a class of applications that can rapidly produce text, images, and audio, have surged in the 18 months since Microsoft-backed OpenAI introduced ChatGPT to the public.
The industry’s rapid growth has raised concerns about how AI companies acquire the data used to train their models, and whether feeding them bestselling books and Hollywood films without the creators’ consent amounts to copyright infringement.
The EU’s recently enacted AI Act is being rolled out in phases over the next two years, giving regulators time to put the new rules into effect while businesses adjust to their new obligations. How some of these provisions will work in practice, however, remains to be seen.
One of the Act’s more contentious provisions requires organizations that deploy general-purpose AI models, such as ChatGPT, to provide “detailed summaries” of the content used to train them. The newly established AI Office plans to release a template for organizations to follow in early 2025, after consulting stakeholders.
Although the details remain to be worked out, AI companies are strongly opposed to revealing what their models were trained on, regarding the information as a trade secret that would hand competitors an unfair advantage if made public.
“It would be a dream come true to view my competitors’ datasets and, equally, for them to view ours,” stated Matthieu Riouf, CEO of Photoroom, an AI-powered image editing firm.
“It is comparable to cooking,” he added. “There is a secret component of the recipe that the most talented chefs would not disclose, the ‘je ne sais quoi’ that distinguishes it.”
The level of detail required in these transparency reports will have significant consequences for large technology companies such as Google and Meta, which have made AI central to their future operations.
SHARING TRADE SECRETS
Over the past year, several prominent technology companies, including Google, OpenAI, and Stability AI, have faced lawsuits from creators who allege that their content was used without permission to train the companies’ models.
Although U.S. President Joe Biden has issued several executive orders focused on AI’s security risks, questions of copyright remain largely untested. Demands that technology companies compensate rights holders for their data have drawn bipartisan support in Congress.
Facing this growing scrutiny, technology companies have signed a flurry of content-licensing agreements with media outlets and websites. OpenAI struck deals with publications including the Financial Times and The Atlantic, while Google reached agreements with News Corp and the social media platform Reddit.
Nevertheless, OpenAI faced criticism in March when its chief technology officer, Mira Murati, declined to answer a question from the Wall Street Journal about whether YouTube videos had been used to train its video-generating tool Sora. Doing so would breach YouTube’s terms and conditions.
In May, OpenAI drew further criticism for including an AI-generated voice, which actress Scarlett Johansson described as “eerily similar” to her own, in a public demonstration of the latest version of ChatGPT.
Thomas Wolf, co-founder of the prominent AI startup Hugging Face, said he supported greater transparency, but that view is not universally shared within the industry. “It is difficult to predict the outcome,” he said. “A great deal still needs to be resolved.”
Senior European lawmakers remain divided on the issue.
Dragos Tudorache, one of the European Parliament lawmakers who led the drafting of the AI Act, said AI companies should be required to make their datasets public.
“They have to be detailed enough for Scarlett Johansson, Beyoncé, or for whoever to know if their work, their songs, their voice, their art, or their science were used in training the algorithm,” he said.
A representative of the Commission stated:
“The AI Act acknowledges the need to ensure an appropriate balance between the legitimate need to protect trade secrets and, on the other hand, the need to facilitate the ability of parties with legitimate interests, including copyright holders, to exercise their rights under Union law.”
Under President Emmanuel Macron, the French government has privately opposed rules that could blunt the competitiveness of European AI ventures.
Speaking at the Viva Technology conference in Paris in May, French finance minister Bruno Le Maire said he wanted Europe to become a global leader in AI, not merely a consumer of products from the United States and China.
“For once, Europe, which has created controls and standards, needs to understand that you have to innovate before regulating,” he said.
“Otherwise, you risk regulating technologies that you haven’t mastered or regulating them badly because you haven’t mastered them.”