Seven content-licensing sellers of audio, image, video, and other datasets used to train artificial intelligence systems have established the sector's first trade group, they announced on Wednesday.
The companies stated that the Dataset Providers Alliance (DPA) will promote “ethical data sourcing” in the training of A.I. systems, which includes preserving content owners’ intellectual property rights and the rights of individuals depicted in datasets.
Rightsify, an American music dataset company; Vaisual, an image licensing service; Pixta, a Japanese stock photo provider; and Datarade, a data marketplace based in Germany, are among the founding members.
In recent years, the emergence of generative A.I. technologies that can imitate human creativity has prompted an outcry from content creators and a series of copyright lawsuits against tech companies such as Google, Meta, and ChatGPT maker OpenAI, which Microsoft backs.
Developers have trained their models by feeding them vast amounts of content, much of it obtained for free from the internet without the permission of its creators or rights owners.
Tech companies, which assert that the usage is legal, are also secretly paying for access to private collections of content to mitigate legal and regulatory risks and to meet their requirements for specific categories of data.
A nascent industry of companies that bundle content and sell access to it for use by A.I. systems has emerged in response to the possibility that demand for licensed data will increase if copyright owners prevail in their legal battles.
Consequently, organizations have been formed to set ethical standards for the industry, such as Fairly Trained, a non-profit established this year. Fairly Trained certifies models that have not used copyrighted materials without a license.
The DPA focuses on the content involved in licensing deals, requiring its members to refrain from selling text data obtained through web scraping or audio that includes individuals’ voices without their explicit consent.
Advocating for legislation will also be a major focus, said Alex Bestall, CEO of Rightsify and its licensing subsidiary GCX, who spearheaded the group’s establishment.
One such measure is the NO FAKES Act, a U.S. bill introduced last year that would establish penalties for producing unauthorized digital replicas of individuals’ voices or likenesses.
“Advocacy will be a big part of it because everyone’s taken their positions on A.I. and copyright, but a lot of these battles are yet to be solved, and it’s going to take a while for them to be,” Bestall said.
He also stated that the DPA will advocate for additional training data transparency requirements, such as those outlined in the European Union’s A.I. Act and a comparable U.S. measure introduced in April, the Generative A.I. Copyright Disclosure Act.
He said the organization intends to release a white paper in July outlining its positions.