OpenAI Mistakenly Deletes Evidence in NY Times Lawsuit

The New York Times and Daily News are suing OpenAI for scraping their works to train its AI models without permission, and OpenAI engineers reportedly deleted data the publishers had gathered for the case.

OpenAI consented to furnish two virtual machines earlier this autumn to enable counsel for The Times and Daily News to conduct searches for their copyrighted content in its AI training sets.

(Virtual machines are software-based computers that are frequently employed for testing, backing up data, and executing applications within the operating system of another computer.)

The publishers’ attorneys have stated in a letter that they and the experts they have employed have devoted more than 150 hours to investigating OpenAI’s training data since November 1.

However, that letter, filed late Wednesday in the U.S. District Court for the Southern District of New York, says that on November 14 OpenAI engineers deleted all of the publishers’ search data stored on one of the virtual machines.

OpenAI attempted to recover the data and was largely successful. Nevertheless, because the folder structure and file names were irretrievably lost, the letter states that the recovered data “cannot be used to determine where the news plaintiffs’ copied articles were used to build [OpenAI’s] models.”

Image: Sam Altman, CEO of OpenAI (credit: BankInfoSecurity)

Counsel for The Times and Daily News wrote that the plaintiffs have been forced to re-create their work from scratch, at a significant cost in person-hours and computer processing time.

According to the letter, the plaintiffs learned only the day before filing that the recovered data is unusable and that an entire week’s worth of their attorneys’ and experts’ work must be redone, which is why the supplemental letter was submitted.

The plaintiffs’ counsel say they have no reason to believe the deletion was intentional. However, they note that the incident underscores that OpenAI is “in the best position to search its own datasets” for potentially infringing content using its own tools.

A spokesperson for OpenAI declined to issue a statement.

OpenAI has consistently maintained, in this case and others, that training models on publicly available data, including articles from The Times and Daily News, is fair use.

In other words, OpenAI believes that when it develops models such as GPT-4o, which “learn” from billions of examples of e-books, essays, and other content to produce human-sounding text, it is not obligated to license or pay for those examples, even if it makes money from the models.

That said, OpenAI has signed licensing agreements with a growing number of news publishers, including the Associated Press; Axel Springer, which owns Business Insider; the Financial Times; Dotdash Meredith, the parent company of People; and News Corp.

OpenAI has not publicly disclosed the terms of these agreements; however, one of its content partners, Dotdash Meredith, is reportedly being paid at least $16 million per year.

OpenAI has neither confirmed nor denied that it trained its AI systems on any specific copyrighted works without permission.

Hillary Ondulohi

Hillary is a media creator with a background in mechanical engineering. He leverages his technical expertise to craft informative pieces on protechbro.com, making complex concepts accessible to a wider audience.
