OpenAI Mistakenly Deletes Evidence in NY Times Lawsuit

The New York Times and Daily News are suing OpenAI for scraping their works to train its AI models without permission. Now OpenAI engineers have reportedly deleted important data in the case by mistake.

OpenAI consented to furnish two virtual machines earlier this autumn to enable counsel for The Times and Daily News to conduct searches for their copyrighted content in its AI training sets.

(Virtual machines are software-based computers that are frequently employed for testing, backing up data, and executing applications within the operating system of another computer.)

The publishers’ attorneys have stated in a letter that they and the experts they have employed have devoted more than 150 hours to investigating OpenAI’s training data since November 1.

However, that letter, filed late Wednesday in the U.S. District Court for the Southern District of New York, says that on November 14 OpenAI engineers deleted all of the publishers’ search data stored on one of the virtual machines.

OpenAI attempted to retrieve the data, and it was largely successful. Nevertheless, the letter states that the recovered data “cannot be used to determine where the news plaintiffs’ copied articles were used to build [OpenAI’s] models” due to the irretrievable loss of the folder structure and file names.

Counsel for The Times and Daily News stated that the plaintiffs have been forced to re-create their work from scratch, which has consumed a significant amount of computer processing time and person-hours.

According to the letter, the plaintiffs learned only the day before filing that the recovered data is unusable and that an entire week’s worth of their attorneys’ and experts’ work must be redone, which is why the supplemental letter was submitted.

The plaintiffs’ counsel says it has no reason to believe the deletion was intentional. However, counsel notes that the incident underscores that OpenAI is “in the best position to search its own datasets” for potentially infringing content using its own tools.

A spokesperson for OpenAI declined to issue a statement.

OpenAI has consistently maintained, in this case and others, that training models on publicly available data, including articles from The Times and Daily News, is fair use.

In other words, when developing models such as GPT-4o, which “learn” from billions of examples of e-books, essays, and other content to produce human-sounding text, OpenAI believes it is not obligated to license or otherwise pay for those examples, even if it generates revenue from the models.

That being said, OpenAI has signed licensing agreements with a number of news publishers, including the Associated Press, Business Insider owner Axel Springer, the Financial Times, People parent company Dotdash Meredith, and News Corp.

OpenAI has not publicly disclosed the terms of these agreements; however, one of its content partners, Dotdash Meredith, is reportedly being paid at least $16 million per year.

OpenAI has not affirmed or denied that it trained its AI systems on any specific copyrighted works without permission.

Tags: Daily News, Generative AI, Lawsuit, New York Times, NY Times, OpenAI

5 hours ago

Hillary Ondulohi

Hillary is a media creator with a background in mechanical engineering. He leverages his technical expertise to craft informative pieces on protechbro.com, making complex concepts accessible to a wider audience.
