Reddit to Block Automated Scraping

After AI startups were found scraping its website for content, Reddit announced on Tuesday that it would update a web standard it uses to restrict automated data scraping.

The decision comes as artificial intelligence firms face accusations of plagiarizing content from publishers to produce AI-generated summaries without obtaining permission or giving credit.

Reddit said it would update its implementation of the Robots Exclusion Protocol, or “robots.txt,” a widely recognized standard that tells crawlers which parts of a website they are allowed to crawl.
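To illustrate how a robots.txt file is read, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rules, the bot names (`ExampleArchiveBot`, `UnknownScraper`), and the `example.com` URL are hypothetical and are not taken from Reddit's actual file. The sketch assumes a policy that admits one named crawler and turns away everyone else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Reddit's robots.txt.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def can_crawl(user_agent: str, url: str) -> bool:
    """Return True if the sample rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/python/"
    print(can_crawl("ExampleArchiveBot", page))  # named crawler is allowed
    print(can_crawl("UnknownScraper", page))     # falls under the "*" rule
```

Compliance with robots.txt is voluntary: the file only states what a site permits, and a crawler that ignores it is not technically stopped, which is why sites pair it with enforcement measures.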

The company also said it would continue rate-limiting, a technique that caps the number of requests a single entity can make in a given period. In addition, it will block unknown bots and crawlers from data scraping, that is, from collecting and storing raw information from its website.

In recent years, robots.txt has emerged as a critical tool publishers use to prevent technology companies from using their content to train AI algorithms and generate summaries in response to specific search queries.

TollBit, a content licensing startup, wrote to publishers last week to inform them that numerous AI firms were circumventing the web standard to scrape publisher sites.

This follows an investigation by Wired, which found that Perplexity, an AI search startup, likely circumvented attempts to block its web crawler through robots.txt.

Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative articles for use in generative AI systems without attribution.

On Tuesday, Reddit announced that its content will remain accessible to researchers and organizations like the Internet Archive for non-commercial purposes.

Caleb Ogwuche

Caleb, a graduate in Biological Science, serves as a DevOps Engineer. He expertly leverages his scientific knowledge and technical prowess to deliver insightful tech content on protechbro.com.

