Tech

Reddit to Block Automated Scraping

After AI startups were found scraping its website for content, Reddit announced on Tuesday that it would update the web standard it uses to restrict automated data scraping.

The decision comes as artificial intelligence firms face accusations of plagiarizing content from publishers to produce AI-generated summaries without obtaining permission or providing credit.

Reddit announced that it would update its Robots Exclusion Protocol file, or “robots.txt,” a widely recognized standard that specifies which parts of a website may be crawled.
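To make the mechanism concrete, here is a minimal sketch of how a well-behaved crawler might consult robots.txt rules before fetching a page, using Python's standard `urllib.robotparser` module. The rules, the bot names (`ExampleResearchBot`, `UnknownCrawler`), and the URL are hypothetical and chosen only for illustration; they are not Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; this is NOT Reddit's real robots.txt.
# One named bot is allowed everywhere, every other user agent is disallowed.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleResearchBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = EXAMPLE_ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/r/some_forum/"
    print(is_allowed("ExampleResearchBot", page))  # True
    print(is_allowed("UnknownCrawler", page))      # False
```

Note that the check runs on the crawler's side: robots.txt states a site's wishes, and compliance is voluntary, which is why a crawler that ignores the file is not stopped by it alone.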

The company also stated that it would continue rate-limiting, a method that caps the number of requests a single entity can make. Additionally, it will block unknown bots and crawlers from scraping data, that is, collecting and storing raw information from its website.
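Rate-limiting can take many forms, and the article does not say which one Reddit uses. As a rough sketch of the general idea only, the following sliding-window limiter allows each client a fixed number of requests per time window. The class name, parameters, and client identifiers are all hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowRateLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps a client identifier to the timestamps of its accepted requests.
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record and accept the request if the client is under its limit."""
        if now is None:
            now = time.monotonic()
        timestamps = self._history[client_id]
        # Discard timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False  # Over the limit: reject without recording.
        timestamps.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowRateLimiter(max_requests=2, window_seconds=60.0)
    # Explicit timestamps keep the demonstration deterministic.
    print(limiter.allow("crawler-a", now=0.0))   # True
    print(limiter.allow("crawler-a", now=1.0))   # True
    print(limiter.allow("crawler-a", now=2.0))   # False
    print(limiter.allow("crawler-a", now=60.0))  # True
```

Unlike robots.txt, a limiter of this kind is enforced by the server, so it applies whether or not the client chooses to cooperate.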

In recent years, robots.txt has emerged as a critical tool publishers use to prevent technology companies from using their content to train AI algorithms and generate summaries in response to specific search queries.

TollBit, a content licensing startup, wrote to publishers last week to inform them that numerous AI firms were circumventing the web standard to scrape publisher sites.

TollBit's letter follows an investigation by Wired, which found that Perplexity, an AI search startup, likely circumvented attempts to block its web crawler through robots.txt.

Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative articles for use in generative AI systems without attribution.

On Tuesday, Reddit announced that its content will remain accessible to researchers and organizations like the Internet Archive for non-commercial purposes.

Tags: Artificial Intelligence, Reddit, Robots Exclusion Protocol, StartUps

5 months ago

Caleb Ogwuche

Caleb, a graduate in Biological Science, serves as a DevOps Engineer. He expertly leverages his scientific knowledge and technical prowess to deliver insightful tech content on protechbro.com.
