
Reddit to Update Web Standard to Block Automated Data Scraping From Its Website

Social media platform Reddit said on Tuesday it will update a Web standard used by the platform to block automated data scraping from its website, following reports that AI startups were bypassing the rule to gather content for their systems.

The move comes at a time when artificial intelligence firms have been accused of plagiarizing content from publishers to create AI-generated summaries without giving credit or asking for permission.

Reddit said that it would update the Robots Exclusion Protocol, or “robots.txt,” a widely accepted standard meant to determine which parts of a site are allowed to be crawled.
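As a rough illustration of how a well-behaved crawler consults robots.txt before fetching a page, the sketch below uses the `urllib.robotparser` module from Python's standard library. The rules, the bot name `HypotheticalAIBot` and the `example.com` URLs are assumptions made up for this example; they are not Reddit's actual file. Compliance is voluntary: the check runs in the crawler, and nothing in the protocol stops a crawler from skipping it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not Reddit's real robots.txt).
EXAMPLE_ROBOTS_TXT = """\
User-agent: HypotheticalAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(EXAMPLE_ROBOTS_TXT)
    for agent, url in [
        ("HypotheticalAIBot", "https://example.com/posts/1"),
        ("SomeOtherBot", "https://example.com/posts/1"),
        ("SomeOtherBot", "https://example.com/private/notes"),
    ]:
        print(agent, url, is_allowed(rules, agent, url))
```

Because the file only states the site's wishes, publishers that want enforcement typically pair it with server-side measures.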

The company also said it will maintain rate-limiting, a technique used to control the number of requests from one particular entity, and will block unknown bots and crawlers from data scraping – collecting and saving raw information – on its website.
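Rate-limiting can be implemented in several ways, and Reddit has not said which it uses. One common approach is a token bucket kept per client, sketched below under that assumption. The class name `RateLimiter`, its parameters and the client identifiers are hypothetical and chosen only for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class RateLimiter:
    """Per-client token bucket: each request spends one token, and tokens
    refill at a steady rate up to a fixed capacity."""

    def __init__(
        self,
        capacity: int,
        refill_per_second: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        # client_id -> (tokens remaining, time of last update)
        self._buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client_id: str) -> bool:
        """Return True if the client may make a request right now."""
        now = self.clock()
        # A client seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(client_id, (float(self.capacity), now))
        # Add the tokens earned since the last update, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens >= 1:
            self._buckets[client_id] = (tokens - 1, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = RateLimiter(capacity=2, refill_per_second=1.0)
    # Three back-to-back requests from one client against a capacity of 2.
    print([limiter.allow("crawler-a") for _ in range(3)])
```

A server would call `allow()` with an identifier such as an IP address or API key and reject the request, for instance with HTTP status 429, when it returns False.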

More recently, robots.txt has become a key tool that publishers employ to prevent tech companies from using their content free of charge to train AI algorithms and create summaries in response to some search queries.

Last week, a letter to publishers from the content licensing startup TollBit said that several AI firms were circumventing the Web standard to scrape publisher sites.

This follows a Wired investigation that found that AI search startup Perplexity likely bypassed efforts to block its Web crawler via robots.txt.

Earlier in June, business media publisher Forbes accused Perplexity of plagiarizing its investigative stories for use in generative AI systems without giving credit.

Reddit said on Tuesday that researchers and organizations such as the Internet Archive will continue to have access to its content for non-commercial use.

© Thomson Reuters 2024

