Reddit Blocks Internet Archive to Prevent AI Crawlers from Accessing Content AI NEWS

Home
AInews
Reddit Blocks Internet Archive to Prevent AI Crawlers from Accessing Content

Reddit Blocks Internet Archive to Prevent AI Crawlers from Accessing Content

2025-08-13

Reddit has announced its decision to block the Internet Archive from indexing its popular online forum. This move aims to prevent AI companies from scraping its content for training purposes.

According to reports, Reddit discovered that AI firms were using the Internet Archive's platform to extract data. Prior to this decision, the company had already restricted direct scraping attempts through its official website. The new policy means the organization's widely-used Wayback Machine will no longer archive Reddit pages, posts, profiles or comments - except for content displayed on the homepage.

The Verge reported that going forward, the archive will only show popular posts and news headlines from specific dates. Previously, the Wayback Machine archived every page, preserving all content posted on Reddit's self-proclaimed "homepage of the internet."

Reddit did not specify which AI companies used the Wayback Machine to bypass its content scraping ban. A company spokesperson told The Verge they "became aware of AI companies violating platform policies... and extracting data from the Wayback Machine."

The company appears to believe the Internet Archive should implement measures to prevent such scraping, suggesting this decision might not be permanent. However, the report also highlighted Reddit's concern about the Wayback Machine's tendency to archive posts and comments later deleted by users, calling this a privacy issue.

"Until they can protect their site and comply with platform policies, we will restrict access to certain Reddit data to protect our users," the company stated.

While Reddit raised privacy concerns, its primary motivation for blocking crawlers likely stems from financial interests. AI companies are explicitly prohibited from scraping the site unless they pay for data access. Several firms have accepted Reddit's offer, particularly Google LLC and OpenAI.

Reddit never disclosed the value of its agreement with OpenAI, but its deal with Google reportedly reached approximately $60 million. The company previously stated it aims to generate up to $200 million in revenue from such licensing agreements over three years.

One company seemingly unwilling to pay is Anthropic PBC. In June, Reddit filed a lawsuit against the firm, alleging it continued scraping content after claiming it would stop.

The Internet Archive is not the first organization blocked by Reddit over scraping concerns. In June 2024, the social media company revealed it had restricted Microsoft's Bing and smaller search engines like DuckDuckGo, Mojeek and Qwant to prevent content extraction through their archives.

It remains unclear whether the Internet Archive will implement measures to prevent archive scraping in an attempt to lift Reddit's restrictions. Mark Graham, director of the Wayback Machine, stated in a declaration that his team is engaged in "ongoing discussions" regarding the matter.

Vizcom AI

Transform sketches into 3D models and edit them

Keploy

Automated testing made easy with AI technology

Figma Make

Create prototype apps from existing designs

Doctronic

AI platform providing personalized health guidance

3D Look AI

AI body scanner for accurate body measurements

VulnZap

AI code vulnerability scanner

The Furnisher

AI room design tool for quick makeovers

RECENT AI TOOLS

Plaud

Vizcom AI

Keploy

Figma Make

Doctronic

RECENT AI NEWS

Voxtral Transcribe 2 offers speech recognition at $0.003 per minute.

NVIDIA RTX 50 Series Super Refresh Delayed, RTX 60 Series May Miss 2027 Launch

Higgsfield Launches "Atmosphere" Editor to Aid Dynamic Graphics Creation

Meta Tests Its AI-Generated "Vibes" Video Standalone App

OpenAI Launches Frontier Agent Management Platform and New GPT-5.3-Codex Model

Anthropic Releases Claude Opus 4.6 with 1 Million Token Context Support

Amazon to Test AI Tools for Film and TV Production Next Month

Google's Annual Revenue Surpasses $400 Billion for the First Time

RECENT AI TOOLS