Reddit Blocks Internet Archive to Prevent AI Crawlers from Accessing Content

2025-08-13

Reddit has announced its decision to block the Internet Archive from indexing its popular online forum. This move aims to prevent AI companies from scraping its content for training purposes.

According to reports, Reddit discovered that AI firms were using the Internet Archive's platform to extract Reddit data. Prior to this decision, the company had already restricted direct scraping of its own website. The new policy means the Internet Archive's widely used Wayback Machine will no longer archive Reddit pages, posts, profiles or comments, with the exception of content displayed on the homepage.

The Verge reported that going forward, the archive will only show popular posts and news headlines from specific dates. Previously, the Wayback Machine archived every page, preserving all content posted on Reddit's self-proclaimed "front page of the internet."

Reddit did not specify which AI companies used the Wayback Machine to bypass its content scraping ban. A company spokesperson told The Verge they "became aware of AI companies violating platform policies... and extracting data from the Wayback Machine."

The company appears to believe the Internet Archive should implement measures to prevent such scraping, suggesting this decision might not be permanent. However, the report also highlighted Reddit's concern about the Wayback Machine's tendency to archive posts and comments later deleted by users, calling this a privacy issue.

"Until they can protect their site and comply with platform policies, we will restrict access to certain Reddit data to protect our users," the company stated.

While Reddit raised privacy concerns, its primary motivation for blocking crawlers is likely financial. AI companies are explicitly prohibited from scraping the site unless they pay for data access. Several firms have accepted Reddit's terms, notably Google LLC and OpenAI.

Reddit has never disclosed the value of its agreement with OpenAI, but its deal with Google is reportedly worth approximately $60 million. The company previously stated it aims to generate up to $200 million in revenue from such licensing agreements over three years.

One company seemingly unwilling to pay is Anthropic PBC. In June, Reddit filed a lawsuit against the firm, alleging it continued scraping content after claiming it would stop.

The Internet Archive is not the first organization blocked by Reddit over scraping concerns. In June 2024, the social media company revealed it had restricted Microsoft's Bing and smaller search engines like DuckDuckGo, Mojeek and Qwant to prevent content extraction through their search indexes.

It remains unclear whether the Internet Archive will implement measures to prevent scraping of its archives in an effort to have Reddit's restrictions lifted. Mark Graham, director of the Wayback Machine, said in a statement that his team is engaged in "ongoing discussions" regarding the matter.