Cloudflare has taken a strong stance against unauthorized AI scraping. As of July 1, websites using Cloudflare’s services will automatically block AI bots that try to access their content without permission.
This includes bots from OpenAI, Google, Anthropic, Perplexity, and Amazon, crawlers widely used to gather training data for large language models. The move is a direct response to growing concerns from publishers and creators whose content is being used to power AI systems without consent or compensation.
“If you run a website, your content shouldn’t be taken without your permission,” said Cloudflare CEO Matthew Prince.
This default block is now live for all Cloudflare users, with no setup required. Site owners can still allow specific bots through manual configuration, but unless approved, most major AI crawlers will be stopped.
Cloudflare’s AI Bot Blocklist Gives Websites Automatic Protection
Until now, websites had to rely on updating their robots.txt files to block unwanted crawlers, an approach that many bots ignored. With Cloudflare’s move, that protection is now enforced at the network level, meaning bots that don’t comply are denied access automatically.
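For context, a robots.txt block is just a published request. A typical rule aimed at AI crawlers looks like the fragment below (GPTBot and CCBot are the publicly documented user agents for OpenAI's and Common Crawl's crawlers); nothing in the protocol forces a bot to honor it, which is exactly the gap network-level enforcement closes.

```text
# robots.txt is a request, not an enforcement mechanism
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
```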
The update is particularly relevant for content-heavy websites such as newsrooms, blogs, product portals, and help centers, which have seen their content reused by generative AI tools without compensation or attribution.
This change offers those sites an immediate layer of defense against having their original content extracted, repurposed, and embedded in AI models.
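Cloudflare's actual enforcement is proprietary and far more sophisticated (it draws on fingerprinting and behavioral signals, since user agents can be spoofed), but the core idea of blocking at the network edge rather than asking politely via robots.txt can be sketched as a simple user-agent check. The bot names and allowlist below are purely illustrative:

```python
# Illustrative sketch of network-level bot blocking. This is NOT
# Cloudflare's implementation; a real system also uses TLS fingerprints,
# IP reputation, and behavioral analysis.

BLOCKED_AI_BOTS = {"gptbot", "ccbot", "claudebot", "perplexitybot"}
ALLOWLIST = {"gptbot"}  # bots this site owner has explicitly approved

def should_block(user_agent: str) -> bool:
    """Return True if the request comes from a known, unapproved AI crawler."""
    ua = user_agent.lower()
    for bot in BLOCKED_AI_BOTS:
        if bot in ua and bot not in ALLOWLIST:
            return True
    return False

print(should_block("Mozilla/5.0 (compatible; CCBot/2.0)"))   # True: blocked by default
print(should_block("Mozilla/5.0 (compatible; GPTBot/1.0)"))  # False: owner-approved
print(should_block("Mozilla/5.0 (Windows NT 10.0)"))         # False: regular browser
```

The key design point is that the default answer is "block": a crawler passes only if the site owner has opted it in, which mirrors the permission-first model Cloudflare is now enforcing.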
New AI Scraping Marketplace Allows Sites to Monetize Their Content
To complement the blocklist, Cloudflare launched an AI scraping marketplace. It allows AI companies to formally request access to a site’s content and pay for it.
Website owners can set terms, approve or deny requests, and charge based on how their content will be used. Cloudflare verifies the bots and enforces access rules. This change gives businesses a fair shot at turning their content into real value, instead of watching it get scraped and reused without credit or compensation.
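The marketplace's internal mechanics aren't public in detail, but the request-and-approve flow described above can be sketched roughly as follows. Every class, field, bot name, and price here is hypothetical, not Cloudflare's API:

```python
# Hypothetical sketch of a crawl-access marketplace flow, assuming an
# owner-set price list keyed by stated purpose. No names here come from
# Cloudflare's actual API.
from dataclasses import dataclass, field

@dataclass
class CrawlRequest:
    bot_name: str
    purpose: str          # e.g. "training", "search", "inference"
    approved: bool = False

@dataclass
class SitePolicy:
    price_per_use: dict                       # owner-set pricing per purpose
    requests: list = field(default_factory=list)

    def submit(self, req: CrawlRequest) -> str:
        """Record a request; approve it only if the purpose is offered."""
        self.requests.append(req)
        if req.purpose not in self.price_per_use:
            return "denied: purpose not offered"
        req.approved = True
        return f"approved at ${self.price_per_use[req.purpose]:.2f} per crawl"

policy = SitePolicy(price_per_use={"search": 0.01, "training": 0.25})
print(policy.submit(CrawlRequest("ExampleBot", "training")))  # approved at $0.25 per crawl
print(policy.submit(CrawlRequest("OtherBot", "inference")))   # denied: purpose not offered
```

The sketch captures the two things the article highlights: owners set the terms per use case, and access is denied by default unless a request matches an offer.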
It’s a big deal for brands that invest in content, whether it’s help docs, blogs, product pages, or internal knowledge hubs. These resources often end up training AI models that never ask permission, and sometimes even compete with the original creators. That’s the part many businesses are waking up to now.
A Step Toward Ethical AI Scraping
Cloudflare’s update isn’t just technical; it reflects a bigger shift toward a web where consent matters. With AI evolving so fast, businesses that publish original content need to think differently about how their data is being used.
And for developers building AI systems, especially those that rely on web data, this means being more intentional about how they collect and use information. The lines are changing, and staying responsible is no longer optional.
At Tekrevol, we work with companies to build AI systems that are not only powerful but also ethical and secure. From chatbots to autonomous agents, our approach to AI agent development emphasizes responsible data usage.
Cloudflare’s update gives businesses more control over their content and developers a clearer framework to build AI the right way.