AI-Crawler Blocking Monitor

Who's blocking the AI crawlers?

GPTBot, ClaudeBot & co. read the web to train AI models and cite answers. We measure daily how many of Germany's 1,000 most-visited websites block these bots — openly via robots.txt and covertly at the server.

Panel: 1,000 German top domains (CrUX)

Which AI crawlers get blocked?

Share of the queried domains that block each bot. "robots.txt" = an explicit rule in robots.txt; "server-side" = the site answers the bot with 403/429 while serving a browser normally (GPTBot & ClaudeBot only, top-200).

AI crawler	robots.txt	server-side
CCBot (Common Crawl · Dataset)	28.2%	
GPTBot (OpenAI · Training)	27.2%	15.5%
Bytespider (ByteDance · Training)	25%	
ClaudeBot (Anthropic · Training)	23.5%	16.7%
Google-Extended (Google · Gemini training)	21.2%	
meta-externalagent (Meta · Training)	20.5%	
Applebot-Extended (Apple · Training)	19.9%	
Amazonbot (Amazon · Assistant)	17.2%	
anthropic-ai (Anthropic · Training, legacy)	15.5%	
PerplexityBot (Perplexity · Search)	15.5%	
ChatGPT-User (OpenAI · On-demand)	10.9%	
OAI-SearchBot (OpenAI · Search)	6.1%	

robots.txt figures count explicit rules, including blanket blocks (User-agent: *).

Two ways to block AI crawlers

1 · Open: robots.txt

A rule in robots.txt asks the bot not to read the site. Polite, public — and readable by anyone. Reputable crawlers obey it; it isn't technically enforced.
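As a rough illustration of what such a rule looks like and how a crawler would read it, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the helper name `blocks_whole_site` are made up for this example; they are not taken from any site in the panel.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def blocks_whole_site(robots_txt: str, bot: str) -> bool:
    """Return True if the rules forbid `bot` from fetching the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(bot, "/")


print(blocks_whole_site(ROBOTS_TXT, "GPTBot"))     # True
print(blocks_whole_site(ROBOTS_TXT, "ClaudeBot"))  # False
```

Nothing here stops a bot that ignores the file; the parser only reports what a well-behaved crawler would conclude.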

2 · Covert: server-side

The site recognizes the bot by its User-Agent and rejects it outright (403/429) while serving visitors normally. This block appears in no archive — we measure it ourselves, straight from Frankfurt.
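The comparison behind this can be sketched in a few lines. The following is not our production probe, only an assumed minimal version: `fetch_status`, `classify_probe` and `ProbeResult` are hypothetical names, the User-Agent strings are left to the caller, and treating any 2xx browser response as "served normally" is an assumption of this sketch.

```python
from enum import Enum
from urllib.error import HTTPError
from urllib.request import Request, urlopen

BLOCK_STATUSES = {403, 429}  # status codes counted as a rejection


class ProbeResult(Enum):
    COVERT_BLOCK = "covert_block"
    NOT_BLOCKED = "not_blocked"
    INCONCLUSIVE = "inconclusive"


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Fetch `url` with the given User-Agent and return the HTTP status code.

    Network failures (DNS, timeouts) are not handled here and will raise.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code  # 4xx/5xx responses arrive as exceptions


def classify_probe(browser_status: int, bot_status: int) -> ProbeResult:
    """Compare the browser fetch with the bot fetch of the same page."""
    # Assumption: a 2xx browser response means visitors are served normally.
    if not 200 <= browser_status < 300:
        return ProbeResult.INCONCLUSIVE
    if bot_status in BLOCK_STATUSES:
        return ProbeResult.COVERT_BLOCK
    return ProbeResult.NOT_BLOCKED


print(classify_probe(200, 403).value)  # covert_block
print(classify_probe(403, 403).value)  # inconclusive
print(classify_probe(200, 200).value)  # not_blocked
```

Requiring the browser fetch to succeed first keeps sites that reject every automated request from being counted as AI-specific blocking.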

Well-known German sites in detail

A selection of major German brands and which AI crawlers they block per robots.txt. Green = allowed, red = blocked.

	GPTBot	ClaudeBot	Google-Extended	PerplexityBot	CCBot	Bytespider
Spiegel (News)
Bild (News)
Zeit (News)
FAZ (News)
Süddeutsche (News)
Welt (News)
Tagesschau (News)
n-tv (News)
Focus (News)
Stern (News)
Handelsblatt (Business)
heise (Tech)
Golem (Tech)
Chip (Tech)
t-online (Portal)
Otto (E-Commerce)
Zalando (E-Commerce)
Idealo (E-Commerce)
Chefkoch (Lifestyle)
kicker (Sport)

Recent changes

0 domains in the panel changed their GPTBot/ClaudeBot rules in the last 7 days.

No changes among the tracked brands since tracking began — the first changes will appear here as soon as they happen.

How we measure

Honest and reproducible — a fixed panel, publicly documented sources, a clean split between robots.txt and the server's response.

1 · Fixed panel: top-1000 (DE)

We check the same 1,000 most-visited German domains, taken from Google's public CrUX country list (Chrome usage data). The panel is frozen so the time series stays comparable.

2 · robots.txt, daily

For each domain we read robots.txt and check, per bot, whether an explicit block (Disallow: /) is present.

3 · Server-side probe, weekly

For the top-200 we fetch the homepage with a GPTBot/ClaudeBot User-Agent and compare the result to a browser fetch. If the server answers the bot with 403/429, that's a covert block.

4 · Forward only, raw data stays private

The history of a fixed panel can't be reconstructed after the fact; the server-side block in particular is in no archive. We don't publish the raw per-domain data, only aggregates, a curated brand table and changes.
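To make the robots.txt step more concrete, here is one way a per-bot check for a full-site Disallow could be written. This is a simplified sketch, not our actual checker: the function name `full_site_block` and its return labels are hypothetical, and it deliberately ignores Allow overrides and wildcard path patterns.

```python
from typing import Optional


def full_site_block(robots_txt: str, bot: str) -> Optional[str]:
    """Return "explicit", "blanket", or None for a `Disallow: /` that hits `bot`.

    "explicit" = a group naming the bot disallows the root;
    "blanket"  = no group names the bot, but the `*` group disallows the root.
    """
    groups = []             # list of (agent names, rules) pairs
    agents, rules = [], []  # the group currently being read
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))

    def disallows_root(group_rules):
        return any(f == "disallow" and v == "/" for f, v in group_rules)

    named = [r for a, r in groups if bot.lower() in a]
    if named:  # a bot with its own group does not fall back to `*`
        return "explicit" if any(disallows_root(r) for r in named) else None
    wildcard = [r for a, r in groups if "*" in a]
    return "blanket" if any(disallows_root(r) for r in wildcard) else None


# Hypothetical inputs, for illustration only.
sample = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nDisallow: /private\n"
print(full_site_block(sample, "GPTBot"))     # explicit
print(full_site_block(sample, "ClaudeBot"))  # None
print(full_site_block("User-agent: *\nDisallow: /\n", "ClaudeBot"))  # blanket
```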

"Block" here means blocking the whole site (Disallow: / or 403/429 on the homepage). The server-side probe is deliberately conservative: a browser fetch must succeed first, so blanket protection systems (e.g. Cloudflare challenges) aren't miscounted as AI blocking. We follow robots.txt conventions and query each domain only once a day.

Frequently asked

What is an AI crawler?

An automated bot that reads websites — to train AI models (e.g. GPTBot, ClaudeBot, Google-Extended) or to cite answers in AI search (e.g. OAI-SearchBot, PerplexityBot). Sites can block it via robots.txt or at the server.

What's the difference between robots.txt and server-side blocking?

robots.txt is a public request that reputable bots respect — but it isn't technically enforced. A server-side block actively rejects the bot (403/429). The latter is more binding and appears in no public archive — which is why we measure it ourselves.

Should I block AI crawlers?

It's a trade-off: blocking protects content from training but costs visibility in AI search — block ChatGPT & co. and you'll be cited there less. For many companies the right answer is nuanced: block training bots, allow search bots.
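A nuanced policy of this kind could look roughly like the sketch below. The bot lists follow the training/search labels in the table above; which bots you actually want in each list is a decision for your site, and `build_robots_txt` is a hypothetical helper, not an existing tool.

```python
from urllib.robotparser import RobotFileParser

# Example selection only; adjust both lists to your own policy.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended"]
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot"]


def build_robots_txt(blocked, allowed):
    """Build robots.txt text that blocks `blocked` bots and allows `allowed` ones."""
    lines = []
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in allowed:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    return "\n".join(lines)


robots_txt = build_robots_txt(TRAINING_BOTS, SEARCH_BOTS)

# Sanity check: read the generated rules back the way a crawler would.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in TRAINING_BOTS + SEARCH_BOTS:
    print(bot, "allowed" if parser.can_fetch(bot, "/") else "blocked")
```

This only covers the robots.txt side; a server-side block would need a matching rule in your web server or CDN configuration.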

Where does the domain list come from?

From Google's public CrUX country list: the 1,000 most-visited domains in Germany, based on real Chrome usage data. We keep the list deliberately fixed so the time series stays comparable over time.

Sources: each domain's robots.txt (public), domain panel from the CrUX top list (Google, CC BY). Server-side values are our own measurements from eu-central-1 (Frankfurt). "Block" = blocking the whole site. No warranty for completeness; robots.txt rules can be ambiguous.

Stay visible where AI answers

We tune your site so the right AI crawlers find you — and the wrong ones stay out.
