A quarter of Germany's biggest websites now block the AI crawlers

When ChatGPT, Claude or Perplexity answer a question, they read the web first. A growing share of German websites wants to stop exactly that. We measured how widespread this AI blockade really is in Germany by analysing the robots.txt files of the 1,000 most-visited German domains, 891 of which were reachable. The result is clear: about a quarter of Germany's biggest websites actively block the major AI crawlers.

In brief

27.2% of the queried top domains explicitly block GPTBot (OpenAI's training crawler) via robots.txt. Common Crawl's bot, CCBot, is blocked even more often, at 28.2%.
ClaudeBot (Anthropic) is blocked by 23.5%, Bytespider (TikTok/ByteDance) by 25.0%, Google-Extended by 21.2%.
The real finding: sites block training bots far more often than search bots. OpenAI's search crawler OAI-SearchBot is blocked by just 6.1%, less than a quarter of the GPTBot rate.
News sites are strictest: FAZ, Spiegel and Zeit block most or all of the twelve tracked AI crawlers.
And some blocking appears in no robots.txt at all: among the top-200 sites, 15.5% reject GPTBot and 16.7% reject ClaudeBot directly at the server (typically with HTTP 403), which browser visitors never notice.

Who gets blocked most?

For every domain we checked whether its robots.txt explicitly blocks the bot (Disallow: /). Here's how blocking is distributed across the twelve major AI crawlers we track:

AI crawler	Operator	Purpose	Blocked by (robots.txt)
CCBot	Common Crawl	Dataset	28.2%
GPTBot	OpenAI	Training	27.2%
Bytespider	ByteDance	Training	25.0%
ClaudeBot	Anthropic	Training	23.5%
Google-Extended	Google	Gemini training	21.2%
meta-externalagent	Meta	Training	20.5%
Applebot-Extended	Apple	Training	19.9%
Amazonbot	Amazon	Assistant	17.2%
anthropic-ai	Anthropic	Training (legacy)	15.5%
PerplexityBot	Perplexity	Search	15.5%
ChatGPT-User	OpenAI	On-demand	10.9%
OAI-SearchBot	OpenAI	Search	6.1%
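The per-bot check behind this table can be sketched in a few lines of Python. This is a minimal illustration, not the monitor's actual code: the function name `explicitly_blocks` is hypothetical, it assumes that only a group naming the bot (not the `*` wildcard) counts as an explicit block, and it ignores Allow rules that might reopen parts of a site.

```python
def explicitly_blocks(robots_txt: str, bot: str) -> bool:
    """True if a robots.txt group naming `bot` contains 'Disallow: /'.

    Hypothetical helper. Assumptions: 'User-agent: *' does not count as an
    explicit block, Allow lines are ignored, and names match case-insensitively.
    """
    bot = bot.lower()
    agents: list[str] = []   # user agents of the group currently being read
    in_rules = False         # becomes True once the group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue                          # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a User-agent after rules opens a new group
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and bot in agents:
                return True
    return False


SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: *
Disallow: /admin/
"""

for name in ("GPTBot", "CCBot", "OAI-SearchBot", "ClaudeBot"):
    print(name, explicitly_blocks(SAMPLE, name))
# GPTBot True, CCBot True, OAI-SearchBot False, ClaudeBot False
```

Note that a site with only `User-agent: *` and `Disallow: /` shuts out every crawler but, under this definition, is not counted as explicitly blocking any particular AI bot.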

Training yes, search no — the real finding

The most telling number isn't the highest value but the gap within a single provider. OpenAI runs several crawlers: GPTBot collects text for model training, while OAI-SearchBot fetches pages that ChatGPT's search can cite. GPTBot is blocked by 27.2% of sites, OAI-SearchBot by only 6.1%, a factor of roughly 4.5.

That gap looks like a stance, not an accident: many German site owners don't want to become training material without being asked, but they very much want to be cited in AI answers. Blocking everything across the board throws away visibility in exactly those AI search systems that are starting to compete with classic Google search for traffic.

News sites block hardest

Across sectors, publishers block most consistently. Among the well-known brands in our sample, FAZ, Spiegel and Zeit each block between nine and all twelve of the tracked AI crawlers, i.e. most or all of the list. E-commerce and portal sites are far more open. That fits the economics: many publishers are negotiating licences with AI companies and keep the crawlers out until money flows, while an online shop tends to benefit from being mentioned in ChatGPT.

The covert block: 403 instead of robots.txt

robots.txt is a polite request. It's public, and reputable crawlers respect it, but it can't be enforced. So some sites block harder: they recognise the bot by its user agent and reject it at the server (typically with HTTP 403), while a normal browser gets the page as usual.
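What such a server-side rule does can be sketched as a small Python WSGI middleware. This is an illustration under assumptions: `block_ai_crawlers`, `BLOCKED_TOKENS` and `status_for` are hypothetical names, real sites usually implement this in their web server, CDN or firewall configuration instead, and the match is a plain substring test on the User-Agent header, which any client can fake.

```python
from wsgiref.util import setup_testing_defaults

# Assumption: lower-case substring match on the User-Agent header.
BLOCKED_TOKENS = ("gptbot", "claudebot")


def block_ai_crawlers(app):
    """Wrap a WSGI app: matching crawlers get 403, everyone else the normal page."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in BLOCKED_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        return app(environ, start_response)
    return middleware


def homepage(environ, start_response):
    """Stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Welcome</h1>"]


application = block_ai_crawlers(homepage)


def status_for(user_agent):
    """Call the app in-process with a given User-Agent and return the status line."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)  # fills in the remaining required WSGI keys
    result = {}

    def start_response(status, headers):
        result["status"] = status

    list(application(environ, start_response))  # drive the response iterable
    return result["status"]


# Illustrative user-agent strings, not the operators' exact ones.
print(status_for("Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))
print(status_for("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/128.0"))
# 403 Forbidden
# 200 OK
```

Because nothing in robots.txt changes, a rule like this is only visible to someone who sends the crawler's user agent.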

This kind of block appears in no robots.txt and in no archive; you only see it if you knock with the bot's user agent yourself. That's exactly what we did for the 200 largest domains: 15.5% reject GPTBot server-side and 16.7% reject ClaudeBot, regardless of what their robots.txt says. Actual blocking is therefore likely higher than the robots.txt figures alone suggest.
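Such a knock can be sketched with Python's standard library, following the rule from our methodology: a 403 or 429 only counts when a browser gets the page normally. This is a sketch, not the monitor's code: the function names and user-agent strings are hypothetical stand-ins (check each operator's documentation for current strings), and "served normally" is assumed to mean a 2xx response after redirects.

```python
import http.client
import urllib.error
import urllib.request

# Illustrative user-agent strings (hypothetical stand-ins; check the operators' docs).
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/126.0 Safari/537.36")
BOT_UAS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0; +claudebot@anthropic.com)",
}


def fetch_status(url, user_agent, timeout=10.0):
    """Return the final HTTP status code (after redirects), or None on network errors."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:          # 4xx/5xx arrive as exceptions
        return err.code
    except (OSError, http.client.HTTPException):   # DNS, TLS, timeouts, broken replies
        return None


def is_server_side_block(browser_status, bot_status):
    """A 403/429 counts as a block only if the browser got a normal 2xx page."""
    browser_ok = browser_status is not None and 200 <= browser_status < 300
    return browser_ok and bot_status in (403, 429)


def probe(url):
    """Fetch url once as a browser and once per bot; return {bot: blocked?}."""
    browser_status = fetch_status(url, BROWSER_UA)
    return {
        bot: is_server_side_block(browser_status, fetch_status(url, ua))
        for bot, ua in BOT_UAS.items()
    }
```

Calling `probe("https://www.example.com/")` returns one True/False flag per bot. Requiring a healthy browser response first keeps sites that are simply down, or that block every automated client, from being counted as AI-specific blocks.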

What this means for you

If you run a company website, this isn't academic. It helps decide whether AI systems will still find and cite you.

Blocking everything is rarely the right answer. A more nuanced line: block training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) if you care about how your content is used, but allow search bots (OAI-SearchBot, PerplexityBot) so you show up in AI answers.
Check what your server actually does. Server-side blocks often arise unnoticed from security or CDN rules. If your host rejects AI crawlers at the server, none of that shows up in your robots.txt, and you only notice once you're missing from AI answers.
Decide deliberately, not by reflex. Visibility in AI search is becoming a new SEO channel. Wall everything off and you keep your content out of training data, but you also lose the customers who ask ChatGPT instead of Google.
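A robots.txt along the lines of the first point above can be generated and sanity-checked with Python's built-in `urllib.robotparser`. This is a sketch under assumptions: the bot lists simply mirror the examples named above, `build_robots_txt` is a hypothetical helper, and the standard-library parser only approximates how each crawler actually interprets the file.

```python
from urllib.robotparser import RobotFileParser

# Bot lists mirror the examples in the text; adjust to your own policy.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot"]


def build_robots_txt(blocked, allowed):
    """One group per bot: 'Disallow: /' for blocked bots, 'Allow: /' for allowed ones."""
    lines = []
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in allowed:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    lines += ["User-agent: *", "Allow: /"]   # everyone else stays welcome
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(TRAINING_BOTS, SEARCH_BOTS)

# Parse the generated file and ask whether each bot may fetch the homepage.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
for bot in TRAINING_BOTS + SEARCH_BOTS:
    print(bot, parser.can_fetch(bot, "/"))
# GPTBot False, ClaudeBot False, Google-Extended False, CCBot False,
# OAI-SearchBot True, PerplexityBot True
```

The robots.txt only governs what polite crawlers do; whether your server lets them through at all is a separate question.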

How we measured

The basis is a fixed set of the 1,000 most-visited German domains from Google's public CrUX country list (Chrome User Experience Report, based on Chrome usage data). For each domain we fetch robots.txt daily and check, per crawler, for an explicit block (Disallow: /). On 3 July 2026, 891 of the 1,000 domains were reachable. The server-side probe runs weekly for the top 200: we fetch the homepage once with a browser user agent and once with a GPTBot or ClaudeBot user agent, and count a 403 or 429 as a block only when the browser is served normally. All measurements run from a Frankfurt cloud region. We deliberately publish only the aggregates, not the raw per-domain data.
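How unreachable domains enter the percentages matters: with 109 of 1,000 domains unreachable on 3 July 2026, the choice of denominator shifts every share. The sketch below assumes, since the text does not state it outright, that shares are computed over reachable domains only; `block_shares` and the toy domains are hypothetical and are not real measurements.

```python
def block_shares(results, bots):
    """Percentage of reachable domains blocking each bot, rounded to one decimal.

    results maps domain -> set of bots it blocks, or None if it was unreachable.
    Assumption: unreachable domains are excluded from the denominator.
    """
    reachable = [blocked for blocked in results.values() if blocked is not None]
    if not reachable:
        raise ValueError("no reachable domains")
    return {
        bot: round(100 * sum(bot in blocked for blocked in reachable) / len(reachable), 1)
        for bot in bots
    }


# Toy input with hypothetical domains, not real measurements.
toy_results = {
    "news.example": {"GPTBot", "CCBot"},
    "shop.example": {"CCBot"},
    "portal.example": set(),
    "offline.example": None,   # unreachable, so it is left out entirely
}
print(block_shares(toy_results, ["GPTBot", "CCBot", "ClaudeBot"]))
# {'GPTBot': 33.3, 'CCBot': 66.7, 'ClaudeBot': 0.0}
```

Counting the unreachable domain as "not blocking" instead would lower every share (here GPTBot would drop to 25.0%), which is why the denominator should always be stated alongside the figures.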

The numbers are a snapshot and change daily. You'll find the continuously updated status in the AI-Crawler Blocking Monitor.

All analyses are based on i6eal's own measurements or on clearly labelled sources. Figures are snapshots and may change; corrections are disclosed transparently.