When ChatGPT, Claude or Perplexity answer a question, they read the web first. A growing share of German websites wants to stop exactly that. We measured how widespread the AI blockade in Germany really is — analysing the robots.txt of the 1,000 most-visited German domains (891 of which were reachable). The result is unambiguous: about a quarter of Germany's biggest websites actively block the major AI crawlers.
In brief
- 27.2% of the queried top domains explicitly block GPTBot (OpenAI's training crawler) via robots.txt. The Common-Crawl bot CCBot is even higher, at 28.2%.
- ClaudeBot (Anthropic) is blocked by 23.5%, Bytespider (TikTok/ByteDance) by 25.0%, Google-Extended by 21.2%.
- The real finding: sites block training bots far more than search bots. OpenAI's search crawler OAI-SearchBot is blocked by just 6.1%, less than a quarter of GPTBot's 27.2%.
- News sites are strictest: FAZ, Spiegel and Zeit block nearly every major AI crawler.
- And some blocking appears in no robots.txt: 15.5% of the top-200 sites reject GPTBot, and 16.7% reject ClaudeBot, directly at the server (HTTP 403), invisible to browser visitors.
Who gets blocked most?
For every domain we checked whether the robots.txt contains an explicit block (Disallow: /) for the bot. Here's how blocking is distributed across the twelve major AI crawlers:
| AI crawler | Operator | Purpose | blocked by |
|---|---|---|---|
| CCBot | Common Crawl | Dataset | 28.2% |
| GPTBot | OpenAI | Training | 27.2% |
| Bytespider | ByteDance | Training | 25.0% |
| ClaudeBot | Anthropic | Training | 23.5% |
| Google-Extended | Google | Gemini training | 21.2% |
| meta-externalagent | Meta | Training | 20.5% |
| Applebot-Extended | Apple | Training | 19.9% |
| Amazonbot | Amazon | Assistant | 17.2% |
| anthropic-ai | Anthropic | Training (legacy) | 15.5% |
| PerplexityBot | Perplexity | Search | 15.5% |
| ChatGPT-User | OpenAI | On-demand | 10.9% |
| OAI-SearchBot | OpenAI | Search | 6.1% |
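As a rough illustration of the per-crawler check behind this table, here is a minimal sketch. It assumes that "explicit block" means a robots.txt group that names the bot in a `User-agent` line and contains `Disallow: /`; wildcard (`*`) groups are not counted. The function name `explicitly_blocked` and the sample file are hypothetical, not taken from the measurement code.

```python
def explicitly_blocked(robots_txt: str, bot: str) -> bool:
    """Return True if a group naming `bot` contains `Disallow: /`."""
    bot = bot.lower()
    agents = []        # user agents of the group currently being read
    in_rules = False   # True once the current group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A User-agent line after rule lines starts a new group.
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow" and value == "/" and bot in agents:
                return True
    return False


# Hypothetical sample file: two training bots blocked, everyone else mostly open.
SAMPLE = """
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""

print(explicitly_blocked(SAMPLE, "GPTBot"))
print(explicitly_blocked(SAMPLE, "OAI-SearchBot"))
```

The sketch ignores fields other than `User-agent`, `Allow` and `Disallow`, so a production parser would need to handle more edge cases.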
Training yes, search no — the real finding
The most telling number isn't the top value, it's the gap within a single provider. OpenAI runs several crawlers: GPTBot collects text for model training, OAI-SearchBot fetches citations for search inside ChatGPT. GPTBot is blocked by 27.2% of sites — OAI-SearchBot by only 6.1%.
That gap is a stance, not an accident: German site owners don't want to become training material without being asked, but they very much do want to be cited in AI answers. Blanket-blocking everything throws away visibility in exactly the search systems now displacing classic Google traffic.
News sites block hardest
Across sectors, publishers block most consistently. In our sample of well-known brands, FAZ, Spiegel and Zeit each block nine to twelve of the twelve tracked AI crawlers — essentially the whole list. E-commerce and portal sites are far more open. That matches the economics: publishers are negotiating licences with AI companies and block until money flows — while an online shop tends to benefit from a mention in ChatGPT.
The covert block: 403 instead of robots.txt
robots.txt is a polite request. It's public, and reputable crawlers respect it — but it can't be enforced. So some sites block harder: they recognise the bot by its user agent and reject it at the server (HTTP 403), while a normal browser gets the page as usual.
This kind of block appears in no robots.txt and in no archive — you only see it if you knock with the bot's user agent yourself. That's exactly what we did for the 200 largest domains: 15.5% reject GPTBot server-side, 16.7% ClaudeBot — on top of the robots.txt blocks. Real blocking is therefore higher than robots.txt alone suggests.
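Such a probe can be sketched in a few lines. The example below follows the rule from the method notes (a 403 or 429 counts only when the browser request is served normally) and assumes that "served normally" means a 2xx response. The user-agent strings and function names are illustrative placeholders, not the ones used in the measurement.

```python
import urllib.error
import urllib.request
from typing import Optional

# Illustrative user-agent strings (assumptions, not the measured ones).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0 Safari/537.36"
BOT_UA = "Mozilla/5.0 (compatible; GPTBot/1.0; +https://openai.com/gptbot)"


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for a GET on `url`, or None if unreachable."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses are raised as exceptions
    except OSError:
        return None      # DNS failure, timeout, refused connection, ...


def is_server_side_block(browser_status: Optional[int], bot_status: Optional[int]) -> bool:
    """Count a block only if the browser succeeds and the bot gets 403/429."""
    browser_ok = browser_status is not None and 200 <= browser_status < 300
    return browser_ok and bot_status in (403, 429)


def probe(url: str) -> bool:
    """Fetch `url` with both user agents and compare the outcomes."""
    return is_server_side_block(fetch_status(url, BROWSER_UA),
                                fetch_status(url, BOT_UA))
```

Calling `probe("https://example.com/")` would run both requests. Comparing against the browser result keeps sites that are down or that block everyone from being counted as AI-specific blocks.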
What this means for you
If you run a company website, this isn't academic. It helps decide whether AI systems will still find and cite you.
- Blocking everything is rarely the right answer. The nuanced line professionals take: block training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) if you care about how your content is used — but allow search bots (OAI-SearchBot, PerplexityBot) so you show up in AI answers.
- Check what your server actually does. Many blocks arise unnoticed from security or CDN rules. If your host rejects AI crawlers server-side, none of that shows up in your robots.txt, and you only notice once you're missing from AI answers.
- Decide deliberately, not by reflex. Visibility in AI search is the new SEO channel. Wall it off and you keep your content out of training data, but you also lose the customers who ask ChatGPT instead of Google.
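The "block training, allow search" line from the first point can be written down as a robots.txt. The sketch below generates one from the bot names mentioned in that point; the helper `render_robots_txt` is a hypothetical name, and the bot lists are a starting point to adapt, not a recommendation for every site.

```python
# Bot names as listed in the recommendation above.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "Google-Extended", "CCBot"]
SEARCH_BOTS = ["OAI-SearchBot", "PerplexityBot"]


def render_robots_txt(blocked, allowed) -> str:
    """Build robots.txt text with one group per bot."""
    lines = []
    for bot in blocked:
        lines += [f"User-agent: {bot}", "Disallow: /", ""]
    for bot in allowed:
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    return "\n".join(lines)


print(render_robots_txt(TRAINING_BOTS, SEARCH_BOTS))
```

Since robots.txt is only a request, this file states a preference. It does nothing about server-side or CDN rules, which have to be checked separately.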
How we measured
The basis is a fixed set of the 1,000 most-visited German domains from Google's public CrUX country list (Chrome usage data). For each domain we read robots.txt daily and check, per crawler, for an explicit block (Disallow: /). On 3 July 2026, 891 of the 1,000 domains were reachable. The server-side probe runs weekly for the top-200: we fetch the homepage once with a browser user agent and once with a GPTBot/ClaudeBot user agent, and count a 403/429 as a block only when the browser is served normally. All measurements run from a Frankfurt cloud region. We deliberately publish only the aggregates, not the raw per-domain data.
The numbers are a snapshot and change daily. You'll find the continuously updated status in the AI-Crawler Blocking Monitor.

