GPTBot, ClaudeBot and similar crawlers read the web to train AI models and to find sources that AI answers cite. Every day we measure how many of Germany's 1,000 most-visited websites block these bots: openly via robots.txt and covertly at the server.
Share of the queried domains that block each bot. "robots.txt" = an explicit rule in robots.txt; "server-side" = the site answers the bot with 403/429 while serving a browser normally (measured for GPTBot and ClaudeBot only, on the top 200 domains).
Chart series: explicit rule; explicit rule including blanket blocks (User-agent: *)
A rule in robots.txt asks the bot not to read the site. Polite, public — and readable by anyone. Reputable crawlers obey it; it isn't technically enforced.
The site recognizes the bot by its User-Agent and rejects it outright (403/429) while serving visitors normally. This block appears in no archive — we measure it ourselves, straight from Frankfurt.
A selection of major German brands and which AI crawlers they block per robots.txt. Green = allowed, red = blocked.
| Brand | Category | GPTBot | ClaudeBot | Google-Extended | PerplexityBot | CCBot | Bytespider |
|---|---|---|---|---|---|---|---|
| Spiegel | News | | | | | | |
| Bild | News | | | | | | |
| Zeit | News | | | | | | |
| FAZ | News | | | | | | |
| Süddeutsche | News | | | | | | |
| Welt | News | | | | | | |
| Tagesschau | News | | | | | | |
| n-tv | News | | | | | | |
| Focus | News | | | | | | |
| Stern | News | | | | | | |
| Handelsblatt | Business | | | | | | |
| heise | Tech | | | | | | |
| Golem | Tech | | | | | | |
| Chip | Tech | | | | | | |
| t-online | Portal | | | | | | |
| Otto | E-Commerce | | | | | | |
| Zalando | E-Commerce | | | | | | |
| Idealo | E-Commerce | | | | | | |
| Chefkoch | Lifestyle | | | | | | |
| kicker | Sports | | | | | | |
0 domains in the panel changed their GPTBot/ClaudeBot rules in the last 7 days.
No changes among the tracked brands since tracking began — the first changes will appear here as soon as they happen.
Honest and reproducible — a fixed panel, publicly documented sources, a clean split between robots.txt and the server's response.
We check the same 1,000 most-visited German domains every day, taken from Google's public CrUX country list (Chrome usage data). The panel is frozen so the time series stays comparable.
For each domain we read robots.txt and check, per bot, whether an explicit block (Disallow: /) is present.
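The robots.txt check can be sketched roughly as follows. This is a minimal illustration, not our production code: the function names `parse_groups` and `block_status` are hypothetical, and the parser is deliberately simplified. It only recognizes an exact `Disallow: /`, ignores `Allow` overrides and wildcard patterns, and treats a bot without its own group as covered by `User-agent: *`.

```python
def parse_groups(robots_txt: str) -> dict[str, list[str]]:
    """Map each lowercased user-agent token to its Disallow paths."""
    groups: dict[str, list[str]] = {}
    current: list[str] = []  # user-agents of the group being read
    in_rules = False         # True once the group has seen a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:  # a user-agent line after rules starts a new group
                current, in_rules = [], False
            current.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow":
                for agent in current:
                    groups[agent].append(value)
    return groups


def block_status(robots_txt: str, bot: str) -> str:
    """Return 'explicit', 'blanket' or 'none' for a whole-site block."""
    groups = parse_groups(robots_txt)
    name = bot.lower()
    if "/" in groups.get(name, []):
        return "explicit"  # the bot is named and fully disallowed
    if name not in groups and "/" in groups.get("*", []):
        return "blanket"   # only the catch-all group blocks it
    return "none"


sample = """\
User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

User-agent: *
Disallow: /private/
"""

for bot in ("GPTBot", "ClaudeBot", "CCBot"):
    print(bot, block_status(sample, bot))
```

Separating "explicit" from "blanket" mirrors the two chart series: a site that disallows everything for `User-agent: *` blocks AI crawlers too, but has not singled them out.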
For the top 200 domains we fetch the homepage with a GPTBot/ClaudeBot User-Agent and compare the result to a browser fetch. If the server answers the bot with 403/429, that's a covert block.
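A rough sketch of such a probe, using only Python's standard library. The names `fetch_status`, `classify` and `probe` are hypothetical, and the User-Agent strings are simplified placeholders rather than the official strings the crawlers send. The browser fetch is checked first, so a site that rejects everyone is reported as inconclusive instead of as an AI block.

```python
import urllib.error
import urllib.request
from typing import Optional

# Placeholder User-Agent strings (assumptions, not the official values).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 Chrome/124.0 Safari/537.36"
BOT_UAS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.0)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status for url, or None on a network failure."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions
    except OSError:
        return None      # DNS failure, timeout, refused connection, ...


def classify(browser_status: Optional[int], bot_status: Optional[int]) -> str:
    """Compare the two fetches; the browser fetch must succeed first."""
    if browser_status is None or not 200 <= browser_status < 300:
        return "inconclusive"  # site rejects everyone, or is unreachable
    if bot_status in (403, 429):
        return "covert block"
    return "no block"


def probe(domain: str) -> dict[str, str]:
    """Probe one domain's homepage for each bot User-Agent."""
    url = f"https://{domain}/"
    browser_status = fetch_status(url, BROWSER_UA)
    return {
        bot: classify(browser_status, fetch_status(url, ua))
        for bot, ua in BOT_UAS.items()
    }
```

Calling `probe("example.com")` would return one verdict per bot; the call is left out here because it needs network access.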
A history over a fixed panel can't be reconstructed after the fact; the server-side block in particular appears in no archive. We don't publish the raw per-domain data, only aggregates, a curated brand table and changes.
"Block" here means blocking the whole site (Disallow: / or 403/429 on the homepage). The server-side probe is deliberately conservative: a browser fetch must succeed first, so blanket protection systems (e.g. Cloudflare challenges) aren't miscounted as AI blocking. We follow robots.txt conventions and query each domain only once a day.
An automated bot that reads websites, either to train AI models (e.g. GPTBot, ClaudeBot, Google-Extended) or to find sources that AI search cites in its answers (e.g. OAI-SearchBot, PerplexityBot). Sites can block it via robots.txt or at the server.
robots.txt is a public request that reputable bots respect — but it isn't technically enforced. A server-side block actively rejects the bot (403/429). The latter is more binding and appears in no public archive — which is why we measure it ourselves.
It's a trade-off: blocking protects content from training but costs visibility in AI search — block ChatGPT & co. and you'll be cited there less. For many companies the right answer is nuanced: block training bots, allow search bots.
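As an illustration of the "block training bots, allow search bots" approach, the sketch below holds an example robots.txt and checks it with Python's standard `urllib.robotparser`. The file content and the helper name `allowed` are assumptions for demonstration; the grouping of bots follows the training/search split described above and should be adapted to your own policy.

```python
from urllib.robotparser import RobotFileParser

# Example policy: training crawlers are disallowed, search crawlers allowed.
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: Google-Extended
Disallow: /

User-agent: OAI-SearchBot
User-agent: PerplexityBot
Allow: /
"""


def allowed(robots_txt: str, bot: str, path: str = "/") -> bool:
    """Return True if bot may fetch path under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


for bot in ("GPTBot", "ClaudeBot", "OAI-SearchBot", "PerplexityBot"):
    print(f"{bot}: {'allowed' if allowed(ROBOTS_TXT, bot) else 'blocked'}")
```

Keep in mind that this only expresses a request; a crawler that ignores robots.txt would need a server-side rule to be kept out.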
From Google's public CrUX country list — the 1,000 most-visited domains from Germany, based on real Chrome usage data. We keep the list deliberately fixed so the time series stays comparable across the months.
Sources: each domain's robots.txt (public), domain panel from the CrUX top list (Google, CC BY). Server-side values are our own measurements from eu-central-1 (Frankfurt). "Block" = blocking the whole site. No warranty for completeness; robots.txt rules can be ambiguous.
We tune your site so the right AI crawlers find you — and the wrong ones stay out.