[{"data":1,"prerenderedAt":30},["ShallowReactive",2],{"nr-en-deutsche-top-websites-sperren-ki-crawler":3},{"slug":4,"title":5,"dek":6,"date":7,"time":8,"publishedAt":9,"updated":10,"updatedAt":10,"dateFmt":11,"updatedFmt":10,"kind":12,"tier":13,"author":14,"authorName":15,"topics":16,"tracker":21,"trackerLabel":22,"headlineStat":23,"image":24,"imageAlt":25,"csv":26,"minutes":27,"words":28,"html":29},"deutsche-top-websites-sperren-ki-crawler","A quarter of Germany's biggest websites now block the AI crawlers","We analysed the robots.txt of all 1,000 most-visited German domains. GPTBot, ClaudeBot & co. are being blocked systematically, news sites hardest of all. And some of the blocking appears in no robots.txt.","2026-07-03","09:00","2026-07-03T09:00:00+02:00","","July 3, 2026","analyse","flagship","ideal-syka","Ideal Syka",[17,18,19,20],"AI crawlers","robots.txt","SEO","GEO","\u002Fki-crawler-monitor","AI-Crawler Blocking Monitor","27.2% of Germany's biggest websites block GPTBot via robots.txt","\u002Fog-nr\u002Fdeutsche-top-websites-sperren-ki-crawler.en.png","Share of German top-1000 websites blocking each AI crawler","\u002Fnewsroom\u002Fdata\u002Fki-crawler-blockade-2026-07-03.csv",5,900,"\u003Cp>When ChatGPT, Claude or Perplexity answer a question, they often read the web first. A growing share of German websites wants to stop exactly that. We measured how widespread the AI blockade in Germany really is by analysing the robots.txt of the \u003Cstrong>1,000 most-visited German domains\u003C\u002Fstrong> (891 of which were reachable). The result is unambiguous: \u003Cstrong>about a quarter of Germany&#39;s biggest websites actively block the major AI crawlers.\u003C\u002Fstrong>\u003C\u002Fp>\n\u003Ch2>In brief\u003C\u002Fh2>\n\u003Cul>\n\u003Cli>\u003Cstrong>27.2%\u003C\u002Fstrong> of the queried top domains explicitly block \u003Cstrong>GPTBot\u003C\u002Fstrong> (OpenAI&#39;s training crawler) via robots.txt. 
The Common Crawl bot \u003Cstrong>CCBot\u003C\u002Fstrong> is blocked even more often, at \u003Cstrong>28.2%\u003C\u002Fstrong>.\u003C\u002Fli>\n\u003Cli>\u003Cstrong>ClaudeBot\u003C\u002Fstrong> (Anthropic) is blocked by \u003Cstrong>23.5%\u003C\u002Fstrong>, \u003Cstrong>Bytespider\u003C\u002Fstrong> (TikTok\u002FByteDance) by \u003Cstrong>25.0%\u003C\u002Fstrong> and \u003Cstrong>Google-Extended\u003C\u002Fstrong> by \u003Cstrong>21.2%\u003C\u002Fstrong>.\u003C\u002Fli>\n\u003Cli>The real finding: sites block \u003Cstrong>training bots far more than search bots\u003C\u002Fstrong>. OpenAI&#39;s search crawler \u003Cstrong>OAI-SearchBot\u003C\u002Fstrong> is blocked by just \u003Cstrong>6.1%\u003C\u002Fstrong>, less than a quarter of the GPTBot rate.\u003C\u002Fli>\n\u003Cli>News sites are strictest: FAZ, Spiegel and Zeit block most or all of the major AI crawlers.\u003C\u002Fli>\n\u003Cli>And: some blocking appears in \u003Cstrong>no robots.txt\u003C\u002Fstrong>. Around \u003Cstrong>16%\u003C\u002Fstrong> of the top 200 sites (15.5% for GPTBot, 16.7% for ClaudeBot) reject these bots directly at the server (HTTP 403 or 429), a block that browser visitors never see.\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch2>Who gets blocked most?\u003C\u002Fh2>\n\u003Cp>For every domain we checked whether the robots.txt contains an explicit block (\u003Ccode>Disallow: \u002F\u003C\u002Fcode>) for each bot. 
Here is how blocking is distributed across the twelve major AI crawlers we track:\u003C\u002Fp>\n\u003Ctable>\n\u003Cthead>\n\u003Ctr>\n\u003Cth>AI crawler\u003C\u002Fth>\n\u003Cth>Operator\u003C\u002Fth>\n\u003Cth>Purpose\u003C\u002Fth>\n\u003Cth>Blocked by (share of domains)\u003C\u002Fth>\n\u003C\u002Ftr>\n\u003C\u002Fthead>\n\u003Ctbody>\u003Ctr>\n\u003Ctd>CCBot\u003C\u002Ftd>\n\u003Ctd>Common Crawl\u003C\u002Ftd>\n\u003Ctd>Dataset\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>28.2%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>GPTBot\u003C\u002Ftd>\n\u003Ctd>OpenAI\u003C\u002Ftd>\n\u003Ctd>Training\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>27.2%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Bytespider\u003C\u002Ftd>\n\u003Ctd>ByteDance\u003C\u002Ftd>\n\u003Ctd>Training\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>25.0%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>ClaudeBot\u003C\u002Ftd>\n\u003Ctd>Anthropic\u003C\u002Ftd>\n\u003Ctd>Training\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>23.5%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Google-Extended\u003C\u002Ftd>\n\u003Ctd>Google\u003C\u002Ftd>\n\u003Ctd>Gemini training\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>21.2%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>meta-externalagent\u003C\u002Ftd>\n\u003Ctd>Meta\u003C\u002Ftd>\n\u003Ctd>Training\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>20.5%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Applebot-Extended\u003C\u002Ftd>\n\u003Ctd>Apple\u003C\u002Ftd>\n\u003Ctd>Training\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>19.9%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>Amazonbot\u003C\u002Ftd>\n\u003Ctd>Amazon\u003C\u002Ftd>\n\u003Ctd>Assistant\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>17.2%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>anthropic-ai\u003C\u002Ftd>\n\u003Ctd>Anthropic\u003C\u002Ftd>\n\u003Ctd>Training 
(legacy)\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>15.5%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>PerplexityBot\u003C\u002Ftd>\n\u003Ctd>Perplexity\u003C\u002Ftd>\n\u003Ctd>Search\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>15.5%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>ChatGPT-User\u003C\u002Ftd>\n\u003Ctd>OpenAI\u003C\u002Ftd>\n\u003Ctd>On-demand\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>10.9%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003Ctr>\n\u003Ctd>OAI-SearchBot\u003C\u002Ftd>\n\u003Ctd>OpenAI\u003C\u002Ftd>\n\u003Ctd>Search\u003C\u002Ftd>\n\u003Ctd>\u003Cstrong>6.1%\u003C\u002Fstrong>\u003C\u002Ftd>\n\u003C\u002Ftr>\n\u003C\u002Ftbody>\u003C\u002Ftable>\n\u003Ch2>Training yes, search no: the real finding\u003C\u002Fh2>\n\u003Cp>The most telling number isn&#39;t the top value; it&#39;s the gap \u003Cem>within\u003C\u002Fem> a single provider. OpenAI runs several crawlers: \u003Cstrong>GPTBot\u003C\u002Fstrong> collects text for model training, while \u003Cstrong>OAI-SearchBot\u003C\u002Fstrong> fetches pages so they can appear and be cited in ChatGPT&#39;s search results. GPTBot is blocked by 27.2% of sites, OAI-SearchBot by only 6.1%.\u003C\u002Fp>\n\u003Cp>That gap looks like a deliberate stance rather than an accident: \u003Cstrong>German site owners don&#39;t want to become training material without being asked, but they very much do want to be cited in AI answers.\u003C\u002Fstrong> Blanket-blocking every bot throws away visibility in exactly the AI search systems that are starting to take traffic from classic Google search.\u003C\u002Fp>\n\u003Ch2>News sites block hardest\u003C\u002Fh2>\n\u003Cp>Across sectors, publishers block most consistently. In our sample of well-known brands, \u003Cstrong>FAZ, Spiegel and Zeit\u003C\u002Fstrong> each block between nine and all twelve of the tracked AI crawlers, in other words most or all of the list. E-commerce and portal sites are far more open. 
That matches the economics: many publishers are negotiating licences with AI companies and block until money flows, while an online shop tends to benefit from being mentioned in ChatGPT.\u003C\u002Fp>\n\u003Ch2>The covert block: 403 instead of robots.txt\u003C\u002Fh2>\n\u003Cp>robots.txt is a polite request. It&#39;s public, and reputable crawlers respect it, but it can&#39;t be enforced. So some sites go further: they identify the bot by its user agent and reject it at the server (typically with HTTP 403), while a normal browser gets the page as usual.\u003C\u002Fp>\n\u003Cp>This kind of block appears in \u003Cstrong>no robots.txt\u003C\u002Fstrong> and in \u003Cstrong>no archive\u003C\u002Fstrong>; you only see it if you request the page with the bot&#39;s user agent yourself. That&#39;s exactly what we did for the 200 largest domains: \u003Cstrong>15.5% reject GPTBot server-side and 16.7% reject ClaudeBot\u003C\u002Fstrong>, in addition to whatever their robots.txt says. The real extent of blocking is therefore likely higher than robots.txt alone suggests.\u003C\u002Fp>\n\u003Ch2>What this means for you\u003C\u002Fh2>\n\u003Cp>If you run a company website, this isn&#39;t academic. Your robots.txt and server rules help decide whether AI systems can still find and cite you.\u003C\u002Fp>\n\u003Cul>\n\u003Cli>\u003Cstrong>Blocking everything is rarely the right answer.\u003C\u002Fstrong> A more nuanced approach is to block training bots (GPTBot, ClaudeBot, Google-Extended, CCBot) if you care about how your content is used, but to \u003Cstrong>allow search bots (OAI-SearchBot, PerplexityBot)\u003C\u002Fstrong> so you show up in AI answers.\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Check what your server actually does.\u003C\u002Fstrong> Many blocks arise unnoticed from security or CDN rules. If your host rejects AI crawlers server-side, none of that shows up in your robots.txt, and you only notice once you stop appearing in AI answers.\u003C\u002Fli>\n\u003Cli>\u003Cstrong>Decide deliberately, not by reflex.\u003C\u002Fstrong> Visibility in AI search is becoming a new SEO channel. 
Wall your site off completely and you keep your content out of training data, but you also lose the customers who ask ChatGPT instead of Google.\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch2>How we measured\u003C\u002Fh2>\n\u003Cp>The basis is a \u003Cstrong>fixed set of the 1,000 most-visited German domains\u003C\u002Fstrong> from Google&#39;s public CrUX country list (Chrome usage data). For each domain we read robots.txt daily and check, per crawler, for an explicit block (\u003Ccode>Disallow: \u002F\u003C\u002Fcode>). On 3 July 2026, 891 of the 1,000 domains were reachable. The server-side probe runs weekly for the top 200: we fetch the homepage once with a browser user agent and once each with the GPTBot and ClaudeBot user agents, and count a 403 or 429 response as a block only when the browser request is served normally. All measurements run from a Frankfurt cloud region. We deliberately don&#39;t publish the raw per-domain data, only the aggregates.\u003C\u002Fp>\n\u003Cp>The numbers are a snapshot and change daily. You&#39;ll find the \u003Cstrong>continuously updated status\u003C\u002Fstrong> in the AI-Crawler Blocking Monitor.\u003C\u002Fp>\n",1783110803381]