AI Latency Index

How fast is AI from Germany?

The big latency benchmarks measure from the US, which says little about what users in Germany experience. We measure several times a day directly from Frankfurt: how quickly the major AI endpoints start responding, how many tokens per second they deliver, and how reliably they answer.

Measured from eu-central-1 (Frankfurt)

Time-to-first-token from Frankfurt

Time-to-first-token (TTFT) is the metric most sensitive to server distance — which is why it's the focus. Sorted by TTFT; throughput is an approximation (~).

Recording of the daily trend starts today, so the TTFT curve will build up from tomorrow onward. For now, the table shows the current readings.

| Endpoint | Route | Provider | TTFT | Throughput (tokens/s) | Errors |
|---|---|---|---|---|---|
| Mistral Small | via AI Gateway | Mistral · AI Gateway | 317 ms | ~257.1 | 0% |
| Claude Haiku 4.5 | EU region · direct | AWS Bedrock · eu-central-1 (Frankfurt) | 565 ms | ~311 | 0% |
| GPT-5 mini | US-global · direct | OpenAI · api.openai.com (US-global) | 668 ms | ~124.9 | 12.5% |
| GPT-5 mini | via AI Gateway | OpenAI · AI Gateway | 671 ms | ~133.1 | 12.5% |
| Claude Haiku 4.5 | via AI Gateway | Anthropic · AI Gateway | 795 ms | ~382.9 | 0% |
| Gemini 3 Flash | via AI Gateway | Google · AI Gateway | 1,567 ms | ~588.2 | 0% |
| DeepSeek V4 Flash | via AI Gateway | DeepSeek · AI Gateway | 1,976 ms | ~225 | 12.5% |

How we measure

Honest and reproducible — real measurements from a Lambda in Frankfurt, identical for every endpoint.

  1. Vantage point: Frankfurt

    A Lambda function in AWS eu-central-1 (Frankfurt) sends identical mini-requests and times them. So the index measures latency the way German users experience it — not from the US.

  2. TTFT, throughput, errors

    TTFT = time to the first visible token (the server-distance-sensitive metric). Throughput = tokens/second (an approximation, ~). Error rate = share of failed calls.

  3. Several times a day, median

    Every 8 hours we take two measurements per endpoint; the faster one counts, to dampen outliers. From the daily values we take the median.

  4. Forward only

    Past latency from an EU location can't be measured after the fact, so a series that starts on day 1 keeps its head start. Nobody publishes a from-Germany series like this.
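
The three metrics from step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code running in the Lambda: `open_stream` is a hypothetical callable that sends the request and yields text chunks as they arrive, each non-empty chunk is counted as one token (which is why the resulting rate is only approximate), and throughput is taken over the generation phase after the first chunk.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Sample:
    ttft_ms: Optional[float]       # None if the call failed
    tokens_per_s: Optional[float]  # None if fewer than two chunks arrived
    ok: bool


def measure_stream(
    open_stream: Callable[[], Iterable[str]],  # hypothetical: sends request, yields chunks
    clock: Callable[[], float] = time.perf_counter,
) -> Sample:
    """Time one streamed call and derive TTFT and an approximate token rate."""
    start = clock()
    first = last = None
    chunks = 0
    try:
        for chunk in open_stream():
            if not chunk:
                continue  # empty chunk: nothing visible yet, so it does not count
            now = clock()
            if first is None:
                first = now  # arrival of the first visible output
            last = now
            chunks += 1
    except Exception:
        return Sample(None, None, ok=False)  # any error marks the call as failed
    if first is None:
        return Sample(None, None, ok=False)  # stream ended without visible output
    ttft_ms = (first - start) * 1000.0
    gen_time = last - first
    # Assumption: one chunk is roughly one token; rate is measured after the first chunk.
    rate = (chunks - 1) / gen_time if chunks > 1 and gen_time > 0 else None
    return Sample(ttft_ms, rate, ok=True)


def error_rate(samples: list[Sample]) -> float:
    """Share of failed calls as a fraction between 0 and 1."""
    return sum(not s.ok for s in samples) / len(samples) if samples else 0.0
```

With this definition, one failed call out of eight gives an error rate of 0.125, that is 12.5%. The clock is injectable so the timing logic can be checked without a network call.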

Comparing "direct" and "via AI Gateway" isn't 1:1 — the gateway path has an extra, deliberately labelled hop, but it's a real path many EU developers use. For reasoning models we set minimal reasoning effort so TTFT reflects infrastructure, not thinking time. Values vary with time-of-day load; only the median over days is reliable. The OpenAI EU endpoint (eu.api.openai.com) is missing because our key isn't enabled for it.
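
The aggregation described above (faster of two measurements per run, then medians) might look like the following sketch. The exact layering is an assumption: here each run contributes its minimum, each day contributes the median of its run values, and the published figure is the median over days. The function names are illustrative, and failed calls are assumed to have been filtered out beforehand.

```python
from statistics import median


def run_value(ttft_pair_ms: list[float]) -> float:
    """One measurement run: the faster of the two samples counts."""
    return min(ttft_pair_ms)


def daily_value(runs: list[list[float]]) -> float:
    """Median of the run values collected on one day (assumed layering)."""
    return median(run_value(pair) for pair in runs)


def index_value(days: list[list[list[float]]]) -> float:
    """Median over the daily values, so a single loaded day moves it little."""
    return median(daily_value(runs) for runs in days)


# Made-up TTFT samples in ms: three runs per day (every 8 hours), two samples per run.
day1 = [[320.0, 410.0], [298.0, 305.0], [650.0, 340.0]]
day2 = [[300.0, 310.0], [400.0, 390.0], [350.0, 360.0]]
day3 = [[500.0, 280.0], [290.0, 600.0], [310.0, 305.0]]
print(index_value([day1, day2, day3]))
```

In the made-up data, the 650 ms outlier in the third run of day 1 never reaches the result, because its paired sample of 340 ms is faster and the median then absorbs the remaining spread.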

Frequently asked

What is TTFT?

Time-to-first-token — the time from the request to the first word of the answer. It shapes an AI's perceived speed the most and is sensitive to server distance. That's why it's our primary metric.

Why measure from Frankfurt?

Because location matters: an endpoint in the US is noticeably slower from Germany than one in the EU. Big benchmarks measure from the US and don't reflect the German reality — we measure where your users are.

Why can nobody reconstruct this history?

Past latency from an EU location can't be measured retroactively. Anyone who starts later has missed the past days forever — which is exactly what makes the series valuable.

What does "via AI Gateway" mean?

Those endpoints run through a gateway (a service that routes requests to many models). That's an extra hop — we label it transparently. Direct endpoints (Bedrock Frankfurt, OpenAI) skip that detour.

Our own measurements from AWS eu-central-1 (Frankfurt), several times a day, an identical mini-request per endpoint. TTFT = time to first token; throughput is an approximation. Values vary with load; the median over days is the reliable figure. No warranty; not a substitute for your own load tests.

Fast, GDPR-compliant AI connectivity

We connect AI to your infrastructure with low latency and in line with GDPR, from endpoint choice to monitoring.