The big latency benchmarks measure from the US, which says little about what users in Germany experience. We measure several times a day directly from Frankfurt: how quickly the major AI endpoints start responding, how many tokens per second they deliver, and how reliable they are.
Time-to-first-token (TTFT) is the metric most sensitive to server distance, which is why it is the focus. The table is sorted by TTFT, fastest first; throughput is given in tokens per second and is approximate (~).
Recording of the daily trend starts today, so the TTFT curve will build up from tomorrow. For now, the table shows the current readings.
| Endpoint | TTFT | Throughput (tokens/s) | Error rate |
|---|---|---|---|
| Mistral Small | 317 ms | ~257.1 | 0% |
| Claude Haiku 4.5 | 565 ms | ~311 | 0% |
| GPT-5 mini | 668 ms | ~124.9 | 12.5% |
| GPT-5 mini | 671 ms | ~133.1 | 12.5% |
| Claude Haiku 4.5 | 795 ms | ~382.9 | 0% |
| Gemini 3 Flash | 1,567 ms | ~588.2 | 0% |
| DeepSeek V4 Flash | 1,976 ms | ~225 | 12.5% |
Honest and reproducible: real measurements from a Lambda function in Frankfurt, with an identical request for every endpoint.
A Lambda function in AWS eu-central-1 (Frankfurt) sends an identical mini-request to each endpoint and times the response. The index therefore measures latency as users in Germany experience it, not as seen from the US.
TTFT = time until the first visible token arrives (the metric most sensitive to server distance). Throughput = tokens per second (approximate, marked ~). Error rate = share of failed calls.
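The three definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the code running in the Lambda function: `measure_stream`, `error_rate`, the injectable `clock`, and the choice to compute throughput from the tokens that follow the first one are all assumptions made for the example.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time one streamed response (hypothetical helper).

    `tokens` is any iterator that yields text pieces as they arrive.
    """
    start = clock()
    first_at: Optional[float] = None
    count = 0
    try:
        for token in tokens:
            if not token:
                continue  # empty chunks are not visible output
            if first_at is None:
                first_at = clock()  # first visible token arrived
            count += 1
    except Exception:
        # Any error while streaming counts as a failed call.
        return {"ok": False, "ttft_ms": None, "tokens_per_s": None}
    end = clock()

    if first_at is None:  # stream ended without any visible token
        return {"ok": False, "ttft_ms": None, "tokens_per_s": None}

    generation_s = end - first_at
    # Assumption: throughput counts the tokens after the first one,
    # divided by the time elapsed since the first token.
    tps = (count - 1) / generation_s if count > 1 and generation_s > 0 else None
    return {
        "ok": True,
        "ttft_ms": (first_at - start) * 1000.0,
        "tokens_per_s": tps,
    }


def error_rate(results: list) -> float:
    """Share of failed calls, between 0.0 and 1.0."""
    if not results:
        return 0.0
    return sum(1 for r in results if not r["ok"]) / len(results)
```

Under this definition, one failed call out of eight gives an error rate of 0.125, that is 12.5%.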
Every 8 hours we take two measurements per endpoint and keep the faster one to dampen outliers. From the resulting daily values we take the median.
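The aggregation just described could look roughly like this. It is a sketch under one reading of the rule (keep the faster attempt of each run, then take the median of a day's runs); the function names, the tuple layout, and the use of `None` for a failed attempt are assumptions for illustration.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Optional, Tuple

# One run = (day label, TTFT of attempt A in ms, TTFT of attempt B in ms).
# None marks an attempt that failed (assumed convention).
Run = Tuple[str, Optional[float], Optional[float]]


def run_value(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Keep the faster of the two attempts; None if both failed."""
    ok = [x for x in (a, b) if x is not None]
    return min(ok) if ok else None


def daily_medians(runs: Iterable[Run]) -> Dict[str, float]:
    """Median of the per-run values for each day, ordered by day label."""
    by_day = defaultdict(list)
    for day, a, b in runs:
        value = run_value(a, b)
        if value is not None:
            by_day[day].append(value)
    return {day: median(values) for day, values in sorted(by_day.items())}


# Made-up numbers, three runs on the first day and one on the second.
example = [
    ("day-1", 320, 410),
    ("day-1", 900, 350),
    ("day-1", None, 300),
    ("day-2", 500, None),
]
print(daily_medians(example))  # {'day-1': 320, 'day-2': 500}
```

Taking the minimum of each pair removes a single slow outlier per run, and the median keeps one unusually slow run from shifting the daily figure.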
Past latency from an EU location can't be measured after the fact, so a series that starts on day 1 keeps its head start. We know of no other published series measured from Germany like this.
Comparing "direct" and "via AI Gateway" is not like-for-like: the gateway path adds one extra hop, which we label deliberately, but it is a real path that many EU developers use. For reasoning models we set minimal reasoning effort so that TTFT reflects infrastructure, not thinking time. Values vary with time-of-day load; only the median over several days is reliable. The OpenAI EU endpoint (eu.api.openai.com) is missing because our key isn't enabled for it.
Time-to-first-token is the time from sending the request to the first word of the answer. It shapes how fast an AI feels more than any other metric, and it is sensitive to server distance. That's why it's our primary metric.
Because location matters: an endpoint in the US is noticeably slower from Germany than one in the EU. Big benchmarks measure from the US and don't reflect the German reality. We measure where your users are.
Past latency from an EU location can't be measured retroactively. Anyone who starts later has missed the earlier days for good, which is exactly what makes the series valuable.
Endpoints marked "via AI Gateway" run through a gateway (a service that routes requests to many models). That's an extra hop, and we label it transparently. Direct endpoints (Bedrock Frankfurt, OpenAI) skip that detour.
Our own measurements from AWS eu-central-1 (Frankfurt), several times a day, an identical mini-request per endpoint. TTFT = time to first token; throughput is an approximation. Values vary with load; the median over days is the reliable figure. No warranty; not a substitute for your own load tests.
We bring AI to your infrastructure with solid performance and compliance, from endpoint selection to monitoring.