Claude Fable 5 tops FrontierMath on hardest problems

In detail

Epoch AI reports Fable 5 at 87% accuracy on tiers 1–3 and 88% on the hardest tier 4 (v2).
OpenAI’s GPT‑5.5 scores about 75% on tier 4; earlier models scored far lower.
Results use FrontierMath’s standard scaffold with maximum reasoning effort.

Why it matters

Large gains on hard math benchmarks suggest stronger real‑world reasoning on tasks such as modeling, optimization and technical QA, though a benchmark score does not guarantee it. These capabilities matter for advanced automation.

For you

Ask vendors for benchmark evidence in the domains you care about, and validate model reasoning on your own sample problems.
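
Checking a model against your own sample problems can be as simple as scoring its answers against known ones. The Python sketch below is a minimal illustration, not a standard tool: `accuracy`, `normalize` and `fake_model` are hypothetical names, the three sample questions are made up, and `fake_model` stands in for whatever client your vendor provides. It uses exact matching after light normalization, which is an assumption; real math answers often need a more forgiving comparison.

```python
from typing import Callable, Iterable, Tuple


def normalize(answer: str) -> str:
    """Lowercase and remove whitespace so trivial formatting differences are ignored."""
    return "".join(answer.lower().split())


def accuracy(problems: Iterable[Tuple[str, str]], ask_model: Callable[[str], str]) -> float:
    """Return the fraction of problems where the model's answer matches the expected one."""
    problems = list(problems)
    if not problems:
        raise ValueError("need at least one problem")
    correct = sum(
        normalize(ask_model(question)) == normalize(expected)
        for question, expected in problems
    )
    return correct / len(problems)


# Hypothetical stand-in for a real model call; swap in your vendor's client here.
def fake_model(question: str) -> str:
    canned = {
        "What is 12 * 12?": "144",
        "What is the derivative of x^2?": "2x",
    }
    return canned.get(question, "unknown")


# Made-up sample problems: (question, expected answer).
sample = [
    ("What is 12 * 12?", "144"),
    ("What is the derivative of x^2?", "2 x"),
    ("What is the integral of 1/x?", "ln|x| + C"),
]

print(f"accuracy: {accuracy(sample, fake_model):.2f}")
```

A few dozen problems drawn from your own work will usually tell you more than a public leaderboard number, because they reflect the tasks you actually need solved.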

Sources

The Decoder
TechCrunch