Claude Fable 5 tops FrontierMath on hardest problems

Anthropic’s Claude Fable 5 posts leading scores on the FrontierMath benchmark, outperforming OpenAI’s GPT‑5.5 by about 13 percentage points on the hardest tier (88% versus about 75%).

In detail

  • Epoch AI reports Fable 5 at 87% accuracy on tiers 1–3 and 88% on the hardest tier 4 (v2).
  • OpenAI’s GPT‑5.5 scores about 75% on tier 4; earlier models scored far lower.
  • Results use FrontierMath’s standard scaffold with maximum reasoning effort.

Why it matters

Large gains on hard math benchmarks suggest stronger real‑world reasoning on tasks such as modeling, optimization and technical QA, all capabilities that matter for advanced automation.

For you

Ask vendors for benchmark evidence in the domains you care about, and validate model reasoning on your own sample problems.
