In detail
- Epoch AI reports Fable 5 at 87% accuracy on tiers 1–3 and 88% on the hardest tier 4 (v2).
- OpenAI’s GPT‑5.5 scores about 75% on tier 4; earlier models scored far lower.
- Results use FrontierMath’s standard scaffold with maximum reasoning effort.
Why it matters
Large gains on hard math benchmarks suggest stronger real‑world reasoning on tasks such as modeling, optimization and technical QA, capabilities that matter for advanced automation.
For you
Ask vendors for benchmark evidence in the domains you care about, and validate model reasoning on your own sample problems.
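
One way to check a model against your own sample problems is a small accuracy harness like the sketch below. It is illustrative only: `ask_model`, `fake_model`, and the sample questions are hypothetical placeholders, and exact-match scoring after light normalization is an assumption that will not suit every domain.

```python
from typing import Callable, Iterable, Tuple


def normalize(answer: str) -> str:
    """Lowercase and remove whitespace so trivial formatting differences are ignored."""
    return "".join(answer.lower().split())


def accuracy(problems: Iterable[Tuple[str, str]], ask_model: Callable[[str], str]) -> float:
    """Return the fraction of problems where the model's answer matches the expected one."""
    problems = list(problems)
    if not problems:
        raise ValueError("need at least one problem")
    correct = sum(
        normalize(ask_model(question)) == normalize(expected)
        for question, expected in problems
    )
    return correct / len(problems)


# Hypothetical stand-in for a real vendor API call; replace with your own client.
def fake_model(question: str) -> str:
    canned = {
        "What is 12 * 12?": "144",
        "What is the derivative of x^2?": "2x",
    }
    return canned.get(question, "unknown")


# Hypothetical sample set: (question, expected answer) pairs from your own domain.
sample = [
    ("What is 12 * 12?", "144"),
    ("What is the derivative of x^2?", "2 x"),
    ("What is the smallest prime greater than 100?", "101"),
]

print(f"accuracy: {accuracy(sample, fake_model):.2f}")
```

Swapping `fake_model` for a function that calls the vendor's model lets you compare a published benchmark figure with results on problems that reflect your own workload.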