In detail
- Context window: 1,000,000 tokens; MIT license
- FrontierSWE: 74.4%, about one point behind Anthropic Opus 4.8 and slightly ahead of OpenAI GPT‑5.5
- Terminal‑Bench 2.1: score rises from 63.5 (GLM‑5.1) to 81; on SWE‑Marathon the model still trails Opus 4.8 by a wide margin
- Users can adjust 'thinking effort'; the highest setting, 'Max', spends extra compute on the hardest problems
Why it matters
Reliable ultra‑long context matters for multi‑hour coding, research automation and agentic workflows; an open, well‑performing model widens options for companies wanting to run and customize models in‑house.
For you
If you run long, agentic engineering tasks, pilot GLM‑5.2 to compare accuracy and cost against closed models and verify stability on your codebases.
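One way to keep such a pilot honest is to log every task attempt and compare models on both pass rate and cost per solved task. The sketch below is a minimal illustration, not a benchmark harness: `RunResult`, its fields and `summarize` are hypothetical names, and the results are assumed to come from your own task runs.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One task attempt by one model (all fields are illustrative)."""
    model: str
    passed: bool      # did the task's own test suite pass?
    cost_usd: float   # what you paid, or estimate, for this attempt


def summarize(results):
    """Return {model: (pass_rate, cost_per_solved_task)}."""
    summary = {}
    for model in sorted({r.model for r in results}):
        runs = [r for r in results if r.model == model]
        solved = sum(r.passed for r in runs)
        total_cost = sum(r.cost_usd for r in runs)
        pass_rate = solved / len(runs)
        # Cost per solved task is undefined when nothing was solved.
        cost_per_solved = total_cost / solved if solved else float("inf")
        summary[model] = (pass_rate, cost_per_solved)
    return summary
```

Run the same task set from your own codebase through each model, record one `RunResult` per attempt, and compare the two numbers side by side; a model with a lower pass rate can still be the better choice if its cost per solved task is much lower.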