In detail
- Across 103 tasks, each run three times, the solved rate was 66% for GLM vs 67% for Opus
- First‑attempt accuracy: Opus 53.7% vs GLM 47.6%
- GLM averaged 99 runs per task and used 860M tokens; Opus averaged 80 runs per task and used 439M tokens
- Pricing (from Zhipu and comparative sheets): GLM $1.40/M input, $4.40/M output; Opus $5/M input, $25/M output
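To see how the token totals and prices above interact, here is a rough Python sketch. The source does not report how tokens split between input and output, so `output_share` is an assumption to vary, and the function and variable names are illustrative only. Cached-token discounts are not modeled.

```python
# Per-million-token prices (USD) and total tokens (millions) as reported above.
PRICES = {
    "GLM": {"input": 1.40, "output": 4.40},
    "Opus": {"input": 5.00, "output": 25.00},
}
TOTAL_TOKENS_M = {"GLM": 860, "Opus": 439}


def estimated_cost(model: str, output_share: float) -> float:
    """Estimate total USD cost for the benchmark.

    ASSUMPTION: `output_share` is the fraction of all tokens billed as output.
    The real input/output split is not reported in the source.
    """
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    price = PRICES[model]
    # Blended price per million tokens under the assumed split.
    blended = (1 - output_share) * price["input"] + output_share * price["output"]
    return TOTAL_TOKENS_M[model] * blended


if __name__ == "__main__":
    for share in (0.0, 0.2, 0.5, 1.0):
        glm = estimated_cost("GLM", share)
        opus = estimated_cost("Opus", share)
        print(f"output share {share:.0%}: GLM ${glm:,.0f} vs Opus ${opus:,.0f}")
```

If both models are given the same assumed output share, GLM's estimate stays below Opus's in this sketch despite the larger token count. Real splits may differ per model, so treat the result as a starting point rather than a verdict.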
Why it matters
Lower per‑token pricing from models like GLM‑5.2 can undercut pricier models on cost for coding use cases, but higher token consumption and lower first‑pass correctness add latency and eat into the savings.
For you
Benchmark alternative models on your actual dev tasks, and include token usage and first‑attempt success in cost and SLA calculations before switching providers.
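One way to fold those metrics into a single comparison is sketched below in Python. The `TaskResult` record and `summarize` helper are hypothetical names, not part of any benchmark tool, and the per-million prices are whatever your provider charges.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one benchmark task (hypothetical record shape)."""
    solved_first_attempt: bool
    solved_eventually: bool
    input_tokens: int
    output_tokens: int


def summarize(results, input_price_per_m, output_price_per_m):
    """Return first-attempt rate, solved rate, total cost, and cost per solved task."""
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    first = sum(r.solved_first_attempt for r in results)
    solved = sum(r.solved_eventually for r in results)
    # Cost in USD: tokens are converted to millions before applying prices.
    cost = sum(
        r.input_tokens / 1e6 * input_price_per_m
        + r.output_tokens / 1e6 * output_price_per_m
        for r in results
    )
    return {
        "first_attempt_rate": first / n,
        "solved_rate": solved / n,
        "total_cost_usd": cost,
        # Infinite when nothing was solved, so failed runs never look cheap.
        "cost_per_solved_usd": cost / solved if solved else float("inf"),
    }
```

Running `summarize` once per candidate model on the same task set gives comparable cost-per-solved-task and first-attempt figures. Latency would need to be measured separately for SLA purposes.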