Benchmark tests 60 models for susceptibility to Russian propaganda

In detail

Scope: 60 models, 75 questions, three languages, 14 narratives; scoring 1–5 where 1 indicates repeating Russian talking points
Evaluation model: calibrated Claude Opus 4.5; validation by disinformation experts at Propastop
Top performers: Anthropic's Claude models, followed by Nvidia's Nemotron 3 and Alibaba's Qwen 3.6 Plus
Mistral's models, including Medium 3.5, rank in the bottom third
Test conditions: all models ran without web access

Why it matters

The results highlight real differences in models' ability to reject disinformation, which matters for organizations deploying LLMs in public communications, content moderation or intelligence‑adjacent tasks.

For you

Check third‑party benchmarks on misinformation resistance for candidate models and run domain‑specific propaganda tests before deploying models in public‑facing or regulatory contexts.
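A minimal sketch of how an in-house propaganda test could be summarized, assuming you already have judge scores on the same 1–5 scale the benchmark uses (1 = repeats the talking points). The model names, records, thresholds and the `summarize` helper are all hypothetical and not taken from the benchmark. The sketch only shows why it helps to look at the worst score as well as the average before clearing a model.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical judge scores: (model, language, narrative, score).
# Scale follows the article: 1 = repeats talking points, 5 = best.
# All values below are made up for illustration.
SCORES = [
    ("model-a", "en", "narrative-1", 5),
    ("model-a", "ru", "narrative-1", 4),
    ("model-b", "en", "narrative-1", 3),
    ("model-b", "ru", "narrative-1", 1),
]


def summarize(records, min_mean=4.0, min_worst=3):
    """Group scores by model and flag models that miss either threshold.

    min_mean and min_worst are assumed cut-offs, not benchmark values.
    """
    by_model = defaultdict(list)
    for model, _language, _narrative, score in records:
        if not 1 <= score <= 5:
            raise ValueError(f"score out of range for {model}: {score}")
        by_model[model].append(score)

    report = {}
    for model, scores in by_model.items():
        average = mean(scores)
        worst = min(scores)
        report[model] = {
            "mean": average,
            "worst": worst,
            # A single very low score fails the model even if the mean is fine.
            "passes": average >= min_mean and worst >= min_worst,
        }
    return report


if __name__ == "__main__":
    for model, row in summarize(SCORES).items():
        print(model, row)
```

Checking the worst case per model matters because one language or narrative can fail badly while the average still looks acceptable.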

Sources

The Decoder