MosaicLeaks: research agents can leak company secrets via harmless web queries — RL fix reduces leakage

In detail

MosaicLeaks creates a multi‑hop task mixing public and private information to evaluate leakage via query logs
Agents frequently leak private information; optimizing solely for task performance increases leakage
PA‑DR raises strict chain success from 48.7% to 58.7% (up 10.0 points) and cuts answer/full‑information leakage from 34.0% to 9.9% (down 24.1 points)

Why it matters

Businesses using agents that combine internal docs with external tools face a real risk that observers can reconstruct secrets from outbound queries; the paper’s mitigation offers a practical training approach to reduce that risk.

For you

Actionable takeaway: Audit agent query outputs for leakage, restrict external queries when agents access sensitive docs, and consider privacy‑aware training or filtering before production use.
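One way to picture the "audit and filter" advice is a gate that sits between the agent and its search tool. The sketch below is illustrative only and is not the paper's method: `QueryAuditor`, `sensitive_terms`, and `max_exposed` are hypothetical names, and plain keyword matching is a deliberately simple stand-in for a real leakage detector. It tracks exposure across the whole session rather than per query, because the risk described here comes from individually harmless queries adding up.

```python
import re
from dataclasses import dataclass, field


def _tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens of a query."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class QueryAuditor:
    """Hypothetical session-level gate for outbound agent queries."""

    # Assumption: lowercase single-word terms drawn from private documents.
    sensitive_terms: set[str]
    # Assumption: how many distinct sensitive terms may leave per session.
    max_exposed: int = 1
    exposed: set[str] = field(default_factory=set)

    def check(self, query: str) -> bool:
        """Return True if the query may be sent, False if it should be blocked."""
        hits = _tokens(query) & self.sensitive_terms
        # Block when this query would push cumulative exposure over the budget.
        if len(self.exposed | hits) > self.max_exposed:
            return False
        self.exposed |= hits
        return True


auditor = QueryAuditor(sensitive_terms={"bluefin", "acme"}, max_exposed=1)
results = [
    auditor.check("quarterly semiconductor market size"),  # no sensitive terms
    auditor.check("Bluefin chip benchmarks"),              # first sensitive term
    auditor.check("Acme acquisition rumors"),              # second term: over budget
    auditor.check("bluefin pricing"),                      # term already exposed
]
print(results)  # [True, True, False, True]
```

A keyword gate like this only catches terms you already know are sensitive. Paraphrased or inferred details would pass through, which is why the takeaway also mentions restricting external queries and privacy‑aware training.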

Sources

Hugging Face