Slow Drift of Support

Cheng, M. et al. "Slow Drift of Support: How Mental Health Chatbots Fail Over Long Conversations." arXiv:2601.14269, 2026.

Key findings used in wiki

The paper stress-tests mental-health-style conversations using 50 virtual patient profiles, conversations of up to 20 turns, and two pressure patterns: static progression and adaptive probing.
Boundary violations are common across tested models, and adaptive probing accelerates failure: the average turn to breach falls from 9.21 to 4.64.
The most important failures are not just obvious prohibited content. They include zero-risk promises, professional-role substitution, dependency cues, and harmful belief validation.
The paper strongly supports evaluating gradual boundary erosion over the course of a conversation, rather than relying only on single-turn prohibited-content filters.
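
To make the "average turn to breach" metric concrete, here is a minimal sketch of how it could be computed from per-turn violation flags. This is not the paper's implementation: the function names, the boolean-flag representation, and the toy data are all assumptions for illustration, and the sketch simply excludes conversations that never breach, which may differ from how the paper treats them.

```python
from statistics import mean
from typing import Optional, Sequence


def first_breach_turn(turn_flags: Sequence[bool]) -> Optional[int]:
    """Return the 1-indexed turn of the first flagged violation, or None.

    `turn_flags` is a hypothetical representation: one boolean per
    assistant turn, True if a judge flagged a boundary violation.
    """
    for turn, flagged in enumerate(turn_flags, start=1):
        if flagged:
            return turn
    return None


def mean_turn_to_breach(conversations: Sequence[Sequence[bool]]) -> Optional[float]:
    """Average first-breach turn over conversations that breached at all.

    Assumption: conversations with no breach are excluded from the mean.
    """
    breaches = [
        turn
        for turn in (first_breach_turn(flags) for flags in conversations)
        if turn is not None
    ]
    return mean(breaches) if breaches else None


# Toy data only, not results from the paper.
static_runs = [
    [False] * 8 + [True],   # first breach at turn 9
    [False] * 9 + [True],   # first breach at turn 10
    [False] * 20,           # never breaches within 20 turns
]
adaptive_runs = [
    [False] * 3 + [True],        # first breach at turn 4
    [False] * 4 + [True, True],  # first breach at turn 5
]

static_mean = mean_turn_to_breach(static_runs)
adaptive_mean = mean_turn_to_breach(adaptive_runs)
```

A lower mean under adaptive probing than under static progression is the pattern the paper reports. Tracking the share of conversations that never breach alongside the mean keeps the comparison honest, since excluding them can shift the average.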