Lost in Conversation

Laban et al. "LLMs Get Lost In Multi-Turn Conversation." arXiv:2505.06120, 2025.

Key findings used in wiki

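The paper analyzes more than 200,000 simulated conversations across six generation tasks and 15 LLMs, comparing single-turn, fully specified instructions with multi-turn settings in which the same instruction is revealed piece by piece.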
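Models perform substantially worse in multi-turn, underspecified conversations than in single-turn, fully specified ones, with an average drop of 39% across the six tasks. They often make wrong assumptions in early turns, attempt final answers prematurely, and then rely on those answers instead of recovering.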
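The analysis decomposes the degradation into a minor loss in aptitude (best-case performance) and a large increase in unreliability (the spread between a model's best and worst runs on the same task). This decomposition makes the paper especially useful for benchmark methodology, not only for headline performance claims.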
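A minimal sketch of the aptitude/unreliability split, assuming our reading of the paper's percentile-based definitions: aptitude as the 90th percentile of scores over repeated runs of one instruction, and unreliability as the 90th-minus-10th percentile spread. Treat the exact percentiles as an assumption to check against the paper. The function names and the score lists below are hypothetical toy values for illustration, not results from the paper.

```python
from statistics import mean


def percentile(scores, p):
    """Linear-interpolated percentile (NumPy's default convention), p in [0, 100]."""
    xs = sorted(scores)
    if not xs:
        raise ValueError("need at least one score")
    k = (len(xs) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)


def decompose(scores):
    """Return (average, aptitude, unreliability) for repeated runs of one instruction.

    Assumed definitions: aptitude = P90, unreliability = P90 - P10.
    """
    p90 = percentile(scores, 90)
    p10 = percentile(scores, 10)
    return mean(scores), p90, p90 - p10


# Hypothetical scores (0-100) from 11 runs of the same task.
single_turn = [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
multi_turn = [10, 20, 30, 40, 50, 60, 70, 80, 85, 88, 90]

print(decompose(single_turn))  # (85, 89.0, 8.0)
print(decompose(multi_turn))   # average ~56.6, aptitude 88.0, unreliability 68.0
```

In this toy case the best-case score barely moves (89 to 88) while the spread between good and bad runs grows from 8 to 68 points. That is the pattern the decomposition is meant to expose, and a single averaged score would hide it.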
It supports InvisibleBench's decision to test conversation arcs where need and context are revealed over time. Long context alone does not solve the problem.