Why Retrying Fails: Context Contamination in LLM Agent Pipelines

arXiv

Zhanfu Yang

May 12, 2026, 12:00 AM

arXiv:2605.08563v1 Announce Type: new

Abstract: When an LLM agent fails a multi-step tool-augmented task and retries, the failed attempt typically remains in its context window, contaminating the next attempt and elevating the per-step error rate beyond the base level. This context-contaminated restart phenomenon is widely observed in practice yet entirely lacks formal treatment. We introduce the Context-Contaminated Restart Model (CCRM): a chain of T tool-call steps, each failing with base rate epsilon_0; after any failed attempt, the subsequent attempt operates in contaminated context with elevated error rate epsilon_1 > epsilon_0. Under this model we derive five main results. (R1) An exact closed-form formula for P(succeed in at most K attempts). (R2) A cascade-overhead theorem giving the additional attempts Delta K incurred by contamination versus the clean-restart baseline. (R3) An optimal budget-allocation theorem identifying the pipeline depth T* that maximises success probability for a fixed total budget B = KT; we prove the closed form T* = sqrt(B * log(1/(1-epsilon_1)) / log(1/(1-epsilon_0))), with K* = B/T*. (R4) An information-theoretic lower bound via Le Cam's method showing K_CCRM is tight up to O(1). (R5) A clean-restart dominance theorem quantifying the exact benefit of context-clearing before retry. We validate CCRM on real SWE-bench Verified data: the IID model overestimates pass@3 by 17.4 percentage points (98.6% vs. 81.2%), while CCRM fits with error less than 0.001, implying a cascade ratio of epsilon_1/epsilon_0 = 7.1. Monte Carlo experiments confirm all theoretical predictions.
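The model described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's own code. The abstract does not reproduce the R1 formula, so `success_within_k` uses one plausible reading of the model: steps fail independently, the first attempt runs at epsilon_0, and every later attempt (which necessarily follows a failure) runs at epsilon_1. Only `optimal_depth` transcribes a formula stated in the abstract (R3). All function and parameter names are hypothetical.

```python
import math
import random


def attempt_success_prob(T: int, eps: float) -> float:
    """Probability that all T steps of one attempt succeed (independent steps assumed)."""
    return (1.0 - eps) ** T


def success_within_k(T: int, eps0: float, eps1: float, K: int) -> float:
    """P(succeed in at most K attempts) under the assumed reading of CCRM.

    Assumption: attempt 1 is clean (eps0); attempts 2..K are contaminated (eps1).
    """
    if K < 1:
        return 0.0
    p0 = attempt_success_prob(T, eps0)
    p1 = attempt_success_prob(T, eps1)
    return 1.0 - (1.0 - p0) * (1.0 - p1) ** (K - 1)


def optimal_depth(B: float, eps0: float, eps1: float) -> tuple[float, float]:
    """T* and K* from the closed form quoted in the abstract (R3).

    Requires 0 < eps0 < 1 and 0 < eps1 < 1. Returns real values; rounding
    to integers is left to the caller.
    """
    ratio = math.log(1.0 / (1.0 - eps1)) / math.log(1.0 / (1.0 - eps0))
    t_star = math.sqrt(B * ratio)
    return t_star, B / t_star


def simulate(T: int, eps0: float, eps1: float, K: int,
             trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability, as a sanity check."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        eps = eps0  # the first attempt starts in clean context
        for _ in range(K):
            if all(rng.random() >= eps for _ in range(T)):
                wins += 1
                break
            eps = eps1  # a failed attempt contaminates the next one
    return wins / trials
```

Setting `eps1 == eps0` recovers the IID (clean-restart) baseline, so comparing `success_within_k(T, eps0, eps0, K)` with `success_within_k(T, eps0, eps1, K)` gives a rough view of the gap the abstract attributes to contamination. Comparing `simulate` with `success_within_k` for the same arguments checks the closed form against sampling.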