SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents
arXiv:2605.06822v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly deployed for autonomous financial trading, a domain requiring continuous adaptation to noisy, non-stationary markets. Existing self-improving agents typically address this through unbounded free-form prompt optimization. However, in low signal-to-noise environments with delayed scalar rewards (P&L), this unstructured approach exacerbates the fundamental credit assignment problem: optimizers cannot reliably distinguish systematic logic flaws from stochastic market variance, inevitably leading to policy drift. To overcome this bottleneck, we introduce the Self-Evolving Human-Auditable Rubric Policy (SHARP), a neuro-symbolic framework that replaces unconstrained text mutation with structured, symbolic policy optimization. SHARP confines the agent's reasoning to a bounded, human-readable rubric of explicit condition-action rules. When sub-optimal trades occur, an attribution agent applies cross-sample reasoning over multiple trajectories to isolate specific rule failures. This enables targeted, atomic policy edits that are subsequently regularized through strict walk-forward validation. Evaluated across three diverse equity sectors and four LLM backbones, SHARP consistently transforms generic initial heuristics into highly robust strategies, lifting the empirical performance of compact models (e.g., GPT-4o-mini) by 10 to 20 percentage points on average. Ultimately, SHARP demonstrates that LLMs can achieve dynamic and efficient adaptation while significantly enhancing the structural transparency and auditability demanded by institutional finance.
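The abstract's core mechanism can be made concrete with a minimal sketch: a bounded rubric of condition-action rules, an atomic single-rule edit, and a walk-forward acceptance gate. All rule names, feature keys, and thresholds below are illustrative assumptions, not the paper's actual rubric or validation protocol.

```python
# Hypothetical sketch of a SHARP-style bounded rubric policy.
# Names, features, and thresholds are assumptions for illustration only.
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]  # predicate over market features
    action: str                        # "buy", "sell", or "hold"

def decide(rubric: List[Rule], features: dict) -> str:
    """First matching rule fires; default is to hold."""
    for rule in rubric:
        if rule.condition(features):
            return rule.action
    return "hold"

# A generic initial heuristic: buy dips, sell rallies.
rubric = [
    Rule("buy_dip",  lambda f: f["ret_5d"] < -0.03, "buy"),
    Rule("sell_rip", lambda f: f["ret_5d"] > 0.05,  "sell"),
]

def atomic_edit(rubric: List[Rule], index: int, new_rule: Rule) -> List[Rule]:
    """A targeted edit replaces exactly one rule, so the rubric's size
    stays bounded and every change remains human-auditable."""
    out = list(rubric)
    out[index] = new_rule
    return out

def walk_forward_accept(old: List[Rule], new: List[Rule],
                        windows: list, pnl: Callable) -> bool:
    """Accept an edit only if it never underperforms the old rubric on
    later held-out windows (a simple stand-in for strict walk-forward
    validation)."""
    return all(pnl(new, w) >= pnl(old, w) for w in windows)
```

Because every candidate change is a single-rule replacement gated by out-of-sample performance, a reviewer can audit the full edit history rule by rule, in contrast to diffing free-form prompt mutations.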
