Additive Atomic Forests for Symbolic Function and Antiderivative Discovery
arXiv:2605.08130v1. Abstract: We present a framework for the simultaneous symbolic recovery of a function and its antiderivative from data. The framework rests on three ideas. First, a derivative algebra: the product rule $\frac{d}{dx}[f \cdot g] = f'g + fg'$ and the chain rule, applied to a seed set of elementary functions, generate a self-expanding system of function-derivative pairs, a living library that grows each time a new function is discovered. Second, two complementary primitives that seed the library with core atoms cheaply: EML $(e^u - \ln v)$, which is theoretically complete for all elementary functions, and SOL $(\sin u - \cos v)$, introduced here, which makes trigonometric atoms available at depth 1 instead of depth $\sim 8$. Third, additive atomic forests: finite sums of primitive trees, optionally composed via multiplicative nodes, whose derivatives are fitted to data by continuous optimisation or by exhaustive search over the library. Because the derivative of each atom is known by construction, the forest simultaneously encodes a symbolic expression $F$ and its derivative $F'$; no symbolic integration step is required. The library is not a fixed object: it constructs itself from a small seed set by recursive application of the product rule, the chain rule, and the two primitives, and newly discovered functions can be folded back in. The larger the library, the richer the class of expressible candidate functions. We give conditional completeness, additive-depth, and analytic simultaneous-recovery results for the framework. Empirically, in our reported runs on 17 classification benchmarks, sparse atom combinations match or exceed XGBoost on 13 of the 17 datasets while producing interpretable formulas.
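As an illustration of the derivative algebra described above, here is a minimal Python sketch, not the authors' implementation. All names (`Atom`, `product`, `chain`, `eml`, `sol`, `grow`, `fit_single`) are hypothetical. It assumes constants may appear as leaves, so that `eml(x, 1)` gives $e^x$ and `sol(x, pi/2)` gives $\sin x$ up to floating-point error. For brevity it replaces the forest fit with an exhaustive search over single atoms with one least-squares coefficient each. Each atom stores a function together with its derivative, so matching derivative data returns the antiderivative with no integration step.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Atom:
    name: str
    F: Callable[[float], float]    # the function itself
    dF: Callable[[float], float]   # its derivative, known by construction

def product(a: Atom, b: Atom) -> Atom:
    # Product rule: (a*b)' = a'*b + a*b'
    return Atom(f"({a.name})*({b.name})",
                lambda x: a.F(x) * b.F(x),
                lambda x: a.dF(x) * b.F(x) + a.F(x) * b.dF(x))

def chain(outer: Atom, inner: Atom) -> Atom:
    # Chain rule: (outer o inner)' = outer'(inner) * inner'
    return Atom(f"[{outer.name}] o [{inner.name}]",
                lambda x: outer.F(inner.F(x)),
                lambda x: outer.dF(inner.F(x)) * inner.dF(x))

def eml(u: Atom, v: Atom) -> Atom:
    # EML(u, v) = exp(u) - ln(v); derivative u'*exp(u) - v'/v (needs v > 0)
    return Atom(f"EML({u.name}, {v.name})",
                lambda x: math.exp(u.F(x)) - math.log(v.F(x)),
                lambda x: u.dF(x) * math.exp(u.F(x)) - v.dF(x) / v.F(x))

def sol(u: Atom, v: Atom) -> Atom:
    # SOL(u, v) = sin(u) - cos(v); derivative u'*cos(u) + v'*sin(v)
    return Atom(f"SOL({u.name}, {v.name})",
                lambda x: math.sin(u.F(x)) - math.cos(v.F(x)),
                lambda x: u.dF(x) * math.cos(u.F(x)) + v.dF(x) * math.sin(v.F(x)))

# Seed leaves (assumption: constant leaves are allowed).
X = Atom("x", lambda x: x, lambda x: 1.0)
ONE = Atom("1", lambda x: 1.0, lambda x: 0.0)
HALF_PI = Atom("pi/2", lambda x: math.pi / 2, lambda x: 0.0)

EXP = eml(X, ONE)       # exp(x) - ln(1) = exp(x)
SIN = sol(X, HALF_PI)   # sin(x) - cos(pi/2), i.e. sin(x) up to rounding

def grow(lib: List[Atom]) -> List[Atom]:
    # One round of library growth: add all pairwise products.
    return lib + [product(a, b) for i, a in enumerate(lib) for b in lib[i:]]

def fit_single(lib: List[Atom], xs: List[float],
               ys: List[float]) -> Tuple[Atom, float, float]:
    # Exhaustive search: for each atom, fit c so that c * atom.dF matches ys.
    best = None
    for atom in lib:
        d = [atom.dF(x) for x in xs]
        dd = sum(di * di for di in d)
        if dd == 0.0:
            continue  # derivative vanishes on the sample; nothing to fit
        c = sum(di * yi for di, yi in zip(d, ys)) / dd
        resid = sum((c * di - yi) ** 2 for di, yi in zip(d, ys))
        if best is None or resid < best[2]:
            best = (atom, c, resid)
    return best

library = grow([X, SIN, EXP])
library.append(chain(SIN, product(X, X)))  # sin(x^2), added via the chain rule

# Synthetic derivative data: f(x) = 2*(sin x + x cos x) = d/dx [2 x sin x]
xs = [0.1 * k for k in range(1, 31)]
ys = [2.0 * (math.sin(x) + x * math.cos(x)) for x in xs]

best, coef, resid = fit_single(library, xs, ys)
print(f"F(x) ~ {coef:.3f} * {best.name}, squared residual {resid:.2e}")
```

On this synthetic data the search selects the product atom built from `x` and the SOL-derived sine, with a coefficient close to 2, so the antiderivative $F(x) = 2x\sin x$ is read directly off the matched atom. A fuller version would fit sums of several atoms, as the abstract's additive forests do.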
