Characterizing the Generalization Error of Random Feature Regression with Arbitrary Data-Augmentation

stat.ML updates on arXiv.org

Lucas Morisset, Alain Durmus, Adrien Hardy

May 12, 2026, 12:00 AM

arXiv:2605.10290v1 (Announce Type: new)

Abstract: This paper analyzes the regularization effect that data augmentation induces on supervised regression methods in the proportional regime, where the number of covariates grows proportionally to the number of samples. We provide a tight characterization of the test error, measured as mean squared error, that depends only on population quantities of the true data and on the first- and second-order statistics of the augmentation scheme. Our results hold under misspecified feature maps and for any network architecture in which only the last readout layer is trained while the rest of the network is either frozen or randomly initialized. We specialize our results to Gaussian data and show that our asymptotic characterization is tight in this setting.
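
To make the setting concrete, the following is a minimal simulation sketch and not the authors' method or code. It assumes Gaussian covariates, a linear teacher with label noise, a frozen random ReLU feature layer with a ridge-trained readout, and additive Gaussian noise on the covariates as the augmentation scheme. All names and parameter values (`run_experiment`, `aug_std`, `n_aug`, `ridge`, the dimensions) are hypothetical choices made for illustration.

```python
import numpy as np


def relu_features(X, W):
    """Frozen random first layer followed by ReLU; W is never trained."""
    return np.maximum(X @ W, 0.0)


def fit_readout(Phi, y, ridge):
    """Train only the last (readout) layer by ridge regression, in closed form."""
    p = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(p), Phi.T @ y)


def run_experiment(n=200, d=100, p=150, n_aug=4, aug_std=0.5, ridge=1e-3, seed=0):
    """Compare test MSE with and without augmentation (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)

    # Proportional regime: n, d, and p are of comparable size.
    W = rng.standard_normal((d, p)) / np.sqrt(d)      # random, frozen weights
    beta = rng.standard_normal(d) / np.sqrt(d)        # assumed linear teacher

    X = rng.standard_normal((n, d))                   # Gaussian training data
    y = X @ beta + 0.1 * rng.standard_normal(n)       # noisy labels
    X_test = rng.standard_normal((2000, d))
    y_test = X_test @ beta

    # Assumed augmentation: n_aug noisy copies of each covariate, labels unchanged.
    X_aug = np.concatenate(
        [X + aug_std * rng.standard_normal((n, d)) for _ in range(n_aug)]
    )
    y_aug = np.tile(y, n_aug)

    results = {}
    for name, (X_tr, y_tr) in {"plain": (X, y), "augmented": (X_aug, y_aug)}.items():
        a = fit_readout(relu_features(X_tr, W), y_tr, ridge)
        pred = relu_features(X_test, W) @ a
        results[name] = float(np.mean((pred - y_test) ** 2))
    return results


if __name__ == "__main__":
    print(run_experiment())
```

Comparing the two test errors while varying `aug_std` and `ridge` gives an empirical view of how augmentation can act like a regularizer. The sketch only estimates the error by Monte Carlo; it does not compute the asymptotic characterization the abstract refers to.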