Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP

cs.LG updates on arXiv.org

Ruize Xia

Apr 21, 2026, 12:00 AM

arXiv:2604.16410v1 Announce Type: new Abstract: CLIP adaptation can improve in-domain accuracy while degrading out-of-domain transfer, but comparisons between Full Fine-Tuning (Full FT) and LoRA are often confounded by different learning-rate conventions. We study how adaptation method and optimization scale jointly shape attention drift and transfer retention in CLIP using a controlled matched-learning-rate comparison of Full FT and LoRA. The completed matrix contains 80 runs on CLIP ViT-B/32 across EuroSAT and Oxford-IIIT Pets, spanning four shared learning rates ($10^{-6}$, $5{\times}10^{-6}$, $10^{-5}$, $5{\times}10^{-5}$) and five seeds, and evaluates attention-drift metrics, best validation accuracy, and adapter-aware CIFAR-100 zero-shot accuracy. Learning rate strongly modulates structural change: on EuroSAT, Full FT moves from mild entropy broadening at $10^{-6}$ to marked contraction at $5{\times}10^{-5}$, whereas LoRA remains entropy-positive across the full matched grid. At matched learning rates, LoRA preserves substantially more zero-shot transfer than Full FT, averaging $45.13\%$ versus $11.28\%$ CIFAR-100 accuracy on EuroSAT and $58.01\%$ versus $8.54\%$ on Pets. Oxford-IIIT Pets also reveals a regime effect: low-learning-rate LoRA underfits in-domain, so method-only averages can obscure when LoRA becomes competitive. Supporting rollout, patch-to-patch, and CKA analyses are directionally consistent with the controlled matrix. Overall, matched-learning-rate evaluation materially changes the interpretation of Full FT versus LoRA, and attention drift is most useful as a descriptive diagnostic of representation preservation rather than a causal explanation of transfer behavior.