FACTOR: Counterfactual Training-Free Test-Time Adaptation for Open-Vocabulary Object Detection

cs.CV updates on arXiv.org

Kaixiang Zhao, Mao Ye, Lihua Zhou, Hu Wang, Luping Ji, Song Tang, Xiatian Zhu

May 6, 2026, 12:00 AM

arXiv:2605.03294v1 Announce Type: new

Abstract: Open-vocabulary object detection often fails under distribution shifts because it can be misled by spurious correlations between non-causal visual attributes (e.g., brightness, texture) and object categories. Existing test-time adaptation (TTA) methods either depend on costly online optimization or perform global calibration, overlooking the attribute-specific nature of these failures. To address this, we propose FACTOR (counterFACtual training-free Test-time adaptation for Open-vocabulaRy object detection), a lightweight framework grounded in counterfactual reasoning. FACTOR perturbs test images along non-causal attributes and compares region-level predictions between the original and counterfactual views. From this comparison it quantifies attribute sensitivity, semantic relevance, and prediction variation, and uses them to selectively suppress attribute-dependent predictions, all without parameter updates. Experiments on PASCAL-C, COCO-C, and FoggyCityscapes show that FACTOR consistently outperforms prior TTA methods, demonstrating that explicit counterfactual reasoning effectively improves robustness under distribution shifts.
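The abstract does not give FACTOR's formulas, so the following is only a minimal illustrative sketch of the general idea of comparing region-level predictions across original and counterfactual views. Everything in it is an assumption rather than the authors' method: the function names `attribute_sensitivity` and `suppress_sensitive`, the use of the mean absolute change in the top-class score as a sensitivity measure, the exponential down-weighting, and the `strength` parameter are all hypothetical. It also assumes the same regions are scored in every view, and it covers only a sensitivity-style signal, not semantic relevance or prediction variation.

```python
import math

def attribute_sensitivity(orig_scores, cf_scores_views):
    """Hypothetical sensitivity measure (not taken from the paper).

    orig_scores: list of per-region class-score lists for the original image.
    cf_scores_views: list of counterfactual views, each shaped like orig_scores
        and scored on the same regions (e.g. after a brightness change).
    Returns, per region, the mean absolute change of the originally
    top-scoring class across the counterfactual views.
    """
    sensitivities = []
    for r, scores in enumerate(orig_scores):
        top = max(range(len(scores)), key=scores.__getitem__)
        diffs = [abs(scores[top] - view[r][top]) for view in cf_scores_views]
        sensitivities.append(sum(diffs) / len(diffs))
    return sensitivities

def suppress_sensitive(orig_scores, sensitivities, strength=5.0):
    """Assumed suppression rule: scale each region's top score by
    exp(-strength * sensitivity). No model parameters are touched."""
    adjusted = []
    for scores, s in zip(orig_scores, sensitivities):
        top = max(range(len(scores)), key=scores.__getitem__)
        adjusted.append((top, scores[top] * math.exp(-strength * s)))
    return adjusted

# Toy example: two regions, two classes, one counterfactual view.
orig = [[0.9, 0.1], [0.8, 0.2]]
counterfactual_views = [[[0.88, 0.12], [0.3, 0.7]]]

sens = attribute_sensitivity(orig, counterfactual_views)
print(suppress_sensitive(orig, sens))
```

In this toy input the first region's prediction barely moves under the perturbation and keeps most of its confidence, while the second region's prediction changes sharply and is strongly down-weighted. A real pipeline would obtain the scores from an open-vocabulary detector run on the original and perturbed images.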