TRUST: Trajectory-guided State-Space Temporal Test-Time Adaptation
Published in the Catch, Adapt, and Operate: Monitoring ML Models Under Drift Workshop at ICLR 2026
Vision-language models (VLMs) enable text-conditioned object detection, but their performance degrades under temporally evolving distribution shifts. We propose TRUST (TRajectory-gUided State-space Temporal test-time adaptation), a backpropagation-free Bayesian framework for video object detection that casts adaptation as temporal smoothing over two states: a global cache that captures gradual distribution shift, and an instance-level state-space filter guided by object trajectory tracking. The global cache state maintains prototype vision embeddings and scale statistics, while the instance-level state captures object dynamics through Kalman-style trajectory tracking with embedding smoothing along each track. The resulting algorithm adapts without any online gradient computation. We evaluate on the SHIFT dataset, which provides videos with continuous intra-sequence gradual shifts.
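To make the two states concrete, below is a minimal, hypothetical sketch of the kind of updates the abstract describes: a Kalman-style filter that smooths a per-track embedding, and an exponential-moving-average update of class prototypes in a global cache. All names, gain parameters, and the EMA form are illustrative assumptions, not the paper's actual implementation (which also tracks scale statistics and ties the filter to trajectory tracking).

```python
import numpy as np


class TrackEmbeddingFilter:
    """Kalman-style smoothing of one track's embedding (hypothetical sketch).

    State: the track's smoothed vision embedding; observation: the embedding
    of the detection matched to the track this frame. A scalar uncertainty
    with isotropic process/observation noise yields a Kalman-like gain.
    """

    def __init__(self, process_var=1e-2, obs_var=1e-1):
        self.state = None            # smoothed embedding estimate
        self.var = 1.0               # scalar state uncertainty
        self.process_var = process_var
        self.obs_var = obs_var

    def update(self, observed_embedding):
        if self.state is None:       # initialize on first observation
            self.state = observed_embedding.copy()
            return self.state
        # Predict: uncertainty grows by the process noise.
        self.var += self.process_var
        # Update: the gain blends the running state with the observation.
        gain = self.var / (self.var + self.obs_var)
        self.state = self.state + gain * (observed_embedding - self.state)
        self.var = (1.0 - gain) * self.var
        return self.state


def update_global_cache(prototypes, embedding, class_id, momentum=0.99):
    """EMA update of a class prototype in the global cache (assumed form)."""
    if class_id not in prototypes:
        prototypes[class_id] = embedding.copy()
    else:
        prototypes[class_id] = (
            momentum * prototypes[class_id] + (1.0 - momentum) * embedding
        )
    return prototypes
```

Both updates are closed-form, which is what makes a method of this shape backpropagation-free: adaptation happens through state recursions rather than gradient steps on model parameters.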
Cite as: F. Dadboud, H. Azad, M. Bolic, and I. Mantegh, "TRUST: Trajectory-guided State-Space Temporal Test-Time Adaptation," Catch, Adapt, and Operate: Monitoring ML Models Under Drift Workshop at ICLR 2026.
Download Paper