[2602.18528] Audio-Visual Continual Test-Time Adaptation without Forgetting


Summary

The paper presents a novel method, AV-CTTA, for audio-visual continual test-time adaptation that minimizes catastrophic forgetting while improving model performance across non-stationary domains.

Why It Matters

This research addresses a significant challenge in machine learning where models struggle to adapt to changing data distributions without losing previously learned information. The proposed method enhances model robustness and efficiency, making it relevant for applications in dynamic environments.

Key Takeaways

  • AV-CTTA adapts audio-visual models at test-time without catastrophic forgetting.
  • The method focuses on optimizing the modality fusion layer for improved performance.
  • Dynamic parameter retrieval enhances adaptability to new data distributions.
  • Extensive experiments demonstrate significant performance improvements over existing methods.
  • The approach is applicable to both unimodal and bimodal corruptions.
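The "dynamic parameter retrieval" takeaway can be pictured as a small buffer of fusion-layer snapshots, each tagged with an adaptation score, from which the best-scoring snapshot is fetched for a new domain. This is a minimal illustrative sketch, not the paper's actual implementation: the class name, the capacity policy, and the idea of scoring by (for example) mean prediction entropy are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class FusionParamBuffer:
    """Toy buffer of fusion-layer snapshots, keyed by an adaptation score.

    The score could be, e.g., mean prediction entropy on the current
    batch (lower = better fit). Names and policies here are illustrative
    assumptions, not the paper's API.
    """
    snapshots: list = field(default_factory=list)  # (score, params) pairs
    capacity: int = 8

    def store(self, score, params):
        """Add a snapshot; evict the worst-scoring one if over capacity."""
        self.snapshots.append((score, params))
        if len(self.snapshots) > self.capacity:
            self.snapshots.sort(key=lambda s: s[0])  # best (lowest) first
            self.snapshots.pop()                     # drop the worst

    def retrieve_best(self):
        """Dynamically pick the snapshot with the lowest (best) score."""
        return min(self.snapshots, key=lambda s: s[0])[1]
```

For example, storing three snapshots into a capacity-2 buffer evicts the worst one, and `retrieve_best()` returns the parameters with the lowest score.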

Computer Science > Machine Learning · arXiv:2602.18528 (cs) · Submitted on 20 Feb 2026

Title: Audio-Visual Continual Test-Time Adaptation without Forgetting

Authors: Sarthak Kumar Maharana, Akshay Mehra, Bhavya Ramakrishna, Yunhui Guo, Guan-Ming Su

Abstract: Audio-visual continual test-time adaptation involves continually adapting a source audio-visual model at test-time, to unlabeled non-stationary domains, where either or both modalities can be distributionally shifted, which hampers online cross-modal learning and eventually leads to poor accuracy. While previous works have tackled this problem, we find that SOTA methods suffer from catastrophic forgetting, where the model's performance drops well below the source model due to continual parameter updates at test-time. In this work, we first show that adapting only the modality fusion layer to a target domain not only improves performance on that domain but can also enhance performance on subsequent domains. Based on this strong cross-task transferability of the fusion layer's parameters, we propose a method, $\texttt{AV-CTTA}$, that improves test-time performance of the models without access to any source data. Our approach works by using a selective parameter retrieval mechanism that dynamically retrieves the best fusion layer parameters from a buffer using...
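The abstract's core observation, adapting only the modality fusion layer while keeping the unimodal encoders frozen, can be sketched as a single entropy-minimization update on the fusion weights. The fused-by-concatenation architecture, the entropy objective, and the learning rate below are common test-time-adaptation assumptions used for illustration, not the paper's exact procedure.

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))


def adapt_fusion_step(W, a_feat, v_feat, lr=0.1):
    """One entropy-minimization step on the fusion weights only.

    `a_feat` and `v_feat` stand in for outputs of frozen audio/visual
    encoders; only W (the fusion layer) is updated. Returns the updated
    weights and the pre-update prediction entropy.
    """
    x = np.concatenate([a_feat, v_feat])   # late fusion by concatenation
    z = W @ x                              # class logits
    p = softmax(z)
    H = entropy(p)
    # Analytic gradient of the entropy w.r.t. the logits:
    # dH/dz_j = -p_j * (log p_j + H)
    dz = -p * (np.log(p + 1e-12) + H)
    grad_W = np.outer(dz, x)               # chain rule through z = W x
    return W - lr * grad_W, H
```

Repeating the step on the same unlabeled test features should drive the prediction entropy down, i.e., sharpen the model's predictions on the shifted domain without touching the frozen encoders.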
