[2512.00234] OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion
Computer Science > Computation and Language
arXiv:2512.00234 (cs)
[Submitted on 28 Nov 2025 (v1), last revised 1 Apr 2026 (this version, v2)]
Title: OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion
Authors: Sai Koneru, Matthias Huck, Jan Niehues
Abstract: There has been significant progress in open-source text-only translation large language models (LLMs) with better language coverage and quality. However, these models can only be used in cascaded pipelines for speech translation (ST), performing automatic speech recognition first followed by translation. This introduces additional latency, which is particularly critical in simultaneous ST (SimulST), and prevents the model from exploiting multimodal context, such as images, which can aid disambiguation. Pretrained multimodal foundation models (MMFMs) already possess strong perception and reasoning capabilities across multiple modalities, but generally lack the multilingual coverage and specialized translation performance of dedicated translation LLMs. To build an effective multimodal translation system, we propose an end-to-end approach that fuses MMFMs with translation LLMs. We introduce a novel fusion strategy that connects hidden states from multiple layers of a pretrained MMFM to a translation LLM, enabling joint end-...
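The abstract describes connecting hidden states from multiple layers of a pretrained MMFM to a translation LLM, but gives no implementation details in this excerpt. The following is a minimal sketch of one plausible reading, assuming per-layer learned projections into the LLM's embedding space and a simple additive combination; the layer indices, module names, and the way the fused states enter the LLM are all illustrative assumptions, not the paper's method.

```python
# Sketch only: multi-layer hidden-state fusion from a frozen MMFM into a
# translation LLM. Layer choices and projection design are assumptions.
import torch
import torch.nn as nn


class MultiLayerFusion(nn.Module):
    """Project hidden states from selected MMFM layers into the LLM embedding space."""

    def __init__(self, mmfm_dim: int, llm_dim: int, fused_layers=(8, 16, 24)):
        super().__init__()
        self.fused_layers = fused_layers
        # One learned projection per tapped MMFM layer (hypothetical design choice).
        self.projections = nn.ModuleList(
            nn.Linear(mmfm_dim, llm_dim) for _ in fused_layers
        )

    def forward(self, mmfm_hidden_states):
        # mmfm_hidden_states: sequence of (batch, src_len, mmfm_dim) tensors,
        # one per MMFM layer, e.g. the hidden_states returned by a Hugging Face
        # model when called with output_hidden_states=True.
        fused = 0
        for proj, layer_idx in zip(self.projections, self.fused_layers):
            fused = fused + proj(mmfm_hidden_states[layer_idx])
        # (batch, src_len, llm_dim): in this sketch, these vectors would be
        # prepended to the translation LLM's input embeddings.
        return fused
```

In such a setup, the fused representation could be concatenated with the translation LLM's token embeddings and the projections trained end-to-end while the MMFM stays frozen; this is one possible instantiation of the fusion idea stated in the abstract, not a description of OmniFusion's actual architecture.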