[2408.00539] Intermittent Semi-Working Mask: A New Masking Paradigm for LLMs
Summary
The paper introduces the Intermittent Semi-Working Mask (ISM), a novel masking paradigm for Large Language Models (LLMs) that enhances multi-turn dialogue and context-intensive tasks while maintaining efficiency.
Why It Matters
LLMs struggle to integrate long dialogue histories without sacrificing generation quality or inference speed. ISM balances contextual understanding with inference efficiency, which could make LLMs markedly more effective in real-world multi-turn and context-intensive applications.
Key Takeaways
- ISM integrates sparse bidirectional attention into causal LLMs.
- It eliminates the need for triplet expansion during training.
- The approach maintains KV-cache reuse, reducing latency.
- ISM outperforms traditional causal baselines in multi-turn dialogues.
- The method is architecture-agnostic and adds minimal latency.
Computer Science > Computation and Language
arXiv:2408.00539 (cs)
[Submitted on 1 Aug 2024 (v1), last revised 17 Feb 2026 (this version, v2)]
Authors: HaoYuan Hu, Mingcong Lu, Di Luo, XinYa Wu, Jiangcai Zhu, Taoye Yin, Zheng Li, Hao Wang, Shusheng Zhang, KeZun Zhang, KaiLai Shao, Chao Chen, Feng Wang
Abstract
Multi-turn dialogues and context-intensive tasks challenge Large Language Models (LLMs) to integrate long histories without sacrificing generation quality. Although prefix LLMs can better exploit historical context via bidirectional attention on prefix tokens, they are rarely used in practice: multi-turn training requires many duplicated triplets, and the bidirectional prefix prevents KV-cache reuse at inference time, driving up cost and latency. To retain the contextual understanding of the prefix mask while preserving the inference-time efficiency of the causal mask, we introduce the Intermittent Semi-Working Mask (ISM), a masking scheme that injects sparse bidirectional attention into the causal backbone. ISM alternates bidirectional attention over query segments with unidirectional attention over answer segments, enabling in-context synthesis while preserving global causality. This design eliminates triplet expansion during training...
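The alternating scheme the abstract describes can be sketched as a mask-construction routine. The following is a minimal illustration, not the paper's implementation: the function name `ism_mask` and the `(length, kind)` segment encoding are assumptions for the sketch. Query segments attend bidirectionally within themselves, answer segments attend causally, and every token attends to all preceding segments, preserving global causality.

```python
import numpy as np

def ism_mask(segments):
    """Build an ISM-style attention mask (1 = attend, 0 = blocked).

    `segments` is a list of (length, kind) pairs in temporal order,
    where kind is "query" or "answer". This is an illustrative sketch
    of the masking scheme described in the abstract, not the authors'
    actual implementation.
    """
    n = sum(length for length, _ in segments)
    mask = np.zeros((n, n), dtype=int)
    start = 0
    for length, kind in segments:
        end = start + length
        # global causality: attend to every token in earlier segments
        mask[start:end, :start] = 1
        if kind == "query":
            # bidirectional attention inside a query segment
            mask[start:end, start:end] = 1
        else:
            # causal (lower-triangular) attention inside an answer segment
            mask[start:end, start:end] = np.tril(
                np.ones((length, length), dtype=int)
            )
        start = end
    return mask

# A two-token query followed by a two-token answer: the query tokens
# see each other in both directions; the answer tokens remain causal.
print(ism_mask([(2, "query"), (2, "answer")]))
```

Because no answer token ever attends to a future position, the mask stays compatible with incremental decoding and KV-cache reuse, which is the efficiency property the paper emphasizes.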