[2602.15814] Avey-B
Summary
The paper 'Avey-B' reformulates Avey, an autoregressive, attention-free model, for the encoder-only setting and shows that the resulting architecture outperforms widely used Transformer-based encoders on standard NLP benchmarks.
Why It Matters
This research matters because it offers an attention-free alternative to Transformer-based encoder architectures that prioritizes efficiency alongside accuracy, addressing the need for scalable models under tight compute and memory budgets. The proposed innovations could benefit applications such as token classification and information retrieval, particularly over long contexts.
Key Takeaways
- Avey-B reformulates the Avey model for encoder-only applications.
- Introduces innovations like decoupled parameterizations and stability-oriented normalization.
- Outperforms four popular Transformer-based encoders on standard benchmarks.
- Scales efficiently for processing long contexts in NLP tasks.
- Offers a promising alternative to traditional autoregressive models.
Computer Science > Computation and Language
arXiv:2602.15814 (cs) [Submitted on 17 Feb 2026]
Title: Avey-B
Authors: Devang Acharya, Mohammad Hammoud
Abstract: Compact pretrained bidirectional encoders remain the backbone of industrial NLP under tight compute and memory budgets. Their effectiveness stems from self-attention's ability to deliver high-quality bidirectional contextualization with sequence-level parallelism, as popularized by BERT-style architectures. Recently, Avey was introduced as an autoregressive, attention-free alternative that naturally admits an encoder-only adaptation. In this paper, we reformulate Avey for the encoder-only paradigm and propose several innovations to its architecture, including decoupled static and dynamic parameterizations, stability-oriented normalization, and neural compression. Results show that this reformulated architecture compares favorably to four widely used Transformer-based encoders, consistently outperforming them on standard token-classification and information-retrieval benchmarks while scaling more efficiently to long contexts.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2602.15814 [cs.CL] (or arXiv:2602.15814v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2602.15814 (arXiv-issued DOI via DataCite, pending registration)
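The abstract does not spell out how "decoupled static and dynamic parameterizations" or "stability-oriented normalization" are realized in Avey-B. As a rough intuition only, the sketch below shows a generic attention-free token mixer that combines a fixed, position-based mixing matrix (static path) with input-dependent mixing weights (dynamic path), preceded by an RMS-style normalization. Every name and design choice here (`DecoupledMixer`, `rms_norm`, the residual combination) is an illustrative assumption, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def rms_norm(x, eps=1e-6):
    # Scale each token vector to unit root-mean-square; a common
    # stability-oriented normalization (the paper's exact scheme is
    # not specified in the abstract).
    return x / np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)

class DecoupledMixer:
    """Toy attention-free token mixer with decoupled parameterizations.

    The static path uses a fixed learned matrix over positions; the
    dynamic path derives per-input mixing weights from the tokens
    themselves. Purely illustrative, not Avey-B's method.
    """
    def __init__(self, seq_len, d_model):
        self.W_static = rng.standard_normal((seq_len, seq_len)) / seq_len
        self.W_gate = rng.standard_normal((d_model, seq_len)) / np.sqrt(d_model)

    def __call__(self, x):                        # x: (seq_len, d_model)
        x = rms_norm(x)
        static_mix = self.W_static @ x            # input-independent mixing
        dyn_weights = np.tanh(x @ self.W_gate)    # input-dependent weights
        dynamic_mix = dyn_weights @ x             # data-driven mixing
        return x + static_mix + dynamic_mix       # residual combination

mixer = DecoupledMixer(seq_len=8, d_model=16)
out = mixer(rng.standard_normal((8, 16)))
print(out.shape)  # (8, 16)
```

Because neither path depends on pairwise attention scores, the mixing cost stays linear in model width for a fixed sequence length, which is the kind of property that lets attention-free encoders scale to long contexts.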