[2602.18487] The Million-Label NER: Breaking Scale Barriers with GLiNER bi-encoder
Summary
The paper presents GLiNER-bi-Encoder, a bi-encoder architecture for Named Entity Recognition (NER) that decouples label and context encoding, enabling zero-shot recognition of thousands to millions of entity types at substantially higher throughput.
Why It Matters
This research addresses the significant challenge of scaling NER systems to handle vast numbers of entity types efficiently. By introducing a bi-encoder architecture, it promises to improve processing speed and accuracy, which is crucial for applications in natural language processing and AI-driven data analysis.
Key Takeaways
- GLiNER-bi-Encoder decouples label and context encoding for efficiency.
- Achieves state-of-the-art zero-shot performance with 61.5% Micro-F1 on the CrossNER benchmark.
- Offers up to a 130 times throughput improvement at 1024 labels over uni-encoder predecessors, by pre-computing label embeddings.
- Facilitates recognition of thousands to millions of entity types.
- Introduces GLiNKER for high-performance entity linking across large knowledge bases.
Computer Science > Computation and Language
arXiv:2602.18487 (cs)
[Submitted on 11 Feb 2026]
Title: The Million-Label NER: Breaking Scale Barriers with GLiNER bi-encoder
Authors: Ihor Stepanov, Mykhailo Shtopko, Dmytro Vodianytskyi, Oleksandr Lukashov
Abstract: This paper introduces GLiNER-bi-Encoder, a novel architecture for Named Entity Recognition (NER) that harmonizes zero-shot flexibility with industrial-scale efficiency. While the original GLiNER framework offers strong generalization, its joint-encoding approach suffers from quadratic complexity as the number of entity labels increases. Our proposed bi-encoder design decouples the process into a dedicated label encoder and a context encoder, effectively removing the context-window bottleneck. This architecture enables the simultaneous recognition of thousands, and potentially millions, of entity types with minimal overhead. Experimental results demonstrate state-of-the-art zero-shot performance, achieving 61.5 percent Micro-F1 on the CrossNER benchmark. Crucially, by leveraging pre-computed label embeddings, GLiNER-bi-Encoder achieves up to a 130 times throughput improvement at 1024 labels compared to its uni-encoder predecessors. Furthermore, we introduce GLiNKER, a modular framework that leverages this architecture for high-performance entity linking a...
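The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function below is a hash-seeded stand-in for the paper's transformer label and context encoders, and the span classification is reduced to cosine similarity over unit vectors. What it does show is the architectural point: label embeddings are computed once and cached, so scoring against any number of labels is a single matrix product rather than a re-encoding of labels alongside the context.

```python
import zlib
import numpy as np

DIM = 64  # embedding dimension (arbitrary for this sketch)

def embed(texts):
    """Stand-in encoder: maps each string to a deterministic unit vector.
    In the real architecture this would be a transformer label encoder
    or context encoder; here we only mimic the interface."""
    vecs = []
    for t in texts:
        rng = np.random.default_rng(zlib.crc32(t.encode()))
        v = rng.standard_normal(DIM)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

# Label embeddings are computed ONCE and cached. This is the source of
# the throughput gain: adding more labels does not consume context-window
# space and costs no extra encoder passes at inference time.
LABELS = ["person", "organization", "location"]
LABEL_MATRIX = embed(LABELS)             # shape: (num_labels, DIM)

def classify_spans(span_texts, threshold=0.0):
    """Encode candidate spans independently of the labels, then score
    every (span, label) pair with one matrix product."""
    span_matrix = embed(span_texts)      # shape: (num_spans, DIM)
    scores = span_matrix @ LABEL_MATRIX.T
    best = scores.argmax(axis=1)
    return [
        (span, LABELS[j]) if scores[i, j] > threshold else (span, None)
        for i, (span, j) in enumerate(zip(span_texts, best))
    ]
```

Because `LABEL_MATRIX` is fixed, scaling from 3 labels to a million changes only the size of one cached matrix, while the joint-encoding approach would have to feed every label through the encoder together with the text.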