[2602.18487] The Million-Label NER: Breaking Scale Barriers with GLiNER bi-encoder
Summary
The paper presents GLiNER-bi-Encoder, a bi-encoder architecture for Named Entity Recognition (NER) that decouples label and context encoding, enabling zero-shot recognition of thousands to millions of entity types at substantially higher throughput.
Why It Matters
This research addresses the significant challenge of scaling NER systems to handle vast numbers of entity types efficiently. By introducing a bi-encoder architecture, it promises to improve processing speed and accuracy, which is crucial for applications in natural language processing and AI-driven data analysis.
Key Takeaways
- GLiNER-bi-Encoder decouples label and context encoding for efficiency.
- Achieves state-of-the-art zero-shot performance with 61.5% Micro-F1 on the CrossNER benchmark.
- Offers up to a 130 times throughput improvement at 1024 labels over uni-encoder predecessors, by pre-computing label embeddings.
- Facilitates recognition of thousands to millions of entity types.
- Introduces GLiNKER for high-performance entity linking across large knowledge bases.
Computer Science > Computation and Language
arXiv:2602.18487 (cs)
[Submitted on 11 Feb 2026]
Title: The Million-Label NER: Breaking Scale Barriers with GLiNER bi-encoder
Authors: Ihor Stepanov, Mykhailo Shtopko, Dmytro Vodianytskyi, Oleksandr Lukashov
Abstract: This paper introduces GLiNER-bi-Encoder, a novel architecture for Named Entity Recognition (NER) that harmonizes zero-shot flexibility with industrial-scale efficiency. While the original GLiNER framework offers strong generalization, its joint-encoding approach suffers from quadratic complexity as the number of entity labels increases. Our proposed bi-encoder design decouples the process into a dedicated label encoder and a context encoder, effectively removing the context-window bottleneck. This architecture enables the simultaneous recognition of thousands, and potentially millions, of entity types with minimal overhead. Experimental results demonstrate state-of-the-art zero-shot performance, achieving 61.5 percent Micro-F1 on the CrossNER benchmark. Crucially, by leveraging pre-computed label embeddings, GLiNER-bi-Encoder achieves up to a 130 times throughput improvement at 1024 labels compared to its uni-encoder predecessors. Furthermore, we introduce GLiNKER, a modular framework that leverages this architecture for high-performance entity linking a...
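The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function below is a hash-seeded stand-in for the paper's transformer label and context encoders, and the span classification is reduced to cosine similarity over unit vectors. What it does show is the architectural point: label embeddings are computed once and cached, so scoring against any number of labels is a single matrix product rather than a re-encoding of labels alongside the context.

```python
import zlib
import numpy as np

DIM = 64  # embedding dimension (arbitrary for this sketch)

def embed(texts):
    """Stand-in encoder: maps each string to a deterministic unit vector.
    In the real architecture this would be a transformer label encoder
    or context encoder; here we only mimic the interface."""
    vecs = []
    for t in texts:
        rng = np.random.default_rng(zlib.crc32(t.encode()))
        v = rng.standard_normal(DIM)
        vecs.append(v / np.linalg.norm(v))
    return np.stack(vecs)

# Label embeddings are computed ONCE and cached. This is the source of
# the throughput gain: adding more labels does not consume context-window
# space and costs no extra encoder passes at inference time.
LABELS = ["person", "organization", "location"]
LABEL_MATRIX = embed(LABELS)             # shape: (num_labels, DIM)

def classify_spans(span_texts, threshold=0.0):
    """Encode candidate spans independently of the labels, then score
    every (span, label) pair with one matrix product."""
    span_matrix = embed(span_texts)      # shape: (num_spans, DIM)
    scores = span_matrix @ LABEL_MATRIX.T
    best = scores.argmax(axis=1)
    return [
        (span, LABELS[j]) if scores[i, j] > threshold else (span, None)
        for i, (span, j) in enumerate(zip(span_texts, best))
    ]
```

Because `LABEL_MATRIX` is fixed, scaling from 3 labels to a million changes only the size of one cached matrix, while the joint-encoding approach would have to feed every label through the encoder together with the text.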