Falcon Perception

Hugging Face Blog April 01, 2026 15 min read

About this article

A Blog post by Technology Innovation Institute on Hugging Face

TL;DR: Falcon Perception is a 0.6B-parameter early-fusion Transformer for open-vocabulary grounding and segmentation from natural language prompts. The model processes image patches and text in one sequence using a hybrid attention mask, and produces a variable number of instances through a small, structured token interface and lightweight output heads. On SA-Co, Falcon Perception reaches 68.0 Macro-F1 (vs. 62.3 for SAM 3); the main remaining gap is presence calibration (MCC 0.64 vs. 0.82). We also introduce PBench, a diagnostic benchmark that breaks down performance by capability (attributes, OCR-guided disambiguation, spatial constraints, relations) and by dense long-context crowded scenes. We also release Falcon OCR, a 0.3B-parameter model that scores 80.3 on the olmOCR benchmark and 88.6 on OmniDocBench, while having the highest throughput of any open source OCR model.

This post is a brief, practical write-up of what we built, why we built it this way, and what we learned along the way.

The problem: why do perception systems end up as pipelines?

Many open-vocabulary perception systems are built as modular pipelines: a (often frozen) vision backbone extracts features, a separate fusion/decoder stage combines them with language, and additional components handle matching and post-processing. This family...
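To make the TL;DR's "one sequence using a hybrid attention mask" more concrete, here is a minimal sketch of what such a mask could look like. The excerpt does not spell out the exact masking rule, so everything specific here is an assumption for illustration: image patch tokens come first and attend to each other bidirectionally, while text tokens attend causally to all image tokens and to earlier text tokens. The function name hybrid_attention_mask and its parameters are hypothetical and are not part of any Falcon Perception API.

```python
def hybrid_attention_mask(num_image_tokens: int, num_text_tokens: int) -> list[list[bool]]:
    """Build a boolean mask where mask[q][k] is True if query q may attend to key k.

    Assumed layout (not confirmed by the post): image tokens first, then text tokens.
    Assumed rule (not confirmed by the post):
      - image queries see every image token (bidirectional) and no text token;
      - text queries see every image token and text tokens up to themselves (causal).
    """
    total = num_image_tokens + num_text_tokens
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(total):
            if q < num_image_tokens:
                # Image query: bidirectional attention within the image block only.
                mask[q][k] = k < num_image_tokens
            else:
                # Text query: causal attention over the whole prefix, image included.
                mask[q][k] = k <= q
    return mask


if __name__ == "__main__":
    # Print a small mask: 3 image tokens followed by 2 text tokens.
    for row in hybrid_attention_mask(3, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

In an early-fusion design, a single mask of this kind lets one Transformer stack handle both modalities, in contrast to the separate backbone and fusion stages of the pipeline systems described above.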

Originally published on April 01, 2026. Curated by AI News.

