[2604.01554] EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild


Computer Science > Cryptography and Security
arXiv:2604.01554 (cs) [Submitted on 2 Apr 2026]

Title: EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild
Authors: Yiming Fan, Jun Yeon Won, Ding Zhu, Melih Sirlanci, Mahdi Khalili, Carter Yagemann (The Ohio State University)

Abstract: Binary Function Similarity Detection (BFSD) is a core problem in software security, supporting tasks such as vulnerability analysis, malware classification, and patch provenance. Over the past few decades, numerous models and tools have been developed for this application; however, because the field lacks a comprehensive, universal benchmark, researchers have struggled to compare different models effectively. Existing datasets are limited in scope, often focusing on a narrow set of transformations or types of binaries, and fail to reflect the full diversity of real-world applications. We introduce EXHIB, a benchmark comprising five realistic datasets collected from the wild, each highlighting a distinct aspect of the BFSD problem space. We evaluate 9 representative models spanning multiple BFSD paradigms on EXHIB and observe performance degradations of up to 30% on firmware and semantic datasets compared to standard settings, revealing subst…
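The BFSD task the abstract describes is commonly framed as a retrieval problem: embed each binary function into a vector, then rank candidate functions by similarity to a query. The sketch below illustrates that framing only; it is not code from the paper, and the function names and toy embeddings are assumptions for demonstration.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Standard cosine similarity between two function embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_candidates(query: np.ndarray, pool: list) -> list:
    # Return indices into `pool`, ordered from most to least similar
    # to the query embedding. Real BFSD models differ mainly in how
    # the embeddings are produced, not in this ranking step.
    scores = [cosine_similarity(query, c) for c in pool]
    return sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)

# Toy example: the second pool entry points nearly the same direction
# as the query, so it should rank first.
query = np.array([1.0, 0.0])
pool = [np.array([0.0, 1.0]), np.array([0.9, 0.1])]
ranking = rank_candidates(query, pool)
```

Benchmarks like EXHIB then score such rankings (e.g., whether the true match of the query function appears at rank 1) across datasets that vary compilers, optimization levels, and target platforms.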

Originally published on April 03, 2026. Curated by AI News.
