[2603.11687] SemBench: A Universal Semantic Framework for LLM Evaluation

arXiv - AI March 27, 2026 4 min read

Computer Science > Computation and Language
arXiv:2603.11687 (cs)
[Submitted on 12 Mar 2026 (v1), last revised 26 Mar 2026 (this version, v2)]

Title: SemBench: A Universal Semantic Framework for LLM Evaluation
Authors: Mikel Zubillaga, Naiara Perez, Oscar Sainz, German Rigau

Abstract: Recent progress in Natural Language Processing (NLP) has been driven by the emergence of Large Language Models (LLMs), which exhibit remarkable generative and reasoning capabilities. However, despite their success, evaluating the true semantic understanding of these models remains a persistent challenge. Traditional benchmarks such as Word-in-Context (WiC) effectively probe this capability, but their creation is resource-intensive and often limited to high-resource languages. In this paper, we introduce SemBench, a framework for automatically generating synthetic benchmarks that assess the semantic competence of LLMs using only dictionary sense definitions and a sentence encoder. This approach eliminates the need for curated example sentences, making it both scalable and language-independent. We evaluate SemBench in three languages (English, Spanish, and Basque) spanning different levels of linguistic resources, and across a wide range of LLMs. Our results show that rankings derived from SemBench strongly correlate with those obtained fro...
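To make the claim about correlated rankings concrete, here is a minimal sketch of how agreement between two benchmark rankings can be measured. The abstract as excerpted here does not say which statistic the authors use; Spearman rank correlation is assumed only because it is a common choice for comparing model rankings. The function names and all scores are hypothetical placeholders, not results from the paper.

```python
def ranks(scores):
    """Return 1-based ranks (highest score gets rank 1); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of items tied with position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman correlation: Pearson correlation computed on the ranks of a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all scores are tied")
    return cov / (var_a * var_b) ** 0.5


# Hypothetical accuracy scores for four models on two benchmarks
# (placeholder numbers for illustration only).
synthetic_benchmark = [0.9, 0.8, 0.7, 0.6]
reference_benchmark = [0.5, 0.6, 0.4, 0.3]

rho = spearman(synthetic_benchmark, reference_benchmark)
print(f"Spearman rho = {rho:.2f}")
```

A value near 1 means the two benchmarks order the models almost identically, while a value near 0 means the orderings are unrelated. In this toy case the top two models swap places between the benchmarks, which lowers the correlation below 1.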

Originally published on March 27, 2026. Curated by AI News.
