[2603.00063] Measuring What AI Systems Might Do: Towards A Measurement Science in AI

arXiv - Machine Learning March 03, 2026 4 min read

About this article

Abstract page for arXiv paper 2603.00063: Measuring What AI Systems Might Do: Towards A Measurement Science in AI

Computer Science > Computers and Society

arXiv:2603.00063 (cs)

Submitted on 10 Feb 2026

Title: Measuring What AI Systems Might Do: Towards A Measurement Science in AI

Authors: Konstantinos Voudouris, Mirko Thalmann, Alex Kipnis, José Hernández-Orallo, Eric Schulz

Abstract: Scientists, policy-makers, business leaders, and members of the public care about what modern artificial intelligence systems are disposed to do. Yet terms such as capabilities, propensities, skills, values, and abilities are routinely used interchangeably and conflated with observable performance, with AI evaluation practices rarely specifying what quantity they purport to measure. We argue that capabilities and propensities are dispositional properties - stable features of systems characterised by counterfactual relationships between contextual conditions and behavioural outputs. Measuring a disposition requires (i) hypothesising which contextual properties are causally relevant, (ii) independently operationalising and measuring those properties, and (iii) empirically mapping how variation in those properties affects the probability of the behaviour. Dominant approaches to AI evaluation, from benchmark averages to data-driven latent-variable models such as Item Respon...
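Step (iii) of the abstract, mapping how variation in a contextual property affects the probability of a behaviour, can be illustrated with a minimal sketch. This is not the authors' method (the abstract is truncated and does not specify one); it only tabulates the observed rate of a behaviour at each measured level of a single property. The function name `empirical_response_curve`, the integer "level" scale, and the data are all hypothetical.

```python
from collections import defaultdict


def empirical_response_curve(observations):
    """Return {level: observed rate of the behaviour at that level}.

    observations: iterable of (level, behaved) pairs, where `level` is an
    independently measured value of one contextual property (assumed to be
    hashable and sortable) and `behaved` is True/False.
    """
    counts = defaultdict(lambda: [0, 0])  # level -> [occurrences, trials]
    for level, behaved in observations:
        counts[level][0] += int(behaved)
        counts[level][1] += 1
    # Sort by level so the mapping reads as a curve over the property.
    return {level: hits / trials for level, (hits, trials) in sorted(counts.items())}


# Hypothetical data: the behaviour becomes less likely as the level rises.
observations = [
    (1, True), (1, True),
    (2, True), (2, False),
    (3, False), (3, False),
]
curve = empirical_response_curve(observations)
print(curve)  # {1: 1.0, 2: 0.5, 3: 0.0}
```

In this toy data the overall rate is 0.5, a single number that does not show how the behaviour depends on the level; the per-level mapping does.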

Originally published on March 03, 2026. Curated by AI News.

