[2602.16061] Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models
Summary
This paper develops a framework for partial identification of population quantities under missing-not-at-random data: sharp bounds on the estimand are computed by solving a pair of linear programs, and outcome predictions from pretrained models act as weak shadow variables that enter as additional constraints to tighten those bounds.
Why It Matters
Estimating outcomes when data are missing not at random is a core challenge in platform evaluation and the social sciences: respondents are rarely representative of the full population, so standard estimators are biased. This study shows that readily available predictions from pretrained models can stand in for the bespoke auxiliary variables that classical identification strategies require, making credible bounds attainable in settings where point identification is out of reach.
Key Takeaways
- Introduces a framework for partial identification using weak shadow variables.
- Demonstrates how predictions from pretrained models enter as linear constraints that tighten the identification bounds.
- Provides a set-expansion estimator that widens the estimated set to guarantee valid coverage of the identified set.
- Shows significant reduction in identification intervals using LLM predictions.
- Addresses challenges of missing not at random (MNAR) data in practical scenarios.
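The set-expansion idea in the takeaways can be sketched generically. This is not the paper's specific estimator, just the standard interval-widening heuristic for inference on a partially identified parameter, assuming asymptotically normal estimates of the two endpoints:

```python
import math

def expand_interval(lo_hat, hi_hat, se_lo, se_hi, z=1.96):
    """Widen the estimated bounds outward by z standard errors of each
    endpoint, so the expanded set covers the identified set with roughly
    95% confidence. A generic sketch, not the paper's estimator."""
    return lo_hat - z * se_lo, hi_hat + z * se_hi

# Hypothetical estimated bounds [0.50, 0.57] with endpoint SEs of 0.01.
lo95, hi95 = expand_interval(0.50, 0.57, 0.01, 0.01)
```

The expansion is one-sided at each endpoint: the lower bound only moves down and the upper bound only moves up, so sampling noise can never shrink the reported set below the estimated identified set.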
Statistics > Machine Learning
arXiv:2602.16061 (stat)
[Submitted on 17 Feb 2026]

Title: Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models
Authors: Hongyu Chen, David Simchi-Levi, Ruoxuan Xiong

Abstract: Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard estimators are biased and the estimand is not identified without additional assumptions. Existing approaches typically rely on strong parametric assumptions or bespoke auxiliary variables that may be unavailable in practice. In this paper, we develop a partial identification framework in which sharp bounds on the estimand are obtained by solving a pair of linear programs whose constraints encode the observed data structure. This formulation naturally incorporates outcome predictions from pretrained models, including large language models (LLMs), as additional linear constraints that tighten the feasible set. We call these predictions weak shadow variables: they satisfy a conditional independence assumption with respect to missingness but need not meet the completeness conditions required by classical ...
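The abstract's linear-programming formulation can be illustrated on a toy problem (the discretization and all numbers below are illustrative assumptions, not from the paper). With a discrete outcome Y and a binary model prediction Z assumed independent of the response indicator R given Y, the conditional independence turns into linear matching constraints on the unknown nonresponder distribution q0[y] = P(Y=y, R=0), and minimizing/maximizing E[Y] over the resulting feasible set yields the bounds:

```python
import itertools
import numpy as np

def lp_bounds(c, A_eq, b_eq):
    """Min and max of c @ q over {q >= 0 : A_eq @ q = b_eq}, found by
    brute-force enumeration of basic feasible solutions. Fine for toy
    sizes; a real implementation would call an LP solver."""
    m, n = A_eq.shape
    vals = []
    for cols in itertools.combinations(range(n), m):
        B = A_eq[:, cols]
        if abs(np.linalg.det(B)) < 1e-12:
            continue  # singular basis, skip
        q_basis = np.linalg.solve(B, b_eq)
        if (q_basis < -1e-9).any():
            continue  # infeasible (negative mass)
        q = np.zeros(n)
        q[list(cols)] = q_basis
        vals.append(float(c @ q))
    return min(vals), max(vals)

# Toy MNAR setup (all numbers hypothetical). The shadow variable Z is a
# binary prediction from a pretrained model, observed for everyone.
y_vals = np.array([0.0, 0.5, 1.0])
p1 = np.array([0.20, 0.20, 0.20])        # P(Y=y, R=1), identified from responders
pz1_given_y = np.array([0.2, 0.6, 0.8])  # P(Z=1 | Y=y), identified from responders
m0 = np.array([0.16, 0.24])              # P(Z=z, R=0) for z=0,1, observed

# Z independent of R given Y makes the matching constraints
#   sum_y P(Z=z | Y=y) * q0[y] = P(Z=z, R=0)
# linear in the unknown q0, so the sharp bounds are a pair of LPs.
A_eq = np.vstack([1.0 - pz1_given_y, pz1_given_y])
lo_q, hi_q = lp_bounds(y_vals, A_eq, m0)

responder_part = float(y_vals @ p1)  # contribution of observed outcomes
lo, hi = responder_part + lo_q, responder_part + hi_q
print(f"identified interval for E[Y]: [{lo:.4f}, {hi:.4f}]")
```

Without the shadow-variable constraints the feasible set would only be pinned down by total nonresponder mass, giving the much wider worst-case (Manski-style) bounds; the Z constraints shrink the feasible polytope and hence the interval.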