[2602.16061] Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models

arXiv - Machine Learning · 4 min read

Summary

This paper presents a framework for partial identification of population quantities under missing data, using predictions from pretrained models as weak shadow variables to tighten the bounds on the estimand.

Why It Matters

Estimating outcomes accurately in the presence of missing data is a core problem for researchers and practitioners in the social sciences and machine learning. This study introduces a method that leverages pretrained models to narrow estimation bounds when data are missing not at random, which could change how such analyses are done across fields.

Key Takeaways

  • Introduces a framework for partial identification using weak shadow variables.
  • Demonstrates how pretrained models can tighten estimation bounds effectively.
  • Provides a set-expansion estimator with valid coverage of the identified set.
  • Shows significant reduction in identification intervals using LLM predictions.
  • Addresses challenges of missing not at random (MNAR) data in practical scenarios.

Statistics > Machine Learning
arXiv:2602.16061 (stat) [Submitted on 17 Feb 2026]

Title: Partial Identification under Missing Data Using Weak Shadow Variables from Pretrained Models
Authors: Hongyu Chen, David Simchi-Levi, Ruoxuan Xiong

Abstract: Estimating population quantities such as mean outcomes from user feedback is fundamental to platform evaluation and social science, yet feedback is often missing not at random (MNAR): users with stronger opinions are more likely to respond, so standard estimators are biased and the estimand is not identified without additional assumptions. Existing approaches typically rely on strong parametric assumptions or bespoke auxiliary variables that may be unavailable in practice. In this paper, we develop a partial identification framework in which sharp bounds on the estimand are obtained by solving a pair of linear programs whose constraints encode the observed data structure. This formulation naturally incorporates outcome predictions from pretrained models, including large language models (LLMs), as additional linear constraints that tighten the feasible set. We call these predictions weak shadow variables: they satisfy a conditional independence assumption with respect to missingness but need not meet the completeness conditions required by classical ...
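The abstract's linear-programming formulation can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the outcome is taken to be discrete, every probability below is made up for the example, and `scipy.optimize.linprog` stands in for whichever solver the authors use. The shadow-variable rows encode the conditional independence of the prediction Z and the missingness indicator R given the outcome Y that the abstract describes.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative discrete setup: outcome Y takes values 1..5 (e.g. star ratings).
y_vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Quantities estimable from observed data (hypothetical values):
p_respond = 0.6              # P(R = 1)
mean_respondents = 4.2       # E[Y | R = 1]

# Unknown: q[y] = P(Y = y | R = 0). The estimand decomposes as
#   E[Y] = p_respond * mean_respondents + (1 - p_respond) * (y_vals @ q)

# The feasible set starts as the probability simplex: q >= 0, sum(q) = 1.
A_eq = [np.ones(5)]
b_eq = [1.0]

# A weak shadow variable Z (e.g. an LLM's prediction of Y) satisfies
# Z independent of R given Y, so P(Z = z | R = 0) = sum_y P(Z = z | Y = y) q[y].
# P(Z = z | Y = y) is estimable on respondents; P(Z = z | R = 0) is observed.
# These equalities become extra linear constraints that shrink the feasible set.
p_z_given_y = np.array([
    [0.7, 0.2, 0.1, 0.0, 0.0],   # P(Z = low  | Y = y)
    [0.3, 0.6, 0.5, 0.3, 0.1],   # P(Z = mid  | Y = y)
    [0.0, 0.2, 0.4, 0.7, 0.9],   # P(Z = high | Y = y)
])
p_z_nonresp = np.array([0.30, 0.40, 0.30])  # P(Z = z | R = 0)

A_eq = np.vstack([A_eq, p_z_given_y])
b_eq = np.concatenate([b_eq, p_z_nonresp])

def bound(sign):
    """Optimize sign * E[Y | R = 0] over the constrained simplex."""
    res = linprog(c=sign * y_vals, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * 5, method="highs")
    return sign * res.fun

lo = p_respond * mean_respondents + (1 - p_respond) * bound(+1.0)
hi = p_respond * mean_respondents + (1 - p_respond) * bound(-1.0)
print(f"identified interval for E[Y]: [{lo:.3f}, {hi:.3f}]")
# Without the shadow-variable rows, the same pair of LPs returns the trivial
# interval 2.52 + 0.4 * [1, 5] = [2.92, 4.52].
```

In this toy instance no single outcome value for nonrespondents is consistent with the observed distribution of Z, so the shadow-variable constraints cut the feasible simplex down to a small polytope and the identified interval is far tighter than the worst-case bounds, mirroring the "significant reduction in identification intervals" claimed in the takeaways.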

