[2602.20698] High-Dimensional Robust Mean Estimation with Untrusted Batches

Summary

This paper presents algorithms for high-dimensional mean estimation in collaborative settings where data may come from untrusted sources, addressing challenges posed by adversarial users.

Why It Matters

As data increasingly comes from diverse and potentially malicious sources, understanding robust mean estimation is crucial for ensuring accuracy in machine learning applications. This research provides insights into handling adversarial data, which is vital for developing resilient AI systems.

Key Takeaways

  • The study introduces a double corruption model for mean estimation involving adversarial and heterogeneous data sources.
  • Two Sum-of-Squares based algorithms are proposed to address the challenges of high-dimensional data corruption.
  • The algorithms achieve a minimax-optimal error rate, highlighting the balance between adversarial influence and statistical heterogeneity.
  • The research emphasizes the importance of batch structure in mitigating the impact of adversarial users.
  • Findings are relevant for applications in AI where data integrity is critical, such as in finance and healthcare.

Computer Science > Machine Learning, arXiv:2602.20698 (cs), submitted on 24 Feb 2026

Authors: Maryam Aliakbarpour, Vladimir Braverman, Yuhan Liu, Junze Yin

Abstract

We study high-dimensional mean estimation in a collaborative setting where data is contributed by $N$ users in batches of size $n$. In this environment, a learner seeks to recover the mean $\mu$ of a true distribution $P$ from a collection of sources that are both statistically heterogeneous and potentially malicious. We formalize this challenge through a double corruption landscape: an $\varepsilon$-fraction of users are entirely adversarial, while the remaining "good" users provide data from distributions that are related to $P$, but deviate by a proximity parameter $\alpha$.

Unlike existing work on the untrusted batch model, which typically measures this deviation via total variation distance in discrete settings, we address the continuous, high-dimensional regime under two natural variants for deviation: (1) good batches are drawn from distributions with a mean-shift of $\sqrt{\alpha}$, or (2) an $\alpha$-fraction of samples within each good batch are adversarially corrupted. In particular, the second model presents significant new challenges: in high dimensions, unlike discrete settings, even ...
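To make the double corruption setup in the abstract concrete, the following is a minimal simulation sketch. It is not the paper's method: the function name `sample_untrusted_batches`, the choice $P = \mathcal{N}(\mu, I)$, the adversary that places its points near a fixed far-away location, and the reading of "mean-shift of $\sqrt{\alpha}$" as a shift of Euclidean norm $\sqrt{\alpha}$ are all assumptions made for illustration. The estimators compared at the end (the average of batch means and the coordinate-wise median of batch means) are simple baselines, not the Sum-of-Squares algorithms the paper proposes.

```python
import numpy as np


def sample_untrusted_batches(N, n, d, eps, alpha, variant, rng):
    """Draw N batches of n points in R^d under a toy double corruption model.

    Assumptions (not from the paper): P = N(mu, I) with mu = 0, and a crude
    adversary that places corrupted points around the value 10 in every
    coordinate.
    Returns (batches, is_adversarial, mu).
    """
    mu = np.zeros(d)

    # An eps-fraction of users are entirely adversarial.
    num_bad = int(np.floor(eps * N))
    is_adversarial = np.zeros(N, dtype=bool)
    is_adversarial[rng.choice(N, size=num_bad, replace=False)] = True

    batches = np.empty((N, n, d))
    for i in range(N):
        if is_adversarial[i]:
            # Hypothetical adversary: the whole batch sits far from mu.
            batches[i] = 10.0 + rng.standard_normal((n, d))
        elif variant == "mean_shift":
            # Variant (1): the good batch's mean is shifted by sqrt(alpha)
            # in a random direction.
            direction = rng.standard_normal(d)
            direction /= np.linalg.norm(direction)
            shifted_mean = mu + np.sqrt(alpha) * direction
            batches[i] = shifted_mean + rng.standard_normal((n, d))
        elif variant == "sample_corruption":
            # Variant (2): an alpha-fraction of samples inside the good
            # batch are replaced by corrupted points.
            batch = mu + rng.standard_normal((n, d))
            k = int(np.floor(alpha * n))
            batch[rng.choice(n, size=k, replace=False)] = 10.0
            batches[i] = batch
        else:
            raise ValueError(f"unknown variant: {variant!r}")
    return batches, is_adversarial, mu


rng = np.random.default_rng(0)
batches, is_adv, mu = sample_untrusted_batches(
    N=200, n=50, d=20, eps=0.1, alpha=0.04, variant="mean_shift", rng=rng
)

# Two simple baselines computed from the per-batch means.
batch_means = batches.mean(axis=1)             # shape (N, d)
naive_estimate = batch_means.mean(axis=0)      # pulled toward the adversary
median_estimate = np.median(batch_means, axis=0)

naive_err = np.linalg.norm(naive_estimate - mu)
median_err = np.linalg.norm(median_estimate - mu)
print(f"average of batch means: error {naive_err:.3f}")
print(f"coordinate-wise median: error {median_err:.3f}")
```

Against this particular crude adversary the plain average is dragged far from $\mu$, while the coordinate-wise median stays much closer. Passing `variant="sample_corruption"` switches to the second deviation model, where corruption also occurs inside every good batch.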
