[2602.15878] IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation


Summary

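The paper presents IT-OSE, an information-theoretic method for estimating the optimal sample size (OSS) for data augmentation in industrial settings. Compared with empirical estimation, it raises classification accuracy by an average of 4.38% and lowers regression MAPE by an average of 18.80% across baseline models.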

Why It Matters

Understanding optimal sample size in data augmentation is crucial for enhancing model accuracy and efficiency in industrial applications. This research addresses a gap in existing methodologies, providing a theoretical framework and practical solutions that can lead to better resource management and performance in machine learning tasks.

Key Takeaways

  • Improves accuracy in classification tasks by an average of 4.38% across baseline models, compared with empirical estimation.
  • Reduces mean absolute percentage error (MAPE) in regression tasks by an average of 18.80% across baseline models.
  • Estimates the optimal sample size (OSS) while significantly lowering computational and data costs.
  • Introduces an interval coverage and deviation (ICD) score, an intuitive metric for evaluating the estimated OSS.
  • Demonstrates generality across various sensor-based industrial scenarios.
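For readers unfamiliar with the regression metric cited above, the following is a minimal sketch of how MAPE is conventionally computed and how a relative reduction between two models could be expressed. The function names and the toy numbers are illustrative assumptions, not taken from the paper, and this summary does not state whether the reported 18.80% is a relative or an absolute reduction.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent.

    Standard definition: 100 * mean(|(y - y_hat) / y|).
    Undefined when any true value is zero.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Toy data (hypothetical): the same targets predicted by two models.
y_true = [100.0, 200.0, 400.0]
pred_baseline = [110.0, 180.0, 440.0]  # each prediction is off by 10%
pred_improved = [105.0, 190.0, 420.0]  # each prediction is off by 5%

mape_baseline = mape(y_true, pred_baseline)
mape_improved = mape(y_true, pred_improved)
reduction = relative_reduction(mape_baseline, mape_improved)
print(f"baseline MAPE: {mape_baseline:.2f}%")
print(f"improved MAPE: {mape_improved:.2f}%")
print(f"relative reduction: {reduction:.2f}%")
```

Because MAPE divides by the true value, it is unsuitable for targets that can be zero, which is why the sketch rejects such inputs instead of returning a number.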

Computer Science > Machine Learning
arXiv:2602.15878 (cs)
[Submitted on 3 Feb 2026]

Title: IT-OSE: Exploring Optimal Sample Size for Industrial Data Augmentation
Authors: Mingchun Sun, Rongqiang Zhao, Zhennan Huang, Songyu Ding, Jie Liu

Abstract: In industrial scenarios, data augmentation is an effective approach to improve model performance. However, its benefits are not unidirectionally beneficial. There is no theoretical research or established estimation for the optimal sample size (OSS) in augmentation, nor is there an established metric to evaluate the accuracy of OSS or its deviation from the ground truth. To address these issues, we propose an information-theoretic optimal sample size estimation (IT-OSE) to provide reliable OSS estimation for industrial data augmentation. An interval coverage and deviation (ICD) score is proposed to evaluate the estimated OSS intuitively. The relationship between OSS and dominant factors is theoretically analyzed and formulated, thereby enhancing the interpretability. Experiments show that, compared to empirical estimation, the IT-OSE increases accuracy in classification tasks across baseline models by an average of 4.38%, and reduces MAPE in regression tasks across baseline models by an average of 18.80%. The improvements in downstream model performance are more stable. ICDdev in the ICD ...
