[2603.14830] Dataset Distillation Efficiently Encodes Low-Dimensional Representations from Gradient-Based Learning of Non-Linear Tasks
Computer Science > Machine Learning
arXiv:2603.14830 (cs)
[Submitted on 16 Mar 2026 (v1), last revised 30 Mar 2026 (this version, v2)]

Title: Dataset Distillation Efficiently Encodes Low-Dimensional Representations from Gradient-Based Learning of Non-Linear Tasks
Authors: Yuri Kinoshita, Naoki Nishikawa, Taro Toyoizumi

Abstract: Dataset distillation, a training-aware data compression technique, has recently attracted increasing attention as an effective tool for mitigating the costs of optimization and data storage. However, progress remains largely empirical: the mechanisms by which task-relevant information is extracted from the training process and efficiently encoded into synthetic data points remain elusive. In this paper, we theoretically analyze practical dataset distillation algorithms applied to the gradient-based training of two-layer neural networks of width $L$. By focusing on a non-linear task structure called the multi-index model, we prove that the low-dimensional structure of the problem is efficiently encoded into the resulting distilled data. This dataset reproduces a model with high generalization ability at a required memory complexity of $\tilde{\Theta}(r^2 d + L)$, where $d$ and $r$ are the input and intrinsic dimensions […]
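To make the setting concrete, below is a minimal sketch of one common dataset distillation approach, gradient matching, applied to a two-layer network of width $L$ on a multi-index teacher $y = g(W_\star^\top x)$ with intrinsic dimension $r \ll d$. This is not the paper's exact algorithm; the link function `g`, all dimensions, and the hyperparameters are illustrative assumptions.

```python
# Gradient-matching dataset distillation (illustrative sketch, not the paper's
# exact method): optimize a small synthetic set (X_syn, y_syn) so that the
# training-loss gradient it induces on a fresh two-layer network matches the
# gradient induced by the real multi-index data.
import torch

torch.manual_seed(0)
d, r, L = 100, 2, 64          # input dim, intrinsic dim, hidden width (assumed)
n_real, n_syn = 2048, 16      # real and distilled dataset sizes (assumed)

# Multi-index teacher: labels depend on x only through an r-dimensional projection.
W_star = torch.linalg.qr(torch.randn(d, r)).Q          # orthonormal d x r frame
g = lambda z: torch.tanh(z).sum(dim=1, keepdim=True)   # illustrative link function
X_real = torch.randn(n_real, d)
y_real = g(X_real @ W_star)

def two_layer(params, x):
    """Two-layer tanh network of width L."""
    W1, b1, w2 = params
    return torch.tanh(x @ W1 + b1) @ w2

def init_params():
    """Fresh random initialization, so matching holds across initializations."""
    return [torch.randn(d, L) / d**0.5, torch.zeros(L), torch.randn(L, 1) / L**0.5]

# The distilled data points (and labels) are the optimization variables.
X_syn = torch.randn(n_syn, d, requires_grad=True)
y_syn = torch.randn(n_syn, 1, requires_grad=True)
opt = torch.optim.Adam([X_syn, y_syn], lr=1e-2)
loss_fn = torch.nn.MSELoss()

for step in range(500):
    params = [p.requires_grad_(True) for p in init_params()]
    # Gradient of the training loss on the real data (no graph needed).
    grad_real = torch.autograd.grad(
        loss_fn(two_layer(params, X_real), y_real), params)
    # Gradient on the synthetic data, kept differentiable w.r.t. (X_syn, y_syn).
    grad_syn = torch.autograd.grad(
        loss_fn(two_layer(params, X_syn), y_syn), params, create_graph=True)
    match = sum(((a - b) ** 2).sum() for a, b in zip(grad_syn, grad_real))
    opt.zero_grad()
    match.backward()
    opt.step()
```

After distillation, a fresh width-$L$ network trained only on the $n_{\text{syn}}$ synthetic pairs can be evaluated on held-out multi-index data; the abstract's memory count $\tilde{\Theta}(r^2 d + L)$ corresponds to storing the distilled inputs plus the per-neuron parameters needed to reproduce a well-generalizing model.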