Data Science

Data analysis, statistics, and data engineering

Top This Week

LLMs

[P] I built an autonomous ML agent that runs experiments on tabular data indefinitely - inspired by Karpathy's AutoResearch

Inspired by Andrej Karpathy's AutoResearch, I built a system where Claude Code acts as an autonomous ML researcher on tabular binary clas...

Reddit - Machine Learning · 1 min · about 4 hours ago

Machine Learning

[D] Data curation and targeted replacement as a pre-training alignment and controllability method

Hi, r/MachineLearning: has much research been done in large-scale training scenarios where undesirable data has been replaced before trai...

Reddit - Machine Learning · 1 min · about 4 hours ago

Machine Learning

[P] I tested Meta’s brain-response model on posts. It predicted the Elon one almost perfectly.

I built an experimental UI and visualization layer around Meta’s open brain-response model just to see whether this stuff actually works ...

Reddit - Machine Learning · 1 min · about 8 hours ago

All Content

LLMs

[2511.09396] Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

Abstract page for arXiv paper 2511.09396: Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

arXiv - AI · 3 min · 25 days ago

Machine Learning

[2509.13298] QDFlow: A Python package for physics simulations of quantum dot devices

Abstract page for arXiv paper 2509.13298: QDFlow: A Python package for physics simulations of quantum dot devices

arXiv - Machine Learning · 4 min · 25 days ago

LLMs

[2511.03441] CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field

Abstract page for arXiv paper 2511.03441: CareMedEval dataset: Evaluating Critical Appraisal and Reasoning in the Biomedical Field

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2508.09844] On the Generalization Limits of Quantum Generative Adversarial Networks with Pure State Generators

Abstract page for arXiv paper 2508.09844: On the Generalization Limits of Quantum Generative Adversarial Networks with Pure State Generators

arXiv - Machine Learning · 3 min · 25 days ago

LLMs

[2510.24702] Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Abstract page for arXiv paper 2510.24702: Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

arXiv - AI · 4 min · 25 days ago

LLMs

[2510.24178] MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations

Abstract page for arXiv paper 2510.24178: MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2505.21574] Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models

Abstract page for arXiv paper 2505.21574: Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models

arXiv - Machine Learning · 4 min · 25 days ago

Data Science

[2505.19328] BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change

Abstract page for arXiv paper 2505.19328: BAH Dataset for Ambivalence/Hesitancy Recognition in Videos for Digital Behavioural Change

arXiv - Machine Learning · 4 min · 25 days ago

LLMs

[2509.25541] Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Abstract page for arXiv paper 2509.25541: Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

arXiv - AI · 4 min · 25 days ago

Data Science

[2502.17244] A dataset of high-resolution plantar pressures for gait analysis across varying footwear and walking speeds

Abstract page for arXiv paper 2502.17244: A dataset of high-resolution plantar pressures for gait analysis across varying footwear and wa...

arXiv - Machine Learning · 4 min · 25 days ago

Data Science

[2508.04735] ERDES: A Benchmark Video Dataset for Retinal Detachment and Macular Status Classification in Ocular Ultrasound

Abstract page for arXiv paper 2508.04735: ERDES: A Benchmark Video Dataset for Retinal Detachment and Macular Status Classification in Oc...

arXiv - AI · 4 min · 25 days ago

Data Science

[2508.01222] WebDS: An End-to-End Benchmark for Web-based Data Science

Abstract page for arXiv paper 2508.01222: WebDS: An End-to-End Benchmark for Web-based Data Science

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2511.16849] Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks

Abstract page for arXiv paper 2511.16849: Better audio representations are more brain-like: linking model-brain alignment with performanc...

arXiv - Machine Learning · 4 min · 25 days ago

Machine Learning

[2510.16462] Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making

Abstract page for arXiv paper 2510.16462: Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making

arXiv - Machine Learning · 3 min · 25 days ago

LLMs

[2406.06512] Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

Abstract page for arXiv paper 2406.06512: Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2510.02903] Learning Explicit Single-Cell Dynamics Using ODE Representations

Abstract page for arXiv paper 2510.02903: Learning Explicit Single-Cell Dynamics Using ODE Representations

arXiv - Machine Learning · 4 min · 25 days ago

LLMs

[2509.21465] Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

Abstract page for arXiv paper 2509.21465: Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

arXiv - Machine Learning · 4 min · 25 days ago

Machine Learning

[2506.02168] An Approximation Theory Perspective on Machine Learning

Abstract page for arXiv paper 2506.02168: An Approximation Theory Perspective on Machine Learning

arXiv - Machine Learning · 4 min · 25 days ago

LLMs

[2508.03284] ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Abstract page for arXiv paper 2508.03284: ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

arXiv - AI · 4 min · 25 days ago

LLMs

[2504.20505] MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living

Abstract page for arXiv paper 2504.20505: MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities o...

arXiv - AI · 4 min · 25 days ago

