Data Science

Data analysis, statistics, and data engineering


Top This Week

AI Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min · about 1 hour ago

Machine Learning

Accelerating science with AI and simulations

MIT Professor Rafael Gómez-Bombarelli discusses the transformative potential of AI in scientific research, emphasizing its role in materi...

AI News - General · 10 min · about 1 hour ago

LLMs

[P] I built an autonomous ML agent that runs experiments on tabular data indefinitely - inspired by Karpathy's AutoResearch

Inspired by Andrej Karpathy's AutoResearch, I built a system where Claude Code acts as an autonomous ML researcher on tabular binary clas...

Reddit - Machine Learning · 1 min · about 8 hours ago

All Content

Machine Learning

[2603.03275] Learning Demographic-Conditioned Mobility Trajectories with Aggregate Supervision

Abstract page for arXiv paper 2603.03275: Learning Demographic-Conditioned Mobility Trajectories with Aggregate Supervision

arXiv - Machine Learning · 3 min · 26 days ago

Machine Learning

[2603.03230] SynthCharge: An Electric Vehicle Routing Instance Generator with Feasibility Screening to Enable Learning-Based Optimization and Benchmarking

Abstract page for arXiv paper 2603.03230: SynthCharge: An Electric Vehicle Routing Instance Generator with Feasibility Screening to Enabl...

arXiv - AI · 3 min · 26 days ago

Machine Learning

[2603.03207] I-CAM-UV: Integrating Causal Graphs over Non-Identical Variable Sets Using Causal Additive Models with Unobserved Variables

Abstract page for arXiv paper 2603.03207: I-CAM-UV: Integrating Causal Graphs over Non-Identical Variable Sets Using Causal Additive Mode...

arXiv - Machine Learning · 4 min · 26 days ago

LLMs

[2603.03206] Understanding and Mitigating Dataset Corruption in LLM Steering

Abstract page for arXiv paper 2603.03206: Understanding and Mitigating Dataset Corruption in LLM Steering

arXiv - AI · 4 min · 26 days ago

Machine Learning

[2603.03172] Less Noise, Same Certificate: Retain Sensitivity for Unlearning

Abstract page for arXiv paper 2603.03172: Less Noise, Same Certificate: Retain Sensitivity for Unlearning

arXiv - Machine Learning · 4 min · 26 days ago

Machine Learning

[2603.02411] From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Precision and Compactness

Abstract page for arXiv paper 2603.02411: From Fewer Samples to Fewer Bits: Reframing Dataset Distillation as Joint Optimization of Preci...

arXiv - Machine Learning · 3 min · 26 days ago

NLP

[2603.03056] Incremental Graph Construction Enables Robust Spectral Clustering of Texts

Abstract page for arXiv paper 2603.03056: Incremental Graph Construction Enables Robust Spectral Clustering of Texts

arXiv - Machine Learning · 3 min · 26 days ago

Machine Learning

[2603.02252] Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

Abstract page for arXiv paper 2603.02252: Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics

arXiv - Machine Learning · 3 min · 26 days ago

Machine Learning

[2603.02935] Contextual Latent World Models for Offline Meta Reinforcement Learning

Abstract page for arXiv paper 2603.02935: Contextual Latent World Models for Offline Meta Reinforcement Learning

arXiv - Machine Learning · 3 min · 26 days ago

LLMs

[2603.02840] Adapting Time Series Foundation Models through Data Mixtures

Abstract page for arXiv paper 2603.02840: Adapting Time Series Foundation Models through Data Mixtures

arXiv - Machine Learning · 4 min · 26 days ago

AI Safety

[2603.02756] Rethinking Time Series Domain Generalization via Structure-Stratified Calibration

Abstract page for arXiv paper 2603.02756: Rethinking Time Series Domain Generalization via Structure-Stratified Calibration

arXiv - Machine Learning · 3 min · 26 days ago

Machine Learning

[2603.02212] GLEAN: Grounded Lightweight Evaluation Anchors for Contamination-Aware Tabular Reasoning

Abstract page for arXiv paper 2603.02212: GLEAN: Grounded Lightweight Evaluation Anchors for Contamination-Aware Tabular Reasoning

arXiv - AI · 3 min · 26 days ago

LLMs

[2603.03072] TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

Abstract page for arXiv paper 2603.03072: TikZilla: Scaling Text-to-TikZ with High-Quality Data and Reinforcement Learning

arXiv - AI · 4 min · 26 days ago

NLP

[2603.02702] FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

Abstract page for arXiv paper 2603.02702: FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

arXiv - Machine Learning · 4 min · 26 days ago

LLMs

[2603.02237] Concept Heterogeneity-aware Representation Steering

Abstract page for arXiv paper 2603.02237: Concept Heterogeneity-aware Representation Steering

arXiv - AI · 4 min · 26 days ago

LLMs

[2603.02239] Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

Abstract page for arXiv paper 2603.02239: Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foun...

arXiv - AI · 4 min · 26 days ago

LLMs

[2603.02221] MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Abstract page for arXiv paper 2603.02221: MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabul...

arXiv - AI · 4 min · 26 days ago

LLMs

[2603.02215] RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

Abstract page for arXiv paper 2603.02215: RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchi...

arXiv - AI · 4 min · 26 days ago

LLMs

[D] Quantified analysis of 2,218 Gary Marcus claims - two independent LLM pipelines, scored against evidence

Built a dataset scoring every testable claim from Marcus's 474 Substack posts. Two pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed...

Reddit - Machine Learning · 1 min · 26 days ago

Machine Learning

[P] I trained Qwen2.5-1.5b with RLVR (GRPO) vs SFT and compared benchmark performance

Hello everyone. I trained Qwen2.5-1.5b-Instruct with both RLVR and SFT on the GSM8K dataset and compared the results across GSM8K and MAT...

Reddit - Machine Learning · 1 min · 26 days ago

