Data Science

Data analysis, statistics, and data engineering

Top This Week

AI Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min · 42 minutes ago

LLMs

[2603.16629] MLLM-based Textual Explanations for Face Comparison

arXiv - AI · 4 min · about 2 hours ago

Machine Learning

[2603.14267] DiFlowDubber: Discrete Flow Matching for Automated Video Dubbing via Cross-Modal Alignment and Synchronization

arXiv - AI · 4 min · about 2 hours ago

All Content

NLP

[2603.02702] FinTexTS: Financial Text-Paired Time-Series Dataset via Semantic-Based and Multi-Level Pairing

arXiv - Machine Learning · 4 min · 26 days ago

LLMs

[2603.02237] Concept Heterogeneity-aware Representation Steering

arXiv - AI · 4 min · 26 days ago

LLMs

[2603.02239] Engineering Reasoning and Instruction (ERI) Benchmark: A Large Taxonomy-driven Dataset for Foundation Models and Agents

arXiv - AI · 4 min · 26 days ago

LLMs

[2603.02221] MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

arXiv - AI · 4 min · 26 days ago

LLMs

[2603.02215] RxnNano: Training Compact LLMs for Chemical Reaction and Retrosynthesis Prediction via Hierarchical Curriculum Learning

arXiv - AI · 4 min · 26 days ago

LLMs

[D] Quantified analysis of 2,218 Gary Marcus claims - two independent LLM pipelines, scored against evidence

Built a dataset scoring every testable claim from Marcus's 474 Substack posts. Two pipelines (Claude Opus 4.6 and ChatGPT Codex) analyzed...

Reddit - Machine Learning · 1 min · 26 days ago

Machine Learning

[P] I trained Qwen2.5-1.5b with RLVR (GRPO) vs SFT and compared benchmark performance

Hello everyone. I trained Qwen2.5-1.5b-Instruct with both RLVR and SFT on the GSM8K dataset and compared the results across GSM8K and MAT...

Reddit - Machine Learning · 1 min · 26 days ago

Machine Learning

[2510.18516] Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining

arXiv - Machine Learning · 3 min · 27 days ago

Machine Learning

[2510.00504] A universal compression theory for lottery ticket hypothesis and neural scaling laws

arXiv - Machine Learning · 4 min · 27 days ago

Machine Learning

[2507.21783] Domain Generalization and Adaptation in Intensive Care with Anchor Regression

arXiv - Machine Learning · 4 min · 27 days ago

LLMs

[2506.05639] FictionalQA: A Dataset for Studying Memorization and Knowledge Acquisition

arXiv - Machine Learning · 3 min · 27 days ago

Machine Learning

[2503.01441] A Randomized Linearly Convergent Frank-Wolfe-type Method for Smooth Convex Minimization over the Spectrahedron

arXiv - Machine Learning · 3 min · 27 days ago

Machine Learning

[2504.08428] Standardization of Weighted Ranking Correlation Coefficients

arXiv - Machine Learning · 4 min · 27 days ago

Machine Learning

[2503.17592] A Benchmark Dataset for Machine Learning Surrogates of Pore-Scale CO2-Water Interaction

arXiv - Machine Learning · 3 min · 27 days ago

Machine Learning

[2406.04098] A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data

arXiv - Machine Learning · 4 min · 27 days ago

Data Science

[2602.02734] WAXAL: A Large-Scale Multilingual African Language Speech Corpus
