Data Science

Data analysis, statistics, and data engineering

Top This Week

LLMs

[R] Forced Depth Consideration Reduces Type II Errors in LLM Self-Classification: Evidence from an Exploration Prompting Ablation Study - (200 trap prompts, 4 models, 8 Step-0 variants) [R]

LLM-based task classifiers tend to misroute prompts that look simple at first glance but require deeper understanding - I call it "Type I...

Reddit - Machine Learning · 1 min ·
Machine Learning

Anyone have an S3-compatible store that actually saturates H100s without the AWS egress tax? [R]

We’re training on a cluster at Lambda Labs, but our main dataset (over 40 TB) is sitting in AWS S3. The egress fees are high, so we tried...

Reddit - Machine Learning · 1 min ·
AI Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min ·

All Content

LLMs

[2602.13209] LemonadeBench: Evaluating the Economic Intuition of Large Language Models in Simple Markets

The paper presents LemonadeBench, a benchmark for assessing the economic intuition of large language models (LLMs) through a simulated le...

arXiv - AI · 3 min ·
Machine Learning

[2602.15816] Developing AI Agents with Simulated Data: Why, what, and how?

This article discusses the significance of synthetic data generation through simulation for training AI agents, addressing challenges and...

arXiv - AI · 3 min ·
LLMs

[2602.15791] Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings

This article presents a novel approach to enhance building semantics preservation in AI model training using large language model encodin...

arXiv - AI · 4 min ·
AI Agents

[2602.15635] On inferring cumulative constraints

This paper presents a method for inferring cumulative constraints in scheduling problems, improving search performance and generating new...

arXiv - AI · 3 min ·
Machine Learning

[2602.15306] Sparse Additive Model Pruning for Order-Based Causal Structure Learning

This paper presents a novel pruning method for causal structure learning using sparse additive models, improving computational efficiency...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.15277] Accelerating Large-Scale Dataset Distillation via Exploration-Exploitation Optimization

This paper presents Exploration-Exploitation Distillation (E^2D), a method for efficient large-scale dataset distillation that balances a...

arXiv - Machine Learning · 4 min ·
NLP

[2602.15553] RUVA: Personalized Transparent On-Device Graph Reasoning

The paper presents RUVA, a novel architecture for personalized on-device graph reasoning that enhances user control over AI-generated con...

arXiv - AI · 3 min ·
Machine Learning

[2602.15531] GenAI-LA: Generative AI and Learning Analytics Workshop (LAK 2026), April 27--May 1, 2026, Bergen, Norway

The article presents the GenAI-LA workshop focusing on Generative AI and Learning Analytics, scheduled for April 27-May 1, 2026, in Berge...

arXiv - AI · 4 min ·
Computer Vision

[2602.15181] Time-Archival Camera Virtualization for Sports and Visual Performances

This paper presents a novel approach to camera virtualization for sports and visual performances, enabling photorealistic rendering from ...

arXiv - Machine Learning · 4 min ·
Robotics

[2602.15169] Learning the S-matrix from data: Rediscovering gravity from gauge theory via symbolic regression

This article presents a novel approach using symbolic regression to reconstruct key analytic structures in scattering amplitudes from num...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2602.15161] Exploiting Layer-Specific Vulnerabilities to Backdoor Attack in Federated Learning

This paper presents the Layer Smoothing Attack (LSA), a novel backdoor attack exploiting layer-specific vulnerabilities in federated lear...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.15298] X-MAP: eXplainable Misclassification Analysis and Profiling for Spam and Phishing Detection

The paper presents X-MAP, a framework for analyzing and profiling misclassifications in spam and phishing detection, enhancing interpreta...

arXiv - AI · 3 min ·
Machine Learning

[2602.15154] Loss Knows Best: Detecting Annotation Errors in Videos via Loss Trajectories

The paper presents a novel method for detecting annotation errors in video datasets by analyzing loss trajectories, enhancing model train...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.15091] Mixture-of-Experts under Finite-Rate Gating: Communication--Generalization Trade-offs

This paper explores the trade-offs in Mixture-of-Experts (MoE) architectures under finite-rate gating, focusing on communication efficien...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2602.15270] Enhancing Diversity and Feasibility: Joint Population Synthesis from Multi-source Data Using Generative Models

This paper presents a novel method for generating synthetic populations using multi-source data and Wasserstein Generative Adversarial Ne...

arXiv - AI · 4 min ·
Machine Learning

[2602.15136] Universal priors: solving empirical Bayes via Bayesian inference and pretraining

The paper explores how a pretrained transformer can effectively solve empirical Bayes problems by leveraging universal priors, demonstrat...

arXiv - Machine Learning · 3 min ·
Data Science

[2602.15088] IT-DPC-SRI: A Cloud-Optimized Archive of Italian Radar Precipitation (2010-2025)

The article presents IT-DPC-SRI, a comprehensive cloud-optimized archive of Italian radar precipitation data from 2010 to 2025, addressin...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.15248] Predicting Invoice Dilution in Supply Chain Finance with Leakage Free Two Stage XGBoost, KAN (Kolmogorov Arnold Networks), and Ensemble Models

This paper presents a machine learning framework to predict invoice dilution in supply chain finance, utilizing advanced models like XGBo...

arXiv - AI · 3 min ·
Machine Learning

[2602.15087] StrokeNeXt: A Siamese-encoder Approach for Brain Stroke Classification in Computed Tomography Imagery

StrokeNeXt introduces a Siamese-encoder model for classifying brain strokes in CT images, achieving high accuracy and low misclassificati...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.15084] TokaMind: A Multi-Modal Transformer Foundation Model for Tokamak Plasma Dynamics

TokaMind is a new open-source multi-modal transformer model designed for tokamak plasma dynamics, demonstrating superior performance on f...

arXiv - Machine Learning · 4 min ·