Top Data Science This Month
The most engaging data science content from this month, curated by AI News.
-
1
[D] Solving the "Liquid-Solid Interface" Problem: 116 High-Fidelity Datasets of Coastal Physics (Waves, Saturated Sand, Light Transport)
Modern generative models (Sora, Runway, Kling) still struggle with the complex physics of the shoreline. I’ve spent months capturing 116 datasets from the Arabian Sea to document phenomena that are...
Reddit - Machine Learning · 5 days ago -
2
[2503.11832] Safety Mirage: How Spurious Correlations Undermine VLM Safety Fine-Tuning and Can Be Mitigated by Machine Unlearning
arXiv - Machine Learning · 24 days ago -
3
[2505.04733] Conformal Prediction with Corrupted Labels: Uncertain Imputation and Robust Re-weighting
This paper presents a framework for robust uncertainty quantification in machine learning when training data is corrupted. It introduces methods for re-weighting data and imputing labels to maintai...
arXiv - Machine Learning · 28 days ago -
4
[2602.22249] Improving Spatial Allocation for Energy System Coupling with Graph Neural Networks
This paper presents a novel approach using Heterogeneous Graph Neural Networks to improve spatial allocation in energy system coupling, addressing challenges of mismatched spatial resolutions.
arXiv - Machine Learning · 28 days ago -
5
AI Industry Questions
High school student J. Rollins seeks insights into careers in artificial intelligence, focusing on education, responsibilities, challenges, and growth opportunities in the industry.
Reddit - Artificial Intelligence · 28 days ago -
6
OpenAI Fires an Employee for Prediction Market Insider Trading | WIRED
OpenAI has terminated an employee for insider trading on prediction markets, raising concerns about the ethical implications of using confidential information for personal gain.
Wired - AI · 28 days ago -
7
[2602.10195] Versor: A Geometric Sequence Architecture
The paper introduces Versor, a novel geometric sequence architecture that leverages Conformal Geometric Algebra for enhanced performance and interpretability in machine learning tasks.
arXiv - Machine Learning · 28 days ago -
8
Advice Needed: What AI/ML Topic Would Be Most Useful for a Tech Talk to a Non-ML Tech Team? [D]
A PhD student seeks advice on AI/ML topics suitable for a tech talk aimed at a non-ML tech team in a manufacturing company, emphasizing practical applications.
Reddit - Machine Learning · 27 days ago -
9
Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine‑Tuning, and On‑Device Optimizations
A blog post by NXP on Hugging Face.
Hugging Face Blog · 22 days ago -
10
[2602.22758] Decomposing Physician Disagreement in HealthBench
This paper analyzes physician disagreement in the HealthBench dataset, identifying key factors contributing to variance in evaluations and suggesting improvements for medical AI assessments.
arXiv - AI · 28 days ago -
11
[2602.22962] Scaling Laws of Global Weather Models
This article examines the scaling laws of global weather models, focusing on the relationship between model performance, dataset size, and compute budget, revealing insights into optimizing weather...
arXiv - Machine Learning · 28 days ago -
12
[2602.23089] Physics-informed neural particle flow for the Bayesian update step
This paper introduces a physics-informed neural particle flow method for the Bayesian update step, addressing computational challenges in high-dimensional nonlinear estimation.
arXiv - Machine Learning · 28 days ago -
13
[2602.23329] LLM Novice Uplift on Dual-Use, In Silico Biology Tasks
This article examines the effectiveness of large language models (LLMs) in enhancing novice users' performance on complex biological tasks, revealing significant accuracy improvements over traditio...
arXiv - AI · 28 days ago -
14
[2602.23159] Benchmarking Temporal Web3 Intelligence: Lessons from the FinSurvival 2025 Challenge
The paper presents the FinSurvival 2025 Challenge, focusing on benchmarking temporal Web3 intelligence using 21.8 million transaction records from the Aave v3 protocol to enhance understanding of u...
arXiv - Machine Learning · 28 days ago -
15
[2602.23280] Physics Informed Viscous Value Representations
This paper presents a novel approach to offline goal-conditioned reinforcement learning by introducing a physics-informed regularization derived from the viscosity solution of the Hamilton-Jacobi-B...
arXiv - Machine Learning · 28 days ago -
16
[2602.22381] Enhancing Renal Tumor Malignancy Prediction: Deep Learning with Automatic 3D CT Organ Focused Attention
This article presents a novel deep learning framework for predicting malignancy in renal tumors using 3D CT images, eliminating the need for manual segmentation and improving predictive accuracy.
arXiv - AI · 28 days ago -
17
[2602.22895] SPD Learn: A Geometric Deep Learning Python Library for Neural Decoding Through Trivialization
SPD Learn is a new Python library designed for geometric deep learning, specifically for neural decoding using symmetric positive definite matrices, enhancing reproducibility and integration in mac...
arXiv - Machine Learning · 28 days ago -
18
[2602.22903] PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA
The paper presents PSQE, a method for enhancing pseudo seed quality in unsupervised multimodal entity alignment, addressing challenges in data integration for large language models.
arXiv - Machine Learning · 28 days ago -
19
[2602.22985] Kernel Integrated $R^2$: A Measure of Dependence
The paper introduces Kernel Integrated $R^2$, a novel statistical measure of dependence that enhances the integrated $R^2$ by utilizing reproducing kernel Hilbert spaces, allowing for analysis of c...
arXiv - Machine Learning · 28 days ago -
20
[2602.22710] Same Words, Different Judgments: Modality Effects on Preference Alignment
This study explores how modality affects preference alignment in AI systems, comparing human and synthetic evaluations of audio and text content. It finds that audio ratings are reliable but exhibi...
arXiv - AI · 28 days ago -
21
[2602.23012] Sequential Regression for Continuous Value Prediction using Residual Quantization
This article presents a novel approach to continuous value prediction using a residual quantization framework, enhancing prediction accuracy in recommendation systems.
arXiv - Machine Learning · 28 days ago -
22
[2602.23013] SubspaceAD: Training-Free Few-Shot Anomaly Detection via Subspace Modeling
The paper introduces SubspaceAD, a training-free method for few-shot anomaly detection that utilizes subspace modeling to achieve state-of-the-art results without complex training processes.
arXiv - Machine Learning · 28 days ago -
23
[2602.23023] Low-degree Lower bounds for clustering in moderate dimension
This paper explores the clustering of points from a mixture of isotropic Gaussians in moderate dimensions, establishing new polynomial lower bounds and presenting a novel algorithm for better perfo...
arXiv - Machine Learning · 28 days ago -
24
[2602.22740] AMLRIS: Alignment-aware Masked Learning for Referring Image Segmentation
The paper presents AMLRIS, a novel training strategy for Referring Image Segmentation (RIS) that enhances object segmentation through alignment-aware masked learning, achieving state-of-the-art res...
arXiv - AI · 28 days ago -
25
[2602.23079] Assessing Deanonymization Risks with Stylometry-Assisted LLM Agent
This article introduces a novel LLM agent designed to assess and mitigate deanonymization risks in textual data using a method called SALA, which combines stylometric features with LLM reasoning.
arXiv - Machine Learning · 28 days ago -
26
[2602.23132] From Agnostic to Specific: Latent Preference Diffusion for Multi-Behavior Sequential Recommendation
This paper presents FatsMB, a novel framework for Multi-Behavior Sequential Recommendation (MBSR) that enhances user preference modeling by transitioning from behavior-agnostic to behavior-specific...
arXiv - Machine Learning · 28 days ago -
27
[2602.22828] TCM-DiffRAG: Personalized Syndrome Differentiation Reasoning Method for Traditional Chinese Medicine based on Knowledge Graph and Chain of Thought
The article presents TCM-DiffRAG, a novel reasoning framework for Traditional Chinese Medicine (TCM) that enhances diagnosis through knowledge graphs and chain of thought methodologies.
arXiv - AI · 28 days ago -
28
[2602.23277] Zeroth-Order Stackelberg Control in Combinatorial Congestion Games
This article presents the ZO-Stackelberg method for optimizing network parameters in combinatorial congestion games, enhancing efficiency in achieving equilibrium without requiring differentiation ...
arXiv - Machine Learning · 28 days ago -
29
[2602.23295] ManifoldGD: Training-Free Hierarchical Manifold Guidance for Diffusion-Based Dataset Distillation
The paper presents ManifoldGD, a training-free framework for dataset distillation using hierarchical manifold guidance, improving efficiency and fidelity in data generation.
arXiv - Machine Learning · 28 days ago -
30
[2603.01986] Accurate, private, secure, federated U-statistics with higher degree
arXiv - Machine Learning · 24 days ago