Data Science

Data analysis, statistics, and data engineering

Top This Week

Machine Learning

[P] ML project (XGBoost + Databricks + MLflow) — how to talk about “production issues” in interviews?

Hey all, I recently built an end-to-end fraud detection project using a large banking dataset: Trained an XGBoost model Used Databricks f...

Reddit - Machine Learning · 1 min ·
Harvard opens more free online courses in AI, data science, programming: Check full list and direct links
Data Science

AI News - General · 9 min ·
UMKC Announces New Master of Science in Artificial Intelligence
AI Infrastructure

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min ·

All Content

[2602.18745] Synthesizing Multimodal Geometry Datasets from Scratch and Enabling Visual Alignment via Plotting Code
LLMs

The paper presents a novel pipeline for synthesizing multimodal geometry datasets, introducing the GeoCode dataset which enhances visual-...

arXiv - AI · 3 min ·
[2602.19785] Unsupervised Anomaly Detection in NSL-KDD Using $β$-VAE: A Latent Space and Reconstruction Error Approach
Machine Learning

This paper presents an unsupervised anomaly detection method using β-VAE on the NSL-KDD dataset, comparing latent space structure and rec...

arXiv - Machine Learning · 3 min ·
[2602.19782] Addressing Instrument-Outcome Confounding in Mendelian Randomization through Representation Learning
NLP

This article presents a novel representation learning framework aimed at addressing instrument-outcome confounding in Mendelian Randomiza...

arXiv - Machine Learning · 3 min ·
[2602.19770] The Confusion is Real: GRAPHIC - A Network Science Approach to Confusion Matrices in Deep Learning
Machine Learning

The paper presents GRAPHIC, a novel approach using network science to analyze confusion matrices in deep learning, enhancing understandin...

arXiv - AI · 4 min ·
[2602.18729] MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment
LLMs

The paper presents MiSCHiEF, a benchmark for evaluating fine-grained image-caption alignment, focusing on safety and cultural contexts, h...

arXiv - AI · 4 min ·
[2602.19685] PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling
Machine Learning

PerturbDiff introduces a novel approach to modeling single-cell responses to perturbations by utilizing a diffusion-based generative proc...

arXiv - AI · 4 min ·
[2602.19661] PaReGTA: An LLM-based EHR Data Encoding Approach to Capture Temporal Information
LLMs

The paper presents PaReGTA, an LLM-based framework for encoding temporal information in electronic health records (EHRs), enhancing patie...

arXiv - Machine Learning · 4 min ·
[2602.19654] NEXUS: A compact neural architecture for high-resolution spatiotemporal air quality forecasting in Delhi National Capital Region
Machine Learning

The paper presents NEXUS, a compact neural architecture designed for high-resolution air quality forecasting in Delhi NCR, achieving impr...

arXiv - Machine Learning · 4 min ·
[2602.19641] Evaluating the Impact of Data Anonymization on Image Retrieval
NLP

This article evaluates how data anonymization affects the performance of Content-Based Image Retrieval (CBIR) systems, highlighting the b...

arXiv - Machine Learning · 4 min ·
[2602.18650] NutriOrion: A Hierarchical Multi-Agent Framework for Personalized Nutrition Intervention Grounded in Clinical Guidelines
LLMs

NutriOrion presents a hierarchical multi-agent framework for personalized nutrition interventions, addressing the complexities of multimo...

arXiv - AI · 4 min ·
[2602.19610] Variational Inference for Bayesian MIDAS Regression
Machine Learning

This paper presents a Coordinate Ascent Variational Inference (CAVI) algorithm for Bayesian MIDAS regression, demonstrating significant s...

arXiv - Machine Learning · 4 min ·
[2602.18589] DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction
Machine Learning

The paper presents DM4CT, a benchmark for evaluating diffusion models in computed tomography (CT) reconstruction, addressing practical ch...

arXiv - AI · 4 min ·
[2602.19591] Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks
Machine Learning

This article presents SME-HGT, a Heterogeneous Graph Transformer framework designed to identify high-potential small and medium enterpris...

arXiv - Machine Learning · 3 min ·
[2602.18585] BloomNet: Exploring Single vs. Multiple Object Annotation for Flower Recognition Using YOLO Variants
Computer Vision

The paper explores the effectiveness of single versus multiple object annotation for flower recognition using various YOLO models, presen...

arXiv - AI · 4 min ·
[2602.19584] Interpolation-Driven Machine Learning Approaches for Plume Shine Dose Estimation: A Comparison of XGBoost, Random Forest, and TabNet
Machine Learning

This article compares interpolation-driven machine learning approaches for plume shine dose estimation, evaluating XGBoost, Random Forest...

arXiv - AI · 4 min ·
[2602.19531] A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations
Machine Learning

This paper presents a novel statistical method for modeling irregular multivariate time series with missing data, demonstrating superior ...

arXiv - AI · 4 min ·
[2602.19552] The Sample Complexity of Replicable Realizable PAC Learning
Machine Learning

This paper explores the sample complexity of replicable realizable PAC learning, establishing a lower bound on sample complexity with nov...

arXiv - Machine Learning · 3 min ·
[2602.18551] From Static Spectra to Operando Infrared Dynamics: Physics Informed Flow Modeling and a Benchmark
Machine Learning

This paper presents a novel approach to predicting operando infrared dynamics in lithium-ion batteries using a physics-informed flow mode...

arXiv - AI · 4 min ·
[2602.19533] Grokking Finite-Dimensional Algebra
Machine Learning

This paper explores the grokking phenomenon in neural networks, focusing on learning multiplication in finite-dimensional algebras, exten...

arXiv - AI · 4 min ·
[2602.19528] Beyond Accuracy: A Unified Random Matrix Theory Diagnostic Framework for Crash Classification Models
Machine Learning

This paper presents a novel diagnostic framework based on Random Matrix Theory for evaluating crash classification models, focusing on ov...

arXiv - Machine Learning · 4 min ·