Data Science

Data analysis, statistics, and data engineering

Top This Week

Machine Learning

[P] citracer: a small CLI tool to trace where a concept comes from in a citation graph

Hi all, I made a small tool that I've been using for my own literature reviews and figured I'd share in case it's useful to anyone else. ...

Reddit - Machine Learning · 1 min ·
Data Science

What actually makes something the best AI meeting recorder?

I’ve been trying a few meeting tools lately and realized I care way less about flashy summaries than I thought. What I actually want is p...

Reddit - Artificial Intelligence · 1 min ·
LLMs

[D] The Bitter Lesson of Optimization: Why training Neural Networks to update themselves is mathematically brutal (but probably inevitable)

Are we still stuck in the "feature engineering" era of optimization? We trust neural networks to learn unimaginably complex patterns from...

Reddit - Machine Learning · 1 min ·

All Content

[2602.15968] From Reflection to Repair: A Scoping Review of Dataset Documentation Tools
Data Science

This article presents a scoping review of dataset documentation tools, analyzing motivations behind their design and factors affecting th...

arXiv - AI · 4 min ·
[2602.15958] DocSplit: A Comprehensive Benchmark Dataset and Evaluation Approach for Document Packet Recognition and Splitting
Data Science

The paper introduces DocSplit, a benchmark dataset and evaluation framework for document packet recognition and splitting, addressing cha...

arXiv - AI · 4 min ·
[2602.16709] Knowledge-Embedded Latent Projection for Robust Representation Learning
Machine Learning

This article presents a novel knowledge-embedded latent projection model aimed at improving representation learning in high-dimensional d...

arXiv - Machine Learning · 4 min ·
[2602.15923] A fully differentiable framework for training proxy Exchange Correlation Functionals for periodic systems
Machine Learning

This paper presents a fully differentiable framework for integrating machine learning models into Density Functional Theory (DFT) for per...

arXiv - Machine Learning · 4 min ·
[2602.16698] Causality is Key for Interpretability Claims to Generalise
LLMs

This paper discusses the importance of causality in interpretability research for large language models, highlighting pitfalls in general...

arXiv - Machine Learning · 4 min ·
[2602.15919] Generalized Leverage Score for Scalable Assessment of Privacy Vulnerability
Machine Learning

The paper presents a method for assessing privacy vulnerability in machine learning models using a generalized leverage score, enabling e...

arXiv - Machine Learning · 3 min ·
[2602.16697] Protecting the Undeleted in Machine Unlearning
Machine Learning

The paper discusses machine unlearning, focusing on the privacy risks associated with undeleted data when specific data points are remove...

arXiv - Machine Learning · 3 min ·
[2602.16684] Retrieval-Augmented Foundation Models for Matched Molecular Pair Transformations to Recapitulate Medicinal Chemistry Intuition
LLMs

This article presents a novel approach using retrieval-augmented foundation models for matched molecular pair transformations, enhancing ...

arXiv - Machine Learning · 3 min ·
[2602.16673] Neighborhood Stability as a Measure of Nearest Neighbor Searchability
Data Science

The paper introduces two measures for assessing the searchability of datasets in clustering-based Approximate Nearest Neighbor Search (AN...

arXiv - Machine Learning · 3 min ·
[2602.15909] Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Machine Learning

The paper presents Resp-Agent, an innovative agent-based system for generating multimodal respiratory sounds and diagnosing diseases, add...

arXiv - AI · 4 min ·
[2602.16643] Factorization Machine with Quadratic-Optimization Annealing for RNA Inverse Folding and Evaluation of Binary-Integer Encoding and Nucleotide Assignment
Machine Learning

This article presents a novel method using factorization machines with quadratic-optimization annealing (FMQA) to tackle the RNA inverse ...

arXiv - Machine Learning · 4 min ·
[2602.16600] Predicting The Cop Number Using Machine Learning
Machine Learning

This article explores the use of machine learning to predict the cop number in graph theory, demonstrating the effectiveness of classical...

arXiv - Machine Learning · 4 min ·
[2602.15890] Surrogate Modeling for Neutron Transport: A Neural Operator Approach
Machine Learning

This article presents a neural operator framework for surrogate modeling in neutron transport, demonstrating significant computational ef...

arXiv - Machine Learning · 4 min ·
[2602.16596] Sequential Membership Inference Attacks
Machine Learning

The paper presents a novel approach to Membership Inference Attacks (MIAs) by developing an optimal attack strategy, SeMI*, leveraging mo...

arXiv - Machine Learning · 4 min ·
[2602.16579] AIFL: A Global Daily Streamflow Forecasting Model Using Deterministic LSTM Pre-trained on ERA5-Land and Fine-tuned on IFS
Machine Learning

The paper presents AIFL, a deterministic LSTM model for global daily streamflow forecasting, trained on ERA5-Land and fine-tuned on IFS, ...

arXiv - AI · 4 min ·
[2602.16573] MoDE-Boost: Boosting Shared Mobility Demand with Edge-Ready Prediction Models
Machine Learning

The paper presents MoDE-Boost, a novel approach using gradient boosting models to forecast urban mobility demand, enhancing efficiency in...

arXiv - Machine Learning · 4 min ·
[2602.16570] Steering diffusion models with quadratic rewards: a fine-grained analysis
Machine Learning

This article presents a detailed analysis of sampling from reward-tilted diffusion models, focusing on quadratic rewards and their comput...

arXiv - Machine Learning · 4 min ·
[2602.16531] Transfer Learning of Linear Regression with Multiple Pretrained Models: Benefiting from More Pretrained Models via Overparameterization Debiasing
Machine Learning

This paper explores transfer learning in linear regression using multiple pretrained models, highlighting the benefits of overparameteriz...

arXiv - Machine Learning · 3 min ·
[2602.15866] NLP Privacy Risk Identification in Social Media (NLP-PRISM): A Survey
NLP

This survey presents the NLP-PRISM framework for identifying privacy risks in social media NLP applications, analyzing 203 peer-reviewed ...

arXiv - AI · 4 min ·
[2602.16530] FEKAN: Feature-Enriched Kolmogorov-Arnold Networks
Machine Learning

The paper introduces Feature-Enriched Kolmogorov-Arnold Networks (FEKAN), an advanced model that enhances computational efficiency and pr...

arXiv - Machine Learning · 4 min ·