Top Data Science This Week
The most engaging data science content from this week, curated by AI News.
1
[P] Made a dataset but don't know what to do with it
This weekend I was looking for a dataset on major air crashes (I like planes) containing the text of their final reports. Surprisingly, I was unable to find even a single open source dataset matchin...
Reddit - Machine Learning · 2 days ago
2
Accelerating science with AI and simulations
MIT Professor Rafael Gómez-Bombarelli discusses the transformative potential of AI in scientific research, emphasizing its role in material discovery and future innovations.
AI News - General · about 6 hours ago
3
[P] XGBoost + TF-IDF for emotion prediction — good state accuracy but struggling with intensity (need advice)
Hey everyone, I’m working on a small ML project (~1200 samples) where I’m trying to predict: (1) emotional state (classification, 6 classes) and (2) intensity (1–5) of that emotion. The dataset contains: journ...
Reddit - Machine Learning · 6 days ago
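The post pairs TF-IDF text features with XGBoost. As a rough illustration of the feature side only, here is a minimal standard-library TF-IDF sketch. It assumes lowercase whitespace tokenization, raw term counts as TF, the smoothed IDF variant ln((1 + N) / (1 + df)) + 1, and L2 normalization; it is not the poster's code.

```python
import math
from collections import Counter


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per document.

    Assumes lowercase whitespace tokenization, raw counts as TF, and the
    smoothed IDF ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vec = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so longer documents do not dominate by length alone.
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


docs = ["i feel calm today", "i feel anxious and anxious", "today was fine"]
for vec in tfidf(docs):
    # Show the two highest-weighted terms per document.
    print(sorted(vec.items(), key=lambda kv: -kv[1])[:2])
```

Rare, repeated terms (like "anxious" in the second document) end up weighted above common ones (like "i"). This is roughly what scikit-learn's `TfidfVectorizer` computes with its defaults, apart from its different default tokenizer.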
4
[2603.22977] DariMis: Harm-Aware Modeling for Dari Misinformation Detection on YouTube
Abstract page for arXiv paper 2603.22977: DariMis: Harm-Aware Modeling for Dari Misinformation Detection on YouTube
arXiv - AI · 2 days ago
5
3 Questions: How AI could optimize the power grid
MIT researchers explore how AI can optimize the power grid, enhancing efficiency, resilience against extreme weather, and supporting renewable energy integration.
AI News - General · 4 days ago
6
Top 10 AI certifications and courses for 2026
This article reviews the top 10 AI certifications and courses for 2026, highlighting their significance in a rapidly evolving field and the skills they impart for career advancement.
AI Events · 3 days ago
7
[2507.19116] Graph Structure Learning with Privacy Guarantees for Open Graph Data
Abstract page for arXiv paper 2507.19116: Graph Structure Learning with Privacy Guarantees for Open Graph Data
arXiv - AI · 3 days ago
8
[P] Awesome Jewelry AI: curated resources for AI-generated jewelry imagery (papers, datasets, open-source models, tools)
Jewelry is one of the, if not the, hardest categories for AI image generation. Reflective metals, facet edges, prong geometry, and gemstone refraction all get destroyed by standard VAE compression ...
Reddit - Machine Learning · 5 days ago
9
[2512.06737] Arc Gradient Descent: A Geometrically Motivated Gradient Descent-based Optimiser with Phase-Aware, User-Controlled Step Dynamics (proof-of-concept)
Abstract page for arXiv paper 2512.06737: Arc Gradient Descent: A Geometrically Motivated Gradient Descent-based Optimiser with Phase-Aware, User-Controlled Step Dynamics (proof-of-concept)
arXiv - AI · 2 days ago
10
[P] Benchmark: Using XGBoost vs. DistilBERT for detecting "Month 2 Tanking" in cold email infrastructure?
I have been experimenting with Heuristic-based Deliverability Intelligence to solve the "Month 2 Tanking" problem. The Data Science Challenge: Most tools use simple regex for "Spam words." My hypot...
Reddit - Machine Learning · 6 days ago
11
[2603.25464] Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning
Abstract page for arXiv paper 2603.25464: Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning
arXiv - Machine Learning · about 8 hours ago
12
[2603.22876] Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models
Abstract page for arXiv paper 2603.22876: Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models
arXiv - AI · 2 days ago
13
[R] How are you managing long-running preprocessing jobs at scale? Curious what's actually working
We're a small ML team for a project and we keep running into the same wall: large preprocessing jobs (think 50–100GB datasets) running on a single machine take hours, and when something fails halfw...
Reddit - Machine Learning · 3 days ago
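One common pattern for jobs that fail partway through is to split the work into chunks and record each finished chunk, so a restart skips completed work instead of starting over. A minimal standard-library sketch follows, assuming the job can be divided into independent chunks; the checkpoint file name `done_chunks.json` and the `flaky_work` worker are hypothetical, not from the post.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def run_resumable(n_chunks: int, checkpoint: Path, work: Callable[[int], None]) -> list[int]:
    """Run work(chunk_id) for chunks 0..n_chunks-1, skipping ids already in checkpoint.

    Returns the chunk ids processed during this call.
    """
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    processed = []
    for chunk_id in range(n_chunks):
        if chunk_id in done:
            continue  # finished in an earlier run
        work(chunk_id)  # if this raises, chunk_id is never recorded
        done.add(chunk_id)
        processed.append(chunk_id)
        # Write to a temp file in the same directory, then swap it in with
        # os.replace, so a crash mid-write cannot leave a truncated checkpoint.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        os.replace(tmp, checkpoint)
    return processed


def flaky_work(chunk_id: int) -> None:
    """Hypothetical worker that crashes on chunk 3 to simulate a mid-job failure."""
    if chunk_id == 3:
        raise RuntimeError("simulated crash")


with tempfile.TemporaryDirectory() as tmp_dir:
    ckpt = Path(tmp_dir) / "done_chunks.json"
    try:
        run_resumable(6, ckpt, flaky_work)
    except RuntimeError:
        pass  # first run dies after finishing chunks 0-2
    # Second run with a working worker resumes at chunk 3.
    resumed = run_resumable(6, ckpt, lambda chunk_id: None)
    print(resumed)  # [3, 4, 5]
```

The same idea carries over when chunks are file shards processed by separate workers: the checkpoint only needs to record which units finished, not how far each one got.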
14
[2603.19288] Joint Return and Risk Modeling with Deep Neural Networks for Portfolio Construction
Abstract page for arXiv paper 2603.19288: Joint Return and Risk Modeling with Deep Neural Networks for Portfolio Construction
arXiv - Machine Learning · 4 days ago
15
[R] Seeking arXiv endorser (eess.IV or cs.CV) for CT lung nodule AI validation preprint
Sorry, I know these requests can be annoying, but I’m a medical physicist and no one I know uses arXiv. The preprint: post-deployment sensitivity analysis of a MONAI RetinaNet lung nodule detector ...
Reddit - Machine Learning · 6 days ago
16
[D] Training a classifier entirely in SQL (no iterative optimization)
I implemented SEFR, which is a lightweight linear classifier, entirely in SQL (in Google BigQuery), and benchmarked it against Logistic Regression. On a 55k fraud detection dataset, SEFR achieves A...
Reddit - Machine Learning · 5 days ago
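SEFR needs no iterative optimization because its weights come in closed form from per-class feature means, which is what makes a pure-SQL version feasible. Below is a minimal pure-Python sketch based on our reading of the original SEFR formulation (binary labels, non-negative features such as min-max scaled values). It is not the poster's BigQuery SQL, and the small `eps` guard against zero denominators is our own addition.

```python
def sefr_fit(X, y, eps=1e-7):
    """Fit SEFR on non-negative features X (list of rows) and labels y in {0, 1}.

    Returns (weights, threshold). Everything is computed from class averages;
    there is no gradient descent or other iterative loop.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    n_features = len(X[0])
    mean_pos = [sum(r[j] for r in pos) / len(pos) for j in range(n_features)]
    mean_neg = [sum(r[j] for r in neg) / len(neg) for j in range(n_features)]
    # Per-feature weight: normalized difference of the two class means.
    weights = [(p - n) / (p + n + eps) for p, n in zip(mean_pos, mean_neg)]

    def score(row):
        return sum(w * v for w, v in zip(weights, row))

    avg_pos = sum(score(r) for r in pos) / len(pos)
    avg_neg = sum(score(r) for r in neg) / len(neg)
    # Threshold: average class scores, each weighted by the opposite class's count.
    n_pos, n_neg = len(pos), len(neg)
    threshold = (n_neg * avg_pos + n_pos * avg_neg) / (n_pos + n_neg)
    return weights, threshold


def sefr_predict(weights, threshold, X):
    """Label 1 when the weighted score exceeds the threshold, else 0."""
    return [1 if sum(w * v for w, v in zip(weights, row)) > threshold else 0 for row in X]


# Toy data: feature 0 is high for class 1, feature 1 is high for class 0.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9], [0.2, 0.8]]
y = [1, 1, 1, 0, 0]
weights, threshold = sefr_fit(X, y)
print(sefr_predict(weights, threshold, X))  # [1, 1, 1, 0, 0]
```

Because each quantity is a per-class average or a weighted sum, every step corresponds to an aggregate query, which is plausibly why the method translates so directly to SQL.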
17
[2603.19299] PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovascular Risk Modelling
Abstract page for arXiv paper 2603.19299: PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovascular Risk Modelling
arXiv - Machine Learning · 4 days ago
18
I curated an 'Awesome List' for Generative AI in Jewelry: papers, datasets, open-source models and tools included!
Jewelry is one of the, if not the, hardest categories for AI image generation. Reflective metals, facet edges, prong geometry, and gemstone refraction all get destroyed by standard VAE compression ...
Reddit - Artificial Intelligence · 4 days ago
19
[2603.19439] Subspace Projection Methods for Fast Spectral Embeddings of Evolving Graphs
Abstract page for arXiv paper 2603.19439: Subspace Projection Methods for Fast Spectral Embeddings of Evolving Graphs
arXiv - Machine Learning · 4 days ago
20
[D] Solving the "Liquid-Solid Interface" Problem: 116 High-Fidelity Datasets of Coastal Physics (Waves, Saturated Sand, Light Transport)
Modern generative models (Sora, Runway, Kling) still struggle with the complex physics of the shoreline. I’ve spent months capturing 116 datasets from the Arabian Sea to document phenomena that are...
Reddit - Machine Learning · 5 days ago