Top Data Science This Week

The most engaging data science content from this week, curated by AI News.

  1.

    [P] Made a dataset but don't know what to do with it

    This weekend I was looking for a dataset on major air crashes (I like planes) containing the text of their final reports. Surprisingly I was unable to find even a single open source dataset matchin...

    Reddit - Machine Learning · 2 days ago
  2.

    Accelerating science with AI and simulations

    MIT Professor Rafael Gómez-Bombarelli discusses the transformative potential of AI in scientific research, emphasizing its role in material discovery and future innovations.

    AI News - General · about 6 hours ago
  3.

    [P] XGBoost + TF-IDF for emotion prediction — good state accuracy but struggling with intensity (need advice)

    Hey everyone, I’m working on a small ML project (~1200 samples) where I’m trying to predict: emotional state (classification, 6 classes) and intensity (1–5) of that emotion. The dataset contains: journ...

    Reddit - Machine Learning · 6 days ago
  4.

    [2603.22977] DariMis: Harm-Aware Modeling for Dari Misinformation Detection on YouTube

    Abstract page for arXiv paper 2603.22977: DariMis: Harm-Aware Modeling for Dari Misinformation Detection on YouTube

    arXiv - AI · 2 days ago
  5.

    3 Questions: How AI could optimize the power grid

    MIT researchers explore how AI can optimize the power grid, enhancing efficiency, resilience against extreme weather, and supporting renewable energy integration.

    AI News - General · 4 days ago
  6.

    Top 10 AI certifications and courses for 2026

    This article reviews the top 10 AI certifications and courses for 2026, highlighting their significance in a rapidly evolving field and the skills they impart for career advancement.

    AI Events · 3 days ago
  7.

    [2507.19116] Graph Structure Learning with Privacy Guarantees for Open Graph Data

    Abstract page for arXiv paper 2507.19116: Graph Structure Learning with Privacy Guarantees for Open Graph Data

    arXiv - AI · 3 days ago
  8.

    [P] Awesome Jewelry AI: curated resources for AI-generated jewelry imagery (papers, datasets, open-source models, tools)

    Jewelry is one of the, if not the, hardest categories for AI image generation. Reflective metals, facet edges, prong geometry, and gemstone refraction all get destroyed by standard VAE compression ...

    Reddit - Machine Learning · 5 days ago
  9.

    [2512.06737] Arc Gradient Descent: A Geometrically Motivated Gradient Descent-based Optimiser with Phase-Aware, User-Controlled Step Dynamics (proof-of-concept)

    Abstract page for arXiv paper 2512.06737: Arc Gradient Descent: A Geometrically Motivated Gradient Descent-based Optimiser with Phase-Aware, User-Controlled Step Dynamics (proof-of-concept)

    arXiv - AI · 2 days ago
  10.

    [P] Benchmark: Using XGBoost vs. DistilBERT for detecting "Month 2 Tanking" in cold email infrastructure?

    I have been experimenting with Heuristic-based Deliverability Intelligence to solve the "Month 2 Tanking" problem. The Data Science Challenge: Most tools use simple regex for "Spam words." My hypot...

    Reddit - Machine Learning · 6 days ago
  11.

    [2603.25464] Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning

    Abstract page for arXiv paper 2603.25464: Maximum Entropy Behavior Exploration for Sim2Real Zero-Shot Reinforcement Learning

    arXiv - Machine Learning · about 8 hours ago
  12.

    [2603.22876] Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

    Abstract page for arXiv paper 2603.22876: Grounding Sim-to-Real Generalization in Dexterous Manipulation: An Empirical Study with Vision-Language-Action Models

    arXiv - AI · 2 days ago
  13.

    [R] How are you managing long-running preprocessing jobs at scale? Curious what's actually working

    We're a small ML team for a project and we keep running into the same wall: large preprocessing jobs (think 50–100GB datasets) running on a single machine take hours, and when something fails halfw...

    Reddit - Machine Learning · 3 days ago
  14.

    [2603.19288] Joint Return and Risk Modeling with Deep Neural Networks for Portfolio Construction

    Abstract page for arXiv paper 2603.19288: Joint Return and Risk Modeling with Deep Neural Networks for Portfolio Construction

    arXiv - Machine Learning · 4 days ago
  15.

    [R] Seeking arXiv endorser (eess.IV or cs.CV) for CT lung nodule AI validation preprint

    Sorry, I know these requests can be annoying, but I’m a medical physicist and no one I know uses arXiv. The preprint: post-deployment sensitivity analysis of a MONAI RetinaNet lung nodule detector ...

    Reddit - Machine Learning · 6 days ago
  16.

    [D] Training a classifier entirely in SQL (no iterative optimization)

    I implemented SEFR, which is a lightweight linear classifier, entirely in SQL (in Google BigQuery), and benchmarked it against Logistic Regression. On a 55k fraud detection dataset, SEFR achieves A...

    Reddit - Machine Learning · 5 days ago
  17.

    [2603.19299] PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovascular Risk Modelling

    Abstract page for arXiv paper 2603.19299: PRIME-CVD: A Parametrically Rendered Informatics Medical Environment for Education in Cardiovascular Risk Modelling

    arXiv - Machine Learning · 4 days ago
  18.

    I curated an 'Awesome List' for Generative AI in Jewelry: papers, datasets, open-source models and tools included!

    Jewelry is one of the, if not the, hardest categories for AI image generation. Reflective metals, facet edges, prong geometry, and gemstone refraction all get destroyed by standard VAE compression ...

    Reddit - Artificial Intelligence · 4 days ago
  19.

    [2603.19439] Subspace Projection Methods for Fast Spectral Embeddings of Evolving Graphs

    Abstract page for arXiv paper 2603.19439: Subspace Projection Methods for Fast Spectral Embeddings of Evolving Graphs

    arXiv - Machine Learning · 4 days ago
  20.

    [D] Solving the "Liquid-Solid Interface" Problem: 116 High-Fidelity Datasets of Coastal Physics (Waves, Saturated Sand, Light Transport)

    Modern generative models (Sora, Runway, Kling) still struggle with the complex physics of the shoreline. I’ve spent months capturing 116 datasets from the Arabian Sea to document phenomena that are...

    Reddit - Machine Learning · 5 days ago
