Top Data Science This Month

The most engaging data science content from this month, curated by AI News.

  1. [2511.23071] Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding

    arXiv - AI · 28 days ago
  2. [2605.08019] Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

    arXiv - AI · about 9 hours ago
  3. Trained a Qwen2.5-0.5B-Instruct bf16 model on Reddit post summarization task with GRPO [P]

    So, a few days back I shared a post where I trained a tiny Qwen2.5-0.5B-Instruct model on smoltldr (reddit post summarization dataset of 2k rows), to output summaries of about 64 max length using R...

    Reddit - Machine Learning · 28 days ago
  4. [2604.21108] Machine learning and digital pragmatics: Which word category influences emoji use most?

    arXiv - Machine Learning · 17 days ago
  5. [2511.09376] From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm

    arXiv - Machine Learning · 27 days ago
  6. [2603.18492] AIMER: Calibration-Free Task-Agnostic MoE Pruning

    arXiv - Machine Learning · 27 days ago
  7. [2505.09368] RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo

    arXiv - Machine Learning · 27 days ago
  8. 6 Months Using AI for Actual Work: What's Incredible, What's Overhyped, and What's Quietly Dangerous

    Six months ago I committed to using AI tools for everything I possibly could in my work. Every day, every task, every workflow. Here's the honest report as of April 2026. What's Genuinely Incredibl...

    Reddit - Artificial Intelligence · about 1 month ago
  9. [2507.07067] How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks

    arXiv - Machine Learning · 27 days ago
  10. [2604.21753] Neural surrogates for crystal growth dynamics with variable supersaturation: explicit vs. implicit conditioning

    arXiv - Machine Learning · 17 days ago
  11. [2510.02050] Multidata Causal Discovery for Statistical Hurricane Intensity Forecasting

    arXiv - Machine Learning · 27 days ago
  12. [2604.21789] Compliance Moral Hazard and the Backfiring Mandate

    arXiv - Machine Learning · 17 days ago
  13. [2510.02311] Inferring Dynamic Physical Properties from Video Foundation Models

    arXiv - Machine Learning · 27 days ago
  14. My Intrusion Detection ML Model Failed in Real Lab Testing [D]

    I’m building a small ML-based cyber attack detection project using a self-created lab environment. Setup includes: GNS3 simulated network, Kali attacker node, Ubuntu victim server, Windows normal clie...

    Reddit - Machine Learning · 25 days ago
  15. SGOCR: A Spatially-Grounded OCR-focused Pipeline & V1 Dataset [P]

    Hello everyone! I've been independently researching & developing small-but-powerful vision-language models (VLMs) and noticed a gap in visual datasets - none were teaching my model to simply gr...

    Reddit - Machine Learning · 21 days ago
  16. [2604.21893] Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors

    arXiv - Machine Learning · 17 days ago
  17. [2506.03374] Product Quantization for Surface Soil Similarity

    arXiv - Machine Learning · 17 days ago
  18. [2605.02318] Can Causal Discovery Algorithms Help in Generating Legal Arguments?

    arXiv - AI · 6 days ago
  19. [2604.23195] AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval

    arXiv - AI · 12 days ago
  20. [2303.03237] Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

    arXiv - Machine Learning · 17 days ago
  21. If AI is about to get 10x smarter, how do we prevent the internet from collapsing under synthetic noise?

    I'm all for acceleration. I think the faster we hit AGI, the better. But there's a bottleneck nobody here talks about enough: training data. Right now we are quietly poisoning the well. More than half ...

    Reddit - Artificial Intelligence · 14 days ago
  22. [2512.08216] Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation

    arXiv - Machine Learning · 17 days ago
  23. [2605.07444] Accelerated and data-efficient flow prediction in stirred tanks via physics-informed learning

    arXiv - AI · about 9 hours ago
  24. [2605.07462] The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

    arXiv - AI · about 9 hours ago
  25. Free Registration & $20K Prize Pool: 2nd MLC-SLM Challenge 2026 on Multilingual Speech LLMs [N]

    Hi everyone, The 2nd Multilingual Conversational Speech Language Models Challenge 2026 is now open for registration. This year’s challenge focuses on Speech LLMs for real-world multilingual convers...

    Reddit - Machine Learning · 13 days ago
  26. [2511.09907] Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis

    arXiv - AI · about 9 hours ago
  27. [P] QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2) [P]

    I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4). The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can b...

    Reddit - Machine Learning · 7 days ago
  28. [2407.04183] Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

    arXiv - AI · about 9 hours ago
  29. [2602.02320] A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method

    arXiv - AI · about 9 hours ago
  30. IJCAI-ECAI 2026: Decision Notification and ChairingTool Status Thread [D]

    Creating a discussion thread for IJCAI-ECAI 2026 final decision notifications. The official paper notification date is April 29, 2026 AoE, so decisions may appear at different local times depending...

    Reddit - Machine Learning · 13 days ago
