Top Data Science This Month

The most engaging data science content from this month, curated by AI News.

1

[2511.23071] Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding

Abstract page for arXiv paper 2511.23071: Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding

arXiv - AI · 28 days ago
2

[2605.08019] Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

Abstract page for arXiv paper 2605.08019: Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

arXiv - AI · about 9 hours ago
3

Trained a Qwen2.5-0.5B-Instruct bf16 model on Reddit post summarization task with GRPO [P]

So, a few days back I shared a post where I trained a tiny Qwen2.5-0.5B-Instruct model on smoltldr (reddit post summarization dataset of 2k rows), to output summaries of about 64 max length using R...

Reddit - Machine Learning · 28 days ago
4

[2604.21108] Machine learning and digital pragmatics: Which word category influences emoji use most?

Abstract page for arXiv paper 2604.21108: Machine learning and digital pragmatics: Which word category influences emoji use most?

arXiv - Machine Learning · 17 days ago
5

[2511.09376] From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm

Abstract page for arXiv paper 2511.09376: From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm

arXiv - Machine Learning · 27 days ago
6

[2603.18492] AIMER: Calibration-Free Task-Agnostic MoE Pruning

Abstract page for arXiv paper 2603.18492: AIMER: Calibration-Free Task-Agnostic MoE Pruning

arXiv - Machine Learning · 27 days ago
7

[2505.09368] RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo

Abstract page for arXiv paper 2505.09368: RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo

arXiv - Machine Learning · 27 days ago
8

6 Months Using AI for Actual Work: What's Incredible, What's Overhyped, and What's Quietly Dangerous

Six months ago I committed to using AI tools for everything I possibly could in my work. Every day, every task, every workflow. Here's the honest report as of April 2026. What's Genuinely Incredibl...

Reddit - Artificial Intelligence · about 1 month ago
9

[2507.07067] How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks

Abstract page for arXiv paper 2507.07067: How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks

arXiv - Machine Learning · 27 days ago
10

[2604.21753] Neural surrogates for crystal growth dynamics with variable supersaturation: explicit vs. implicit conditioning

Abstract page for arXiv paper 2604.21753: Neural surrogates for crystal growth dynamics with variable supersaturation: explicit vs. implicit conditioning

arXiv - Machine Learning · 17 days ago
11

[2510.02050] Multidata Causal Discovery for Statistical Hurricane Intensity Forecasting

Abstract page for arXiv paper 2510.02050: Multidata Causal Discovery for Statistical Hurricane Intensity Forecasting

arXiv - Machine Learning · 27 days ago
12

[2604.21789] Compliance Moral Hazard and the Backfiring Mandate

Abstract page for arXiv paper 2604.21789: Compliance Moral Hazard and the Backfiring Mandate

arXiv - Machine Learning · 17 days ago
13

[2510.02311] Inferring Dynamic Physical Properties from Video Foundation Models

Abstract page for arXiv paper 2510.02311: Inferring Dynamic Physical Properties from Video Foundation Models

arXiv - Machine Learning · 27 days ago
14

My Intrusion Detection ML Model Failed in Real Lab Testing [D]

I’m building a small ML-based cyber attack detection project using a self-created lab environment. Setup includes: GNS3 simulated network, Kali attacker node, Ubuntu victim server, Windows normal clie...

Reddit - Machine Learning · 25 days ago
15

SGOCR: A Spatially-Grounded OCR-focused Pipeline & V1 Dataset [P]

Hello everyone! I've been independently researching & developing small-but-powerful vision-language models (VLMs) and noticed a gap in visual datasets - none were teaching my model to simply gr...

Reddit - Machine Learning · 21 days ago
16

[2604.21893] Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors

Abstract page for arXiv paper 2604.21893: Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors

arXiv - Machine Learning · 17 days ago
17

[2506.03374] Product Quantization for Surface Soil Similarity

Abstract page for arXiv paper 2506.03374: Product Quantization for Surface Soil Similarity

arXiv - Machine Learning · 17 days ago
18

[2605.02318] Can Causal Discovery Algorithms Help in Generating Legal Arguments?

Abstract page for arXiv paper 2605.02318: Can Causal Discovery Algorithms Help in Generating Legal Arguments?

arXiv - AI · 6 days ago
19

[2604.23195] AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval

Abstract page for arXiv paper 2604.23195: AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval

arXiv - AI · 12 days ago
20

[2303.03237] Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

Abstract page for arXiv paper 2303.03237: Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation

arXiv - Machine Learning · 17 days ago
21

If AI is about to get 10x smarter, how do we prevent the internet from collapsing under synthetic noise?

I'm all for acceleration. I think the faster we hit AGI the better. But there's a bottleneck nobody here talks about enough: training data. Right now we are quietly poisoning the well. More than half ...

Reddit - Artificial Intelligence · 14 days ago
22

[2512.08216] Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation

Abstract page for arXiv paper 2512.08216: Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation

arXiv - Machine Learning · 17 days ago
23

[2605.07444] Accelerated and data-efficient flow prediction in stirred tanks via physics-informed learning

Abstract page for arXiv paper 2605.07444: Accelerated and data-efficient flow prediction in stirred tanks via physics-informed learning

arXiv - AI · about 9 hours ago
24

[2605.07462] The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

Abstract page for arXiv paper 2605.07462: The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

arXiv - AI · about 9 hours ago
25

Free Registration & $20K Prize Pool: 2nd MLC-SLM Challenge 2026 on Multilingual Speech LLMs [N]

Hi everyone, The 2nd Multilingual Conversational Speech Language Models Challenge 2026 is now open for registration. This year’s challenge focuses on Speech LLMs for real-world multilingual convers...

Reddit - Machine Learning · 13 days ago
26

[2511.09907] Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis

Abstract page for arXiv paper 2511.09907: Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis

arXiv - AI · about 9 hours ago
27

QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2) [P]

I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4). The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can b...

Reddit - Machine Learning · 7 days ago
28

[2407.04183] Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

Abstract page for arXiv paper 2407.04183: Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

arXiv - AI · about 9 hours ago
29

[2602.02320] A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method

Abstract page for arXiv paper 2602.02320: A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method

arXiv - AI · about 9 hours ago
30

IJCAI-ECAI 2026: Decision Notification and ChairingTool Status Thread [D]

Creating a discussion thread for IJCAI-ECAI 2026 final decision notifications. The official paper notification date is April 29, 2026 AoE, so decisions may appear at different local times depending...

Reddit - Machine Learning · 13 days ago
