Top Data Science This Month
The most engaging data science content from this month, curated by AI News.
1
[2511.23071] Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
arXiv - AI · 28 days ago
2
[2605.08019] Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners
arXiv - AI · about 9 hours ago
3
Trained a Qwen2.5-0.5B-Instruct bf16 model on Reddit post summarization task with GRPO [P]
So, a few days back I shared a post where I trained a tiny Qwen2.5-0.5B-Instruct model on smoltldr (reddit post summarization dataset of 2k rows), to output summaries of about 64 max length using R...
Reddit - Machine Learning · 28 days ago
4
[2604.21108] Machine learning and digital pragmatics: Which word category influences emoji use most?
arXiv - Machine Learning · 17 days ago
5
[2511.09376] From Decision Trees to Boolean Logic: A Fast and Unified SHAP Algorithm
arXiv - Machine Learning · 27 days ago
6
[2603.18492] AIMER: Calibration-Free Task-Agnostic MoE Pruning
arXiv - Machine Learning · 27 days ago
7
[2505.09368] RobustSpring: Benchmarking Robustness to Image Corruptions for Optical Flow, Scene Flow and Stereo
arXiv - Machine Learning · 27 days ago
8
6 Months Using AI for Actual Work: What's Incredible, What's Overhyped, and What's Quietly Dangerous
Six months ago I committed to using AI tools for everything I possibly could in my work. Every day, every task, every workflow. Here's the honest report as of April 2026. What's Genuinely Incredibl...
Reddit - Artificial Intelligence · about 1 month ago
9
[2507.07067] How to Bridge the Sim-to-Real Gap in Digital Twin-Aided Telecommunication Networks
arXiv - Machine Learning · 27 days ago
10
[2604.21753] Neural surrogates for crystal growth dynamics with variable supersaturation: explicit vs. implicit conditioning
arXiv - Machine Learning · 17 days ago
11
[2510.02050] Multidata Causal Discovery for Statistical Hurricane Intensity Forecasting
arXiv - Machine Learning · 27 days ago
12
[2604.21789] Compliance Moral Hazard and the Backfiring Mandate
arXiv - Machine Learning · 17 days ago
13
[2510.02311] Inferring Dynamic Physical Properties from Video Foundation Models
arXiv - Machine Learning · 27 days ago
14
My Intrusion Detection ML Model Failed in Real Lab Testing [D]
I’m building a small ML-based cyber attack detection project using a self-created lab environment. Setup includes: GNS3 simulated network Kali attacker node Ubuntu victim server Windows normal clie...
Reddit - Machine Learning · 25 days ago
15
SGOCR: A Spatially-Grounded OCR-focused Pipeline & V1 Dataset [P]
Hello everyone! I've been independently researching & developing small-but-powerful vision-language models (VLMs) and noticed a gap in visual datasets - none were teaching my model to simply gr...
Reddit - Machine Learning · 21 days ago
16
[2604.21893] Revealing Geography-Driven Signals in Zone-Level Claim Frequency Models: An Empirical Study using Environmental and Visual Predictors
arXiv - Machine Learning · 17 days ago
17
[2506.03374] Product Quantization for Surface Soil Similarity
arXiv - Machine Learning · 17 days ago
18
[2605.02318] Can Causal Discovery Algorithms Help in Generating Legal Arguments?
arXiv - AI · 6 days ago
19
[2604.23195] AnalogRetriever: Learning Cross-Modal Representations for Analog Circuit Retrieval
arXiv - AI · 12 days ago
20
[2303.03237] Convergence Rates for Non-Log-Concave Sampling and Log-Partition Estimation
arXiv - Machine Learning · 17 days ago
21
If AI is about to get 10x smarter, how do we prevent the internet from collapsing under synthetic noise?
I'm all for acceleration. I think the faster we hit AGI the better. But there's a bottleneck nobody here talks about enough: training data. Right now we are quietly poisoning the well. More than half ...
Reddit - Artificial Intelligence · 14 days ago
22
[2512.08216] Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation
arXiv - Machine Learning · 17 days ago
23
[2605.07444] Accelerated and data-efficient flow prediction in stirred tanks via physics-informed learning
arXiv - AI · about 9 hours ago
24
[2605.07462] The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment
arXiv - AI · about 9 hours ago
25
Free Registration & $20K Prize Pool: 2nd MLC-SLM Challenge 2026 on Multilingual Speech LLMs [N]
Hi everyone, The 2nd Multilingual Conversational Speech Language Models Challenge 2026 is now open for registration. This year’s challenge focuses on Speech LLMs for real-world multilingual convers...
Reddit - Machine Learning · 13 days ago
26
[2511.09907] Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis
arXiv - AI · about 9 hours ago
27
QLoRA Fine-Tuning of Qwen2.5-1.5B for CEFR English Proficiency Classification (A1–C2) [P]
I fine-tuned Qwen2.5-1.5B for multi-class CEFR English proficiency classification using QLoRA (4-bit NF4). The goal was to classify English text into one of the 6 CEFR levels (A1 → C2), which can b...
Reddit - Machine Learning · 7 days ago
28
[2407.04183] Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms
arXiv - AI · about 9 hours ago
29
[2602.02320] A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method
arXiv - AI · about 9 hours ago
30
IJCAI-ECAI 2026: Decision Notification and ChairingTool Status Thread [D]
Creating a discussion thread for IJCAI-ECAI 2026 final decision notifications. The official paper notification date is April 29, 2026 AoE, so decisions may appear at different local times depending...
Reddit - Machine Learning · 13 days ago