I tried building a memory-first AI… and ended up discovering smaller models can beat larger ones
Dataset | Model | Acc | F1 | Δ vs Log | Δ vs Static | Avg Params | Peak Params | Steps | Infer ms | Size
Banking77-20 | Logistic TF-IDF | 92.37% | 0.9230 | +0.00pp | +...
ML algorithms, training, and inference
Muon has quickly been adopted in LLM training, yet we don't see it being talked about in other contexts. Searches for Muon on ConvNets tu...
TL;DR: I built an open-source pipeline that runs Karpathy's autoresearch on SageMaker Spot instances — 25 autonomous ML experiments for $...
Abstract page for arXiv paper 2603.24400: Neural Network Models for Contextual Regression
Abstract page for arXiv paper 2603.24396: Exploring How Fair Model Representations Relate to Fair Recommendations
Abstract page for arXiv paper 2603.24392: Federated fairness-aware classification under differential privacy
Abstract page for arXiv paper 2603.24369: Adaptive decision-making for stochastic service network design
Abstract page for arXiv paper 2603.24323: Connecting Meteorite Spectra to Lunar Surface Composition Using Hyperspectral Imaging and Machi...
Abstract page for arXiv paper 2603.24304: CGRL: Causal-Guided Representation Learning for Graph Out-of-Distribution Generalization
Abstract page for arXiv paper 2603.24239: DVM: Real-Time Kernel Generation for Dynamic AI Models
Abstract page for arXiv paper 2603.24226: UniScale: Synergistic Entire Space Data and Model Scaling for Search Ranking
Abstract page for arXiv paper 2603.24209: HEART-PFL: Stable Personalized Federated Learning under Heterogeneity with Hierarchical Directi...
Abstract page for arXiv paper 2603.24196: Quantum Neural Physics: Solving Partial Differential Equations on Quantum Simulators using Quan...
Abstract page for arXiv paper 2603.24167: Walma: Learning to See Memory Corruption in WebAssembly
Abstract page for arXiv paper 2603.24150: A visual observation on the geometry of UMAP projections of the difference vectors of antonym a...
Abstract page for arXiv paper 2603.24139: Tutor-Student Reinforcement Learning: A Dynamic Curriculum for Robust Deepfake Detection
Abstract page for arXiv paper 2603.24111: Toward a Multi-Layer ML-Based Security Framework for Industrial IoT
Abstract page for arXiv paper 2603.24083: Knowledge-Guided Manipulation Using Multi-Task Reinforcement Learning
Abstract page for arXiv paper 2603.24016: COVTrack++: Learning Open-Vocabulary Multi-Object Tracking from Continuous Videos via a Synergi...
Abstract page for arXiv paper 2603.24054: Hierarchical Spatial-Temporal Graph-Enhanced Model for Map-Matching
Abstract page for arXiv paper 2603.24041: Minimal Sufficient Representations for Self-interpretable Deep Neural Networks
Abstract page for arXiv paper 2603.23974: Machine vision with small numbers of detected photons per inference
Abstract page for arXiv paper 2603.23971: The Price Reversal Phenomenon: When Cheaper Reasoning Models End Up Costing More