Trained a Qwen2.5-0.5B-Instruct bf16 model on a Reddit post summarization task with GRPO [P]
So, a few days back I shared a post where I trained a tiny Qwen2.5-0.5B-Instruct model on smoltldr (a Reddit post summarization dataset of ...
ML algorithms, training, and inference
Meta is working to build an AI version of its CEO Mark Zuckerberg, which he will use to interact with employees, according to a report fr...
We build our inner voices from the voices we're in dialogue with. Vygotsky established this nearly a century ago. For people in sustained...
Abstract page for arXiv paper 2603.25328: Macroscopic Characteristics of Mixed Traffic Flow with Deep Reinforcement Learning Based Automa...
Abstract page for arXiv paper 2603.24647: Can LLMs Beat Classical Hyperparameter Optimization Algorithms? A Study on autoresearch
Abstract page for arXiv paper 2603.25326: Evaluating Language Models for Harmful Manipulation
Abstract page for arXiv paper 2603.24644: Physics-Informed Neural Network Digital Twin for Dynamic Tray-Wise Modeling of Distillation Col...
Abstract page for arXiv paper 2603.24639: Experiential Reflective Learning for Self-Improving LLM Agents
Abstract page for arXiv paper 2603.25284: SliderQuant: Accurate Post-Training Quantization for LLMs
Abstract page for arXiv paper 2603.25283: A Gait Foundation Model Predicts Multi-System Health Phenotypes from 3D Skeletal Motion
Abstract page for arXiv paper 2603.24638: How unconstrained machine-learning models learn physical symmetries
Abstract page for arXiv paper 2603.25273: Distribution and Clusters Approximations as Abstract Domains in Probabilistic Abstract Interpre...
Abstract page for arXiv paper 2603.25266: Probabilistic Abstract Interpretation on Neural Networks via Grids Approximation
Abstract page for arXiv paper 2603.25158: Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
Abstract page for arXiv paper 2603.25133: RubricEval: A Rubric-Level Meta-Evaluation Benchmark for LLM Judges in Instruction Following
Abstract page for arXiv paper 2603.25097: ElephantBroker: A Knowledge-Grounded Cognitive Runtime for Trustworthy AI Agents
Abstract page for arXiv paper 2603.25075: Sparse Visual Thought Circuits in Vision-Language Models
Abstract page for arXiv paper 2603.25046: MP-MoE: Matrix Profile-Guided Mixture of Experts for Precipitation Forecasting
Abstract page for arXiv paper 2603.25035: Mechanistically Interpreting Compression in Vision-Language Models
Abstract page for arXiv paper 2603.25031: From Stateless to Situated: Building a Psychological World for LLM-Based Emotional Support
Abstract page for arXiv paper 2603.25022: A Public Theory of Distillation Resistance via Constraint-Coupled Reasoning Architectures
Abstract page for arXiv paper 2603.24967: The Anatomy of Uncertainty in LLMs
Abstract page for arXiv paper 2603.24963: Design Once, Deploy at Scale: Template-Driven ML Development for Large Model Ecosystems