The loss curve said tie. The judges said otherwise. Seeking replication for an early LLM training result [R]
TL;DR - I've written two novel functions that shape the training signal for LLMs. Early tests show people prefer responses from models tr...
Last week I ran a fun experiment on Dark Hex. Here's a visualization of two iterations (1800 vs 1900) of the agent playing against each other ...
I built a small PyTorch sampler called dynabatch after facing this specific batching issue while fine-tuning an NLLB-200 600M model. Train...
Abstract page for arXiv paper 2604.04528: Receding-Horizon Control via Drifting Models
Abstract page for arXiv paper 2604.04482: Scalable and Explainable Learner-Video Interaction Prediction using Multimodal Large Language M...
Abstract page for arXiv paper 2604.04468: What Makes a Sale? Rethinking End-to-End Seller--Buyer Retail Dynamics with LLM Agents
Abstract page for arXiv paper 2604.04448: PSY-STEP: Structuring Therapeutic Targets and Action Sequences for Proactive Counseling Dialogu...
Abstract page for arXiv paper 2604.04403: MolDA: Molecular Understanding and Generation via Large Language Diffusion Model
Abstract page for arXiv paper 2604.04383: Optimizing Service Operations via LLM-Powered Multi-Agent Simulation
Abstract page for arXiv paper 2604.04344: Domain-Contextualized Inference: A Computable Graph Architecture for Explicit-Domain Reasoning
Abstract page for arXiv paper 2604.04297: PanLUNA: An Efficient and Robust Query-Unified Multimodal Model for Edge Biosignal Intelligence
Abstract page for arXiv paper 2604.04281: Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts
Abstract page for arXiv paper 2604.04274: InferenceEvolve: Towards Automated Causal Effect Estimators through Self-Evolving AI
Abstract page for arXiv paper 2604.04220: TimeSeek: Temporal Reliability of Agentic Forecasters
Abstract page for arXiv paper 2604.04190: Schema-Aware Planning and Hybrid Knowledge Toolset for Reliable Knowledge Graph Triple Verifica...
Abstract page for arXiv paper 2604.04182: Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty
Abstract page for arXiv paper 2604.04171: A Model of Understanding in Deep Learning Systems
Abstract page for arXiv paper 2604.04157: Readable Minds: Emergent Theory-of-Mind-Like Behavior in LLM Poker Agents
Abstract page for arXiv paper 2604.04145: Solar-VLM: Multimodal Vision-Language Models for Augmented Solar Power Forecasting
Abstract page for arXiv paper 2604.04131: Profile-Then-Reason: Bounded Semantic Complexity for Tool-Augmented Language Agents
Abstract page for arXiv paper 2604.04106: InsTraj: Instructing Diffusion Models with Travel Intentions to Generate Real-world Trajectories
Abstract page for arXiv paper 2604.03976: Quantifying Trust: Financial Risk Management for Trustworthy AI Agents
Abstract page for arXiv paper 2604.03898: LLM-Agent-based Social Simulation for Attitude Diffusion