The loss curve said tie. The judges said otherwise. Seeking replication for an early LLM training result [R]
TL;DR - I've written two novel functions that shape the training signal for LLMs. Early tests show people prefer responses from models tr...
ML algorithms, training, and inference
Last week I ran a fun experiment on Dark Hex. Here's a visualization of two iterations (1800 vs 1900) of the agent playing against each other ...
I built a small PyTorch sampler called dynabatch after facing this specific batching issue while fine-tuning an NLLB-200 600M model. Train...
Abstract page for arXiv paper 2604.03888: PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Laten...
Abstract page for arXiv paper 2604.03893: FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning
Abstract page for arXiv paper 2604.03820: Affording Process Auditability with QualAnalyzer: An Atomistic LLM Analysis Tool for Qualitativ...
Abstract page for arXiv paper 2604.03742: Structured Multi-Criteria Evaluation of Large Language Models with Fuzzy Analytic Hierarchy Pro...
Abstract page for arXiv paper 2604.03675: PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training
Abstract page for arXiv paper 2604.03660: TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical...
Abstract page for arXiv paper 2604.03656: Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative E...
Abstract page for arXiv paper 2604.03631: Single-agent vs. Multi-agents for Automated Video Analysis of On-Screen Collaborative Learning ...
Abstract page for arXiv paper 2604.03630: A Multimodal Foundation Model of Spatial Transcriptomics and Histology for Biological Discovery...
Abstract page for arXiv paper 2604.03589: Entropy and Attention Dynamics in Small Language Models: A Trace-Level Structural Analysis on t...
Abstract page for arXiv paper 2604.03571: Selective Forgetting for Large Reasoning Models
Abstract page for arXiv paper 2604.03557: When Do Hallucinations Arise? A Graph Perspective on the Evolution of Path Reuse and Path Compr...
Abstract page for arXiv paper 2604.03527: Explainable Model Routing for Agentic Workflows
Abstract page for arXiv paper 2604.03524: Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Laye...
Abstract page for arXiv paper 2604.03506: BioAlchemy: Distilling Biological Literature into Reasoning-Ready Reinforcement Learning Traini...
Abstract page for arXiv paper 2604.03498: Resource-Conscious Modeling for Next-Day Discharge Prediction Using Clinical Notes
Abstract page for arXiv paper 2604.03393: TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering
Abstract page for arXiv paper 2604.03387: Hume's Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted ...
Abstract page for arXiv paper 2604.03376: VERT: Reliable LLM Judges for Radiology Report Evaluation
Abstract page for arXiv paper 2604.03356: Evaluating Artificial Intelligence Through a Christian Understanding of Human Flourishing