Machine Learning

ML algorithms, training, and inference


Top This Week

Llms

The loss curve said tie. The judges said otherwise. Seeking replication for an early LLM training result [R]

TL;DR - I've written two novel functions that shape the training signal for LLMs. Early tests show people prefer responses from models tr...

Reddit - Machine Learning · 1 min · about 2 hours ago

Machine Learning

Fast experiment on T4 GPU. Self play training on Dark Hex (Colab notebook) [P]

Last week I ran a fun experiment on Dark Hex. Here's a visualization of two iterations (1800 vs 1900) of the agent playing against each other ...

Reddit - Machine Learning · 1 min · about 2 hours ago

Machine Learning

Dynamic batching for Encoder-Decoder MT training or generation when long sequence caps the batch size [P]

I built a small PyTorch sampler called dynabatch after facing this specific batching issue while fine-tuning an NLLB-200 600M model. Train...

Reddit - Machine Learning · 1 min · about 3 hours ago

All Content

Llms

[2604.03888] PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Latency Arbitrage

Abstract page for arXiv paper 2604.03888: PolySwarm: A Multi-Agent Large Language Model Framework for Prediction Market Trading and Laten...

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03893] FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

Abstract page for arXiv paper 2604.03893: FeynmanBench: Benchmarking Multimodal LLMs on Diagrammatic Physics Reasoning

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03820] Affording Process Auditability with QualAnalyzer: An Atomistic LLM Analysis Tool for Qualitative Research

Abstract page for arXiv paper 2604.03820: Affording Process Auditability with QualAnalyzer: An Atomistic LLM Analysis Tool for Qualitativ...

arXiv - AI · 3 min · 21 days ago

Llms

[2604.03742] Structured Multi-Criteria Evaluation of Large Language Models with Fuzzy Analytic Hierarchy Process and DualJudge

Abstract page for arXiv paper 2604.03742: Structured Multi-Criteria Evaluation of Large Language Models with Fuzzy Analytic Hierarchy Pro...

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03675] PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training

Abstract page for arXiv paper 2604.03675: PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training

arXiv - AI · 3 min · 21 days ago

Llms

[2604.03660] TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical Tables

Abstract page for arXiv paper 2604.03660: TableVision: A Large-Scale Benchmark for Spatially Grounded Reasoning over Complex Hierarchical...

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03656] Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative Engine Optimization

Abstract page for arXiv paper 2604.03656: Beyond Retrieval: Modeling Confidence Decay and Deterministic Agentic Platforms in Generative E...

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03631] Single-agent vs. Multi-agents for Automated Video Analysis of On-Screen Collaborative Learning Behaviors

Abstract page for arXiv paper 2604.03631: Single-agent vs. Multi-agents for Automated Video Analysis of On-Screen Collaborative Learning ...

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03630] A Multimodal Foundation Model of Spatial Transcriptomics and Histology for Biological Discovery and Clinical Prediction

Abstract page for arXiv paper 2604.03630: A Multimodal Foundation Model of Spatial Transcriptomics and Histology for Biological Discovery...

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03589] Entropy and Attention Dynamics in Small Language Models: A Trace-Level Structural Analysis on the TruthfulQA Benchmark

Abstract page for arXiv paper 2604.03589: Entropy and Attention Dynamics in Small Language Models: A Trace-Level Structural Analysis on t...

arXiv - AI · 4 min · 21 days ago

Machine Learning

[2604.03571] Selective Forgetting for Large Reasoning Models

Abstract page for arXiv paper 2604.03571: Selective Forgetting for Large Reasoning Models

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03557] When Do Hallucinations Arise? A Graph Perspective on the Evolution of Path Reuse and Path Compression

Abstract page for arXiv paper 2604.03557: When Do Hallucinations Arise? A Graph Perspective on the Evolution of Path Reuse and Path Compr...

arXiv - AI · 3 min · 21 days ago

Machine Learning

[2604.03527] Explainable Model Routing for Agentic Workflows

Abstract page for arXiv paper 2604.03527: Explainable Model Routing for Agentic Workflows

arXiv - AI · 3 min · 21 days ago

Llms

[2604.03524] Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

Abstract page for arXiv paper 2604.03524: Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Laye...

arXiv - AI · 4 min · 21 days ago

Machine Learning

[2604.03506] BioAlchemy: Distilling Biological Literature into Reasoning-Ready Reinforcement Learning Training Data

Abstract page for arXiv paper 2604.03506: BioAlchemy: Distilling Biological Literature into Reasoning-Ready Reinforcement Learning Traini...

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03498] Resource-Conscious Modeling for Next-Day Discharge Prediction Using Clinical Notes

Abstract page for arXiv paper 2604.03498: Resource-Conscious Modeling for Next-Day Discharge Prediction Using Clinical Notes

arXiv - AI · 3 min · 21 days ago

Machine Learning

[2604.03393] TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering

Abstract page for arXiv paper 2604.03393: TABQAWORLD: Optimizing Multimodal Reasoning for Multi-Turn Table Question Answering

arXiv - AI · 4 min · 21 days ago

Machine Learning

[2604.03387] Hume's Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted Away

Abstract page for arXiv paper 2604.03387: Hume's Representational Conditions for Causal Judgment: What Bayesian Formalization Abstracted ...

arXiv - AI · 3 min · 21 days ago

Llms

[2604.03376] VERT: Reliable LLM Judges for Radiology Report Evaluation

Abstract page for arXiv paper 2604.03376: VERT: Reliable LLM Judges for Radiology Report Evaluation

arXiv - AI · 4 min · 21 days ago

Llms

[2604.03356] Evaluating Artificial Intelligence Through a Christian Understanding of Human Flourishing

Abstract page for arXiv paper 2604.03356: Evaluating Artificial Intelligence Through a Christian Understanding of Human Flourishing

arXiv - AI · 3 min · 21 days ago

