Independent researcher looking for technical feedback on a paper about a revision-capable language model [P]
Hi everyone! I am an independent researcher working on Reviser, a language model that generates through cursor-relative edit actions on a...
GPT, Claude, Gemini, and other LLMs
After 4.7 was released, I gave it a try. A few things that really concern me: 1. It confidently hallucinates. My work involves writing co...
I’ve been a big fan of Claude and was planning to get the Max plan up until about 10 days ago, when it became a lot dumber and constantly made...
Abstract page for arXiv paper 2510.16688: Pursuing Minimal Sufficiency in Spatial Reasoning
Abstract page for arXiv paper 2510.00507: Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs
Abstract page for arXiv paper 2509.25149: Pretraining Large Language Models with NVFP4
Abstract page for arXiv paper 2510.00177: PrefDisco: Benchmarking Proactive Personalized Reasoning
Abstract page for arXiv paper 2509.24210: BeyondBench: Contamination-Resistant Evaluation of Reasoning in Language Models
Abstract page for arXiv paper 2509.23886: Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Abstract page for arXiv paper 2509.20321: Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
Abstract page for arXiv paper 2508.06249: In-Training Defenses against Emergent Misalignment in Language Models
Abstract page for arXiv paper 2507.01785: MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
Abstract page for arXiv paper 2506.23508: Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
Abstract page for arXiv paper 2506.01062: SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
Abstract page for arXiv paper 2505.19255: VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Abstract page for arXiv paper 2504.04372: Assessing the Impact of Code Changes on the Fault Localizability of Large Language Models
Abstract page for arXiv paper 2602.00485: Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models
Abstract page for arXiv paper 2601.03604: Interleaved Tool-Call Reasoning for Protein Function Understanding
Abstract page for arXiv paper 2512.10534: Achieving Olympia-Level Geometry Large Language Model Agent via Complexity Boosting Reinforceme...
Abstract page for arXiv paper 2601.22571: PerfGuard: A Performance-Aware Agent for Visual Content Generation
Abstract page for arXiv paper 2512.14106: HydroGEM: A Self Supervised Zero Shot Hybrid TCN Transformer Foundation Model for Continental S...
Abstract page for arXiv paper 2512.07081: ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day ...
Abstract page for arXiv paper 2505.13770: Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Infe...