New case alleging chatbot involvement in mass murder: Bigger disaster, smaller AI involvement
Today, April 29, 2026, a new case, Stacey, et al. v. Altman, et al. was filed in a California federal court against OpenAI, alleging the ...
GPT, Claude, Gemini, and other LLMs
Today, April 29, 2026, a new case, Stacey, et al. v. Altman, et al. was filed in a California federal court against OpenAI, alleging the ...
Abstract page for arXiv paper 2603.09723: RbtAct: Rebuttal as Supervision for Actionable Review Feedback Generation
Abstract page for arXiv paper 2601.21225: MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation
Abstract page for arXiv paper 2602.09937: Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
Abstract page for arXiv paper 2506.15963: On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy
Abstract page for arXiv paper 2601.16529: SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters fo...
Abstract page for arXiv paper 2601.15160: Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning
Abstract page for arXiv paper 2511.22235: Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon ...
Abstract page for arXiv paper 2511.21471: SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition
Abstract page for arXiv paper 2511.05854: Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for ...
Abstract page for arXiv paper 2510.26905: Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations
Abstract page for arXiv paper 2505.20065: SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety
Abstract page for arXiv paper 2510.09782: The Geometry of Reasoning: Flowing Logics in Representation Space
Abstract page for arXiv paper 2510.07972: SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance
Abstract page for arXiv paper 2509.21782: Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety
Abstract page for arXiv paper 2508.03284: ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Abstract page for arXiv paper 2505.02888: When Your Own Output Becomes Your Training Data: Noise-to-Meaning Loops and a Formal RSI Trigger
Abstract page for arXiv paper 2507.15796: From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Lea...
Abstract page for arXiv paper 2507.15518: HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatrics
Abstract page for arXiv paper 2502.01534: Preference Leakage: A Contamination Problem in LLM-as-a-judge
Abstract page for arXiv paper 2506.08321: LeanTutor: Towards a Verified AI Mathematical Proof Tutor
Abstract page for arXiv paper 2505.21668: R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
Abstract page for arXiv paper 2505.21281: RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models
Get the latest news, tools, and insights delivered to your inbox.
Daily or weekly digest • Unsubscribe anytime