[P] ClaudeFormer: Building a Transformer Out of Claudes — Collaboration Request
I'm looking to work with people interested in math, machine learning, or agentic coding, on creating a multi-agent framework to do fronti...
GPT, Claude, Gemini, and other LLMs
Ads are rolling out across the US on ChatGPT’s free tier. I asked OpenAI's bot 500 questions to see what these ads were like and how they...
Three days ago, I clicked the "Deploy OpenClaw In Seconds" button to get an overview of the new service, but I didn't build any automatio...
Abstract page for arXiv paper 2603.25226: WebTestBench: Evaluating Computer-Use Agents towards End-to-End Automated Web Testing
Abstract page for arXiv paper 2603.25196: A Decade-Scale Benchmark Evaluating LLMs' Clinical Practice Guidelines Detection and Adherence ...
Abstract page for arXiv paper 2603.25187: Probing the Lack of Stable Internal Beliefs in LLMs
Abstract page for arXiv paper 2603.25250: Activation Matters: Test-time Activated Negative Labels for OOD Detection with Vision-Language ...
Abstract page for arXiv paper 2603.25164: PIDP-Attack: Combining Prompt Injection with Database Poisoning Attacks on Retrieval-Augmented ...
Abstract page for arXiv paper 2603.25145: Learning to Rank Caption Chains for Video-Text Alignment
Abstract page for arXiv paper 2603.25155: Photon: Speedup Volume Understanding with Efficient Multimodal Large Language Models
Abstract page for arXiv paper 2603.25146: Factors Influencing the Quality of AI-Generated Code: A Synthesis of Empirical Evidence
Abstract page for arXiv paper 2603.25015: Imperative Interference: Social Register Shapes Instruction Topology in Large Language Models
Abstract page for arXiv paper 2603.25112: Do LLMs Know What They Know? Measuring Metacognitive Efficiency with Signal Detection Theory
Abstract page for arXiv paper 2603.24946: MobileDev-Bench: A Comprehensive Benchmark for Evaluating Language Models on Mobile Application...
Abstract page for arXiv paper 2603.24917: Estimating near-verbatim extraction risk in language models with decoding-constrained beam search
Abstract page for arXiv paper 2603.25099: Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Opti...
Abstract page for arXiv paper 2603.25063: TopoPilot: Reliable Conversational Workflow Automation for Topological Data Analysis and Visual...
Abstract page for arXiv paper 2603.25056: The System Prompt Is the Attack Surface: How LLM Agent Configuration Shapes Security and Create...
Abstract page for arXiv paper 2603.25052: Closing the Confidence-Faithfulness Gap in Large Language Models
Abstract page for arXiv paper 2603.24989: Learning Rollout from Sampling: An R1-Style Tokenized Traffic Simulation Model
Abstract page for arXiv paper 2603.24986: Rethinking Health Agents: From Siloed AI to Collaborative Decision Mediators
Abstract page for arXiv paper 2603.24940: Evaluating adaptive and generative AI-based feedback and recommendations in a knowledge-graph-i...
Abstract page for arXiv paper 2603.24617: Multi-LLM Query Optimization