Comparing SVG generation for top models
These are the top open and closed models: Opus 4.7, GPT-5.5 Pro, DeepSeek V4, GLM-5.1 and Gemini 3.1 Pro. They all show similar performan...
GPT, Claude, Gemini, and other LLMs
Thanks to the incredible feedback on my last post, I’m officially moving away from the "distributed veto" system (where 8 LLM agents argu...
Caught between fears of job loss and social stigma, Gen Z’s opinions of AI are hitting new lows.
arXiv:2603.03379 — MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning
arXiv:2603.03612 — Why Are Linear RNNs More Parallelizable?
arXiv:2603.03371 — Sleeper Cell: Injecting Latent Malice Temporal Backdoors into Tool-Using LLMs
arXiv:2603.03597 — NuMuon: Nuclear-Norm-Constrained Muon for Compressible LLM Training
arXiv:2603.03538 — Online Learnability of Chain-of-Thought Verifiers: Soundness and Completeness Trade-offs
arXiv:2603.03535 — Trade-offs in Ensembling, Merging and Routing Among Parameter-Efficient Experts
arXiv:2603.03352 — Perfect score on IPhO 2025 theory by Gemini agent
arXiv:2603.03527 — Logit-Level Uncertainty Quantification in Vision-Language Models for Histopathology Image Analysis
arXiv:2603.03524 — Test-Time Meta-Adaptation with Self-Synthesis
arXiv:2603.03517 — MMAI Gym for Science: Training Liquid Foundation Models for Drug Discovery
arXiv:2603.03332 — Fragile Thoughts: How Large Language Models Handle Chain-of-Thought Perturbations
arXiv:2603.03330 — Certainty Robustness: Evaluating LLM Stability Under Self-Challenging Prompts
arXiv:2603.03329 — AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness
arXiv:2603.03328 — StructLens: A Structural Lens for Language Models via Maximum Spanning Trees
arXiv:2603.03326 — Controllable and Explainable Personality Sliders for LLMs at Inference Time
arXiv:2603.03325 — IntPro: A Proxy Agent for Context-Aware Intent Understanding via Retrieval-conditioned Inference
arXiv:2603.03324 — Controlling Chat Style in Language Models via Single-Direction Editing
arXiv:2603.03323 — Discern Truth from Falsehood: Reducing Over-Refusal via Contrastive Refinement
arXiv:2603.03322 — Can Large Language Models Derive New Knowledge? A Dynamic Benchmark for Biological Knowledge Di...
arXiv:2603.03321 — DIALEVAL: Automated Type-Theoretic Evaluation of LLM Instruction Following