Comparing SVG generation for top models
These are the top open and closed model: Opus 4.7, GPT-5.5 Pro, DeepSeek V4, GLM-5.1 and Gemini 3.1 Pro. They both show similar performan...
GPT, Claude, Gemini, and other LLMs
These are the top open and closed model: Opus 4.7, GPT-5.5 Pro, DeepSeek V4, GLM-5.1 and Gemini 3.1 Pro. They both show similar performan...
Thanks to the incredible feedback on my last post, I’m officially moving away from the "distributed veto" system (where 8 LLM agents argu...
Caught between fears of job loss and social stigma, Gen Z’s opinions of AI are hitting new lows.
Abstract page for arXiv paper 2603.03389: Towards Improved Sentence Representations using Token Graphs
Abstract page for arXiv paper 2603.03320: From We to Me: Theory Informed Narrative Shift with Abductive Reasoning
Abstract page for arXiv paper 2603.03319: Automated Concept Discovery for LLM-as-a-Judge Preference Analysis
Abstract page for arXiv paper 2603.03378: AOI: Turning Failed Trajectories into Training Signals for Autonomous Cloud Diagnosis
Abstract page for arXiv paper 2603.03318: Quantum-Inspired Self-Attention in a Large Language Model
Abstract page for arXiv paper 2603.03314: Towards Self-Robust LLMs: Intrinsic Prompt Noise Resistance via CoIPO
Abstract page for arXiv paper 2603.03313: How does fine-tuning improve sensorimotor representations in large language models?
Abstract page for arXiv paper 2603.03308: Old Habits Die Hard: How Conversational History Geometrically Traps LLMs
Abstract page for arXiv paper 2603.03306: Token-Oriented Object Notation vs JSON: A Benchmark of Plain and Constrained Decoding Generation
Abstract page for arXiv paper 2603.03305: Draft-Conditioned Constrained Decoding for Structured Generation in LLMs
Abstract page for arXiv paper 2603.03303: HumanLM: Simulating Users with State Alignment Beats Response Imitation
Abstract page for arXiv paper 2603.03301: From Exact Hits to Close Enough: Semantic Caching for LLM Embeddings
Abstract page for arXiv paper 2603.03298: TATRA: Training-Free Instance-Adaptive Prompting Through Rephrasing and Aggregation
Abstract page for arXiv paper 2603.03297: TTSR: Test-Time Self-Reflection for Continual Reasoning Improvement
Abstract page for arXiv paper 2603.03296: PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
Abstract page for arXiv paper 2603.03295: Language Model Goal Selection Differs from Humans' in an Open-Ended Task
Abstract page for arXiv paper 2603.03294: Fine-Tuning and Evaluating Conversational AI for Agricultural Advisory
Abstract page for arXiv paper 2603.03292: From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG
Abstract page for arXiv paper 2603.03291: One Bias After Another: Mechanistic Reward Shaping and Persistent Biases in Language Reward Models
Abstract page for arXiv paper 2603.03290: AriadneMem: Threading the Maze of Lifelong Memory for LLM Agents
Get the latest news, tools, and insights delivered to your inbox.
Daily or weekly digest • Unsubscribe anytime