OpenAI talks about not talking about goblins | The Verge
References to goblins and gremlins spiked with the release of GPT-5.1’s ‘Nerdy’ personality, and then spread to other models.
ML algorithms, training, and inference
These are the top open and closed models: Opus 4.7, GPT-5.5 Pro, DeepSeek V4, GLM-5.1, and Gemini 3.1 Pro. They both show similar performan...
Caught between fears of job loss and social stigma, Gen Z’s opinions of AI are hitting new lows.
Abstract page for arXiv paper 2603.10047: Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination ...
Abstract page for arXiv paper 2603.09030: PlayWorld: Learning Robot World Models from Autonomous Play
Abstract page for arXiv paper 2602.08392: ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs
Abstract page for arXiv paper 2601.11109: Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning
Abstract page for arXiv paper 2601.08565: Rewriting Video: Text-Driven Reauthoring of Video Footage
Abstract page for arXiv paper 2512.18388: Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creatio...
Abstract page for arXiv paper 2601.00263: Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counter...
Abstract page for arXiv paper 2512.11919: A fine-grained look at causal effects in causal spaces
Abstract page for arXiv paper 2510.15746: LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation
Abstract page for arXiv paper 2511.06448: When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Plat...
Abstract page for arXiv paper 2511.06391: HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate S...
Abstract page for arXiv paper 2510.25890: ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted...
Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Abstract page for arXiv paper 2510.13829: A Linguistics-Aware LLM Watermarking via Syntactic Predictability
Abstract page for arXiv paper 2510.06800: FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipe...
Abstract page for arXiv paper 2509.24186: Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks
Abstract page for arXiv paper 2509.23279: Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Abstract page for arXiv paper 2509.22258: Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
Abstract page for arXiv paper 2509.05892: Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medi...
Abstract page for arXiv paper 2506.13130: ZINA: Multimodal Fine-grained Hallucination Detection and Editing