Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Arc Gate: LLM proxy that hits P=1.00 R=1.00 F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)

Benchmarked on 40 out-of-distribution prompts, indirect requests, roleplay framings, hypothetical scenarios, technical phrasings. The stu...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

Claude can now plug directly into Photoshop, Blender, and Ableton | The Verge

Anthropic has launched a set of connectors for Claude that allow the AI chatbot to tap into popular creative software

The Verge - AI · 4 min · about 2 hours ago

Built a multiplayer map where you can see everyone's Claude Code activity as creatures battling it out

Hello r/artificial I built this specifically for Claude Code users - every prompt you run feeds a digital pet called a Prompt Creature. T...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

All Content

[2510.15982] AMiD: Knowledge Distillation for LLMs with $α$-mixture Assistant Distribution

Abstract page for arXiv paper 2510.15982: AMiD: Knowledge Distillation for LLMs with $α$-mixture Assistant Distribution

arXiv - AI · 4 min · about 2 months ago

[2406.06512] Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

Abstract page for arXiv paper 2406.06512: Merlin: A Computed Tomography Vision-Language Foundation Model and Dataset

arXiv - AI · 4 min · about 2 months ago

[2405.15374] Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph

Abstract page for arXiv paper 2405.15374: Leveraging Large Language Models for Semantic Query Processing in a Scholarly Knowledge Graph

arXiv - AI · 4 min · about 2 months ago

[2509.23405] Planner Aware Path Learning in Diffusion Language Models Training

Abstract page for arXiv paper 2509.23405: Planner Aware Path Learning in Diffusion Language Models Training

arXiv - Machine Learning · 4 min · about 2 months ago

[2509.22263] Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning

Abstract page for arXiv paper 2509.22263: Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning

arXiv - Machine Learning · 3 min · about 2 months ago

[2509.21465] Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

Abstract page for arXiv paper 2509.21465: Talking Trees: Reasoning-Assisted Induction of Decision Trees for Tabular Data

arXiv - Machine Learning · 4 min · about 2 months ago

[2509.17874] Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models

Abstract page for arXiv paper 2509.17874: Deep Hierarchical Learning with Nested Subspace Networks for Large Language Models

arXiv - Machine Learning · 4 min · about 2 months ago

[2602.09937] Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?

Abstract page for arXiv paper 2602.09937: Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?

arXiv - AI · 4 min · about 2 months ago

[2506.15963] On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy

Abstract page for arXiv paper 2506.15963: On the Limits of Sparse Autoencoders: A Theoretical Framework and Reweighted Remedy

arXiv - Machine Learning · 4 min · about 2 months ago

[2601.16529] SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care

Abstract page for arXiv paper 2601.16529: SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters fo...

arXiv - AI · 3 min · about 2 months ago

[2601.15160] Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning

Abstract page for arXiv paper 2601.15160: Knowledge Graphs are Implicit Reward Models: Path-Derived Signals Enable Compositional Reasoning

arXiv - AI · 4 min · about 2 months ago

[2511.22235] Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation

Abstract page for arXiv paper 2511.22235: Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon ...

arXiv - AI · 4 min · about 2 months ago

[2511.21471] SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

Abstract page for arXiv paper 2511.21471: SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition

arXiv - AI · 4 min · about 2 months ago

[2511.05854] Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for Hallucination Detection

Abstract page for arXiv paper 2511.05854: Can a Small Model Learn to Look Before It Leaps? Dynamic Learning and Proactive Correction for ...

arXiv - AI · 4 min · about 2 months ago

[2510.26905] Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations

Abstract page for arXiv paper 2510.26905: Cognition Envelopes for Bounded Decision Making in Autonomous UAS Operations

arXiv - AI · 4 min · about 2 months ago

[2505.20065] SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

Abstract page for arXiv paper 2505.20065: SafeDPO: A Simple Approach to Direct Preference Optimization with Enhanced Safety

arXiv - AI · 4 min · about 2 months ago

[2510.09782] The Geometry of Reasoning: Flowing Logics in Representation Space

Abstract page for arXiv paper 2510.09782: The Geometry of Reasoning: Flowing Logics in Representation Space

arXiv - Machine Learning · 4 min · about 2 months ago

[2510.07972] SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance

Abstract page for arXiv paper 2510.07972: SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance

arXiv - AI · 4 min · about 2 months ago

[2509.21782] Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

Abstract page for arXiv paper 2509.21782: Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

arXiv - AI · 4 min · about 2 months ago

[2508.03284] ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Abstract page for arXiv paper 2508.03284: ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

arXiv - AI · 4 min · about 2 months ago
