Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

What is the scientific value of administering the standard Rorschach test to LLMs when the training data is almost certainly contaminated?

A recent paper published in JMIR Mental Health (Csigó & Cserey, 2026) caught my attention. The researchers administered the 10 standa...

Reddit - Machine Learning · 1 min ·
LLMs

Arc Gate: an LLM proxy that hits P=1.00, R=1.00, F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)

Benchmarked on 40 out-of-distribution prompts: indirect requests, roleplay framings, hypothetical scenarios, and technical phrasings. The stu...

Reddit - Artificial Intelligence · 1 min ·
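The Arc Gate entry reports perfect precision, recall, and F1 on its 40-prompt set. As a reminder of what those figures measure, here is a minimal sketch of the standard binary-classification computation; the labels and predictions below are hypothetical toy data, not taken from the benchmark:

```python
# Minimal precision/recall/F1 computation for a binary
# injection-detection task (1 = injection, 0 = benign).

def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# A classifier that matches every label reproduces P=R=F1=1.00:
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (1.0, 1.0, 1.0)
```

Note that on a set of only 40 prompts, a perfect score is coarse evidence: a single misclassification would already drop F1 noticeably, so the headline numbers say little about behavior at larger scale.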
LLMs

Claude can now plug directly into Photoshop, Blender, and Ableton | The Verge

Anthropic has launched a set of connectors for Claude that allow the AI chatbot to tap into popular creative software

The Verge - AI · 4 min ·

All Content

LLMs

[2508.03284] ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Abstract page for arXiv paper 2508.03284: ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

arXiv - AI · 4 min ·
LLMs

[2505.02888] When Your Own Output Becomes Your Training Data: Noise-to-Meaning Loops and a Formal RSI Trigger

Abstract page for arXiv paper 2505.02888: When Your Own Output Becomes Your Training Data: Noise-to-Meaning Loops and a Formal RSI Trigger

arXiv - AI · 3 min ·
LLMs

[2507.15796] From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Learning Through the Lens of Trust Report 2.0

Abstract page for arXiv paper 2507.15796: From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Learning Through the Lens of Trust Report 2.0

arXiv - AI · 4 min ·
LLMs

[2507.15518] HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatrics

Abstract page for arXiv paper 2507.15518: HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatrics

arXiv - AI · 4 min ·
LLMs

[2502.01534] Preference Leakage: A Contamination Problem in LLM-as-a-judge

Abstract page for arXiv paper 2502.01534: Preference Leakage: A Contamination Problem in LLM-as-a-judge

arXiv - AI · 4 min ·
LLMs

[2506.08321] LeanTutor: Towards a Verified AI Mathematical Proof Tutor

Abstract page for arXiv paper 2506.08321: LeanTutor: Towards a Verified AI Mathematical Proof Tutor

arXiv - AI · 3 min ·
LLMs

[2505.21668] R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

Abstract page for arXiv paper 2505.21668: R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

arXiv - AI · 4 min ·
LLMs

[2505.21281] RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models

Abstract page for arXiv paper 2505.21281: RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models

arXiv - AI · 4 min ·
LLMs

[2504.20505] MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living

Abstract page for arXiv paper 2504.20505: MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living

arXiv - AI · 4 min ·
LLMs

[2603.04317] World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

Abstract page for arXiv paper 2603.04317: World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

arXiv - AI · 3 min ·
LLMs

[2603.04257] Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Abstract page for arXiv paper 2603.04257: Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

arXiv - Machine Learning · 4 min ·
LLMs

[2603.04293] LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance

Abstract page for arXiv paper 2603.04293: LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance

arXiv - AI · 3 min ·
LLMs

[2603.04277] VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments

Abstract page for arXiv paper 2603.04277: VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments

arXiv - AI · 4 min ·
LLMs

[2603.04259] When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies

Abstract page for arXiv paper 2603.04259: When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies

arXiv - AI · 4 min ·
LLMs

[2603.04222] PRAM-R: A Perception-Reasoning-Action-Memory Framework with LLM-Guided Modality Routing for Adaptive Autonomous Driving

Abstract page for arXiv paper 2603.04222: PRAM-R: A Perception-Reasoning-Action-Memory Framework with LLM-Guided Modality Routing for Adaptive Autonomous Driving

arXiv - AI · 3 min ·
LLMs

[2603.04165] PlaneCycle: Training-Free 2D-to-3D Lifting of Foundation Models Without Adapters

Abstract page for arXiv paper 2603.04165: PlaneCycle: Training-Free 2D-to-3D Lifting of Foundation Models Without Adapters

arXiv - AI · 3 min ·
LLMs

[2603.04177] CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

Abstract page for arXiv paper 2603.04177: CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

arXiv - AI · 3 min ·
LLMs

[2603.04128] Crab$^{+}$: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Abstract page for arXiv paper 2603.04128: Crab$^{+}$: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

arXiv - AI · 4 min ·
LLMs

[2603.04162] Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model

Abstract page for arXiv paper 2603.04162: Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model

arXiv - AI · 3 min ·
LLMs

[2603.04069] Monitoring Emergent Reward Hacking During Generation via Internal Activations

Abstract page for arXiv paper 2603.04069: Monitoring Emergent Reward Hacking During Generation via Internal Activations

arXiv - AI · 4 min ·