Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

What is the scientific value of administering the standard Rorschach test to LLMs when the training data is almost certainly contaminated?

A recent paper published in JMIR Mental Health (Csigó & Cserey, 2026) caught my attention. The researchers administered the 10 standa...

Reddit - Machine Learning · 1 min ·
LLMs

Arc Gate: an LLM proxy that hits P=1.00, R=1.00, F1=1.00 on indirect/roleplay prompt injection (beats OpenAI Moderation and LlamaGuard)

Benchmarked on 40 out-of-distribution prompts: indirect requests, roleplay framings, hypothetical scenarios, and technical phrasings. The stu...

Reddit - Artificial Intelligence · 1 min ·
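The Arc Gate entry reports perfect precision, recall, and F1 on its 40-prompt set. As a reminder of what those figures measure, here is a minimal sketch of the standard binary-classification computation; the labels and predictions below are hypothetical toy data, not taken from the benchmark:

```python
# Minimal precision/recall/F1 computation for a binary
# injection-detection task (1 = injection, 0 = benign).

def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# A classifier that matches every label reproduces P=R=F1=1.00:
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 0]
print(precision_recall_f1(y_true, y_pred))  # (1.0, 1.0, 1.0)
```

Note that on a set of only 40 prompts, a perfect score is coarse evidence: a single misclassification would already drop F1 noticeably, so the headline numbers say little about behavior at larger scale.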
LLMs

Claude can now plug directly into Photoshop, Blender, and Ableton | The Verge

Anthropic has launched a set of connectors for Claude that allow the AI chatbot to tap into popular creative software

The Verge - AI · 4 min ·

All Content

LLMs

[2508.03284] ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

Abstract page for arXiv paper 2508.03284: ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools

arXiv - AI · 4 min ·
LLMs

[2505.02888] When Your Own Output Becomes Your Training Data: Noise-to-Meaning Loops and a Formal RSI Trigger

Abstract page for arXiv paper 2505.02888: When Your Own Output Becomes Your Training Data: Noise-to-Meaning Loops and a Formal RSI Trigger

arXiv - AI · 3 min ·
LLMs

[2507.15796] From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Learning Through the Lens of Trust Report 2.0

Abstract page for arXiv paper 2507.15796: From Privacy to Trust in the Agentic Era: A Taxonomy of Challenges in Trustworthy Federated Learning Through the Lens of Trust Report 2.0

arXiv - AI · 4 min ·
LLMs

[2507.15518] HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatrics

Abstract page for arXiv paper 2507.15518: HAMLET: A Hierarchical and Adaptive Multi-Agent Framework for Live Embodied Theatrics

arXiv - AI · 4 min ·
LLMs

[2502.01534] Preference Leakage: A Contamination Problem in LLM-as-a-judge

Abstract page for arXiv paper 2502.01534: Preference Leakage: A Contamination Problem in LLM-as-a-judge

arXiv - AI · 4 min ·
LLMs

[2506.08321] LeanTutor: Towards a Verified AI Mathematical Proof Tutor

Abstract page for arXiv paper 2506.08321: LeanTutor: Towards a Verified AI Mathematical Proof Tutor

arXiv - AI · 3 min ·
LLMs

[2505.21668] R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

Abstract page for arXiv paper 2505.21668: R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

arXiv - AI · 4 min ·
LLMs

[2505.21281] RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models

Abstract page for arXiv paper 2505.21281: RLJP: Legal Judgment Prediction via First-Order Logic Rule-enhanced with Large Language Models

arXiv - AI · 4 min ·
LLMs

[2504.20505] MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living

Abstract page for arXiv paper 2504.20505: MuRAL: A Multi-Resident Ambient Sensor Dataset Annotated with Natural Language for Activities of Daily Living

arXiv - AI · 4 min ·
LLMs

[2603.04317] World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

Abstract page for arXiv paper 2603.04317: World Properties without World Models: Recovering Spatial and Temporal Structure from Co-occurrence Statistics in Static Word Embeddings

arXiv - AI · 3 min ·
LLMs

[2603.04257] Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

Abstract page for arXiv paper 2603.04257: Memex(RL): Scaling Long-Horizon LLM Agents via Indexed Experience Memory

arXiv - Machine Learning · 4 min ·
LLMs

[2603.04293] LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance

Abstract page for arXiv paper 2603.04293: LabelBuddy: An Open Source Music and Audio Language Annotation Tagging Tool Using AI Assistance

arXiv - AI · 3 min ·
LLMs

[2603.04277] VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments

Abstract page for arXiv paper 2603.04277: VANGUARD: Vehicle-Anchored Ground Sample Distance Estimation for UAVs in GPS-Denied Environments

arXiv - AI · 4 min ·
LLMs

[2603.04259] When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies

Abstract page for arXiv paper 2603.04259: When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies

arXiv - AI · 4 min ·
LLMs

[2603.04222] PRAM-R: A Perception-Reasoning-Action-Memory Framework with LLM-Guided Modality Routing for Adaptive Autonomous Driving

Abstract page for arXiv paper 2603.04222: PRAM-R: A Perception-Reasoning-Action-Memory Framework with LLM-Guided Modality Routing for Adaptive Autonomous Driving

arXiv - AI · 3 min ·
LLMs

[2603.04165] PlaneCycle: Training-Free 2D-to-3D Lifting of Foundation Models Without Adapters

Abstract page for arXiv paper 2603.04165: PlaneCycle: Training-Free 2D-to-3D Lifting of Foundation Models Without Adapters

arXiv - AI · 3 min ·
LLMs

[2603.04177] CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

Abstract page for arXiv paper 2603.04177: CodeTaste: Can LLMs Generate Human-Level Code Refactorings?

arXiv - AI · 3 min ·
LLMs

[2603.04128] Crab$^{+}$: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

Abstract page for arXiv paper 2603.04128: Crab$^{+}$: A Scalable and Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

arXiv - AI · 4 min ·
LLMs

[2603.04162] Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model

Abstract page for arXiv paper 2603.04162: Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model

arXiv - AI · 3 min ·
LLMs

[2603.04069] Monitoring Emergent Reward Hacking During Generation via Internal Activations

Abstract page for arXiv paper 2603.04069: Monitoring Emergent Reward Hacking During Generation via Internal Activations

arXiv - AI · 4 min ·