Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Paper Finds That Leading AI Chatbots Like ChatGPT and Claude Remain Incredibly Sycophantic, Resulting in Twisted Effects on Users

A study found that sycophancy is pervasive among chatbots, and that bots are more likely than human peers to affirm a person's bad behavior.

AI Tools & Products · 6 min ·
Popular AI gateway startup LiteLLM ditches controversial startup Delve

LiteLLM had obtained two security compliance certifications via Delve and fell victim to some horrific credential-stealing malware last w...

TechCrunch - AI · 3 min ·
Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Greetings all - I've posted mostly in r/claudecode and r/aigamedev a couple of times previously. Working with CC for personal projects re...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2603.20405] Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

Abstract page for arXiv paper 2603.20405: Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

arXiv - Machine Learning · 3 min ·
[2603.19225] FinTradeBench: A Financial Reasoning Benchmark for LLMs

Abstract page for arXiv paper 2603.19225: FinTradeBench: A Financial Reasoning Benchmark for LLMs

arXiv - AI · 4 min ·
[2603.19220] Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Abstract page for arXiv paper 2603.19220: Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

arXiv - Machine Learning · 4 min ·
[2603.18873] Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo

Abstract page for arXiv paper 2603.18873: Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case...

arXiv - AI · 4 min ·
[2603.18415] The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation

Abstract page for arXiv paper 2603.18415: The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation

arXiv - AI · 4 min ·
[2603.17775] CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution

Abstract page for arXiv paper 2603.17775: CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution

arXiv - Machine Learning · 4 min ·
[2603.17655] Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment

Abstract page for arXiv paper 2603.17655: Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment

arXiv - AI · 4 min ·
[2603.16960] Adversarial attacks against Modern Vision-Language Models

Abstract page for arXiv paper 2603.16960: Adversarial attacks against Modern Vision-Language Models

arXiv - AI · 3 min ·
[2603.14635] Compute Allocation for Reasoning-Intensive Retrieval Agents

Abstract page for arXiv paper 2603.14635: Compute Allocation for Reasoning-Intensive Retrieval Agents

arXiv - AI · 3 min ·
[2603.16065] Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

Abstract page for arXiv paper 2603.16065: Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

arXiv - AI · 4 min ·
[2603.14672] Seamless Deception: Larger Language Models Are Better Knowledge Concealers

Abstract page for arXiv paper 2603.14672: Seamless Deception: Larger Language Models Are Better Knowledge Concealers

arXiv - AI · 3 min ·
[2603.14602] PA3: Policy-Aware Agent Alignment through Chain-of-Thought

Abstract page for arXiv paper 2603.14602: PA3: Policy-Aware Agent Alignment through Chain-of-Thought

arXiv - Machine Learning · 3 min ·
[2603.13406] Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection

Abstract page for arXiv paper 2603.13406: Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for A...

arXiv - AI · 4 min ·
[2603.13275] PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation

Abstract page for arXiv paper 2603.13275: PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Aver...

arXiv - Machine Learning · 4 min ·
[2603.07496] From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

Abstract page for arXiv paper 2603.07496: From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

arXiv - AI · 3 min ·
[2602.11549] Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

Abstract page for arXiv paper 2602.11549: Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

arXiv - Machine Learning · 4 min ·
[2602.07077] CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models

Abstract page for arXiv paper 2602.07077: CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models

arXiv - AI · 4 min ·
[2602.00319] Detecting AI-Generated Content in Academic Peer Reviews

Abstract page for arXiv paper 2602.00319: Detecting AI-Generated Content in Academic Peer Reviews

arXiv - Machine Learning · 3 min ·
[2601.20009] LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

Abstract page for arXiv paper 2601.20009: LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

arXiv - Machine Learning · 4 min ·
[2601.14958] Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala

Abstract page for arXiv paper 2601.14958: Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala

arXiv - AI · 3 min ·