Large Language Models

GPT, Claude, Gemini, and other LLMs


Top This Week

LLMs

Paper Finds That Leading AI Chatbots Like ChatGPT and Claude Remain Incredibly Sycophantic, Resulting in Twisted Effects on Users

A study found that sycophancy is pervasive among chatbots, and that bots are more likely than human peers to affirm a person's bad behavior.

AI Tools & Products · 6 min · 8 minutes ago

LLMs

Popular AI gateway startup LiteLLM ditches controversial startup Delve

LiteLLM had obtained two security compliance certifications via Delve and fell victim to some horrific credential-stealing malware last w...

TechCrunch - AI · 3 min · about 2 hours ago

LLMs

Von Hammerstein’s Ghost: What a Prussian General’s Officer Typology Can Teach Us About AI Misalignment

Greetings all - I've posted mostly in r/claudecode and r/aigamedev a couple of times previously. Working with CC for personal projects re...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

All Content

LLMs

[2603.20405] Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

Abstract page for arXiv paper 2603.20405: Putnam 2025 Problems in Rocq using Opus 4.6 and Rocq-MCP

arXiv - Machine Learning · 3 min · 7 days ago

LLMs

[2603.19225] FinTradeBench: A Financial Reasoning Benchmark for LLMs

Abstract page for arXiv paper 2603.19225: FinTradeBench: A Financial Reasoning Benchmark for LLMs

arXiv - AI · 4 min · 7 days ago

LLMs

[2603.19220] Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Abstract page for arXiv paper 2603.19220: Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

arXiv - Machine Learning · 4 min · 7 days ago

LLMs

[2603.18873] Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo

Abstract page for arXiv paper 2603.18873: Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case...

arXiv - AI · 4 min · 7 days ago

LLMs

[2603.18415] The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation

Abstract page for arXiv paper 2603.18415: The Spillover Effects of Peer AI Rinsing on Corporate Green Innovation

arXiv - AI · 4 min · 7 days ago

LLMs

[2603.17775] CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution

Abstract page for arXiv paper 2603.17775: CoVerRL: Breaking the Consensus Trap in Label-Free Reasoning via Generator-Verifier Co-Evolution

arXiv - Machine Learning · 4 min · 7 days ago

LLMs

[2603.17655] Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment

Abstract page for arXiv paper 2603.17655: Interpretable Cross-Domain Few-Shot Learning with Rectified Target-Domain Local Alignment

arXiv - AI · 4 min · 7 days ago

LLMs

[2603.16960] Adversarial attacks against Modern Vision-Language Models

Abstract page for arXiv paper 2603.16960: Adversarial attacks against Modern Vision-Language Models

arXiv - AI · 3 min · 7 days ago

LLMs

[2603.14635] Compute Allocation for Reasoning-Intensive Retrieval Agents

Abstract page for arXiv paper 2603.14635: Compute Allocation for Reasoning-Intensive Retrieval Agents

arXiv - AI · 3 min · 7 days ago

LLMs

[2603.16065] Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

Abstract page for arXiv paper 2603.16065: Large Reward Models: Generalizable Online Robot Reward Generation with Vision-Language Models

arXiv - AI · 4 min · 7 days ago

LLMs

[2603.14672] Seamless Deception: Larger Language Models Are Better Knowledge Concealers

Abstract page for arXiv paper 2603.14672: Seamless Deception: Larger Language Models Are Better Knowledge Concealers

arXiv - AI · 3 min · 7 days ago

LLMs

[2603.14602] PA3: Policy-Aware Agent Alignment through Chain-of-Thought

Abstract page for arXiv paper 2603.14602: PA3: Policy-Aware Agent Alignment through Chain-of-Thought

arXiv - Machine Learning · 3 min · 7 days ago

LLMs

[2603.13406] Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for AH Detection

Abstract page for arXiv paper 2603.13406: Nuanced Emotion Recognition Based on a Segment-based MLLM Framework Leveraging Qwen3-Omni for A...

arXiv - AI · 4 min · 7 days ago

LLMs

[2603.13275] PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Averaging Aggregation

Abstract page for arXiv paper 2603.13275: PREBA: Surgical Duration Prediction via PCA-Weighted Retrieval-Augmented LLMs and Bayesian Aver...

arXiv - Machine Learning · 4 min · 7 days ago

LLMs

[2603.07496] From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

Abstract page for arXiv paper 2603.07496: From Thinker to Society: Security in Hierarchical Autonomy Evolution of AI Agents

arXiv - AI · 3 min · 7 days ago

LLMs

[2602.11549] Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

Abstract page for arXiv paper 2602.11549: Native Reasoning Models: Training Language Models to Reason on Unverifiable Data

arXiv - Machine Learning · 4 min · 7 days ago

LLMs

[2602.07077] CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models

Abstract page for arXiv paper 2602.07077: CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models

arXiv - AI · 4 min · 7 days ago

LLMs

[2602.00319] Detecting AI-Generated Content in Academic Peer Reviews

Abstract page for arXiv paper 2602.00319: Detecting AI-Generated Content in Academic Peer Reviews

arXiv - Machine Learning · 3 min · 7 days ago

LLMs

[2601.20009] LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

Abstract page for arXiv paper 2601.20009: LinguaMap: Which Layers of LLMs Speak Your Language and How to Tune Them?

arXiv - Machine Learning · 4 min · 7 days ago

LLMs

[2601.14958] Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala

Abstract page for arXiv paper 2601.14958: Script Sensitivity: Benchmarking Language Models on Unicode, Romanized and Mixed-Script Sinhala

arXiv - AI · 3 min · 7 days ago

