Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

LLMs

My AI spent last night modifying its own codebase

I've been working on a local AI system called Apis that runs completely offline through Ollama. During a background run, Apis identified ...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

Fake users generated by AI can't simulate humans — review of 182 research papers. Your thoughts?

https://www.researchsquare.com/article/rs-9057643/v1 There’s a massive trend right now where tech companies, businesses, even researchers...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

Depth-first pruning seems to transfer from GPT-2 to Llama (unexpectedly well)

TL;DR: Removing the right transformer layers (instead of shrinking all layers) gives smaller, faster models with minimal quality loss — a...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

All Content

[2603.22289] MERIT: Memory-Enhanced Retrieval for Interpretable Knowledge Tracing

Abstract page for arXiv paper 2603.22289: MERIT: Memory-Enhanced Retrieval for Interpretable Knowledge Tracing

arXiv - AI · 3 min · 6 days ago

[2603.22713] Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Constraints

Abstract page for arXiv paper 2603.22713: Non-Adversarial Imitation Learning Provably Free of Compounding Errors: The Role of Bellman Con...

arXiv - Machine Learning · 4 min · 6 days ago

[2603.22288] Evaluating Prompting Strategies for Chart Question Answering with Large Language Models

Abstract page for arXiv paper 2603.22288: Evaluating Prompting Strategies for Chart Question Answering with Large Language Models

arXiv - AI · 3 min · 6 days ago

[2603.22287] Founder effects shape the evolutionary dynamics of multimodality in open LLM families

Abstract page for arXiv paper 2603.22287: Founder effects shape the evolutionary dynamics of multimodality in open LLM families

arXiv - AI · 4 min · 6 days ago

[2502.04188] Automated Microservice Pattern Instance Detection Using Infrastructure-as-Code Artifacts and Large Language Models

Abstract page for arXiv paper 2502.04188: Automated Microservice Pattern Instance Detection Using Infrastructure-as-Code Artifacts and La...

arXiv - AI · 4 min · 6 days ago

[2603.23406] Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies

Abstract page for arXiv paper 2603.23406: Beyond Preset Identities: How Agents Form Stances and Boundaries in Generative Societies

arXiv - AI · 4 min · 6 days ago

[2603.23346] RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue

Abstract page for arXiv paper 2603.23346: RelayS2S: A Dual-Path Speculative Generation for Real-Time Dialogue

arXiv - AI · 4 min · 6 days ago

[2603.23292] LLM Olympiad: Why Model Evaluation Needs a Sealed Exam

Abstract page for arXiv paper 2603.23292: LLM Olympiad: Why Model Evaluation Needs a Sealed Exam

arXiv - AI · 3 min · 6 days ago

[2603.22586] A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks

Abstract page for arXiv paper 2603.22586: A Foundation Model for Instruction-Conditioned In-Context Time Series Tasks

arXiv - Machine Learning · 3 min · 6 days ago

[2603.23234] MemCollab: Cross-Agent Memory Collaboration via Contrastive Trajectory Distillation

Abstract page for arXiv paper 2603.23234: MemCollab: Cross-Agent Memory Collaboration via Contrastive Trajectory Distillation

arXiv - AI · 4 min · 6 days ago

[2603.23231] PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task Environments

Abstract page for arXiv paper 2603.23231: PERMA: Benchmarking Personalized Memory Agents via Event-Driven Preference and Realistic Task E...

arXiv - AI · 4 min · 6 days ago

[2603.23114] Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment

Abstract page for arXiv paper 2603.23114: Between Rules and Reality: On the Context Sensitivity of LLM Moral Judgment

arXiv - AI · 3 min · 6 days ago

[2603.23085] MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Language Models

Abstract page for arXiv paper 2603.23085: MedCausalX: Adaptive Causal Reasoning with Self-Reflection for Trustworthy Medical Vision-Langu...

arXiv - AI · 4 min · 6 days ago

[2603.22455] SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale

Abstract page for arXiv paper 2603.22455: SkillRouter: Retrieve-and-Rerank Skill Selection for LLM Agents at Scale

arXiv - Machine Learning · 4 min · 6 days ago

[2603.23004] Can Large Language Models Reason and Optimize Under Constraints?

Abstract page for arXiv paper 2603.23004: Can Large Language Models Reason and Optimize Under Constraints?

arXiv - AI · 3 min · 6 days ago

[2603.22978] JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees

Abstract page for arXiv paper 2603.22978: JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees

arXiv - AI · 3 min · 6 days ago

[2603.22942] Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning

Abstract page for arXiv paper 2603.22942: Optimizing Small Language Models for NL2SQL via Chain-of-Thought Fine-Tuning

arXiv - AI · 3 min · 6 days ago

[2603.22370] FAAR: Format-Aware Adaptive Rounding for NVFP4

Abstract page for arXiv paper 2603.22370: FAAR: Format-Aware Adaptive Rounding for NVFP4

arXiv - AI · 4 min · 6 days ago

[2603.22935] Ran Score: a LLM-based Evaluation Score for Radiology Report Generation

Abstract page for arXiv paper 2603.22935: Ran Score: a LLM-based Evaluation Score for Radiology Report Generation

arXiv - AI · 4 min · 6 days ago

[2603.22934] ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning

Abstract page for arXiv paper 2603.22934: ProGRank: Probe-Gradient Reranking to Defend Dense-Retriever RAG from Corpus Poisoning

arXiv - AI · 4 min · 6 days ago
