Large Language Models

GPT, Claude, Gemini, and other LLMs

This Week's Best | Monthly Best | Guide | Trending

Top This Week

Llms

[D] How's MLX and jax/ pytorch on MacBooks these days?

So I'm looking at buying a new 14 inch MacBook pro with m5 pro and 64 gb of memory vs m4 max with same specs. My priorities are pro sof...

Reddit - Machine Learning · 1 min · 8 minutes ago

Llms

[R] 94.42% on BANKING77 Official Test Split with Lightweight Embedding + Example Reranking (strict full-train protocol)

BANKING77 (77 fine-grained banking intents) is a well-established but increasingly saturated intent classification benchmark. did this wh...

Reddit - Machine Learning · 1 min · 8 minutes ago

Llms

The “Agony” or ChatGPT: Would You Let AI Write Your Wedding Speech?

As more Americans use AI chatbots like ChatGPT to compose their wedding vows, one expert asks: “Is the speech sacred to you?”

AI Tools & Products · 12 min · 25 minutes ago

All Content

Llms

[2603.04419] Context-Dependent Affordance Computation in Vision-Language Models

Abstract page for arXiv paper 2603.04419: Context-Dependent Affordance Computation in Vision-Language Models

arXiv - Machine Learning · 4 min · about 1 month ago

Llms

[2603.04413] Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries

Abstract page for arXiv paper 2603.04413: Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Me...

arXiv - AI · 4 min · about 1 month ago

Llms

[2603.04411] One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

Abstract page for arXiv paper 2603.04411: One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

arXiv - Machine Learning · 3 min · about 1 month ago

Llms

[2603.04410] SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

Abstract page for arXiv paper 2603.04410: SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

arXiv - AI · 4 min · about 1 month ago

Llms

[2603.04409] Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

Abstract page for arXiv paper 2603.04409: Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

arXiv - AI · 4 min · about 1 month ago

Llms

[2603.04406] CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

Abstract page for arXiv paper 2603.04406: CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG M...

arXiv - AI · 4 min · about 1 month ago

Llms

[2603.04407] Semantic Containment as a Fundamental Property of Emergent Misalignment

Abstract page for arXiv paper 2603.04407: Semantic Containment as a Fundamental Property of Emergent Misalignment

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.04405] Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

Abstract page for arXiv paper 2603.04405: Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

arXiv - Machine Learning · 4 min · about 1 month ago

Llms

[2603.05498] The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Abstract page for arXiv paper 2603.05498: The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.05485] Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

Abstract page for arXiv paper 2603.05485: Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.05399] Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

Abstract page for arXiv paper 2603.05399: Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.05392] Legal interpretation and AI: from expert systems to argumentation and LLMs

Abstract page for arXiv paper 2603.05392: Legal interpretation and AI: from expert systems to argumentation and LLMs

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.05294] STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

Abstract page for arXiv paper 2603.05294: STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.05290] X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

Abstract page for arXiv paper 2603.05290: X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

arXiv - AI · 4 min · about 1 month ago

Llms

[2603.05240] GCAgent: Enhancing Group Chat Communication through Dialogue Agents System

Abstract page for arXiv paper 2603.05240: GCAgent: Enhancing Group Chat Communication through Dialogue Agents System

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.05129] MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty Consensus

Abstract page for arXiv paper 2603.05129: MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty C...

arXiv - AI · 4 min · about 1 month ago

Llms

[2603.05120] Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning

Abstract page for arXiv paper 2603.05120: Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Re...

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.05044] WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

Abstract page for arXiv paper 2603.05044: WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

arXiv - AI · 4 min · about 1 month ago

Llms

[2603.05040] Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination

Abstract page for arXiv paper 2603.05040: Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination

arXiv - AI · 3 min · about 1 month ago

Llms

[2603.05028] Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure

Abstract page for arXiv paper 2603.05028: Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure

arXiv - AI · 4 min · about 1 month ago

Previous Page 97 Next

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Subscribe to Newsletter

Daily or weekly digest • Unsubscribe anytime

Large Language Models

Top This Week

[D] How's MLX and jax/ pytorch on MacBooks these days?

[R] 94.42% on BANKING77 Official Test Split with Lightweight Embedding + Example Reranking (strict full-train protocol)

The “Agony” or ChatGPT: Would You Let AI Write Your Wedding Speech?

All Content

[2603.04419] Context-Dependent Affordance Computation in Vision-Language Models

[2603.04413] Simulating Meaning, Nevermore! Introducing ICR: A Semiotic-Hermeneutic Metric for Evaluating Meaning in LLM Text Summaries

[2603.04411] One Size Does Not Fit All: Token-Wise Adaptive Compression for KV Cache

[2603.04410] SalamahBench: Toward Standardized Safety Evaluation for Arabic Language Models

[2603.04409] Unpacking Human Preference for LLMs: Demographically Aware Evaluation with the HUMAINE Framework

[2603.04406] CTRL-RAG: Contrastive Likelihood Reward Based Reinforcement Learning for Context-Faithful RAG Models

[2603.04407] Semantic Containment as a Fundamental Property of Emergent Misalignment

[2603.04405] Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

[2603.05498] The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

[2603.05485] Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

[2603.05399] Judge Reliability Harness: Stress Testing the Reliability of LLM Judges

[2603.05392] Legal interpretation and AI: from expert systems to argumentation and LLMs

[2603.05294] STRUCTUREDAGENT: Planning with AND/OR Trees for Long-Horizon Web Tasks

[2603.05290] X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

[2603.05240] GCAgent: Enhancing Group Chat Communication through Dialogue Agents System

[2603.05129] MedCoRAG: Interpretable Hepatology Diagnosis via Hybrid Evidence Retrieval and Multispecialty Consensus

[2603.05120] Bidirectional Curriculum Generation: A Multi-Agent Framework for Data-Efficient Mathematical Reasoning

[2603.05044] WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents

[2603.05040] Enhancing Zero-shot Commonsense Reasoning by Integrating Visual Knowledge via Machine Imagination

[2603.05028] Survive at All Costs: Exploring LLM's Risky Behaviors under Survival Pressure

Related Topics

Stay updated with AI News