Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Claude mythos preview GameJam contestant

Claude was able to create this Indie Game Jam Challenge with simple user-guided prompts in the Godot engine with Mythos Preview with Zer...

Reddit - Artificial Intelligence · 1 min · about 1 hour ago

I implemented a Meta paper [P]

GitHub link: genji970/Scaling-Test-Time-Compute-for-Agentic-Coding- (implementation of a Meta AI paper). Paper link: https://arxiv.org/abs/...

Reddit - Machine Learning · 1 min · about 2 hours ago

How do I actually learn AI/ML deeply enough to build systems (not just follow tutorials)? [D]

I'm stuck in a loop where I consume AI/ML content but can’t move towards actually building real systems. - I understand things at a surfa...

Reddit - Machine Learning · 1 min · about 2 hours ago

All Content

[2510.19842] DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs

Abstract page for arXiv paper 2510.19842: DAG-Math: Graph-of-Thought Guided Mathematical Reasoning in LLMs

arXiv - Machine Learning · 4 min · 2 months ago

[2603.01399] Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verification

Abstract page for arXiv paper 2603.01399: Quasar: Quantized Self-Speculative Acceleration for Rapid Inference via Memory-Efficient Verifi...

arXiv - Machine Learning · 4 min · 2 months ago

[2510.04284] Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning

Abstract page for arXiv paper 2510.04284: Doctor-R1: Mastering Clinical Inquiry with Experiential Agentic Reinforcement Learning

arXiv - AI · 4 min · 2 months ago

[2510.04040] FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning

Abstract page for arXiv paper 2510.04040: FaithCoT-Bench: Benchmarking Instance-Level Faithfulness of Chain-of-Thought Reasoning

arXiv - AI · 4 min · 2 months ago

[2510.03605] Understanding the Role of Training Data in Test-Time Scaling

Abstract page for arXiv paper 2510.03605: Understanding the Role of Training Data in Test-Time Scaling

arXiv - Machine Learning · 4 min · 2 months ago

[2603.01327] SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resolution

Abstract page for arXiv paper 2603.01327: SWE-Adept: An LLM-Based Agentic Framework for Deep Codebase Analysis and Structured Issue Resol...

arXiv - Machine Learning · 4 min · 2 months ago

[2603.01326] Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning

Abstract page for arXiv paper 2603.01326: Truth as a Trajectory: What Internal Representations Reveal About Large Language Model Reasoning

arXiv - Machine Learning · 4 min · 2 months ago

[2509.23465] ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Problems

Abstract page for arXiv paper 2509.23465: ViTSP: A Vision Language Models Guided Framework for Solving Large-Scale Traveling Salesman Pro...

arXiv - AI · 4 min · 2 months ago

[2509.23415] From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database Agents

Abstract page for arXiv paper 2509.23415: From Conversation to Query Execution: Benchmarking User and Tool Interactions for EHR Database ...

arXiv - AI · 4 min · 2 months ago

[2509.21993] Bilinear representation mitigates reversal curse and enables consistent model editing

Abstract page for arXiv paper 2509.21993: Bilinear representation mitigates reversal curse and enables consistent model editing

arXiv - Machine Learning · 4 min · 2 months ago

[2603.01236] AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in Large Vision-Language Models

Abstract page for arXiv paper 2603.01236: AgilePruner: An Empirical Study of Attention and Diversity for Adaptive Visual Token Pruning in...

arXiv - Machine Learning · 4 min · 2 months ago

[2509.21028] Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles

Abstract page for arXiv paper 2509.21028: Who Gets Cited Most? Benchmarking Long-Context Numerical Reasoning on Scientific Articles

arXiv - AI · 3 min · 2 months ago

[2603.01214] Reasoning Boosts Opinion Alignment in LLMs

Abstract page for arXiv paper 2603.01214: Reasoning Boosts Opinion Alignment in LLMs

arXiv - Machine Learning · 3 min · 2 months ago

[2509.12282] AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science

Abstract page for arXiv paper 2509.12282: AISSISTANT: Human-AI Collaborative Review and Perspective Research Workflows in Data Science

arXiv - Machine Learning · 4 min · 2 months ago

[2603.01213] Can AI Agents Agree?

Abstract page for arXiv paper 2603.01213: Can AI Agents Agree?

arXiv - Machine Learning · 3 min · 2 months ago

[2509.03906] Toward Clinically Explainable AI for Medical Diagnosis: A Foundation Model with Human-Compatible Reasoning via Reinforcement Learning

Abstract page for arXiv paper 2509.03906: Toward Clinically Explainable AI for Medical Diagnosis: A Foundation Model with Human-Compatibl...

arXiv - AI · 4 min · 2 months ago

[2509.01938] EigenBench: A Comparative Behavioral Measure of Value Alignment

Abstract page for arXiv paper 2509.01938: EigenBench: A Comparative Behavioral Measure of Value Alignment

arXiv - Machine Learning · 4 min · 2 months ago

[2508.20729] Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision

Abstract page for arXiv paper 2508.20729: Re4: Scientific Computing Agent with Rewriting, Resolution, Review and Revision

arXiv - AI · 4 min · 2 months ago

[2508.15030] Collab-REC: An LLM-based Agentic Framework for Balancing Recommendations in Tourism

Abstract page for arXiv paper 2508.15030: Collab-REC: An LLM-based Agentic Framework for Balancing Recommendations in Tourism

arXiv - AI · 3 min · 2 months ago

[2507.16145] SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validation in COPD Reporting

Abstract page for arXiv paper 2507.16145: SpiroLLM: Finetuning Pretrained LLMs to Understand Spirogram Time Series with Clinical Validati...

arXiv - AI · 4 min · 2 months ago

