Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Bluesky’s new app is an AI for customizing your feed | The Verge

Eventually Attie will be able to vibe code entire apps for the AT Protocol.

The Verge - AI · 3 min · about 1 hour ago

Nicholas Carlini (67.2k citations on Google Scholar) says Claude is a better security researcher than him, made $3.7 million from exploiting smart contracts, and found vulnerabilities in Linux and Ghost

Link: https://m.youtube.com/watch?v=1sd26pWhfmg The Linux exploit is especially interesting because it was introduced in 2003 and was nev...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

[P] I built an autonomous ML agent that runs experiments on tabular data indefinitely - inspired by Karpathy's AutoResearch

Inspired by Andrej Karpathy's AutoResearch, I built a system where Claude Code acts as an autonomous ML researcher on tabular binary clas...

Reddit - Machine Learning · 1 min · about 4 hours ago

All Content

[2506.11167] Towards a general-purpose foundation model for fMRI analysis

arXiv - Machine Learning · 4 min · 5 days ago

[2508.09537] From Context to Intent: Reasoning-Guided Function-Level Code Completion

arXiv - AI · 4 min · 5 days ago

[2505.22318] Flying Pigs, FaR and Beyond: Evaluating LLM Reasoning in Counterfactual Worlds

arXiv - Machine Learning · 4 min · 5 days ago

[2504.16956] GeneMamba: An Efficient and Effective Foundation Model on Single Cell Data

arXiv - Machine Learning · 4 min · 5 days ago

[2412.08686] LatentQA: Teaching LLMs to Decode Activations Into Natural Language

arXiv - Machine Learning · 4 min · 5 days ago

[2410.12164] Table-LLM-Specialist: Language Model Specialists for Tables using Iterative Generator-Validator Fine-tuning

arXiv - Machine Learning · 4 min · 5 days ago

[2504.07396] Automating quantum feature map design via large language models

arXiv - AI · 4 min · 5 days ago

[2502.01969] Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration

arXiv - AI · 4 min · 5 days ago

[2306.05036] Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT for Mining Insights at Scale

arXiv - AI · 4 min · 5 days ago

[2601.12138] DriveSafe: A Hierarchical Risk Taxonomy for Safety-Critical LLM-Based Driving Assistants

arXiv - AI · 3 min · 5 days ago

[2511.22076] Hybrid Stackelberg Game and Diffusion-based Auction for Two-tier Agentic AI Task Offloading in Internet of Agents

arXiv - AI · 4 min · 5 days ago

[2601.18858] Representational Homomorphism Predicts and Improves Compositional Generalization In Transformer Language Model

arXiv - AI · 4 min · 5 days ago

[2510.05318] BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions

arXiv - AI · 4 min · 5 days ago

[2510.00415] Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration under Validate-by-Reproduce Paradigm

arXiv - AI · 4 min · 5 days ago

[2603.23501] MedObvious: Exposing the Medical Moravec's Paradox in VLMs via Clinical Triage

arXiv - AI · 4 min · 5 days ago

[2603.23485] Failure of contextual invariance in gender inference with large language models

arXiv - AI · 3 min · 5 days ago

[2603.23482] ReqFusion: A Multi-Provider Framework for Automated PEGS Analysis Across Software Domains

arXiv - AI · 4 min · 5 days ago

[2510.16051] GUIrilla: A Scalable Framework for Automated Desktop UI Exploration

arXiv - AI · 4 min · 5 days ago

[2603.23447] 3DCity-LLM: Empowering Multi-modality Large Language Models for 3D City-scale Perception and Understanding

arXiv - AI · 4 min · 5 days ago

[2603.23443] Evaluating LLM-Based Test Generation Under Software Evolution

arXiv - AI · 4 min · 5 days ago
