Large Language Models

GPT, Claude, Gemini, and other LLMs

Top This Week

Bluesky’s new app is an AI for customizing your feed | The Verge

Eventually Attie will be able to vibe code entire apps for the AT Protocol.

The Verge - AI · 3 min ·

Nicholas Carlini (67.2k citations on Google Scholar) says Claude is a better security researcher than him, made $3.7 million from exploiting smart contracts, and found vulnerabilities in Linux and Ghost

Link: https://m.youtube.com/watch?v=1sd26pWhfmg The Linux exploit is especially interesting because it was introduced in 2003 and was nev...

Reddit - Artificial Intelligence · 1 min ·

[P] I built an autonomous ML agent that runs experiments on tabular data indefinitely - inspired by Karpathy's AutoResearch

Inspired by Andrej Karpathy's AutoResearch, I built a system where Claude Code acts as an autonomous ML researcher on tabular binary clas...

Reddit - Machine Learning · 1 min ·

All Content

Claude Users Hit a New Reality of AI Rationing

In building apps with Anthropic AI assistant Claude, users regularly receive a message that their daily limit has been reached and must w...

AI Tools & Products · 6 min ·

Sephora Launches AI-Powered Shopping App in ChatGPT

Sephora has launched an AI-powered shopping app within ChatGPT, offering a new personalised beauty discovery experience.

AI Tools & Products · 3 min ·

Gemini could let you transfer chat history from other AI apps, like a game of ‘telephone’

Google seems to understand that the current AI landscape has users jumping from one model to the next, looking for...

AI Tools & Products · 3 min ·

[2603.18788] Mi:dm K 2.5 Pro

Abstract page for arXiv paper 2603.18788: Mi:dm K 2.5 Pro

arXiv - AI · 4 min ·

[2603.17729] SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition

Abstract page for arXiv paper 2603.17729: SARE: Sample-wise Adaptive Reasoning for Training-free Fine-grained Visual Recognition

arXiv - AI · 4 min ·

[2602.01047] Residual Decoding: Mitigating Hallucinations in Large Vision-Language Models via History-Aware Residual Guidance

Abstract page for arXiv paper 2602.01047: Residual Decoding: Mitigating Hallucinations in Large Vision-Language Models via History-Aware ...

arXiv - AI · 4 min ·

[2602.07023] Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation

Abstract page for arXiv paper 2602.07023: Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching throug...

arXiv - AI · 4 min ·

[2601.13719] Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search

Abstract page for arXiv paper 2601.13719: Hierarchical Long Video Understanding with Audiovisual Entity Cohesion and Agentic Search

arXiv - AI · 4 min ·

[2603.13606] NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL

Abstract page for arXiv paper 2603.13606: NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL

arXiv - AI · 4 min ·

[2601.07315] VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing

Abstract page for arXiv paper 2601.07315: VLM-CAD: VLM-Optimized Collaborative Agent Design Workflow for Analog Circuit Sizing

arXiv - AI · 4 min ·

[2512.22387] AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based Coding Agents

Abstract page for arXiv paper 2512.22387: AI-Generated Code Is Not Reproducible (Yet): An Empirical Study of Dependency Gaps in LLM-Based...

arXiv - AI · 4 min ·

[2512.02487] Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding

Abstract page for arXiv paper 2512.02487: Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Und...

arXiv - AI · 4 min ·

[2510.26865] Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

Abstract page for arXiv paper 2510.26865: Do Vision-Language Models Measure Up? Benchmarking Visual Measurement Reading with MeasureBench

arXiv - AI · 4 min ·

[2511.05919] Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

Abstract page for arXiv paper 2511.05919: Injecting Falsehoods: Adversarial Man-in-the-Middle Attacks Undermining Factual Recall in LLMs

arXiv - AI · 4 min ·

[2510.23421] Quantifying Systemic Vulnerability in the Foundation Model Industry

Abstract page for arXiv paper 2510.23421: Quantifying Systemic Vulnerability in the Foundation Model Industry

arXiv - AI · 3 min ·

[2511.12449] MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Understanding

Abstract page for arXiv paper 2511.12449: MOON2.0: Dynamic Modality-balanced Multimodal Representation Learning for E-commerce Product Un...

arXiv - AI · 4 min ·

[2510.15994] MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

Abstract page for arXiv paper 2510.15994: MCP Security Bench (MSB): Benchmarking Attacks Against Model Context Protocol in LLM Agents

arXiv - AI · 4 min ·

[2510.14967] Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents

Abstract page for arXiv paper 2510.14967: Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Sear...

arXiv - AI · 4 min ·

[2510.01483] VL-KnG: Persistent Spatiotemporal Knowledge Graphs from Egocentric Video for Embodied Scene Understanding

Abstract page for arXiv paper 2510.01483: VL-KnG: Persistent Spatiotemporal Knowledge Graphs from Egocentric Video for Embodied Scene Und...

arXiv - AI · 3 min ·

[2509.20502] MARS: toward more efficient multi-agent collaboration for LLM reasoning

Abstract page for arXiv paper 2509.20502: MARS: toward more efficient multi-agent collaboration for LLM reasoning

arXiv - AI · 4 min ·