AI Startups

AI startup funding, launches, and acquisitions

Top This Week

LLMs

[D] The problem with comparing AI memory system benchmarks — different evaluation methods make scores meaningless

I've been reviewing how various AI memory systems evaluate their performance and noticed a fundamental issue with cross-system comparison...

Reddit - Machine Learning · 1 min ·
Machine Learning

Exclusive: Runway launches $10M fund, Builders program to support early stage AI startups | TechCrunch

Runway is launching a $10 million fund and startup program to back companies building with its AI video models, as it pushes toward inter...

TechCrunch - AI · 7 min ·
AI Startups

The Download: AI health tools and the Pentagon’s Anthropic culture war | MIT Technology Review

California has defied Trump's demands to stop AI regulation.

MIT Technology Review · 5 min ·

All Content

LLMs

[2602.22769] AMA-Bench: Evaluating Long-Horizon Memory for Agentic Applications

The paper introduces AMA-Bench, a new benchmark for evaluating long-horizon memory in Large Language Models (LLMs) for agentic applicatio...

arXiv - Machine Learning · 4 min ·
Data Science

[2602.22758] Decomposing Physician Disagreement in HealthBench

This paper analyzes physician disagreement in the HealthBench dataset, identifying key factors contributing to variance in evaluations an...

arXiv - AI · 3 min ·
LLMs

[2602.22680] Toward Personalized LLM-Powered Agents: Foundations, Evaluation, and Future Directions

This survey paper explores the development of personalized LLM-powered agents, focusing on their foundations, evaluation metrics, and fut...

arXiv - AI · 4 min ·
LLMs

[2602.22638] MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

MobilityBench introduces a benchmark for evaluating LLM-based route-planning agents, addressing challenges in real-world mobility scenari...

arXiv - AI · 4 min ·
Machine Learning

[2602.22585] Correcting Human Labels for Rater Effects in AI Evaluation: An Item Response Theory Approach

This paper explores the integration of psychometric rater models into AI evaluation, aiming to correct human label biases and improve the...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.22557] CourtGuard: A Model-Agnostic Framework for Zero-Shot Policy Adaptation in LLM Safety

CourtGuard introduces a model-agnostic framework for zero-shot policy adaptation in LLM safety, enhancing adaptability and performance wi...

arXiv - Machine Learning · 3 min ·
AI Startups

[2602.22532] Coarse-to-Fine Learning of Dynamic Causal Structures

The paper presents DyCausal, a framework for learning dynamic causal structures in time series data, addressing challenges of time-varyin...

arXiv - Machine Learning · 4 min ·
Machine Learning

[2602.22520] TEFL: Prediction-Residual-Guided Rolling Forecasting for Multi-Horizon Time Series

The paper presents TEFL, a novel framework for multi-horizon time series forecasting that utilizes prediction residuals to enhance accura...

arXiv - Machine Learning · 4 min ·
LLMs

[2602.22480] VeRO: An Evaluation Harness for Agents to Optimize Agents

The paper introduces VeRO, an evaluation harness designed for optimizing coding agents through structured evaluation and benchmarking, ad...

arXiv - Machine Learning · 3 min ·
Machine Learning

[2602.22470] Beyond performance-wise Contribution Evaluation in Federated Learning

This paper explores the limitations of current evaluation methods in federated learning, emphasizing the need for a multidimensional appr...

arXiv - Machine Learning · 3 min ·
LLMs

[2602.22442] A Framework for Assessing AI Agent Decisions and Outcomes in AutoML Pipelines

This article presents a framework for evaluating AI agent decisions in AutoML pipelines, emphasizing decision-centric metrics over tradit...

arXiv - AI · 4 min ·
LLMs

[2602.22273] FIRE: A Comprehensive Benchmark for Financial Intelligence and Reasoning Evaluation

The FIRE benchmark evaluates financial intelligence and reasoning in LLMs through diverse theoretical and practical assessments, providin...

arXiv - Machine Learning · 3 min ·
AI Safety

The Pentagon’s battle with Anthropic is really a war over who controls AI

The Pentagon's ultimatum to Anthropic over AI control raises critical questions about military access to advanced technologies and the et...

AI Tools & Products · 11 min ·
AI Startups

hottest job in ai right now

The article discusses the current demand for a specific job role in AI, highlighting its relevance in the industry and potential career o...

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Anthropic Rejects Latest Pentagon Offer, Escalating AI Feud

Anthropic has rejected the Pentagon's latest offer, intensifying the ongoing conflict over AI regulations and military applications.

Reddit - Artificial Intelligence · 1 min ·
Machine Learning

[D] No-code workflows and the future of AI-driven video creation

The article explores the rise of no-code machine learning tools, focusing on how they simplify video creation workflows, particularly thr...

Reddit - Machine Learning · 1 min ·
AI Startups

20+ Best AI Project Ideas for 2026: Trending AI Projects

This article presents over 20 AI project ideas tailored for various skill levels, providing a roadmap for building portfolio-ready projec...

AI Events ·
AI Agents

‘Uncanny Valley’: Pentagon vs. ‘Woke’ Anthropic, Agentic vs. Mimetic, and Trump vs. State of the Union | WIRED

The Uncanny Valley podcast discusses the escalating feud between Anthropic and the Pentagon over AI technology use, the concept of agenti...

Wired - AI · 32 min ·
AI Safety

Anthropic rejects latest Pentagon offer: ‘We cannot in good conscience accede to their request’

Anthropic has declined the Pentagon's latest offer, citing ethical concerns about aligning with military interests in AI development.

Reddit - Artificial Intelligence · 1 min ·
AI Safety

Anthropic CEO stands firm as Pentagon deadline looms | TechCrunch

Anthropic CEO Dario Amodei refuses Pentagon demands for unrestricted military access to AI systems, citing concerns over democratic value...

TechCrunch - AI · 4 min ·