AI Startups

AI startup funding, launches, and acquisitions

Top This Week

Machine Learning

[2603.05659] When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On

Abstract page for arXiv paper 2603.05659: When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual T...

arXiv - AI · 4 min · about 1 hour ago

Machine Learning

[2512.16081] Evaluation of Generative Models for Emotional 3D Animation Generation in VR

Abstract page for arXiv paper 2512.16081: Evaluation of Generative Models for Emotional 3D Animation Generation in VR

arXiv - AI · 4 min · about 1 hour ago

LLMs

[2510.16635] MA-SAPO: Multi-Agent Reasoning for Score-Aware Prompt Optimization

Abstract page for arXiv paper 2510.16635: MA-SAPO: Multi-Agent Reasoning for Score-Aware Prompt Optimization

arXiv - AI · 4 min · about 1 hour ago

All Content

LLMs

[2602.20213] CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

CodeHacker is an automated framework designed to generate test cases that identify vulnerabilities in competitive programming solutions, ...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20193] When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks

This paper investigates the impact of encoder-side poisoning on text-to-image models, revealing that traditional evaluations of backdoor ...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20170] CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation

The paper introduces CAGE, a framework for culturally adaptive red-teaming benchmark generation, addressing the limitations of existing b...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.21143] A Benchmark for Deep Information Synthesis

The paper introduces DEEPSYNTH, a benchmark for evaluating large language models on complex tasks requiring deep information synthesis an...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.21044] LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification

LogicGraph introduces a benchmark for evaluating multi-path logical reasoning in large language models, highlighting their limitations in...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.20878] Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs

This article introduces Vision-Language Causal Graphs (VLCGs) to enhance causal reasoning in Vision-Language Models (LVLMs), addressing t...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20813] Pressure Reveals Character: Behavioural Alignment Evaluation at Depth

This paper presents a novel evaluation framework for assessing the alignment of language models under realistic pressure, revealing behav...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20812] Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset

The paper presents Qwen-BIM, a large language model tailored for BIM-based design, introducing a domain-specific benchmark and dataset th...

arXiv - AI · 4 min · about 1 month ago

AI Startups

[2602.20810] POMDPPlanners: Open-Source Package for POMDP Planning

POMDPPlanners is an open-source Python package designed for the empirical evaluation of POMDP planning algorithms, integrating advanced f...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.20687] How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective

This article discusses the limitations of current benchmarks for vision-language model (VLM)-driven embodied agents and introduces Native...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.20638] Identifying two piecewise linear additive value functions from anonymous preference information

The paper discusses a method for identifying two piecewise linear additive value functions from anonymous preference information, enhanci...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20571] CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation

The CausalReasoningBenchmark introduces a new framework for evaluating automated causal inference, distinguishing between identification ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.20494] KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning

The paper introduces KairosVL, a novel framework that enhances time series analysis by integrating semantic reasoning, achieving competit...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.20303] Multilevel Determinants of Overweight and Obesity Among U.S. Children Aged 10-17: Comparative Evaluation of Statistical and Machine Learning Approaches Using the 2021 National Survey of Children's Health

This study evaluates multilevel determinants of overweight and obesity among U.S. children aged 10-17, comparing statistical and machine ...

arXiv - Machine Learning · 4 min · about 1 month ago

AI Safety

Anthropic Drops Flagship Safety Pledge

Anthropic has announced the discontinuation of its flagship safety pledge, raising concerns about AI safety commitments in the industry.

Reddit - Artificial Intelligence · 1 min · about 1 month ago

Machine Learning

Pete Hegseth’s Pentagon AI bro squad includes a former Uber executive and a private equity billionaire | The Verge

The article discusses a Pentagon meeting involving Defense Secretary Pete Hegseth, former Uber executive Emil Michael, and private equity...

The Verge - AI · 11 min · about 1 month ago

AI Startups

‘A game changer’: AI app offers support to caregivers of children with autism

The Behavior Buddy app, developed by UT San Antonio researchers, offers caregivers of children with autism practical support and behavior...

AI Tools & Products · 5 min · about 1 month ago

AI Safety

AI-linked fears roil some corners of Wall Street after years of hype and gains

Concerns over AI spending are causing volatility on Wall Street as investors question profitability. Major companies like IBM and Master...

AI Tools & Products · 5 min · about 1 month ago

LLMs

India's AI boom pushes firms to trade near-term revenue for users | TechCrunch

India's AI market is witnessing a pivotal shift as tech firms transition from free offerings to monetization strategies, testing user con...

TechCrunch - AI · 7 min · about 1 month ago

AI Infrastructure

Nvidia challenger AI chip startup MatX raised $500M | TechCrunch

MatX, an AI chip startup founded by ex-Google engineers, has raised $500M in Series B funding to develop processors aimed at outperformin...

TechCrunch - AI · 4 min · about 1 month ago
