AI Startups

AI startup funding, launches, and acquisitions

Top This Week

[2603.05659] When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On
Machine Learning

arXiv - AI · 4 min ·
[2512.16081] Evaluation of Generative Models for Emotional 3D Animation Generation in VR
Machine Learning

arXiv - AI · 4 min ·
[2510.16635] MA-SAPO: Multi-Agent Reasoning for Score-Aware Prompt Optimization
LLMs

arXiv - AI · 4 min ·

All Content

[2602.20213] CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions
LLMs

CodeHacker is an automated framework designed to generate test cases that identify vulnerabilities in competitive programming solutions, ...

arXiv - AI · 3 min ·
[2602.20193] When Backdoors Go Beyond Triggers: Semantic Drift in Diffusion Models Under Encoder Attacks
Machine Learning

This paper investigates the impact of encoder-side poisoning on text-to-image models, revealing that traditional evaluations of backdoor ...

arXiv - AI · 3 min ·
[2602.20170] CAGE: A Framework for Culturally Adaptive Red-Teaming Benchmark Generation
LLMs

The paper introduces CAGE, a framework for culturally adaptive red-teaming benchmark generation, addressing the limitations of existing b...

arXiv - AI · 3 min ·
[2602.21143] A Benchmark for Deep Information Synthesis
LLMs

The paper introduces DEEPSYNTH, a benchmark for evaluating large language models on complex tasks requiring deep information synthesis an...

arXiv - Machine Learning · 4 min ·
[2602.21044] LogicGraph: Benchmarking Multi-Path Logical Reasoning via Neuro-Symbolic Generation and Verification
LLMs

LogicGraph introduces a benchmark for evaluating multi-path logical reasoning in large language models, highlighting their limitations in...

arXiv - AI · 4 min ·
[2602.20878] Diagnosing Causal Reasoning in Vision-Language Models via Structured Relevance Graphs
LLMs

This article introduces Vision-Language Causal Graphs (VLCGs) to enhance causal reasoning in Vision-Language Models (LVLMs), addressing t...

arXiv - AI · 3 min ·
[2602.20813] Pressure Reveals Character: Behavioural Alignment Evaluation at Depth
LLMs

This paper presents a novel evaluation framework for assessing the alignment of language models under realistic pressure, revealing behav...

arXiv - AI · 3 min ·
[2602.20812] Qwen-BIM: developing large language model for BIM-based design with domain-specific benchmark and dataset
LLMs

The paper presents Qwen-BIM, a large language model tailored for BIM-based design, introducing a domain-specific benchmark and dataset th...

arXiv - AI · 4 min ·
[2602.20810] POMDPPlanners: Open-Source Package for POMDP Planning
AI Startups

POMDPPlanners is an open-source Python package designed for the empirical evaluation of POMDP planning algorithms, integrating advanced f...

arXiv - AI · 3 min ·
[2602.20687] How Foundational Skills Influence VLM-based Embodied Agents: A Native Perspective
LLMs

This article discusses the limitations of current benchmarks for vision-language model (VLM)-driven embodied agents and introduces Native...

arXiv - AI · 4 min ·
[2602.20638] Identifying two piecewise linear additive value functions from anonymous preference information
Machine Learning

The paper discusses a method for identifying two piecewise linear additive value functions from anonymous preference information, enhanci...

arXiv - AI · 3 min ·
[2602.20571] CausalReasoningBenchmark: A Real-World Benchmark for Disentangled Evaluation of Causal Identification and Estimation
Machine Learning

The CausalReasoningBenchmark introduces a new framework for evaluating automated causal inference, distinguishing between identification ...

arXiv - AI · 4 min ·
[2602.20494] KairosVL: Orchestrating Time Series and Semantics for Unified Reasoning
Machine Learning

The paper introduces KairosVL, a novel framework that enhances time series analysis by integrating semantic reasoning, achieving competit...

arXiv - AI · 3 min ·
[2602.20303] Multilevel Determinants of Overweight and Obesity Among U.S. Children Aged 10-17: Comparative Evaluation of Statistical and Machine Learning Approaches Using the 2021 National Survey of Children's Health
Machine Learning

This study evaluates multilevel determinants of overweight and obesity among U.S. children aged 10-17, comparing statistical and machine ...

arXiv - Machine Learning · 4 min ·
AI Safety

Anthropic Drops Flagship Safety Pledge

Anthropic has announced the discontinuation of its flagship safety pledge, raising concerns about AI safety commitments in the industry.

Reddit - Artificial Intelligence · 1 min ·
Pete Hegseth’s Pentagon AI bro squad includes a former Uber executive and a private equity billionaire | The Verge
Machine Learning

The article discusses a Pentagon meeting involving Defense Secretary Pete Hegseth, former Uber executive Emil Michael, and private equity...

The Verge - AI · 11 min ·
‘A game changer’: AI app offers support to caregivers of children with autism
AI Startups

The Behavior Buddy app, developed by UT San Antonio researchers, offers caregivers of children with autism practical support and behavior...

AI Tools & Products · 5 min ·
AI-linked fears roil some corners of Wall Street after years of hype and gains
AI Safety

Concerns over AI spending are causing volatility on Wall Street, as investors question profitability. Major companies like IBM and Master...

AI Tools & Products · 5 min ·
India's AI boom pushes firms to trade near-term revenue for users | TechCrunch
LLMs

India's AI market is witnessing a pivotal shift as tech firms transition from free offerings to monetization strategies, testing user con...

TechCrunch - AI · 7 min ·
Nvidia challenger AI chip startup MatX raised $500M | TechCrunch
AI Infrastructure

MatX, an AI chip startup founded by ex-Google engineers, has raised $500M in Series B funding to develop processors aimed at outperformin...

TechCrunch - AI · 4 min ·