AI Startups

AI startup funding, launches, and acquisitions

Top This Week

Inside Real Estate Launches Streams AI Mobile App to Boost Agent Productivity and Response
AI Startups

Inside Real Estate launched Streams, an AI-powered mobile app that delivers real-time lead insights, follow-ups and productivity tools to...

AI Tools & Products · 5 min
[2603.05659] When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On
Machine Learning

Abstract page for arXiv paper 2603.05659: When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual T...

arXiv - AI · 4 min
[2512.16081] Evaluation of Generative Models for Emotional 3D Animation Generation in VR
Machine Learning

Abstract page for arXiv paper 2512.16081: Evaluation of Generative Models for Emotional 3D Animation Generation in VR

arXiv - AI · 4 min

All Content

[2602.18891] Orchestrating LLM Agents for Scientific Research: A Pilot Study of Multiple Choice Question (MCQ) Generation and Evaluation
LLMs

This pilot study explores the orchestration of LLM agents in scientific research, focusing on the generation and evaluation of multiple-c...

arXiv - AI · 4 min
[2602.18806] Think²: Grounded Metacognitive Reasoning in Large Language Models
LLMs

The paper presents a metacognitive framework for Large Language Models (LLMs) that enhances their reasoning capabilities by integrating p...

arXiv - AI · 3 min
[2602.18729] MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment
LLMs

The paper presents MiSCHiEF, a benchmark for evaluating fine-grained image-caption alignment, focusing on safety and cultural contexts, h...

arXiv - AI · 4 min
[2602.19619] Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models
LLMs

This article evaluates the accuracy of discrete diffusion language models (dLLMs) through a sampler-centric framework, revealing signific...

arXiv - Machine Learning · 3 min
[2602.19591] Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks
Machine Learning

This article presents SME-HGT, a Heterogeneous Graph Transformer framework designed to identify high-potential small and medium enterpris...

arXiv - Machine Learning · 3 min
[2602.18583] Luna-2: Scalable Single-Token Evaluation with Small Language Models
LLMs

Luna-2 introduces a scalable architecture for single-token evaluation using small language models, enhancing accuracy and reducing costs ...

arXiv - Machine Learning · 4 min
[2602.19531] A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations
Machine Learning

This paper presents a novel statistical method for modeling irregular multivariate time series with missing data, demonstrating superior ...

arXiv - AI · 4 min
[2602.18548] 1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World
Data Science

The paper introduces 1D-Bench, a benchmark for evaluating iterative UI code generation with visual feedback, aimed at improving design-to...

arXiv - AI · 4 min
[2602.18532] VLANeXt: Recipes for Building Strong VLA Models
LLMs

The paper presents VLANeXt, a framework for building effective Vision-Language-Action (VLA) models, addressing inconsistencies in trainin...

arXiv - AI · 4 min
[2602.19455] SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning
LLMs

The paper introduces SenTSR-Bench, a framework that enhances time-series reasoning by integrating insights from specialized time-series l...

arXiv - AI · 4 min
[2602.18483] Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation
LLMs

The article examines red teaming as a socio-technical practice in evaluating large language models (LLMs), highlighting the importance of...

arXiv - AI · 4 min
[2602.18481] AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models
LLMs

The paper introduces AlphaForgeBench, a framework for evaluating trading strategies using Large Language Models (LLMs), addressing issues...

arXiv - AI · 4 min
[2602.19237] Evaluating SAP RPT-1 for Enterprise Business Process Prediction: In-Context Learning vs. Traditional Machine Learning on Structured SAP Data
LLMs

This article evaluates SAP's RPT-1 model for enterprise business process prediction, comparing its performance against traditional machin...

arXiv - AI · 4 min
[2602.18458] The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
Robotics

The article presents a novel evaluation framework for mechanistic interpretability research, utilizing AI agents to enhance research rigo...

arXiv - Machine Learning · 3 min
[2602.18443] From "Help" to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications
LLMs

This study evaluates the effectiveness of large language models (LLMs) in generating subject lines for mental health counseling emails, h...

arXiv - AI · 3 min
[2602.19068] TimeRadar: A Domain-Rotatable Foundation Model for Time Series Anomaly Detection
LLMs

TimeRadar introduces a novel approach to time series anomaly detection using a domain-rotatable foundation model that enhances the differ...

arXiv - Machine Learning · 4 min
[2602.19367] Time Series, Vision, and Language: Exploring the Limits of Alignment in Contrastive Representation Spaces
Machine Learning

This paper investigates the alignment of representations from time series, vision, and language modalities, revealing insights into their...

arXiv - AI · 4 min
[2602.18645] Adaptive Time Series Reasoning via Segment Selection
Machine Learning

The paper presents ARTIST, a novel approach to time series reasoning that utilizes adaptive segment selection to improve accuracy in answ...

arXiv - Machine Learning · 4 min
[2602.19006] Evaluating Large Language Models on Quantum Mechanics: A Comparative Study Across Diverse Models and Tasks
LLMs

This article evaluates 15 large language models on quantum mechanics problem-solving across diverse tasks, revealing performance stratifi...

arXiv - AI · 4 min
[2602.18613] Diagnosing LLM Reranker Behavior Under Fixed Evidence Pools
LLMs

This paper presents a diagnostic method for evaluating LLM reranker behavior using fixed evidence pools, isolating ranking policies from ...

arXiv - Machine Learning · 3 min