AI Startups

AI startup funding, launches, and acquisitions

Top This Week

AI Startups

Inside Real Estate Launches Streams AI Mobile App to Boost Agent Productivity and Response

Inside Real Estate launched Streams, an AI-powered mobile app that delivers real-time lead insights, follow-ups and productivity tools to...

AI Tools & Products · 5 min · about 6 hours ago

Machine Learning

[2603.05659] When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On

Abstract page for arXiv paper 2603.05659: When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual T...

arXiv - AI · 4 min · about 7 hours ago

Machine Learning

[2512.16081] Evaluation of Generative Models for Emotional 3D Animation Generation in VR

Abstract page for arXiv paper 2512.16081: Evaluation of Generative Models for Emotional 3D Animation Generation in VR

arXiv - AI · 4 min · about 7 hours ago

All Content

LLMs

[2602.18891] Orchestrating LLM Agents for Scientific Research: A Pilot Study of Multiple Choice Question (MCQ) Generation and Evaluation

This pilot study explores the orchestration of LLM agents in scientific research, focusing on the generation and evaluation of multiple-c...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18806] Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models

The paper presents a metacognitive framework for Large Language Models (LLMs) that enhances their reasoning capabilities by integrating p...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.18729] MiSCHiEF: A Benchmark in Minimal-Pairs of Safety and Culture for Holistic Evaluation of Fine-Grained Image-Caption Alignment

The paper presents MiSCHiEF, a benchmark for evaluating fine-grained image-caption alignment, focusing on safety and cultural contexts, h...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.19619] Is Your Diffusion Sampler Actually Correct? A Sampler-Centric Evaluation of Discrete Diffusion Language Models

This article evaluates the accuracy of discrete diffusion language models (dLLMs) through a sampler-centric framework, revealing signific...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.19591] Detecting High-Potential SMEs with Heterogeneous Graph Neural Networks

This article presents SME-HGT, a Heterogeneous Graph Transformer framework designed to identify high-potential small and medium enterpris...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.18583] Luna-2: Scalable Single-Token Evaluation with Small Language Models

Luna-2 introduces a scalable architecture for single-token evaluation using small language models, enhancing accuracy and reducing costs ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.19531] A Statistical Approach for Modeling Irregular Multivariate Time Series with Missing Observations

This paper presents a novel statistical method for modeling irregular multivariate time series with missing data, demonstrating superior ...

arXiv - AI · 4 min · about 1 month ago

Data Science

[2602.18548] 1D-Bench: A Benchmark for Iterative UI Code Generation with Visual Feedback in Real-World

The paper introduces 1D-Bench, a benchmark for evaluating iterative UI code generation with visual feedback, aimed at improving design-to...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18532] VLANeXt: Recipes for Building Strong VLA Models

The paper presents VLANeXt, a framework for building effective Vision-Language-Action (VLA) models, addressing inconsistencies in trainin...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.19455] SenTSR-Bench: Thinking with Injected Knowledge for Time-Series Reasoning

The paper introduces SenTSR-Bench, a framework that enhances time-series reasoning by integrating insights from specialized time-series l...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18483] Red Teaming LLMs as Socio-Technical Practice: From Exploration and Data Creation to Evaluation

The article examines red teaming as a socio-technical practice in evaluating large language models (LLMs), highlighting the importance of...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18481] AlphaForgeBench: Benchmarking End-to-End Trading Strategy Design with Large Language Models

The paper introduces AlphaForgeBench, a framework for evaluating trading strategies using Large Language Models (LLMs), addressing issues...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.19237] Evaluating SAP RPT-1 for Enterprise Business Process Prediction: In-Context Learning vs. Traditional Machine Learning on Structured SAP Data

This article evaluates SAP's RPT-1 model for enterprise business process prediction, comparing its performance against traditional machin...

arXiv - AI · 4 min · about 1 month ago

Robotics

[2602.18458] The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research

The article presents a novel evaluation framework for mechanistic interpretability research, utilizing AI agents to enhance research rigo...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.18443] From "Help" to Helpful: A Hierarchical Assessment of LLMs in Mental e-Health Applications

This study evaluates the effectiveness of large language models (LLMs) in generating subject lines for mental health counseling emails, h...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.19068] TimeRadar: A Domain-Rotatable Foundation Model for Time Series Anomaly Detection

TimeRadar introduces a novel approach to time series anomaly detection using a domain-rotatable foundation model that enhances the differ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.19367] Time Series, Vision, and Language: Exploring the Limits of Alignment in Contrastive Representation Spaces

This paper investigates the alignment of representations from time series, vision, and language modalities, revealing insights into their...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.18645] Adaptive Time Series Reasoning via Segment Selection

The paper presents ARTIST, a novel approach to time series reasoning that utilizes adaptive segment selection to improve accuracy in answ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.19006] Evaluating Large Language Models on Quantum Mechanics: A Comparative Study Across Diverse Models and Tasks

This article evaluates 15 large language models on quantum mechanics problem-solving across diverse tasks, revealing performance stratifi...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.18613] Diagnosing LLM Reranker Behavior Under Fixed Evidence Pools

This paper presents a diagnostic method for evaluating LLM reranker behavior using fixed evidence pools, isolating ranking policies from ...

arXiv - Machine Learning · 3 min · about 1 month ago
