AI Startups

AI startup funding, launches, and acquisitions

Top This Week

LLMs

[2601.13227] Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?

Abstract page for arXiv paper 2601.13227: Insider Knowledge: How Much Can RAG Systems Gain from Evaluation Secrets?

arXiv - AI · 3 min · about 2 hours ago

LLMs

[2602.00095] EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions

Abstract page for arXiv paper 2602.00095: EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM...

arXiv - AI · 4 min · about 2 hours ago

NLP

[2601.13222] Incorporating Q&A Nuggets into Retrieval-Augmented Generation

Abstract page for arXiv paper 2601.13222: Incorporating Q&A Nuggets into Retrieval-Augmented Generation

arXiv - AI · 3 min · about 2 hours ago

All Content

AI Startups

[2602.10541] FastLSQ: A Framework for One-Shot PDE Solving

Abstract page for arXiv paper 2602.10541: FastLSQ: A Framework for One-Shot PDE Solving

arXiv - Machine Learning · 3 min · 25 days ago

LLMs

[2511.09396] Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

Abstract page for arXiv paper 2511.09396: Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque

arXiv - AI · 3 min · 25 days ago

AI Startups

[2510.26840] SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification

Abstract page for arXiv paper 2510.26840: SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification

arXiv - AI · 4 min · 25 days ago

Robotics

[2509.25106] Towards Personalized Deep Research: Benchmarks and Evaluations

Abstract page for arXiv paper 2509.25106: Towards Personalized Deep Research: Benchmarks and Evaluations

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2602.05286] HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reliable Healthcare Facility Visit Prediction

Abstract page for arXiv paper 2602.05286: HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reli...

arXiv - AI · 4 min · 25 days ago

LLMs

[2412.13091] LMUnit: Fine-grained Evaluation with Natural Language Unit Tests

Abstract page for arXiv paper 2412.13091: LMUnit: Fine-grained Evaluation with Natural Language Unit Tests

arXiv - AI · 3 min · 25 days ago

Machine Learning

[2509.22580] The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?

Abstract page for arXiv paper 2509.22580: The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?

arXiv - Machine Learning · 4 min · 25 days ago

AI Startups

[2508.06066] Effective Sample Size and Generalization Bounds for Temporal Networks

Abstract page for arXiv paper 2508.06066: Effective Sample Size and Generalization Bounds for Temporal Networks

arXiv - AI · 4 min · 25 days ago

LLMs

[2602.09937] Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?

Abstract page for arXiv paper 2602.09937: Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?

arXiv - AI · 4 min · 25 days ago

LLMs

[2601.16529] SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters for Emergency Care

Abstract page for arXiv paper 2601.16529: SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters fo...

arXiv - AI · 3 min · 25 days ago

LLMs

[2509.21782] Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

Abstract page for arXiv paper 2509.21782: Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2505.13033] TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis

Abstract page for arXiv paper 2505.13033: TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis

arXiv - AI · 4 min · 25 days ago

LLMs

[2502.01534] Preference Leakage: A Contamination Problem in LLM-as-a-judge

Abstract page for arXiv paper 2502.01534: Preference Leakage: A Contamination Problem in LLM-as-a-judge

arXiv - AI · 4 min · 25 days ago

AI Startups

[2412.06531] Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

Abstract page for arXiv paper 2412.06531: Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation

arXiv - AI · 4 min · 25 days ago

Machine Learning

[2412.01654] FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Frequency Domain

Abstract page for arXiv paper 2412.01654: FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Freq...

arXiv - Machine Learning · 4 min · 25 days ago

Machine Learning

[2603.04356] RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

Abstract page for arXiv paper 2603.04356: RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

arXiv - AI · 4 min · 25 days ago

AI Startups

[2603.04334] SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints

Abstract page for arXiv paper 2603.04334: SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints

arXiv - AI · 3 min · 25 days ago

Generative AI

[2603.04325] Scalable Evaluation of the Realism of Synthetic Environmental Augmentations in Images

Abstract page for arXiv paper 2603.04325: Scalable Evaluation of the Realism of Synthetic Environmental Augmentations in Images

arXiv - Machine Learning · 4 min · 25 days ago

Machine Learning

[2603.04198] Stable and Steerable Sparse Autoencoders with Weight Regularization

Abstract page for arXiv paper 2603.04198: Stable and Steerable Sparse Autoencoders with Weight Regularization

arXiv - Machine Learning · 3 min · 25 days ago

LLMs

[2603.04162] Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Language Model

Abstract page for arXiv paper 2603.04162: Bielik-Q2-Sharp: A Comparative Study of Extreme 2-bit Quantization Methods for a Polish 11B Lan...

arXiv - AI · 3 min · 25 days ago
