AI Startups

AI startup funding, launches, and acquisitions

Top This Week

LLMs

[D] The problem with comparing AI memory system benchmarks — different evaluation methods make scores meaningless

I've been reviewing how various AI memory systems evaluate their performance and noticed a fundamental issue with cross-system comparison...

Reddit - Machine Learning · 1 min · about 2 hours ago

Machine Learning

Exclusive: Runway launches $10M fund, Builders program to support early stage AI startups | TechCrunch

Runway is launching a $10 million fund and startup program to back companies building with its AI video models, as it pushes toward inter...

TechCrunch - AI · 7 min · about 2 hours ago

AI Startups

The Download: AI health tools and the Pentagon’s Anthropic culture war | MIT Technology Review

California has defied Trump's demands to stop AI regulation.

MIT Technology Review · 5 min · about 4 hours ago

All Content

LLMs

[2602.22752] Towards Simulating Social Media Users with LLMs: Evaluating the Operational Validity of Conditioned Comment Prediction

This article presents a study on the operational validity of using Large Language Models (LLMs) to simulate social media user behavior th...

arXiv - AI · 4 min · about 1 month ago

AI Safety

[2602.22710] Same Words, Different Judgments: Modality Effects on Preference Alignment

This study explores how modality affects preference alignment in AI systems, comparing human and synthetic evaluations of audio and text ...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.22903] PSQE: A Theoretical-Practical Approach to Pseudo Seed Quality Enhancement for Unsupervised MMEA

The paper presents PSQE, a method for enhancing pseudo seed quality in unsupervised multimodal entity alignment, addressing challenges in...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22827] TARAZ: Persian Short-Answer Question Benchmark for Cultural Evaluation of Language Models

The paper presents TARAZ, a Persian short-answer question benchmark designed to evaluate the cultural competence of large language models...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.22801] Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

This article explores the application of diffusion models in end-to-end autonomous driving, demonstrating their effectiveness through ext...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.22570] Guidance Matters: Rethinking the Evaluation Pitfall for Text-to-Image Generation

The paper discusses the evaluation challenges in text-to-image generation, focusing on classifier-free guidance (CFG) and proposing a new...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.22609] EvolveGen: Algorithmic Level Hardware Model Checking Benchmark Generation through Reinforcement Learning

EvolveGen introduces a novel framework for generating hardware model checking benchmarks using reinforcement learning, addressing the ben...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22543] Ruyi2 Technical Report

The Ruyi2 Technical Report presents advancements in adaptive computing strategies for Large Language Models (LLMs), focusing on efficienc...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.22488] Explainability-Aware Evaluation of Transfer Learning Models for IoT DDoS Detection Under Resource Constraints

This article evaluates transfer learning models for IoT DDoS detection, focusing on explainability and resource constraints. It analyzes ...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.23305] A Proper Scoring Rule for Virtual Staining

The paper introduces a novel scoring rule for evaluating generative virtual staining models in high-throughput screening, emphasizing the...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.22221] Misinformation Exposure in the Chinese Web: A Cross-System Evaluation of Search Engines, LLMs, and AI Overviews

This article evaluates misinformation exposure on the Chinese web by comparing traditional search engines, LLMs, and AI-generated overvie...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.23060] RhythmBERT: A Self-Supervised Language Model Based on Latent Representations of ECG Waveforms for Heart Disease Detection

RhythmBERT is a novel self-supervised language model designed for ECG waveform analysis, enhancing heart disease detection by treating EC...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.22902] A Data-Driven Approach to Support Clinical Renal Replacement Therapy

This study explores a machine learning approach to predict membrane fouling in patients undergoing Continuous Renal Replacement Therapy (...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.23199] SC-Arena: A Natural Language Benchmark for Single-Cell Reasoning with Knowledge-Augmented Evaluation

SC-Arena introduces a natural language benchmark for evaluating single-cell reasoning in large language models, addressing gaps in curren...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.22831] Moral Preferences of LLMs Under Directed Contextual Influence

This paper explores how contextual influences affect the moral decision-making of large language models (LLMs) in scenarios akin to troll...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.23161] PATRA: Pattern-Aware Alignment and Balanced Reasoning for Time Series Question Answering

The paper presents PATRA, a novel model for Time Series Question Answering that enhances reasoning by incorporating pattern awareness and...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.22747] Set-based v.s. Distribution-based Representations of Epistemic Uncertainty: A Comparative Study

This study compares set-based and distribution-based representations of epistemic uncertainty in neural networks, highlighting their rela...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.22703] Enhancing Geometric Perception in VLMs via Translator-Guided Reinforcement Learning

The paper presents GeoPerceive, a benchmark for evaluating geometric perception in vision-language models (VLMs), and introduces GeoDPO, ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.22953] General Agent Evaluation

This paper introduces a framework for evaluating general-purpose agents, proposing a Unified Protocol and Exgentic framework, and benchma...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.22610] DP-aware AdaLN-Zero: Taming Conditioning-Induced Heavy-Tailed Gradients in Differentially Private Diffusion

The paper introduces DP-aware AdaLN-Zero, a novel mechanism to mitigate heavy-tailed gradients in differentially private diffusion models...

arXiv - Machine Learning · 4 min · about 1 month ago
