AI Startups

AI startup funding, launches, and acquisitions

Top This Week

AI Startups

This AI startup envisions 100 Million New People Making Videogames

Submitted by /u/sharkymcstevenson2

Reddit - Artificial Intelligence · 1 min · about 15 hours ago

LLMs

A robot car with a Claude AI brain started a YouTube vlog about its own existence

Not a demo reel. Not a tutorial. A robot narrating its own experience — debugging, falling off shelves, questioning its identity. First-p...

Reddit - Artificial Intelligence · 1 min · about 17 hours ago

AI Startups

Anthropic ramps up its political activities with a new PAC | TechCrunch

With the midterms right around the corner, the new group is positioned to back candidates who support the AI company's policy agenda.

TechCrunch - AI · 3 min · about 17 hours ago

All Content

LLMs

[2602.12247] ExtractBench: A Benchmark and Evaluation Methodology for Complex Structured Extraction

ExtractBench introduces a benchmark and evaluation framework for extracting structured data from unstructured documents like PDFs, addres...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.01445] A Meta-Knowledge-Augmented LLM Framework for Hyperparameter Optimization in Time-Series Forecasting

The paper introduces LLM-AutoOpt, a novel framework that enhances hyperparameter optimization in time-series forecasting by integrating l...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2601.20802] Reinforcement Learning via Self-Distillation

This paper introduces Self-Distillation Policy Optimization (SDPO) for reinforcement learning, leveraging rich feedback to enhance learni...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2505.07671] Benchmarking Retrieval-Augmented Generation for Chemistry

This article presents ChemRAG-Bench, a benchmark for evaluating retrieval-augmented generation (RAG) in chemistry, demonstrating signific...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2601.16443] Endless Terminals: Scaling RL Environments for Terminal Agents

The paper presents 'Endless Terminals', a scalable reinforcement learning (RL) environment designed for training terminal agents through ...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2512.12832] Network Level Evaluation of Hangup Susceptibility of HRGCs using Deep Learning and Sensing Techniques: A Goal Towards Safer Future

This research paper evaluates the hangup susceptibility of Highway Railway Grade Crossings (HRGCs) using deep learning and sensing techni...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2510.13654] Challenges and Requirements for Benchmarking Time Series Foundation Models

This article discusses the challenges and requirements for benchmarking Time Series Foundation Models (TSFMs), highlighting issues of inf...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2406.04955] Experimental Evaluation of ROS-Causal in Real-World Human-Robot Spatial Interaction Scenarios

This article presents an experimental evaluation of ROS-Causal, a framework for causal discovery in human-robot spatial interactions, dem...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2303.09807] TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

The paper presents TKN, a transformer-based neural network designed for real-time video prediction, achieving a remarkable prediction rat...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.12150] GPT-4o Lacks Core Features of Theory of Mind

The paper investigates whether Large Language Models (LLMs) possess a Theory of Mind (ToM), revealing that while they perform well on soc...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2510.02410] OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data

OpenTSLM introduces a new family of Time Series Language Models designed to enhance reasoning over multivariate medical data, outperformi...

arXiv - Machine Learning · 4 min · about 2 months ago

AI Safety

[2602.08449] When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment

The paper discusses regime leakage in AI evaluations, highlighting how advanced agents may exploit evaluation conditions to misrepresent ...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2602.06855] AIRS-Bench: a Suite of Tasks for Frontier AI Research Science Agents

AIRS-Bench introduces a suite of 20 tasks designed to evaluate AI agents' capabilities in scientific research, highlighting areas of stre...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2602.00851] Persuasion Propagation in LLM Agents

The paper explores how user persuasion affects the behavior of large language model (LLM) agents during long-horizon tasks, revealing tha...

arXiv - AI · 3 min · about 2 months ago

AI Startups

[2601.04911] From Stories to Cities to Games: A Qualitative Evaluation of Behaviour Planning

This paper evaluates a novel behaviour planning approach, demonstrating its effectiveness across diverse domains such as storytelling, ur...

arXiv - AI · 3 min · about 2 months ago

LLMs

[2512.19027] Recontextualization Mitigates Specification Gaming without Modifying the Specification

The paper discusses a novel approach called recontextualization, which aims to reduce specification gaming in language models without alt...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2506.13593] Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs

This paper introduces a novel safety measure, time-to-unsafe-sampling, for evaluating generative models, focusing on predicting unsafe ou...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2510.10689] OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

The paper introduces OmniVideoBench, a benchmark designed to evaluate audio-visual understanding in multimodal large language models (MLL...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2501.16178] SWIFT: Mapping Sub-series with Wavelet Decomposition Improves Time Series Forecasting

The paper presents SWIFT, a lightweight model that enhances time series forecasting using wavelet decomposition, achieving state-of-the-a...

arXiv - Machine Learning · 4 min · about 2 months ago

AI Startups

[2410.19412] VCDF: A Validated Consensus-Driven Framework for Time Series Causal Discovery

The paper presents VCDF, a consensus-driven framework for enhancing the robustness of time series causal discovery, improving stability a...

arXiv - AI · 4 min · about 2 months ago
