[P] LLM with a 9-line seed + 5 rounds of contrastive feedback outperforms Optuna on 96% of benchmarks
submitted by /u/se4u
AI startup funding, launches, and acquisitions
OpenAI's decision last week to shut down Sora, its AI video-generation tool, just six months after releasing it to the public raised imme...
Many times when I try to deeply understand a topic in machine learning — whether it's a new architecture, a quantization method, a full t...
Netflix will use AI models developed by Affleck’s company InterPositive to change the way filmmakers produce projects.
GPT-5.4 is billed as "our most capable and efficient frontier model for professional work."
Luma introduced Luma Agents, powered by its new “Unified Intelligence” models, designed to coordinate multiple AI systems and generate en...
Called Automations, the new system gives users a way to automatically launch agents within their coding environment, triggered by a new a...
AI procurement startup Lio announced a $30 million Series A in a round led by Andreessen Horowitz.
On this episode of Build Mode, David Park joins Isabelle Johannessen to discuss how he and his team are intentionally iterating, fundrais...
Abstract page for arXiv paper 2602.10541: FastLSQ: A Framework for One-Shot PDE Solving
Abstract page for arXiv paper 2511.09396: Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque
Abstract page for arXiv paper 2510.26840: SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification
Abstract page for arXiv paper 2509.25106: Towards Personalized Deep Research: Benchmarks and Evaluations
Abstract page for arXiv paper 2602.05286: HealthMamba: An Uncertainty-aware Spatiotemporal Graph State Space Model for Effective and Reli...
Abstract page for arXiv paper 2412.13091: LMUnit: Fine-grained Evaluation with Natural Language Unit Tests
Abstract page for arXiv paper 2509.22580: The Lie of the Average: How Class Incremental Learning Evaluation Deceives You?
Abstract page for arXiv paper 2508.06066: Effective Sample Size and Generalization Bounds for Temporal Networks
Abstract page for arXiv paper 2602.09937: Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis?
Abstract page for arXiv paper 2601.16529: SycoEval-EM: Sycophancy Evaluation of Large Language Models in Simulated Clinical Encounters fo...
Abstract page for arXiv paper 2509.21782: Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety
Abstract page for arXiv paper 2505.13033: TSPulse: Tiny Pre-Trained Models with Disentangled Representations for Rapid Time-Series Analysis
Abstract page for arXiv paper 2502.01534: Preference Leakage: A Contamination Problem in LLM-as-a-judge
Abstract page for arXiv paper 2412.06531: Unraveling the Complexity of Memory in RL Agents: an Approach for Classification and Evaluation