GitHub rushed to fix a critical vulnerability in less than six hours | The Verge
A critical remote code execution vulnerability was discovered using an AI model and patched within hours.
We visited Scout AI's training ground where it's working on AI agents that give individual soldiers control of fleets of autonomous vehic...
General Motors is planning to bring Google’s Gemini AI assistant to around four million vehicles across the US.
Abstract page for arXiv paper 2511.06448: When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Plat...
Abstract page for arXiv paper 2511.06391: HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate S...
Abstract page for arXiv paper 2510.25890: ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted...
Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Abstract page for arXiv paper 2510.13829: A Linguistics-Aware LLM Watermarking via Syntactic Predictability
Abstract page for arXiv paper 2510.06800: FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipe...
Abstract page for arXiv paper 2509.24186: Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks
Abstract page for arXiv paper 2509.23279: Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Abstract page for arXiv paper 2509.22258: Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
Abstract page for arXiv paper 2509.05892: Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medi...
Abstract page for arXiv paper 2506.13130: ZINA: Multimodal Fine-grained Hallucination Detection and Editing
Abstract page for arXiv paper 2506.09749: Large Language Models for Combinatorial Optimization of Design Structure Matrix
Abstract page for arXiv paper 2505.15925: VERDI: VLM-Embedded Reasoning for Autonomous Driving
Abstract page for arXiv paper 2503.12575: BalancedDPO: Adaptive Multi-Metric Alignment
Abstract page for arXiv paper 2503.11572: Implicit Bias-Like Patterns in Reasoning Models
Abstract page for arXiv paper 2501.11782: Human-AI Collaborative Game Testing with Vision Language Models
Abstract page for arXiv paper 2501.07813: Talk to Right Specialists: Iterative Routing in Multi-agent Systems for Question Answering
Abstract page for arXiv paper 2408.11871: MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
Abstract page for arXiv paper 2406.14194: VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model
Abstract page for arXiv paper 2604.01438: ClawSafety: "Safe" LLMs, Unsafe Agents