GitHub rushed to fix a critical vulnerability in less than six hours | The Verge
A critical remote code execution vulnerability was discovered using an AI model and patched within hours.
We visited Scout AI's training ground where it's working on AI agents that give individual soldiers control of fleets of autonomous vehic...
General Motors is planning to bring Google’s Gemini AI assistant to around four million vehicles across the US.
Abstract page for arXiv paper 2511.06448: When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Plat...
Abstract page for arXiv paper 2511.06391: HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate S...
Abstract page for arXiv paper 2510.25890: ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted...
Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Abstract page for arXiv paper 2510.13829: A Linguistics-Aware LLM Watermarking via Syntactic Predictability
Abstract page for arXiv paper 2510.06800: FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipe...
Abstract page for arXiv paper 2509.24186: Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks
Abstract page for arXiv paper 2509.23279: Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Abstract page for arXiv paper 2509.22258: Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks
Abstract page for arXiv paper 2509.05892: Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medi...
Abstract page for arXiv paper 2506.13130: ZINA: Multimodal Fine-grained Hallucination Detection and Editing
Abstract page for arXiv paper 2506.09749: Large Language Models for Combinatorial Optimization of Design Structure Matrix
Abstract page for arXiv paper 2505.15925: VERDI: VLM-Embedded Reasoning for Autonomous Driving
Abstract page for arXiv paper 2503.12575: BalancedDPO: Adaptive Multi-Metric Alignment
Abstract page for arXiv paper 2503.11572: Implicit Bias-Like Patterns in Reasoning Models
Abstract page for arXiv paper 2501.11782: Human-AI Collaborative Game Testing with Vision Language Models
Abstract page for arXiv paper 2501.07813: Talk to Right Specialists: Iterative Routing in Multi-agent Systems for Question Answering
Abstract page for arXiv paper 2408.11871: MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models
Abstract page for arXiv paper 2406.14194: VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Model
Abstract page for arXiv paper 2604.01438: ClawSafety: "Safe" LLMs, Unsafe Agents