Machine Learning

ML algorithms, training, and inference

Top This Week

Machine Learning

OpenAI talks about not talking about goblins | The Verge

References to goblins and gremlins spiked with the release of GPT-5.1’s ‘Nerdy’ personality, and then spread to other models.

The Verge - AI · 4 min · about 1 hour ago

LLMs

Comparing SVG generation for top models

These are the top open and closed models: Opus 4.7, GPT-5.5 Pro, DeepSeek V4, GLM-5.1 and Gemini 3.1 Pro. They both show similar performan...

Reddit - Artificial Intelligence · 1 min · about 3 hours ago

LLMs

The more young people use AI, the more they hate it | The Verge

Caught between fears of job loss and social stigma, Gen Z’s opinions of AI are hitting new lows.

The Verge - AI · 13 min · about 3 hours ago

All Content

LLMs

[2603.10047] Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination Reduction

Abstract page for arXiv paper 2603.10047: Toward Epistemic Stability: Engineering Consistent Procedures for Industrial LLM Hallucination ...

arXiv - AI · 4 min · 23 days ago

Machine Learning

[2603.09030] PlayWorld: Learning Robot World Models from Autonomous Play

Abstract page for arXiv paper 2603.09030: PlayWorld: Learning Robot World Models from Autonomous Play

arXiv - AI · 4 min · 23 days ago

LLMs

[2602.08392] ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

Abstract page for arXiv paper 2602.08392: ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

arXiv - AI · 4 min · 23 days ago

LLMs

[2601.11109] Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning

Abstract page for arXiv paper 2601.11109: Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning

arXiv - AI · 3 min · 23 days ago

Machine Learning

[2601.08565] Rewriting Video: Text-Driven Reauthoring of Video Footage

Abstract page for arXiv paper 2601.08565: Rewriting Video: Text-Driven Reauthoring of Video Footage

arXiv - AI · 3 min · 23 days ago

Machine Learning

[2512.18388] Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creation with Generative Models

Abstract page for arXiv paper 2512.18388: Exploration vs. Fixation: Scaffolding Divergent and Convergent Thinking for Human-AI Co-Creatio...

arXiv - AI · 4 min · 23 days ago

LLMs

[2601.00263] Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation

Abstract page for arXiv paper 2601.00263: Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counter...

arXiv - AI · 4 min · 23 days ago

LLMs

[2512.11919] A fine-grained look at causal effects in causal spaces

Abstract page for arXiv paper 2512.11919: A fine-grained look at causal effects in causal spaces

arXiv - AI · 4 min · 23 days ago

LLMs

[2510.15746] LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation

Abstract page for arXiv paper 2510.15746: LLMs Judge Themselves: A Game-Theoretic Framework for Human-Aligned Evaluation

arXiv - AI · 4 min · 23 days ago

LLMs

[2511.06448] When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Platforms

Abstract page for arXiv paper 2511.06448: When AI Agents Collude Online: Financial Fraud Risks by Collaborative LLM Agents on Social Plat...

arXiv - AI · 4 min · 23 days ago

Machine Learning

[2511.06391] HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate Speech Detection

Abstract page for arXiv paper 2511.06391: HatePrototypes: Interpretable and Transferable Representations for Implicit and Explicit Hate S...

arXiv - AI · 4 min · 23 days ago

LLMs

[2510.25890] ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted MDE

Abstract page for arXiv paper 2510.25890: ATLAS: A Layered Constraint-Guided Framework for Structured Artifact Generation in LLM-Assisted...

arXiv - AI · 4 min · 23 days ago

LLMs

[2510.15148] XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

Abstract page for arXiv paper 2510.15148: XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models

arXiv - AI · 4 min · 23 days ago

LLMs

[2510.13829] A Linguistics-Aware LLM Watermarking via Syntactic Predictability

Abstract page for arXiv paper 2510.13829: A Linguistics-Aware LLM Watermarking via Syntactic Predictability

arXiv - AI · 3 min · 23 days ago

LLMs

[2510.06800] FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipeline

Abstract page for arXiv paper 2510.06800: FURINA: A Fully Customizable Role-Playing Benchmark via Scalable Multi-Agent Collaboration Pipe...

arXiv - AI · 4 min · 23 days ago

LLMs

[2509.24186] Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks

Abstract page for arXiv paper 2509.24186: Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks

arXiv - AI · 4 min · 23 days ago

Machine Learning

[2509.23279] Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

Abstract page for arXiv paper 2509.23279: Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing

arXiv - AI · 3 min · 23 days ago

LLMs

[2509.22258] Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks

Abstract page for arXiv paper 2509.22258: Beyond Classification Accuracy: Neural-MedBench and the Need for Deeper Reasoning Benchmarks

arXiv - AI · 4 min · 23 days ago

LLMs

[2509.05892] Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medical Research with Limited Datasets

Abstract page for arXiv paper 2509.05892: Challenges in Deep Learning-Based Small Organ Segmentation: A Benchmarking Perspective for Medi...

arXiv - AI · 4 min · 23 days ago

LLMs

[2506.13130] ZINA: Multimodal Fine-grained Hallucination Detection and Editing

Abstract page for arXiv paper 2506.13130: ZINA: Multimodal Fine-grained Hallucination Detection and Editing

arXiv - AI · 3 min · 23 days ago
