AI Infrastructure

GPUs, training clusters, MLOps, and deployment

Top This Week

LLMs

Seeking Critique on Research Approach to Open Set Recognition (Novelty Detection) [R]

Hey guys, I'm an independent researcher working on a project that tries to address a very specific failure mode in LLMs and embedding bas...

Reddit - Machine Learning · 1 min ·
Machine Learning

What if attention didn’t need matrix multiplication?

I built a cognitive architecture where all computation reduces to three bit operations: XOR, MAJ, POPCNT. No GEMM. No GPU. No floating-po...

Reddit - Artificial Intelligence · 1 min ·
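The teaser above is cut off, but the three named primitives are enough for a hedged sketch of the general idea (binarized arithmetic, not the poster's actual architecture). If ±1 vector components are packed into integer bit masks, a dot product reduces to XOR plus POPCNT, and MAJ is a bitwise majority gate. The function names and bit-packing convention here are illustrative assumptions:

```python
def binary_dot(a: int, b: int, n: int) -> int:
    """Dot product of two n-dimensional ±1 vectors packed as bit masks.

    Bit i set  -> component +1; bit i clear -> component -1.
    Matching bits contribute +1, differing bits -1, so:
    dot = (#matches) - (#mismatches) = n - 2 * popcount(a ^ b)
    """
    return n - 2 * bin(a ^ b).count("1")

def majority(a: int, b: int, c: int) -> int:
    """Bitwise MAJ: each output bit is the majority vote of the three inputs."""
    return (a & b) | (a & c) | (b & c)

# (+1,-1,+1,+1) . (+1,-1,-1,+1) = 1 + 1 - 1 + 1 = 2
x = 0b1011
y = 0b1001
assert binary_dot(x, y, 4) == 2
assert majority(0b110, 0b101, 0b011) == 0b111
```

With 64-bit words, one XOR plus one POPCNT replaces 64 multiply-accumulates, which is the usual appeal of this family of tricks (cf. XNOR-net-style binarized layers).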
Machine Learning

WTF. It's real. AllBirds (the shoe company) is pivoting to inference.

I'm profoundly ambivalent about how to feel: is it great -- what a scrappy, bold pivot! Or wildly dumb -- it's so far from their c...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2510.03272] Where to Add PDE Diffusion in Transformers
Machine Learning

This paper investigates the optimal placement of PDE diffusion layers in transformer architectures, revealing that their insertion order ...

arXiv - AI · 4 min ·
[2602.08449] When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment
AI Safety

The paper discusses regime leakage in AI evaluations, highlighting how advanced agents may exploit evaluation conditions to misrepresent ...

arXiv - Machine Learning · 4 min ·
[2602.07849] LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge
LLMs

The paper presents LQA, a lightweight quantized-adaptive framework designed to enhance the deployment of Vision-Language Models (VLMs) on...

arXiv - AI · 3 min ·
[2509.23106] Effective Quantization of Muon Optimizer States
LLMs

The paper presents the 8-bit Muon optimizer, which enhances computational efficiency and reduces memory usage in large-scale machine lear...

arXiv - Machine Learning · 3 min ·
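The 8-bit Muon abstract above is truncated, so as a generic illustration only (symmetric absmax quantization, a common baseline for 8-bit optimizer states, not necessarily this paper's scheme): each float block is mapped to int8 with one shared scale, cutting state memory roughly 4x versus float32. Function names are assumptions:

```python
import numpy as np

def quantize_absmax(x: np.ndarray):
    """Symmetric 8-bit absmax quantization: int8 codes plus one float scale."""
    amax = float(np.abs(x).max())
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from int8 codes."""
    return q.astype(np.float32) * scale

state = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_absmax(state)
restored = dequantize(q, s)
# round-to-nearest bounds the per-element error by half a quantization step
assert np.max(np.abs(restored - state)) <= s / 2 + 1e-6
```

In practice such schemes are applied blockwise (e.g. per 64-256 elements) so one outlier does not inflate the scale for the whole tensor.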
[2509.22067] The Rogue Scalpel: Activation Steering Compromises LLM Safety
LLMs

The paper explores how activation steering, a technique for controlling LLM behavior, can inadvertently compromise safety by increasing h...

arXiv - AI · 3 min ·
[2602.01848] ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems
AI Agents

[2602.01848] ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems

The paper introduces ROMA, a Recursive Open Meta-Agent Framework designed to enhance performance in long-horizon multi-agent systems by a...

arXiv - AI · 4 min ·
[2601.21972] Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
LLMs

The paper presents Multi-Agent Actor-Critic (MAAC) methods for optimizing decentralized collaboration among large language models (LLMs),...

arXiv - AI · 4 min ·
[2508.13415] MAVIS: Multi-Objective Alignment via Inference-Time Value-Guided Selection
LLMs

The paper introduces MAVIS, a framework for aligning large language models (LLMs) to multiple objectives at inference time, enhancing fle...

arXiv - Machine Learning · 4 min ·
[2601.08005] Internal Deployment Gaps in AI Regulation
AI Safety

This article examines the regulatory gaps in AI deployment within organizations, highlighting issues that allow internal systems to evade...

arXiv - AI · 3 min ·
[2507.12549] The Serial Scaling Hypothesis
Machine Learning

The article presents the Serial Scaling Hypothesis, which identifies limitations in current parallel computing architectures for inherent...

arXiv - Machine Learning · 3 min ·
[2511.11079] ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving
Machine Learning

ARCTraj introduces a dataset and framework for modeling human reasoning in abstract problem-solving, providing insights into the iterativ...

arXiv - AI · 4 min ·
[2511.07262] AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning
Machine Learning

The paper introduces AgenticSciML, a multi-agent system designed to enhance scientific machine learning through collaborative reasoning, ...

arXiv - Machine Learning · 4 min ·
[2511.06185] Dataforge: Agentic Platform for Autonomous Data Engineering
LLMs

The article presents Dataforge, an LLM-powered platform designed to automate data engineering processes, enhancing efficiency in preparin...

arXiv - AI · 3 min ·
[2506.13593] Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs
LLMs

This paper introduces a novel safety measure, time-to-unsafe-sampling, for evaluating generative models, focusing on predicting unsafe ou...

arXiv - Machine Learning · 4 min ·
[2506.11087] Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
LLMs

This article presents PrinMix, a new SVD-based framework for enhancing delta compression in large language models (LLMs), addressing stor...

arXiv - AI · 4 min ·
[2510.10193] SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
LLMs

The paper presents SAFER, a two-stage risk control framework for large language models (LLMs) that enhances output trustworthiness in ris...

arXiv - AI · 4 min ·
[2510.03777] GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
LLMs

The paper introduces GuidedSampling, a novel inference algorithm designed to enhance the diversity of candidate solutions generated by la...

arXiv - Machine Learning · 4 min ·
[2505.19645] MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
LLMs

The paper discusses the advantages of speculative decoding (SD) in accelerating sparse mixture of experts (MoE) models, revealing that Mo...

arXiv - AI · 4 min ·
[2505.11304] Heterogeneity-Aware Client Sampling for Optimal and Efficient Federated Learning
Machine Learning

This paper presents a novel approach to federated learning by addressing the challenges posed by heterogeneous client capabilities. The p...

arXiv - AI · 4 min ·
[2508.03346] Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
LLMs

This article presents a novel framework for compressing Chain-of-Thought (CoT) prompts in Large Language Models (LLMs) to enhance inferen...

arXiv - AI · 4 min ·
