AI Infrastructure

GPUs, training clusters, MLOps, and deployment

Top This Week

LLMs

Seeking Critique on Research Approach to Open Set Recognition (Novelty Detection) [R]

Hey guys, I'm an independent researcher working on a project that tries to address a very specific failure mode in LLMs and embedding bas...

Reddit - Machine Learning · 1 min ·
Machine Learning

What if attention didn’t need matrix multiplication?

I built a cognitive architecture where all computation reduces to three bit operations: XOR, MAJ, POPCNT. No GEMM. No GPU. No floating-po...

Reddit - Artificial Intelligence · 1 min ·
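The teaser above is cut off, but the three named primitives are enough for a hedged sketch of the general idea (binarized arithmetic, not the poster's actual architecture). If ±1 vector components are packed into integer bit masks, a dot product reduces to XOR plus POPCNT, and MAJ is a bitwise majority gate. The function names and bit-packing convention here are illustrative assumptions:

```python
def binary_dot(a: int, b: int, n: int) -> int:
    """Dot product of two n-dimensional ±1 vectors packed as bit masks.

    Bit i set  -> component +1; bit i clear -> component -1.
    Matching bits contribute +1, differing bits -1, so:
    dot = (#matches) - (#mismatches) = n - 2 * popcount(a ^ b)
    """
    return n - 2 * bin(a ^ b).count("1")

def majority(a: int, b: int, c: int) -> int:
    """Bitwise MAJ: each output bit is the majority vote of the three inputs."""
    return (a & b) | (a & c) | (b & c)

# (+1,-1,+1,+1) . (+1,-1,-1,+1) = 1 + 1 - 1 + 1 = 2
x = 0b1011
y = 0b1001
assert binary_dot(x, y, 4) == 2
assert majority(0b110, 0b101, 0b011) == 0b111
```

With 64-bit words, one XOR plus one POPCNT replaces 64 multiply-accumulates, which is the usual appeal of this family of tricks (cf. XNOR-net-style binarized layers).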
Machine Learning

WTF. It's real. AllBirds (the shoe company) is pivoting to inference.

I'm profoundly ambivalent about how to feel: is it great -- what a scrappy, bold pivot! Or wildly dumb -- it's so far from their c...

Reddit - Artificial Intelligence · 1 min ·

All Content

[2510.03272] Where to Add PDE Diffusion in Transformers
Machine Learning

This paper investigates the optimal placement of PDE diffusion layers in transformer architectures, revealing that their insertion order ...

arXiv - AI · 4 min ·
[2602.08449] When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment
AI Safety

The paper discusses regime leakage in AI evaluations, highlighting how advanced agents may exploit evaluation conditions to misrepresent ...

arXiv - Machine Learning · 4 min ·
[2602.07849] LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge
LLMs

The paper presents LQA, a lightweight quantized-adaptive framework designed to enhance the deployment of Vision-Language Models (VLMs) on...

arXiv - AI · 3 min ·
[2509.23106] Effective Quantization of Muon Optimizer States
LLMs

The paper presents the 8-bit Muon optimizer, which enhances computational efficiency and reduces memory usage in large-scale machine lear...

arXiv - Machine Learning · 3 min ·
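The 8-bit Muon abstract above is truncated, so as a generic illustration only (symmetric absmax quantization, a common baseline for 8-bit optimizer states, not necessarily this paper's scheme): each float block is mapped to int8 with one shared scale, cutting state memory roughly 4x versus float32. Function names are assumptions:

```python
import numpy as np

def quantize_absmax(x: np.ndarray):
    """Symmetric 8-bit absmax quantization: int8 codes plus one float scale."""
    amax = float(np.abs(x).max())
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from int8 codes."""
    return q.astype(np.float32) * scale

state = np.array([0.5, -1.27, 0.0, 1.27], dtype=np.float32)
q, s = quantize_absmax(state)
restored = dequantize(q, s)
# round-to-nearest bounds the per-element error by half a quantization step
assert np.max(np.abs(restored - state)) <= s / 2 + 1e-6
```

In practice such schemes are applied blockwise (e.g. per 64-256 elements) so one outlier does not inflate the scale for the whole tensor.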
[2509.22067] The Rogue Scalpel: Activation Steering Compromises LLM Safety
LLMs

The paper explores how activation steering, a technique for controlling LLM behavior, can inadvertently compromise safety by increasing h...

arXiv - AI · 3 min ·
[2602.01848] ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems
AI Agents

[2602.01848] ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems

The paper introduces ROMA, a Recursive Open Meta-Agent Framework designed to enhance performance in long-horizon multi-agent systems by a...

arXiv - AI · 4 min ·
[2601.21972] Learning Decentralized LLM Collaboration with Multi-Agent Actor Critic
LLMs

The paper presents Multi-Agent Actor-Critic (MAAC) methods for optimizing decentralized collaboration among large language models (LLMs),...

arXiv - AI · 4 min ·
[2508.13415] MAVIS: Multi-Objective Alignment via Inference-Time Value-Guided Selection
LLMs

The paper introduces MAVIS, a framework for aligning large language models (LLMs) to multiple objectives at inference time, enhancing fle...

arXiv - Machine Learning · 4 min ·
[2601.08005] Internal Deployment Gaps in AI Regulation
AI Safety

This article examines the regulatory gaps in AI deployment within organizations, highlighting issues that allow internal systems to evade...

arXiv - AI · 3 min ·
[2507.12549] The Serial Scaling Hypothesis
Machine Learning

The article presents the Serial Scaling Hypothesis, which identifies limitations in current parallel computing architectures for inherent...

arXiv - Machine Learning · 3 min ·
[2511.11079] ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving
Machine Learning

ARCTraj introduces a dataset and framework for modeling human reasoning in abstract problem-solving, providing insights into the iterativ...

arXiv - AI · 4 min ·
[2511.07262] AgenticSciML: Collaborative Multi-Agent Systems for Emergent Discovery in Scientific Machine Learning
Machine Learning

The paper introduces AgenticSciML, a multi-agent system designed to enhance scientific machine learning through collaborative reasoning, ...

arXiv - Machine Learning · 4 min ·
[2511.06185] Dataforge: Agentic Platform for Autonomous Data Engineering
LLMs

The article presents Dataforge, an LLM-powered platform designed to automate data engineering processes, enhancing efficiency in preparin...

arXiv - AI · 3 min ·
[2506.13593] Calibrated Predictive Lower Bounds on Time-to-Unsafe-Sampling in LLMs
LLMs

This paper introduces a novel safety measure, time-to-unsafe-sampling, for evaluating generative models, focusing on predicting unsafe ou...

arXiv - Machine Learning · 4 min ·
[2506.11087] Enhancing Delta Compression in LLMs via SVD-based Quantization Error Minimization
LLMs

This article presents PrinMix, a new SVD-based framework for enhancing delta compression in large language models (LLMs), addressing stor...

arXiv - AI · 4 min ·
[2510.10193] SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
LLMs

The paper presents SAFER, a two-stage risk control framework for large language models (LLMs) that enhances output trustworthiness in ris...

arXiv - AI · 4 min ·
[2510.03777] GuidedSampling: Steering LLMs Towards Diverse Candidate Solutions at Inference-Time
LLMs

The paper introduces GuidedSampling, a novel inference algorithm designed to enhance the diversity of candidate solutions generated by la...

arXiv - Machine Learning · 4 min ·
[2505.19645] MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
LLMs

The paper discusses the advantages of speculative decoding (SD) in accelerating sparse mixture of experts (MoE) models, revealing that Mo...

arXiv - AI · 4 min ·
[2505.11304] Heterogeneity-Aware Client Sampling for Optimal and Efficient Federated Learning
Machine Learning

This paper presents a novel approach to federated learning by addressing the challenges posed by heterogeneous client capabilities. The p...

arXiv - AI · 4 min ·
[2508.03346] Making Slow Thinking Faster: Compressing LLM Chain-of-Thought via Step Entropy
LLMs

This article presents a novel framework for compressing Chain-of-Thought (CoT) prompts in Large Language Models (LLMs) to enhance inferen...

arXiv - AI · 4 min ·
