AI Infrastructure

GPUs, training clusters, MLOps, and deployment

Top This Week

AI Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min · about 3 hours ago

Machine Learning

Your prompts aren’t the problem — something else is

I keep seeing people focus heavily on prompt optimization. But in practice, a lot of failures I’ve observed don’t come from the prompt it...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

AI Infrastructure

[P] GPU friendly lossless 12-bit BF16 format with 0.03% escape rate and 1 integer ADD decode works for AMD & NVIDIA

Hi everyone :) I just released a new research prototype. It’s a lossless BF16 compression format that stores weights in 12 bits by replac...

Reddit - Machine Learning · 1 min · about 9 hours ago

All Content

Machine Learning

[2502.02088] Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation

The paper presents Dual-IPO, a novel framework for optimizing text-to-video generation by iteratively improving both the reward and video...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2502.05435] Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning

This paper presents the Unbiased Sliced Wasserstein RBF kernel, a novel approach for enhancing audio captioning systems by addressing exp...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2405.06727] Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces

This paper explores the approximation capabilities of ReLU neural networks on low-regularity function spaces, establishing bounds on appr...

arXiv - Machine Learning · 3 min · about 1 month ago

NLP

[2602.10195] Versor: A Geometric Sequence Architecture

The paper introduces Versor, a novel geometric sequence architecture that leverages Conformal Geometric Algebra for enhanced performance ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2508.01780] LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

The paper presents LiveMCPBench, a benchmark designed to evaluate the capabilities of agents using Model Context Protocol (MCP) tools in ...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.05535] Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification

This paper presents Evidential Uncertainty Quantification (EUQ) to detect misbehaviors in large vision-language models (LVLMs), addressin...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2504.13359] Cost-of-Pass: An Economic Framework for Evaluating Language Models

The paper presents an economic framework for evaluating language models by analyzing the tradeoff between performance and inference costs...

arXiv - AI · 4 min · about 1 month ago

AI Infrastructure

[2601.22669] Beyond Fixed Rounds: Data-Free Early Stopping for Practical Federated Learning

This paper introduces a data-free early stopping framework for federated learning, enhancing efficiency and privacy by eliminating the ne...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2512.03383] UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

The paper presents UniQL, a unified framework for quantization and low-rank compression of large language models (LLMs) tailored for edge...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.23334] Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

This paper presents a novel bitwise systolic array architecture designed for runtime-reconfigurable multi-precision quantized multiplicat...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2509.26238] Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

This paper presents Truncated Polynomial Classifiers (TPCs) for dynamic safety monitoring in large language models, enhancing efficiency ...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2509.21936] Statistical Advantage of Softmax Attention: Insights from Single-Location Regression

This article explores the statistical advantages of softmax attention mechanisms in large language models, particularly in single-locatio...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2509.22935] Compute-Optimal Quantization-Aware Training

This paper explores Compute-Optimal Quantization-Aware Training (QAT), revealing how optimal compute allocation between full-precision an...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2509.03810] Online time series prediction using feature adjustment

The paper presents a novel approach to online time series prediction, addressing challenges related to distribution shifts and delayed fe...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.23225] Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?

This paper investigates why Diffusion Language Models (DLMs) often default to autoregressive decoding instead of utilizing their potentia...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2507.03772] Skewed Score: A statistical framework to assess autograders

The paper presents a statistical framework for assessing autograders used in evaluating LLM outputs, addressing reliability and bias issu...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2506.14261] RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?

This article explores RL-Obfuscation, a method for training language models to evade latent-space monitors that detect undesirable behavi...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.23153] Efficient Encoder-Free Fourier-based 3D Large Multimodal Model

This article presents Fase3D, an innovative encoder-free Fourier-based model for processing 3D multimodal data, enhancing efficiency and ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.23057] Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention

The paper introduces Affine-Scaled Attention, a novel approach to Transformer attention that enhances flexibility and stability by modify...

arXiv - AI · 4 min · about 1 month ago

LLMs

[2602.23036] LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure

LLMServingSim 2.0 introduces a unified simulator for heterogeneous and disaggregated large language model (LLM) serving infrastructures, ...

arXiv - AI · 4 min · about 1 month ago

