AI Infrastructure

GPUs, training clusters, MLOps, and deployment

Top This Week

AI Infrastructure

UMKC Announces New Master of Science in Artificial Intelligence

UMKC announces a new Master of Science in Artificial Intelligence program aimed at addressing workforce demand for AI expertise, set to l...

AI News - General · 4 min
Machine Learning

Your prompts aren’t the problem — something else is

I keep seeing people focus heavily on prompt optimization. But in practice, a lot of failures I’ve observed don’t come from the prompt it...

Reddit - Artificial Intelligence · 1 min
AI Infrastructure

[P] GPU friendly lossless 12-bit BF16 format with 0.03% escape rate and 1 integer ADD decode works for AMD & NVIDIA

Hi everyone : ) I just released a new research prototype. It’s a lossless BF16 compression format that stores weights in 12 bits by replac...

Reddit - Machine Learning · 1 min

All Content

Machine Learning

[2502.02088] Dual-IPO: Dual-Iterative Preference Optimization for Text-to-Video Generation

The paper presents Dual-IPO, a novel framework for optimizing text-to-video generation by iteratively improving both the reward and video...

arXiv - AI · 4 min
Machine Learning

[2502.05435] Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning

This paper presents the Unbiased Sliced Wasserstein RBF kernel, a novel approach for enhancing audio captioning systems by addressing exp...

arXiv - Machine Learning · 4 min
Machine Learning

[2405.06727] Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces

This paper explores the approximation capabilities of ReLU neural networks on low-regularity function spaces, establishing bounds on appr...

arXiv - Machine Learning · 3 min
NLP

[2602.10195] Versor: A Geometric Sequence Architecture

The paper introduces Versor, a novel geometric sequence architecture that leverages Conformal Geometric Algebra for enhanced performance ...

arXiv - Machine Learning · 4 min
LLMs

[2508.01780] LiveMCPBench: Can Agents Navigate an Ocean of MCP Tools?

The paper presents LiveMCPBench, a benchmark designed to evaluate the capabilities of agents using Model Context Protocol (MCP) tools in ...

arXiv - AI · 4 min
LLMs

[2602.05535] Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification

This paper presents Evidential Uncertainty Quantification (EUQ) to detect misbehaviors in large vision-language models (LVLMs), addressin...

arXiv - Machine Learning · 4 min
LLMs

[2504.13359] Cost-of-Pass: An Economic Framework for Evaluating Language Models

The paper presents an economic framework for evaluating language models by analyzing the tradeoff between performance and inference costs...

arXiv - AI · 4 min
AI Infrastructure

[2601.22669] Beyond Fixed Rounds: Data-Free Early Stopping for Practical Federated Learning

This paper introduces a data-free early stopping framework for federated learning, enhancing efficiency and privacy by eliminating the ne...

arXiv - Machine Learning · 3 min
LLMs

[2512.03383] UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs

The paper presents UniQL, a unified framework for quantization and low-rank compression of large language models (LLMs) tailored for edge...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.23334] Bitwise Systolic Array Architecture for Runtime-Reconfigurable Multi-precision Quantized Multiplication on Hardware Accelerators

This paper presents a novel bitwise systolic array architecture designed for runtime-reconfigurable multi-precision quantized multiplicat...

arXiv - AI · 3 min
LLMs

[2509.26238] Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

This paper presents Truncated Polynomial Classifiers (TPCs) for dynamic safety monitoring in large language models, enhancing efficiency ...

arXiv - Machine Learning · 4 min
LLMs

[2509.21936] Statistical Advantage of Softmax Attention: Insights from Single-Location Regression

This article explores the statistical advantages of softmax attention mechanisms in large language models, particularly in single-locatio...

arXiv - Machine Learning · 4 min
Machine Learning

[2509.22935] Compute-Optimal Quantization-Aware Training

This paper explores Compute-Optimal Quantization-Aware Training (QAT), revealing how optimal compute allocation between full-precision an...

arXiv - Machine Learning · 4 min
Machine Learning

[2509.03810] Online time series prediction using feature adjustment

The paper presents a novel approach to online time series prediction, addressing challenges related to distribution shifts and delayed fe...

arXiv - Machine Learning · 4 min
LLMs

[2602.23225] Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?

This paper investigates why Diffusion Language Models (DLMs) often default to autoregressive decoding instead of utilizing their potentia...

arXiv - AI · 4 min
LLMs

[2507.03772] Skewed Score: A statistical framework to assess autograders

The paper presents a statistical framework for assessing autograders used in evaluating LLM outputs, addressing reliability and bias issu...

arXiv - Machine Learning · 4 min
LLMs

[2506.14261] RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?

This article explores RL-Obfuscation, a method for training language models to evade latent-space monitors that detect undesirable behavi...

arXiv - Machine Learning · 4 min
Machine Learning

[2602.23153] Efficient Encoder-Free Fourier-based 3D Large Multimodal Model

This article presents Fase3D, an innovative encoder-free Fourier-based model for processing 3D multimodal data, enhancing efficiency and ...

arXiv - AI · 4 min
Machine Learning

[2602.23057] Affine-Scaled Attention: Towards Flexible and Stable Transformer Attention

The paper introduces Affine-Scaled Attention, a novel approach to Transformer attention that enhances flexibility and stability by modify...

arXiv - AI · 4 min
LLMs

[2602.23036] LLMServingSim 2.0: A Unified Simulator for Heterogeneous and Disaggregated LLM Serving Infrastructure

LLMServingSim 2.0 introduces a unified simulator for heterogeneous and disaggregated large language model (LLM) serving infrastructures, ...

arXiv - AI · 4 min