AI Infrastructure

GPUs, training clusters, MLOps, and deployment


Top This Week

LLMs

Seeking Critique on Research Approach to Open Set Recognition (Novelty Detection) [R]

Hey guys, I'm an independent researcher working on a project that tries to address a very specific failure mode in LLMs and embedding bas...

Reddit - Machine Learning · 1 min · 8 minutes ago

Machine Learning

What if attention didn’t need matrix multiplication?

I built a cognitive architecture where all computation reduces to three bit operations: XOR, MAJ, POPCNT. No GEMM. No GPU. No floating-po...

Reddit - Artificial Intelligence · 1 min · about 2 hours ago

Machine Learning

WTF. Its real. AllBirds (the shoe company) is pivoting to inference.

I'm profoundly ambivalent re: how to feel about this; is it great -- what a scrappy, bold pivot! Or wildly dumb - its so far from their c...

Reddit - Artificial Intelligence · 1 min · about 4 hours ago

All Content

LLMs

[2506.02634] KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider

This paper characterizes and optimizes KVCache, a caching mechanism for large language model (LLM) serving at a major cloud provider, hig...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2602.03546] How to Train Your Resistive Network: Generalized Equilibrium Propagation and Analytical Learning

This paper presents a novel algorithm for training resistive networks using Generalized Equilibrium Propagation, aiming to enhance energy...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2602.02201] Cardinality-Preserving Attention Channels for Graph Transformers in Molecular Property Prediction

This article presents a novel graph transformer model, incorporating cardinality-preserving attention channels, to enhance molecular prop...

arXiv - Machine Learning · 3 min · about 2 months ago

AI Infrastructure

[2602.01051] SwiftRepertoire: Few-Shot Immune-Signature Synthesis via Dynamic Kernel Codes

The paper presents SwiftRepertoire, a framework for synthesizing immune signatures using few-shot learning techniques, enabling efficient...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2505.07861] Scalable LLM Reasoning Acceleration with Low-rank Distillation

The paper presents Caprese, a low-rank distillation method designed to enhance reasoning capabilities in large language models (LLMs) whi...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2601.22323] Models Under SCOPE: Scalable and Controllable Routing via Pre-hoc Reasoning

The paper presents SCOPE, a novel routing framework for language models that dynamically predicts cost and performance, enhancing efficie...

arXiv - Machine Learning · 4 min · about 2 months ago

AI Infrastructure

[2505.07755] Benchmarking of CPU-intensive Stream Data Processing in The Edge Computing Systems

This article evaluates CPU-intensive stream data processing in edge computing systems, highlighting performance and power consumption opt...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2601.18702] From Fuzzy to Exact: The Halo Architecture for Infinite-Depth Reasoning via Rational Arithmetic

This paper introduces the Halo Architecture, a new framework for infinite-depth reasoning using rational arithmetic, aiming to enhance th...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2601.03213] Critic-Guided Reinforcement Unlearning in Text-to-Image Diffusion

The paper presents a novel reinforcement learning framework for unlearning targeted concepts in text-to-image diffusion models, enhancing...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2512.20885] From GNNs to Symbolic Surrogates via Kolmogorov-Arnold Networks for Delay Prediction

This paper explores the use of Kolmogorov-Arnold Networks (KAN) for predicting flow delays in communication networks, enhancing efficienc...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2511.17879] Generative Adversarial Post-Training Mitigates Reward Hacking in Live Human-AI Music Interaction

This paper presents a novel method using generative adversarial training to address reward hacking in real-time human-AI music interactio...

arXiv - Machine Learning · 4 min · about 2 months ago

AI Infrastructure

[2511.16652] Evolution Strategies at the Hyperscale

The paper presents EGGROLL, an enhanced Evolution Strategy for optimizing large-scale models, achieving significant speed improvements an...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2408.10746] Resource-Efficient Personal Large Language Models Fine-Tuning with Collaborative Edge Computing

The paper presents PAC, a collaborative edge computing framework designed for resource-efficient fine-tuning of personal large language m...

arXiv - Machine Learning · 4 min · about 2 months ago

LLMs

[2510.15987] Algorithmic Primitives and Compositional Geometry of Reasoning in Language Models

The paper explores how algorithmic primitives and compositional geometry can enhance reasoning capabilities in large language models (LLM...

arXiv - AI · 4 min · about 2 months ago

LLMs

[2510.13654] Challenges and Requirements for Benchmarking Time Series Foundation Models

This article discusses the challenges and requirements for benchmarking Time Series Foundation Models (TSFMs), highlighting issues of inf...

arXiv - Machine Learning · 3 min · about 2 months ago

Machine Learning

[2406.04955] Experimental Evaluation of ROS-Causal in Real-World Human-Robot Spatial Interaction Scenarios

This article presents an experimental evaluation of ROS-Causal, a framework for causal discovery in human-robot spatial interactions, dem...

arXiv - AI · 4 min · about 2 months ago

Machine Learning

[2510.07182] Bridged Clustering: Semi-Supervised Sparse Bridging

The paper introduces Bridged Clustering, a semi-supervised framework that learns predictors from unpaired datasets by clustering inputs a...

arXiv - Machine Learning · 3 min · about 2 months ago

LLMs

[2404.08634] When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models

This article explores the phenomenon of 'attention collapse' in large language models (LLMs) and introduces Inheritune, a method for crea...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2510.04008] RACE Attention: A Strictly Linear-Time Attention for Long-Sequence Training

The paper presents RACE Attention, a novel linear-time attention mechanism designed for long-sequence training, significantly improving e...

arXiv - Machine Learning · 4 min · about 2 months ago

Machine Learning

[2303.09807] TKN: Transformer-based Keypoint Prediction Network For Real-time Video Prediction

The paper presents TKN, a transformer-based neural network designed for real-time video prediction, achieving a remarkable prediction rat...

arXiv - AI · 4 min · about 2 months ago

