AI Infrastructure

GPUs, training clusters, MLOps, and deployment


Top This Week

Machine Learning

[P] Fused MoE Dispatch in Pure Triton: Beating CUDA-Optimized Megablocks at Inference Batch Sizes

I built a fused MoE dispatch kernel in pure Triton that handles the full forward pass for Mixture-of-Experts models. No CUDA, no vendor-s...

Reddit - Machine Learning · 1 min · 36 minutes ago

Robotics

In Japan, the robot isn't coming for your job; it's filling the one nobody wants | TechCrunch

Driven by labor shortages, Japan is pushing physical AI from pilot projects into real-world deployment.

TechCrunch - AI · 9 min · about 3 hours ago

Machine Learning

[P] bitnet-edge: Ternary-weight CNNs ({-1,0,+1}) on MNIST and CIFAR-10, deployed to ESP32-S3 with zero multiplications

I built a pipeline that takes ternary-quantized CNNs from PyTorch training all the way to bare-metal inference on an ESP32-S3 microcontro...

Reddit - Machine Learning · 1 min · about 6 hours ago

All Content

Machine Learning

[2602.21647] Mitigating Structural Noise in Low-Resource S2TT: An Optimized Cascaded Nepali-English Pipeline with Punctuation Restoration

This paper presents an optimized cascaded Nepali-English speech-to-text translation system that mitigates structural noise from ASR, enha...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.21452] Adversarial Robustness of Deep Learning-Based Thyroid Nodule Segmentation in Ultrasound

This article evaluates the adversarial robustness of deep learning models for thyroid nodule segmentation in ultrasound images, highlight...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.21447] Adversarial Intent is a Latent Variable: Stateful Trust Inference for Securing Multimodal Agentic RAG

The paper presents a novel framework, MMA-RAG^T, for enhancing the security of multimodal agentic retrieval-augmented generation systems ...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.21429] Provably Safe Generative Sampling with Constricting Barrier Functions

This paper presents a safety filtering framework for generative models, ensuring generated samples meet hard constraints while minimizing...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.21399] FedVG: Gradient-Guided Aggregation for Enhanced Federated Learning

The paper presents FedVG, a novel gradient-guided aggregation framework for federated learning that enhances model performance by address...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.21374] Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages

This study explores the use of small language models for extracting clinical information from low-resource languages, focusing on a priva...

arXiv - Machine Learning · 4 min · about 1 month ago

Machine Learning

[2602.21379] MrBERT: Modern Multilingual Encoders via Vocabulary, Domain, and Dimensional Adaptation

MrBERT introduces a family of multilingual encoders optimized for various domains, achieving state-of-the-art results in specific tasks w...

arXiv - Machine Learning · 3 min · about 1 month ago

AI Infrastructure

[2602.21368] Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration

This paper presents a method for certifying the reliability of black-box AI systems using self-consistency sampling and conformal calibra...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.21255] A General Equilibrium Theory of Orchestrated AI Agent Systems

This paper presents a general equilibrium theory for orchestrated AI agent systems, modeling large language model (LLM) agents within a p...

arXiv - AI · 4 min · about 1 month ago

AI Safety

[2602.21267] A Systematic Review of Algorithmic Red Teaming Methodologies for Assurance and Security of AI Applications

This systematic review explores automated red teaming methodologies for enhancing the security of AI applications, addressing the limitat...

arXiv - AI · 3 min · about 1 month ago

LLMs

[2602.21251] AgenticTyper: Automated Typing of Legacy Software Projects Using Agentic AI

AgenticTyper is a novel AI-driven tool that automates the typing of legacy JavaScript projects, significantly reducing manual effort and ...

arXiv - AI · 3 min · about 1 month ago

Machine Learning

[2602.21233] AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression

AngelSlim introduces a versatile toolkit for large model compression, integrating advanced algorithms for efficient deployment and improv...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.21227] Budget-Aware Agentic Routing via Boundary-Guided Training

The paper presents Budget-Aware Agentic Routing, a method for optimizing the use of large language models in autonomous agents by balanci...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.21225] Architecture-Agnostic Curriculum Learning for Document Understanding: Empirical Evidence from Text-Only and Multimodal

This paper explores architecture-agnostic curriculum learning for document understanding, demonstrating efficiency gains in training time...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.21224] Make Every Draft Count: Hidden State based Speculative Decoding

The paper presents a novel approach to speculative decoding in large language models (LLMs), focusing on reusing discarded draft tokens t...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.21222] Task-Aware LoRA Adapter Composition via Similarity Retrieval in Vector Databases

This paper presents a novel framework for dynamic LoRA adapter composition using similarity retrieval in vector databases, enabling effic...

arXiv - Machine Learning · 4 min · about 1 month ago

LLMs

[2602.21221] Latent Context Compilation: Distilling Long Context into Compact Portable Memory

The paper introduces Latent Context Compilation, a novel framework that enhances long-context LLM deployment by distilling long contexts ...

arXiv - Machine Learning · 3 min · about 1 month ago

LLMs

[2602.21215] Inference-time Alignment via Sparse Junction Steering

This paper presents Sparse Inference-time Alignment (SIA), a novel approach to enhance alignment in large language models by intervening ...

arXiv - AI · 4 min · about 1 month ago

Machine Learning

[2602.21889] 2-Step Agent: A Framework for the Interaction of a Decision Maker with AI Decision Support

The paper presents the 2-Step Agent framework, which models the interaction between decision makers and AI decision support systems, high...

arXiv - Machine Learning · 3 min · about 1 month ago

Machine Learning

[2602.21746] fEDM+: A Risk-Based Fuzzy Ethical Decision Making Framework with Principle-Level Explainability and Pluralistic Validation

The paper presents fEDM+, an enhanced fuzzy ethical decision-making framework that improves explainability and validation by integrating ...

arXiv - AI · 4 min · about 1 month ago

