[2602.13288] Benchmarking Anomaly Detection Across Heterogeneous Cloud Telemetry Datasets

arXiv - Machine Learning · 4 min read

Summary

This paper benchmarks four deep learning models (GRU, TCN, Transformer, TSMixer) and a classical Isolation Forest baseline for anomaly detection across four heterogeneous cloud telemetry datasets, highlighting the roles of calibration stability and feature-space geometry in model performance.

Why It Matters

Anomaly detection is crucial for keeping cloud systems reliable. Because most models are benchmarked on a single dataset, it is unclear whether they generalize to different kinds of telemetry; this study evaluates them across datasets that differ in structure, dimensionality, and labelling strategy, offering guidance for real-world deployment.

Key Takeaways

  • Evaluates four deep learning models (GRU, TCN, Transformer, TSMixer) and an Isolation Forest baseline for anomaly detection.
  • Highlights the impact of calibration stability and feature-space geometry on model performance.
  • Introduces a unified training and evaluation pipeline for consistent analysis across datasets.
  • Demonstrates the necessity of testing models on heterogeneous datasets for real-world applicability.
  • Provides preprocessing pipelines and evaluation artifacts to support reproducibility.
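The unified pipeline in the takeaways above amounts to one windowing-and-scoring interface shared across datasets, behind which any detector can be plugged. A minimal sketch of that interface, assuming a univariate series: the paper's classical baseline is Isolation Forest, but the stand-in scorer below uses a robust z-score of window means so the example stays dependency-free, and the synthetic series, window width, and function name `score_windows` are all illustrative.

```python
import numpy as np

def score_windows(series: np.ndarray, width: int) -> np.ndarray:
    """Score each sliding window of a 1-D series; higher = more anomalous.

    Stand-in classical detector (robust z-score of the window mean).
    A pipeline like the paper's would expose the same interface and
    swap in Isolation Forest or a deep model behind it.
    """
    windows = np.lib.stride_tricks.sliding_window_view(series, width)
    means = windows.mean(axis=1)
    med = np.median(means)
    mad = np.median(np.abs(means - med)) + 1e-9  # avoid divide-by-zero
    return np.abs(means - med) / mad

# Synthetic telemetry with one injected spike anomaly.
rng = np.random.default_rng(0)
series = rng.normal(0.0, 1.0, 500)
series[300:305] += 8.0

scores = score_windows(series, width=16)
print(int(scores.argmax()))  # window index inside the injected anomaly region
```

Keeping the detector behind a single `score_windows`-style interface is what lets the same training and evaluation code run unchanged across univariate, multivariate, and high-dimensional datasets.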

Computer Science > Networking and Internet Architecture
arXiv:2602.13288 (cs) [Submitted on 7 Feb 2026]
Title: Benchmarking Anomaly Detection Across Heterogeneous Cloud Telemetry Datasets
Authors: Mohammad Saiful Islam, Andriy Miranskyy

Abstract: Anomaly detection is important for keeping cloud systems reliable and stable. Deep learning has improved time-series anomaly detection, but most models are evaluated on one dataset at a time. This raises questions about whether these models can handle different types of telemetry, especially in large-scale and high-dimensional environments. In this study, we evaluate four deep learning models: GRU, TCN, Transformer, and TSMixer. We also include Isolation Forest as a classical baseline. The models are tested across four telemetry datasets: the Numenta Anomaly Benchmark, the Microsoft Cloud Monitoring dataset, the Exathlon dataset, and the IBM Console dataset. These datasets differ in structure, dimensionality, and labelling strategy. They include univariate time series, synthetic multivariate workloads, and real-world production telemetry with over 100,000 features. We use a unified training and evaluation pipeline across all datasets. The evaluation includes NAB-style metrics to capture early detection behaviour for datasets where anomalies persist over contiguous ...
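The NAB-style metrics mentioned in the abstract reward detections that fire early within a contiguous anomaly window rather than just anywhere inside it. A simplified sketch of that positional weighting, assuming a sigmoid that gives near-full credit at the window's start and near-zero at its end; this is not NAB's exact scoring function, and the 10-point window and sigmoid steepness are illustrative choices.

```python
import math

def early_detection_weight(pos: int, window_len: int) -> float:
    """Credit for a detection at position `pos` inside a contiguous
    anomaly window of length `window_len`. Earlier detections earn
    more credit, decaying sigmoidally toward the window's end.
    Simplified sketch of NAB-style scoring, not the exact function.
    """
    rel = pos / max(window_len - 1, 1)          # 0 = window start, 1 = end
    return 1.0 / (1.0 + math.exp(10.0 * (rel - 0.5)))

# Credit profile across a 10-point anomaly window.
weights = [round(early_detection_weight(p, 10), 3) for p in range(10)]
print(weights)  # monotonically decreasing from ~1 to ~0
```

Metrics with this shape explain why the paper evaluates on datasets where anomalies persist over contiguous spans: point-wise precision/recall cannot distinguish a detector that catches an incident at onset from one that flags it only as it ends.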

