
[2602.15889] Evidence for Daily and Weekly Periodic Variability in GPT-4o Performance

arXiv - AI, February 19, 2026

Summary

The paper examines whether GPT-4o's average performance stays constant over time under fixed conditions (same model snapshot, hyperparameters, and prompt). It reports notable daily and weekly periodic patterns, which challenge the common assumption that model performance is time-invariant.

Why It Matters

Understanding the periodic variability in LLM performance is crucial for researchers relying on these models for consistent results. This study highlights potential biases in research findings and emphasizes the need for careful consideration of temporal factors in AI applications.

Key Takeaways

GPT-4o's average performance on a fixed task shows notable daily and weekly periodic variability.
Periodic variability accounts for approximately 20% of the total variance in average performance.
The findings challenge the assumption of time-invariant performance in LLMs.
Implications for research validity and replicability are discussed.
Researchers should account for temporal factors when using LLMs.
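
One simple way to act on the last takeaway is to record a timestamp with every model response and compare average scores by weekday and by hour before pooling results. The sketch below is an illustration, not a procedure from the paper: the helper `mean_by_slot`, the `(timestamp, score)` record layout, and the synthetic log with an artificial weekend bump are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def mean_by_slot(records):
    """Average scores by weekday (0 = Monday) and by UTC hour.

    `records` is an iterable of (timezone-aware datetime, score) pairs
    (a hypothetical log format, not one defined by the paper).
    """
    by_weekday = defaultdict(list)
    by_hour = defaultdict(list)
    for ts, score in records:
        ts_utc = ts.astimezone(timezone.utc)
        by_weekday[ts_utc.weekday()].append(score)
        by_hour[ts_utc.hour].append(score)
    weekday_means = {d: mean(v) for d, v in sorted(by_weekday.items())}
    hour_means = {h: mean(v) for h, v in sorted(by_hour.items())}
    return weekday_means, hour_means


# Synthetic log: two weeks of three-hourly scores with a made-up weekend bump.
start = datetime(2025, 1, 6, tzinfo=timezone.utc)  # a Monday
records = []
for i in range(8 * 14):
    ts = start + timedelta(hours=3 * i)
    bump = 0.05 if ts.weekday() >= 5 else 0.0
    records.append((ts, 0.6 + bump))

weekday_means, hour_means = mean_by_slot(records)
print(weekday_means)
print(hour_means)
```

If the per-weekday or per-hour means differ by more than the run-to-run noise, collection time is worth reporting alongside the results or including as a covariate.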

Statistics > Applications
arXiv:2602.15889 (stat)
[Submitted on 6 Feb 2026]

Title: Evidence for Daily and Weekly Periodic Variability in GPT-4o Performance
Authors: Paul Tschisgale, Peter Wulff

Abstract: Large language models (LLMs) are increasingly used in research both as tools and as objects of investigation. Much of this work implicitly assumes that LLM performance under fixed conditions (identical model snapshot, hyperparameters, and prompt) is time-invariant. If average output quality changes systematically over time, this assumption is violated, threatening the reliability, validity, and reproducibility of findings. To empirically examine this assumption, we conducted a longitudinal study on the temporal variability of GPT-4o's average performance. Using a fixed model snapshot, fixed hyperparameters, and identical prompting, GPT-4o was queried via the API to solve the same multiple-choice physics task every three hours for approximately three months. Ten independent responses were generated at each time point and their scores were averaged. Spectral (Fourier) analysis of the resulting time series revealed notable periodic variability in average model performance, accounting for approximately 20% of the total variance. In particular, the observed periodic patterns are well explained by the interactio...
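
The spectral step described in the abstract can be illustrated on synthetic data. The sketch below is not the authors' code. It assumes a record of 84 days (12 full weeks, standing in for "approximately three months"), made-up sinusoid amplitudes and noise level, and the hypothetical helpers `synthetic_scores`, `one_sided_power`, and `variance_share`. It computes a plain discrete Fourier transform of the mean-centred series and reports the share of total variance that falls in the one-day and seven-day frequency bins. The printed share reflects the invented amplitudes only and is unrelated to the 20% figure in the paper.

```python
import cmath
import math
import random

SAMPLES_PER_DAY = 8   # one query every three hours, as in the abstract
N_DAYS = 84           # assumption: 12 full weeks, roughly three months


def synthetic_scores(noise_sd=0.08, seed=0):
    """Made-up averaged scores: baseline + daily + weekly sinusoid + noise."""
    rng = random.Random(seed)
    scores = []
    for i in range(SAMPLES_PER_DAY * N_DAYS):
        t_days = i / SAMPLES_PER_DAY
        daily = 0.03 * math.sin(2 * math.pi * t_days)
        weekly = 0.02 * math.sin(2 * math.pi * t_days / 7)
        scores.append(0.6 + daily + weekly + rng.gauss(0.0, noise_sd))
    return scores


def one_sided_power(series):
    """Power per frequency bin k = 0..N//2 of the mean-centred series (naive DFT)."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(centred)
        )
        # Every bin except 0 and Nyquist occurs twice in the full spectrum.
        is_unpaired = k == 0 or (n % 2 == 0 and k == n // 2)
        power.append((1.0 if is_unpaired else 2.0) * abs(coeff) ** 2)
    return power


def variance_share(series, periods_days):
    """Fraction of total variance located at the bins for the given periods."""
    power = one_sided_power(series)
    duration_days = len(series) / SAMPLES_PER_DAY
    # A period of p days completes duration/p cycles, which is its bin index.
    bins = {round(duration_days / p) for p in periods_days}
    return sum(power[k] for k in bins) / sum(power)


share = variance_share(synthetic_scores(), periods_days=[1, 7])
print(f"share of variance at daily and weekly periods: {share:.1%}")
```

Because 84 days contains a whole number of days and weeks, both periods land exactly on DFT bins. With a record length that is not a multiple of seven days, power leaks into neighbouring bins and this simple bin lookup would understate the periodic share.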
