[2602.15889] Evidence for Daily and Weekly Periodic Variability in GPT-4o Performance

arXiv - AI 4 min read Article

Summary

This article investigates the temporal variability of GPT-4o's average performance under fixed conditions (identical model snapshot, hyperparameters, and prompt). It reports notable daily and weekly periodic patterns that challenge the assumption of time-invariant model performance.

Why It Matters

Understanding the periodic variability in LLM performance is crucial for researchers relying on these models for consistent results. This study highlights potential biases in research findings and emphasizes the need for careful consideration of temporal factors in AI applications.

Key Takeaways

  • GPT-4o's average performance shows notable daily and weekly periodic variability.
  • Approximately 20% of the total variance in average performance is accounted for by these periodic patterns.
  • The findings challenge the assumption of time-invariant performance in LLMs.
  • Implications for research validity and replicability are discussed.
  • Researchers should account for temporal factors when using LLMs.

Statistics > Applications
arXiv:2602.15889 (stat)
[Submitted on 6 Feb 2026]

Title: Evidence for Daily and Weekly Periodic Variability in GPT-4o Performance

Authors: Paul Tschisgale, Peter Wulff

Abstract: Large language models (LLMs) are increasingly used in research both as tools and as objects of investigation. Much of this work implicitly assumes that LLM performance under fixed conditions (identical model snapshot, hyperparameters, and prompt) is time-invariant. If average output quality changes systematically over time, this assumption is violated, threatening the reliability, validity, and reproducibility of findings. To empirically examine this assumption, we conducted a longitudinal study on the temporal variability of GPT-4o's average performance. Using a fixed model snapshot, fixed hyperparameters, and identical prompting, GPT-4o was queried via the API to solve the same multiple-choice physics task every three hours for approximately three months. Ten independent responses were generated at each time point and their scores were averaged. Spectral (Fourier) analysis of the resulting time series revealed notable periodic variability in average model performance, accounting for approximately 20% of the total variance. In particular, the observed periodic patterns are well explained by the interactio...
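
To make the spectral-analysis step in the abstract more concrete, here is a minimal Python sketch of how one might estimate the share of variance sitting at daily and weekly periods in a series sampled every three hours. It is not the authors' code or their exact method: it uses a plain discrete Fourier transform (via NumPy) and Parseval's theorem, and the function names `simulate_scores` and `periodic_variance_share`, the binary scoring, and the amplitudes in the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def simulate_scores(n_days=84, sample_hours=3.0, n_responses=10, seed=0):
    """Synthetic stand-in for the study's time series (hypothetical values).

    Each time point is the mean of `n_responses` binary scores whose success
    probability carries an assumed daily and weekly sinusoidal component.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(n_days * 24 / sample_hours)) * sample_hours  # hours
    p = (0.6
         + 0.08 * np.sin(2 * np.pi * t / 24.0)     # assumed daily component
         + 0.05 * np.sin(2 * np.pi * t / 168.0))   # assumed weekly component
    return rng.binomial(n_responses, p) / n_responses


def periodic_variance_share(scores, sample_hours=3.0,
                            periods_hours=(24.0, 168.0)):
    """Fraction of total variance at the DFT bin nearest each period."""
    x = np.asarray(scores, dtype=float)
    n = x.size
    x = x - x.mean()                               # remove the mean level
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(n, d=sample_hours)     # cycles per hour

    # Parseval: interior bins of the one-sided spectrum count twice,
    # the zero-frequency bin (and the Nyquist bin for even n) count once.
    weights = np.full(power.shape, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0
    var_per_bin = weights * power / n ** 2
    total_var = var_per_bin.sum()

    shares = {}
    for period in periods_hours:
        k = int(np.argmin(np.abs(freqs - 1.0 / period)))
        shares[period] = var_per_bin[k] / total_var
    return shares


if __name__ == "__main__":
    shares = periodic_variance_share(simulate_scores())
    for period, share in shares.items():
        print(f"period {period:g} h: {share:.1%} of total variance")
```

With 84 days of three-hourly samples, the 24-hour and 168-hour periods fall exactly on DFT bins, so no windowing is needed in this sketch; a real series with gaps or a non-integer number of cycles would call for more careful treatment.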
