Introducing Storage Buckets on the Hugging Face Hub

Hugging Face Blog · 9 min read


Published March 10, 2026, by Lucain Pouget (Wauplin), Eliott Coyac (coyotte508), Adrien Carreira (XciD), Victor Mustar (victor), Julien Chaumond (julien-c), Quentin Lhoest (lhoestq), Pierric Cistac (pierric), Sylvestre Bcht (Sylvestre), Hugo Larcher (hlarcher), Rajat Arya (rajatarya), Di Xiao (seanses), and Assaf Vayner (assafvayner).

Hugging Face Models and Datasets repos are great for publishing final artifacts. But production ML generates a constant stream of intermediate files (checkpoints, optimizer states, processed shards, logs, traces, etc.) that change often, arrive from many jobs at once, and rarely need version control. Storage Buckets are built exactly for this: mutable, S3-like object storage you can browse on the Hub, script from Python, or manage with the `hf` CLI. And because they are backed by Xet, they are especially efficient for ML artifacts that share content across files.

Why we built Buckets

Git starts to feel like the wrong abstraction pretty quickly when you're dealing with:

- Training clusters writing checkpoints and optimizer states throughout a run
- Data pipelines processing raw datasets iteratively
- Agents storing traces, memory, and shared knowledge graphs

The storage need in all these cases is the same: write fast, overwrite when needed, sync directories, remove stale files, and keep things moving. A Bucket i...
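The "write fast, overwrite, remove stale files" semantics described above are the core of any S3-like object store. As a minimal sketch of those semantics, here is an in-memory stand-in: the `ObjectStore` class and its method names are hypothetical illustrations, not the actual Hugging Face Buckets API, which this excerpt does not show.

```python
# Minimal sketch of S3-like object-store semantics (put/overwrite, list by
# prefix, delete). Hypothetical class for illustration only -- NOT the
# Hugging Face Buckets API.

class ObjectStore:
    """In-memory stand-in for a mutable, key-addressed object store."""

    def __init__(self):
        self._objects: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Unlike a git repo, writing to an existing key overwrites it
        # in place: no commits, no history.
        self._objects[key] = data

    def list(self, prefix: str = "") -> list[str]:
        # Objects are browsed by key prefix, much like directories.
        return sorted(k for k in self._objects if k.startswith(prefix))

    def delete(self, key: str) -> None:
        # Stale files (old checkpoints, etc.) are removed outright.
        del self._objects[key]


store = ObjectStore()
store.put("run-1/checkpoint.pt", b"step 1000")
store.put("run-1/checkpoint.pt", b"step 2000")  # overwrite, not version
store.put("run-1/optimizer.pt", b"state")
print(store.list("run-1/"))  # ['run-1/checkpoint.pt', 'run-1/optimizer.pt']
store.delete("run-1/optimizer.pt")
```

The contrast with a git-backed repo is the point: an overwrite replaces the object rather than appending a commit, which is exactly the behavior a training cluster re-writing the same checkpoint key every few minutes wants.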

