[2604.04440] Training Transformers in Cosine Coefficient Space
arXiv:2604.04440 [cs.PF]
Computer Science > Performance
[Submitted on 6 Apr 2026]

Title: Training Transformers in Cosine Coefficient Space
Authors: Mohamed Amine Bergach

Abstract: We parameterize the weight matrices of a transformer in the two-dimensional discrete cosine transform (DCT) domain, retaining only the lowest-frequency coefficients. At each forward pass the full weight matrix is reconstructed via the inverse DCT; gradients propagate through the reconstruction to update the spectral coefficients directly. On character-level language modeling (Shakespeare, 1M characters), a 4-layer transformer trained from scratch in this representation matches the perplexity of the standard parameterization (6.1 vs.\ 6.1) while storing 52\% of the parameters. At 4$\times$ compression (29\% of parameters), the model reaches perplexity 6.9 -- outperforming a low-rank baseline (perplexity 8.8 at 21\% of parameters) at a comparable reduction. The method requires no architectural changes, no pre-trained checkpoint, and no auxiliary loss. It reduces to replacing each linear layer with a drop-in spectral layer that stores $K$ DCT coefficients instead of $n \times m$ weights.

Subjects: Performance (cs.PF); Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.04440 [cs.PF] (or arXiv:2604.04440v1 [cs.PF] for this version)
DOI: https://doi.org/10.48550/ar...
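The core mechanism the abstract describes -- storing only the $k_n \times k_m$ lowest-frequency 2D DCT coefficients and reconstructing the full $n \times m$ weight matrix on each forward pass -- can be sketched as below. This is a minimal numpy illustration, not the paper's code: the class name `SpectralLinear`, the coefficient shapes, and the initialization scale are assumptions; in practice the coefficients would be framework parameters (e.g. a PyTorch tensor) so that gradients flow through the reconstruction automatically.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix of shape (n, n):
    # row k is the k-th cosine basis vector, so C @ C.T == I.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    C[0] /= np.sqrt(2.0)  # DC row rescaled for orthonormality
    return C

class SpectralLinear:
    """Linear layer whose (n, m) weight matrix is stored as its
    k_n x k_m lowest-frequency 2D DCT coefficients (an assumption-laden
    sketch of the drop-in spectral layer described in the abstract)."""

    def __init__(self, n, m, k_n, k_m, seed=0):
        rng = np.random.default_rng(seed)
        self.n, self.m = n, m
        self.Cn = dct_matrix(n)  # (n, n) basis for rows
        self.Cm = dct_matrix(m)  # (m, m) basis for columns
        # The only trainable parameters: k_n * k_m spectral coefficients
        # instead of n * m weights (init scale is illustrative).
        self.coeff = rng.standard_normal((k_n, k_m)) * 0.02

    def weight(self):
        # Inverse 2D DCT of the zero-padded coefficient block:
        # W = Cn^T @ pad(A) @ Cm, valid because the basis is orthonormal
        # (C^{-1} = C^T). High-frequency coefficients are implicitly zero.
        A = np.zeros((self.n, self.m))
        A[: self.coeff.shape[0], : self.coeff.shape[1]] = self.coeff
        return self.Cn.T @ A @ self.Cm

    def __call__(self, x):
        # Reconstruct the full weight matrix, then apply it as usual.
        return x @ self.weight().T
```

At 4$\times$ compression one would pick $k_n k_m \approx 0.25\, n m$ (e.g. halving both spectral dimensions); training then updates `coeff` through the reconstruction, exactly as a standard linear layer updates its dense weights.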