[2603.27006] The Last Fingerprint: How Markdown Training Shapes LLM Prose

[2603.27006] The Last Fingerprint: How Markdown Training Shapes LLM Prose

arXiv - AI 4 min read

About this article

Abstract page for arXiv paper 2603.27006: The Last Fingerprint: How Markdown Training Shapes LLM Prose

Computer Science > Computation and Language arXiv:2603.27006 (cs) [Submitted on 27 Mar 2026] Title:The Last Fingerprint: How Markdown Training Shapes LLM Prose Authors:E. M. Freeburg View a PDF of the paper titled The Last Fingerprint: How Markdown Training Shapes LLM Prose, by E. M. Freeburg View PDF HTML (experimental) Abstract:Large language models produce em dashes at varying rates, and the observation that some models "overuse" them has become one of the most widely discussed markers of AI-generated text. Yet no mechanistic account of this pattern exists, and the parallel observation that LLMs default to markdown-formatted output has never been connected to it. We propose that the em dash is markdown leaking into prose -- the smallest surviving unit of the structural orientation that LLMs acquire from markdown-saturated training corpora. We present a five-step genealogy connecting training data composition, structural internalization, the dual-register status of the em dash, and post-training amplification. We test this with a two-condition suppression experiment across twelve models from five providers (Anthropic, OpenAI, Meta, Google, DeepSeek): when models are instructed to avoid markdown formatting, overt features (headers, bullets, bold) are eliminated or nearly eliminated, but em dashes persist -- except in Meta's Llama models, which produce none at all. Em dash frequency and suppression resistance vary from 0.0 per 1,000 words (Llama) to 9.1 (GPT-4.1 under supp...

Originally published on April 01, 2026. Curated by AI News.

Related Articles

Can Claude Opus 4.7 and Ensemble AI Models Finally Make Code Review Reliable?
Llms

Can Claude Opus 4.7 and Ensemble AI Models Finally Make Code Review Reliable?

Ensemble AI models like Claude Opus 4.7 transform code review reliability. Discover how multi-model approaches catch subtle bugs human re...

AI Tools & Products · 9 min ·
Starbucks Tests AI-Driven Drink Discovery Through ChatGPT Integration |
Llms

Starbucks Tests AI-Driven Drink Discovery Through ChatGPT Integration |

Not long ago, the idea that a customer could describe a mood instead of a menu item and receive a tailored drink recommendation would hav...

AI Tools & Products · 7 min ·
AI XRP Price Prediction: ChatGPT and Claude Predict XRP Price After Hitting $1.45
Llms

AI XRP Price Prediction: ChatGPT and Claude Predict XRP Price After Hitting $1.45

XRP has seen recent gains due to Rakuten listing it as a payment method and Ripple's partnership with Kyobo Life. Bitcoin's rise also con...

AI Tools & Products · 6 min ·
I canceled ChatGPT Plus and 2 other AI subscriptions — here’s what I replaced them with
Llms

I canceled ChatGPT Plus and 2 other AI subscriptions — here’s what I replaced them with

I was paying for Adobe Firefly, ChatGPT Plus, and Perplexity Pro at the same time. Here's why I canceled all three, and what replaced them.

AI Tools & Products · 6 min ·
More in Llms: This Week Guide Trending

No comments

No comments yet. Be the first to comment!

Stay updated with AI News

Get the latest news, tools, and insights delivered to your inbox.

Daily or weekly digest • Unsubscribe anytime