[2603.10742] A Grammar of Machine Learning Workflows

arXiv - Machine Learning April 07, 2026 3 min read

About this article

Abstract page for arXiv paper 2603.10742: A Grammar of Machine Learning Workflows

arXiv:2603.10742 (cs)
[Submitted on 11 Mar 2026 (v1), last revised 5 Apr 2026 (this version, v3)]

Title: A Grammar of Machine Learning Workflows
Authors: Simon Roth

Abstract: Data leakage has been identified in 648 published machine learning papers across 30 scientific fields. The knowledge to prevent it exists; the tools do not enforce it. This paper presents a grammar - eight typed primitives, a directed acyclic graph, and four hard constraints - that makes the most damaging leakage types structurally unrepresentable. The core mechanism is a terminal assessment gate: the first call-time-enforced evaluate/assess boundary in an ML framework, backed by a specification precise enough for independent reimplementation. A companion landscape study across 2,047 datasets grounds the constraints in measured effect sizes. Two reference implementations (Python, R) are available.

Subjects: Machine Learning (cs.LG)
ACM classes: I.2.6; D.2.4
Cite as: arXiv:2603.10742 [cs.LG] (or arXiv:2603.10742v3 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2603.10742 (arXiv-issued DOI via DataCite)
Related DOI: https://doi.org/10.5281/zenodo.19406355

Submission history
From: Simon Roth
[v1] Wed, 11 Mar 2026 13:15:33 UTC (118 KB)
[v2] Sat, 14 Ma...
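The abstract does not show the grammar's API, so the following is only a minimal sketch of what a call-time-enforced evaluate/assess boundary could look like, not the paper's implementation. It assumes one plausible reading: validation data may be scored repeatedly, while the held-out test split may be scored exactly once, with the check made when the call happens. The names GatedWorkflow, AssessmentGateError, evaluate, assess, and threshold_model are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Sequence, Tuple

Row = Tuple[float, int]          # (feature, label); a toy data layout
Model = Callable[[float], int]   # any callable mapping a feature to a label


class AssessmentGateError(RuntimeError):
    """Raised when the held-out test split is scored more than once."""


def accuracy(model: Model, rows: Sequence[Row]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for x, y in rows if model(x) == y)
    return hits / len(rows)


@dataclass
class GatedWorkflow:
    """Hypothetical wrapper; an assumption, not the paper's API."""
    validation: Sequence[Row]
    test: Sequence[Row]
    _assessed: bool = field(default=False, init=False)

    def evaluate(self, model: Model) -> float:
        # Validation scoring is repeatable: it is meant for model selection.
        return accuracy(model, self.validation)

    def assess(self, model: Model) -> float:
        # The gate is checked at call time: a second call fails loudly
        # instead of silently reusing the test split.
        if self._assessed:
            raise AssessmentGateError("test split has already been assessed")
        self._assessed = True
        return accuracy(model, self.test)


def threshold_model(x: float) -> int:
    """Toy classifier used only for the demonstration."""
    return int(x > 0.5)


if __name__ == "__main__":
    wf = GatedWorkflow(
        validation=[(0.2, 0), (0.8, 1)],
        test=[(0.1, 0), (0.9, 1), (0.6, 0)],
    )
    print(wf.evaluate(threshold_model))  # may be called as often as needed
    print(wf.assess(threshold_model))    # allowed once
    try:
        wf.assess(threshold_model)       # second call trips the gate
    except AssessmentGateError as err:
        print("blocked:", err)
```

In this sketch the gate is a runtime flag on one object, so a new GatedWorkflow built over the same test data would bypass it. A grammar that makes leakage "structurally unrepresentable", as the abstract claims, would presumably need a stronger guarantee than this.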

Originally published on April 07, 2026. Curated by AI News.

