[2603.04045] Inference-Time Toxicity Mitigation in Protein Language Models
Computer Science > Machine Learning
arXiv:2603.04045 (cs) [Submitted on 4 Mar 2026]

Title: Inference-Time Toxicity Mitigation in Protein Language Models
Authors: Manuel Fernández Burda, Santiago Aranguri, Iván Arcuschin Moreno, Enzo Ferrante

Abstract: Protein language models (PLMs) are becoming practical tools for de novo protein design, yet their dual-use potential raises safety concerns. We show that domain adaptation to specific taxonomic groups can elicit toxic protein generation, even when toxicity is not the training objective. To address this, we adapt Logit Diff Amplification (LDA) as an inference-time control mechanism for PLMs. LDA modifies token probabilities by amplifying the logit difference between a baseline model and a toxicity-finetuned model, requiring no retraining. Across four taxonomic groups, LDA consistently reduces the predicted toxicity rate (measured via ToxDL2) below the taxon-finetuned baseline while preserving biological plausibility. We evaluate quality using Fréchet ESM Distance and predicted foldability (pLDDT), finding that LDA maintains distributional similarity to natural proteins and structural viability, unlike activation-based steering methods that tend to degrade sequence properties. Our results demonstrate that LDA provides a practical safety knob for protein generators t...
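The abstract describes LDA as amplifying the logit difference between a baseline model and a toxicity-finetuned model at inference time. Below is a minimal sketch of one plausible form of that combination, assuming a steering strength `alpha` and a sign convention that pushes probability mass away from tokens the toxic model favors; the paper's exact formulation and hyperparameters may differ.

```python
import numpy as np

def lda_adjust(base_logits, toxic_logits, alpha=1.0):
    """Sketch of Logit Diff Amplification for toxicity mitigation.

    Amplifies the (base - toxic) logit difference so sampling is steered
    away from tokens the toxicity-finetuned model prefers. `alpha` is a
    hypothetical steering-strength parameter, not from the paper.
    """
    return base_logits + alpha * (base_logits - toxic_logits)

def softmax(x):
    # Numerically stable softmax over a 1-D logit vector.
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

# Toy vocabulary of 4 tokens; token 2 is strongly favored by the toxic model.
base = np.array([1.0, 0.5, 0.2, 0.1])
toxic = np.array([0.2, 0.3, 2.0, 0.1])

p_base = softmax(base)
p_lda = softmax(lda_adjust(base, toxic, alpha=1.0))

# The adjusted distribution suppresses the toxic model's preferred token.
print(p_lda[2] < p_base[2])  # → True
```

Because the adjustment is a per-step transform on next-token logits, it slots into any autoregressive sampling loop without retraining, which is the property the abstract emphasizes.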