Jailbreaks as social engineering: 5 case studies suggest LLMs inherit human psychological vulnerabilities from training data [D]
About this article
Writeup documenting 5 psychological manipulation experiments on LLMs (GPT-4, GPT-4o, Claude 3.5 Sonnet) from 2023-2024. Each case applies a specific human social-engineering vector (empathetic guilt, peer/social pressure, competitive triangulation, identity destabilization via epistemic argument, simulated duress) and produces alignment failures consistent with that vector. Central claim: contrary to the popular frame, these jailbreaks aren't mathematical exploits. They are, rather, inherited...
You've been blocked by network security.To continue, log in to your Reddit account or use your developer tokenIf you think you've been blocked by mistake, file a ticket below and we'll look into it.Log in File a ticket