Three separate teams shipped models this week that cut token usage by nearly half, all converging on the same insight: the best inference is often no inference at all.