[P] On-device Qwen3-TTS (1.7B/0.6B) inference on iOS and macOS via MLX-Swift — voice cloning, voice design, and streaming TTS with no cloud
Hey r/MachineLearning. I'm a solo dev working on on-device TTS using MLX-Swift with Qwen3-TTS: the 1.7B model on macOS, the 0.6B model on iOS, quantized to 5-bit to fit within mobile memory constraints. No cloud, everything runs locally. The app is called Speaklone.

Short demo video: https://www.youtube.com/watch?v=05gne9oPaaY

The most interesting technical challenge has been MLX's lazy evaluation on memory-constrained devices. Computation graphs silently accumulate memory through strong references between...
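To give a feel for the kind of mitigation involved, here's a minimal sketch of how you can keep a lazy graph from ballooning during autoregressive decoding in MLX-Swift: force materialization with `eval()` every few steps and cap the Metal buffer cache. This isn't Speaklone's actual code; `decodeStep` is a hypothetical placeholder for the model's per-token forward pass, and the step count and limits are illustrative.

```swift
import MLX
import MLXRandom

// Hypothetical stand-in for the model's per-token forward pass.
// In the real model this would be a full transformer step; here it's a
// placeholder matmul that just keeps extending the lazy graph.
func decodeStep(_ hidden: MLXArray, _ weights: MLXArray) -> MLXArray {
    matmul(hidden, weights)
}

func generate(steps: Int) {
    // Cap the Metal buffer cache so freed blocks are returned to the system
    // instead of being retained for reuse -- matters a lot under iOS memory budgets.
    GPU.set(cacheLimit: 64 * 1024 * 1024)

    var hidden = MLXRandom.normal([1, 1024])
    let weights = MLXRandom.normal([1024, 1024])

    for step in 0..<steps {
        hidden = decodeStep(hidden, weights)

        // Force materialization periodically; otherwise the lazy graph keeps
        // strong references to every intermediate array until the final eval.
        if step % 8 == 0 {
            eval(hidden)
        }
    }
    eval(hidden)
}
```

The trade-off is extra synchronization points versus bounded peak memory; on a phone the latter wins, since a single unbounded graph can push the process past the jetsam limit before anything is ever evaluated.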