[P] On-device speech toolkit for Apple Silicon — ASR, TTS, diarization, speech-to-speech, all in native Swift
Open-source Swift package running 11 speech models on Apple Silicon via MLX (GPU) and CoreML (Neural Engine). Fully local inference, no cloud dependency.

Models implemented:

- ASR: Qwen3-ASR 0.6B/1.7B (4-bit), Parakeet TDT (CoreML INT4) - RTF ~0.06 on M2 Max
- TTS: Qwen3-TTS 0.6B (4-bit), CosyVoice3 0.5B (4-bit) - streaming, ~120 ms first chunk
- Speech-to-speech: PersonaPlex 7B (4-bit) - full-duplex, RTF ~0.87
- VAD: Silero v5, Pyannote segmentation-3.0 - streaming + overlap detection
- Diarization: ...
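The post names the models but not the package's API, so as a rough sketch of what a one-shot on-device transcription call could look like in a Swift package of this kind: every identifier below (the SpeechToolkit module, ASRModel, the model enum case, transcribe) is a hypothetical stand-in, not the actual API. Only the configuration (4-bit Qwen3-ASR 0.6B via MLX, fully local) comes from the model list above.

```swift
// Hypothetical usage sketch. All names here are illustrative stand-ins;
// the post does not document the package's real API.
import Foundation
import SpeechToolkit  // hypothetical module name

@main
struct TranscribeDemo {
    static func main() async throws {
        // Load the 4-bit Qwen3-ASR 0.6B weights; per the post, MLX models
        // run on the GPU and inference stays entirely on-device.
        let asr = try await ASRModel.load(.qwen3ASR_0_6B_4bit)

        // One-shot transcription of a local file; no cloud dependency.
        let audio = URL(fileURLWithPath: "meeting.wav")
        let result = try await asr.transcribe(audio)
        print(result.text)
    }
}
```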