mlx-audiocraft
Open Source · Apple Silicon · No Server
MusicGen + AudioGen on Apple Silicon
The first package that runs both text-to-music and text-to-sound-effects locally on M-series Macs — full GPU acceleration via MLX, no CUDA, no Docker.
pip install mlx-audiocraft
audiogen-mlx "keyboard typing, office" -d 5
musicgen-mlx "cinematic piano, 120 BPM, no vocals" -d 30
Architecture
How it works
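Your text goes through a three-stage pipeline: a T5 text encoder, an auto-regressive transformer language model, and an EnCodec decoder. The transformer and decoder, which dominate the compute, run on the Apple GPU via MLX. Because Apple Silicon uses unified memory, there is no separate VRAM to copy data into.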
📝 Text Prompt: your description
→ 🔤 T5 Encoder: CPU · PyTorch, ~3s cached
→ ⚡ Transformer LM: Apple GPU · MLX, auto-regressive
→ 🎵 EnCodec: Apple GPU · MLX, tokens → audio
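To get a feel for why generation time grows with clip length, the sketch below counts how many token frames the transformer stage has to produce for a given duration. The frame rate (50 Hz) and codebook count (4) are assumptions taken from Meta's upstream MusicGen and AudioGen model cards, not values confirmed for this package. `CodecSpec` and `decoding_budget` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecSpec:
    sample_rate: int  # output audio sample rate in Hz
    frame_rate: int   # token frames per second of audio (assumed)
    codebooks: int    # parallel token streams per frame (assumed)


# Assumed values from the upstream model cards, not read from mlx-audiocraft.
MUSICGEN = CodecSpec(sample_rate=32_000, frame_rate=50, codebooks=4)
AUDIOGEN = CodecSpec(sample_rate=16_000, frame_rate=50, codebooks=4)


def decoding_budget(spec: CodecSpec, duration_s: float) -> dict:
    """Rough amount of work for one clip under the assumed codec settings."""
    frames = int(duration_s * spec.frame_rate)
    return {
        "lm_steps": frames,                    # at least one LM step per frame
        "tokens": frames * spec.codebooks,     # tokens handed to the decoder
        "samples": int(duration_s * spec.sample_rate),  # decoded audio samples
    }


if __name__ == "__main__":
    print("audiogen 5s :", decoding_budget(AUDIOGEN, 5))
    print("musicgen 30s:", decoding_budget(MUSICGEN, 30))
```

Under these assumptions a 5-second AudioGen clip needs at least 250 auto-regressive steps and a 30-second MusicGen clip at least 1,500, which is why the transformer stage, not the text encoder, dominates wall time.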
| Package | MusicGen | AudioGen (SFX) | MLX (Apple GPU) | pip install |
| audiocraft (Meta) | ✓ | ✓ | ✗ | Broken on macOS |
| musicgen-mlx | ✓ | ✗ | ✓ | ✓ |
| mlx-audiocraft ← this | ✓ | ✓ | ✓ | ✓ |
Sound Effects · AudioGen
Text → Sound Effects
All clips were generated locally on an M4 Mac with audiogen-mlx. Each is 5 seconds long at 16 kHz and took about 29 s to render.
⌨️ Keyboard Typing: "mechanical keyboard typing, quiet office ambience" (audiogen-medium · 5s · 16 kHz)
🌧️ Rain & Thunder: "rain falling on a metal roof, distant rolling thunder" (audiogen-medium · 5s · 16 kHz)
👏 Crowd Applause: "crowd applause in a conference room" (audiogen-medium · 5s · 16 kHz)
🔔 Notification Chime: "notification chime, clean bright tone" (audiogen-medium · 5s · 16 kHz)
☕ Coffee Machine: "coffee machine brewing, kitchen background ambience" (audiogen-medium · 5s · 16 kHz)
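The five prompts above could be rendered in one pass by driving the CLI from a short script. This is a minimal sketch that uses only the `-d` and `-o` flags shown elsewhere on this page. The output file names and the `build_command` helper are hypothetical, and the script skips generation when `audiogen-mlx` is not on the PATH.

```python
import shlex
import shutil
import subprocess

# Prompts copied from the clips listed above; the keys become output file names.
PROMPTS = {
    "keyboard": "mechanical keyboard typing, quiet office ambience",
    "rain": "rain falling on a metal roof, distant rolling thunder",
    "applause": "crowd applause in a conference room",
    "chime": "notification chime, clean bright tone",
    "coffee": "coffee machine brewing, kitchen background ambience",
}


def build_command(prompt: str, out_path: str, duration_s: int = 5) -> list:
    """Assemble the argument list for one audiogen-mlx invocation."""
    return ["audiogen-mlx", prompt, "-d", str(duration_s), "-o", out_path]


def main() -> None:
    for name, prompt in PROMPTS.items():
        cmd = build_command(prompt, f"{name}.wav")
        if shutil.which(cmd[0]) is None:
            # CLI not installed: show what would have been run instead.
            print("skipping (audiogen-mlx not found):", shlex.join(cmd))
            continue
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Passing the command as a list avoids shell quoting problems with prompts that contain commas or quotes.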
Music · MusicGen
Text → Music
15-second clips generated with musicgen-small (300M params, 32 kHz). No vocals, no samples — entirely synthesised from the prompt.
Prompt: "upbeat cinematic tech promo, clean piano with electronic pads, building momentum, 120 BPM, no vocals"
Prompt: "calm lo-fi beat, soft warm piano, subtle vinyl crackle, mellow bass, 80 BPM, no vocals"
Usage
Get started in 3 lines
Works on any Apple Silicon Mac, M1 through M4. Models download automatically from Hugging Face and are cached locally.
# Install
pip install mlx-audiocraft
# CLI — sound effects
audiogen-mlx "keyboard typing, office ambience" -d 5 -o sfx.wav
# CLI — music
musicgen-mlx "upbeat cinematic, piano, 120 BPM, no vocals" -d 30 -o music.wav
# Python API
from mlx_audiocraft import AudioGen, MusicGen
# Load the pretrained AudioGen checkpoint (downloaded on first use, then cached)
model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)  # clip length in seconds
wav = model.generate(["rain on a window, distant thunder"])
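The Python snippet above leaves the result in `wav` without writing it to disk, and this page does not say what type `generate` returns. Assuming the result can be converted to a flat sequence of mono float samples in the range [-1.0, 1.0], one way to save it with only the standard library is sketched below. `save_wav` is a hypothetical helper, not part of mlx-audiocraft, and the demo uses a synthetic tone as a stand-in for model output.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, sample_rate: int, path: str) -> None:
    """Write mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    frames = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, float(s)))  # guard against overshoot
        frames += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)        # mono
        f.setsampwidth(2)        # 2 bytes = 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(bytes(frames))


if __name__ == "__main__":
    # Stand-in signal: one second of a 440 Hz tone at a 16 kHz sample rate.
    rate = 16_000
    tone = [0.3 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
    out = os.path.join(tempfile.gettempdir(), "demo_tone.wav")
    save_wav(tone, rate, out)
    print("wrote", out)
```

For real output, the sample rate should match the model: 16 kHz for the AudioGen clips and 32 kHz for the MusicGen clips described on this page.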
Performance
Benchmarks
Measured on M4 Max, 64 GB unified memory. Realtime ratio = audio duration ÷ generation time.
0.68× · musicgen-small realtime
100% · offline, no API calls
~3s · model load (after cache)
0 GB · VRAM needed (unified mem)
| Model | Duration | Wall time | Realtime ratio |
| audiogen-medium | 5s | ~29s | 0.17× |
| musicgen-small | 10s | ~15s | 0.68× |
| musicgen-medium | 10s | ~17s | 0.60× |
| musicgen-large | 10s | ~35s | 0.29× |
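The realtime ratio defined above is simple to recompute and to invert when estimating how long a longer clip might take. The sketch below does both. The function names are hypothetical, and the estimate assumes wall time scales linearly with clip length, which the figures on this page do not confirm.

```python
def realtime_ratio(audio_s: float, wall_s: float) -> float:
    """Realtime ratio = audio duration / generation time."""
    if wall_s <= 0:
        raise ValueError("wall_s must be positive")
    return audio_s / wall_s


def estimated_wall_time(audio_s: float, ratio: float) -> float:
    """Invert the ratio: seconds of compute for audio_s seconds of audio.

    Assumes generation time grows linearly with duration (an approximation).
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return audio_s / ratio


if __name__ == "__main__":
    # (audio seconds, approximate wall seconds) as listed in the table above.
    rows = {
        "audiogen-medium": (5, 29),
        "musicgen-small": (10, 15),
        "musicgen-medium": (10, 17),
        "musicgen-large": (10, 35),
    }
    for model, (audio_s, wall_s) in rows.items():
        print(f"{model}: {realtime_ratio(audio_s, wall_s):.2f}x")
    print(f"30s at 0.68x: ~{estimated_wall_time(30, 0.68):.0f}s")
```

Because the wall times in the table are rounded, recomputed ratios can differ from the listed ones in the second decimal place. At the listed 0.68× ratio, a 30-second musicgen-small clip would take roughly 44 seconds under the linear assumption.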