Open Source · Apple Silicon · No Server

MusicGen + AudioGen
on Apple Silicon

The first package that runs both text-to-music and text-to-sound-effects locally on M-series Macs — full GPU acceleration via MLX, no CUDA, no Docker.

pip install mlx-audiocraft  ·  audiogen-mlx "keyboard typing, office" -d 5  ·  musicgen-mlx "cinematic piano, 120 BPM, no vocals" -d 30
Architecture

How it works

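Your text goes through a three-stage pipeline. The heavy stages, the transformer language model and the EnCodec decoder, run on the Apple GPU via MLX. Because the CPU and GPU on Apple Silicon share unified memory, intermediate tensors never have to be copied to a separate graphics card.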

Text Prompt: your description
→ T5 Encoder: CPU · PyTorch · ~3s cached
→ Transformer LM: Apple GPU · MLX · auto-regressive
→ EnCodec: Apple GPU · MLX · tokens → audio
→ WAV File: 16kHz / 32kHz
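The cost of the auto-regressive stage follows directly from clip length: every token frame is one LM step. A minimal sketch of that arithmetic, assuming the EnCodec tokenizers emit 50 token frames per second across 4 codebooks and that the LM uses a "delay" pattern offsetting codebook k by k steps (the settings described for Meta's MusicGen; check the checkpoint's own config). `CodecConfig`, `lm_steps` and `decoded_samples` are hypothetical names, not part of mlx-audiocraft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecConfig:
    """Assumed EnCodec settings; verify against the model's config."""
    sample_rate: int   # output audio rate in Hz
    frame_rate: int    # token frames per second (assumed 50 Hz)
    n_codebooks: int   # parallel token streams per frame (assumed 4)


def lm_steps(duration_s: float, cfg: CodecConfig) -> int:
    """Auto-regressive LM steps for `duration_s` seconds of audio, assuming
    a delay pattern that staggers codebook k by k steps."""
    frames = int(round(duration_s * cfg.frame_rate))
    return frames + cfg.n_codebooks - 1


def decoded_samples(duration_s: float, cfg: CodecConfig) -> int:
    """Waveform samples the decoder emits for `duration_s` seconds."""
    frames = int(round(duration_s * cfg.frame_rate))
    hop = cfg.sample_rate // cfg.frame_rate  # samples produced per token frame
    return frames * hop


MUSIC_32K = CodecConfig(sample_rate=32_000, frame_rate=50, n_codebooks=4)
SFX_16K = CodecConfig(sample_rate=16_000, frame_rate=50, n_codebooks=4)

print("musicgen 15s:", lm_steps(15, MUSIC_32K), decoded_samples(15, MUSIC_32K))
# → musicgen 15s: 753 480000
print("audiogen 5s:", lm_steps(5, SFX_16K), decoded_samples(5, SFX_16K))
# → audiogen 5s: 253 80000
```

Under these assumptions, doubling the clip length roughly doubles the number of sequential LM steps, which is why wall time scales with duration rather than staying fixed.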

Package | MusicGen | AudioGen (SFX) | MLX (Apple GPU) | pip install
audiocraft (Meta) | Broken on macOS
musicgen-mlx
mlx-audiocraft (this package)
Sound Effects · AudioGen

Text → Sound Effects

All clips were generated locally on an M4 Mac with audiogen-mlx. Each is 5 seconds long at 16 kHz and took about 29 s to render.

⌨️
Keyboard Typing
"mechanical keyboard typing, quiet office ambience"
audiogen-medium · 5s · 16kHz
🌧️
Rain & Thunder
"rain falling on a metal roof, distant rolling thunder"
audiogen-medium · 5s · 16kHz
👏
Crowd Applause
"crowd applause in a conference room"
audiogen-medium · 5s · 16kHz
🔔
Notification Chime
"notification chime, clean bright tone"
audiogen-medium · 5s · 16kHz
Coffee Machine
"coffee machine brewing, kitchen background ambience"
audiogen-medium · 5s · 16kHz
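Each clip above ends up as a WAV file at the model's sample rate. As a rough illustration of that final step, the sketch below uses only Python's standard-library `wave` module to turn float samples in [-1, 1] into a mono 16-bit PCM file. `write_wav_16bit` is a hypothetical helper, not mlx-audiocraft's own writer, and the 16-bit format is an assumption of this sketch; the test tone stands in for model output.

```python
import io
import math
import struct
import wave


def write_wav_16bit(dest, samples, sample_rate):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV.

    `dest` may be a filename or a writable, seekable binary file object.
    """
    clipped = (max(-1.0, min(1.0, s)) for s in samples)  # guard against overshoot
    pcm = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    with wave.open(dest, "wb") as wf:
        wf.setnchannels(1)       # mono
        wf.setsampwidth(2)       # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)


# Stand-in signal: 0.5 s of an 880 Hz tone at 16 kHz (not model output).
sr = 16_000
tone = [0.3 * math.sin(2 * math.pi * 880 * n / sr) for n in range(sr // 2)]

buf = io.BytesIO()
write_wav_16bit(buf, tone, sr)
```

At 16 kHz and 2 bytes per sample, a 5-second clip in this format holds 80,000 samples, about 160 kB of audio data.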
Music · MusicGen

Text → Music

15-second clips generated with musicgen-small (300M params, 32 kHz). No vocals, no samples — entirely synthesised from the prompt.

🎹
Cinematic Tech Promo
musicgen-small · 15s · 32kHz
Prompt
"upbeat cinematic tech promo, clean piano with electronic pads, building momentum, 120 BPM, no vocals"
🎸
Calm Lo-Fi Beat
musicgen-small · 15s · 32kHz
Prompt
"calm lo-fi beat, soft warm piano, subtle vinyl crackle, mellow bass, 80 BPM, no vocals"
Usage

Get started in 3 lines

Works on any Apple Silicon Mac, M1 through M4. Models auto-download from Hugging Face and are cached locally.

# Install
pip install mlx-audiocraft

# CLI: sound effects
audiogen-mlx "keyboard typing, office ambience" -d 5 -o sfx.wav

# CLI: music
musicgen-mlx "upbeat cinematic, piano, 120 BPM, no vocals" -d 30 -o music.wav

# Python API
from mlx_audiocraft import AudioGen, MusicGen

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)  # seconds
wav = model.generate(["rain on a window, distant thunder"])
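To render several prompts in one go, the CLI can be scripted. A minimal sketch, assuming `-d` takes a duration in seconds and `-o` an output path, exactly as in the examples above; no other flags are used. `build_command`, `render_all` and `JOBS` are hypothetical names for illustration, not part of the package.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(tool, prompt, duration_s, out_path):
    """Argument list for one CLI call, using only the -d and -o flags
    shown in the usage example (assumed: seconds and output path)."""
    return [tool, prompt, "-d", str(duration_s), "-o", str(out_path)]


def render_all(jobs, out_dir="renders"):
    """Run each (tool, prompt, duration, filename) job in sequence."""
    Path(out_dir).mkdir(exist_ok=True)
    for tool, prompt, duration_s, filename in jobs:
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not on PATH; pip install mlx-audiocraft")
        cmd = build_command(tool, prompt, duration_s, Path(out_dir) / filename)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


JOBS = [
    ("audiogen-mlx", "rain falling on a metal roof, distant rolling thunder", 5, "rain.wav"),
    ("audiogen-mlx", "notification chime, clean bright tone", 5, "chime.wav"),
    ("musicgen-mlx",
     "calm lo-fi beat, soft warm piano, subtle vinyl crackle, mellow bass, 80 BPM, no vocals",
     15, "lofi.wav"),
]
```

Calling `render_all(JOBS)` renders the three files into `renders/`. Passing an argument list rather than a shell string keeps the commas and spaces in each prompt intact without any quoting.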
Performance

Benchmarks

Measured on M4 Max, 64 GB unified memory. Realtime ratio = audio duration ÷ generation time.

0.68× musicgen-small realtime ratio
100% offline, no API calls
~3s model load (after cache)
0 GB dedicated VRAM needed (models run in unified memory)
Model | Duration | Wall time | Realtime ratio
audiogen-medium | 5s | ~29s | 0.17×
musicgen-small | 10s | ~15s | 0.68×
musicgen-medium | 10s | ~17s | 0.60×
musicgen-large | 10s | ~35s | 0.29×
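The realtime ratio is simple to recompute from the table. Because the wall times are rounded ("~"), recomputing gives 0.17×, 0.67×, 0.59× and 0.29×, within rounding of the published figures. The estimate function is an assumption: it treats the ratio as constant, which may not hold for much longer clips. All names here are illustrative.

```python
# (model, audio seconds, approximate wall-clock seconds) from the table above
BENCHMARKS = [
    ("audiogen-medium", 5, 29),
    ("musicgen-small", 10, 15),
    ("musicgen-medium", 10, 17),
    ("musicgen-large", 10, 35),
]


def realtime_ratio(audio_s, wall_s):
    """Audio duration divided by generation time; 1.0 means real time."""
    return audio_s / wall_s


def estimate_wall_time(audio_s, ratio):
    """Rough wall time for a new clip, assuming the ratio stays constant."""
    return audio_s / ratio


for model, audio_s, wall_s in BENCHMARKS:
    print(f"{model:16s} {realtime_ratio(audio_s, wall_s):.2f}x")
```

For example, under that constant-ratio assumption, a 30 s musicgen-small clip at 0.68× would take roughly `estimate_wall_time(30, 0.68)`, about 44 s.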