Open Source · Apple Silicon · No Server

MusicGen + AudioGen
on Apple Silicon

The first package that runs both text-to-music and text-to-sound-effects locally on M-series Macs — full GPU acceleration via MLX, no CUDA, no Docker.

View on GitHub pip install Read the blog

pip install mlx-audiocraft  ·  audiogen-mlx "keyboard typing, office" -d 5  ·  musicgen-mlx "cinematic piano, 120 BPM, no vocals" -d 30

Architecture

How it works

Your text goes through a three-stage pipeline. The heavy transformer and decoder run on Apple GPU via MLX — zero transfer overhead thanks to unified memory.

📝

Text Prompt

Your description

→

🔤

T5 Encoder

CPU · PyTorch
~3s cached

→

⚡

Transformer LM

Apple GPU · MLX
auto-regressive

→

🎵

EnCodec

Apple GPU · MLX
tokens → audio

→

💾

WAV File

16kHz / 32kHz

Package	MusicGen	AudioGen (SFX)	MLX (Apple GPU)	pip install
`audiocraft` (Meta)	✓	✓	✗	Broken on macOS
`musicgen-mlx`	✓	✗	✓	✓
`mlx-audiocraft` ← this	✓	✓	✓	✓

Sound Effects · AudioGen

Text → Sound Effects

All clips generated locally on M4 Mac via audiogen-mlx. Each is 5 seconds, 16 kHz, rendered in ~29s.

⌨️

Keyboard Typing

"mechanical keyboard typing, quiet office ambience"

audiogen-medium5s · 16kHz

🌧️

Rain & Thunder

"rain falling on a metal roof, distant rolling thunder"

audiogen-medium5s · 16kHz

👏

Crowd Applause

"crowd applause in a conference room"

audiogen-medium5s · 16kHz

🔔

Notification Chime

"notification chime, clean bright tone"

audiogen-medium5s · 16kHz

☕

Coffee Machine

"coffee machine brewing, kitchen background ambience"

audiogen-medium5s · 16kHz

Music · MusicGen

Text → Music

15-second clips generated with musicgen-small (300M params, 32 kHz). No vocals, no samples — entirely synthesised from the prompt.

🎹

Cinematic Tech Promo

musicgen-small · 15s · 32kHz

Prompt

"upbeat cinematic tech promo, clean piano with electronic pads, building momentum, 120 BPM, no vocals"

🎸

Calm Lo-Fi Beat

musicgen-small · 15s · 32kHz

Prompt

"calm lo-fi beat, soft warm piano, subtle vinyl crackle, mellow bass, 80 BPM, no vocals"

Usage

Get started in 3 lines

Works anywhere on Apple Silicon — M1 through M4. Models auto-download from HuggingFace and are cached locally.

# Install
pip install mlx-audiocraft

# CLI — sound effects
audiogen-mlx "keyboard typing, office ambience" -d 5 -o sfx.wav

# CLI — music
musicgen-mlx "upbeat cinematic, piano, 120 BPM, no vocals" -d 30 -o music.wav

# Python API
from mlx_audiocraft import AudioGen, MusicGen

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)
wav = model.generate(["rain on a window, distant thunder"])
  

Performance

Benchmarks

Measured on M4 Max, 64 GB unified memory. Realtime ratio = audio duration ÷ generation time.

0.68×

musicgen-small realtime

100%

offline — no API calls

~3s

model load (after cache)

0 GB

VRAM needed (unified mem)

Model	Duration	Wall time	Realtime ratio
`audiogen-medium`	5s	~29s	0.17×
`musicgen-small`	10s	~15s	0.68×
`musicgen-medium`	10s	~17s	0.60×
`musicgen-large`	10s	~35s	0.29×

MusicGen + AudioGenon Apple Silicon

How it works

Text → Sound Effects

Text → Music

Get started in 3 lines

Benchmarks

MusicGen + AudioGen
on Apple Silicon