Chatterbox

Local zero-shot voice cloning from .wav files.

Device

Auto-detects the best available device.
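
As an illustration of what auto-detection might look like, here is a minimal sketch that assumes a PyTorch backend and a preference order of CUDA, then Apple MPS, then CPU. The function name pick_device and the preference order are assumptions made for this example; the actual detection logic may differ.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring an accelerator when present.

    Assumption: the backend is PyTorch. If torch is not installed,
    the sketch simply falls back to "cpu".
    """
    try:
        import torch  # optional dependency in this sketch
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # The MPS backend only exists in newer PyTorch builds, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```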

Model

Default Chatterbox model optimized for speed

Chunk size

Split text into chunks of this size. Smaller values increase responsiveness at the cost of context lost between chunks (for example, appropriate inflection). Set to 0 to disable chunking.
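
To make the trade-off concrete, here is a minimal chunking sketch. It assumes the chunk size is measured in characters and that chunks are split at sentence boundaries where possible; neither detail is confirmed here, and the function name chunk_text is hypothetical.

```python
import re


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Pack whole sentences into chunks of at most chunk_size characters.

    chunk_size <= 0 disables chunking and returns the text as one piece.
    Assumption: size is counted in characters.
    """
    text = text.strip()
    if not text:
        return []
    if chunk_size <= 0:
        return [text]

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut hard.
        while len(sentence) > chunk_size:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:chunk_size])
            sentence = sentence[chunk_size:]
        if not sentence:
            continue

        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

With a small chunk size, each sentence is synthesized on its own and audio can start sooner, but the model no longer sees the neighboring sentences when choosing inflection.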

Adding Chatterbox Voices

Voice Requirements

Chatterbox voices require:

Reference audio file (.wav format, 5-15 seconds optimal)
Clear speech with minimal background noise
Single speaker throughout the sample
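
Before uploading, the format and length requirements can be checked with a short script. The sketch below uses only Python's standard wave module, which reads uncompressed PCM .wav files; the function names are hypothetical, and it cannot judge background noise or the number of speakers, so those still need a listen.

```python
import wave
from pathlib import Path

OPTIMAL_MIN_SECONDS = 5.0
OPTIMAL_MAX_SECONDS = 15.0


def wav_duration_seconds(path) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_reference_audio(path) -> list[str]:
    """Return a list of problems found; an empty list means the file looks fine."""
    path = Path(path)
    if path.suffix.lower() != ".wav":
        return [f"{path.name}: expected a .wav file"]

    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, EOFError, OSError) as exc:
        return [f"{path.name}: could not be read as WAV ({exc})"]

    if not OPTIMAL_MIN_SECONDS <= duration <= OPTIMAL_MAX_SECONDS:
        return [
            f"{path.name}: {duration:.1f} s is outside the optimal "
            f"{OPTIMAL_MIN_SECONDS:.0f}-{OPTIMAL_MAX_SECONDS:.0f} s range"
        ]
    return []
```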

Creating a Voice

Open the Voice Library
Click New
Select "Chatterbox" as the provider
Configure the voice:

Label: Descriptive name (e.g., "Marcus - Deep Male")

Voice ID / Upload File: Upload a .wav file containing the voice sample. The uploaded reference audio also serves as the voice ID.

Speed: Adjust playback speed (0.5 to 2.0, default 1.0)

Tags: Add descriptive tags for organization
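
The fields above can be pictured as a small record. The following sketch is purely illustrative: the class and field names are hypothetical and do not reflect how the application stores voices internally. It only encodes the constraints stated above (a .wav reference file that doubles as the voice ID, and a speed between 0.5 and 2.0 with a default of 1.0).

```python
from dataclasses import dataclass, field

MIN_SPEED = 0.5
MAX_SPEED = 2.0


@dataclass
class VoiceConfig:
    """Hypothetical container for the voice settings described above."""

    label: str
    reference_wav: str  # uploaded sample; also used as the voice ID
    speed: float = 1.0
    tags: list[str] = field(default_factory=list)
    provider: str = "Chatterbox"

    def __post_init__(self) -> None:
        if not self.reference_wav.lower().endswith(".wav"):
            raise ValueError("reference audio must be a .wav file")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")

    @property
    def voice_id(self) -> str:
        # The uploaded reference audio is the voice ID.
        return self.reference_wav


marcus = VoiceConfig(
    label="Marcus - Deep Male",
    reference_wav="marcus.wav",
    tags=["male", "deep"],
)
```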

Extra voice parameters

Some optional parameters can be set here on a per-voice basis.

Exaggeration Level

Exaggeration level (neutral = 0.5; extreme values can be unstable). Higher exaggeration tends to speed up speech; reducing cfg helps compensate by producing slower, more deliberate pacing.

CFG / Pace

If the reference speaker has a fast speaking style, lowering cfg to around 0.3 can improve pacing.
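
The interaction between the two parameters can be summarized in a small helper. This is a sketch only: the function name and dictionary keys are hypothetical, and reusing 0.3 as the compensating cfg value for high exaggeration is an assumption (0.3 is the value suggested above for fast speakers). When no adjustment is needed, cfg is left out so that the default applies.

```python
NEUTRAL_EXAGGERATION = 0.5
SLOWER_PACE_CFG = 0.3  # value suggested for fast reference speakers


def extra_voice_parameters(
    exaggeration: float = NEUTRAL_EXAGGERATION,
    fast_speaker: bool = False,
) -> dict[str, float]:
    """Build per-voice parameters, lowering cfg when pacing is likely too fast."""
    params = {"exaggeration": exaggeration}

    # Fast reference speakers and above-neutral exaggeration both tend to
    # speed up speech; a lower cfg slows the pacing back down.
    # Assumption: the same cfg value is used for both cases.
    if fast_speaker or exaggeration > NEUTRAL_EXAGGERATION:
        params["cfg"] = SLOWER_PACE_CFG

    return params
```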