ResembleAI Chatterbox
Deploy ResembleAI Chatterbox behind a dedicated API endpoint on Koyeb GPU for high-performance, low-latency, and efficient inference.
Deploy the Chatterbox text-to-speech (TTS) model on Koyeb’s high-performance cloud infrastructure.
With one click, get a dedicated GPU-powered inference endpoint ready to handle requests with built-in autoscaling and scale-to-zero.
Get up to $200 in credit to get started!
Overview of Chatterbox
Chatterbox is a production-grade, open-source text-to-speech (TTS) model that has been evaluated alongside top proprietary systems like ElevenLabs, consistently earning higher preference in direct comparisons.
The default GPU for running this model is the Nvidia L40S instance type. You are free to adjust the GPU instance type to fit your workload requirements.
Quickstart
After you deploy the Chatterbox model, copy the Koyeb App public URL (similar to https://<YOUR_DOMAIN_PREFIX>.koyeb.app)
and create a simple Python file with the following content to start interacting with the model.
import base64

import httpx

KOYEB_URL = "https://<YOUR_DOMAIN_PREFIX>.koyeb.app"


def wav_to_base64(file_path):
    """
    Convert a WAV file to a Base64 string.

    :param file_path: Path to the WAV file
    :return: Base64 encoded audio string
    """
    with open(file_path, "rb") as wav_file:
        binary_data = wav_file.read()
    base64_data = base64.b64encode(binary_data)
    return base64_data.decode("utf-8")


def b64_to_wav(base64_string, output_file_path):
    """
    Convert a Base64 string to a WAV file on disk.

    :param base64_string: Base64 encoded audio string
    :param output_file_path: Path where the WAV file is written
    """
    # Remove the data URI header if present
    if base64_string.startswith("data:audio"):
        base64_string = base64_string.split(",")[1]

    # Decode the Base64 string and write the binary audio to disk
    binary_data = base64.b64decode(base64_string)
    with open(output_file_path, "wb") as wav_file:
        wav_file.write(binary_data)


# Encode the reference voice sample used as the audio prompt
voice_base64 = wav_to_base64("./voice.wav")

payload = {
    "audio_prompt_b64": voice_base64,
    "cfgw_input": 0.2,
    "exaggeration_input": 0.75,
    "temperature_input": 0.8,
    "text_input": "Ezreal and Jinx teamed up with Ahri, Yasuo, and Teemo to take down the enemy's Nexus in an epic late-game pentakill.",
}

# Call the model prediction endpoint
res = httpx.post(
    f"{KOYEB_URL}/predict",
    json=payload,
    timeout=60.0,
)
res.raise_for_status()

# Get the output audio from the JSON response
output = res.json().get("audio")

# Convert the base64 model output to a WAV file and save it to disk
b64_to_wav(output, "output.wav")
The snippet above showcases how to interact with the Chatterbox model to generate audio from a text prompt and save it to disk.
Take care to replace the KOYEB_URL
value in the snippet with your Koyeb App public URL.
Executing the Python script generates the audio and saves it to disk:
python main.py
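To sanity-check the result, you can inspect output.wav with Python's standard wave module. This is a quick verification sketch, assuming the model returns standard PCM WAV data:

```python
import wave
from pathlib import Path


def describe_wav(file_path):
    """Print channel count, sample rate, and duration of a WAV file."""
    with wave.open(file_path, "rb") as wav_file:
        rate = wav_file.getframerate()
        duration = wav_file.getnframes() / float(rate)
        print(f"{file_path}: {wav_file.getnchannels()} channel(s), "
              f"{rate} Hz, {duration:.2f} s")
        return duration


if Path("output.wav").exists():
    describe_wav("output.wav")
```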