Audio generation

21 audio models for text-to-speech, music, sound effects, voice design, dubbing, and speech-to-speech.

Quick start

```bash
gen-ai generate -m eleven-v3 -p "Welcome to Picsart AI Playground."
```

```json
{ "name": "picsart_generate",
  "arguments": { "model": "eleven-v3", "prompt": "Welcome to Picsart AI Playground." } }
```

Input types

| Type | Meaning | Models |
| --- | --- | --- |
| tts | text → speech | Eleven v3, Multilingual v2, Voice Design v3, Gemini 2.5 Flash/Pro TTS, Grok TTS |
| music | music generation | MiniMax Music v2, Lyria 3 Clip/Pro, Kling T2A |
| sfx | sound effects | ElevenLabs SFX v2 |
| sts | speech → speech / transform | Eleven STS v2, Multilingual STS, Audio Isolation, Dubbing |

Providers

| Provider | Models | Highlights |
| --- | --- | --- |
| ElevenLabs | 10 models, including v3, Multilingual v2, SFX, STS, Dubbing, Voice Design, Audio Isolation, Voice Previews | The most complete voice suite |
| Google | Gemini 2.5 Flash/Pro TTS, Lyria 3 Clip/Pro | High-quality TTS + music |
| Kling | Kling T2A, V2A | Text-to-audio & video-to-audio scoring |
| MiniMax | MiniMax Music v2 | Full original tracks from a prompt |
| Grok | Grok TTS | Fast natural speech |

Common audio parameters

| Param | CLI flag | Notes |
| --- | --- | --- |
| prompt | -p | Text to speak, or the music/SFX description |
| voiceId | --voice | TTS voice selection (model-dependent set) |
| language | --language | Language / locale (model-dependent) |

Voice catalogs differ per model. To list the voiceId values a model accepts, run gen-ai models info <id> --json or call picsart_model_params.
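
As a rough sketch of how the parameters above combine on the command line, the shell helpers below assemble a model lookup and a TTS call. The helper names (run_or_print, model_info, tts), the DRY_RUN switch, and the sample values VOICE_ID and en are illustrative assumptions; only the gen-ai subcommands and flags come from this page. By default the script prints each command one argument per line instead of running it.

```bash
# Sketch only: helper names, DRY_RUN, and the sample voice/language values
# are hypothetical. The gen-ai subcommands and flags are the ones documented above.

# Print the command one argument per line, or run it when DRY_RUN=0.
run_or_print() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$@"
  else
    "$@"
  fi
}

# Look up a model's parameters, which include its accepted voiceId values.
model_info() {
  run_or_print gen-ai models info "$1" --json
}

# Text-to-speech call; voice and language are optional and model-dependent.
tts() {
  model=$1; prompt=$2; voice=${3:-}; language=${4:-}
  set -- gen-ai generate -m "$model" -p "$prompt"
  if [ -n "$voice" ]; then set -- "$@" --voice "$voice"; fi
  if [ -n "$language" ]; then set -- "$@" --language "$language"; fi
  run_or_print "$@"
}

model_info eleven-v3
# VOICE_ID and "en" are placeholders, not real catalog entries.
tts eleven-v3 "Welcome to Picsart AI Playground." "VOICE_ID" "en"
```

Set DRY_RUN=0 to execute the commands, and replace VOICE_ID with a value reported by gen-ai models info for the chosen model.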

Pairs with video

Generate a voiceover or score here, then drop it onto a video you made in the same app. Alternatively, use a v2a (video-to-audio) model such as Kling V2A to score an existing clip.

Built on @picsart/ai-sdk · gen-ai CLI · Picsart MCP · Skills