Grok
Modes: video · image · audio · Models: 7
Vendor: xAI · Official API docs: docs.x.ai
Grok Imagine (by xAI) is a multi-mode family on a single API: a fast image-to-video model with start-frame input and native audio, image generation in a standard tier and a higher-fidelity Quality tier, and the Grok TTS voice model. Video covers text-to-video, image-to-video, edit, and extend. Image generation is synchronous, while video generation is asynchronous.
Models
| id | Name | Input type |
|---|---|---|
| grok-imagine-video | Grok | t2v |
| grok-edit-video | Grok Edit Video | v2v |
| grok-extend-video | Grok Extend Video | v2v |
| grok-imagine-image | Grok Imagine | t2i |
| grok-imagine-image-quality | Grok Imagine Quality | t2i |
| grok-tts | Grok TTS | tts |
| grok-imagine-video-1.5 | Grok Imagine 1.5 | i2v |
CLI
```bash
# text-to-video with native audio
gen-ai generate -m grok-imagine-video \
  -p "a neon hovercar drifting through a rainy cyberpunk alley, cinematic" \
  --ar 16:9 -r 720p -d 6

# image-to-video from a start image
gen-ai generate -m grok-imagine-video -p "slow push-in, drifting fog" -i ./still.jpg

# restyle / continue an existing clip
gen-ai generate -m grok-edit-video -p "anime style repaint" --video ./clip.mp4
gen-ai generate -m grok-extend-video -p "the camera keeps flying forward" --video ./clip.mp4

# high-fidelity image generation
gen-ai generate -m grok-imagine-image-quality \
  -p "a porcelain teapot on a marble counter, soft window light" --ar 1:1 -r 2k -n 4

# text-to-speech
gen-ai generate -m grok-tts -p "Welcome to the Picsart AI Playground." --voice eve
```
MCP
```json
{
  "name": "picsart_generate",
  "arguments": {
    "model": "grok-imagine-video",
    "prompt": "a neon hovercar drifting through a rainy cyberpunk alley",
    "aspectRatio": "16:9",
    "resolution": "720p",
    "duration": 6
  }
}
```
```json
{
  "name": "picsart_generate",
  "arguments": {
    "model": "grok-imagine-image-quality",
    "prompt": "a porcelain teapot on a marble counter, soft window light",
    "aspectRatio": "1:1",
    "resolution": "2k",
    "count": 4
  }
}
```
```json
{
  "name": "picsart_generate",
  "arguments": {
    "model": "grok-tts",
    "prompt": "Welcome to the Picsart AI Playground.",
    "voiceId": "eve"
  }
}
```
Parameters
Full parameter surface for every model, sourced from `gen-ai models info <id> --json`. CLI flags show the primary short form; the canonical `--kebab-case` long form always works too.
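
To make the kebab-case remark concrete, here is a minimal sketch that turns a camelCase parameter name from the tables below into a kebab-case long flag. The helper `to_long_flag` is hypothetical, and it is an assumption that the CLI derives its long flags from parameter names in exactly this way. Some tables list flags such as `--video` for `videoUrl`, so confirm the real flag names against the CLI's own output before relying on it.

```python
import re


def to_long_flag(param_name: str) -> str:
    """Convert a camelCase parameter name into a --kebab-case flag.

    Assumption: long flags mirror parameter names. This is not confirmed
    by the CLI documentation and is shown for illustration only.
    """
    # Insert a hyphen before every uppercase letter except a leading one.
    kebab = re.sub(r"(?<!^)(?=[A-Z])", "-", param_name).lower()
    return f"--{kebab}"


for name in ("prompt", "aspectRatio", "imageUrls"):
    print(name, "->", to_long_flag(name))
```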
grok-imagine-video — Grok
Input type: t2v
| Param | CLI flag | Type | Values |
|---|---|---|---|
| prompt | -p | text | required |
| aspectRatio | --ar | enum | 16:9 · 9:16 · 1:1 · 4:3 · 3:4 · 3:2 · 2:3 (default 16:9) |
| resolution | -r | enum | 480p · 720p (default 720p) |
| duration | -d | enum | 3 · 5 · 6 · 8 · 10 · 12 · 15 (default 6) |
| imageUrls | -i | file | image (up to 1) |
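
As a rough illustration of how the table above constrains a request, the sketch below checks the enum values before building an `arguments` object for the `picsart_generate` MCP tool. The helper name `build_video_arguments` is hypothetical and not part of any SDK. The allowed values and defaults are copied from the table, and sending `imageUrls` as a one-element list is an assumption based on the plural parameter name.

```python
from typing import Optional

# Allowed values copied from the grok-imagine-video parameter table.
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4", "3:2", "2:3"}
RESOLUTIONS = {"480p", "720p"}
DURATIONS = {3, 5, 6, 8, 10, 12, 15}


def build_video_arguments(
    prompt: str,
    aspect_ratio: str = "16:9",
    resolution: str = "720p",
    duration: int = 6,
    image_url: Optional[str] = None,
) -> dict:
    """Hypothetical helper: validate inputs and return MCP tool arguments."""
    if not prompt:
        raise ValueError("prompt is required")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspectRatio: {aspect_ratio}")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {resolution}")
    if duration not in DURATIONS:
        raise ValueError(f"unsupported duration: {duration}")

    arguments = {
        "model": "grok-imagine-video",
        "prompt": prompt,
        "aspectRatio": aspect_ratio,
        "resolution": resolution,
        "duration": duration,
    }
    if image_url is not None:
        # Assumption: imageUrls is sent as a list holding at most one image.
        arguments["imageUrls"] = [image_url]
    return arguments
```

Rejecting an unsupported value locally, for example a 7-second duration, avoids submitting an asynchronous video job that the service would refuse.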
grok-edit-video — Grok Edit Video
Input type: v2v
| Param | CLI flag | Type | Values |
|---|---|---|---|
| prompt | -p | text | required |
| videoUrl | --video | file | required video |
grok-extend-video — Grok Extend Video
Input type: v2v
| Param | CLI flag | Type | Values |
|---|---|---|---|
| prompt | -p | text | required |
| duration | -d | enum | 3 · 5 · 6 · 8 · 10 (default 6) |
| videoUrl | --video | file | required video |
grok-imagine-image — Grok Imagine
Input type: t2i
| Param | CLI flag | Type | Values |
|---|---|---|---|
| prompt | -p | text | required |
| aspectRatio | --ar | enum | 1:1 · 16:9 · 9:16 · 4:3 · 3:4 · 3:2 · 2:3 · 2:1 · 1:2 · 19.5:9 · 9:19.5 · 20:9 · 9:20 (default 1:1) |
| resolution | -r | enum | 1k · 2k (default 1k) |
| count | -n | enum | 1 · 2 · 4 (default 1) |
| imageUrls | -i | file | image (up to 1) |
grok-imagine-image-quality — Grok Imagine Quality
Input type: t2i
| Param | CLI flag | Type | Values |
|---|---|---|---|
| prompt | -p | text | required |
| aspectRatio | --ar | enum | 1:1 · 16:9 · 9:16 · 4:3 · 3:4 · 3:2 · 2:3 · 2:1 · 1:2 · 19.5:9 · 9:19.5 · 20:9 · 9:20 (default 1:1) |
| resolution | -r | enum | 1k · 2k (default 2k) |
| count | -n | enum | 1 · 2 · 4 (default 1) |
| imageUrls | -i | file | image (up to 1) |
grok-tts — Grok TTS
Input type: tts
| Param | CLI flag | Type | Values |
|---|---|---|---|
| language | --language | text | free text |
| accent | --accent | text | free text |
| prompt | -p | text | required (≤15000 chars) |
| voiceId | --voice | enum | eve (Eve) · ara (Ara) · rex (Rex) · sal (Sal) · leo (Leo) (default eve) |
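
Because the `prompt` for `grok-tts` is capped at 15000 characters, longer scripts have to be split before submission. The sketch below shows one possible approach: it splits on sentence boundaries and falls back to a hard cut for any single sentence that exceeds the limit. The function `split_for_tts` is a hypothetical helper, and the sentence-splitting rule is a simplification that will not suit every language.

```python
import re

MAX_CHARS = 15000  # prompt limit from the grok-tts parameter table


def split_for_tts(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Chunks end on sentence boundaries where possible; a single sentence
    longer than the limit is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip() if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as its own `grok-tts` request with the same `voiceId`, and the resulting audio files concatenated afterwards.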
grok-imagine-video-1.5 — Grok Imagine 1.5
Input type: i2v
| Param | CLI flag | Type | Values |
|---|---|---|---|
| prompt | -p | text | required |
| aspectRatio | --ar | enum | 16:9 · 9:16 · 1:1 · 4:3 · 3:4 · 3:2 · 2:3 (default 16:9) |
| resolution | -r | enum | 480p · 720p (default 720p) |
| duration | -d | enum | 3 · 5 · 6 · 8 · 10 · 12 · 15 (default 8) |
| imageUrls | -i | file | required image (up to 1) |
Notes:
`grok-imagine-video` also backs image-to-video (pass `-i`); the edit / extend variants take a `--video` input and retain the source duration.
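
The input rules for the four video models can be summarised in a small sketch that assembles a `gen-ai generate` argument list and rejects combinations the tables do not allow. The function `build_generate_argv` and the rule sets are hypothetical helpers written for this illustration. Only the flags `-m`, `-p`, `-i`, and `--video` shown in this document are used.

```python
from typing import Optional

# Media requirements per video model, taken from the parameter tables.
VIDEO_MODELS = {
    "grok-imagine-video",
    "grok-imagine-video-1.5",
    "grok-edit-video",
    "grok-extend-video",
}
REQUIRES_IMAGE = {"grok-imagine-video-1.5"}
ACCEPTS_IMAGE = REQUIRES_IMAGE | {"grok-imagine-video"}
REQUIRES_VIDEO = {"grok-edit-video", "grok-extend-video"}


def build_generate_argv(
    model: str,
    prompt: str,
    image: Optional[str] = None,
    video: Optional[str] = None,
) -> list[str]:
    """Hypothetical helper: build a `gen-ai generate` argv for a video model."""
    if model not in VIDEO_MODELS:
        raise ValueError(f"not a video model: {model}")
    if model in REQUIRES_IMAGE and image is None:
        raise ValueError(f"{model} requires a start image (-i)")
    if model in REQUIRES_VIDEO and video is None:
        raise ValueError(f"{model} requires a source clip (--video)")
    if image is not None and model not in ACCEPTS_IMAGE:
        raise ValueError(f"{model} does not take an image input")
    if video is not None and model not in REQUIRES_VIDEO:
        raise ValueError(f"{model} does not take a video input")

    argv = ["gen-ai", "generate", "-m", model, "-p", prompt]
    if image is not None:
        argv += ["-i", image]
    if video is not None:
        argv += ["--video", video]
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with prompts that contain spaces or quotes.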
Pricing
```bash
gen-ai pricing grok-imagine-video -d 6 -r 720p
```
Video cost scales with duration (priced per second); image cost scales with the Quality vs. standard tier and count; TTS is priced per character.
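
The scaling rules above can be written as three small estimators. This is only a sketch: the function names are hypothetical, every rate is supplied by the caller and is not a real price, and linear scaling with image count is an assumption. Real figures should come from the `gen-ai pricing` command, which also takes a resolution flag that this sketch does not model.

```python
def video_cost(duration_seconds: float, rate_per_second: float) -> float:
    """Video is priced per second, so cost grows linearly with duration."""
    return duration_seconds * rate_per_second


def image_cost(count: int, rate_per_image: float) -> float:
    """Image cost depends on the tier (via the rate) and the image count.

    Assumption: each additional image costs the same per-image rate.
    """
    return count * rate_per_image


def tts_cost(text: str, rate_per_character: float) -> float:
    """TTS is priced per character of the prompt."""
    return len(text) * rate_per_character


# Placeholder rates for illustration only; these are not actual prices.
print(video_cost(6, rate_per_second=0.5))
print(image_cost(4, rate_per_image=0.5))
print(tts_cost("Welcome to the Picsart AI Playground.", rate_per_character=0.25))
```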