1. Name

2. Voice persona

Pick a voice persona. Grok Imagine generates the spoken audio natively inside each clip, shaped by this persona description (gender, energy, delivery style). Click Preview on a card to hear an xAI TTS sample of that persona for reference.

3. Character reference

Grok Imagine is image-to-video, so each clip needs at least one reference image of the on-camera human. Use a stock photo, a Grok-generated portrait, a model headshot, or any face you have the rights to. Twincaster is post-twins, so this no longer has to be you. Upload PNG or JPEG files of at least 256×256 pixels, up to 5 references. The first is required; extra angles let Grok Imagine vary the pose between clips while keeping the character locked. Drop files anywhere, paste from the clipboard, or click + Add.