Google Veo 3.1 Reference-to-Video

Generate videos with consistent characters and objects using reference images. Perfect for storytelling and multi-scene projects.
Reference-to-Video is a specialized video generation mode that maintains visual identity consistency across generated content. By providing 1-3 reference images, you can create videos where characters, objects, or subjects maintain their appearance throughout the scene. This is ideal for creating coherent narratives, character-based content, and multi-scene projects where visual consistency is critical.

Key capabilities

  • Character consistency: Maintain visual identity of characters across video generation
  • Multi-reference support: Use 1-3 reference images for subject consistency
  • Multi-resolution output: Generate videos in 720p, 1080p, or 4K resolution
  • Native audio generation: Includes dialogue and sound effects synthesis
  • Fixed 8-second duration: Every video is 8 seconds at 24 FPS, optimized for cinematic quality
  • Aspect ratio control: 16:9 (landscape) or 9:16 (portrait) formats
  • Negative prompts: Specify elements to avoid in generation
  • Long prompts: Up to 20,000 characters for detailed scene descriptions

Use cases

  • Storytelling: Create multi-scene narratives with consistent characters
  • Brand mascots: Generate videos featuring consistent brand characters
  • Product showcases: Maintain product appearance across different scenes
  • Character animation: Bring illustrated or photographed characters to life consistently
  • Social media series: Create episodic content with recurring characters
  • Advertising campaigns: Produce multiple ads with consistent spokesperson

How it differs from Image-to-Video

| Feature  | Image-to-Video             | Reference-to-Video                         |
|----------|----------------------------|--------------------------------------------|
| Input    | Single image to animate    | 1-3 reference images + prompt              |
| Purpose  | Animate a specific image   | Generate new scenes with consistent subjects |
| Output   | Animation of the input image | New video featuring reference subjects   |
| Duration | 4, 6, or 8 seconds         | Fixed 8 seconds                            |
| Modes    | Standard and Fast          | Single mode                                |

Generate with Reference-to-Video

Create videos with consistent characters and objects using reference images.

POST /v1/ai/reference-to-video/veo-3-1

Create a reference-to-video task

GET /v1/ai/reference-to-video/veo-3-1

List all reference-to-video tasks

GET /v1/ai/reference-to-video/veo-3-1/{task-id}

Get task status by ID

Parameters

| Parameter       | Type    | Required | Description                                                              |
|-----------------|---------|----------|--------------------------------------------------------------------------|
| image_urls      | array   | Yes      | Array of 1-3 reference image URLs (HTTPS, publicly accessible)           |
| prompt          | string  | Yes      | Text describing the video scene with reference subjects (max 20,000 chars) |
| negative_prompt | string  | No       | Text describing what to avoid in the video                               |
| resolution      | string  | No       | Output resolution: "720p", "1080p", or "4k" (default: "720p")            |
| aspect_ratio    | string  | No       | Video format: "16:9" or "9:16" (default: "16:9")                         |
| generate_audio  | boolean | No       | Generate audio with dialogue and effects (default: true)                 |
| seed            | integer | No       | Random seed for reproducibility                                          |
| webhook_url     | string  | No       | URL for task completion notification                                     |
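The constraints in the parameter table above can be checked client-side before submitting a task, which avoids burning a request on a payload the API will reject. The following sketch builds a request body from those rules; the helper name `build_request` and its error messages are illustrative, not part of the API.

```python
# Client-side validation sketch for a Reference-to-Video request body,
# based on the parameter table above. Values not set are omitted so the
# API's documented defaults apply.
VALID_RESOLUTIONS = {"720p", "1080p", "4k"}
VALID_ASPECT_RATIOS = {"16:9", "9:16"}

def build_request(image_urls, prompt, **options):
    """Validate inputs and return a JSON-serializable request body."""
    if not 1 <= len(image_urls) <= 3:
        raise ValueError("image_urls must contain 1-3 reference image URLs")
    if not all(url.startswith("https://") for url in image_urls):
        raise ValueError("reference images must be publicly accessible HTTPS URLs")
    if not prompt or len(prompt) > 20_000:
        raise ValueError("prompt must be 1-20,000 characters")

    body = {"image_urls": list(image_urls), "prompt": prompt}

    resolution = options.get("resolution")
    if resolution is not None:
        if resolution not in VALID_RESOLUTIONS:
            raise ValueError('resolution must be "720p", "1080p", or "4k"')
        body["resolution"] = resolution

    aspect_ratio = options.get("aspect_ratio")
    if aspect_ratio is not None:
        if aspect_ratio not in VALID_ASPECT_RATIOS:
            raise ValueError('aspect_ratio must be "16:9" or "9:16"')
        body["aspect_ratio"] = aspect_ratio

    # Pass-through optional fields; the API validates these server-side.
    for key in ("negative_prompt", "generate_audio", "seed", "webhook_url"):
        if key in options:
            body[key] = options[key]
    return body
```

The resulting dict can be serialized with `json.dumps` and sent as the POST body shown in the example request below the table.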

Example request

curl -X POST "https://api.magnific.com/v1/ai/reference-to-video/veo-3-1" \
  -H "x-magnific-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image_urls": [
      "https://example.com/character-front.jpg",
      "https://example.com/character-side.jpg"
    ],
    "prompt": "The character walks through a futuristic city at night, neon lights reflecting on wet streets",
    "negative_prompt": "blurry, low quality, distorted",
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "generate_audio": true
  }'
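Because generation is asynchronous, a client typically submits the task and then polls `GET /v1/ai/reference-to-video/veo-3-1/{task-id}` until the task reaches a terminal state. The sketch below keeps the polling loop separate from the HTTP call so it works with any client library; the `status` field and the `"completed"`/`"failed"` values are assumptions about the task-status response, so confirm them against a real response.

```python
import time

# Assumed terminal states -- verify against the actual task-status
# response schema before relying on these values.
TERMINAL_STATES = {"completed", "failed"}

def poll_until_done(fetch_status, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch_status() until the task reports a terminal state.

    fetch_status: zero-argument callable returning the task JSON as a dict,
    e.g. a wrapper around GET /v1/ai/reference-to-video/veo-3-1/{task-id}.
    """
    waited = 0.0
    while True:
        task = fetch_status()
        if task.get("status") in TERMINAL_STATES:
            return task
        if waited >= timeout_s:
            raise TimeoutError(f"task still {task.get('status')!r} after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s
```

Injecting `fetch_status` and `sleep` also makes the loop easy to unit-test with canned responses. For production workflows, prefer the `webhook_url` parameter over polling.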

Frequently Asked Questions

How is Reference-to-Video different from Image-to-Video?

Reference-to-Video uses reference images to maintain visual consistency of subjects (characters, objects) while generating entirely new video scenes. Image-to-Video animates a single input image directly. Use Reference-to-Video when you need to create multiple scenes with the same character or object looking consistent.

How many reference images should I provide?

You can provide 1-3 reference images. Using multiple images from different angles improves consistency. For characters, include front-facing and profile views. For objects, include various angles to help the model understand the complete appearance.

What makes a good reference image?

Good reference images:
  • Are high-resolution and well-lit
  • Show the subject clearly, without obstructions
  • Include different angles when using multiple images
  • Keep the subject's appearance consistent across images
  • Use HTTPS URLs that are publicly accessible

Why is the duration fixed at 8 seconds?

The 8-second duration at 24 FPS is optimized for reference-to-video generation, providing enough time for meaningful scenes while ensuring high-quality consistency of the reference subjects throughout the video.

Is there a Fast mode for Reference-to-Video?

Currently, Reference-to-Video is available in a single mode optimized for quality and consistency. Unlike Text-to-Video and Image-to-Video, there is no Fast variant for Reference-to-Video.

How does audio generation work?

When generate_audio is enabled (default), the model generates synchronized audio including dialogue and sound effects appropriate to the scene. If your reference subject is a person and the prompt describes them speaking, the audio will include synthesized dialogue.

Best practices

  • Multiple reference angles: Provide 2-3 images showing different angles of your subject for best consistency
  • Clear subjects: Use reference images where the subject is clearly visible and unobstructed
  • Consistent lighting: Reference images with similar lighting produce more coherent results
  • Descriptive prompts: Describe how the reference subject should act in the scene
  • Scene context: Include environment and action details in your prompt
  • Negative prompts: Use to avoid quality issues like "blurry, distorted, inconsistent features"
  • Webhook integration: Use webhooks for production workflows to handle async completion
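For the webhook integration mentioned above, your endpoint receives a notification when the task finishes. The sketch below parses such a notification; the payload fields (`task_id`, `status`) are assumptions about the notification schema, so inspect a real delivery to confirm them before relying on this.

```python
import json

def handle_webhook(raw_body: bytes) -> str:
    """Parse a task-completion notification and return the task id.

    Assumes a JSON body with "task_id" and "status" fields -- hypothetical
    names; verify against an actual webhook delivery.
    """
    payload = json.loads(raw_body.decode("utf-8"))
    task_id = payload.get("task_id")
    if task_id is None:
        raise ValueError("notification did not include a task_id")
    if payload.get("status") == "failed":
        # Hook in your own retry or alerting logic here.
        print(f"task {task_id} failed")
    return task_id
```

Wire this into whatever HTTP framework serves your `webhook_url`, and return a 2xx response promptly so the notification is not retried.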