Veo 3.1 Reference-to-Video API

Google Veo 3.1 Reference-to-Video

Generate videos with consistent characters and objects using reference images. Perfect for storytelling and multi-scene projects.

Reference-to-Video is a specialized video generation mode that maintains visual identity consistency across generated content. By providing 1-3 reference images, you can create videos where characters, objects, or subjects maintain their appearance throughout the scene. This is ideal for creating coherent narratives, character-based content, and multi-scene projects where visual consistency is critical.

Key capabilities

Character consistency: Maintain visual identity of characters across video generation
Multi-reference support: Use 1-3 reference images for subject consistency
Multi-resolution output: Generate videos in 720p, 1080p, or 4K resolution
Native audio generation: Includes dialogue and sound effects synthesis
Fixed 8-second duration: Every video is 8 seconds long at 24 FPS
Aspect ratio control: 16:9 (landscape) or 9:16 (portrait) formats
Negative prompts: Specify elements to avoid in generation
Long prompts: Up to 20,000 characters for detailed scene descriptions

Use cases

Storytelling: Create multi-scene narratives with consistent characters
Brand mascots: Generate videos featuring consistent brand characters
Product showcases: Maintain product appearance across different scenes
Character animation: Bring illustrated or photographed characters to life consistently
Social media series: Create episodic content with recurring characters
Advertising campaigns: Produce multiple ads with a consistent spokesperson

How it differs from Image-to-Video

| Feature | Image-to-Video | Reference-to-Video |
| --- | --- | --- |
| Input | Single image to animate | 1-3 reference images + prompt |
| Purpose | Animate a specific image | Generate new scenes with consistent subjects |
| Output | Animation of the input image | New video featuring reference subjects |
| Duration | 4, 6, or 8 seconds | Fixed 8 seconds |
| Modes | Standard and Fast | Single mode |

Generate with Reference-to-Video

Create videos with consistent characters and objects using reference images.

POST /v1/ai/reference-to-video/veo-3-1

Create a reference-to-video task

GET /v1/ai/reference-to-video/veo-3-1

List all reference-to-video tasks

GET /v1/ai/reference-to-video/veo-3-1/{task-id}

Get task status by ID
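
Because task creation and status retrieval are separate endpoints, a client would typically create a task and then poll the status endpoint until the task finishes. The Python sketch below shows one way to structure that loop. The response shape is not documented in this section, so the top-level `status` field and the `COMPLETED` / `FAILED` values are assumptions, and `task_url` and `poll_until_done` are hypothetical helper names. The HTTP call is injected as `fetch`, so the loop does not depend on any particular client library.

```python
import time
from typing import Callable

BASE_URL = "https://api.magnific.com/v1/ai/reference-to-video/veo-3-1"

# Assumption: these are the terminal status values; check a real response.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def task_url(task_id: str) -> str:
    """Build the status URL for one task (GET .../veo-3-1/{task-id})."""
    return f"{BASE_URL}/{task_id}"


def poll_until_done(
    fetch: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch(url) until the task reports a terminal status.

    `fetch` is any function that performs the GET request and returns
    the decoded JSON body as a dict.
    """
    url = task_url(task_id)
    for _ in range(max_attempts):
        body = fetch(url)
        # Assumption: the body exposes a top-level "status" string.
        if body.get("status") in TERMINAL_STATUSES:
            return body
        sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

Passing `sleep` and `fetch` as arguments keeps the loop easy to exercise with stand-ins; for production workflows the webhook option avoids polling altogether.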

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `image_urls` | `array` | Yes | Array of 1-3 reference image URLs (HTTPS, publicly accessible) |
| `prompt` | `string` | Yes | Text describing the video scene with reference subjects (max 20,000 chars) |
| `negative_prompt` | `string` | No | Text describing what to avoid in the video |
| `resolution` | `string` | No | Output resolution: `"720p"`, `"1080p"`, or `"4k"` (default: `"720p"`) |
| `aspect_ratio` | `string` | No | Video format: `"16:9"` or `"9:16"` (default: `"16:9"`) |
| `generate_audio` | `boolean` | No | Generate audio with dialogue and effects (default: `true`) |
| `seed` | `integer` | No | Random seed for reproducibility |
| `webhook_url` | `string` | No | URL for task completion notification |
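
The constraints listed above can be checked on the client before a request is sent. The Python sketch below is one possible helper; the name `build_payload` and the choice to raise `ValueError` are illustrative assumptions, not part of the API. It encodes only the limits stated in the parameter list and omits optional fields that were not set.

```python
from typing import List, Optional

ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
MAX_PROMPT_CHARS = 20_000


def build_payload(
    image_urls: List[str],
    prompt: str,
    negative_prompt: Optional[str] = None,
    resolution: str = "720p",
    aspect_ratio: str = "16:9",
    generate_audio: bool = True,
    seed: Optional[int] = None,
    webhook_url: Optional[str] = None,
) -> dict:
    """Validate the documented constraints and return a JSON-ready dict."""
    if not 1 <= len(image_urls) <= 3:
        raise ValueError("image_urls must contain 1-3 URLs")
    if any(not url.startswith("https://") for url in image_urls):
        raise ValueError("every reference image URL must use HTTPS")
    if not prompt.strip():
        raise ValueError("prompt is required")
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError("prompt exceeds 20,000 characters")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")

    payload = {
        "image_urls": list(image_urls),
        "prompt": prompt,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }
    # Optional fields are only included when the caller supplied them.
    if negative_prompt is not None:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    if webhook_url is not None:
        payload["webhook_url"] = webhook_url
    return payload
```

Client-side checks like these cannot confirm that an image URL is publicly reachable; they only catch malformed requests early.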

Example request

curl -X POST "https://api.magnific.com/v1/ai/reference-to-video/veo-3-1" \
  -H "x-magnific-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "image_urls": [
      "https://example.com/character-front.jpg",
      "https://example.com/character-side.jpg"
    ],
    "prompt": "The character walks through a futuristic city at night, neon lights reflecting on wet streets",
    "negative_prompt": "blurry, low quality, distorted",
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "generate_audio": true
  }'
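
For callers working in Python, the same request can be prepared with the standard library. This sketch mirrors the curl command above; `make_request` is a hypothetical helper name. The request is only built here, and the call that would actually send it is left commented out because it needs a valid key and network access.

```python
import json
import urllib.request

API_URL = "https://api.magnific.com/v1/ai/reference-to-video/veo-3-1"


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request from the curl example."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-magnific-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "image_urls": [
        "https://example.com/character-front.jpg",
        "https://example.com/character-side.jpg",
    ],
    "prompt": (
        "The character walks through a futuristic city at night, "
        "neon lights reflecting on wet streets"
    ),
    "negative_prompt": "blurry, low quality, distorted",
    "resolution": "1080p",
    "aspect_ratio": "16:9",
    "generate_audio": True,
}
request = make_request("YOUR_API_KEY", payload)

# Sending the request:
# with urllib.request.urlopen(request) as response:
#     task = json.load(response)
```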

Frequently Asked Questions

What is Reference-to-Video and how is it different from Image-to-Video?

Reference-to-Video uses reference images to maintain visual consistency of subjects (characters, objects) while generating entirely new video scenes. Image-to-Video animates a single input image directly. Use Reference-to-Video when you need to create multiple scenes with the same character or object looking consistent.
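
As an illustration of the multi-scene case, the Python sketch below prepares one request body per scene while reusing the same reference images. The scene prompts, the example URLs, and the `scene_payloads` helper are made up for this example. Whether reusing a `seed` across scenes improves consistency is not stated here, so it is left out.

```python
from typing import List

REFERENCE_IMAGES = [
    "https://example.com/character-front.jpg",
    "https://example.com/character-side.jpg",
]

SCENE_PROMPTS = [
    "The character walks through a futuristic city at night",
    "The character orders coffee at a neon-lit street stall",
    "The character watches the sunrise from a rooftop",
]


def scene_payloads(image_urls: List[str], prompts: List[str]) -> List[dict]:
    """Return one request body per prompt, all sharing the same references."""
    return [
        {
            "image_urls": list(image_urls),  # copy so bodies stay independent
            "prompt": prompt,
            "aspect_ratio": "16:9",
        }
        for prompt in prompts
    ]


payloads = scene_payloads(REFERENCE_IMAGES, SCENE_PROMPTS)
```

Each body would then be submitted as its own task, producing one 8-second clip per scene.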

How many reference images should I provide?

You can provide 1-3 reference images. Using multiple images from different angles improves consistency. For characters, include front-facing and profile views. For objects, include various angles to help the model understand the complete appearance.

What makes good reference images?

Good reference images:

Are high resolution and well-lit
Show the subject clearly without obstructions
Include different angles when using multiple images
Show the subject with a consistent appearance across images
Are served from publicly accessible HTTPS URLs

Why is the duration fixed at 8 seconds?

The 8-second duration at 24 FPS is optimized for reference-to-video generation, providing enough time for meaningful scenes while ensuring high-quality consistency of the reference subjects throughout the video.

Does Reference-to-Video have a Fast mode?

Currently, Reference-to-Video is available in a single mode optimized for quality and consistency. Unlike Text-to-Video and Image-to-Video, there is no Fast variant for Reference-to-Video.

How does audio generation work with reference subjects?

When `generate_audio` is enabled (the default), the model generates synchronized audio including dialogue and sound effects appropriate to the scene. If your reference subject is a person and the prompt describes them speaking, the audio will include synthesized dialogue.

Best practices

Multiple reference angles: Provide 2-3 images showing different angles of your subject for best consistency
Clear subjects: Use reference images where the subject is clearly visible and unobstructed
Consistent lighting: Reference images with similar lighting produce more coherent results
Descriptive prompts: Describe how the reference subject should act in the scene
Scene context: Include environment and action details in your prompt
Negative prompts: Use them to steer generation away from quality issues, for example “blurry, distorted, inconsistent features”
Webhook integration: Use webhooks for production workflows to handle async completion
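
The webhook practice can be sketched with Python's standard library. The payload delivered to `webhook_url` is not described in this section, so the handler below assumes only a JSON body and treats the `task_id` and `status` keys as hypothetical. `parse_notification`, `WebhookHandler`, and `serve` are illustrative names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_notification(raw_body: bytes) -> dict:
    """Decode a webhook body; return {} when it is not a JSON object."""
    try:
        data = json.loads(raw_body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        notification = parse_notification(self.rfile.read(length))
        # Hypothetical keys: adjust to the payload the API actually sends.
        print("task", notification.get("task_id"), "status", notification.get("status"))
        # Acknowledge quickly; do downloads or database writes elsewhere.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

A production receiver would also need HTTPS and some way to confirm that a notification really came from the API; neither is covered here.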

Related APIs

Veo 3.1 Text-to-Video: Generate videos from text prompts without reference images
Veo 3.1 Image-to-Video: Animate a single image into video
Kling 2.6 Motion Control: Transfer motion from reference videos
Runway Act Two: Character performance with reference video
