Automated Car Commercial Generation Workflow by Karin SuvaryanAutomated Car Commercial Generation Workflow by Karin Suvaryan

Automated Car Commercial Generation Workflow

Karin Suvaryan

Karin Suvaryan

CASE STUDY: Automated Car Commercial Generation Workflow


Client Brief & Challenge

The Client Request A automotive marketing agency approached me with a specific need: they wanted to generate professional car commercials quickly and cost-effectively. Their requirements were:
Client Needs:
Simple Input System - Only 2 inputs: car model + city location
Multiple Camera Angles - 7 different cinematic shots per commercial
Driving Footage - Car must be shown in motion, not static
Beautiful Environments - Cinematic city settings with proper lighting
Professional Quality - Broadcast-ready output
Easy Variation - Ability to change car/city instantly for new videos
Mini-Movie Feel - Complete narrative arc, not just random clips

The Problem:

Traditional car commercial production costs $50,000-$150,000 and takes weeks. They needed:
Cost: Under $10 per commercial
Speed: 10-15 minutes per video
Scale: Ability to produce hundreds of variations
Consistency: Professional quality every time

My Analysis & Strategic Approach

Why Traditional Solutions Failed

Option 1: Stock Footage
❌ Limited variety
❌ Generic, not brand-specific
❌ Can't customize car models or cities
❌ Licensing costs add up
Option 2: Manual AI Generation
❌ Requires 7 separate prompts
❌ Inconsistent style between shots
❌ Time-consuming prompt engineering
❌ No automation
Option 3: Traditional Video Editing
❌ Still requires source footage
❌ Labor-intensive
❌ Not scalable

My Solution Strategy
I identified that the client needed automated orchestration of AI video generation with:
Intelligent prompt generation - AI creates cinematography descriptions
Systematic shot variety - Predefined camera angle archetypes
Workflow automation - No manual intervention between inputs and output
Style consistency - Unified aesthetic across all 7 shots
Modular architecture - Easy to modify parameters

The Weavy Workflow Solution

Architecture Overview

I designed a node-based pipeline that transforms 2 text inputs into a complete commercial:
INPUT LAYER → PROCESSING LAYER → GENERATION LAYER → OUTPUT LAYER
[Car Model Input] ──┐
├──→ [Prompt Concatenator] ──→ [GPT-4] ──→ [Array Splitter] ──→ [Text Iterator] ─┬──→ [Veo #1: Wide Shot]
[City Input] ───────┘ ↓ ↓ ↓ ├──→ [Veo #2: Tracking]
Master Prompt 7 Descriptions Distribution ├──→ [Veo #3: Aerial]
Template Split by *** Mechanism ├──→ [Veo #4: Close-up]
├──→ [Veo #5: POV]
├──→ [Veo #6: Action]
└──→ [Veo #7: Hero]

[Video Concatenator]

[Final 45-sec Commercial]

Detailed Workflow Breakdown

Layer 1: Input Collection (Nodes 1-2)

Node 1: Car Model Text Input
Purpose: Capture specific vehicle
Example: "2024 Porsche 911 Turbo S"
Why separate node: Enables instant car swapping
Node 2: City Location Text Input
Purpose: Define environment and lighting
Example: "Tokyo at night"
Why separate node: Enables instant location changes
Design Rationale: By separating these as independent nodes rather than a single combined input, I enabled maximum reusability. The client can generate 100 different combinations from just 10 cars × 10 cities without rebuilding the workflow.

Layer 2: Intelligent Prompt Generation (Nodes 3-4)

Node 3: Prompt Concatenator
Function: Combines user inputs with master cinematography template
Contents:
User inputs (car + city)
Cinematography instruction set
Shot type definitions
Technical requirements
Master Prompt Template Structure:
[Car Model] driving through [City]

You are an award-winning automotive cinematographer. Generate 7 cinematic shot descriptions.

SHOT TYPES REQUIRED:
1. Wide Establishing - Full scene context
2. Low Angle Tracking - Speed and power
3. Aerial Drone - Urban landscape overview
4. Close-Up Detail - Car craftsmanship
5. Driver POV - Immersive perspective
6. Dynamic Action - High-energy maneuver
7. Hero Beauty - Signature money shot

REQUIREMENTS:
- Car DRIVING in every shot (wheels turning, motion)
- Consistent time of day (night with city lights)
- 30-40 words per description
- Cinematic camera movements
- Professional lighting terminology

FORMATTING:
- Separate shots with ***
- NO numbering or labels
- Pure descriptions only

Generate now:
Design Rationale: This template was refined through 15+ iterations to achieve:
Specificity: Detailed enough for quality output
Flexibility: Works with any car/city combination
Consistency: Enforces unified aesthetic
Technical precision: Uses professional cinematography language

Node 4: GPT-4 LLM
Model Choice: GPT-4 (not 3.5) for superior creative writing
Temperature: 0.7 (balanced creativity + consistency)
Function: Generates 7 unique, professional shot descriptions
Why GPT-4:
Better understanding of cinematography terminology
More creative variation within constraints
Consistent quality across generations
Follows complex multi-part instructions reliably
Wide establishing shot of 2024 Porsche 911 Turbo S cruising Tokyo's neon-lit expressway at night, elevated camera capturing sweeping cityscape with illuminated skyscrapers, LED headlights cutting through mist ***

Low angle tracking shot racing alongside the Porsche as it accelerates through Shibuya, camera hugging asphalt, wheels spinning, city lights streaking, aggressive stance emphasized ***

Aerial drone view circling above the Turbo S navigating Tokyo's elevated highways, bird's eye perspective revealing intricate road networks, car's sleek form cutting through urban geometry ***

[... continues for all 7 shots]

Layer 3: Distribution System (Nodes 5-6)

Node 5: Array Splitter
Function: Converts single text block into 7 separate items
Split Character: *** (three asterisks)
Output: Array of 7 independent descriptions
Why This Is Critical: Without the Array node, all 7 descriptions would go to EVERY video generator, creating duplicate content. The Array creates clean separation.
Technical Detail:The *** separator was chosen because:
✅ LLMs reliably output it ✅ Unlikely to appear in natural text ✅ Easy to parse programmatically ✅ No whitespace sensitivity issues

Node 6: Text Iterator
Function: Distributes array items sequentially to connected nodes
Mechanism: Round-robin distribution
Result: Shot 1 → Veo #1, Shot 2 → Veo #2, etc.
Design Rationale: The Text Iterator is the "traffic controller" of the workflow. It ensures each video generator receives EXACTLY ONE unique shot description. This creates:
✅ Shot diversity (7 different angles)
✅ No duplicates
✅ Predictable ordering
✅ Scalability (could expand to 10, 15, 20 shots)
Like this project

Posted Feb 15, 2026

A automotive marketing agency approached me with a specific need: they wanted to generate professional car commercials quickly and cost-effectively.