I have always been fascinated by music direction. The way a song can hold an entire era of your life in three minutes. The way a visual cut at the right moment can make you feel something you forgot you were carrying.
When I saw this challenge, it did not feel like a competition brief. It felt like a reminder. A reminder that I had always wanted to do this and kept waiting for the right tools.
This is a music video about everyone you lose: not to death, not to arguments, just to time. The childhood crew. The best friend at 14 who knew everything about you. The first love. The last great summer before university scattered everyone. The group chat that went quiet somewhere around 23. The hug at a bus stop that you did not know was the last one.
Those people do not leave loudly. They just stop appearing. And one day you are on your bedroom floor with a shoebox in your lap and half the people who shaped you are strangers now.
Polaroid of You is the music video for that feeling. The one nobody talks about because there is no clean word for it. Not heartbreak. Not grief. Something quieter and more permanent than both.
How I Actually Built This
The most important decision I made was starting with the song.
Not the visuals. Not the concept. The song first.
I spent time getting the track exactly right in Suno. Once the song was locked, I mapped every visual decision to the music's emotional temperature. The visuals serve the audio. Not the other way around.
Then I built the structure. 30 emotional blocks across the full 4 minutes. Each block a different era, a different set of people, a different kind of loss. Within those blocks I ended up generating over 45 individual scene nodes: A-rolls, B-rolls, rapid-cut fillers, long atmospheric holds, Super 8 flashback sequences, cold present-day static shots. Some blocks generated multiple scenes from a single prompt. Some needed long fillers for breathing room between emotional peaks. Some needed B-roll texture to let the A-roll land harder.
I had never made a music video before in my life.
What Melius Actually Did
Here is the part I want to be honest about because it genuinely surprised me.
The old way of doing this would have been weeks of juggling separate tools: one for audio, one for image generation, one for video. Manually carrying outputs between platforms, losing consistency between sessions, rebuilding style references from scratch every time.
Melius collapsed all of that into one canvas.
But the thing that changed everything was talking to the AI to build the nodes.
I did not sit there manually wiring every connection. I described what I needed ("I want a block that captures a group of friends on a beach, the last great summer, Super 8 warm grade, handheld") and the agent structured the node, selected the right model, and handled the generation logic on my behalf. I was directing. It was executing. That distinction matters more than I can explain.
The canvas became a visual map of the entire film. I could see every block, every node, every connection at once. When something was not working I could step into any node and adjust the prompt directly without rebuilding anything around it. The whole structure stayed intact. No tab switching. No rebuilding. No losing the thread of the story.
The Character Sheet Breakthrough
The faces were the hardest part. Seedance and most video models will occasionally refuse facial generation. That is a real limitation, and it happened to me multiple times during this build.
The solution that changed everything: I generated a Reference Character Sheet first.
Not just a prompt. A full character sheet: two versions of the same person. A younger self, warm lit, Super 8 grain, age 10. A current self, cold blue grade, present day. I fed that sheet as a reference node into every subsequent block that needed her face. It gave me time travel. The same person across decades, consistent, without rebuilding identity from scratch every time.
When the facial generation did get refused, I had the reference to reapproach the prompt from a different angle. That character sheet became the spine of the entire production. One node. Wired everywhere. Holding the whole story together.
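For anyone who thinks in code, here is a rough sketch of the "one node, wired everywhere" idea. It is only my mental model written out in Python, not how Melius works internally: the `Node` class, the `wire_reference` helper, and the scene names are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One generation step on the canvas (hypothetical model, not Melius's API)."""
    name: str
    prompt: str
    references: list["Node"] = field(default_factory=list)


def wire_reference(anchor: Node, scenes: list[Node], needs_face: set[str]) -> None:
    """Attach the shared character sheet to every scene that shows the character."""
    for scene in scenes:
        if scene.name in needs_face and anchor not in scene.references:
            scene.references.append(anchor)


# The single character sheet: two versions of the same person in one reference.
character_sheet = Node(
    name="character_sheet",
    prompt="Same person twice: age 10, warm light, Super 8 grain; present day, cold blue grade",
)

# Illustrative scene nodes; the names are invented for this sketch.
scenes = [
    Node("beach_last_summer", "Group of friends on a beach, Super 8 warm grade, handheld"),
    Node("bedroom_floor", "Present day, shoebox in her lap, cold static shot"),
    Node("empty_bus_stop", "Long atmospheric hold, no people"),
]

# Only scenes that show her face receive the reference.
wire_reference(character_sheet, scenes, needs_face={"beach_last_summer", "bedroom_floor"})

wired = [s.name for s in scenes if character_sheet in s.references]
print(wired)
```

The point of the sketch is that identity lives in one place. Every scene that needs her face points back to the same sheet, so changing the sheet changes the reference for all of them, and a scene with no face in it (the empty bus stop) stays unwired.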
12 Hours. Honest.
I will not pretend this was effortless. 45-plus nodes, iterative generation, prompt refinement, character consistency troubleshooting, model refusals, rebuilding blocks that did not land: this took 12 real hours of work.
But here is the comparison that matters: before tools like this existed, what I built in 12 hours would have required a production team, weeks of pre-production, separate software subscriptions for every layer, and a budget I do not have. I am a marketer. No filmmaking background. No cinematography training. No video production experience.
I came in with a feeling and a song. I left with a music video that made me stop and sit quietly when I watched it back.
That is what Melius made possible.
Platform Feedback
The node-based canvas is genuinely the right interface for this kind of creative work. Seeing the entire production as a connected visual map, being able to step into any node, adjust, and regenerate without breaking the structure around it: that is how creative thinking actually works. Non-linear, iterative, spatial.
The AI agent building nodes through conversation is the feature I would tell every creative about first. You describe what you need in plain language (the feeling, the grade, the era, the camera energy) and the agent structures the node, selects the right model, and handles the generation logic on your behalf. You are not learning software. You are directing. That distinction is everything.
Music videos are one of the most expensive formats in creative production. A professional shoot with crew, locations, colour grade, and post can cost anywhere from £10,000 to £500,000. That price tag has always meant only artists with label backing or industry connections could tell their story visually. Melius changes that equation completely. An independent marketer, a bedroom musician, an unsigned artist, a first-time creative director, anyone who has a story and cares enough to see it through can now build something with real cinematic production value. In an afternoon. On a laptop. Without a single crew member. That is not a small thing. That is a fundamental shift in who gets to make art.
Honest feedback: long-form cinematic production at this scale is credit-intensive. A long-form storyteller tier with optimised rendering costs for multi-block narrative projects would unlock creative ambition that currently has to be rationed mid-project. That is worth building.
And the character sheet approach for facial consistency across time periods deserves to be a native feature. A character anchor node that locks identity across an entire canvas would be transformative for anyone making narrative video at this scale.
A Final Note
Music direction is something I have always been drawn to. The way a cut lands at the right moment. Where the silence lives. Where the camera holds one beat longer than feels comfortable.
I never thought I could do that without a team, a budget, and years of technical training.
What I built in 12 hours on Melius would have been a £20,000 production two years ago. Now it is a canvas, a conversation, and someone who cares enough about the story to stay up until 4AM making sure it lands right.
That barrier coming down matters. Not just for competitions. For every person who has ever had a visual idea they could not afford to make real.
If you have a shoebox somewhere with old photos, ticket stubs, a friendship bracelet from someone you have not spoken to in years, this video is for you.
Canvas: https://app.melius.com/projects/e509c789-97e3-4b48-8d49-a8f109fa1887/canvas/6fc36300-4715-478b-8b3c-21035cedcc27
Social: https://x.com/S_Vipashyana/status/2056937073114800396?s=20