A Python CLI that takes two source videos and an audio track, detects the song's BPM, cuts between the two sources on every beat, applies a subtle color grade, and exports a finished 9:16 MP4 ready to post.
Every behavior is driven by a JSON config, so the client can tweak the look and iterate on new edits.
The engine auto-selects between two modes based on edit length. A 15-second Punch mode uses tight, beat-grid-locked cuts for viral hooks.
A 30-second Breathe mode uses looser cuts with occasional 808 bass snaps for cinematic reels.
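The mode switch itself can be as small as a length threshold; the 15-second boundary and the mode names below mirror the description, but treating the boundary as inclusive is an assumption:

```python
def pick_mode(target_seconds: float) -> str:
    """Short edits get tight beat-locked cuts; longer ones breathe.

    The <=15s threshold is an assumed boundary, not a confirmed detail.
    """
    return "punch" if target_seconds <= 15 else "breathe"
```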
Audio analysis uses librosa, tuned looser than default for trap and hip-hop, with a half/double clamp to fix the common BPM double-reporting issue.
A second pass isolates sub-bass frequencies to detect 808 attacks.
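One way to do that second pass, sketched with scipy: band-pass the signal to the sub-bass range, take a frame-wise RMS envelope, and pick prominent peaks. The 30-90 Hz band, the 0.3 relative height threshold, and the minimum gap are all assumptions, not the project's tuning:

```python
import numpy as np
from scipy import signal

def detect_808_attacks(y: np.ndarray, sr: int,
                       band: tuple[float, float] = (30.0, 90.0),
                       min_gap_s: float = 0.25) -> np.ndarray:
    """Return times (s) of sub-bass attacks in a mono signal."""
    # isolate the sub-bass band where 808s live
    sos = signal.butter(4, band, btype="bandpass", fs=sr, output="sos")
    sub = signal.sosfiltfilt(sos, y)
    # frame-wise RMS envelope of the filtered signal
    hop, win = 512, 2048
    frames = np.lib.stride_tricks.sliding_window_view(sub, win)[::hop]
    env = np.sqrt((frames ** 2).mean(axis=1))
    # attacks = prominent envelope peaks, spaced at least min_gap_s apart
    peaks, _ = signal.find_peaks(env, height=env.max() * 0.3,
                                 distance=max(1, int(min_gap_s * sr / hop)))
    return peaks * hop / sr
```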
Cut planning walks the beat grid, keeps each beat as a possible cut with a configurable probability, and enforces a minimum segment duration.
In the 30-second mode, any cut that falls within a configurable tolerance of an 808 hit snaps to the actual bass attack. Source alternation has a max-consecutive cap so the edit never reads as a mechanical A-B-A-B.
Rendering runs through FFmpeg subprocess calls, with frame-accurate extraction, concat-demuxer stitching, and a single mux pass for audio.
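A sketch of the three command builders under those constraints; file names, codec, and preset are illustrative, and each returned list would be executed with `subprocess.run(cmd, check=True)`:

```python
def extract_cmd(src: str, start: float, dur: float, out: str) -> list[str]:
    """Frame-accurate extraction: -ss after -i forces decode-and-seek,
    and re-encoding avoids snapping cuts to keyframes."""
    return ["ffmpeg", "-y", "-i", src, "-ss", f"{start:.3f}", "-t", f"{dur:.3f}",
            "-an",                      # drop source audio; the track is muxed later
            "-c:v", "libx264", "-preset", "veryfast", out]

def concat_cmd(list_file: str, out: str) -> list[str]:
    """Stitch segments with the concat demuxer; segments share one
    encode, so streams can be copied without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", out]

def mux_cmd(video: str, audio: str, out: str) -> list[str]:
    """Single mux pass: copy video, encode the track to AAC,
    stop at the shorter stream."""
    return ["ffmpeg", "-y", "-i", video, "-i", audio,
            "-map", "0:v:0", "-map", "1:a:0",
            "-c:v", "copy", "-c:a", "aac", "-shortest", out]
```

Putting `-ss` after `-i` trades speed for accuracy: FFmpeg decodes up to the seek point instead of jumping to the nearest keyframe, which is what makes beat-aligned cuts land on the right frame.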
Long silent stretches are held through, not cut across.
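One way to implement that hold, sketched with NumPy: flag frames whose RMS sits far below the track's peak, merge them into regions, and drop any planned cut that lands inside one. The -40 dB threshold and 1-second minimum are assumptions:

```python
import numpy as np

def silent_regions(y: np.ndarray, sr: int, thresh_db: float = -40.0,
                   min_len_s: float = 1.0) -> list[tuple[float, float]]:
    """Return (start, end) times of silent stretches at least min_len_s long."""
    hop, win = 512, 2048
    frames = np.lib.stride_tricks.sliding_window_view(y, win)[::hop]
    rms = np.sqrt((frames ** 2).mean(axis=1))
    # dB relative to the loudest frame, floored to avoid log(0)
    db = 20 * np.log10(np.maximum(rms, 1e-10) / max(rms.max(), 1e-10))
    quiet = db < thresh_db
    regions, start = [], None
    for i, q in enumerate(quiet):
        if q and start is None:
            start = i
        elif not q and start is not None:
            if (i - start) * hop / sr >= min_len_s:
                regions.append((start * hop / sr, i * hop / sr))
            start = None
    if start is not None and (len(quiet) - start) * hop / sr >= min_len_s:
        regions.append((start * hop / sr, len(quiet) * hop / sr))
    return regions

def drop_cuts_in_silence(cuts: list[float],
                         regions: list[tuple[float, float]]) -> list[float]:
    """Hold the current shot through a silent stretch instead of cutting."""
    return [t for t in cuts if not any(s <= t < e for s, e in regions)]
```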
Stack
Python
librosa
scipy
NumPy
FFmpeg via subprocess
Docker multi-stage build with a static FFmpeg binary
pytest with 100 percent test coverage
ruff
mypy strict mode
pre-commit hooks
GitHub Actions CI pipeline that runs lint, typecheck, and test on every push
Posted Apr 20, 2026