Research prototype built during my internship at the Centre for Advanced Research Studies, GUNI.
A full end-to-end hybrid pipeline that generates music from a single text prompt: a Transformer-based symbolic planner generates the MIDI structure, a Mel-spectrogram diffusion U-Net renders it to audio, and Direct Preference Optimization (DPO) aligns the outputs with human preferences.
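For readers unfamiliar with DPO, the sketch below shows the standard per-pair DPO objective in plain Python. It is illustrative only and is not taken from this project's code. It assumes that, for each preference pair, you already have the log-likelihood of the preferred ("chosen") and dispreferred ("rejected") output under both the model being trained and a frozen reference copy. The function name `dpo_loss` and the default `beta=0.1` are hypothetical choices. How those log-likelihoods are obtained for the symbolic planner or the diffusion renderer is not covered here.

```python
import math


def dpo_loss(policy_logp_chosen: float,
             policy_logp_rejected: float,
             ref_logp_chosen: float,
             ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * margin).

    Hypothetical helper for illustration; inputs are log-likelihoods of the
    chosen/rejected outputs under the trainable policy and a frozen reference.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen output over the rejected one.
    margin = (policy_logp_chosen - ref_logp_chosen) - (
        policy_logp_rejected - ref_logp_rejected
    )
    z = beta * margin
    # -log(sigmoid(z)) == softplus(-z), written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log 2.
    print(dpo_loss(0.0, 0.0, 0.0, 0.0))
    # Policy shifted toward the chosen output: loss drops below log 2.
    print(dpo_loss(-1.0, -5.0, -3.0, -3.0))
```

When the policy matches the reference, the loss is log 2 (about 0.693). It falls as the policy shifts probability toward the chosen output relative to the reference, and `beta` controls how strongly that deviation is rewarded.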
This is research-grade work, not a tutorial clone. Architecture decisions, compute tradeoffs, and qualitative evaluation were all done hands-on. Research paper in progress.