Designed and implemented a multimodal pipeline integrating LoRA fine-tuned CLIP and Stable Diffusion v1.5, achieving 21% higher SSIM and 25% higher PSNR over baseline models in forensic sketch generation.
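For readers unfamiliar with the metric, PSNR can be computed as in the minimal sketch below. The project write-up does not say how its metrics were computed, so the function name `psnr`, the nested-list grayscale image format, and the 8-bit `max_value` default are all illustrative assumptions rather than the project's evaluation code.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-sized grayscale images.

    Images are assumed to be nested lists of pixel intensities (hypothetical
    format chosen to keep the example dependency-free).
    """
    ref = [p for row in reference for p in row]
    gen = [p for row in generated for p in row]
    if len(ref) != len(gen):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, gen)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the generated sketch is closer to the reference pixel by pixel; SSIM and LPIPS capture structural and perceptual similarity and would normally come from a dedicated library.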
Fine-tuned CLIP using LoRA on self- and cross-attention layers, improving text-sketch alignment by 9% and reducing perceptual error (LPIPS) across iterations, as validated through ablation studies.
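The general LoRA idea can be illustrated with a small, dependency-free sketch: a frozen weight matrix W is augmented with a trainable low-rank product, so the layer computes Wx + (alpha / r) * B(Ax). The class name `LoRALinear`, the `rank` and `alpha` defaults, and the initialisation scale are assumptions for illustration; the project's actual adapter configuration is not given in the write-up.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = random.Random(seed)
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight          # frozen pretrained weights, never updated
        self.scale = alpha / rank     # standard LoRA scaling factor
        # A starts small and random, B starts at zero, so the adapter is a
        # no-op before training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(out_dim)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def trainable_parameters(self):
        # Only A and B are trained: rank * (in_dim + out_dim) values.
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

Because only A and B are trained, the number of trainable parameters grows with the rank rather than with the full weight matrix. In practice an adapter library would typically be used instead of a hand-written layer.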
Developed an iterative refinement process that dynamically updates embeddings and prompts, producing cumulative quality improvements in generated sketches over 5 refinement cycles and enhancing investigative reliability.
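One way such a refinement loop could be structured is sketched below. The write-up does not describe the actual update rule, so the `refine` function, the `generate`, `score`, and `revise` callables, and the keep-the-best acceptance rule are all hypothetical stand-ins for the generation model, the quality metric, and the prompt or embedding update.

```python
def refine(prompt, generate, score, revise, cycles=5):
    """Run a fixed number of refinement cycles, keeping the best result so far.

    generate(prompt) -> sketch, score(sketch) -> float (higher is better),
    revise(prompt, sketch) -> new prompt. All three are caller-supplied
    placeholders, not the project's real components.
    """
    best_sketch = generate(prompt)
    best_score = score(best_sketch)
    history = [best_score]
    for _ in range(cycles):
        candidate_prompt = revise(prompt, best_sketch)
        candidate = generate(candidate_prompt)
        candidate_score = score(candidate)
        # Accept the revision only if it improves the score, so quality
        # never regresses from one cycle to the next.
        if candidate_score > best_score:
            prompt, best_sketch, best_score = candidate_prompt, candidate, candidate_score
        history.append(best_score)
    return prompt, best_sketch, history
```

The returned `history` holds one score per cycle, which makes it easy to plot whether improvements really accumulate across the 5 cycles.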
Posted Feb 7, 2025
AI-driven police sketch generation using Stable Diffusion and a fine-tuned CLIP for stronger text-sketch alignment, with iterative refinement to improve accuracy.