The server-side engine behind a text-to-speech SaaS. I architected the backend in Python and FastAPI, deployed it on Cloudflare Workers, and integrated ElevenLabs and other AI voice models, with a focus on reliably generating, streaming, and storing large audio files at scale.
The Challenge
AI voice generation is heavy: requests are slow, audio files are large, and users expect a smooth experience anyway. The client needed a backend that could orchestrate calls to multiple AI voice providers, handle big audio payloads without timing out, and stay responsive as usage grew.
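One common way to keep slow provider calls from stalling a request is to bound each call with a timeout and fall through to the next provider. The sketch below illustrates that idea only; the case study does not say the project used fallback, and `synthesize_with_fallback`, `slow_provider`, and `fast_provider` are hypothetical names standing in for real provider adapters.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Sequence

# Hypothetical adapter signature: takes text, returns encoded audio bytes.
Synthesize = Callable[[str], Awaitable[bytes]]


async def synthesize_with_fallback(
    text: str,
    providers: Sequence[tuple[str, Synthesize]],
    timeout_s: float = 30.0,
) -> tuple[str, bytes]:
    """Try providers in order; return (name, audio) from the first that succeeds."""
    errors: list[str] = []
    for name, synthesize in providers:
        try:
            # Bound each call so one slow provider cannot hold the request open.
            audio = await asyncio.wait_for(synthesize(text), timeout=timeout_s)
            return name, audio
        except Exception as exc:  # timeouts and provider errors alike
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in providers for illustration; real ones would call an HTTP API.
async def slow_provider(text: str) -> bytes:
    await asyncio.sleep(1.0)
    return b"slow"


async def fast_provider(text: str) -> bytes:
    return b"audio:" + text.encode()


if __name__ == "__main__":
    chain = [("slow", slow_provider), ("fast", fast_provider)]
    print(asyncio.run(synthesize_with_fallback("hi", chain, timeout_s=0.05)))
```

The timeout value and provider order are assumptions; in practice they would depend on each provider's latency and cost.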
What I Built
A Python / FastAPI backend orchestrating text-to-speech generation end to end
Integration with ElevenLabs and other leading AI voice models behind a unified API
Cloudflare Workers for edge delivery and scalable request handling
Robust handling of large audio files — generation, streaming, and storage
A TypeScript-friendly API contract for clean front-end integration
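As a rough illustration of the "unified API" and streaming points above, the sketch below puts provider adapters behind one interface and yields audio in chunks rather than buffering the whole file. It is a minimal sketch, not the project's code: `VoiceProvider`, `FakeProvider`, `PROVIDERS`, and `generate` are all hypothetical names, and the fake adapter stands in for a real call to a provider such as ElevenLabs.

```python
from __future__ import annotations

import asyncio
from typing import AsyncIterator, Protocol


class VoiceProvider(Protocol):
    """Hypothetical interface that every provider adapter implements."""

    def stream(self, text: str, voice: str) -> AsyncIterator[bytes]: ...


class FakeProvider:
    """Stand-in adapter; a real one would stream from the provider's HTTP API."""

    def __init__(self, payload: bytes, chunk_size: int = 4) -> None:
        self.payload = payload
        self.chunk_size = chunk_size

    async def stream(self, text: str, voice: str) -> AsyncIterator[bytes]:
        for start in range(0, len(self.payload), self.chunk_size):
            yield self.payload[start:start + self.chunk_size]
            await asyncio.sleep(0)  # let other requests run between chunks


# Registry keyed by provider name (assumed; populated with the fake adapter here).
PROVIDERS: dict[str, VoiceProvider] = {"fake": FakeProvider(b"0123456789")}


async def generate(provider: str, text: str, voice: str) -> AsyncIterator[bytes]:
    """Single entry point: look up the adapter, then relay its chunks unchanged."""
    try:
        adapter = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider}") from None
    async for chunk in adapter.stream(text, voice):
        yield chunk


async def collect(provider: str) -> list[bytes]:
    return [chunk async for chunk in generate(provider, "hello", "default")]


if __name__ == "__main__":
    print(asyncio.run(collect("fake")))
```

In a FastAPI route, an async generator like `generate` can be passed to `StreamingResponse` with an audio media type, so the client starts receiving bytes before generation finishes. Whether the project wires it this way is not stated in the write-up.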
Tech Stack
Python, FastAPI, Cloudflare Workers, ElevenLabs, and TypeScript.
Outcome
The SaaS launched with a backend that reliably turns text into high-quality speech at scale: it abstracts multiple AI voice providers behind one clean API and handles large audio workloads without timing out.
Posted Dec 22, 2025