Saraiva AI: WhatsApp Voice to Text Automation

Guilherme Arquer Giacometti

Saraiva AI is an experimental automation project that turns WhatsApp voice messages into intelligent text summaries using AI and no-code orchestration.
Built with n8n, Groq (Whisper Large v3), and Google Gemini, the workflow receives audio messages through the WhatsApp API (Evolution), validates permissions, and routes each message through a contextual processing pipeline.
The system automatically:
Detects whether the message comes from a private or group chat.
Applies different AI treatments depending on the duration and type of audio.
Transcribes the message using Whisper Large v3 hosted on Groq for ultra-fast inference.
Processes the text through a custom framework (OGRT – Optimization, Hook, Summary, Transcription).
Returns a structured, concise response directly to the WhatsApp user.
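The first step in the list, telling private chats from group chats, can be sketched from the chat identifier alone: WhatsApp group identifiers (JIDs) end in `@g.us`, while one-to-one chats end in `@s.whatsapp.net`. The sketch below assumes the webhook payload carries the identifier at `data.key.remoteJid`; that path, the function name `chat_type`, and the example JID are illustrative assumptions, not details taken from the project's workflow.

```python
def chat_type(payload: dict) -> str:
    """Classify a webhook payload as 'group', 'private', or 'unknown'."""
    # Assumed payload path: data.key.remoteJid (not confirmed by the project).
    jid = payload.get("data", {}).get("key", {}).get("remoteJid", "")
    if jid.endswith("@g.us"):
        return "group"
    if jid.endswith("@s.whatsapp.net"):
        return "private"
    return "unknown"


# Hypothetical payload, trimmed to the one field the function reads.
example = {"data": {"key": {"remoteJid": "120363000000000000@g.us"}}}
print(chat_type(example))  # group
```

Returning `unknown` instead of guessing lets a later step drop or log payloads that match neither pattern.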
Short audio messages (≤40 s) are summarized by an assistant model (“Pangeia”), which produces quick, actionable insights. Long audio messages (>40 s) are processed by the “Ultron” model, which uses the OGRT method to generate a deeper analysis and a formatted transcription.
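The duration rule above reduces to a single threshold comparison. The following is a minimal sketch of that routing, assuming the audio length is already available in seconds; the function name `pick_assistant` and the constant name are hypothetical, while the 40-second cutoff and the two assistant names come from the description.

```python
SHORT_AUDIO_MAX_SECONDS = 40  # cutoff stated in the project description


def pick_assistant(duration_seconds: float) -> str:
    """Return the assistant that should handle an audio of this length."""
    if duration_seconds < 0:
        raise ValueError("duration cannot be negative")
    # Up to and including 40 s goes to the short-audio summarizer.
    if duration_seconds <= SHORT_AUDIO_MAX_SECONDS:
        return "Pangeia"
    # Anything longer gets the OGRT treatment.
    return "Ultron"


print(pick_assistant(12))  # Pangeia
print(pick_assistant(95))  # Ultron
```

In n8n the same decision would typically be an IF or Switch node rather than code; the function only makes the boundary case explicit (exactly 40 s counts as short).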
Security and governance controls include:
Group permission validation and sender identification filters.
Base64 verification and timeout control for large files.
An LGPD-compliant design (Brazil's General Data Protection Law), with encryption across the workflow.
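Two of the controls listed above, group permission validation and Base64 verification, can be illustrated with a short sketch. Everything specific here is an assumption for illustration: the allowlist contents, the 16 MB size cap, the rule that private chats are always accepted, and the function names. Timeout control is not shown, since it would normally be configured on the HTTP request step itself.

```python
import base64
import binascii

# Hypothetical allowlist of group JIDs permitted to use the bot.
ALLOWED_GROUPS = {"120363000000000000@g.us"}
# Assumed size cap; the project does not state its actual limit.
MAX_AUDIO_BYTES = 16 * 1024 * 1024


def is_allowed(remote_jid: str) -> bool:
    """Accept private chats; accept groups only if they are allowlisted."""
    if remote_jid.endswith("@g.us"):
        return remote_jid in ALLOWED_GROUPS
    return True


def decode_audio(b64_payload: str) -> bytes:
    """Decode a Base64 audio payload, rejecting malformed or oversized data."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        audio = base64.b64decode(b64_payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid Base64") from exc
    if not audio:
        raise ValueError("payload is empty")
    if len(audio) > MAX_AUDIO_BYTES:
        raise ValueError("audio exceeds the size limit")
    return audio
```

Checking the payload before it reaches the transcription step keeps malformed or oversized files from consuming API calls.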
The system was designed to save time, improve accessibility, and demonstrate how AI can transform everyday communication into structured, useful information, using only no-code tools.
Technologies Used
n8n (workflow orchestration)
Groq (Whisper Large v3 for transcription)
Google Gemini (NLP processing)
WhatsApp API (Evolution)
OGRT framework (Optimization, Hook, Summary, Transcription)

Posted Oct 30, 2025

Automated WhatsApp voice message transcription and summarization using AI and no-code tools.