Serotonin Script: AI Medical Content Engine & RAG Pipeline by Serhii Lukash

AI Automation

Backend Engineer

Data Engineer

LlamaIndex

Python

Qdrant

🛠 Serotonin Script: Enterprise AI Medical Content Engine & RAG Pipeline

❌ The Problem

Healthcare professionals and medical platforms spend 20+ hours per week writing and editing social media content to maintain authority. Generic AI tools are a legal liability here: they can hallucinate medical facts (violating compliance guidelines) and produce robotic text that undermines the physician's credibility.

The Cost: $30K–$60K/year wasted on manual content creation and formatting overhead.

🛠 What I Built

An autonomous, enterprise-grade AI medical content engine that acts as a secure, style-preserving publisher. From a single /draft command inside a secure Slack channel, the system triggers real-time clinical fact-checking, replicates the physician's exact writing style, and handles automated distribution.

System Architecture — Comprehensive Request Lifecycle & RAG Pipeline

📊 Business Impact & ROI

⚙️ How It Works (The Value Pipeline)

1. Style-Preserving Hybrid RAG Loop

Nuance Matching: The system extracts the physician's vocabulary, sentence structure, and specific terminology from past posts using a Qdrant vector database with dense + sparse (BM25) search layers.
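How the dense and sparse result lists are merged is not described above. One common option is reciprocal rank fusion (RRF), sketched below under the assumption that each search layer returns an ordered list of post IDs. The IDs and the constant k = 60 are illustrative and not taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked ID lists into one list using RRF.

    Each ID scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the dense (embedding) and sparse (BM25) layers.
dense_hits = ["post_12", "post_07", "post_31"]
sparse_hits = ["post_07", "post_44", "post_12"]
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Posts that rank well in both layers (here `post_07` and `post_12`) rise to the top, which is the behaviour a style-matching retriever would want: passages that are both semantically close and share the physician's exact terminology.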

Feedback Loop: Approved and published posts are automatically vectorized back into the database, allowing the AI to continuously adapt to the doctor's evolving voice.

2. Clinical Real-Time Verification

Every claim is automatically cross-checked against published medical literature via the PubMed API (NCBI E-utilities). The system isolates medical claims from raw drafts, validates them against that literature, and generates precise citations before the post is sent for final approval.
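The lookup step can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes claims have already been reduced to search terms, uses the public ESearch endpoint with its `db`, `term`, `retmode`, and `retmax` parameters, and parses a made-up response body offline instead of calling the network. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(claim_terms, max_results=5):
    """Build an E-utilities ESearch URL that looks up a claim in PubMed."""
    params = {
        "db": "pubmed",
        "term": " AND ".join(claim_terms),
        "retmode": "json",
        "retmax": max_results,
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def extract_pmids(esearch_json):
    """Pull the PubMed IDs out of an ESearch JSON response body."""
    payload = json.loads(esearch_json)
    return payload.get("esearchresult", {}).get("idlist", [])


def format_citations(pmids):
    """Turn PMIDs into citation links for the approval message."""
    return [f"PMID {pmid}: https://pubmed.ncbi.nlm.nih.gov/{pmid}/" for pmid in pmids]


url = build_esearch_url(["sertraline", "major depressive disorder"])
# Trimmed, made-up response body in the ESearch JSON shape (placeholder PMIDs).
sample = '{"esearchresult": {"count": "2", "idlist": ["11111111", "22222222"]}}'
citations = format_citations(extract_pmids(sample))
```

A claim whose search returns an empty `idlist` would be a natural candidate for flagging in the approval message rather than publishing unverified.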

Slack Block Kit UI: Multi-stage verification status, medical source citations, and interactive actions

3. Intelligent Resilient Infrastructure

Dual-LLM Fault Tolerance: Uses Anthropic Claude 3.5 Sonnet for high-tier medical reasoning with an automated, low-latency OpenAI GPT-4o fallback router to survive API rate limits or downtime.
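A fallback router of this kind can be sketched in a few lines. The version below is an assumption-laden illustration: the provider callables are stubs standing in for the real Claude and GPT-4o clients, and `ProviderError`, `generate_with_fallback`, and the timeout value are hypothetical names and settings, not the project's.

```python
import asyncio


class ProviderError(Exception):
    """Raised by a provider stub on rate limits or downtime (hypothetical)."""


async def generate_with_fallback(prompt, providers, timeout_s=10.0):
    """Try each (name, async callable) in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            text = await asyncio.wait_for(call(prompt), timeout=timeout_s)
            return name, text
        except (ProviderError, asyncio.TimeoutError) as exc:
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stubs standing in for the real Claude and GPT-4o clients.
async def primary_stub(prompt):
    raise ProviderError("429 rate limited")


async def fallback_stub(prompt):
    return f"draft for: {prompt}"


used, draft = asyncio.run(
    generate_with_fallback(
        "SSRIs and sleep",
        [("claude", primary_stub), ("gpt-4o", fallback_stub)],
    )
)
```

Returning the provider name alongside the text makes it easy to log which model produced a given draft, which matters when the two models differ in medical reasoning quality.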

Cost-Efficient Context Stripping: Pre-parsing removes heavy HTML/DOM noise before sending data to the LLM, dramatically cutting token usage.
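As a rough illustration of the pre-parsing idea, the sketch below uses only the Python standard library to keep visible text and drop markup, scripts, and styles. The set of skipped tags and the sample HTML are assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping script, style, and navigation blocks."""

    SKIPPED = {"script", "style", "noscript", "nav", "footer"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def strip_html(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


raw = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<nav>Home</nav><p>Keep a <b>regular</b> sleep schedule.</p>"
    "<script>track()</script></body></html>"
)
clean = strip_html(raw)
```

In this sample the cleaned string is a small fraction of the raw input's length, which is where the token savings come from.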

4. One-Click Multi-Platform Publishing

Once approved via the interactive Slack interface, a self-hosted n8n orchestration layer securely posts the content directly to the X (Twitter) API v2, Threads API, and Telegram Bot API, with no manual copy-pasting.

Production Observability: Real-time Grafana dashboard tracking task distribution, worker queues, and Loki log aggregation

💻 Technical Foundation (For CTOs & Founders)

API Architecture: Async Python 3.13, FastAPI (SQLAlchemy 2.0 asyncpg sessions, Alembic)

Task Management: Async-native Taskiq worker pool with custom Redis brokers, achieving a memory footprint of just 50–80 MB per worker and sub-2-second container cold starts

Quality Assurance: Rigorous testing pipeline with 98% code coverage from automated Pytest tests

Observability: Prometheus metrics collection, Grafana visualization dashboards, and Loki structured log aggregation (monitoring token budgets, latency, and queue health)

Performance: API endpoint response latency is kept under 500 ms by immediately offloading tasks to background workers
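The offloading pattern behind that latency figure can be illustrated with a standard-library stand-in. This sketch deliberately uses `asyncio.Queue` in place of FastAPI and the Taskiq/Redis broker so that it runs on its own; the function names and the 0.05 s sleep are placeholders, not the project's code.

```python
import asyncio
import uuid


async def handle_draft_request(queue, topic):
    """Endpoint-side work: enqueue the job and return an ID immediately."""
    task_id = str(uuid.uuid4())
    await queue.put((task_id, topic))
    return {"task_id": task_id, "status": "queued"}


async def worker(queue, results):
    """Background worker: the slow generation step happens here."""
    while True:
        task_id, topic = await queue.get()
        await asyncio.sleep(0.05)  # placeholder for retrieval + LLM calls
        results[task_id] = f"draft about {topic}"
        queue.task_done()


async def main():
    queue, results = asyncio.Queue(), {}
    worker_task = asyncio.create_task(worker(queue, results))
    response = await handle_draft_request(queue, "sleep hygiene")
    # The response is ready before the worker has produced anything.
    pending_at_response = response["task_id"] not in results
    await queue.join()  # wait for the worker only to show the final result
    worker_task.cancel()
    return response, pending_at_response, results


response, pending_at_response, results = asyncio.run(main())
```

The endpoint's latency depends only on the enqueue step, while retrieval, verification, and generation run in the worker and report back later (in this system, presumably via the Slack message).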

n8n Orchestration Workflow

👥 Who Benefits From This System

Medical Clinics & Practices: Build dominant digital authority and attract patients with zero writing overhead and 100% legal compliance.

Healthcare SaaS Platforms: Embed verified, compliant AI content generation as a core scalable feature.

Individual Physicians: Scale medical education across major platforms without suffering burnout.

Posted May 22, 2026

Medical AI content RAG engine: Python 3.13, FastAPI, Qdrant. Real-time PubMed verification, Claude 3.5/GPT-4o fallback, async Taskiq & 98% code coverage.
