Serverless MLOps Pipeline for RAG

Mohsin Sheikhani

Cloud Infrastructure Architect

Software Engineer

AI Engineer

DocInsight

A Serverless MLOps Pipeline for Inference and Retrieval-Augmented Generation (RAG)

About the Project

DocInsight is a fully serverless, MLOps-enabled document intelligence system built on AWS. It automates the lifecycle of unstructured document processing: from extraction and semantic understanding to natural language querying and AI-generated answers.

This project implements a serverless MLOps pipeline for AI-driven document processing, automating text extraction, embedding generation, and natural language querying. It uses Amazon Textract, SageMaker, OpenSearch, Bedrock, EventBridge, and Step Functions, and is deployed with AWS CDK.

Architecture

| Component | Role |
| --- | --- |
| Amazon Textract | Extracts structured text from PDFs/images |
| Amazon SageMaker (Cohere v3) | Generates semantic embeddings |
| Amazon OpenSearch | Stores vectors for retrieval-based search |
| Amazon Bedrock (Claude) | Generates final natural language answers |
| Step Functions | Orchestrates the end-to-end workflow |
| Amazon EventBridge | Triggers Step Function on S3 upload |
| Amazon S3 | Stores uploaded documents |
| Dead Letter Queue | Captures failed EventBridge invocations |
| API Gateway + Lambda | Enables file upload and question answering |
| AWS CDK | Provisions and deploys the infrastructure |

Features

Upload Document:

Users upload PDFs or images through API Gateway to S3. EventBridge then automatically triggers the Step Functions workflow.
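As a rough illustration of the upload trigger, the sketch below builds an EventBridge event pattern for S3 "Object Created" events and checks a sample event against it locally. It assumes the bucket has EventBridge notifications enabled. The bucket name `docinsight-uploads` and the helper names are hypothetical, and the matcher covers only the exact-match case, not EventBridge's full pattern semantics.

```typescript
// Subset of the S3 "Object Created" event that EventBridge delivers.
interface S3ObjectCreatedEvent {
  source: string;
  "detail-type": string;
  detail: { bucket: { name: string }; object: { key: string } };
}

// Subset of an EventBridge event pattern (every field is a list of allowed values).
interface UploadEventPattern {
  source: string[];
  "detail-type": string[];
  detail: { bucket: { name: string[] } };
}

// Build a pattern that matches uploads to a single bucket.
// The bucket name is a placeholder, not the project's real bucket.
function buildUploadPattern(bucketName: string): UploadEventPattern {
  return {
    source: ["aws.s3"],
    "detail-type": ["Object Created"],
    detail: { bucket: { name: [bucketName] } },
  };
}

// Simplified local check that mirrors exact-match pattern semantics.
function matchesUploadPattern(evt: S3ObjectCreatedEvent, pattern: UploadEventPattern): boolean {
  const allows = (allowed: string[], value: string): boolean =>
    allowed.some((candidate) => candidate === value);
  return (
    allows(pattern.source, evt.source) &&
    allows(pattern["detail-type"], evt["detail-type"]) &&
    allows(pattern.detail.bucket.name, evt.detail.bucket.name)
  );
}

const uploadPattern = buildUploadPattern("docinsight-uploads");
const sampleUploadEvent: S3ObjectCreatedEvent = {
  source: "aws.s3",
  "detail-type": "Object Created",
  detail: { bucket: { name: "docinsight-uploads" }, object: { key: "invoice.pdf" } },
};
console.log(matchesUploadPattern(sampleUploadEvent, uploadPattern));
```

In a CDK stack, an object of this shape would typically be supplied as the rule's event pattern, with the state machine as the rule target.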

Step Function Workflow:

Starts an asynchronous Textract Job

Monitors and waits for job completion

Extracts and chunks text

Invokes SageMaker model to generate semantic embeddings

Stores embeddings in Amazon OpenSearch for semantic search
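The workflow steps above can be sketched as an Amazon States Language definition, written here as a plain TypeScript object. This is a minimal sketch under assumptions, not the project's actual state machine: the state names, the Lambda ARN placeholders, the 10-second wait, and the `$.jobStatus` field are all hypothetical. The small checker only confirms that every transition points at a defined state.

```typescript
// Minimal subset of Amazon States Language used by this sketch.
type AslState =
  | { Type: "Task"; Resource: string; Next?: string; End?: boolean }
  | { Type: "Wait"; Seconds: number; Next: string }
  | { Type: "Choice"; Choices: { Variable: string; StringEquals: string; Next: string }[]; Default: string }
  | { Type: "Fail"; Error: string };

interface AslDefinition {
  StartAt: string;
  States: { [stateName: string]: AslState };
}

// Placeholder ARN prefix: the real function ARNs are not given in the source.
const FN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:";

const docWorkflow: AslDefinition = {
  StartAt: "StartTextractJob",
  States: {
    StartTextractJob: { Type: "Task", Resource: FN + "startTextractJob", Next: "WaitForTextract" },
    WaitForTextract: { Type: "Wait", Seconds: 10, Next: "CheckTextractStatus" },
    CheckTextractStatus: { Type: "Task", Resource: FN + "checkTextractStatus", Next: "IsJobDone" },
    IsJobDone: {
      Type: "Choice",
      Choices: [
        { Variable: "$.jobStatus", StringEquals: "SUCCEEDED", Next: "ExtractAndChunk" },
        { Variable: "$.jobStatus", StringEquals: "FAILED", Next: "TextractFailed" },
      ],
      Default: "WaitForTextract", // still IN_PROGRESS: loop back and wait again
    },
    ExtractAndChunk: { Type: "Task", Resource: FN + "extractAndChunk", Next: "GenerateEmbeddings" },
    GenerateEmbeddings: { Type: "Task", Resource: FN + "generateEmbeddings", Next: "StoreInOpenSearch" },
    StoreInOpenSearch: { Type: "Task", Resource: FN + "storeEmbeddings", End: true },
    TextractFailed: { Type: "Fail", Error: "TextractJobFailed" },
  },
};

// Collect transition targets that do not name a defined state.
function findDanglingTransitions(def: AslDefinition): string[] {
  const targets: string[] = [def.StartAt];
  for (const stateName of Object.keys(def.States)) {
    const state = def.States[stateName];
    if (state.Type === "Choice") {
      for (const choice of state.Choices) targets.push(choice.Next);
      targets.push(state.Default);
    } else if (state.Type !== "Fail" && state.Next !== undefined) {
      targets.push(state.Next);
    }
  }
  return targets.filter((target) => !(target in def.States));
}

// An empty list means every transition resolves to a defined state.
console.log(findDanglingTransitions(docWorkflow));
```

The Wait, Task, Choice loop is one common way to poll an asynchronous job from Step Functions. A callback or notification-based design would be an alternative, and the source does not say which one DocInsight uses.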

Semantic Search:

Converts user questions into embeddings → finds the most relevant document chunks.
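To make the ranking idea concrete, here is a small local sketch of similarity search over embedded chunks. In the deployed system this work would be done by OpenSearch vector search. The cosine-similarity metric, the toy 3-dimensional vectors, and the names `EmbeddedChunk` and `topKChunks` are assumptions for illustration only.

```typescript
interface EmbeddedChunk {
  text: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose embeddings are most similar to the query embedding.
function topKChunks(queryEmbedding: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy data: real embeddings from the model would have far more dimensions.
const toyChunks: EmbeddedChunk[] = [
  { text: "Office address and contact details", embedding: [0, 0, 1] },
  { text: "Invoice total is 120 USD", embedding: [0.9, 0.1, 0] },
  { text: "Payment is due in 30 days", embedding: [0.2, 0.9, 0.1] },
];
const toyQueryEmbedding = [1, 0, 0];
console.log(topKChunks(toyQueryEmbedding, toyChunks, 2).map((c) => c.text));
```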

LLM Response Generation:

Uses Amazon Bedrock (Claude) to generate context-aware answers based on the retrieved document context.
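One plausible way to assemble the generation request is sketched below: the retrieved chunks are numbered, placed into a prompt, and wrapped in the Anthropic Messages request body that Bedrock accepts for Claude models. The prompt wording, the `maxTokens` default, and the function name are assumptions. The source does not show the project's actual prompt or model ID.

```typescript
interface ClaudeRequestBody {
  anthropic_version: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
}

// Build a request body that asks Claude to answer using only the retrieved chunks.
function buildClaudeRequestBody(
  question: string,
  contextChunks: string[],
  maxTokens: number = 512
): ClaudeRequestBody {
  // Number each chunk so the answer can refer back to its source passage.
  const context = contextChunks.map((chunk, i) => `[${i + 1}] ${chunk}`).join("\n");
  const prompt =
    "Answer the question using only the context below. " +
    "If the context does not contain the answer, say so.\n\n" +
    `Context:\n${context}\n\nQuestion: ${question}`;
  return {
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: maxTokens,
    messages: [{ role: "user", content: prompt }],
  };
}

const sampleRequestBody = buildClaudeRequestBody("What is the invoice total?", [
  "Invoice total is 120 USD",
  "Payment is due in 30 days",
]);
console.log(JSON.stringify(sampleRequestBody, null, 2));
```

The serialized body would then be passed to the Bedrock runtime's invoke-model call from the `/ask` Lambda. Keeping prompt construction in a pure function like this makes it easy to unit test without calling AWS.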

Fault Tolerance:

EventBridge failures are routed to a Dead Letter Queue (DLQ) for inspection and retries.
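For inspection, a DLQ consumer could summarize each failed delivery along the lines below. This sketch assumes the DLQ is an SQS queue read by a Lambda function, and that EventBridge attaches message attributes such as `ERROR_CODE`, `ERROR_MESSAGE`, and `RETRY_ATTEMPTS` to the dead-lettered event. The exact attribute set should be checked against the EventBridge documentation. The type and function names are hypothetical.

```typescript
// Subset of an SQS record as delivered to a Lambda consumer.
interface DlqRecord {
  body: string; // the original event that could not be delivered
  messageAttributes: { [attrName: string]: { stringValue?: string; dataType: string } };
}

interface FailedDeliverySummary {
  errorCode: string;
  errorMessage: string;
  retryAttempts: number;
  objectKey: string | null; // S3 key of the upload, when it can be recovered
}

function summarizeFailedDelivery(record: DlqRecord): FailedDeliverySummary {
  // Read one attribute, falling back to "UNKNOWN" when it is absent.
  const attr = (attrName: string): string => {
    const entry = record.messageAttributes[attrName];
    return entry && entry.stringValue !== undefined ? entry.stringValue : "UNKNOWN";
  };

  let objectKey: string | null = null;
  try {
    const original = JSON.parse(record.body);
    if (original && original.detail && original.detail.object &&
        typeof original.detail.object.key === "string") {
      objectKey = original.detail.object.key;
    }
  } catch (_err) {
    // Leave objectKey as null when the body is not valid JSON.
  }

  const attempts = parseInt(attr("RETRY_ATTEMPTS"), 10);
  return {
    errorCode: attr("ERROR_CODE"),
    errorMessage: attr("ERROR_MESSAGE"),
    retryAttempts: isNaN(attempts) ? 0 : attempts,
    objectKey,
  };
}

const sampleDlqRecord: DlqRecord = {
  body: JSON.stringify({ source: "aws.s3", detail: { object: { key: "invoice.pdf" } } }),
  messageAttributes: {
    ERROR_CODE: { stringValue: "SAMPLE_ERROR", dataType: "String" },
    ERROR_MESSAGE: { stringValue: "Sample failure reason", dataType: "String" },
    RETRY_ATTEMPTS: { stringValue: "3", dataType: "Number" },
  },
};
console.log(summarizeFailedDelivery(sampleDlqRecord));
```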

Use Cases

Document Q&A for Enterprises

Medical/Financial/Legal document processing

Internal knowledge base automation

API Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /upload | POST | Upload a binary PDF/image file via API Gateway to S3 |
| /ask | POST | Accepts a natural language question and returns an AI-generated answer |
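The request schema for `/ask` is not documented here, so the following is only a sketch of how the Lambda behind it might validate its input. It assumes a JSON body with a single `question` field. That field name, the error messages, and the helper name are hypothetical.

```typescript
interface AskRequest {
  question: string;
}

type AskParseResult =
  | { ok: true; value: AskRequest }
  | { ok: false; statusCode: number; error: string };

// Validate the raw request body that API Gateway passes to the Lambda handler.
function parseAskRequest(rawBody: string | null): AskParseResult {
  if (!rawBody) {
    return { ok: false, statusCode: 400, error: "Request body is required" };
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch (_err) {
    return { ok: false, statusCode: 400, error: "Body must be valid JSON" };
  }
  const question =
    typeof parsed === "object" && parsed !== null
      ? (parsed as { question?: unknown }).question
      : undefined;
  if (typeof question !== "string" || question.trim().length === 0) {
    return { ok: false, statusCode: 400, error: "Field 'question' must be a non-empty string" };
  }
  return { ok: true, value: { question: question.trim() } };
}

console.log(parseAskRequest('{"question": "What is the invoice total?"}'));
console.log(parseAskRequest("{}"));
```

Under that assumption, a client would POST a body such as `{"question": "What is the invoice total?"}` to `/ask` and receive the generated answer in the response.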

Demo

Sample Upload

Sample Response Generation

Getting Started

Prerequisites

AWS CLI + CDK configured

Node.js + TypeScript environment

Access to the embedding model (Cohere v3 on SageMaker) and to Claude on Amazon Bedrock for response generation

Fill in the .env file with the variables used for the OpenSearch domain

Clone the Repository

git clone https://github.com/mohsinsheikhani/DocInsight.git
cd DocInsight

Setup

cd docinsight
npm install
cdk deploy

Posted Sep 9, 2025

This project is a serverless MLOps RAG pipeline for AI-driven document processing, automating text extraction, refinement, and analysis.

Timeline

Feb 3, 2025 - Feb 27, 2025