PDF Chatbot on AWS Lambda

Raj Wankhede

Overview

This guide outlines the steps to set up a PDF chatbot on AWS Lambda using OpenAI, Azure OpenAI, or Bedrock. The process involves creating a VPC, setting up an RDS PostgreSQL database, creating Lambda layers and functions, and testing the application with Postman.

Mandatory prerequisite steps for using this repository

1. Create VPC

Create a VPC with one private and one public subnet.

2. Create RDS PostgreSQL DB

Create an RDS PostgreSQL DB inside the VPC from step 1, placed in the private subnet.
Note down the "DB name", "Master username", "Master password" and "Endpoint URL"; they are needed later when configuring the Lambda functions.
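
The four values noted above are typically passed to the Lambda function as environment variables and read at connection time. The following is a minimal sketch of that pattern, not the repository's actual code. The variable names (DB_NAME, DB_USER, DB_PASSWORD, DB_HOST, DB_PORT) are hypothetical; use the names given in the README of the respective function.

```python
import os


def db_settings(env=os.environ):
    """Collect PostgreSQL connection settings from environment variables.

    The variable names below are hypothetical placeholders, not the names
    used by the repository.
    """
    return {
        "dbname": env["DB_NAME"],          # "DB name"
        "user": env["DB_USER"],            # "Master username"
        "password": env["DB_PASSWORD"],    # "Master password"
        "host": env["DB_HOST"],            # "Endpoint URL"
        "port": int(env.get("DB_PORT", "5432")),  # PostgreSQL default port
    }


# Inside a Lambda function that has the psycopg2 layer attached:
#   import psycopg2
#   conn = psycopg2.connect(**db_settings())
```

Because the database sits in the private subnet, this connection only succeeds from a Lambda function that has VPC access configured (step 7).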

3. Clone GitHub Repository

Clone (or download) the GitHub repository and navigate to the 00-Lambda-Layers folder, which contains 3 zip files.

4. Create Lambda Layer

Go to Lambda Layers (in the same region where the Lambda functions will be created) and click on "Create Layer".
Provide a Name and a Description (optional), and select the "Upload a .zip file" option (or use S3). Choose Layer-01-Flask-langchain-openai.zip.
Under Compatible architectures, tick x86_64.
Under Compatible runtimes, select Python 3.11.
Click Create.
Repeat the above steps for the other zip files: Layer-02-pinecone-psycopg2-PyPDF2-tqdm-Werkzeug-tiktoken.zip and Layer-03-PyMuPDF.zip.
This creates three Lambda layers, one per zip file.
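
The console steps above can also be scripted. The sketch below assumes a boto3 Lambda client is passed in and that the three zip files sit in a local directory; deriving the layer name from the file name is an assumption of this example, not something the repository prescribes. Large archives may need to be uploaded through S3 instead of inline.

```python
from pathlib import Path

# Zip files shipped in the 00-Lambda-Layers folder.
LAYER_ZIPS = [
    "Layer-01-Flask-langchain-openai.zip",
    "Layer-02-pinecone-psycopg2-PyPDF2-tqdm-Werkzeug-tiktoken.zip",
    "Layer-03-PyMuPDF.zip",
]


def layer_name(zip_name):
    """Use the file name without its extension as the layer name (assumption)."""
    return Path(zip_name).stem


def publish_layers(lambda_client, layers_dir):
    """Publish one layer version per zip file and return the version ARNs."""
    arns = []
    for zip_name in LAYER_ZIPS:
        response = lambda_client.publish_layer_version(
            LayerName=layer_name(zip_name),
            Content={"ZipFile": (Path(layers_dir) / zip_name).read_bytes()},
            CompatibleRuntimes=["python3.11"],
            CompatibleArchitectures=["x86_64"],
        )
        arns.append(response["LayerVersionArn"])
    return arns


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   arns = publish_layers(boto3.client("lambda"), "00-Lambda-Layers")
```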

5. Create Lambda Function

Create one Lambda function for each application directory in the GitHub repo (ManualUpload, OpenAI, AzureOpenAI, Bedrock; skip the Lambda-Layers directory), using the Python 3.11 runtime and the x86_64 architecture. Leave all other settings at their defaults.

6. Add Layers to Lambda Function

In the Lambda function, open the “Code” tab, scroll down to the “Layers” section and click on “Add a layer”.
Select “Custom layers” and choose the layers created in step 4.
Repeat this for each of the other Lambda functions created in step 5.
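
Attaching the layers can be scripted as well. This is a minimal sketch that assumes a boto3 Lambda client and a list of layer version ARNs (for example, the ones shown in the console after step 4); the function names in the usage comment are placeholders.

```python
def attach_layers(lambda_client, function_names, layer_arns):
    """Attach the same layer versions to every function in function_names."""
    for name in function_names:
        # Layers replaces the function's entire layer list, so always pass
        # every ARN the function should have, not only the new ones.
        lambda_client.update_function_configuration(
            FunctionName=name,
            Layers=list(layer_arns),
        )


# Usage, assuming boto3 is installed; the function names are placeholders:
#   import boto3
#   attach_layers(boto3.client("lambda"), ["my-openai-function"], arns)
```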

7. Configuration on all Lambda Functions

Change the timeout to 15 minutes and the memory to 512 MB:
Create the environment variables common to all the functions:
Additional environment variables for the OpenAI Lambda function:
Additional environment variables for the Azure OpenAI Lambda function:
Additional environment variables for the Bedrock Lambda function:
Enable the Function URL:
Change the Lambda handler:
Grant the Lambda execution role access to EC2/S3/RDS:
Configure VPC access for the Lambda function:
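
Two of the settings above, the timeout/memory change and the Function URL, can be applied with boto3 as sketched below. The client is passed in, and the auth type is an assumption of this example (the guide does not state which one to use). The environment variables, handler name, IAM policies and VPC settings are specific to each function and are not covered here.

```python
def base_configuration(function_name):
    """Timeout and memory settings from step 7."""
    return {
        "FunctionName": function_name,
        "Timeout": 15 * 60,  # seconds; 15 minutes is Lambda's maximum
        "MemorySize": 512,   # MB
    }


def apply_base_configuration(lambda_client, function_name, auth_type="AWS_IAM"):
    """Apply timeout/memory, then enable a Function URL and return it.

    auth_type is an assumption: "AWS_IAM" requires signed requests, while
    "NONE" makes the URL public.
    """
    lambda_client.update_function_configuration(**base_configuration(function_name))
    # Wait for the update to finish before changing the function again.
    lambda_client.get_waiter("function_updated").wait(FunctionName=function_name)
    response = lambda_client.create_function_url_config(
        FunctionName=function_name,
        AuthType=auth_type,
    )
    return response["FunctionUrl"]


# Usage, assuming boto3 is installed; the function name is a placeholder:
#   import boto3
#   url = apply_base_configuration(boto3.client("lambda"), "my-openai-function")
```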

8. Create S3 Bucket

Create an S3 bucket (e.g., use-s3-bucket-as-input) and create 2 folders under it: processed_files/ and uploaded_files/.
Example:
s3://use-s3-bucket-as-input/uploaded_files/
s3://use-s3-bucket-as-input/processed_files/

9. Organize Files in S3 Bucket

Under the uploaded_files/ folder of the bucket, create a folder and a sub-folder within that folder. These correspond to the “user_id” and “deployment_id” values (sent via request body/form-data).
Example:
s3://use-s3-bucket-as-input/uploaded_files/user-1234/dep-1234/

10. Upload PDF Files to S3 Bucket

Upload the PDF files to the folder s3://<bucket-name>/uploaded_files/<user_id>/<deployment_id>/.
Example:
s3://use-s3-bucket-as-input/uploaded_files/user-1234/dep-1234/test-1.pdf
s3://use-s3-bucket-as-input/uploaded_files/user-1234/dep-1234/test-2.pdf
s3://use-s3-bucket-as-input/uploaded_files/user-1234/dep-1234/test-3.pdf
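
The key layout from steps 8 to 10 can be expressed as a small helper. This is a sketch under the assumption that a boto3 S3 client is passed in; the helper names are made up for this example. S3 has no real folders, so uploading an object under the full key creates the user and deployment "folders" implicitly.

```python
from pathlib import Path


def upload_key(user_id, deployment_id, local_path):
    """Build the object key uploaded_files/<user_id>/<deployment_id>/<file name>."""
    return "/".join(["uploaded_files", user_id, deployment_id, Path(local_path).name])


def upload_pdfs(s3_client, bucket, user_id, deployment_id, local_paths):
    """Upload each local PDF under the user's deployment prefix; return the keys."""
    keys = []
    for path in local_paths:
        key = upload_key(user_id, deployment_id, path)
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys


# Usage, assuming boto3 is installed and AWS credentials are configured:
#   import boto3
#   upload_pdfs(boto3.client("s3"), "use-s3-bucket-as-input",
#               "user-1234", "dep-1234", ["test-1.pdf", "test-2.pdf"])
```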

11. Other Configurations

Refer to the README.md file in the respective folder (01-ManualUpload, 02-OpenAI, 03-AzureOpenAI) for additional configuration specific to each application, then test it using Postman.
Posted Jan 29, 2024
