
Copy the `example.env` template into a `.env` file: first create the file, then move it into the main folder of the project in Google Colab (in my case, `privateGPT`). A sketch of this step appears at the end of the post.

Because of the way LangChain loads the SentenceTransformers embeddings, the first time you run the script it will require an internet connection to download the embeddings model itself.

Put any and all of your files into the `source_documents` directory. The supported extensions are:

- `.csv`: CSV
- `.docx`: Word Document
- `.doc`: Word Document
- `.enex`: EverNote
- `.eml`: Email
- `.epub`: EPub
- `.html`: HTML File
- `.md`: Markdown
- `.msg`: Outlook Message
- `.odt`: Open Document Text
- `.pdf`: Portable Document Format (PDF)
- `.pptx`: PowerPoint Document
- `.ppt`: PowerPoint Document
- `.txt`: Text file (UTF-8)

Ingesting creates a `db` folder containing the local vectorstore. It will take 20-30 seconds per document, depending on the size of the document. You can ingest as many documents as you want, and all will be accumulated in the local embeddings database. If you want to start from an empty database, delete the `db` folder.

Type `exit` to finish the script, or simply interrupt the cell in the case of Google Colab. To see all available options, run `python privateGPT.py --help` in your terminal.

By selecting the right local models and using the power of LangChain, you can run the entire pipeline locally, without any data leaving your environment, and with reasonable performance:

- `ingest.py` uses LangChain tools to parse the documents and create embeddings locally using HuggingFaceEmbeddings (SentenceTransformers). It then stores the result in a local vector database using the Chroma vector store.
- `privateGPT.py` uses a local LLM based on GPT4All-J or LlamaCpp to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.

The GPT4All-J wrapper was introduced in LangChain 0.0.162.
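As a concrete illustration of the `.env` step, here is a minimal sketch of how one could create the file directly from a Colab cell. The variable names mirror my recollection of the project's `example.env` (names such as `PERSIST_DIRECTORY` and `MODEL_PATH`, the model file, and the `/content/privateGPT` path are all assumptions here), so check your own copy of `example.env` for the authoritative list.

```python
# A minimal sketch: write the .env file from inside a Colab cell.
# The variable names and values below are assumptions based on
# privateGPT's example.env template -- verify against your own copy.
from pathlib import Path

env_contents = """\
PERSIST_DIRECTORY=db
MODEL_TYPE=GPT4All
MODEL_PATH=models/ggml-gpt4all-j-v1.3-groovy.bin
EMBEDDINGS_MODEL_NAME=all-MiniLM-L6-v2
MODEL_N_CTX=1000
TARGET_SOURCE_CHUNKS=4
"""

# Assumed project root in Colab; adjust to wherever you cloned the repo.
project_root = Path("/content/privateGPT")
(project_root / ".env").write_text(env_contents)
print((project_root / ".env").read_text())
```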
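To make the ingestion step concrete, here is a condensed sketch of what `ingest.py` does under the hood, assuming LangChain's 0.0.x module layout. The sample file name, chunk sizes, and the `all-MiniLM-L6-v2` model name are illustrative assumptions, not necessarily the project's exact values.

```python
# A condensed sketch of the ingestion pipeline (cf. ingest.py),
# assuming LangChain's 0.0.x module layout.
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma

# Load a single .txt document; the real script maps each supported
# extension to its own LangChain loader (CSV, PDF, and so on).
documents = TextLoader("source_documents/sample.txt", encoding="utf8").load()

# Split the text into overlapping chunks so each embedding covers a
# retrievable piece of context.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

# Embed locally with SentenceTransformers; the first run downloads the model.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")

# Store the vectors in a local Chroma database under the db folder.
db = Chroma.from_documents(chunks, embeddings, persist_directory="db")
db.persist()
```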
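And a minimal sketch of the question-answering side (cf. `privateGPT.py`), again assuming LangChain's 0.0.x API. The model path and the `k=4` retriever setting are assumptions carried over from the illustrative `.env` above; a LlamaCpp model could be swapped in for GPT4All-J.

```python
# A minimal sketch of the question-answering side, assuming
# LangChain's 0.0.x module layout.
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import GPT4All
from langchain.chains import RetrievalQA

# Reopen the persisted vectorstore with the same embeddings model.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
db = Chroma(persist_directory="db", embedding_function=embeddings)

# Similarity search pulls the most relevant chunks as context
# (k corresponds to TARGET_SOURCE_CHUNKS in the .env file).
retriever = db.as_retriever(search_kwargs={"k": 4})

# Local GPT4All-J model; swap in langchain.llms.LlamaCpp for a
# llama.cpp model instead.
llm = GPT4All(model="models/ggml-gpt4all-j-v1.3-groovy.bin", backend="gptj")

qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff",
                                 retriever=retriever,
                                 return_source_documents=True)

result = qa("What does the ingested document say about embeddings?")
print(result["result"])
```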
Posted Sep 5, 2025

Developed a private LLM interaction system using Google Colab.