Data extraction pipeline. by Achint Achu

Implemented a parallelized data extraction pipeline for diverse sources (web crawling, tabular data,
PDFs, OCR, zipped files), using Redis and Celery in Python to generate embeddings efficiently and
store the data in the Pinecone vector database.
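
The overall shape of such a pipeline can be sketched in a few lines. This is a minimal, standalone illustration and not the project's actual code: a thread pool stands in for Celery workers with a Redis broker, a plain dict stands in for the Pinecone index, and the extractor functions, `embed`, `process`, and `run_pipeline` are all hypothetical names. The embedding is a hash-based placeholder, not a real model.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


# Hypothetical extractors, one per source type. Real ones would crawl pages,
# parse PDFs, run OCR, unpack archives, and so on.
def extract_web(payload):
    return payload.strip()


def extract_table(payload):
    return " | ".join(payload.split(","))


EXTRACTORS = {"web": extract_web, "table": extract_table}


def embed(text, dim=8):
    """Placeholder embedding: first `dim` hash bytes scaled to [0, 1]."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [b / 255 for b in digest[:dim]]


def process(item):
    """Extract text from one item, then embed it."""
    source_type, item_id, payload = item
    text = EXTRACTORS[source_type](payload)
    return item_id, embed(text), {"source": source_type, "text": text}


def run_pipeline(items, max_workers=4):
    """Process items in parallel and collect vectors in an in-memory store.

    The thread pool is an assumption standing in for a task queue; the dict
    is an assumption standing in for a vector database.
    """
    store = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for item_id, vector, metadata in pool.map(process, items):
            store[item_id] = (vector, metadata)
    return store


if __name__ == "__main__":
    items = [
        ("web", "doc-1", "  Hello from a crawled page  "),
        ("table", "doc-2", "name,age,city"),
    ]
    store = run_pipeline(items)
    print(sorted(store))
```

In a queue-based setup, `process` would typically be registered as a worker task and the final loop would write to the vector database instead of a dict; dispatching by source type keeps each extractor independent of the others.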

Posted Jun 23, 2024
