Data extraction pipeline.

Achint Achu

Implemented a parallelized data extraction pipeline that ingests diverse sources (web crawling, tabular
data, PDFs, OCR, zipped files), using Redis and Celery in Python to generate embeddings efficiently and
store the data in the Pinecone vector database.
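
A minimal sketch of how a pipeline like this might route each source to an extractor and run the work in parallel. It uses only the Python standard library: a thread pool stands in for Celery workers backed by Redis, and the extractor functions, the EXTRACTORS table, and run_pipeline are hypothetical names for illustration, not the project's actual code. Embedding generation and the Pinecone upsert are left out.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import PurePath

# Hypothetical extractors: each returns plain text for one source.
# Real versions would call a PDF parser, a CSV reader, a crawler, etc.
def extract_pdf(source: str) -> str:
    return f"text extracted from PDF {source}"

def extract_table(source: str) -> str:
    return f"rows extracted from table {source}"

def extract_web(source: str) -> str:
    return f"text crawled from {source}"

# Assumed mapping from file suffix to extractor.
EXTRACTORS = {".pdf": extract_pdf, ".csv": extract_table}

def extract(source: str) -> dict:
    """Route one source to an extractor by URL scheme or file suffix."""
    if source.startswith(("http://", "https://")):
        handler = extract_web
    else:
        suffix = PurePath(source).suffix.lower()
        handler = EXTRACTORS.get(suffix)
        if handler is None:
            raise ValueError(f"unsupported source type: {source}")
    return {"id": source, "text": handler(source)}

def run_pipeline(sources: list[str], workers: int = 4) -> list[dict]:
    """Extract all sources in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract, sources))
```

In a Celery-based setup, extract would typically become a task that workers pull from a Redis-backed queue, and each returned record would then be embedded and written to the vector database.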

Posted Jun 23, 2024


Knowledge base Chat Application
AI application
