Convert unstructured data to structured

Hariharamoorthy Theriappan

o    Patent filed for pAIges – an intelligent document extraction product based on AI/ML.
o    Designed and implemented MLOps for pAIges using Azure ML Studio: built data storage, pre-processing, hyper-parameter search (Bayesian and grid), model training, model selection, versioning, and model deployment (as a REST API).
o    Designed and implemented pAIges as a multi-tenant, subscription-based product hosted on a private cloud, with customizable, customer-specific post-extraction and data storage components. Implemented data security at each architecture component level, assessed by external auditors through vulnerability and penetration testing. Established data security governance and processes within the team, making it ready for ISO 27001 assessment.
o    Designed and implemented an event-driven architecture for pAIges extraction and metering using Kafka and Azure serverless components.
o    Achieved high throughput and scale at optimized cost using Docker, Azure Kubernetes, and a serverless architecture, e.g. extraction of 500 documents in an hour at a cost of Re 1 per page.
o    Implemented extraction of signatures, seals/stamps, and tick marks in pAIges using an object detection computer vision (CV) model, with 99.9% accuracy.
o    Performed GIS mapping of buildings, roads, and trees using the Segment Anything model with 90% accuracy, fine-tuning the model with a semantic segmentation technique. Implemented a technique that splits the image into smaller tiles, applies the model to each tile, and joins the tiles back together with exact tracing of the masks from each split.
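
The tile-and-join step mentioned for GIS mapping can be illustrated with a minimal sketch. This is not the project's actual code: the names `iter_tiles`, `segment_by_tiles`, and `threshold_model` are hypothetical, the image is a plain grid of grayscale values, and a simple brightness threshold stands in for the Segment Anything model. It also assumes non-overlapping tiles, which may differ from the original implementation.

```python
from typing import Callable, Iterator, List, Tuple

Grid = List[List[int]]  # rows of pixel values (or mask labels)


def iter_tiles(height: int, width: int, tile: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (top, left, bottom, right) boxes that cover the image without overlap."""
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            # Edge tiles are clipped so they never run past the image border.
            yield top, left, min(top + tile, height), min(left + tile, width)


def segment_by_tiles(image: Grid, tile: int, segment_tile: Callable[[Grid], Grid]) -> Grid:
    """Run segment_tile on each tile and paste its mask back at the tile's position."""
    height, width = len(image), len(image[0])
    full_mask = [[0] * width for _ in range(height)]
    for top, left, bottom, right in iter_tiles(height, width, tile):
        crop = [row[left:right] for row in image[top:bottom]]
        tile_mask = segment_tile(crop)
        # Write each mask row into the same coordinates the crop came from.
        for dy, mask_row in enumerate(tile_mask):
            full_mask[top + dy][left:right] = mask_row
    return full_mask


def threshold_model(crop: Grid) -> Grid:
    """Hypothetical stand-in for the real model: marks pixels brighter than 127."""
    return [[1 if px > 127 else 0 for px in row] for row in crop]


# Small synthetic 7x10 image, processed in 4x4 tiles.
image = [[(37 * x + 91 * y) % 256 for x in range(10)] for y in range(7)]
mask = segment_by_tiles(image, 4, threshold_model)
```

Because each tile's mask is written back at the coordinates it was cropped from, the joined mask lines up with the original image. A real segmentation model uses surrounding context, so a production pipeline would likely need extra handling for objects that cross tile borders.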

Posted Feb 2, 2024

Enterprise data comes in various unstructured formats. This product extracts the structured data required to integrate with ERPs, CRMs, etc.

Gen AI
Text classification & Summarization