o Patent filed for
pAIges – an intelligent document extraction product based on AI/ML.
o Designed and implemented MLOps for pAIges using Azure
ML Studio. Built data storage, pre-processing, hyperparameter search
(Bayesian, grid), model training, model selection, versioning, and model
deployment (as a REST API).
o Designed and implemented pAIges as a cloud-based
(private cloud), multi-tenant service on a subscription model, with customizable,
customer-specific post-extraction and data storage components. Implemented data security
at each architecture component level, assessed by external auditors through
vulnerability and penetration testing. Established governance and processes for
data security within the team, making it ready for ISO 27001 assessment.
o Designed and implemented an event-driven architecture for
pAIges extraction and metering using Kafka and Azure serverless components.
o Achieved high throughput and scale at optimized cost using Docker,
Azure Kubernetes, and a serverless architecture, e.g.
extraction for 500 documents in an hour at a cost of 1 Re/page.
o Implemented extraction of signatures, seals/stamps, and tick marks
in pAIges using an object detection CV model, with 99.9% accuracy.
o GIS mapping of buildings, roads, and trees using the
Segment Anything model with 90% accuracy. Fine-tuned the model using a semantic
segmentation technique. Implemented a technique that splits the image into smaller
tiles, applies the model to each tile, and joins the tiles back together with exact
tracing of the masks from each split.
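
The tile-and-stitch idea in the GIS item can be illustrated with a minimal sketch. This is not the project's implementation: it assumes non-overlapping tiles, a model that returns a mask with the same shape as its input tile, and plain nested lists instead of real imagery. The names `split_into_tiles`, `segment_tiled`, and `threshold_model` are hypothetical, and `threshold_model` is only a stand-in for a segmentation model such as Segment Anything.

```python
from typing import Callable, Iterator, List, Tuple

Grid = List[List[int]]  # a 2D image or mask stored as rows of integer pixels


def split_into_tiles(image: Grid, tile: int) -> Iterator[Tuple[int, int, Grid]]:
    """Yield (top, left, tile_pixels) for each tile; edge tiles may be smaller."""
    height, width = len(image), len(image[0])
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield top, left, [row[left:left + tile] for row in image[top:top + tile]]


def segment_tiled(image: Grid, tile: int, model: Callable[[Grid], Grid]) -> Grid:
    """Run `model` on every tile and paste each tile mask back at its offset."""
    height, width = len(image), len(image[0])
    full_mask = [[0] * width for _ in range(height)]
    for top, left, pixels in split_into_tiles(image, tile):
        tile_mask = model(pixels)  # assumed to match the shape of `pixels`
        for dy, mask_row in enumerate(tile_mask):
            # Writing at the recorded (top, left) offset keeps mask positions exact.
            full_mask[top + dy][left:left + len(mask_row)] = mask_row
    return full_mask


def threshold_model(pixels: Grid) -> Grid:
    """Hypothetical stand-in for a segmentation model: 1 where a pixel exceeds 127."""
    return [[1 if p > 127 else 0 for p in row] for row in pixels]
```

Because each tile mask is written back at the offset it was cut from, the stitched mask lines up pixel for pixel with the original image. A real pipeline would likely also need overlapping tiles or a merge step for objects that cross tile borders, which this sketch leaves out.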
Posted Feb 2, 2024
Enterprise data comes in various unstructured formats. This product extracts the structured data required to integrate with ERPs, CRMs, etc.