AI-Powered PDF Data Extraction
My role: AI Data Processing and Extraction Engineer
Organizations often struggle to extract structured and useful information from large volumes of unstructured PDF documents.
I developed a flexible AI-powered data extraction solution that allows users to define the specific entities and fields they want to retrieve. The system processes different PDF formats, identifies relevant information, and converts it into structured, usable data.
The solution reduces manual document processing, improves retrieval accuracy, and can be adapted to different document types and business requirements.
A working demo link is attached.
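As a rough illustration of the "user-defined fields" idea, the sketch below lets a caller declare the fields they want and validates a model's JSON reply against that declaration. It is a minimal sketch, not the delivered system: `FieldSpec`, `build_prompt`, `extract_fields`, and the `complete` callable are hypothetical names, and the model call is replaced by a stub so the example runs without any external service.

```python
import json
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class FieldSpec:
    """One user-defined field to extract (hypothetical schema for illustration)."""
    name: str
    description: str
    required: bool = False


def build_prompt(fields: Sequence[FieldSpec], document_text: str) -> str:
    """Turn the field definitions into an instruction for a text-completion model."""
    field_lines = "\n".join(f'- "{f.name}": {f.description}' for f in fields)
    return (
        "Extract the following fields from the document and reply with one JSON "
        "object. Use null for any field that is not present.\n"
        f"{field_lines}\n\nDocument:\n{document_text}"
    )


def extract_fields(
    document_text: str,
    fields: Sequence[FieldSpec],
    complete: Callable[[str], str],
) -> dict:
    """Ask the model for the fields, keep only requested keys, check required ones."""
    data = json.loads(complete(build_prompt(fields, document_text)))
    if not isinstance(data, dict):
        raise ValueError("model reply was not a JSON object")
    result = {f.name: data.get(f.name) for f in fields}
    missing = [f.name for f in fields if f.required and result[f.name] in (None, "")]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return result


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call (assumption: the real model returns JSON text)."""
    return '{"invoice_number": "INV-001", "total": "120.50", "unrequested": "x"}'


invoice_fields = [
    FieldSpec("invoice_number", "The invoice identifier", required=True),
    FieldSpec("total", "The total amount due"),
    FieldSpec("due_date", "The payment due date"),
]
record = extract_fields("text extracted from a PDF", invoice_fields, fake_model)
print(record)
```

Passing the model call in as a function keeps the field logic independent of any particular provider, and the same validation applies whichever PDF text extractor feeds it.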
AI Agents & RAG Chatbots with Persistent Memory
I design and build intelligent AI agents and chatbots that maintain conversation context, retrieve reliable information, and interact with external tools and APIs.
Core Capabilities
🔹 Persistent conversation and long-term memory
🔹 RAG-powered answers with reduced hallucinations
🔹 Tool calling, APIs, web search, and file retrieval
🔹 Multi-agent and multi-step workflows
🔹 Integration with OpenAI, Claude, Gemini, and open-source LLMs
🔹 Vector databases including pgvector, Pinecone, Weaviate, FAISS, and ChromaDB
Technologies
LangGraph, LangChain, Agno, PydanticAI, Haystack, FastAPI, OpenAI, Claude, Gemini, Hugging Face, PostgreSQL, pgvector, Pinecone, and Weaviate
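Persistent conversation memory can be pictured as a small message store keyed by session. The sketch below uses Python's built-in `sqlite3` module purely for illustration; the `ConversationMemory` class and its table layout are assumptions, and a production agent would more likely use PostgreSQL or a framework-provided checkpointer.

```python
import sqlite3


class ConversationMemory:
    """Stores chat messages per session so context survives restarts (illustrative)."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to keep history across processes.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, "
            "role TEXT NOT NULL, "
            "content TEXT NOT NULL)"
        )

    def append(self, session_id: str, role: str, content: str) -> None:
        """Record one message; the connection context manager commits on success."""
        with self.conn:
            self.conn.execute(
                "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def recent(self, session_id: str, limit: int = 10) -> list:
        """Return the last `limit` messages of a session in chronological order."""
        rows = self.conn.execute(
            "SELECT role, content FROM messages "
            "WHERE session_id = ? ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return [{"role": role, "content": content} for role, content in reversed(rows)]


memory = ConversationMemory()
memory.append("user-42", "user", "My order number is 1001.")
memory.append("user-42", "assistant", "Thanks, I have noted order 1001.")
memory.append("user-42", "user", "When will it ship?")
context = memory.recent("user-42", limit=2)
```

The `recent` window is what would be prepended to the next model call, so the limit trades context length against recall of older turns.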
AI Vision for Retail, Industrial & Monitoring Workflows
Overview
I have built and deployed multiple real-world computer vision systems for industrial inspection, retail automation, and monitoring workflows.
My responsibilities covered:
🔹 Dataset preparation and labeling
🔹 Object detection model training
🔹 Segmentation model training
🔹 YOLO-based detection and tracking
🔹 Image/video inference pipeline development
🔹 Model evaluation and threshold tuning
🔹 Production deployment support
🔹 Cloud server management and optimization
🔹 Building practical AI workflows for real-world operational environments
Fish Quality Inspection System - lythium.cl (http://lythium.cl)
I led the development of an advanced fish quality inspection solution for an industrial workflow.
The system used image analysis to monitor fish quality and support automated fish sorting based on AI predictions.
🔹 Led the development of an advanced AI-powered fish quality inspection system for an industrial workflow.
🔹 Built an image analysis pipeline to monitor fish quality from production-line images.
🔹 Trained object detection models to identify fish and relevant visual quality indicators.
🔹 Trained segmentation models to support more detailed visual inspection of fish regions.
🔹 Designed the AI workflow to support automated fish sorting based on model predictions.
🔹 Worked on inspection logic that could classify or route fish based on quality-related outputs.
🔹 Designed the system for conveyor-belt usage, where images need to be processed consistently and reliably.
🔹 Focused on production issues such as image quality, camera consistency, lighting variation, and model reliability.
🔹 Helped convert visual inspection from a manual/rule-based workflow into an AI-supported inspection pipeline.
🔹 Built the system to reduce manual inspection effort and improve production workflow efficiency.
Shelfr.ai (http://Shelfr.ai) - Retail Automation Platform
I developed AI image solutions for retail automation and execution. The system handled large-scale product detection across 10,575+ SKUs, price tag detection, shelf and display type detection, and gap detection for empty shelf spaces.
🔹 Developed large-scale AI image solutions for retail automation and execution.
🔹 Worked on product detection across 10,575+ SKUs, where each SKU represented a unique product.
🔹 Built object detection workflows to identify products from retail shelf images.
🔹 Developed price tag detection to locate and extract price label areas from store images.
🔹 Worked on shelf and display type detection to understand the retail environment layout.
🔹 Built gap detection logic to identify empty shelf spaces and out-of-stock areas.
🔹 Supported computer vision workflows for retail compliance, shelf monitoring, and store execution.
🔹 Worked with high-volume image data and production-level inference requirements.
🔹 Managed high-load production servers on Google Cloud Platform.
🔹 Implemented load balancing and autoscaling to improve system stability under production traffic.
🔹 Focused on scalable AI infrastructure capable of handling real-world retail image workloads.
🔹 Helped create AI systems for inventory visibility, shelf condition monitoring, and retail execution analytics.
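One simple way to picture gap detection is as a sweep over the horizontal extents of the product boxes detected on a single shelf row. The sketch below is an assumption about how such logic could look, not the production implementation: `find_shelf_gaps`, its pixel-based inputs, and the `min_gap` parameter are all hypothetical.

```python
def find_shelf_gaps(boxes, shelf_x_min, shelf_x_max, min_gap):
    """Return (start, end) spans on one shelf row that no product box covers.

    boxes: list of (x_min, x_max) horizontal extents of detected products, in pixels.
    min_gap: smallest uncovered width that should count as an empty space.
    """
    gaps = []
    cursor = shelf_x_min  # right edge of the area covered so far
    for x_min, x_max in sorted(boxes):
        if x_min - cursor >= min_gap:
            gaps.append((cursor, x_min))
        # max() keeps the cursor correct when boxes overlap or nest.
        cursor = max(cursor, x_max)
    if shelf_x_max - cursor >= min_gap:
        gaps.append((cursor, shelf_x_max))
    return gaps


detected = [(400, 500), (0, 100), (110, 200)]
gaps = find_shelf_gaps(detected, shelf_x_min=0, shelf_x_max=600, min_gap=50)
```

With these inputs the 10-pixel space between the first two products is ignored as normal spacing, while the wider uncovered spans are reported. In practice `min_gap` would be tuned per camera setup, since pixel widths depend on distance and resolution.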
lake-shield.com (http://lake-shield.com) - USA LAKES - Boat Detection & Inspection System
🔹 Worked on a YOLO-based boat detection, tracking, and monitoring system.
🔹 Labeled datasets for boat detection and inspection model training.
🔹 Prepared image/video data for object detection training workflows.
🔹 Trained YOLO object detection models to detect boats in monitoring footage.
🔹 Built a detection pipeline capable of identifying boats from visual data.
🔹 Worked on boat tracking logic to monitor boat movement across frames.
🔹 Supported inspection and monitoring workflows using computer vision predictions.
🔹 Developed an end-to-end pipeline from labeled data to trained model and inference output.
🔹 Focused on practical model performance in outdoor environments where lighting, distance, angle, and background can vary.
🔹 Helped build a monitoring system that could support automated detection and review instead of fully manual observation.
My Responsibilities Across These Projects
🔹 Led AI/computer vision system development
🔹 Designed labeling and dataset preparation workflows
🔹 Trained YOLO/object detection models
🔹 Trained segmentation models where needed
🔹 Built image and video inference pipelines
🔹 Evaluated models using practical production metrics
🔹 Improved model performance through dataset cleanup, retraining, and threshold tuning
🔹 Integrated AI models into backend or operational workflows
🔹 Supported production deployment and infrastructure optimization
🔹 Worked with real-world constraints such as lighting, camera angle, image quality, latency, and false detection rates
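Threshold tuning, mentioned above, usually means sweeping the detection confidence cutoff and watching how precision and recall trade off. The following is a minimal sketch under stated assumptions: each detection has already been matched against ground truth (so it carries a true-positive flag), and `sweep_thresholds` and `best_threshold` are hypothetical helper names. The sample numbers are made up for illustration only.

```python
def sweep_thresholds(detections, num_ground_truth, thresholds):
    """Compute precision, recall, and F1 at each confidence threshold.

    detections: list of (confidence, is_true_positive) pairs from a validation set.
    num_ground_truth: total number of labeled objects in that validation set.
    """
    rows = []
    for threshold in thresholds:
        kept = [is_tp for confidence, is_tp in detections if confidence >= threshold]
        true_positives = sum(kept)
        # By convention, precision is 1.0 when nothing is predicted.
        precision = true_positives / len(kept) if kept else 1.0
        recall = true_positives / num_ground_truth if num_ground_truth else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        rows.append(
            {"threshold": threshold, "precision": precision, "recall": recall, "f1": f1}
        )
    return rows


def best_threshold(rows):
    """Pick the threshold with the highest F1 (one possible selection rule)."""
    return max(rows, key=lambda row: row["f1"])["threshold"]


# Illustrative validation results, not real project data.
sample = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
results = sweep_thresholds(sample, num_ground_truth=3, thresholds=[0.25, 0.5, 0.7])
chosen = best_threshold(results)
```

F1 is only one selection rule. A sorting line that must avoid false rejects, or a monitoring system that must not miss events, would instead fix a minimum precision or recall and choose the threshold that satisfies it.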
Technologies Used
🔹 Python
🔹 YOLO / YOLOv8
🔹 Object Detection
🔹 Image Segmentation
🔹 OpenCV
🔹 PyTorch
🔹 FastAPI
🔹 Google Cloud Platform
🔹 Linux Servers
🔹 Load Balancing
🔹 Autoscaling
🔹 Custom Data Labeling Workflows
🔹 Model Training
🔹 Model Evaluation
🔹 Inference Pipeline Development
🔹 Production AI Deployment
French Legal AI Assistant & Agentic RAG System
Overview
I designed, built, and deployed a specialized Legal AI Assistant for French lawyers using agentic RAG, legal data pipelines, vector search, reranking, open-source LLMs, and citation-grounded answer generation. The system allowed lawyers to ask legal questions and receive answers grounded in French law articles, legal references, and relevant judicial cases.
Problem / Challenge
Legal data is very different from normal document data. A generic RAG pipeline using fixed-size chunks often breaks legal meaning, misses important context, or retrieves incomplete references.
The main challenges were:
🔹 Legal documents had different structures and lengths
🔹 Articles and laws could not be randomly split into fixed-size chunks
🔹 Each answer needed traceable legal references
🔹 Retrieval had to understand legal scope, not just semantic similarity
🔹 The system needed to reduce hallucinations for legal users
🔹 Deployment had to respect privacy and regulatory requirements
My Expertise
I worked as the Lead AI Engineer / Agentic RAG Developer responsible for the complete system design and implementation.
My responsibilities included:
🔹 Legal data pipeline architecture
🔹 Document parsing and preprocessing
🔹 Custom legal chunking strategy
🔹 Vector database design
🔹 Agentic RAG workflow development
🔹 Retrieval optimization and reranking
🔹 Open-source LLM deployment
🔹 Backend API development with FastAPI
🔹 Secure Azure cloud deployment
🔹 Multi-tenant system support
French Legal Data Engineering Pipeline
I built an automated ETL pipeline to process thousands of French legal documents, articles, and judicial cases.
The pipeline handled:
🔹 Raw legal document ingestion
🔹 Text cleaning and normalization
🔹 Legal article extraction
🔹 Section-aware document structuring
🔹 Custom chunk generation
🔹 Metadata extraction for article number, article title, section, source, and reference
🔹 Embedding generation
🔹 Vector database ingestion
🔹 Repeatable updates for future legal data expansion
The chunking strategy was designed so legal articles were not cut in the middle or separated from their meaning.
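To make the contrast with fixed-size chunking concrete, the sketch below splits a legal text on article headings so that every chunk is one complete article carrying its own metadata. It is an illustrative assumption about the approach, not the project's actual chunker: the `ARTICLE_HEADER` pattern, the `chunk_by_article` function, and the sample text are hypothetical, and real French codes would need a richer pattern and section handling.

```python
import re

# Matches heading lines such as "Article 1240" or "Article L. 121-1" (simplified).
ARTICLE_HEADER = re.compile(
    r"^Article[ \t]+(?P<number>(?:[LRD]\.?[ \t]*)?\d+(?:-\d+)*)[ \t]*$",
    re.MULTILINE,
)


def chunk_by_article(text: str, source: str) -> list:
    """Return one chunk per article; text before the first heading is skipped."""
    headers = list(ARTICLE_HEADER.finditer(text))
    chunks = []
    for index, header in enumerate(headers):
        # An article runs from its heading to the next heading (or end of text).
        end = headers[index + 1].start() if index + 1 < len(headers) else len(text)
        chunks.append(
            {
                "article_number": header.group("number"),
                "source": source,
                "text": text[header.end():end].strip(),
            }
        )
    return chunks


# Placeholder text for illustration, not actual statute wording.
sample_text = """Titre d'exemple

Article 1240
Premier alinea de l'article (texte d'exemple).
Second alinea du meme article.

Article L. 121-1
Texte d'exemple d'un autre article.
"""
chunks = chunk_by_article(sample_text, source="exemple")
```

Because each chunk keeps its article number and source, the same metadata can later be surfaced as a citation next to a generated answer.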
Agentic RAG Workflow
Instead of using a simple one-step vector search, I built a LangGraph-based agentic RAG workflow.
The workflow included:
🔹 User query understanding
🔹 Legal intent detection
🔹 Legal domain and scope identification
🔹 Generation of 2 to 5 targeted legal search queries
🔹 Retrieval of relevant chunks for each query
🔹 Deduplication of repeated results
🔹 Reranking of retrieved legal evidence
🔹 Source-grounded answer generation
This improved tested retrieval accuracy from around 50% to 95%+.
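The retrieval core of the steps above (several targeted queries, deduplication, then reranking) can be sketched without any framework. This is not the LangGraph workflow itself: `multi_query_retrieve` is a hypothetical function, and the `search` and `rerank_score` callables stand in for a vector store and a reranker model, replaced here by toy implementations so the example runs on its own.

```python
from typing import Callable, Dict, List, Sequence


def multi_query_retrieve(
    question: str,
    queries: Sequence[str],
    search: Callable[[str], List[Dict]],
    rerank_score: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Dict]:
    """Run every query, merge hits by chunk id, and order them by reranker score."""
    unique: Dict[str, Dict] = {}
    for query in queries:
        for chunk in search(query):
            # setdefault keeps the first copy, so repeated hits are dropped.
            unique.setdefault(chunk["id"], chunk)
    ranked = sorted(
        unique.values(),
        key=lambda chunk: rerank_score(question, chunk["text"]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy stand-ins (assumptions): a keyword index and a word-overlap scorer.
TOY_INDEX = {
    "liability": [
        {"id": "art-1", "text": "civil liability for damage caused by fault"},
        {"id": "art-2", "text": "liability for negligence"},
    ],
    "damage": [
        {"id": "art-1", "text": "civil liability for damage caused by fault"},
        {"id": "art-3", "text": "limitation periods"},
    ],
}


def toy_search(query: str) -> List[Dict]:
    return TOY_INDEX.get(query, [])


def toy_rerank(question: str, text: str) -> float:
    return float(len(set(question.lower().split()) & set(text.lower().split())))


hits = multi_query_retrieve(
    "who bears liability for damage caused by fault",
    ["liability", "damage"],
    toy_search,
    toy_rerank,
    top_k=2,
)
```

Deduplicating before reranking matters because the reranker is typically the most expensive step, so each chunk should be scored only once.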
Retrieval, Citations & Case Law
The retrieval system was designed to make answers transparent and verifiable.
I implemented:
🔹 Vector search for semantic legal retrieval
🔹 Reranking to improve relevance
🔹 Metadata-based source traceability
🔹 Citation-backed answer generation
🔹 Article-level legal references
🔹 Typesense-based retrieval for French judicial cases
🔹 Supporting case law returned with legal answers
This allowed lawyers to verify the exact legal source behind each generated response.
Open-Source LLM & Cloud Deployment
I evaluated and deployed open-source LLM infrastructure for private legal AI usage.
The deployment included:
🔹 Qwen2.5:14B for French legal reasoning
🔹 Ollama and vLLM for model serving
🔹 Embedding and reranker models on a private Azure GPU VM
🔹 NVIDIA T4 16GB GPU optimization
🔹 Python/FastAPI backend APIs
🔹 Secure Azure deployment in the France region
🔹 Multi-tenant isolated access
🔹 GitHub CI/CD and Linux server management
The system was designed for privacy, reliability, and regulatory compliance.
Technologies Used
🔹 Python
🔹 FastAPI
🔹 LangChain
🔹 LangGraph
🔹 LangSmith
🔹 Ollama
🔹 vLLM
🔹 Qwen2.5:14B
🔹 ChromaDB
🔹 Typesense
🔹 Vector Databases
🔹 Reranking Models
🔹 Embedding Models
🔹 Azure Cloud
🔹 Linux
🔹 GitHub CI/CD
Impact
🔹 Built a production-ready legal AI assistant for lawyers
🔹 Improved retrieval accuracy from ~50% to 95%+ in tested scenarios
🔹 Reduced hallucinations through citation-grounded generation
🔹 Enabled lawyers to verify answers using article and case references
🔹 Created a scalable legal data pipeline for thousands of documents
🔹 Deployed private open-source LLM infrastructure for legal compliance
🔹 Delivered a strong foundation for future legal AI workflows