Arslan Mehmood - Data Engineer

ML AI | Backend | Computer Vision | GenAI | LLM Agents

AI-Powered PDF Data Extraction

My role: AI Data Processing and Extraction Engineer

Organizations often struggle to extract structured, useful information from large volumes of unstructured PDF documents. I developed a flexible AI-powered data extraction solution that lets users define the specific entities and fields they want to retrieve. The system processes different PDF formats, identifies the relevant information, and converts it into structured, usable data. It reduces manual document processing, improves retrieval accuracy, and can be adapted to different document types and business requirements. A working demo link is attached.
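As an illustration of user-defined fields driving extraction, here is a minimal sketch, not the deployed implementation. `FieldSpec`, `build_prompt`, and `parse_response` are hypothetical names introduced for this example, and the PDF text extraction and the model call are assumed to happen elsewhere.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One user-defined field to pull out of a document (hypothetical schema)."""
    name: str
    description: str
    required: bool = True


def build_prompt(fields: list[FieldSpec], page_text: str) -> str:
    """Turn the user's field definitions into an extraction instruction."""
    wanted = "\n".join(f"- {f.name}: {f.description}" for f in fields)
    return (
        "Extract the following fields from the document and reply with a single "
        "JSON object. Use null for any field that is not present.\n"
        f"{wanted}\n\nDocument:\n{page_text}"
    )


def parse_response(fields: list[FieldSpec], raw: str) -> dict:
    """Validate a model reply (assumed to be JSON) against the requested fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Keep only the fields the user asked for; anything else is discarded.
    record = {f.name: data.get(f.name) for f in fields}
    missing = [f.name for f in fields if f.required and record[f.name] is None]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return record
```

Validating the reply against the same field list that produced the prompt keeps the output schema under the user's control: unrequested keys are dropped, and a missing required field fails loudly instead of passing silently into downstream data.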
AI Agents & RAG Chatbots with Persistent Memory

I design and build intelligent AI agents and chatbots that maintain conversation context, retrieve reliable information, and interact with external tools and APIs.

Core Capabilities
🔹 Persistent conversation and long-term memory
🔹 RAG-powered answers with reduced hallucinations
🔹 Tool calling, APIs, web search, and file retrieval
🔹 Multi-agent and multi-step workflows
🔹 Integration with OpenAI, Claude, Gemini, and open-source LLMs
🔹 Vector databases including pgvector, Pinecone, Weaviate, FAISS, and ChromaDB

Technologies
LangGraph, LangChain, Agno, PydanticAI, Haystack, FastAPI, OpenAI, Claude, Gemini, Hugging Face, PostgreSQL, pgvector, Pinecone, and Weaviate
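Persistent conversation memory can be sketched with nothing more than the standard library. The example below is an assumption-laden illustration rather than the stack listed above: it uses SQLite instead of PostgreSQL, and `ConversationMemory`, its table layout, and its method names are hypothetical.

```python
import sqlite3


class ConversationMemory:
    """Stores chat turns in SQLite so context can survive process restarts."""

    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path for real persistence; ":memory:" is for demos only.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS turns ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "session_id TEXT NOT NULL, role TEXT NOT NULL, content TEXT NOT NULL)"
        )

    def add(self, session_id: str, role: str, content: str) -> None:
        """Append one turn to a session."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO turns (session_id, role, content) VALUES (?, ?, ?)",
                (session_id, role, content),
            )

    def recent(self, session_id: str, limit: int = 10) -> list[dict]:
        """Return the last `limit` turns of a session, oldest first."""
        rows = self.conn.execute(
            "SELECT role, content FROM turns WHERE session_id = ? "
            "ORDER BY id DESC LIMIT ?",
            (session_id, limit),
        ).fetchall()
        return [{"role": role, "content": content} for role, content in reversed(rows)]
```

The list returned by `recent` has the role/content shape most chat APIs accept, so it can be prepended to the next model request to restore context for that session.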
AI Vision for Retail, Industrial & Monitoring Workflows

Overview
I have built and deployed multiple real-world computer vision systems for industrial inspection, retail automation, and monitoring workflows. My responsibilities covered:
🔹 Dataset preparation and labeling
🔹 Object detection model training
🔹 Segmentation model training
🔹 YOLO-based detection and tracking
🔹 Image/video inference pipeline development
🔹 Model evaluation and threshold tuning
🔹 Production deployment support
🔹 Cloud server management and optimization
🔹 Building practical AI workflows for real-world operational environments

Fish Quality Inspection System - lythium.cl (http://lythium.cl)
I led the development of an advanced AI-powered fish quality inspection solution for an industrial workflow. The system used image analysis to monitor fish quality and support automated fish sorting based on AI predictions.
🔹 Built an image analysis pipeline to monitor fish quality from production-line images.
🔹 Trained object detection models to identify fish and relevant visual quality indicators.
🔹 Trained segmentation models to support more detailed visual inspection of fish regions.
🔹 Designed the AI workflow to support automated fish sorting based on model predictions.
🔹 Worked on inspection logic that could classify or route fish based on quality-related outputs.
🔹 Designed the system for conveyor-belt usage, where images need to be processed consistently and reliably.
🔹 Focused on production issues such as image quality, camera consistency, lighting variation, and model reliability.
🔹 Helped convert visual inspection from a manual/rule-based workflow into an AI-supported inspection pipeline.
🔹 Built the system to reduce manual inspection effort and improve production workflow efficiency.
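The threshold tuning mentioned above can be illustrated with a small sweep over candidate confidence thresholds. This is a generic sketch, not the project's evaluation code: the function names are hypothetical, and it assumes each validation detection has already been matched against ground truth and flagged as a true or false positive.

```python
def precision_recall_at(scored, num_ground_truth, threshold):
    """Precision and recall when only detections scoring >= threshold are kept.

    `scored` is a list of (confidence, is_true_positive) pairs.
    """
    kept = [is_tp for score, is_tp in scored if score >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


def pick_threshold(scored, num_ground_truth, min_precision, candidates):
    """Return the candidate with the highest recall that meets min_precision.

    Returns None when no candidate reaches the required precision.
    """
    best = None  # (threshold, recall)
    for threshold in sorted(candidates):
        precision, recall = precision_recall_at(scored, num_ground_truth, threshold)
        if precision >= min_precision and (best is None or recall > best[1]):
            best = (threshold, recall)
    return best[0] if best else None
```

Fixing a minimum precision and maximizing recall suits a sorting line where false rejects are costly; the opposite constraint (minimum recall, maximum precision) follows the same pattern.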
Shelfr.ai (http://Shelfr.ai) - Retail Automation Platform
I developed large-scale AI image solutions for retail automation and execution. The system handled product detection across 10,575+ SKUs, price tag detection, shelf and display type detection, and gap detection for empty shelf spaces.
🔹 Worked on product detection across 10,575+ SKUs, where each SKU represented a unique product.
🔹 Built object detection workflows to identify products from retail shelf images.
🔹 Developed price tag detection to locate and extract price label areas from store images.
🔹 Worked on shelf and display type detection to understand the retail environment layout.
🔹 Built gap detection logic to identify empty shelf spaces and out-of-stock areas.
🔹 Supported computer vision workflows for retail compliance, shelf monitoring, and store execution.
🔹 Worked with high-volume image data and production-level inference requirements.
🔹 Managed high-load production servers on Google Cloud Platform.
🔹 Implemented load balancing and autoscaling to improve system stability under production traffic.
🔹 Focused on scalable AI infrastructure capable of handling real-world retail image workloads.
🔹 Helped create AI systems for inventory visibility, shelf condition monitoring, and retail execution analytics.

lake-shield.com (http://lake-shield.com) - USA LAKES - Boat Detection & Inspection System
🔹 Worked on a YOLO-based boat detection, tracking, and monitoring system.
🔹 Labeled datasets for boat detection and inspection model training.
🔹 Prepared image/video data for object detection training workflows.
🔹 Trained YOLO object detection models to detect boats in monitoring footage.
🔹 Built a detection pipeline capable of identifying boats from visual data.
🔹 Worked on boat tracking logic to monitor boat movement across frames.
🔹 Supported inspection and monitoring workflows using computer vision predictions.
🔹 Developed an end-to-end pipeline from labeled data to trained model and inference output.
🔹 Focused on practical model performance in outdoor environments where lighting, distance, angle, and background can vary.
🔹 Helped build a monitoring system that could support automated detection and review instead of fully manual observation.

My Responsibilities Across These Projects
🔹 Led AI/computer vision system development
🔹 Designed labeling and dataset preparation workflows
🔹 Trained YOLO/object detection models
🔹 Trained segmentation models where needed
🔹 Built image and video inference pipelines
🔹 Evaluated models using practical production metrics
🔹 Improved model performance through dataset cleanup, retraining, and threshold tuning
🔹 Integrated AI models into backend or operational workflows
🔹 Supported production deployment and infrastructure optimization
🔹 Worked with real-world constraints such as lighting, camera angle, image quality, latency, and false detection rates

Technologies Used
🔹 Python
🔹 YOLO / YOLOv8
🔹 Object Detection
🔹 Image Segmentation
🔹 OpenCV
🔹 PyTorch
🔹 FastAPI
🔹 Google Cloud Platform
🔹 Linux Servers
🔹 Load Balancing
🔹 Autoscaling
🔹 Custom Data Labeling Workflows
🔹 Model Training
🔹 Model Evaluation
🔹 Inference Pipeline Development
🔹 Production AI Deployment
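Tracking detections across frames, as in the boat monitoring work above, is often done by matching boxes on overlap. The sketch below shows a simple greedy IoU tracker as one possible approach; it is not the tracker used in the project, and `IouTracker` and its parameters are hypothetical. Boxes are assumed to be `(x1, y1, x2, y2)` tuples coming from any detector.

```python
from itertools import count

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class IouTracker:
    """Greedy frame-to-frame tracker: best-overlapping pairs are matched first."""

    def __init__(self, min_iou: float = 0.3) -> None:
        self.min_iou = min_iou
        self.tracks: dict[int, Box] = {}
        self._ids = count(1)

    def update(self, detections: list[Box]) -> dict[int, Box]:
        """Assign track ids to this frame's detections and return {id: box}."""
        pairs = sorted(
            (
                (iou(box, det), track_id, index)
                for track_id, box in self.tracks.items()
                for index, det in enumerate(detections)
            ),
            reverse=True,
        )
        assigned: dict[int, Box] = {}
        used: set[int] = set()
        for score, track_id, index in pairs:
            if score < self.min_iou:
                break  # remaining pairs overlap even less
            if track_id in assigned or index in used:
                continue
            assigned[track_id] = detections[index]
            used.add(index)
        # Unmatched detections start new tracks; unmatched tracks are dropped.
        for index, det in enumerate(detections):
            if index not in used:
                assigned[next(self._ids)] = det
        self.tracks = assigned
        return assigned
```

Dropping a track after a single missed frame keeps the sketch short. Outdoor footage with occlusion or glare would usually need a grace period of several frames, or a motion model, before a track is retired.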
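The shelf gap detection described for Shelfr.ai above can be reduced to a one-dimensional interval problem once products on a shelf row have been detected. The following is a simplified sketch under that assumption, not the production logic: `find_gaps` is a hypothetical helper, and the product spans are assumed to be the horizontal extents of detected boxes, in pixels, lying within the shelf bounds.

```python
def find_gaps(shelf_start, shelf_end, product_spans, min_gap):
    """Return uncovered (left, right) stretches of a shelf row at least min_gap wide.

    `product_spans` holds the (left, right) horizontal extent of each detected
    product on the row; spans may overlap and arrive in any order.
    """
    gaps = []
    cursor = shelf_start  # rightmost position covered so far
    for left, right in sorted(product_spans):
        if left - cursor >= min_gap:
            gaps.append((cursor, left))
        cursor = max(cursor, right)
    if shelf_end - cursor >= min_gap:
        gaps.append((cursor, shelf_end))
    return gaps
```

The `min_gap` parameter keeps the narrow spacing between neighboring products from being reported as out-of-stock areas; a reasonable value depends on image resolution and typical product width.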
French Legal AI Assistant & Agentic RAG System

Overview
I designed, built, and deployed a specialized Legal AI Assistant for French lawyers using agentic RAG, legal data pipelines, vector search, reranking, open-source LLMs, and citation-grounded answer generation. The system allowed lawyers to ask legal questions and receive answers grounded in French law articles, legal references, and relevant judicial cases.

Problem / Challenge
Legal data is very different from normal document data. A generic RAG pipeline using fixed-size chunks often breaks legal meaning, misses important context, or retrieves incomplete references. The main challenges were:
🔹 Legal documents had different structures and lengths
🔹 Articles and laws could not be randomly split into fixed-size chunks
🔹 Each answer needed traceable legal references
🔹 Retrieval had to understand legal scope, not just semantic similarity
🔹 The system needed to reduce hallucinations for legal users
🔹 Deployment had to respect privacy and regulatory requirements

My Expertise
I worked as the Lead AI Engineer / Agentic RAG Developer responsible for the complete system design and implementation. My responsibilities included:
🔹 Legal data pipeline architecture
🔹 Document parsing and preprocessing
🔹 Custom legal chunking strategy
🔹 Vector database design
🔹 Agentic RAG workflow development
🔹 Retrieval optimization and reranking
🔹 Open-source LLM deployment
🔹 Backend API development with FastAPI
🔹 Secure Azure cloud deployment
🔹 Multi-tenant system support

French Legal Data Engineering Pipeline
I built an automated ETL pipeline to process thousands of French legal documents, articles, and judicial cases.
The pipeline handled:
🔹 Raw legal document ingestion
🔹 Text cleaning and normalization
🔹 Legal article extraction
🔹 Section-aware document structuring
🔹 Custom chunk generation
🔹 Metadata extraction for article number, article title, section, source, and reference
🔹 Embedding generation
🔹 Vector database ingestion
🔹 Repeatable updates for future legal data expansion
The chunking strategy was designed so legal articles were not cut in the middle or separated from their meaning.

Agentic RAG Workflow
Instead of using a simple one-step vector search, I built a LangGraph-based agentic RAG workflow. The workflow included:
🔹 User query understanding
🔹 Legal intent detection
🔹 Legal domain and scope identification
🔹 Generation of 2 to 5 targeted legal search queries
🔹 Retrieval of relevant chunks for each query
🔹 Deduplication of repeated results
🔹 Reranking of retrieved legal evidence
🔹 Source-grounded answer generation
This improved tested retrieval accuracy from around 50% to 95%+.

Retrieval, Citations & Case Law
The retrieval system was designed to make answers transparent and verifiable. I implemented:
🔹 Vector search for semantic legal retrieval
🔹 Reranking to improve relevance
🔹 Metadata-based source traceability
🔹 Citation-backed answer generation
🔹 Article-level legal references
🔹 Typesense-based retrieval for French judicial cases
🔹 Supporting case law returned with legal answers
This allowed lawyers to verify the exact legal source behind each generated response.

Open-Source LLM & Cloud Deployment
I evaluated and deployed open-source LLM infrastructure for private legal AI usage.
The deployment included:
🔹 Qwen2.5:14B for French legal reasoning
🔹 Ollama and vLLM for model serving
🔹 Embedding and reranker models on a private Azure GPU VM
🔹 NVIDIA T4 16GB GPU optimization
🔹 Python/FastAPI backend APIs
🔹 Secure Azure deployment in the France region
🔹 Multi-tenant isolated access
🔹 GitHub CI/CD and Linux server management
The system was designed for privacy, reliability, and regulatory compliance.

Technologies Used
🔹 Python
🔹 FastAPI
🔹 LangChain
🔹 LangGraph
🔹 LangSmith
🔹 Ollama
🔹 vLLM
🔹 Qwen2.5:14B
🔹 ChromaDB
🔹 Typesense
🔹 Vector Databases
🔹 Reranking Models
🔹 Embedding Models
🔹 Azure Cloud
🔹 Linux
🔹 GitHub CI/CD

Impact
🔹 Built a production-ready legal AI assistant for lawyers
🔹 Improved retrieval accuracy from ~50% to 95%+ in tested scenarios
🔹 Reduced hallucinations through citation-grounded generation
🔹 Enabled lawyers to verify answers using article and case references
🔹 Created a scalable legal data pipeline for thousands of documents
🔹 Deployed private open-source LLM infrastructure for legal compliance
🔹 Delivered a strong foundation for future legal AI workflows
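The idea behind the legal chunking strategy described above, keeping each article whole instead of cutting at a fixed size, can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: it assumes every article starts with a heading line such as `Article 1` or `Article L. 2-1`, and `ArticleChunk` and `chunk_by_article` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumption: an article begins with a line of the form "Article <number>".
ARTICLE_HEADING = re.compile(
    r"^Article[ \t]+(?P<number>[^\n]+?)[ \t]*$", re.MULTILINE
)


@dataclass
class ArticleChunk:
    """One complete article plus the metadata needed to cite it."""
    article_number: str
    source: str
    text: str


def chunk_by_article(document: str, source: str) -> list[ArticleChunk]:
    """Split a legal text at article headings so no article is cut mid-way."""
    headings = list(ARTICLE_HEADING.finditer(document))
    chunks = []
    for i, heading in enumerate(headings):
        # An article runs until the next heading, or to the end of the document.
        end = headings[i + 1].start() if i + 1 < len(headings) else len(document)
        body = document[heading.end():end].strip()
        if body:
            chunks.append(ArticleChunk(heading.group("number"), source, body))
    return chunks
```

Because each chunk carries its article number and source, a retrieved chunk can be cited directly. Very long articles would still need a secondary split, for example at paragraph boundaries, with the same metadata copied onto every part.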
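The multi-query retrieval, deduplication, and reranking steps of the agentic RAG workflow described above can be outlined in plain Python. This is a framework-free sketch, not the LangGraph implementation: `retrieve_multi_query` is a hypothetical function, and `search` and `rerank_score` are assumed stand-ins for a vector store query and a reranker model.

```python
from typing import Callable


def retrieve_multi_query(
    question: str,
    queries: list[str],
    search: Callable[[str], list[dict]],
    rerank_score: Callable[[str, str], float],
    top_k: int = 5,
) -> list[dict]:
    """Run several targeted queries, deduplicate the hits, then rerank them.

    Each hit is assumed to be a dict with "chunk_id", "text", and "score" keys.
    """
    best_by_id: dict[str, dict] = {}
    for query in queries:
        for hit in search(query):
            previous = best_by_id.get(hit["chunk_id"])
            # The same chunk can come back for several queries; keep the best copy.
            if previous is None or hit["score"] > previous["score"]:
                best_by_id[hit["chunk_id"]] = hit
    # Rerank against the original question, not the generated sub-queries.
    ranked = sorted(
        best_by_id.values(),
        key=lambda hit: rerank_score(question, hit["text"]),
        reverse=True,
    )
    return ranked[:top_k]
```

Deduplicating before reranking keeps the reranker, usually the slowest step, from scoring the same chunk more than once.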