Freelance Data Engineers in Lahore
Toolshed (Data, Automation, AI Agents, Framer, Retool)
max
Lahore, Pakistan
Data, Automation, AI Agents, Framer, Retool, Bubble
$50k+
Earned
7x
Hired
5.0
Rating
53
Followers
Top
expert
+1
0
Financial & Usage Analytics for YC-backed healthcare startup
0
11
1
Streamlining Operations for House of Sylas
1
15
4
Development of Financial Management Platform for Vergo
4
256
0
AI Agent Development & Workflow Automation for Kelly Brogan MD
0
33
Data Engineer
(3)
Arslan Mehmood
Lahore, Pakistan
ML AI | Backend | Computer Vision | GenAI | LLM Agents
New to Contra
0
AI-Powered PDF Data Extraction
My role: AI Data Processing and Extraction Engineer
Organizations often struggle to extract structured and useful information from large volumes of unstructured PDF documents. I developed a flexible AI-powered data extraction solution that allows users to define the specific entities and fields they want to retrieve. The system processes different PDF formats, identifies relevant information, and converts it into structured, usable data. The solution reduces manual document processing, improves retrieval accuracy, and can be adapted to different document types and business requirements. A working demo link is attached.
0
27
0
AI Agents & RAG Chatbots with Persistent Memory
I design and build intelligent AI agents and chatbots that maintain conversation context, retrieve reliable information, and interact with external tools and APIs.
Core Capabilities
🔹 Persistent conversation and long-term memory
🔹 RAG-powered answers with reduced hallucinations
🔹 Tool calling, APIs, web search, and file retrieval
🔹 Multi-agent and multi-step workflows
🔹 Integration with OpenAI, Claude, Gemini, and open-source LLMs
🔹 Vector databases including pgvector, Pinecone, Weaviate, FAISS, and ChromaDB
Technologies
LangGraph, LangChain, Agno, PydanticAI, Haystack, FastAPI, OpenAI, Claude, Gemini, Hugging Face, PostgreSQL, pgvector, Pinecone, and Weaviate
0
25
1
AI Vision for Retail, Industrial & Monitoring Workflows
Overview
I have built and deployed multiple real-world computer vision systems for industrial inspection, retail automation, and monitoring workflows. My responsibilities covered:
🔹 Dataset preparation and labeling
🔹 Object detection model training
🔹 Segmentation model training
🔹 YOLO-based detection and tracking
🔹 Image/video inference pipeline development
🔹 Model evaluation and threshold tuning
🔹 Production deployment support
🔹 Cloud server management and optimization
🔹 Building practical AI workflows for real-world operational environments
Fish Quality Inspection System - lythium.cl (http://lythium.cl)
I led the development of an advanced fish quality inspection solution for an industrial workflow. The system used image analysis to monitor fish quality and support automated fish sorting based on AI predictions.
🔹 Led the development of an advanced AI-powered fish quality inspection system for an industrial workflow.
🔹 Built an image analysis pipeline to monitor fish quality from production-line images.
🔹 Trained object detection models to identify fish and relevant visual quality indicators.
🔹 Trained segmentation models to support more detailed visual inspection of fish regions.
🔹 Designed the AI workflow to support automated fish sorting based on model predictions.
🔹 Worked on inspection logic that could classify or route fish based on quality-related outputs.
🔹 Designed the system for conveyor-belt usage, where images need to be processed consistently and reliably.
🔹 Focused on production issues such as image quality, camera consistency, lighting variation, and model reliability.
🔹 Helped convert visual inspection from a manual/rule-based workflow into an AI-supported inspection pipeline.
🔹 Built the system to reduce manual inspection effort and improve production workflow efficiency.
Shelfr.ai (http://Shelfr.ai) - Retail Automation Platform
I developed AI image solutions for retail automation and execution. The system handled large-scale product detection across 10,575+ SKUs, price tag detection, shelf and display type detection, and gap detection for empty shelf spaces.
🔹 Developed large-scale AI image solutions for retail automation and execution.
🔹 Worked on product detection across 10,575+ SKUs, where each SKU represented a unique product.
🔹 Built object detection workflows to identify products from retail shelf images.
🔹 Developed price tag detection to locate and extract price label areas from store images.
🔹 Worked on shelf and display type detection to understand the retail environment layout.
🔹 Built gap detection logic to identify empty shelf spaces and out-of-stock areas.
🔹 Supported computer vision workflows for retail compliance, shelf monitoring, and store execution.
🔹 Worked with high-volume image data and production-level inference requirements.
🔹 Managed high-load production servers on Google Cloud Platform.
🔹 Implemented load balancing and autoscaling to improve system stability under production traffic.
🔹 Focused on scalable AI infrastructure capable of handling real-world retail image workloads.
🔹 Helped create AI systems for inventory visibility, shelf condition monitoring, and retail execution analytics.
lake-shield.com (http://lake-shield.com) - USA LAKES - Boat Detection & Inspection System
🔹 Worked on a YOLO-based boat detection, tracking, and monitoring system.
🔹 Labeled datasets for boat detection and inspection model training.
🔹 Prepared image/video data for object detection training workflows.
🔹 Trained YOLO object detection models to detect boats in monitoring footage.
🔹 Built a detection pipeline capable of identifying boats from visual data.
🔹 Worked on boat tracking logic to monitor boat movement across frames.
🔹 Supported inspection and monitoring workflows using computer vision predictions.
🔹 Developed an end-to-end pipeline from labeled data to trained model and inference output.
🔹 Focused on practical model performance in outdoor environments where lighting, distance, angle, and background can vary.
🔹 Helped build a monitoring system that could support automated detection and review instead of fully manual observation.
My Responsibilities Across These Projects
🔹 Led AI/computer vision system development
🔹 Designed labeling and dataset preparation workflows
🔹 Trained YOLO/object detection models
🔹 Trained segmentation models where needed
🔹 Built image and video inference pipelines
🔹 Evaluated models using practical production metrics
🔹 Improved model performance through dataset cleanup, retraining, and threshold tuning
🔹 Integrated AI models into backend or operational workflows
🔹 Supported production deployment and infrastructure optimization
🔹 Worked with real-world constraints such as lighting, camera angle, image quality, latency, and false detection rates
Technologies Used
🔹 Python
🔹 YOLO / YOLOv8
🔹 Object Detection
🔹 Image Segmentation
🔹 OpenCV
🔹 PyTorch
🔹 FastAPI
🔹 Google Cloud Platform
🔹 Linux Servers
🔹 Load Balancing
🔹 Autoscaling
🔹 Custom Data Labeling Workflows
🔹 Model Training
🔹 Model Evaluation
🔹 Inference Pipeline Development
🔹 Production AI Deployment
1
47
0
French Legal AI Assistant & Agentic RAG System
Overview
I designed, built, and deployed a specialized Legal AI Assistant for French lawyers using agentic RAG, legal data pipelines, vector search, reranking, open-source LLMs, and citation-grounded answer generation. The system allowed lawyers to ask legal questions and receive answers grounded in French law articles, legal references, and relevant judicial cases.
Problem / Challenge
Legal data is very different from normal document data. A generic RAG pipeline using fixed-size chunks often breaks legal meaning, misses important context, or retrieves incomplete references. The main challenges were:
🔹 Legal documents had different structures and lengths
🔹 Articles and laws could not be randomly split into fixed-size chunks
🔹 Each answer needed traceable legal references
🔹 Retrieval had to understand legal scope, not just semantic similarity
🔹 The system needed to reduce hallucinations for legal users
🔹 Deployment had to respect privacy and regulatory requirements
My Expertise
I worked as the Lead AI Engineer / Agentic RAG Developer responsible for the complete system design and implementation. My responsibilities included:
🔹 Legal data pipeline architecture
🔹 Document parsing and preprocessing
🔹 Custom legal chunking strategy
🔹 Vector database design
🔹 Agentic RAG workflow development
🔹 Retrieval optimization and reranking
🔹 Open-source LLM deployment
🔹 Backend API development with FastAPI
🔹 Secure Azure cloud deployment
🔹 Multi-tenant system support
French Legal Data Engineering Pipeline
I built an automated ETL pipeline to process thousands of French legal documents, articles, and judicial cases. The pipeline handled:
🔹 Raw legal document ingestion
🔹 Text cleaning and normalization
🔹 Legal article extraction
🔹 Section-aware document structuring
🔹 Custom chunk generation
🔹 Metadata extraction for article number, article title, section, source, and reference
🔹 Embedding generation
🔹 Vector database ingestion
🔹 Repeatable updates for future legal data expansion
The chunking strategy was designed so legal articles were not cut in the middle or separated from their meaning.
Agentic RAG Workflow
Instead of using a simple one-step vector search, I built a LangGraph-based agentic RAG workflow. The workflow included:
🔹 User query understanding
🔹 Legal intent detection
🔹 Legal domain and scope identification
🔹 Generation of 2–5 targeted legal search queries
🔹 Retrieval of relevant chunks for each query
🔹 Deduplication of repeated results
🔹 Reranking of retrieved legal evidence
🔹 Source-grounded answer generation
This improved tested retrieval accuracy from around 50% to 95%+.
Retrieval, Citations & Case Law
The retrieval system was designed to make answers transparent and verifiable. I implemented:
🔹 Vector search for semantic legal retrieval
🔹 Reranking to improve relevance
🔹 Metadata-based source traceability
🔹 Citation-backed answer generation
🔹 Article-level legal references
🔹 Typesense-based retrieval for French judicial cases
🔹 Supporting case law returned with legal answers
This allowed lawyers to verify the exact legal source behind each generated response.
Open-Source LLM & Cloud Deployment
I evaluated and deployed open-source LLM infrastructure for private legal AI usage. The deployment included:
🔹 Qwen2.5:14B for French legal reasoning
🔹 Ollama and vLLM for model serving
🔹 Embedding and reranker models on a private Azure GPU VM
🔹 NVIDIA T4 16GB GPU optimization
🔹 Python/FastAPI backend APIs
🔹 Secure Azure deployment in the France region
🔹 Multi-tenant isolated access
🔹 GitHub CI/CD and Linux server management
The system was designed for privacy, reliability, and regulatory compliance.
Technologies Used
🔹 Python
🔹 FastAPI
🔹 LangChain
🔹 LangGraph
🔹 LangSmith
🔹 Ollama
🔹 vLLM
🔹 Qwen2.5:14B
🔹 ChromaDB
🔹 Typesense
🔹 Vector Databases
🔹 Reranking Models
🔹 Embedding Models
🔹 Azure Cloud
🔹 Linux
🔹 GitHub CI/CD
Impact
🔹 Built a production-ready legal AI assistant for lawyers
🔹 Improved retrieval accuracy from ~50% to 95%+ in tested scenarios
🔹 Reduced hallucinations through citation-grounded generation
🔹 Enabled lawyers to verify answers using article and case references
🔹 Created a scalable legal data pipeline for thousands of documents
🔹 Deployed private open-source LLM infrastructure for legal compliance
🔹 Delivered a strong foundation for future legal AI workflows
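The deduplication step of the multi-query workflow described in this project can be illustrated with a minimal sketch. It assumes each generated search query returns a list of scored chunks; the `Hit` class, the `merge_hits` function, and the field names are hypothetical and not taken from the project. Sorting by the best retrieval score is only a stand-in for a real reranker model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One retrieved chunk (hypothetical structure for illustration)."""
    chunk_id: str
    article: str
    score: float


def merge_hits(results_per_query, top_k=5):
    """Merge hits from several queries, keeping one entry per chunk.

    When the same chunk is returned by more than one query, the copy
    with the highest score is kept. The final sort by score is a
    placeholder for a dedicated reranking model.
    """
    best = {}
    for hits in results_per_query:
        for hit in hits:
            current = best.get(hit.chunk_id)
            if current is None or hit.score > current.score:
                best[hit.chunk_id] = hit
    return sorted(best.values(), key=lambda h: h.score, reverse=True)[:top_k]
```

Keying on a chunk identifier, not on the chunk text, keeps the citation metadata attached to whichever copy survives.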
0
54
Data Engineer
(1)
Umaima Iqbal
Lahore, Pakistan
I build offline AI tools that make documents talk.
New to Contra
2
AuraExtract — Intelligent Invoice & Receipt Data Extractor The extraction engine uses intelligent regex pattern matching that handles real-world invoice layouts — column-per-line PDF formats, inline tabular formats, and plain text documents. It detects 10 fields automatically and parses up to 20 line items per invoice. Supports PDF, TXT, and DOCX formats. Includes a raw text preview panel so users can verify exactly what the engine is reading. CSV export includes both the summary fields and full line items table — ready to open directly in Excel. Pure Python. Zero external dependencies beyond pypdf for PDF reading.
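As an illustration of regex-based field extraction of this kind, here is a minimal sketch. The patterns, field names, and the `extract_invoice` function are assumptions made for the example and are not AuraExtract's actual rules; only the 20-line-item cap comes from the description above.

```python
import re

# Hypothetical patterns for three summary fields.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.I),
    "date": re.compile(
        r"date\s*[:\-]?\s*(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{2,4})", re.I),
    "total": re.compile(
        r"\btotal\s*(?:due)?\s*[:\-]?\s*\$?\s*([\d,]+\.\d{2})", re.I),
}

# Inline tabular line item: description, quantity, unit price, amount.
LINE_ITEM = re.compile(
    r"^(.+?)[ \t]+(\d+)[ \t]+(\d+\.\d{2})[ \t]+(\d+\.\d{2})[ \t]*$", re.M)

MAX_LINE_ITEMS = 20  # cap stated in the project description


def extract_invoice(text):
    """Return (fields, items) parsed from plain invoice text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    items = []
    for desc, qty, price, amount in LINE_ITEM.findall(text)[:MAX_LINE_ITEMS]:
        items.append({
            "description": desc.strip(),
            "quantity": int(qty),
            "unit_price": float(price),
            "amount": float(amount),
        })
    return fields, items
```

The word boundary in the `total` pattern keeps it from matching inside "Subtotal", a common failure mode on real invoices.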
2
115
2
AuraSort scans any folder and automatically sorts files into named subfolders by type — Documents, Images, Videos, Audio, Code, Archives, and more. Files are renamed to clean, consistent lowercase format. Every operation is logged live on screen as it happens. Built with a Dry Run mode so users can preview exactly what will move before anything is touched. Full undo restores every file to its original location with one click. An HTML report is generated after each sort showing every file moved, every category created, and total time taken. Pure Python. Zero external libraries. Works on any machine without installation.
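The dry-run, move, and undo flow described above can be sketched with the standard library alone. The category map, function names, and the space-to-underscore renaming rule are hypothetical choices for this example and may differ from what AuraSort does.

```python
import shutil
from pathlib import Path

# Hypothetical extension-to-category map; the real categories may differ.
CATEGORIES = {
    ".pdf": "Documents", ".docx": "Documents", ".txt": "Documents",
    ".jpg": "Images", ".png": "Images",
    ".mp4": "Videos", ".mp3": "Audio",
    ".py": "Code", ".zip": "Archives",
}


def plan_moves(folder):
    """Return (source, destination) pairs without touching the disk."""
    moves = []
    for item in sorted(Path(folder).iterdir()):
        if not item.is_file():
            continue
        category = CATEGORIES.get(item.suffix.lower(), "Other")
        clean_name = item.name.lower().replace(" ", "_")
        moves.append((item, item.parent / category / clean_name))
    return moves


def apply_moves(moves, dry_run=True):
    """Move files unless dry_run is set; return a log of completed moves."""
    log = []
    for src, dst in moves:
        if dry_run:
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        log.append((src, dst))
    return log


def undo(log):
    """Restore every moved file to its original path, newest move first."""
    for src, dst in reversed(log):
        shutil.move(str(dst), str(src))
```

Separating planning from execution is what makes the preview cheap: the same list of pairs drives the dry run, the real move, and the undo log.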
2
124
1
A fully offline document summarizer built in pure Python. Uses TF-IDF scoring, position weighting, and Jaccard deduplication to extract the most important sentences from any PDF, DOCX, or TXT file, each labeled with a relevance percentage. The result looks like this:
[1] [100% relevance] The algorithm achieved 94% accuracy on benchmark tests.
[2] [81% relevance] Training was performed on 50,000 labeled samples.
[3] [67% relevance] Results were validated using 5-fold cross validation.
Supports PDF, Word, and TXT files. Saves summaries to your computer. Runs completely offline. No subscriptions, no API keys, no internet required.
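One way to combine the three techniques named above is shown in this minimal sketch. The smoothed IDF formula, the position bonus, the overlap threshold, and all function names are assumptions chosen for illustration; the project's actual weights and formulas are not given.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two token lists, treated as sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def summarize(text, top_n=3, position_weight=0.2, max_overlap=0.6):
    """Return up to top_n (relevance_percent, sentence) pairs."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency: how many sentences contain each term.
    doc_freq = Counter(term for toks in token_lists for term in set(toks))
    scored = []
    for i, toks in enumerate(token_lists):
        if not toks:
            continue
        tf = Counter(toks)
        tfidf = sum(
            (count / len(toks)) * (math.log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in tf.items()
        )
        # Position weighting: earlier sentences get a small bonus.
        bonus = 1 + position_weight * (1 - i / n)
        scored.append((tfidf * bonus, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    chosen = []
    for score, i in scored:
        # Jaccard deduplication: skip near-copies of already chosen sentences.
        if any(jaccard(token_lists[i], token_lists[j]) > max_overlap for _, j in chosen):
            continue
        chosen.append((score, i))
        if len(chosen) == top_n:
            break
    if not chosen:
        return []
    best = chosen[0][0]
    return [(round(100 * score / best), sentences[i]) for score, i in chosen]
```

Relevance is reported relative to the top-scoring sentence, which is why the first entry always reads 100%.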
1
122
1
Built AuraChat v3.0, a fully offline Document Intelligence desktop app in pure Python. Users upload any PDF, Word, or TXT file and ask questions in plain English. The system returns cited answers with confidence scores instantly.
Technical highlights:
- Custom NLP engine using TF-IDF scoring + hybrid token overlap analysis
- 1,700× faster indexing than baseline on 500-page documents
- Multi-threaded processing, so the UI never freezes during heavy indexing
- Supports PDF, DOCX, and TXT file formats
- Zero external APIs; runs completely offline on the user's machine
- 23 production-grade bugs identified and resolved before delivery
This is not a demo. This is production-ready software built with clean architecture, full error handling, keyboard shortcuts, chat export, source citations, and confidence indicators.
1
127
Data Engineer
(1)
Shahroz Naeem
Lahore, Pakistan
Data Visualization & Operations Expert 📊
0
Case Study: Building Pipelines & Visualizations for Insights
0
7
0
Automating Workflows with No-Code Solutions using Power Platform
0
6
0
Case Study: Replacing SAAS Solution for over $50k Annual Savings
0
24
0
Case Study: Built a Support Team & Process from Scratch
0
17
Data Engineer
(1)
Afaq Ahmed
Lahore, Pakistan
Full Stack Dev | ASP.NET, Angular, SQL Server
0
Digital SME Lending Platform Development
0
5
0
Motor Vehicle Registration System (MVRS) Development
0
0
0
Real-Time Vehicle eAuction Platform Development
0
2
Data Engineer
(1)
Spacebar Technologies
Lahore, Pakistan
Low-Code, High Impact: Blazing-Fast, Top-Tier Internal Tools
$10k+
Earned
8x
Hired
5.0
Rating
13
Followers
Agency Partner
1
Instacoach
1
6
0
Digital-Treasury SEO Management Platform
0
19
0
Shopify Theme Redesign for Smorgasboard
0
5
0
BrainCube Logo
0
5
Data Engineer
(1)
Usman Haider
Lahore, Pakistan
AI/ML & Data Solutions Engineer
New to Contra
1
Worked on an RLHF (Reinforcement Learning from Human Feedback) pipeline focused on dataset creation, data annotation, and model evaluation. My role involved designing and curating high-quality prompt datasets, reviewing AI-generated responses, and providing structured feedback based on accuracy, relevance, safety, and helpfulness. Contributed to improving model performance by ensuring consistent evaluation standards and high-quality human feedback for training alignment and refinement.
1
104
1
Trained a DreamBooth LoRA model to generate high-quality, personalized image outputs with consistent subject identity across different prompts and styles. The project involved dataset preparation, image captioning, and fine-tuning diffusion models using LoRA for efficient training and deployment. The solution enables fast generation of customized visuals while preserving subject consistency, style control, and high fidelity, suitable for creative, branding, and content generation use cases.
1
95
2
Retail Knowledge Graph
In this project, we built a semantic knowledge graph tailored to the retail industry. The pipeline involved developing AI agents to transform heterogeneous data into standardized formats. Ontologies were created to represent domain knowledge accurately. Using Gemini models and LangChain, user queries were converted into Cypher queries to retrieve insights from a Neo4j database. We utilized an MCP server for orchestration and LangSmith for secure login and audit trails. This system enhances complex data exploration for non-technical users.
2
2
79
0
Student Medical Chatbot
Built a chatbot to assist MBBS students in navigating medical literature. Leveraged LlamaIndex and fine-tuned language models to ensure accuracy. Embeddings were stored in OpenSearch, hosted on AWS. The Django backend included secure authentication and session management for a robust user experience.
0
51
Data Engineer
(2)
Abu Sufyan
Lahore, Pakistan
Full-Stack Dev: Building fast React/Next.js apps & SEO.
0
Severance Calculator
0
29
0
SaaS Project (tradeconvert.pro (http://tradeconvert.pro))
0
46
0
A US Freelance Net Income & Tax Calculator for the year 2026. Built on Replit, it is a web-based tool that helps freelancers in the United States estimate their taxes and calculate their net take-home pay from their freelance income.
0
7
0
A browser-based tool built and deployed on Replit that lets developers scaffold and build custom Replit extensions — without leaving the browser. Built entirely on Replit, demonstrating real platform-native development.
0
16
Data Engineer
(2)