
AI-Powered Lead Enrichment System

Patricia Osorio


Lead Enrichment System

AI-powered lead enrichment and qualification system using LLM extraction, configurable scoring, and intelligent routing.

Overview

This system processes inbound sales leads by:
Extracting structured data from raw lead notes using an LLM
Scoring leads against configurable criteria
Routing leads to Sales or Marketing based on a priority threshold
Built for a technical challenge.

Quick Start


Input: data/leads.json
Output: output/enriched_leads.json

How It Works


LLM Extraction (OpenAI structured outputs)
Industry: Cybersecurity/Fintech/AI/Retail/Other...
Size: Employee count (null if not mentioned)
Intent: What they're looking for
Scoring (0-100 points)
Industry: Cybersecurity/AI (+50), Fintech (+25)
Size: 100+ employees (+25), 10-100 (+10)
Content: Meaningful intent (+10)
Routing
Score ≥ 70 → Sales (Priority)
Score < 70 → Marketing (Nurture)
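The scoring and routing rules above can be sketched in a few lines. This is a minimal illustration, not the project's actual code; the field names (industry, size, intent) and function names are assumptions about the enriched lead shape.

```python
SALES_THRESHOLD = 70

def score_lead(industry, size, intent):
    # Apply the point table from "How It Works".
    points = 0
    if industry in ("Cybersecurity", "AI"):
        points += 50
    elif industry == "Fintech":
        points += 25
    if size is not None:
        if size >= 100:          # 100+ employees
            points += 25
        elif size >= 10:         # 10-100 employees
            points += 10
    if intent and intent.strip():  # meaningful intent text present
        points += 10
    return points

def route(score):
    # Threshold routing: high-scoring leads go straight to Sales.
    return "Sales (Priority)" if score >= SALES_THRESHOLD else "Marketing (Nurture)"
```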

Design Decisions

1. OpenAI Structured Outputs (No LangChain)

Why: A single-pipeline task doesn't need LangChain's orchestration features. OpenAI's native structured outputs + Pydantic give us schema enforcement with zero abstraction overhead.
When I'd use LangChain: Multi-step agents, tool routing, RAG systems, complex chains.
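A sketch of the pattern, assuming the OpenAI Python SDK's structured-output parsing; the schema fields, model choice, and function names are illustrative, not the project's exact code:

```python
from typing import Optional
from pydantic import BaseModel

class LeadProfile(BaseModel):
    # Pydantic model doubles as the JSON schema the API must satisfy.
    # Field names are an assumption about the project's schema.
    industry: str
    size: Optional[int]  # employee count; null if not mentioned
    intent: str

def enrich(raw_note: str) -> LeadProfile:
    # Lazy import so the schema above is usable without the SDK installed.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Extract a structured lead profile."},
            {"role": "user", "content": raw_note},
        ],
        response_format=LeadProfile,  # schema enforced by the API
    )
    return completion.choices[0].message.parsed
```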

2. ThreadPoolExecutor (No Async)

Why: Simpler than async, works with the sync OpenAI client, and well suited to I/O-bound API calls. Processes 100 leads in roughly 20 seconds vs. roughly 200 seconds synchronously.
When I'd use async: 1000+ concurrent requests, websockets, or integration with async frameworks.
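The concurrency pattern is just a thread pool over a per-lead worker. A minimal sketch, with the API call stubbed out (the worker and function names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_one(lead):
    # Stand-in for the per-lead LLM call; the real worker hits the OpenAI API.
    return {**lead, "enriched": True}

def enrich_all(leads, max_workers=10):
    # Threads overlap the I/O-bound API calls. executor.map yields results
    # in input order, so output order matches the input file.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich_one, leads))
```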

3. Tenacity for Retries

Why: Tenacity is the standard Python retry library. It handles transient failures (rate limits, timeouts) with exponential backoff.
Alternative: Manual retry loops (many more lines of boilerplate).

4. Fallback Enrichment

Why: The challenge requires resilience when the LLM fails: rule-based keyword matching ensures the pipeline always produces output, even without an API key.
Tradeoff: Lower accuracy than the LLM (around 70% vs. the LLM's 90%), but it prevents total system failure.

Production Considerations

Not implemented (out of scope for the time-boxed challenge):
Async processing: Celery + Redis for 10k+ leads/hour
Cost optimization: Cache by domain (~70% reduction), use the Batch API
Monitoring: LangSmith for prompt debugging, Datadog metrics
CRM integration: Webhooks to the CRM
Score explainability: Per-rule score breakdown
Architecture: FastAPI service

Project Structure


This README has been enriched by the AI.

Posted Apr 10, 2026
