
AI-Powered Lead Enrichment System

Patricia Osorio


Lead Enrichment System

AI-powered lead enrichment and qualification system using LLM extraction, configurable scoring, and intelligent routing.

Overview

This system processes inbound sales leads by:
Extracting structured data from raw lead notes using an LLM
Scoring leads against configurable criteria
Routing leads to Sales or Marketing based on a priority threshold
Built for a technical challenge.

Quick Start


Input: data/leads.json
Output: output/enriched_leads.json

How It Works


LLM Extraction (OpenAI structured outputs)
Industry: Cybersecurity/Fintech/AI/Retail/Other...
Size: Employee count (null if not mentioned)
Intent: What they're looking for
Scoring (0-100 points)
Industry: Cybersecurity/AI (+50), Fintech (+25)
Size: 100+ employees (+25), 10-100 (+10)
Content: Meaningful intent (+10)
Routing
Score ≥ 70 → Sales (Priority)
Score < 70 → Marketing (Nurture)
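The scoring and routing rules above can be sketched in a few lines. This is a minimal illustration, not the project's actual code; the field names (industry, size, intent) and function names are assumptions about the enriched lead shape.

```python
SALES_THRESHOLD = 70

def score_lead(industry, size, intent):
    # Apply the point table from "How It Works".
    points = 0
    if industry in ("Cybersecurity", "AI"):
        points += 50
    elif industry == "Fintech":
        points += 25
    if size is not None:
        if size >= 100:          # 100+ employees
            points += 25
        elif size >= 10:         # 10-100 employees
            points += 10
    if intent and intent.strip():  # meaningful intent text present
        points += 10
    return points

def route(score):
    # Threshold routing: high-scoring leads go straight to Sales.
    return "Sales (Priority)" if score >= SALES_THRESHOLD else "Marketing (Nurture)"
```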

Design Decisions

1. OpenAI Structured Outputs (No LangChain)

Why: A single-pipeline task doesn't need LangChain's orchestration features. OpenAI's native structured outputs + Pydantic give us schema enforcement with zero abstraction overhead.
When I'd use LangChain: Multi-step agents, tool routing, RAG systems, complex chains.
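A sketch of the pattern, assuming the OpenAI Python SDK's structured-output parsing; the schema fields, model choice, and function names are illustrative, not the project's exact code:

```python
from typing import Optional
from pydantic import BaseModel

class LeadProfile(BaseModel):
    # Pydantic model doubles as the JSON schema the API must satisfy.
    # Field names are an assumption about the project's schema.
    industry: str
    size: Optional[int]  # employee count; null if not mentioned
    intent: str

def enrich(raw_note: str) -> LeadProfile:
    # Lazy import so the schema above is usable without the SDK installed.
    from openai import OpenAI
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Extract a structured lead profile."},
            {"role": "user", "content": raw_note},
        ],
        response_format=LeadProfile,  # schema enforced by the API
    )
    return completion.choices[0].message.parsed
```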

2. ThreadPoolExecutor (No Async)

Why: Simpler than async, works with the sync OpenAI client, and well suited to I/O-bound API calls. Processes 100 leads in roughly 20 seconds vs. roughly 200 seconds synchronously.
When I'd use async: 1000+ concurrent requests, websockets, or integration with async frameworks.
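The concurrency pattern is just a thread pool over a per-lead worker. A minimal sketch, with the API call stubbed out (the worker and function names are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def enrich_one(lead):
    # Stand-in for the per-lead LLM call; the real worker hits the OpenAI API.
    return {**lead, "enriched": True}

def enrich_all(leads, max_workers=10):
    # Threads overlap the I/O-bound API calls. executor.map yields results
    # in input order, so output order matches the input file.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(enrich_one, leads))
```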

3. Tenacity for Retries

Why: Tenacity is the standard Python retry library. It handles transient failures (rate limits, timeouts) with exponential backoff.
Alternative: Manual retry loops (many more lines of boilerplate).

4. Fallback Enrichment

Why: The challenge requires resilience when the LLM fails: rule-based keyword matching ensures the pipeline always produces output, even without an API key.
Tradeoff: Lower accuracy than the LLM (around 70% vs. the LLM's 90%), but it prevents total system failure.

Production Considerations

Not implemented (out of scope for the time-boxed challenge):
Async processing: Celery + Redis for 10k+ leads/hour
Cost optimization: Cache by domain (~70% reduction), use the Batch API
Monitoring: LangSmith for prompt debugging, Datadog metrics
CRM integration: Webhooks to the CRM
Score explainability: Per-rule score breakdown
Architecture: FastAPI service

Project Structure


This README has been enriched by the AI.

Posted Apr 10, 2026
