Contact Deduplication Automation for HighLevel CRM

Ayomide Ganiyu

Open Dental ↔ GoHighLevel: Contact Dedup & Clean-Up Automation (API)

The problem

HighLevel (GHL) is fantastic for funnels—but duplicated contacts wreck reporting, sequences, and ownership. Sales thinks “leads are cold,” marketing sees inflated counts, and support messages the same person twice. Manual clean-ups don’t last.

The outcome

I built a safe, repeatable automation that:
Finds and removes duplicates by (phone + name) and (email + name)—keeping the most recent record.
Backs up everything first (timestamped JSON + a CSV of duplicates for audit).
Respects rate limits, runs page-by-page with robust pagination, and returns a deletion report at the end.

What this means for you

CRMs and pipelines stay accurate (no “ghost” leads).
Sequences, attribution, and dashboards stop lying.
Your team sees one clear owner per contact.
Typical results across similar installs: 80–95% reduction in duplicate contacts on the first run and clean reporting within 24–48 hours. Your mileage may vary based on data quality.

What I built (in plain English)

API-level connection test: verifies auth and endpoint health before touching data.
Safe, full fetch with pagination: pulls every contact in controlled pages, tracking unique IDs to avoid accidental double work.
Deterministic “keeper” logic: sorts by dateUpdated/dateAdded (newest wins), then flags older duplicates.
Backups & audit trail:
Timestamped full backup: ghl_contacts_backup_YYYYMMDD_HHMMSS.json
“Latest” snapshot: ghl_contacts_backup_latest.json
CSV of duplicates before any deletion (fields: id, name, firstName, lastName, email, phone, dateAdded, dateUpdated, status, source, original_id, duplicate_reason)
Careful deletion loop: deletes only flagged duplicates, pauses between requests to avoid rate limits, and logs every success/failure.
Final report: deletion_report_YYYYMMDD_HHMMSS.json with totals, the duplicate list, and a link to the CSV used.

Tech stack

Python 3 • requests • CSV/JSON • HighLevel REST API (optionally extendable with FastAPI + GitHub Actions + Google Drive/Sheets for scheduled runs and shareable reports)
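
To make the fetch and backup pieces concrete, here is a minimal sketch rather than the production script. It assumes a hypothetical fetch_page(page, limit) callable that wraps the actual HTTP request and returns one page of contact dicts; the function names and the now parameter are illustrative. The backup filenames follow the patterns listed above.

```python
import json
import shutil
from datetime import datetime
from pathlib import Path
from typing import Callable, Optional


def fetch_all_contacts(fetch_page: Callable[[int, int], list],
                       limit: int = 100, max_pages: int = 10_000) -> list:
    """Pull contacts page by page, skipping any ID already seen.

    fetch_page(page, limit) is a hypothetical callable; in a real run it
    would wrap the HTTP call and return a list of contact dicts.
    """
    contacts, seen_ids = [], set()
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, limit)
        added = 0
        for contact in batch:
            cid = contact.get("id")
            if cid and cid not in seen_ids:
                seen_ids.add(cid)
                contacts.append(contact)
                added += 1
        # Stop on an empty page, a short (last) page, or a page that only
        # repeats contacts we already have (guards against endless loops).
        if not batch or len(batch) < limit or added == 0:
            break
    return contacts


def write_backup(contacts: list, out_dir: Path,
                 now: Optional[datetime] = None) -> Path:
    """Write a timestamped JSON backup plus a 'latest' copy."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    out_dir.mkdir(parents=True, exist_ok=True)
    stamped = out_dir / f"ghl_contacts_backup_{stamp}.json"
    stamped.write_text(json.dumps(contacts, indent=2), encoding="utf-8")
    shutil.copyfile(stamped, out_dir / "ghl_contacts_backup_latest.json")
    return stamped
```

Passing the page fetcher in as a callable keeps the pagination logic testable without touching the live API.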

How it works (step-by-step)

Smoke test → Confirms the GHL API key & endpoint with a tiny read call.
Full pull → Fetches contacts page-by-page (limit=100, page=n), honoring locationId when needed.
Index & compare → Normalizes names/emails/phones and builds keys:
"{phone}:{name}" (preferred)
"{email}:{name}" (fallback if no phone)
Choose the keeper → Newest contact stays; older twins are tagged as duplicates.
Write backups → Full JSON + CSV export of duplicates—with original/duplicate IDs.
Delete safely → Removes only the duplicates, with short sleeps to respect rate limits.
Report → Saves totals (found/removed/failed), plus the duplicate list and CSV path.
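
The index, compare, and keeper steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the exact production code: the helper names are hypothetical, timestamps are assumed to be ISO 8601 strings in one consistent format (so plain string comparison orders them), and phone normalization here only strips non-digits, so it will not reconcile numbers stored with and without a country code.

```python
import re
from typing import Optional, Tuple


def normalize_name(contact: dict) -> str:
    """Lowercase the name and collapse whitespace; fall back to first + last."""
    name = contact.get("name") or (
        f"{contact.get('firstName') or ''} {contact.get('lastName') or ''}"
    )
    return " ".join(name.lower().split())


def dedup_key(contact: dict) -> Optional[Tuple[str, str]]:
    """Return (reason, key): phone:name preferred, email:name as fallback."""
    name = normalize_name(contact)
    if not name:
        return None
    phone = re.sub(r"\D", "", contact.get("phone") or "")
    if phone:
        return "phone+name", f"{phone}:{name}"
    email = (contact.get("email") or "").strip().lower()
    if email:
        return "email+name", f"{email}:{name}"
    return None  # not enough data to compare safely


def recency(contact: dict) -> str:
    # Assumption: ISO 8601 strings in a consistent format, so string order
    # matches time order. Missing dates sort as oldest.
    return contact.get("dateUpdated") or contact.get("dateAdded") or ""


def find_duplicates(contacts: list) -> list:
    """Newest contact per key is the keeper; older ones are flagged."""
    keepers, duplicates = {}, []
    for contact in sorted(contacts, key=recency, reverse=True):
        keyed = dedup_key(contact)
        if keyed is None:
            continue
        reason, key = keyed
        if key not in keepers:
            keepers[key] = contact
        else:
            duplicates.append({**contact,
                               "original_id": keepers[key]["id"],
                               "duplicate_reason": reason})
    return duplicates
```

Each flagged row carries original_id and duplicate_reason, which matches the columns of the audit CSV, so the list can be written out for review before anything is deleted.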

Why clients hire me for this

Data safety first: I never delete without a fresh backup and CSV audit you can open in Excel.
Deterministic rules: No guessing. Clear keys, clear winner, predictable results.
Production-minded: Handles timeouts, unexpected responses, and rate limits gracefully.
Extensible: From one-off cleanups to automated nightly jobs and Open Dental sync.
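
For a sense of what the rate-limit handling could look like, here is a small sketch of a deletion loop. It assumes a hypothetical delete_contact(contact_id) callable that performs the HTTP DELETE and returns the status code; the pause length, retry count, and report field names are illustrative choices, not values taken from the HighLevel documentation.

```python
import time
from typing import Callable


def delete_duplicates(duplicates: list,
                      delete_contact: Callable[[str], int],
                      pause: float = 0.2,
                      max_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Delete only the flagged duplicates and return a summary report.

    delete_contact is a hypothetical callable returning an HTTP status code.
    """
    removed, failed = [], []
    for dup in duplicates:
        cid = dup["id"]
        for attempt in range(1, max_retries + 1):
            try:
                status = delete_contact(cid)
            except Exception:
                status = None  # timeout or connection error: record as failed
            if status is not None and 200 <= status < 300:
                removed.append(cid)
                break
            if status == 429 and attempt < max_retries:
                sleep(pause * 2 ** attempt)  # back off, then try again
                continue
            failed.append(cid)
            break
        sleep(pause)  # short pause between contacts
    return {
        "found": len(duplicates),
        "removed": len(removed),
        "failed": len(failed),
        "removed_ids": removed,
        "failed_ids": failed,
    }
```

The returned dictionary holds the found/removed/failed totals and can be saved as the final JSON report.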
Posted Sep 18, 2025

Dedup & sync GHL contacts with Open Dental. Backups + CSV audits + safe deletes. Cleaner pipelines, accurate dashboards.