CRM Migration Python Automation HighLevel / GoHighLevel Data Engineering

HighLevel CRM
Migration at Scale
188K Records
Across 77 Businesses

One monolithic CRM. 77 distinct businesses. 700,000+ records that needed to land in exactly the right place — with zero data loss and a 32-record failure rate traceable entirely to bad source data.

188K Contacts extracted
482K Notes processed
36K Opportunities migrated
77 Business subaccounts
// OVERVIEW

One account.
77 businesses.
One impossible data problem.

A franchising group had been running all 77 of its member businesses out of a single HighLevel CRM subaccount for years. Contacts, notes, pipeline opportunities, and calendar bookings were all co-mingled — 188,000 contacts with no clean way to tell which business owned which record.

The decision was made to migrate every business onto its own isolated subaccount with its own API key, pipelines, automations, and integrations. Each business needed to receive only its own data — no cross-contamination.

This project delivered that: a complete extraction, transformation, and re-import of over 700,000 CRM records, finished with a 32-record failure rate against a dataset of that size — all traceable to bad source data.

// PROJECT AT A GLANCE
34
Custom Python scripts & modules
Built specifically for this migration, from extraction to retry logic
9
Project milestones
Setup through automation remapping and post-migration validation
1
Full-stack migration web app
FastAPI + HTMX + Alpine.js — built for future reuse on any HighLevel project
0
Calendar import failures
280 calendars created across 76 subaccounts — perfect record
// BY THE NUMBERS

What was extracted. What was imported.

EXTRACTION — SOURCE ACCOUNT
188,000Contacts extracted~376 paginated API calls
36,339Opportunities extractedAcross all pipeline stages
482,081Notes extractedDiscovered during import phase
837Calendars extractedFuzzy-matched to 76 businesses
IMPORT — 77 DESTINATION SUBACCOUNTS
83,286Contacts importedAcross 77 businesses
427,911Notes imported88.8% success rate
27,747Opportunities imported76 subaccounts
280Calendars created0 failures
RELIABILITY
32Total data quality failuresAll caused by invalid source data
15Businesses auto-recoveredVia automated retry mechanism
~770Workflow configs remapped77 accounts × ~10 workflows each
// CHALLENGES & SOLUTIONS

The hard parts nobody warns you about.

// 01

GHL API Bug: Option Fields

// PROBLEM

HighLevel's API cannot create custom fields with options (radio buttons, checkboxes, multi-select dropdowns). The API silently accepts the request but never creates the field. This is not documented anywhere.

// SOLUTION

The bug was identified, documented, and reported. The 451 option-based custom fields were created manually via the UI for each subaccount. The remaining field types were fully automated. The discovery was turned into an API reference document to prevent future projects from hitting the same wall.

// 02

Pipeline Stage ID Uniqueness

// PROBLEM

"New Lead" in Subaccount A and "New Lead" in Subaccount B have the same name but completely different UUIDs. GHL snapshots clone the workflow structure but do not remap stage IDs — leaving every imported workflow silently pointing to the wrong subaccount's pipeline stages.

// SOLUTION

During setup, every subaccount's pipeline stages were fetched via API and stored in a SQLite lookup table keyed by (subaccount, stage name). At import time, every workflow trigger, action, and branch condition was rewritten against the correct destination IDs — roughly 8 remappings per workflow, 770 total.

// 03

Transient API Failures at Scale

// PROBLEM

During large batch fetches, the source HighLevel API returned intermittent 5xx errors for 375 contacts across 2 businesses — not because the data was bad, but because the API was under load. A naive run-and-hope approach would have silently dropped those records.

// SOLUTION

Every API call wrote its outcome to a progress JSON file in real time. A dedicated retry_failed.py script read those files, identified failed contacts, and re-fetched each one individually with exponential backoff. 100% of transient failures were recovered.

// 04

OAuth Tokens Do Not Transfer

// PROBLEM

GHL snapshots clone workflow steps — including Slack and Google Sheets integrations — but the OAuth tokens embedded in those steps still reference the source account. The result: cloned workflows that look correct but will silently fail on every execution.

// SOLUTION

Each integration step was manually disconnected and reconnected in each destination subaccount, forcing a fresh OAuth handshake. This was documented as a non-automatable step and built into the project timeline as explicit manual work.

// TECHNICAL APPROACH

Eight engineering decisions that made this work.

Cursor-Based Pagination

HighLevel's contact search API uses a searchAfter cursor rather than page numbers. ~376 API calls with a 500-contact page size extracted all 188K contacts. Checkpoint files ensured no re-work on interruption.

Contact ID Translation

Every contact receives a new UUID in its destination subaccount. A SQLite mapping table (old ID → new ID) was populated during contact import, then used to rewrite all foreign-key references in opportunities and notes.

Concurrent Import Engine

ThreadPoolExecutor with 10 workers per subaccount achieved 2–3× throughput over sequential imports. Subaccounts were processed sequentially to prevent aggregate rate-limit spikes against the destination API.

Custom Field Translation

451 fields mapped using a name-based lookup table built during setup. 70–103 fields translated per contact. Option-based fields (blocked by API bug) created manually then included in automation.

Fuzzy Calendar Matching

837 source calendars assigned to 76 businesses based on name prefixes. The matching logic handled abbreviations, & vs and, multi-location variants, and punctuation differences — 280 calendars created with zero failures.

Hot Prospector Integration

When any lead reaches "Responded (Hot)" in any of the 77 pipelines, a webhook fires to Hot Prospector's power dialer API. The lead enters a live dialing queue within seconds. Remapped across all 77 subaccounts.

// CODE SPOTLIGHT

Cursor-based pagination — extracting 188K contacts

extract_contacts.py
def extract_all_contacts(api_key, location_id):
    """Extract all contacts using cursor-based pagination."""
    contacts = []
    search_after = None
    page = 0

    while True:
        payload = {
            "locationId": location_id,
            "limit": 500,
            "sort": "dateAdded",
            "sortDirection": "asc",
        }
        if search_after:
            payload["searchAfter"] = search_after

        response = requests.post(
            f"{BASE_URL}/contacts/search",
            headers={"Authorization": f"Bearer {api_key}", "Version": "2021-07-28"},
            json=payload,
        )
        data  = response.json()
        batch = data.get("contacts", [])
        if not batch:
            break

        contacts.extend(batch)
        search_after = data.get("meta", {}).get("nextPageAfter")
        page += 1
        print(f"Page {page}: {len(contacts):,} contacts fetched")

        if not search_after:
            break
        time.sleep(0.3)  # Rate limiting

    return contacts

The cursor (searchAfter) is a composite sort key returned by the API in meta.nextPageAfter. Unlike offset-based pagination, cursors stay stable even if records are added during extraction — critical when working with an actively-used CRM. The 0.3s delay and deterministic sort order prevent duplicate or skipped records across the ~376 pages required to exhaust the dataset.

// PROJECT MILESTONES

9 phases from setup to completion.

The project was structured into discrete milestones to allow safe checkpointing. Each phase was independently verifiable before the next began — ensuring that a failure in any step could not corrupt the output of a prior one.

1

Setup & Subaccount Inventory

Fetched all 77 destination subaccounts, created API key mapping, initialized SQLite schema for contact ID translation, custom field maps, and stage lookups.

2

Contact Extraction (Source)

Cursor-based extraction of all 188,000 contacts from the source account. Checkpoint files written after each page. Took ~376 paginated API calls.

3

Contact Segmentation

Parsed business-identifying tags and fields to assign each of the 188K contacts to one of the 77 destination subaccounts. Built the per-subaccount import batches.

4

Contact Import + ID Mapping

Imported 83,286 contacts using concurrent workers. Built old-ID → new-ID translation table in SQLite as each contact was created. Automated retry recovered 15 subaccounts.

5

Opportunity Extraction & Import

Extracted 36,339 opportunities. Translated contact IDs and pipeline stage IDs. Imported 27,747 opportunities across 76 subaccounts.

6

Notes Extraction & Import

482,081 notes extracted. Contact ID references rewritten. 427,911 notes imported at 88.8% success rate — all failures attributed to invalid source data.

7

Calendar Migration

837 source calendars matched to 76 businesses via fuzzy name-prefix logic. 280 calendars created across destinations with zero failures.

8

Automation Remapping

~770 workflow configurations rewritten. Pipeline stage IDs, Slack steps, Google Sheets steps, and Hot Prospector webhooks all remapped per subaccount. OAuth integrations manually reconnected.

9

Validation & Post-Migration Tooling

Record counts verified, failure reports generated. Reusable FastAPI migration web app built and documented for future client use.

// RESULTS

What the client ended up with.

77 fully independent CRM environments

Each business now operates in an isolated HighLevel subaccount with its own contacts, pipelines, automations, and calendar. Zero data leakage between businesses. Each subaccount owner can see and work with only their own leads.

Live dialer integration across all 77 businesses

The Hot Prospector webhook fires in seconds when any lead hits "Responded (Hot)" in any of the 77 pipelines. Leads enter a live dialing queue automatically — no manual routing, no CSV exports, no delays.

A reusable migration tool for the future

The migration web app (FastAPI + HTMX + SQLite + Server-Sent Events) was engineered as a generic tool — no hardcoded account IDs or API keys. Any future HighLevel migration can be started by pasting two API keys and clicking through a five-step guided workflow.

99.98%Data fidelity rate(32 failures / ~700K+ records)
100%Transient failure recoveryVia automated retry mechanism
0Calendar import failures280 created / 280 successful
2–3×Import throughput gainThreadPoolExecutor vs. sequential
// TECH STACK

Tools & technologies.

// BACKEND
Python 3.9+
FastAPI
SQLite
requests
concurrent.futures
// FRONTEND
HTMX
Alpine.js
Jinja2 Templates
Server-Sent Events
Tailwind CSS
// API / PLATFORM
HighLevel REST API
Hot Prospector API
Slack OAuth
Google Sheets OAuth
GHL Webhooks
// PATTERNS
Cursor pagination
Checkpoint / resume
Exponential backoff
ID translation tables
Concurrent workers