Moving nine million records into Salesforce from a tangle of legacy systems — SharePoint, Siebel, DB2, and an IVR platform — is less a coding problem than a logistics one. The platform can ingest the volume; the hard part is doing it with referential integrity, in a window the business can tolerate, without tripping governor or API limits. Here's what I've learned doing it for real.

Map and cleanse before you move a single row

The biggest migration risks are decided before runtime: ambiguous field mappings, duplicate keys across source systems, and silent data-type mismatches. Invest early in a mapping document and a profiling pass over the source data. Garbage migrated faithfully is still garbage, and now it's in your CRM.

Three things to settle at this stage:

  • External IDs on every object, so relationships can be resolved by key instead of fragile load order.
  • A staging model — land raw data, transform, then load — rather than transforming in flight.
  • Dedup strategy up front — decide the survivorship rules before you discover the duplicates.
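
To make the survivorship idea concrete, here is a minimal sketch of a staging-side dedup pass. The field names, the sample rows, the email-based matching rule, and the "newest wins, older rows fill the blanks" rule are all assumptions for illustration; your own rules come out of the mapping and profiling work.

```python
from datetime import date

# Raw rows as landed in staging. All values are made up for illustration.
staged = [
    {"source": "siebel", "legacy_id": "1001", "email": "a@example.com",
     "phone": "555-0100", "updated": date(2021, 3, 1)},
    {"source": "db2", "legacy_id": "1001", "email": "A@Example.com ",
     "phone": "", "updated": date(2023, 6, 9)},
    {"source": "db2", "legacy_id": "2002", "email": "b@example.com",
     "phone": "555-0199", "updated": date(2022, 1, 15)},
]

def external_id(row):
    # Prefix with the source system so "1001" in Siebel and "1001" in DB2
    # cannot collide once both land in the same object.
    return f"{row['source']}:{row['legacy_id']}"

def match_key(row):
    # Assumed matching rule: same normalized email means same real-world record.
    return row["email"].strip().lower()

def merge(rows):
    # Assumed survivorship rule: the newest record wins, and older records
    # only fill fields the survivor left blank.
    newest_first = sorted(rows, key=lambda r: r["updated"], reverse=True)
    survivor = dict(newest_first[0])
    for older in newest_first[1:]:
        for field, value in older.items():
            if survivor[field] in ("", None):
                survivor[field] = value
    survivor["Legacy_Id__c"] = external_id(newest_first[0])
    return survivor

def dedupe(rows):
    groups = {}
    for row in rows:
        groups.setdefault(match_key(row), []).append(row)
    return [merge(group) for group in groups.values()]

survivors = dedupe(staged)
for row in survivors:
    print(row["Legacy_Id__c"], row["phone"])
```

Because the rules live in two small functions, changing the business's answer to "which record wins?" is a one-line edit rather than a rewrite of the load.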

Use the Bulk API and load in dependency order

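For millions of records, the Bulk API is the only sane choice: it processes the file asynchronously in batches instead of tying up synchronous API calls a couple of hundred rows at a time. Load parents before children and resolve relationships via external IDs so you're not querying for IDs mid-load: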

# Upsert by external id — idempotent and relationship-safe
sf data upsert bulk --sobject Account \
  --external-id Legacy_Id__c --file accounts.csv --wait 30

sf data upsert bulk --sobject Case \
  --external-id Legacy_Case_Id__c --file cases.csv --wait 30
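# cases.csv references Account via an Account.Legacy_Id__c column, so no ID lookups are needed

Idempotency is everything. An upsert keyed on an external ID updates rows that already landed instead of duplicating them. Design every load so that re-running it is safe, because at this scale you will re-run it.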
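
With more than two objects, the parent-before-child order is easier to derive than to remember. The sketch below does that with a topological sort from Python's standard library and prints one upsert command per object, using the same flags as above. The Contact object, its `Legacy_Contact_Id__c` field, and the file-naming convention are hypothetical additions for illustration.

```python
from graphlib import TopologicalSorter

# Each object maps to the parents it looks up to.
# Contact and its external ID field are hypothetical.
parents = {
    "Account": set(),
    "Contact": {"Account"},
    "Case": {"Account", "Contact"},
}
external_ids = {
    "Account": "Legacy_Id__c",
    "Contact": "Legacy_Contact_Id__c",
    "Case": "Legacy_Case_Id__c",
}

# static_order() yields every object only after all of its parents.
load_order = list(TopologicalSorter(parents).static_order())

def upsert_command(sobject):
    # Assumed naming convention: accounts.csv, contacts.csv, cases.csv.
    return (
        f"sf data upsert bulk --sobject {sobject} "
        f"--external-id {external_ids[sobject]} "
        f"--file {sobject.lower()}s.csv --wait 30"
    )

commands = [upsert_command(name) for name in load_order]
for command in commands:
    print(command)
```

If two objects reference each other, `static_order()` raises `graphlib.CycleError`, which is a useful early warning that one of the lookups has to be populated in a second pass.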

Respect the limits, defer the expensive work

Bulk loads and automation are a bad mix. Triggers, flows, and roll-up logic firing on millions of inserts will blow through governor limits and stretch the load window by hours. The pattern:

  • Disable or gate automation during the load, then reconcile in a controlled batch afterward.
  • Batch Apex for post-load calculations and enrichment.
  • Throttle integrations to stay within API allocations across the org.
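
The throttling point can be as simple as a budget the migration checks before it submits work. This is a rough sketch, not a Salesforce API: the `ApiBudget` class, the allocation figures, and the per-job call costs are all hypothetical, and the real numbers would come from your own org's limits.

```python
class ApiBudget:
    """Tracks how many API calls the migration may still spend in the current window."""

    def __init__(self, org_allocation, reserved_for_others):
        # Both figures are assumptions; take them from your org's actual limits.
        self.available = org_allocation - reserved_for_others
        self.spent = 0

    def try_spend(self, calls):
        # Refuse rather than overspend, so live integrations keep their share.
        if self.spent + calls > self.available:
            return False
        self.spent += calls
        return True

    @property
    def remaining(self):
        return self.available - self.spent


budget = ApiBudget(org_allocation=100_000, reserved_for_others=60_000)
submitted, deferred = [], []
# Hypothetical jobs with an estimated API-call cost each.
for job, cost in [("accounts", 15_000), ("contacts", 20_000), ("cases", 10_000)]:
    (submitted if budget.try_spend(cost) else deferred).append(job)

print("submitted:", submitted)
print("deferred:", deferred)
print("remaining:", budget.remaining)
```

Deferred jobs simply wait for the next window; because every load is an idempotent upsert, picking them up later carries no extra risk.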

Verify with numbers, not vibes

Every migration ends with reconciliation: source counts vs. target counts, checksums on key fields, and spot-checks on referential integrity. A migration isn't done when the load finishes — it's done when the numbers match and the business signs off.
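
A reconciliation pass along those lines might look like the sketch below. It assumes both sides have been exported to rows keyed by the external ID; the choice of `Name` and `Phone` as checksum fields and the sample data are illustrative only.

```python
import hashlib

KEY_FIELDS = ("Name", "Phone")  # hypothetical fields chosen for the checksum

def checksum(row):
    # Normalize before hashing so harmless whitespace or case differences
    # do not show up as false mismatches.
    canonical = "|".join(str(row.get(f, "")).strip().lower() for f in KEY_FIELDS)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def reconcile(source_rows, target_rows, id_field="Legacy_Id__c"):
    source = {r[id_field]: checksum(r) for r in source_rows}
    target = {r[id_field]: checksum(r) for r in target_rows}
    shared = source.keys() & target.keys()
    return {
        "source_count": len(source),
        "target_count": len(target),
        "missing_in_target": sorted(source.keys() - target.keys()),
        "unexpected_in_target": sorted(target.keys() - source.keys()),
        "mismatched": sorted(k for k in shared if source[k] != target[k]),
    }

# Made-up sample data: A-3 never loaded, A-2 loaded with the wrong phone.
source_rows = [
    {"Legacy_Id__c": "A-1", "Name": "Acme", "Phone": "555-0100"},
    {"Legacy_Id__c": "A-2", "Name": "Globex", "Phone": "555-0101"},
    {"Legacy_Id__c": "A-3", "Name": "Initech", "Phone": "555-0102"},
]
target_rows = [
    {"Legacy_Id__c": "A-1", "Name": "Acme ", "Phone": "555-0100"},
    {"Legacy_Id__c": "A-2", "Name": "Globex", "Phone": "555-9999"},
]

report = reconcile(source_rows, target_rows)
for name, value in report.items():
    print(name, value)
```

The lists of IDs matter more than the counts: matching totals can hide one missing row offset by one unexpected row, while the per-key comparison cannot.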

Takeaway

High-volume migration rewards preparation and idempotency over cleverness. Stage, key everything by external ID, load in order with automation parked, and reconcile relentlessly. Do that and 9M records is a Tuesday, not a crisis.

Planning a large Salesforce migration or integration? Let's map it out.