Sync email contacts to a CRM

CRM data goes stale because nobody wants to manually enter contacts. The boring fix is automation: every week, walk the team’s recent email, find new external addresses, enrich what we know, and push them into the CRM. This recipe is the generic version of that pipeline — you can plug in Salesforce, HubSpot, or Pipedrive at the end without changing the extraction logic.

The whole thing runs as a scheduled job and replaces ~$150–450/month in Zapier credits.

nylas email list ─▶ dedupe by address ─▶ enrich (signatures, domain) ─▶ map to CRM schema ─▶ push (REST or CSV)

Five steps. Each one is a few lines of code on its own and the failure modes are independent — if signature parsing breaks for one contact, the rest of the pipeline still runs.

nylas email list --days 30 --limit 1000 --json > emails.json
nylas contacts list --json > contacts.json

Pull both — email list gets you everyone you’ve corresponded with; contacts list gets you anyone already in your provider’s address book. Merging them deduplicates against the address book so you don’t shove duplicates into the CRM.
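
A minimal sketch of that merge, folding address-book records into the same address-keyed dict the filtering step builds. The record shape (an `emails` list of objects with an `email` key, plus optional `given_name` and `surname`) is an assumption about the CLI's JSON output, and `merge_address_book` is a hypothetical helper name; check both against a real `contacts.json` before relying on it.

```python
def merge_address_book(contacts, records):
    """Fold address-book records into an address-keyed dict, one entry per address.

    Assumed record shape (not confirmed by the CLI docs):
    {"given_name": "...", "surname": "...", "emails": [{"email": "a@b.com"}]}
    """
    merged = dict(contacts)
    for record in records:
        name = " ".join(
            part for part in (record.get("given_name"), record.get("surname")) if part
        )
        for entry in record.get("emails") or []:
            addr = (entry.get("email") or "").strip().lower()
            if "@" not in addr:
                continue  # skip empty or malformed addresses
            existing = merged.get(addr)
            if existing is None:
                # Known from the address book only: no messages counted yet.
                merged[addr] = {
                    "email": addr,
                    "name": name,
                    "domain": addr.rsplit("@", 1)[1],
                    "msg_count": 0,
                }
            elif not existing.get("name"):
                # Same person seen in email; borrow the address-book name.
                merged[addr] = {**existing, "name": name}
    return merged
```

Because both sources are keyed by the lowercased address, someone who appears in both ends up as a single entry. Address-book entries added this way have not passed the domain filters, so apply the same exclusions to them.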

Strip auto-replies, internal addresses, freemail when relevant:

import json
EXCLUDE_DOMAINS = {"noreply.com", "github.com", "no-reply.example.com"}
FREEMAIL = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"}
contacts = {}
with open("emails.json") as f:
    messages = json.load(f)
for msg in messages:
    # Any of the participant lists may be missing; treat those as empty.
    participants = msg.get("from", []) + msg.get("to", []) + msg.get("cc", [])
    for participant in participants:
        addr = participant["email"].strip().lower()
        if "@" not in addr:
            continue  # malformed address
        domain = addr.rsplit("@", 1)[1]
        if domain in EXCLUDE_DOMAINS:
            continue
        if addr.endswith("@yourcompany.com"):  # internal
            continue
        if domain in FREEMAIL:
            # Optional: depends whether your CRM wants personal addresses
            continue
        entry = contacts.setdefault(addr, {
            "email": addr,
            "name": participant.get("name", ""),
            "domain": domain,
            "first_seen": msg["date"],
            "msg_count": 0,
        })
        # The export may not be ordered oldest-first, so keep the earliest date.
        entry["first_seen"] = min(entry["first_seen"], msg["date"])
        entry["msg_count"] += 1

Filtering freemail is a judgment call: keep freemail addresses for B2C-shaped products, where customers write from personal accounts, and drop them for B2B.

Two enrichment paths give surprisingly good coverage:

  • Signatures for the contact: title, phone, LinkedIn URL. See Parse email signatures for contact enrichment.
  • Domain DNS for the company: MX, SPF, DMARC reveal what stack the company runs on (interesting for sales prioritization).

for c in contacts.values():
    msg = latest_message_from(c["email"])
    sig = parse_signature(msg)  # see signature-enrichment recipe
    c.update(sig)               # title, phone, linkedin

For new contacts (msg_count == 1), this pulls a single message; the signature is whatever’s in their first response. For repeat contacts, run the cross-referencing trick over their last 3 messages to fill more fields.
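
The domain side of enrichment can be sketched as a lookup from MX hostnames to the mail provider behind them. This assumes you have already fetched the MX hosts for a domain with a DNS resolver of your choice (not shown here); `MX_PROVIDERS` and `mail_provider` are hypothetical names, and the table is a short illustrative sample rather than a complete list.

```python
# Suffixes of MX hostnames mapped to the provider that publishes them.
# Illustrative and deliberately short; extend it for the vendors you care about.
MX_PROVIDERS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "outlook.com": "Microsoft 365",
}

def mail_provider(mx_hosts):
    """Return the provider name for a domain's MX hosts, or "unknown"."""
    for host in mx_hosts:
        host = host.lower().rstrip(".")  # DNS answers often end with a dot
        for suffix, provider in MX_PROVIDERS.items():
            # Match whole labels only, so "notgoogle.com" does not match.
            if host == suffix or host.endswith("." + suffix):
                return provider
    return "unknown"
```

Storing the result on each contact (for example as `c["mail_provider"]`) gives sales a cheap signal without any paid enrichment service.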

Each CRM has a different shape. The mapping is the only thing that changes between targets — everything above is identical:

  • Salesforce — Contact (person) + Account (company) + Task (the email itself). Full mapping.
  • HubSpot — Contact + Company (auto-created from domain) + Engagement. Full mapping.
  • Pipedrive — Person + Organization + Activity. Full mapping.

A trivial generic mapping for CSV-based imports:

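def to_csv_row(c):
    # Split on the first space: "Ada King Lovelace" -> "Ada", "King Lovelace"
    name = c["name"].strip().split(" ", 1)
    return {
        "FirstName": name[0],
        "LastName": name[1] if len(name) > 1 else "",
        "Email": c["email"],
        "Company": c["domain"],  # the domain stands in for the company name
        "Title": c.get("title", ""),
        "Phone": c.get("phone", ""),
    }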

Two flavors. CSV-and-import is the safer one, because you can inspect the file before anything reaches the CRM:

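import csv
# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open("contacts.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["FirstName", "LastName", "Email", "Company", "Title", "Phone"])
    w.writeheader()
    for c in contacts.values():
        w.writerow(to_csv_row(c))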

Direct API is the productionized version. The provider-specific recipes above show the API calls.
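
As a rough shape for the direct-API path, the sketch below keeps the CRM call behind a callable so one failed contact does not stop the run. `push_contacts` and the `upsert(email, row)` signature are hypothetical; the real call comes from whichever provider-specific recipe you follow.

```python
def push_contacts(rows, upsert):
    """Push each row once via `upsert(email, row)`.

    Returns (pushed_count, failures), where failures is a list of
    (email, error_message) pairs for rows the CRM rejected.
    """
    pushed, failures = 0, []
    seen = set()
    for row in rows:
        email = row["Email"].lower()
        if email in seen:
            continue  # same address twice in one run: send it once
        seen.add(email)
        try:
            upsert(email, row)
            pushed += 1
        except Exception as exc:
            # Record the failure and carry on with the remaining contacts.
            failures.append((email, str(exc)))
    return pushed, failures
```

Log the returned failures rather than raising, so a weekly run still pushes everything the CRM accepts.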

Once a week is a sensible default for most teams. Cron:

0 6 * * MON /usr/bin/python3 /opt/crm-sync/run.py >> /var/log/crm-sync.log 2>&1

Run it daily for high-volume sales teams; monthly is fine for low-touch B2B.

  • Review before the first push. The first run always surfaces edge cases — ex-employees, mailing lists you forgot you subscribed to, vendor support@ addresses that don’t belong as Contacts. Inspect the CSV and tune your filters before pointing the script at the live CRM.
  • Idempotency. Always upsert by email. The CRM API endpoints linked above all support upsert by external ID — use it.
  • Privacy. This pulls addresses out of your team’s mailboxes. Document it in your data-handling policy and surface a way for individuals to opt out.
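
For the review step, a small report over the CSV rows makes the usual edge cases easier to spot. This is a sketch under assumptions: `review_report` is a hypothetical helper, and the list of role-style local parts is a starting point to tune, not an exhaustive set.

```python
from collections import Counter

# Local parts that usually mean a shared mailbox rather than a person.
ROLE_LOCAL_PARTS = {"support", "info", "sales", "billing", "help", "admin", "hello"}

def review_report(rows, top=10):
    """Summarise CSV rows: most common domains and role-style addresses."""
    domains = Counter(row["Email"].rsplit("@", 1)[-1] for row in rows)
    role_addresses = sorted(
        row["Email"]
        for row in rows
        if row["Email"].split("@", 1)[0].lower() in ROLE_LOCAL_PARTS
    )
    return {
        "top_domains": domains.most_common(top),
        "role_addresses": role_addresses,
    }
```

Feed it the rows from `csv.DictReader` over `contacts.csv`. A domain with an unexpectedly high count is often a mailing list or vendor, and anything under `role_addresses` is a candidate for the exclusion filters.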