Sync email contacts to a CRM

CRM data goes stale because nobody wants to manually enter contacts. The boring fix is automation: every week, walk the team’s recent email, find new external addresses, enrich what we know, and push them into the CRM. This recipe is the generic version of that pipeline — you can plug in Salesforce, HubSpot, or Pipedrive at the end without changing the extraction logic.

The whole thing runs as a scheduled job and can replace roughly $150–450/month in Zapier credits.

The pipeline

nylas email list ─▶ dedupe by address ─▶ enrich (signatures, domain) ─▶ map to CRM schema ─▶ push (REST or CSV)

Five steps. Each is a few lines of code on its own, and the failure modes are independent: if signature parsing breaks for one contact, the rest of the pipeline still runs.
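That independence only holds if each stage catches per-contact errors instead of letting one exception abort the whole run. A minimal sketch of the pattern; apply_per_contact is a hypothetical helper, not part of the Nylas CLI:

```python
import logging

log = logging.getLogger("crm-sync")


def apply_per_contact(step, contacts):
    """Run step(contact) on every contact.

    A contact whose step raises is logged and set aside; the others
    carry on. Returns (results, failed_contacts).
    """
    results, failed = [], []
    name = getattr(step, "__name__", repr(step))  # partials have no __name__
    for contact in contacts:
        try:
            results.append(step(contact))
        except Exception:
            log.exception("%s failed for %s", name, contact.get("email", "?"))
            failed.append(contact)
    return results, failed
```

The failed list is worth keeping: it is exactly the set of contacts to look at when tuning filters or the signature parser.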

Pull recent senders

Pull recent mail and contacts from the Nylas CLI:

nylas email list --days 30 --limit 1000 --json > emails.json
nylas contacts list --json > contacts.json

Pull both: nylas email list gets you everyone you've corresponded with in the window, and nylas contacts list gets you everyone already in your provider's address book. Checking the first set against the second deduplicates against the address book, so you don't shove duplicates into the CRM.
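The extraction code in the next step only reads emails.json. One way to use contacts.json is to collect every address already in the address book and drop those from the candidates. This sketch assumes each contact in the CLI's JSON output carries an emails list of {"email": ..., "type": ...} objects (the shape of Nylas's v3 contact object); check what your CLI version actually emits before relying on it. Both function names are hypothetical.

```python
import json


def address_book_emails(contacts):
    """Lower-cased addresses from a list of contact objects.

    Assumes {"emails": [{"email": "...", "type": "..."}]}; contacts
    without that field are skipped.
    """
    known = set()
    for contact in contacts:
        for entry in contact.get("emails") or []:
            addr = (entry.get("email") or "").strip().lower()
            if addr:
                known.add(addr)
    return known


def drop_known(candidates, known):
    """Remove candidates (a dict keyed by address) already in the address book."""
    return {addr: c for addr, c in candidates.items() if addr not in known}


# Usage, after building `contacts` from emails.json:
# with open("contacts.json") as f:
#     known = address_book_emails(json.load(f))
# contacts = drop_known(contacts, known)
```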

Dedupe and filter

Strip auto-replies, internal addresses, and (when relevant) freemail:

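import json

EXCLUDE_DOMAINS = {"noreply.com", "github.com", "no-reply.example.com"}
FREEMAIL = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"}
INTERNAL_DOMAIN = "yourcompany.com"    # replace with your own domain
# Local parts that signal auto-replies, bounces, and notification senders
AUTO_PREFIXES = ("noreply", "no-reply", "donotreply", "mailer-daemon", "postmaster")

with open("emails.json") as f:
    messages = json.load(f)

contacts = {}
for msg in messages:
    # cc (and occasionally to) can be missing or null; treat that as empty
    participants = (msg.get("from") or []) + (msg.get("to") or []) + (msg.get("cc") or [])
    for participant in participants:
        addr = (participant.get("email") or "").strip().lower()
        local, _, domain = addr.rpartition("@")
        if not local or not domain:            # malformed address
            continue

        if domain in EXCLUDE_DOMAINS:
            continue
        if local.startswith(AUTO_PREFIXES):    # auto-replies, bounces
            continue
        if domain == INTERNAL_DOMAIN:          # internal
            continue
        if domain in FREEMAIL:
            # Optional: depends whether your CRM wants personal addresses
            continue

        c = contacts.setdefault(addr, {
            "email":      addr,
            "name":       "",
            "domain":     domain,
            "first_seen": msg["date"],
            "msg_count":  0,
        })
        # Take the name from whichever message first supplies one
        c["name"] = c["name"] or (participant.get("name") or "")
        # Keep the earliest date whatever order the CLI returns messages in
        # (assumes comparable dates, e.g. Unix timestamps)
        c["first_seen"] = min(c["first_seen"], msg["date"])
        c["msg_count"] += 1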

Filtering freemail is a judgment call: for B2C-shaped products, keep freemail addresses (remove that check), since your customers write from personal accounts; for B2B, filter them out.

Enrich

Two enrichment paths give surprisingly good coverage:

Signatures for the contact: title, phone, LinkedIn URL. See Parse email signatures for contact enrichment.
Domain DNS for the company: MX, SPF, DMARC reveal what stack the company runs on (interesting for sales prioritization).
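A minimal sketch of the domain path, assuming the third-party dnspython package (pip install dnspython) for live lookups through dns.resolver.resolve. The provider table is a small illustrative subset, not a catalog, and every helper name here is hypothetical. The parsers are plain Python, so they work on records you fetched some other way.

```python
# Illustrative MX-suffix -> provider table; extend for the stacks you care about.
MX_PROVIDERS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "protection.outlook.com": "Microsoft 365",
    "pphosted.com": "Proofpoint",
    "mimecast.com": "Mimecast",
}


def classify_mx(mx_hosts):
    """Map MX hostnames (e.g. 'aspmx.l.google.com.') to a provider label."""
    for host in mx_hosts:
        host = host.lower().rstrip(".")
        for suffix, provider in MX_PROVIDERS.items():
            if host == suffix or host.endswith("." + suffix):
                return provider
    return "other" if mx_hosts else "none"


def spf_includes(txt_records):
    """include: targets from the SPF record (hints at SaaS senders)."""
    for txt in txt_records:
        if txt.lower().startswith("v=spf1"):
            return [t.split(":", 1)[1] for t in txt.split()
                    if t.lower().startswith("include:")]
    return []


def dmarc_policy(txt_records):
    """The DMARC p= value ('none', 'quarantine', 'reject'), or None."""
    for txt in txt_records:
        if txt.lower().startswith("v=dmarc1"):
            for tag in txt.split(";"):
                key, _, value = tag.strip().partition("=")
                if key.lower() == "p":
                    return value.strip().lower()
    return None


def lookup(name, rtype):
    """Live DNS lookup via dnspython; returns [] on any resolver error."""
    import dns.resolver  # imported lazily so the parsers work without it
    try:
        answers = dns.resolver.resolve(name, rtype)
    except Exception:
        return []
    if rtype == "MX":
        return [str(r.exchange) for r in answers]
    return [b"".join(r.strings).decode(errors="replace") for r in answers]


def enrich_domain(domain):
    return {
        "mail_provider": classify_mx(lookup(domain, "MX")),
        "spf_includes": spf_includes(lookup(domain, "TXT")),
        "dmarc_policy": dmarc_policy(lookup("_dmarc." + domain, "TXT")),
    }
```

Enrichment is per domain, not per contact, so it pays to cache enrich_domain results when several contacts share a company.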

for c in contacts.values():
    msg = latest_message_from(c["email"])
    sig = parse_signature(msg)             # see signature-enrichment recipe
    c.update(sig)                          # title, phone, linkedin

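The loop above reads one message per contact. For contacts seen only once (msg_count == 1), that message is all there is, so the signature is whatever it contained; if the contact only ever appeared in To or Cc, there is no message from them and nothing to parse. For repeat contacts, run the cross-referencing trick over their last 3 messages to fill more fields.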
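latest_message_from isn't defined in this recipe. One hypothetical way to write it, plus a simple stand-in for cross-referencing (majority vote per field, which may not match the signature recipe's own method). Unlike the call in the loop above, this version takes the message list explicitly, and it assumes msg["date"] is sortable, e.g. a Unix timestamp.

```python
from collections import Counter


def messages_from(messages, addr, n=3):
    """Newest n messages whose sender is addr."""
    addr = addr.lower()
    sent = [m for m in messages
            if any((p.get("email") or "").lower() == addr
                   for p in m.get("from") or [])]
    sent.sort(key=lambda m: m["date"], reverse=True)
    return sent[:n]


def latest_message_from(messages, addr):
    """Newest message sent by addr, or None if they never sent one."""
    found = messages_from(messages, addr, n=1)
    return found[0] if found else None


def merge_signatures(parsed):
    """Combine parsed signatures (newest first) field by field.

    Each field gets its most common non-empty value; on a tie,
    Counter.most_common keeps the value seen first, i.e. the newest.
    """
    merged = {}
    for field in {key for sig in parsed for key in sig}:
        values = [sig[field] for sig in parsed if sig.get(field)]
        if values:
            merged[field] = Counter(values).most_common(1)[0][0]
    return merged


# Usage for a repeat contact, with parse_signature from the signature recipe:
# recent = messages_from(messages, c["email"])
# c.update(merge_signatures([parse_signature(m) for m in recent]))
```

Voting across a few messages keeps one reply with a truncated mobile signature from overwriting a title that appears in the others.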

Map to a CRM schema

Each CRM has a different shape. The mapping is the only thing that changes between targets — everything above is identical:

Salesforce — Contact (person) + Account (company) + Task (the email itself). Full mapping.
HubSpot — Contact + Company (auto-created from domain) + Engagement. Full mapping.
Pipedrive — Person + Organization + Activity. Full mapping.

A trivial generic mapping for CSV-based imports:

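def to_csv_row(c):
    # Tolerate a missing/None name and runs of spaces between words
    name = (c.get("name") or "").strip().split(None, 1)
    return {
        "FirstName": name[0] if name else "",
        "LastName":  name[1] if len(name) > 1 else "",
        "Email":     c["email"],
        "Company":   c["domain"],    # the domain stands in for the company name
        "Title":     c.get("title", ""),
        "Phone":     c.get("phone", ""),
    }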
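The same separation can be made explicit for API targets: one internal record, plus a field map per CRM. A sketch for two targets using the standard Salesforce Contact fields (FirstName, LastName, Email, Title, Phone) and HubSpot default contact properties (firstname, lastname, email, jobtitle, phone); FIELD_MAPS and the helpers are hypothetical, and custom org or portal setups may differ.

```python
# Internal key -> CRM field name. Add a "pipedrive" entry (or others) as needed.
FIELD_MAPS = {
    "salesforce": {"first_name": "FirstName", "last_name": "LastName",
                   "email": "Email", "title": "Title", "phone": "Phone"},
    "hubspot":    {"first_name": "firstname", "last_name": "lastname",
                   "email": "email", "title": "jobtitle", "phone": "phone"},
}


def split_name(name):
    """'Ada King Lovelace' -> ('Ada', 'King Lovelace'); tolerates None."""
    parts = (name or "").strip().split(None, 1)
    return (parts[0] if parts else "", parts[1] if len(parts) > 1 else "")


def to_crm_record(c, target):
    """Map an extracted contact dict onto one CRM's field names.

    Empty values are dropped so an upsert doesn't blank fields that
    someone already filled in by hand inside the CRM.
    """
    first, last = split_name(c.get("name"))
    internal = {"first_name": first, "last_name": last, "email": c["email"],
                "title": c.get("title", ""), "phone": c.get("phone", "")}
    return {crm: internal[key]
            for key, crm in FIELD_MAPS[target].items() if internal[key]}
```

One caveat: Salesforce rejects a Contact without LastName, so for that target you may want a fallback such as the address's local part.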

Push

Two flavors. CSV-and-import is the safest:

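import csv

FIELDS = ["FirstName", "LastName", "Email", "Company", "Title", "Phone"]

# newline="" is what the csv module expects; without it, Windows gets blank rows
with open("contacts.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=FIELDS)
    w.writeheader()
    for c in contacts.values():
        w.writerow(to_csv_row(c))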

Direct API is the productionized version. The provider-specific recipes above show the API calls.
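Because a weekly run reads a 30-day window, most contacts reappear run after run. Upserting handles that safely, but you can also skip records that haven't changed since the last successful push and save API quota. A sketch, assuming a local JSON state file (crm-sync-state.json is a hypothetical name) holding a hash of each record pushed:

```python
import hashlib
import json
import os


def record_hash(record):
    """Stable hash of a mapped CRM record; key order doesn't matter."""
    blob = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def load_state(path):
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_state(path, state):
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # swap in whole so a crash can't leave a half-written file


def changed_records(records, state):
    """Records (keyed by email) whose content differs from the last push."""
    return {email: r for email, r in records.items()
            if state.get(email) != record_hash(r)}


# Usage: push only what changed, then remember it.
# state = load_state("crm-sync-state.json")
# for email, record in changed_records(records, state).items():
#     push(record)                       # your CRM upsert call
#     state[email] = record_hash(record)
# save_state("crm-sync-state.json", state)
```

This complements CRM-side upsert rather than replacing it: if the state file is lost, the next run simply pushes everything again.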

Schedule it

Once a week is a sensible default for most teams. Cron:

0 6 * * MON  /usr/bin/python3 /opt/crm-sync/run.py >> /var/log/crm-sync.log 2>&1

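Run it daily for high-volume sales teams; monthly is fine for low-touch B2B. Whatever the cadence, keep the --days window at least as long as the gap between runs (more than 31 for a monthly job) so no mail falls between runs.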

Things to know

Review before the first push. The first run always surfaces edge cases: ex-employees, mailing lists you forgot you subscribed to, vendor support@ addresses that don't belong as Contacts. Inspect the CSV and tune your filters before pointing the script at the live CRM.
Idempotency. Always upsert keyed on email. Each weekly run re-reads a 30-day window, so most contacts show up again; an upsert updates them instead of creating duplicates. The CRM API endpoints linked above all support upsert by external ID, so use it.
Privacy. This pulls addresses out of your team’s mailboxes. Document it in your data-handling policy and surface a way for individuals to opt out.
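For the first-run review, a quick summary of the CSV makes mailing lists and vendor inboxes stand out. A sketch; summarize is a hypothetical helper and ROLE_PREFIXES is an illustrative list, not an exhaustive one. It reads the Company column, which holds the domain in the CSV mapping above.

```python
import csv
from collections import Counter

# Local parts that usually mean a shared inbox rather than a person (illustrative)
ROLE_PREFIXES = ("support", "help", "info", "sales", "billing",
                 "notifications", "team", "hello")


def summarize(rows, top=15):
    """Busiest domains and role-looking addresses in an exported contact list."""
    domains = Counter(r["Company"] for r in rows)
    roles = [r["Email"] for r in rows
             if r["Email"].partition("@")[0].startswith(ROLE_PREFIXES)]
    return domains.most_common(top), roles


# Usage:
# with open("contacts.csv", newline="", encoding="utf-8") as f:
#     busiest, roles = summarize(list(csv.DictReader(f)))
# for domain, n in busiest:
#     print(f"{n:5d}  {domain}")
# print("role addresses?", roles)
```

A domain with dozens of contacts after one run is usually a mailing list or a vendor, which is the cue to add it to EXCLUDE_DOMAINS.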

Next steps

Export to Salesforce
Export to HubSpot
Export to Pipedrive
Map communication patterns between organizations
Nylas CLI — installation and full command reference