CRM data goes stale because nobody wants to manually enter contacts. The boring fix is automation: every week, walk the team’s recent email, find new external addresses, enrich what we know, and push them into the CRM. This recipe is the generic version of that pipeline — you can plug in Salesforce, HubSpot, or Pipedrive at the end without changing the extraction logic.
The whole thing runs as a scheduled job and replaces ~$150–450/month in Zapier credits.
The pipeline
nylas email list ─▶ dedupe by address ─▶ enrich (signatures, domain) ─▶ map to CRM schema ─▶ push (REST or CSV)
Five steps. Each one is a few lines of code on its own and the failure modes are independent: if signature parsing breaks for one contact, the rest of the pipeline still runs.
Pull recent senders
nylas email list --days 30 --limit 1000 --json > emails.json
nylas contacts list --json > contacts.json
Pull both: email list gets you everyone you’ve corresponded with; contacts list gets you anyone already in your provider’s address book. Merging them deduplicates against the address book so you don’t shove duplicates into the CRM.
Dedupe and filter
Strip auto-replies, internal addresses, freemail when relevant:
import json

EXCLUDE_DOMAINS = {"noreply.com", "github.com", "no-reply.example.com"}
FREEMAIL = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "icloud.com"}

with open("emails.json") as f:
    messages = json.load(f)

contacts = {}
for msg in messages:
    # Any of from/to/cc may be absent on a given message
    for participant in msg.get("from", []) + msg.get("to", []) + msg.get("cc", []):
        addr = participant["email"].lower()
        if "@" not in addr:
            continue  # skip malformed addresses
        domain = addr.rsplit("@", 1)[1]

        if domain in EXCLUDE_DOMAINS:
            continue
        if addr.endswith("@yourcompany.com"):  # internal
            continue
        if domain in FREEMAIL:
            # Optional: depends whether your CRM wants personal addresses
            continue

        # Keyed by lowercase address, so repeat senders collapse into one entry
        contacts.setdefault(addr, {
            "email": addr,
            "name": participant.get("name", ""),
            "domain": domain,
            "first_seen": msg["date"],
            "msg_count": 0,
        })
        contacts[addr]["msg_count"] += 1
Filtering freemail is a judgment call: keep freemail addresses for B2C-shaped products, drop them for B2B.
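The dedupe script never reads contacts.json, so the address-book merge described under "Pull recent senders" still has to happen somewhere. The following is a minimal sketch, not the recipe's own implementation. It assumes each address-book record carries an `emails` list of objects with an `email` key, plus optional `given_name` and `surname` fields; verify that shape against your actual CLI output. `merge_address_book` is a hypothetical helper name.

```python
def merge_address_book(contacts, address_book):
    """Fold address-book records into the contacts dict, keyed by lowercase email."""
    for record in address_book:
        # Assumed fields: given_name / surname (either may be missing)
        name = " ".join(
            part for part in (record.get("given_name"), record.get("surname")) if part
        )
        for entry in record.get("emails") or []:
            addr = (entry.get("email") or "").strip().lower()
            if "@" not in addr:
                continue  # skip empty or malformed addresses
            existing = contacts.setdefault(addr, {
                "email": addr,
                "name": name,
                "domain": addr.rsplit("@", 1)[1],
                "first_seen": None,  # only in the address book, never seen in mail
                "msg_count": 0,
            })
            # Mail headers often lack a display name; the address book can fill it in
            if not existing["name"]:
                existing["name"] = name
    return contacts
```

Call it with the parsed contents of contacts.json after the dedupe loop has run. Entries added this way have not passed the domain filters, so apply the same exclusions to them before pushing.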
Enrich
Two enrichment paths give surprisingly good coverage:
- Signatures for the contact: title, phone, LinkedIn URL. See Parse email signatures for contact enrichment.
- Domain DNS for the company: MX, SPF, DMARC reveal what stack the company runs on (interesting for sales prioritization).
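For the domain path, one way to turn MX records into a "what stack" signal is a suffix lookup on the MX hostnames. This sketch assumes the hostnames have already been resolved (with a resolver library or `dig`; the lookup itself is left out), and the provider table is a deliberately small, illustrative one. `MX_PROVIDERS` and `classify_mail_provider` are hypothetical names.

```python
# Hostname suffix -> provider label. Illustrative and incomplete; extend as needed.
MX_PROVIDERS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "outlook.com": "Microsoft 365",
}


def classify_mail_provider(mx_hosts):
    """Guess a domain's mail provider from its MX hostnames."""
    for host in mx_hosts:
        host = host.rstrip(".").lower()  # DNS answers often carry a trailing dot
        for suffix, provider in MX_PROVIDERS.items():
            if host == suffix or host.endswith("." + suffix):
                return provider
    return "unknown"
```

For example, `classify_mail_provider(["aspmx.l.google.com."])` returns `"Google Workspace"`, and a host ending in `.mail.protection.outlook.com` maps to `"Microsoft 365"`. Store the result on the contact (say, under a `mail_provider` key) if it helps prioritization.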
for c in contacts.values():
    msg = latest_message_from(c["email"])
    sig = parse_signature(msg)  # see signature-enrichment recipe
    c.update(sig)  # title, phone, linkedin
For new contacts (msg_count == 1), this pulls a single message; the signature is whatever’s in their first response. For repeat contacts, run the cross-referencing trick over their last 3 messages to fill more fields.
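The enrichment loop is where the "independent failure modes" promise is easiest to break: one unparseable signature raises and the whole run stops. A minimal sketch of per-contact isolation follows. It takes the fetch and parse functions as arguments, so `latest_message_from` and `parse_signature` from the signature recipe can be passed in unchanged. `enrich_all` and the `crm-sync` logger name are hypothetical.

```python
import logging

log = logging.getLogger("crm-sync")


def enrich_all(contacts, fetch_latest, parse):
    """Enrich each contact in place; a failure on one contact never stops the loop."""
    failed = []
    for c in contacts.values():
        try:
            c.update(parse(fetch_latest(c["email"])))
        except Exception:
            # Log with traceback, remember the address, and move on
            log.exception("enrichment failed for %s", c["email"])
            failed.append(c["email"])
    return failed
```

Usage would look like `failed = enrich_all(contacts, latest_message_from, parse_signature)`. The returned list is worth including in the job log so repeat offenders can be inspected by hand.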
Map to a CRM schema
Each CRM has a different shape. The mapping is the only thing that changes between targets; everything above is identical:
- Salesforce — Contact (person) + Account (company) + Task (the email itself). Full mapping.
- HubSpot — Contact + Company (auto-created from domain) + Engagement. Full mapping.
- Pipedrive — Person + Organization + Activity. Full mapping.
A trivial generic mapping for CSV-based imports:
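def to_csv_row(c):
    # Split on the first space only: "Ada Lovelace King" -> "Ada", "Lovelace King"
    name = (c.get("name") or "").strip().split(" ", 1)
    return {
        "FirstName": name[0],
        "LastName": name[1] if len(name) > 1 else "",
        "Email": c["email"],
        "Company": c["domain"],  # the domain stands in for the company name
        "Title": c.get("title", ""),
        "Phone": c.get("phone", ""),
    }
The push itself comes in two flavors. CSV-and-import is the safest: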
import csv

# newline="" lets the csv module control line endings, as its documentation recommends
with open("contacts.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["FirstName", "LastName", "Email", "Company", "Title", "Phone"])
    w.writeheader()
    for c in contacts.values():
        w.writerow(to_csv_row(c))
Direct API is the productionized version. The provider-specific recipes above show the API calls.
Schedule it
Once a week is a sensible default for most teams. Cron:
0 6 * * MON /usr/bin/python3 /opt/crm-sync/run.py >> /var/log/crm-sync.log 2>&1
Run it daily for high-volume sales teams; monthly is fine for low-touch B2B.
Things to know
- Review before the first push. The first run always surfaces edge cases: ex-employees, mailing lists you forgot you subscribed to, vendor support@ addresses that don’t belong as Contacts. Inspect the CSV and tune your filters before pointing the script at the live CRM.
- Idempotency. Always upsert by email. The CRM API endpoints linked above all support upsert by external ID, so use it.
- Privacy. This pulls addresses out of your team’s mailboxes. Document it in your data-handling policy and surface a way for individuals to opt out.
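To make the idempotency point concrete, here is a sketch of upsert-by-email semantics. A plain dict stands in for the CRM; it is not any provider's API, and the real upsert calls are in the provider-specific recipes. `upsert_by_email` is a hypothetical helper, and the rows are the dicts produced by `to_csv_row`.

```python
def upsert_by_email(store, row):
    """Insert or update a record keyed by lowercase email.

    Returns "created" or "updated" so the caller can log what happened.
    """
    key = row["Email"].strip().lower()
    if key in store:
        # Overwrite only with non-empty values, so a sparse row cannot erase known data
        store[key].update({k: v for k, v in row.items() if v})
        return "updated"
    store[key] = dict(row)
    return "created"
```

Running the same batch twice leaves the store with the same set of records, which is the property to check for on the real CRM before trusting a scheduled job.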