Capture curated article into CRM record.
When the team saves an article about a company, the CRM creates or updates that company's record (deduped by website) and attaches a synthesized note that pairs the LLM brief with the original source body. No manual CRM entry, no copy-paste.
Reference build
A teammate saves an article that's really about a company — an intro email, a fundraise announcement, a competitor teardown. A few seconds later that company exists (or is updated) in the CRM with a synthesized note attached: what the company does, why it matters, the original source body below the synthesis. The next time anyone opens that company record, the context is already there.
Like the SharePoint sibling, this runs off the article database in Module 5 of the Thematic Work playbook. The webhook firing is the trigger; the database row supplies the content. No DB row, nothing to capture.
Model output contract
The extractor is given a single instruction: return exactly these five fields, each starting on its own line with its label. The downstream JS step pulls them out by label with a regex.
| Field | Role in the CRM |
|---|---|
| COMPANY_NAME | The single company of interest the article is really about — picked by the model, not by a keyword match. This becomes the company name in the CRM. |
| WEBSITE | Company website URL. Used as the matching attribute on the company upsert so we don't create duplicates when the same company shows up across multiple articles. |
| NOTE_TITLE | Short descriptive label for the CRM note — context-rich, not generic. The prompt asks for things like 'Email Intro to Jacobus Systems from Jim Farmer' over 'Article saved'. |
| NOTE_CONTENT | The synthesized brief — what the company does, milestones, funding, risk, strategic context. Citations stripped. This is the body the team actually reads when the note pops up on the company record. |
| NOTE_EMAILBODY | Verbatim article/email body, HTML and chrome stripped. Appended below the synthesis so the original source travels with the note for anyone who wants to re-read. |
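The label-based parse described above could look roughly like the sketch below. It is an illustration, not the Zap's actual code: the function names `parseExtractorOutput` and `websiteMatchKey`, the exact regex, and the rule for what counts as a usable website are all assumptions. The second function is a hypothetical pre-check that flags a missing or malformed WEBSITE value before the company upsert, since that value is the match key.

```javascript
// Hypothetical sketch; names and regex are assumptions, not the Zap's real step.
const FIELDS = ["COMPANY_NAME", "WEBSITE", "NOTE_TITLE", "NOTE_CONTENT", "NOTE_EMAILBODY"];

// Pull each labelled field out of the model's plain-text reply.
// A field's value runs from its "LABEL:" to the next known label that
// starts a line, or to the end of the text, so multi-line values survive.
function parseExtractorOutput(text) {
  const labels = FIELDS.join("|");
  const parsed = {};
  for (const field of FIELDS) {
    const pattern = new RegExp(
      `^${field}:[ \\t]*([\\s\\S]*?)(?=^(?:${labels}):|(?![\\s\\S]))`,
      "m"
    );
    const match = pattern.exec(text);
    parsed[field] = match ? match[1].trim() : ""; // missing label -> empty string
  }
  return parsed;
}

// Assumed pre-check: reduce WEBSITE to a bare lowercase host, or return null
// when nothing usable is there, so the caller can skip or flag the upsert.
function websiteMatchKey(rawWebsite) {
  const trimmed = (rawWebsite || "").trim();
  if (!trimmed) return null;
  try {
    const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(trimmed);
    const url = new URL(hasScheme ? trimmed : `https://${trimmed}`);
    const host = url.hostname.toLowerCase().replace(/^www\./, "");
    return host.includes(".") ? host : null; // "n/a"-style values have no dot
  } catch {
    return null; // not parseable as a URL at all
  }
}
```

One caveat with this approach: if the verbatim body in NOTE_EMAILBODY happens to contain a line that begins with one of the five labels, the parse will cut the field short at that line. Taking NOTE_EMAILBODY as "everything after its label" would be one way to guard against that.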
Note shape on the company record
The synthesis sits on top so the reader gets the point in two seconds; the original source lives below the rule so anyone can verify it. Both travel together — the note is self-contained.
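As a minimal sketch of that shape, the function below assembles the note body from the template quoted under Gotchas (`Summary:` on top, a `---` rule, then the title and body). The function name `buildNoteBody` and its parameter names are hypothetical, and it assumes `title` and `markdown` come from the article database row.

```javascript
// Hypothetical helper; the layout follows the note template quoted in the Gotchas.
function buildNoteBody({ noteContent, title, markdown }) {
  return [
    "Summary:",
    "",
    noteContent,   // synthesized brief (NOTE_CONTENT)
    "",
    "---",         // the rule separating synthesis from source
    "",
    "Subject / Title:",
    title,         // assumed to come from the article DB row
    "",
    "Body:",
    markdown,      // original source body
  ].join("\n");
}
```

Building the body from an array of lines keeps the blank lines and the rule visible in the code, which makes it harder to drop a separator by accident when the template changes.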
Gotchas
- 01 Same article-database contract as the SharePoint workflow. This Zap doesn't ingest from Reader / Slack / extensions directly; it expects a row to already exist (or be created in the same beat) in the firm's Notion article DB. The webhook payload only needs a title; everything else is fetched from the DB row.
- 02 Plain-text contract between the model and the parser. The extractor is asked for a fixed five-field shape, one label per field (COMPANY_NAME:, WEBSITE:, NOTE_TITLE:, NOTE_CONTENT:, NOTE_EMAILBODY:), and a JS regex pulls each field out. We avoid JSON mode here because NOTE_CONTENT is long-form with newlines, and the regex tolerates the model's natural formatting better than strict JSON would.
- 03 Web search is on, with a 5-call budget. The article alone often doesn't give us the canonical company name or website. `web_search_preview` lets the model resolve those fields against the live web; `max_tool_calls=5` keeps the cost bounded and the latency reasonable.
- 04 Match by website, not by name. Attio's 'Assert Record' uses the website field as the match key. This is the single biggest dedup decision in the whole workflow: company names drift (LLC vs Inc, parent vs subsidiary, rebrands), websites don't. If the website is missing, the assert can create a duplicate.
- 05 Wrong company is the failure mode to watch. The model is asked to first identify the right company, but on articles that mention several it can land on the wrong one. The note goes to whichever company gets asserted, which is easy to spot when reviewing the CRM and hard to spot in aggregate. Spot-check each source during its first week.
- 06 Note carries the original body, not just the summary. The Attio note body is `Summary:\n\n{NOTE_CONTENT}\n\n---\n\nSubject / Title:\n{title}\n\nBody:\n{markdown}`. That way anyone reading the note can verify the synthesis against the source without leaving Attio.
- 07 No reuse of an existing note on the company. Every run creates a new note, even if the same article was captured before. Dedup at the note layer would require a search-then-update step we don't run today; the cost is some duplicate notes on companies that show up repeatedly in the same week.
- 08 GPT-5 in the Responses API, not Chat Completions. The Zap step is OpenAI's `conversation_responses_api`, not the older chat endpoint; that's what unlocks `web_search_preview` as a first-class tool, the `reasoning_effort` knob, and parallel tool calls.
- 09 Cost lives in the model step. Article body + web search + a long NOTE_CONTENT response with reasoning=medium is the expensive part of this flow. Everything else is cheap CRUD. If volume jumps, the lever is the model (drop to a cheaper reasoning model or cap NOTE_CONTENT length), not the orchestrator.
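The model-step settings named in the gotchas above (Responses API, `web_search_preview`, a 5-call tool budget, medium reasoning) can be gathered into one request body. This is a sketch only: the Zap configures these through Zapier's OpenAI step rather than raw JSON, so the function name `buildExtractorRequest`, the `"gpt-5"` model string, and the split between `instructions` and `input` are assumptions about how the equivalent direct call to the Responses API would be shaped.

```javascript
// Hypothetical request builder; the Zap sets these via Zapier's OpenAI step,
// not via a hand-built payload.
function buildExtractorRequest(instructions, articleBody) {
  return {
    model: "gpt-5",                          // assumed model identifier
    instructions,                            // the five-field extraction prompt
    input: articleBody,                      // article text from the DB row
    tools: [{ type: "web_search_preview" }], // lets the model resolve name/website
    max_tool_calls: 5,                       // bounds search cost and latency
    reasoning: { effort: "medium" },         // the main cost lever if volume jumps
  };
}
```

Keeping the budget and reasoning effort in one place means a cost change is a one-line edit: lowering `effort` or swapping `model` here, rather than touching the parse or the CRM steps.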