Automation · Playbook 04

Capture curated article into CRM record.

When the team saves an article about a company, the CRM gets a fresh company record (deduped by website) with a synthesized note that pairs the LLM brief with the original source body — no manual CRM entry, no copy-paste.

Thematic Work

Reference build

A teammate saves an article that's really about a company — an intro email, a fundraise announcement, a competitor teardown. A few seconds later that company exists (or is updated) in the CRM with a synthesized note attached: what the company does, why it matters, the original source body below the synthesis. The next time anyone opens that company record, the context is already there.

Like the SharePoint sibling, this runs off the article database in Module 5 of the Thematic Work playbook. The webhook fire is the action; the database row is the payload. No DB row, nothing to capture.

Flow

01 · Trigger

Catch-hook fires with the article's title

Webhook fired by the reader/save action — payload carries enough to locate the DB row

02 · Hydrate from DB

Find the row in the article database by Title

Notion search → returns title + markdown body + saved metadata

03 · Extractor model

GPT-5 (Responses API) with web search

reasoning_effort=medium, tools=[web_search_preview], max_tool_calls=5 — returns a 5-field plain-text contract

04 · Parse

JS regex pulls COMPANY_NAME · WEBSITE · NOTE_TITLE · NOTE_CONTENT · NOTE_EMAILBODY

Plain-text shape, not JSON — tolerant of the model's natural multi-line content

05 · Upsert company in CRM

Attio 'Assert Record' on companies, matching by website

Website is the dedup key. Name fills in (or is created) if there's no match.

06 · Attach note

Attio 'Create Note' on the asserted company

Body = Summary + horizontal rule + original Subject/Title + Body, so the source travels with the synthesis

07 · Stop

Company exists or got updated · note is on the record

Next person who opens that company in the CRM sees the brief without leaving the tool

Model output contract

The extractor is given a single instruction: return exactly these five fields, each on its own line. The downstream JS step regexes them out by label.

Field	Role in the CRM
COMPANY_NAME	The single company of interest the article is really about — picked by the model, not by a keyword match. This becomes the company name in the CRM.
WEBSITE	Company website URL. Used as the matching attribute on the company upsert so we don't create duplicates when the same company shows up across multiple articles.
NOTE_TITLE	Short descriptive label for the CRM note — context-rich, not generic. The prompt asks for things like 'Email Intro to Jacobus Systems from Jim Farmer' over 'Article saved'.
NOTE_CONTENT	The synthesized brief — what the company does, milestones, funding, risk, strategic context. Citations stripped. This is the body the team actually reads when the note pops up on the company record.
NOTE_EMAILBODY	Verbatim article/email body, HTML and chrome stripped. Appended below the synthesis so the original source travels with the note for anyone who wants to re-read.

Note shape on the company record

Title: {NOTE_TITLE} Summary: {NOTE_CONTENT} ------------------------------ Subject / Title: {article.title} Body: {article.markdown}

The synthesis sits on top so the reader gets the point in two seconds; the original source lives below the rule so anyone can verify it. Both travel together — the note is self-contained.

Gotchas

01Same article-database contract as the SharePoint workflow. This Zap doesn't ingest from Reader / Slack / extensions directly — it expects a row to already exist (or be created in the same beat) in the firm's Notion article DB. The webhook payload only needs a title; everything else is fetched from the DB row.
02Plain-text contract between the model and the parser. The extractor is asked for a fixed five-line shape (COMPANY_NAME:, WEBSITE:, NOTE_TITLE:, NOTE_CONTENT:, NOTE_EMAILBODY:), and a JS regex pulls each field out. We avoid JSON mode here because the NOTE_CONTENT is long-form with newlines and the regex tolerates the model's natural formatting better than strict JSON would.
03Web search is on, with a 5-call budget. The article alone often doesn't give us the canonical company name or website. `web_search_preview` lets the model resolve those fields against the live web; `max_tool_calls=5` keeps the cost bounded and the latency reasonable.
04Match by website, not by name. Attio's 'Assert Record' uses the website field as the match key. This is the single biggest dedup decision in the whole workflow — company names drift (LLC vs Inc, parent vs subsidiary, rebrands), websites don't. If website is missing the assert can create a duplicate.
05Wrong company is the failure mode to watch. The model is asked to first identify the right company, but on articles that mention several it can land on the wrong one. The note goes to whichever company gets asserted — easy to spot when reviewing the CRM, hard to spot in aggregate. Spot-check during the first week per source.
06Note carries the original body, not just the summary. The Attio note body is `Summary:\n\n{NOTE_CONTENT}\n\n---\n\nSubject / Title:\n{title}\n\nBody:\n{markdown}`. That way anyone reading the note can verify the synthesis against the source without leaving Attio.
07No reuse of an existing note on the company. Every run creates a new note, even if the same article was captured before. Dedup at the note layer would require a search-then-update step we don't run today — the cost is some duplicate notes on companies that show up repeatedly in the same week.
08GPT-5 in the Responses API, not Chat Completions. The Zap step is OpenAI's `conversation_responses_api`, not the older chat endpoint — that's what unlocks `web_search_preview` as a first-class tool, the reasoning_effort knob, and parallel tool calls.
09Cost lives in the model step. Article body + web search + a long NOTE_CONTENT response with reasoning=medium is the expensive part of this flow. Everything else is cheap CRUD. If volume jumps, the lever is the model (drop to a cheaper reasoning model or cap NOTE_CONTENT length), not the orchestrator.

Swap matrix

← Back to all automations

Reference build

Model output contract

Note shape on the company record

Gotchas

Swap matrix

See the layers and drop-in alternatives

Why this build matters