Playbook 04

Thematic Work.

Develop a point of view; defend it with evidence.

Module 01

Organizing the Content

Give every thematic artifact a durable home so it compounds across the firm.

Thematic work generates a lot of raw material — market maps, industry reports, scraped research, team-flagged articles, vendor primers, call notes. Without a home, it evaporates the moment the analyst who collected it moves on. This module covers the three places that content needs to live, and the organizing principle that makes all three coherent.

Start with the thesis taxonomy

Before you decide where things go, decide what the categories are. Most firms describe themselves in shorthand — "deep tech," "vertical SaaS," "applied AI" — and that shorthand is useless as an organizing scheme. "Deep tech" is not a folder. The taxonomy you actually need is the breakdown of your thesis into the investable sub-sectors the firm is willing to write checks in. That list is what gives every other layer — your drive, your CRM views, your RAG namespaces — a coherent spine.

The how lives in the Sourcing playbook. If you have not built the taxonomy yet, start there and come back: Sourcing → Enrich the Pipeline → Thesis Taxonomy. Everything below assumes that list exists.
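One way to make the taxonomy a real spine is to hold it as a single list and derive every downstream name from it. The sketch below is illustrative only: the naming conventions for CRM views and RAG namespaces, the `Theme` class, and the second theme name are all assumptions, not something the playbook prescribes.

```python
import re
from dataclasses import dataclass


def slugify(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


@dataclass(frozen=True)
class Theme:
    """One investable theme from the thesis taxonomy (hypothetical structure)."""
    name: str

    @property
    def drive_folder(self) -> str:
        # Folder path under the "Thematic Work" root on the file drive.
        return f"Thematic Work/{self.name}"

    @property
    def crm_view(self) -> str:
        # Assumed tag format for a CRM saved view; adapt to your CRM.
        return f"theme:{slugify(self.name)}"

    @property
    def rag_namespace(self) -> str:
        # Assumed namespace format for the retrieval index.
        return f"thematic-{slugify(self.name)}"


# Hypothetical entries; replace with the firm's actual taxonomy.
TAXONOMY = [Theme("Industrial Robotics"), Theme("Grid Storage")]

for theme in TAXONOMY:
    print(theme.drive_folder, theme.crm_view, theme.rag_namespace)
```

Deriving all three names from one list means a renamed or added theme changes in one place, and the drive, the CRM, and the retrieval layer cannot drift apart.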

Three homes for thematic content

Thematic content needs to live in three places, each doing a different job. None of them substitute for the others.

1. The file drive — the durable archive

SharePoint or Google Drive, depending on the firm. This is where the long-lived artifacts go: market maps, industry reports, scraped research, team-flagged articles, exported vendor reports, and the primary documents themselves. Structure it simply — a "Thematic Work" folder at the root, and under it one folder per investable theme from your taxonomy. Resist the urge to pre-build sub-sector folders inside each theme. Themes are durable; sub-sectors evolve. Sub-folders that looked clean in Q1 become a mess by Q3.
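The folder layout described above can be created in one pass. This is a minimal sketch that assumes the drive is synced to a local path; the function name `build_theme_folders` and the theme names in the usage example are hypothetical.

```python
from pathlib import Path


def build_theme_folders(drive_root: Path, themes: list[str]) -> list[Path]:
    """Create 'Thematic Work/<theme>' for each theme, with no sub-sector folders."""
    base = drive_root / "Thematic Work"
    created = []
    for theme in themes:
        folder = base / theme
        # exist_ok=True makes the script safe to re-run as themes are added.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `build_theme_folders(Path("/path/to/synced/drive"), ["Industrial Robotics", "Grid Storage"])` would produce exactly one level under the root, which matches the advice to keep themes flat and let sub-sectors live in file names or metadata instead.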

The reason this layer matters more than people think: it is the corpus your RAG system retrieves from. The firm's internal intelligence (the chat agent that answers "what do we know about industrial robotics?") can only surface what is sitting in these folders, so its answers improve as the corpus grows. Save the primary documents, not just your team's summaries of them. The line between "their content" and "our content" has blurred. The firms that hoard the source material end up with the smartest assistants.
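To see what the retrieval layer would actually have to work with, it can help to walk the theme folders and list the documents in each. The sketch below assumes a locally synced drive and an assumed set of indexable file types; `iter_corpus` and `INDEXABLE` are hypothetical names, and the actual ingestion step depends on whichever RAG tooling the firm uses.

```python
from pathlib import Path
from typing import Iterator

# Assumption: file types worth indexing; adjust to what your pipeline can parse.
INDEXABLE = {".pdf", ".docx", ".pptx", ".md", ".txt"}


def iter_corpus(thematic_root: Path) -> Iterator[tuple[str, Path]]:
    """Yield (theme, document path) pairs for every indexable file under each theme."""
    theme_dirs = sorted(p for p in thematic_root.iterdir() if p.is_dir())
    for theme_dir in theme_dirs:
        for doc in sorted(theme_dir.rglob("*")):
            if doc.is_file() and doc.suffix.lower() in INDEXABLE:
                yield theme_dir.name, doc
```

A theme that yields only a handful of internal summaries and no primary documents is a sign the archive is not yet doing its job for that theme.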

2. The collaborative workspace — the shared reading layer

Notion, or whatever your team uses for shared work. This is where articles get surfaced in near-real-time: the read someone wants partners to see this week, the running thread on a theme, the commentary that does not belong in a permanent document. The article email database lives here too — covered in detail in Module 06 · Building an Article Database. The point of this layer is distribution: getting learnings across the team while they are still hot, separate from the archive.

3. Data vendors — bring the research in

Vendors like Wokelo and AlphaSense are increasingly useful for thematic work — not just for company-level facts but for industry primers and state-of-the-market reports. The catch is that vendor portals sit behind a login wall, which means your AI cannot see any of it. The content is doing nothing for your corpus while it lives in their UI.

The discipline is to export. When a vendor produces a report worth reading, save the PDF (or the export) into the relevant theme folder on the drive. Now it is part of the firm's institutional memory and the RAG layer can reach it. See the data vendor blueprint for which providers fall in this category.
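The export habit is easier to keep when filing takes one call. The sketch below copies a downloaded report into the matching theme folder with a dated, vendor-tagged file name. It assumes a locally synced drive and the "Thematic Work" layout described earlier; the function name and the naming scheme are hypothetical, and nothing here touches any vendor's API.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def file_vendor_export(
    export: Path,
    drive_root: Path,
    theme: str,
    vendor: str,
    on: Optional[date] = None,
) -> Path:
    """Copy a vendor export into 'Thematic Work/<theme>' and return the new path."""
    theme_folder = drive_root / "Thematic Work" / theme
    if not theme_folder.is_dir():
        # Refuse to invent folders: themes should come from the taxonomy.
        raise FileNotFoundError(f"No theme folder for {theme!r} under {drive_root}")
    stamp = (on or date.today()).isoformat()
    # Assumed naming scheme: <date>_<vendor>_<original file name>.
    target = theme_folder / f"{stamp}_{vendor}_{export.name}"
    shutil.copy2(export, target)  # copy2 also preserves file timestamps
    return target
```

Raising on an unknown theme is a deliberate choice in this sketch: it keeps stray folders from appearing on the drive and forces the taxonomy to be updated first.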

What comes next

Once the homes exist and the taxonomy is wired through them, the question becomes what to put in the folders. The next module — Market Mapping — covers how to build the living view of each theme that the rest of the firm actually consumes. From there, Modules 03–07 cover saving AI outputs as durable assets, prioritizing themes quarterly, weaponizing the pipeline, building the article database, and pulling trusted sources on a recurring cadence.