Case Study — 03
Building the AI agent that turns 4 hours of competitive research into 2 minutes
A personal project that started with a frustrating afternoon of manual competitor research and ended as a production-grade agentic workflow. The pm-research-agent performs autonomous multi-step research, scores competitors across five dimensions, and delivers structured briefs directly to Slack and Notion.
Background
The problem every PM has and nobody has solved well
Competitive research is one of those tasks that every PM does and everyone does slightly differently. The core loop is always the same: search for recent news, check pricing pages, read release notes, scan G2 and Trustpilot reviews, and synthesize it all into a brief. Done properly, it takes 3–4 hours per competitor.
Most PMs either skip it (bad), do it inconsistently (also bad), or delegate it to someone junior who doesn't know what to look for (arguably worse). I wanted a solution that did the work at the level I'd do it myself — with the same information-gathering instincts and the same synthesis quality, in a fraction of the time.
"The goal wasn't to generate text about competitors. It was to produce a brief I'd actually trust enough to bring into a strategy meeting."
Design Decisions
Why a fixed pipeline wouldn't work — and what adaptive loops solve
The first version I built was a fixed pipeline: a sequence of predetermined search queries, scraped results, a summarization prompt, and output. It produced briefs quickly, but the quality was shallow. Fixed pipelines don't know when they've found something interesting and should go deeper — they just move on to the next step.
The redesign used an adaptive research loop. The agent evaluates what it finds at each step and decides whether to continue searching in the same direction, pivot to a different angle, or move to synthesis. It can run up to 10 search iterations but often needs fewer. This is closer to how a skilled researcher actually works.
Agent Architecture — Research Loop
The scorer module evaluates each competitor across five dimensions: Market Overlap, AI Maturity, Execution Velocity, Distribution Strength, and Resource Depth. Each dimension has a weighted score and a reasoning trace — the output isn't just a number, it's an explanation.
Technical Approach
Claude API, adaptive search, and structured output — the three components that matter
agent.py — the research loop
The core loop runs iteratively: search, read, evaluate, decide. At each step the agent considers what it's learned and whether the current search direction is productive. The adaptive logic is what separates this from a script — it can abandon a low-signal thread and reorient toward a more promising one.
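The loop can be sketched roughly as follows. This is a minimal illustration, not the actual pm-research-agent code: the names (`run_loop`, `evaluate`, `MAX_ITERATIONS`) and the decision heuristics are assumptions; in the real agent, the evaluate step would be a Claude API call returning a structured decision.

```python
from dataclasses import dataclass, field

MAX_ITERATIONS = 10  # hard cap on search rounds, per the design above

@dataclass
class ResearchState:
    competitor: str
    query: str
    findings: list = field(default_factory=list)

def evaluate(state: ResearchState):
    """Placeholder decision step. In the real agent this is a model call
    that returns one of 'continue', 'pivot', or 'synthesize'."""
    if len(state.findings) >= 3:
        return "synthesize", state.query
    if state.findings and state.findings[-1]["signal"] < 0.2:
        # Low-signal thread: abandon it and reorient toward a new angle
        return "pivot", state.competitor + " pricing changes"
    return "continue", state.query

def run_loop(competitor: str, search):
    """Search → read → evaluate → decide, until synthesis or the cap."""
    state = ResearchState(competitor, query=competitor + " recent news")
    for _ in range(MAX_ITERATIONS):
        state.findings.extend(search(state.query))
        action, next_query = evaluate(state)
        if action == "synthesize":
            break
        state.query = next_query  # same direction, or a pivoted one
    return state.findings
```

The key design point is that the exit condition and the next query are both decided per iteration, which is what lets the loop stop early or change direction — the behavior a fixed pipeline can't express.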
scorer.py — structured competitor scoring
After research, each competitor is scored across five dimensions with reasoning traces. The scores are opinionated by design — a competitor with high AI Maturity but low Distribution Strength is a different type of threat than the inverse. The model captures that distinction.
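A hedged sketch of what weighted scoring with reasoning traces could look like — the weights, field names, and 0–10 scale here are illustrative assumptions, not the project's actual values:

```python
from dataclasses import dataclass

# Assumed example weights; must sum to 1.0
WEIGHTS = {
    "market_overlap": 0.30,
    "ai_maturity": 0.25,
    "execution_velocity": 0.20,
    "distribution_strength": 0.15,
    "resource_depth": 0.10,
}

@dataclass
class DimensionScore:
    score: float    # 0-10 raw score for this dimension
    reasoning: str  # the trace: why this score, not just the number

def overall_threat(scores: dict) -> float:
    """Weighted composite across the five dimensions."""
    return sum(WEIGHTS[dim] * s.score for dim, s in scores.items())
```

Keeping the reasoning string next to the number is what makes the output an explanation rather than a ranking — and it preserves the distinction the text describes, since two competitors with the same composite can have very different dimension profiles.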
batch.py — landscape summaries
For tracking multiple competitors simultaneously, batch mode processes all of them and produces a ranked landscape summary — which competitors are accelerating, which are stagnating, and where the emerging threats are coming from.
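The ranking step might look like this sketch, assuming each brief already carries a composite threat score and a hypothetical `velocity_delta` trend field (both names are illustrative):

```python
def landscape_summary(briefs: list) -> str:
    """Rank competitors by threat score and flag the trend direction:
    accelerating (positive velocity change) vs stagnating."""
    ranked = sorted(briefs, key=lambda b: b["threat_score"], reverse=True)
    lines = []
    for b in ranked:
        trend = "accelerating" if b["velocity_delta"] > 0 else "stagnating"
        lines.append(f"{b['name']}: {b['threat_score']:.1f} ({trend})")
    return "\n".join(lines)
```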
diff.py — threat trajectory over time
Perhaps the most useful module for ongoing strategy work: diff compares briefs generated at different points in time, surfacing what has changed about a competitor's position. It turns competitive intelligence from a point-in-time snapshot into a continuous signal.
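The core of a score diff can be shown in a few lines. This is an illustrative reduction — the actual module presumably compares richer brief structures, but the idea of surfacing only meaningful movement is the same:

```python
def score_diff(old: dict, new: dict, threshold: float = 0.5) -> dict:
    """Return the dimensions whose score moved by more than `threshold`
    between two briefs, with the signed delta."""
    changes = {}
    for dim, new_score in new.items():
        delta = new_score - old.get(dim, 0.0)
        if abs(delta) > threshold:
            changes[dim] = delta
    return changes
```

The threshold matters: small fluctuations between runs are noise, while a sustained shift in one dimension is exactly the trajectory signal the text describes.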
Output to Slack and Notion
Briefs are delivered where PMs actually work. Slack for immediate sharing and discussion; Notion for structured storage and linking to strategy documents. The output format is standardized to enable comparison across competitors and time periods.
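Slack delivery can be done with a standard incoming webhook, which accepts a JSON payload with a `text` field. A minimal sketch (the webhook URL is a placeholder; Notion delivery, not shown, would go through the Notion API's pages endpoint):

```python
import json
import urllib.request

def build_slack_payload(brief_text: str) -> dict:
    """Slack incoming webhooks accept {'text': ...} as the minimal payload."""
    return {"text": brief_text}

def post_brief_to_slack(webhook_url: str, brief_text: str) -> int:
    """POST the brief to a Slack incoming webhook; returns the HTTP status."""
    data = json.dumps(build_slack_payload(brief_text)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # Slack returns 200 on success
```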
What I Learned
Building tools for yourself is the fastest way to develop genuine AI product instincts
The most useful thing about this project wasn't the output — it was what I learned about designing agentic systems by being the end user. I felt the difference between adaptive loops and fixed pipelines in my own workflow before I could articulate it in a product spec.
Two specific insights I've carried into professional AI product work: First, output format matters as much as output quality — a technically correct brief that nobody reads is worthless. Second, trust degrades fast when an agent fails in unexpected ways. The diff module exists because I wanted to be able to verify the agent's claims over time, not just accept them.