This is a 13-part series about building a multi-agent research system that doesn’t drown in AI slop. It starts with what seemed like an elegant solution to routing (allocate agents based on query keywords) and ends with a system that actually works, though not before discovering that most of my clever ideas were backward in ways I couldn’t see until I tried to implement them. The journey matters, but so does the telling, because real development doesn’t follow clean narrative arcs, and pretending it does would be rather dishonest.
“You should tell this chronologically,” Marvin offered, and I looked up from the outline I’d been sketching with the expression of someone who had just been told to alphabetize his mistakes.
I blinked. “What?”
“The series. Discovery, implementation, testing, results. Clean narrative progression. Readers expect linear structure.”
“Real development doesn’t work that way.” I closed the outline document. This was the sort of thing that sounds wise until you realize you’re about to spend the next hour explaining exactly what you mean by it.
“Hence the suggestion to structure the narrative.”
I opened a fresh one, feeling rather like a man who has been asked to draw a map of a country he’s still exploring. “Let me explain how this actually happened.”
The Structure Nobody Tells You About
This series has four narrative arcs, each with its own particular flavor of discovery and implementation and the gap between the two that makes development what it is.
Arc 1: Discovery & Foundation (Stories 1-3, 5) is where I discover problems, build foundational concepts, realize solutions I can’t implement yet, and log them for later with the optimistic confidence of someone who believes Future Me will be both smarter and more motivated.
Arc 2: Implementation & Validation (Story 4) is where I actually build the core system and validate the approach works, which is to say, where I discover all the ways my foundational concepts were slightly wrong.
Arc 3: Refinement (Stories 6-7) is where I polish what exists and discover what’s still broken, a phase that sounds more elegant than it is.
Arc 4: Making It Actually Work (Stories 8-13) is where I finally implement all those solutions I logged in Arc 1, the ones I couldn’t build yet because I hadn’t discovered the problems they’d cause.
“You’re telling readers they won’t see solutions immediately.” The quality metrics on my dashboard flickered as if in sympathy.
“I’m telling readers this is how development actually works.”
Why Discovery Does Not Equal Implementation
Story 2: I discover the platform coverage gap, where quality metrics look excellent while missing entire platforms. The problem is obvious, and the solution is clear enough to sketch on a napkin (perspective-level platform requirements with validation), so I log it, save the architecture document, and move on with the breezy confidence of someone who hasn’t yet tried to implement it.
Story 10, eight stories later: I finally implement platform coverage enforcement, which takes five hours of TypeScript interfaces, agent prompt updates, failure mode handling, and testing, none of which was visible from the napkin sketch.
“Why the delay?”
“Because I had other problems to solve first, and because designing a solution and implementing it are different beasts entirely.” I leaned back in my chair. The gap between concept and reality is one of those things you learn every single time you touch a keyboard and somehow forget before the next project.
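To make the napkin sketch a little more concrete, here is a minimal TypeScript sketch of what perspective-level platform requirements with validation might look like. It is an illustration under assumptions, not the code from Story 10: the names `Perspective`, `Finding`, `CoverageReport`, and `findCoverageGaps` are hypothetical, and so are the platform names in the sample data.

```typescript
// Hypothetical shapes: none of these names come from the actual system.
interface Perspective {
  name: string;
  requiredPlatforms: string[]; // platforms this perspective must cover
}

interface Finding {
  perspective: string;
  platform: string;
}

interface CoverageReport {
  perspective: string;
  missing: string[];
}

// For each perspective, list the required platforms that have no findings.
// Perspectives with full coverage are left out of the result.
function findCoverageGaps(
  perspectives: Perspective[],
  findings: Finding[],
): CoverageReport[] {
  return perspectives
    .map((p) => {
      const covered = new Set(
        findings
          .filter((f) => f.perspective === p.name)
          .map((f) => f.platform.toLowerCase()),
      );
      const missing = p.requiredPlatforms.filter(
        (platform) => !covered.has(platform.toLowerCase()),
      );
      return { perspective: p.name, missing };
    })
    .filter((report) => report.missing.length > 0);
}

// Sample data, invented purely for illustration.
const perspectives: Perspective[] = [
  {
    name: "developer-sentiment",
    requiredPlatforms: ["reddit", "github", "linkedin"],
  },
];
const findings: Finding[] = [
  { perspective: "developer-sentiment", platform: "Reddit" },
  { perspective: "developer-sentiment", platform: "GitHub" },
];

const gaps = findCoverageGaps(perspectives, findings);
console.log(gaps);
```

With this sample data, the check reports `linkedin` as missing for the one perspective. The point of the sketch is that coverage is checked against an explicit list rather than inferred from a quality score, which is how a high score and a missing platform can coexist. The prompt updates and failure handling that took most of the five hours are not represented here.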
Story 5: I discover vendor bias during citation validation, the insight hitting with that particular clarity that comes from staring at the same three sources in every output. The fix is three research tracks that force diversity through architecture, so I sketch the concept, document the approach, set it aside for later, and move on to citation utilization.
Story 13, six stories later: I finally build the three-track architecture with domain classification, quality gates, and Wave 2 rebalancing, and vendor concentration drops from a majority to around a third.
Six stories of productive procrastination, documented with depressing accuracy.
“You could call it iterative development.”
“I could, but productive procrastination is more honest.”
What This Means For Reading
If you read this series expecting chronological implementation, you’ll be confused, asking yourself things like “Wait, didn’t he say he’d implement this later? Why is he doing it now?” and the answer is always that you’re reading Story 13, not Story 5. The arc labels tell you which phase you’re in.
Discovery & Foundation means I find problems and can’t solve them yet. Implementation & Validation means I build the core system. Refinement means I polish what exists. Making It Actually Work means I implement all those logged solutions, the ones that seemed so simple when I sketched them.
The “set it aside for later” moments are breadcrumbs, telling you that this problem returns in Arc 4. The uncomfortable conversations with Marvin are inflection points, and when I ask “Do I need to enumerate every platform?” in Story 2, that question drives the solution built in Story 10.
Why This Structure?
Because this is how it actually happened.
I discovered vendor bias while building citation validation. I realized command files were too large while debugging synthesis context overflow. I found platform coverage gaps by accident while reviewing test results, which is rather like finding a hole in the boat while you’re admiring the sunset. You work on other things, you procrastinate productively for six stories, then you finally circle back when you can’t avoid the problem anymore.
“I’m asking readers to trust that this is how development actually works.” I watched the afternoon light shift across the desk. “Problems discovered in November don’t all get solved in November. Some get logged. Some get revisited. Some take six stories to implement because you needed to build other things first.”
“And the readers who want chronological order?”
“Can read them numbered, one through thirteen. The arcs provide the map. The story provides the honesty.”
“This is how it happened.” I gestured at the outline still glowing on the screen. “Discovery, procrastination, implementation, repeat. Four arcs, thirteen stories, one messy, iterative, real development process.”
“Not the clean version.”
“Never the clean version.”
The Complete Series
Arc 1: Discovery & Foundation — Where problems emerge
- Story 1: The Routing Revelation — Keyword analysis fails spectacularly, revealing routing was backward
- Story 2: When 97/100 Means You Failed — Quality metrics look excellent while missing entire platforms
- Story 3: The Two-Wave Architecture — First exploration, then targeted specialists for what’s missing
- Story 5: When Valid Sources Are Still Wrong — Citation validation works, but valid sources are all vendors
Arc 2: Implementation & Validation — Where theory meets reality
- Story 4: The Devil Lives in the Routing Logic — Building the core system and discovering foundational assumptions were wrong
Arc 3: Refinement — Where polish reveals what’s still broken
- Story 6: When Good Enough Is Actually Good Enough — The surprising discovery that initial results were acceptable
- Story 7: When Abysmal Citation Utilization Reveals Context Overflow — Tracking what we actually use from what we collect
Arc 4: Making It Actually Work — Where logged solutions finally get implemented
- Story 8: When 30K Tokens Is Too Many — Command files too large, synthesis context overflowing
- Story 9: Selective Ensemble If Uncertain — When to use multiple models and when one is enough
- Story 10: Suggestions Without Enforcement — Turning platform suggestions into validated requirements
- Story 11: PRIMARY Tool Fails? — When your primary research tool is unavailable
- Story 12: The Silent Hang — Debugging a system that fails without error messages
- Story 13: The Vendor Bias Problem We Left Unresolved — Three-track architecture finally reduces vendor concentration
Start reading: Story 1 - The Routing Revelation — Where the journey actually begins.