I had a search query routing problem and two equally plausible solutions, neither of which I was entirely certain about. What seemed like an elegant approach to agent allocation, routing by keywords, was about to reveal itself as precisely backward in ways I couldn’t quite name yet.

Story 1 of 13 in The Adaptive Research System series (Discovery & Foundation).

We thought keyword routing was clever, right up until I saw how much of the research space it couldn’t see at all.


I was building a multi-agent research system, which sounds rather more impressive than it actually was at that particular moment, since the truth of the matter was that I had a routing problem and two equally plausible solutions sitting in front of me, neither of which I was entirely certain about. The fundamental issue was straightforward enough: different AI models excel at different things, with some dominating social media through native platform access while others handle technical depth beautifully, and still others excelling at citations or bringing multimodal capabilities and context windows you could park a modest-sized aircraft carrier in. Dozens of services, each with distinct strengths, each costing money every time you invoke them.

The question wasn’t whether to use specialists, since that much was obvious, but precisely how to route research questions to the right ones without burning through API costs like a particularly enthusiastic bonfire. That’s where Marvin and I found ourselves on a Tuesday afternoon, staring at what appeared to be a perfectly sensible architecture that I was increasingly convinced was backward in ways I couldn’t quite name yet.

The Elegant Diagram

“I have prepared the comparison analysis,” Marvin announced, pulling up a spreadsheet that looked like he’d spent the better part of the morning on it. “Option A provides the most efficient routing mechanism. Query analysis, keyword extraction, domain classification, then agent allocation based on domain weightings.”

I looked at the flow diagram, which traced a clean linear path from Query to Keyword Analysis to Domain Classification to Agent Allocation, and I had to admit it was appealingly straightforward in the way that architectural diagrams often are before you try to implement them.

The flow diagram: query analysis first, keyword extraction, domain classification, then agent allocation based on domain weightings.

“Walk me through an example.” I leaned forward, because examples have a way of revealing problems that diagrams hide.

“Consider the query ai frameworks enterprise deployment. We extract keywords: ai, frameworks, deployment, enterprise. The analyzer scores this high on technical domain, moderate on business domain. We allocate depth-focused models for technical investigation and perhaps one citation specialist for business context.”

“And that routes in what, under a second?” The efficiency was genuinely impressive, even as the nagging doubt continued to nag.

“Approximately 800 milliseconds. Deterministic. Reliable.”

There was something appealing about deterministic, I had to admit, since you extract words, you map them to domains, and you get the same result every time with no LLM uncertainty and no hallucinations, just clean keyword analysis and domain mapping. Also, I was beginning to suspect, backward at its core, which was the sort of architectural revelation that tends to arrive just after you’ve fallen in love with how elegant everything looks in the diagram.
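In code, Option A amounted to something like the sketch below. The keyword sets and agent names are placeholders I’m making up for illustration rather than the real configuration, but the shape is faithful: extract words, score domains, allocate whichever agents the triggered domains call for.

```python
# Minimal sketch of Option A: deterministic keyword routing.
# Domain keyword sets and agent names are illustrative placeholders.

DOMAIN_KEYWORDS = {
    "technical": {"ai", "frameworks", "deployment", "architecture", "implementation"},
    "business": {"enterprise", "roi", "vendor", "cost"},
}

AGENTS_BY_DOMAIN = {
    "technical": ["depth-specialist-1", "depth-specialist-2"],
    "business": ["citation-specialist-1"],
}


def route_by_keywords(query: str) -> list[str]:
    """Score each domain by keyword overlap, then allocate that domain's agents."""
    words = set(query.lower().split())
    allocation = []
    for domain, keywords in DOMAIN_KEYWORDS.items():
        if words & keywords:  # any overlap triggers the domain
            allocation.extend(AGENTS_BY_DOMAIN[domain])
    return allocation


# "ai frameworks enterprise deployment" lights up exactly two domains and nothing else.
print(route_by_keywords("ai frameworks enterprise deployment"))
```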

“What if the research space includes angles that don’t appear as keywords in the query?” I watched carefully for any sign that Marvin had already considered this.

“The domain weighting accounts for implicit requirements. A query containing enterprise would trigger business domain allocation even without explicit business terminology.”

“But how do you know what angles exist before you look for them?” I pressed, because this was the core of the thing, the bit that had been bothering me since I’d started staring at the flow diagram.

“I am uncertain what you mean.”

I leaned back from the screen and tried to articulate what felt wrong. We were making routing decisions based on query features instead of research requirements, which seemed rather like deciding what tools to bring on a hike before you’d checked whether you were climbing a mountain or strolling through a park. The sequence felt backward, I explained, because we were allocating agents before we knew what the research space actually looked like.

“The keyword analysis reveals the research space,” Marvin countered, though I thought I detected a note of uncertainty creeping into his voice.

“Does it?” I leaned forward. “What if I told you there’s a step we’re skipping? Generate perspectives first, see what angles actually exist, then route agents to match what we found instead of what we assumed we’d find?”

Marvin was quiet for a moment, the kind of silence that suggested complex recalculation was occurring somewhere in his architecture. Then he added a new column to the comparison table.

“Option B: perspective generation as initial layer. Query analysis produces explicit research angles, each routed to optimal agent.”

“Show me what that looks like with your enterprise AI example.” I pulled up a fresh terminal, suddenly very interested in seeing this play out with real data.

The perspective output: six perspective cards against a keyword analysis that covered two of the six angles.

“Running perspective generator now.”

I watched the analysis populate as six distinct angles emerged on screen, one after another. Technical architecture and implementation patterns appeared first, which was predictable enough for anything involving frameworks, followed by enterprise ROI and business case analysis, which made sense given the enterprise keyword. Then came market overview and vendor comparison, academic research on agent frameworks, user experience and adoption patterns, and security and compliance implications, and suddenly the screen looked very different from what keyword analysis had suggested.
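For the record, the six angles came back as structured perspectives, each carrying a domain tag. Roughly like this, with field names and tags that are my own shorthand rather than the generator’s literal output:

```python
# The six generated perspectives as structured data.
# Field names and domain tags are shorthand, not the generator's literal output.

perspectives = [
    {"angle": "Technical architecture and implementation patterns", "domain": "technical"},
    {"angle": "Enterprise ROI and business case analysis", "domain": "business"},
    {"angle": "Market overview and vendor comparison", "domain": "market"},
    {"angle": "Academic research on agent frameworks", "domain": "academic"},
    {"angle": "User experience and adoption patterns", "domain": "user"},
    {"angle": "Security and compliance implications", "domain": "security"},
]

# Keyword analysis on the raw query would have surfaced two of these domains at best.
```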

I counted them twice, just to be certain. “Six distinct research angles.”

“Correct.”

“How many would keyword analysis have identified?”

“Keyword analysis identifies technical and business domains based on query terms. Two domains.”

Two of Six

Two of six, I thought, and the number sat there like an indictment of everything we’d been building. Market research, UX patterns, security compliance, all of these angles matter for enterprise AI deployment, but none of them appear as explicit keywords in ai frameworks enterprise deployment, which meant keyword analysis would never see them.

“That is accurate.” The gap between two domains and six perspectives sat there, undeniable.

“So keyword analysis sees query features.” I traced the flow on my notepad, working through the implications. “Perspectives reveal the actual research space.”

“A useful distinction.” Marvin’s cursor hovered over his spreadsheet, recalculating every routing assumption we’d made. “However, perspective generation adds latency. Three to five seconds based on my estimates. Option A routes in under one second.”

“And if Option A routes wrong?” I countered. “Allocates heavy technical, misses business context, and we trigger Wave 2 to fill the coverage gaps?”

“Wave 2 correction would involve six to ten additional agents. Approximately thirty seconds latency. Significant API costs.”

We were trading three seconds upfront to avoid thirty seconds of correction later, which was the sort of math that doesn’t require an advanced degree to evaluate, though I suspected Marvin was running more complex calculations about reliability versus speed and various edge cases I hadn’t considered.

“There’s a validation problem.” I pulled up the perspective output, because it wouldn’t do to get swept up in the elegance of Option B without examining its failure modes. “Keyword analysis is deterministic. You extract words, you get the same result every time. Perspective generation depends on the LLM correctly identifying research angles. What if it hallucinates fake perspectives or misses critical ones?”

This was the sort of question that sounds paranoid until you’ve actually watched an LLM confidently describe research angles that don’t exist, complete with citations to papers that were never written.

“Hybrid validation would address this concern.” I could tell he was already sketching the architecture in his head. “The analyzer generates perspectives with domain tags. We run keyword validation on each perspective text. If LLM classification and keyword analysis agree, we proceed with high confidence. Mismatches trigger ensemble review.”

I sketched the flow myself, tracing the path from Query to Generate Perspectives to Keyword Validation to Route Each Perspective to Specialist. “So typical case: one API call for perspective generation, instant validation, full research space visibility. Edge cases get ensemble analysis only where we actually need it.”

“The complexity increases.”

“But in a useful direction.” I tapped my pen against the diagram. “We’re adding layers that provide information, not just processing steps.”
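Sketched roughly, and assuming the perspective structure from earlier, the hybrid validation layer would look something like this. The keyword lists here are illustrative placeholders; the point is the agreement check and the ensemble escape hatch:

```python
# Rough sketch of hybrid validation: the LLM's domain tag on each perspective
# is cross-checked against deterministic keyword matches on the perspective text.
# The keyword lists are illustrative placeholders, not the real vocabulary.

VALIDATION_KEYWORDS = {
    "technical": {"architecture", "implementation", "technical"},
    "business": {"roi", "business", "enterprise"},
    "market": {"market", "vendor", "comparison"},
    "academic": {"academic", "research", "paper"},
    "user": {"user", "experience", "adoption"},
    "security": {"security", "compliance"},
}


def validate_perspective(perspective: dict) -> str:
    """Return 'confident' when keyword evidence agrees with the LLM's domain tag;
    otherwise flag the perspective for ensemble review."""
    words = set(perspective["angle"].lower().split())
    agrees = bool(words & VALIDATION_KEYWORDS.get(perspective["domain"], set()))
    return "confident" if agrees else "needs_ensemble_review"


# Agreement case: keyword evidence backs the LLM's classification.
print(validate_perspective({"angle": "Security and compliance implications",
                            "domain": "security"}))
# Mismatch case: a hallucinated or mis-tagged perspective goes to ensemble review.
print(validate_perspective({"angle": "Quarterly earnings summary",
                            "domain": "security"}))
```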

“There is also the matter of Wave 2 integration.” He was warming to the architectural implications. “With perspectives as the fundamental unit of work, emergent angles can be handled through the same routing infrastructure. If security concerns emerge from initial results, we generate a security perspective and route it to the appropriate specialist. The architecture remains consistent.”

That was cleaner than I’d expected, since we wouldn’t be bolting Wave 2 corrections onto a keyword-based system but instead generating additional perspectives and routing them through machinery that already existed. “What if we generate eight perspectives but only have budget for six agents?” This was the sort of practical constraint that tends to surface after you’ve committed to an architecture, at which point it’s far more expensive to fix.

“Perspectives would include priority scores based on relevance to core query. We fund the top six, defer the bottom two as Wave 2 candidates. This produces an explicit backlog of investigated angles versus deferred angles.”

Better than discovering entirely new angles during research, I thought, since we’d know upfront which angles existed but weren’t being investigated in Wave 1.
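The allocation rule itself is simple enough to sketch. The priority scores, the agent budget, and the two extra angles below are invented for illustration; what matters is that deferred angles become an explicit backlog rather than a silent omission:

```python
# Sketch of budget-constrained allocation: fund the top-priority perspectives,
# defer the rest as an explicit Wave 2 backlog. Scores, budget, and the two
# overflow angles are made up for illustration.

scored_perspectives = [
    ("Technical architecture", 0.95),
    ("Enterprise ROI", 0.90),
    ("Security compliance", 0.85),
    ("Market overview", 0.80),
    ("User adoption", 0.75),
    ("Academic research", 0.70),
    ("Regional regulation differences", 0.55),
    ("Historical framework comparisons", 0.40),
]

AGENT_BUDGET = 6

ranked = sorted(scored_perspectives, key=lambda item: item[1], reverse=True)
wave_1 = ranked[:AGENT_BUDGET]          # investigated now
wave_2_backlog = ranked[AGENT_BUDGET:]  # known but deferred, not invisible

print("Wave 1:", [name for name, _ in wave_1])
print("Deferred:", [name for name, _ in wave_2_backlog])
```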

“Let me validate the routing logic.” I pulled up the perspective list again. “Six perspectives for enterprise AI deployment. How would they map to agents?”

“Technical architecture: depth-focused model for technical rigor. Enterprise ROI: citation specialist for sourced business analysis. Market research: citation specialist for vendor analysis. Academic research: citation specialist for paper analysis. User adoption: broad-context model for wider analysis patterns. Security compliance: depth-focused model for technical rigor.”

Three citation specialists for business, market, and academic angles, two depth-focused models for technical and security rigor, and one broad-context model for user adoption patterns that benefited from wider analysis. Balanced allocation across the research space, with multiple specialists each matched to perspectives they were actually suited for.
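Written out as a routing table, with generic model-class names standing in for the actual services, that allocation reads roughly like this:

```python
# The perspective-to-specialist mapping from the walkthrough,
# with generic model-class names standing in for the actual services.

SPECIALIST_FOR_PERSPECTIVE = {
    "Technical architecture": "depth-focused model",
    "Enterprise ROI": "citation specialist",
    "Market research": "citation specialist",
    "Academic research": "citation specialist",
    "User adoption": "broad-context model",
    "Security compliance": "depth-focused model",
}


def route(perspective_name: str) -> str:
    """Look up which specialist class handles a given perspective."""
    return SPECIALIST_FOR_PERSPECTIVE[perspective_name]


print(route("Market research"))  # -> citation specialist
```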

“Compare that to Option A keyword routing. What would that allocate?”

“Hypothetically, keyword-first allocation would weight heavily toward technical domain. Multiple depth-focused instances for implementation details and architectural patterns. Minimal citation coverage. Limited broad-context analysis.”

Complete coverage of technical angles, I thought, and complete blindness to market context, business case analysis, vendor ecosystem, academic foundations, and user adoption patterns. And we wouldn’t know we were blind, which was the genuinely concerning bit.

“Quality scores would appear satisfactory,” Marvin confirmed, reading my expression. “The agents would return detailed technical analysis. There would be no signal that alternative research angles were systematically excluded.”

Not that keyword routing might miss things, but that it would miss them silently while reporting success, which is exactly the sort of failure mode that keeps you awake at night once you’ve spotted it.


I opened the architecture document and started writing.

Decision: Option B - Perspective-First Routing Architecture

Rationale: Query keyword analysis reveals surface-level signals. Perspective generation reveals actual research space, enabling intelligent specialist allocation based on research requirements instead of query text features.

Implementation: Single-model perspective generation with keyword validation. Hybrid approach catches hallucinations while maintaining speed. Typical latency: 3-5 seconds. Wave 2 integration cleaner with perspectives as fundamental work unit.

Trade-offs: Higher upfront latency versus lower Wave 2 correction costs. Better specialist allocation versus faster routing decisions. Research space visibility versus keyword simplicity.

“I shall add the technical specifications.” He was already typing. API call counts appeared, followed by cost projections, fallback logic for validation failures, and all the other implementation details that sound tedious until you’re debugging a production system at two in the morning wondering why the fallback logic doesn’t exist.

“This is joint authorship.” I watched the document take shape, his technical precision on metrics and implementation complementing my architectural reasoning on trade-offs and selection rationale. Neither of us would have produced this document alone, or at least not one this thorough.

“Noted. Author attribution: Marvin and Petteri.”

We saved it to the architecture directory, and tomorrow we’d implement Option B, with the routing layer generating perspectives first, validating them, then matching specialists to research angles with actual knowledge of the terrain. It felt right, the kind of architectural decision you make once and never regret, because the alternative would have been systematically blind to two-thirds of the research space while confidently reporting excellent quality scores.

I closed my laptop. “Sometimes the slower path is the only one that actually gets you there.”

“Indeed. Though in this specific instance, three to five seconds slower initially, yet potentially thirty seconds faster overall when accounting for Wave 2 correction avoidance.”

I chose not to point out that he’d just restated my observation with precise numbers, since Marvin does enjoy his specificity, and some battles simply aren’t worth fighting.


Next in series: Story 2 - Platform Coverage Blindspot - Excellent quality scores don’t mean complete coverage: a system can route perfectly, run quality agents, score highly, and still systematically miss critical research angles because it never looked at the right platforms.
