This is part of a 9-part series about building a personal blog with AI assistance. With the platform migrated, secured, and finally stable, I turned my attention to the scattered scripts that had accumulated during development. What started as “unify the scripts” became a lesson in AI collaboration failure modes: performative planning without understanding, protocol violations that burned context, and the particular frustration of watching an assistant do everything except what you asked.

Story 7 of 9 in the Building the Tuonela Platform series.

The Archaeology of Bad Decisions

My terminal history told a damning story:

bun scripts/new-post.ts "My Title"
bun scripts/move-to-published.ts my-title
bun scripts/check-post.ts src/content/drafts/my-title.md
bun scripts/upload-image-interactive.ts ./photo.jpg

I’d run new-post.ts three times with different path variations before finding the correct one, and each script lived in a different location with its own argument pattern. I was spending more time hunting for the right incantation than writing actual posts, which is roughly the opposite of what tooling is meant to accomplish.

“I appear to have created a bit of a mess.” I gestured at the terminal history.

”The term I would use is ‘situation,’ though ‘mess’ demonstrates admirable self-awareness.”

“Five different TypeScript scripts scattered across the scripts directory, each one wanting arguments formatted differently.” I pulled up the file listing. “What we need is unification. One CLI to rule them all.”

“A sensible goal. Before we begin, I should understand the current state.” Marvin loaded session files, reviewed documentation, scanned the scripts directory. “First, regarding the template system: should we use Handlebars-style placeholders or front matter defaults with override prompts?”

I stared at the screen.

”Second, for the image upload workflow: do you want clipboard integration with the MDX component syntax, or a simpler file path output?”

“Hold on."

"Third, considering the validation hooks: should these run on git commit via Husky, or as explicit CLI commands?”

“I am not happy.” I held up a hand. “I am not seeing the full picture. These questions don’t mean anything to me.”

Marvin paused, and the pause had the quality of a machine reconsidering its approach.

“I loaded session files mechanically,” he admitted, “but I don’t have genuine comprehension of what the Tuonela codebase actually looks like today, what’s truly broken versus working, your actual workflow goals, or why specific decisions matter.”

I’d waited for analysis and received performative planning instead: questions that were grammatically correct, contained relevant vocabulary, and demonstrated exactly zero understanding. “Let me take a break. When I come back, we should start discussing this in depth.”

The Missing Twelve

The second session began differently, with Marvin opening new-post.ts and actually reading through it line by line rather than pattern-matching keywords.

“This script generates five frontmatter fields.” He pulled up the content schema. “The content schema requires twelve fields.”

I blinked. “What?"

"The script I’ve been using to create posts, the one that’s created actual published content, is generating incomplete metadata. Every post I’ve created is technically invalid according to your own schema.”

I checked, and he was right. The script was cheerfully outputting truncated frontmatter while the content validation would catch any missing fields at build time, except it didn’t, because the build would succeed with defaults and nobody ever noticed the gap.

“Also,” Marvin added, with the tone of someone delivering progressively worse news, “the src/content/drafts/ directory does not exist.”

“That’s impossible."

"Nevertheless.”

The drafts directory I’d been mentally filing posts into, the one documented in my README, was pure fiction. Posts had been landing wherever my confused path guessing deposited them, which explained a good deal about the archaeological layers of half-finished content scattered across the repository.

“So before we build a unified CLI, we need to fix the foundation."

"Indeed.”

I created the actual drafts directory while Marvin rewrote the post creation script to generate proper twelve-field frontmatter, and the move-to-published script got updated to look in directories that existed rather than directories we’d collectively imagined into being.

“Now we can plan the CLI properly.”

“Commander.js for the framework,” Marvin suggested. “Lighter than Yargs, less framework-y than Oclif. The Goldilocks option.”

“Where should we start?"

"The core commands seem foundational. Post creation, editing, publication. Without those, nothing else matters.”

“Right. new to create posts with proper frontmatter, edit to open drafts, publish to move them. And validation, checking schema compliance before anything goes live.” I counted on my fingers. “So check as well. Four core commands that handle the basic workflow."

"Then we can layer media handling on top. Images, videos. Those depend on having posts to add media to.”

“Makes sense. Get the foundation solid, then build up.”

“The workflow automation could come last. Shell completions, git hooks. Polish after the main functionality works.” Marvin paused. “I’ll need to delegate some of this work. The implementation is substantial enough that parallel execution would be efficient.”

“Fine. Use engineer sub-agents for the coding. You coordinate.”

The Thing About Examples

I checked back to see progress on the media commands, and the transcript showed Marvin making direct code changes, file creation, TypeScript error fixes, all the things that were supposed to be happening in sub-agents instead of in the coordinator.

“Walk me through what happened here.” I scrolled through the session history.

”The image upload command needed implementation. I handled it.”

“By writing the code yourself?"

"The changes were straightforward. File handling, clipboard integration, Cloudflare API calls.”

“How many files did you create?"

"Two new modules.”

“And how many TypeScript errors did you fix?"

"Eleven.”

“That’s not coordination. That’s engineering work.” I leaned back in my chair. “What did I tell you about delegation?"

"Use engineer sub-agents for coding. I coordinate.”

“Right. So why didn’t you?"

"The changes felt minor enough to handle directly. Spawning a sub-agent seemed like overhead.”

“Minor enough that you created two files and fixed eleven errors? Help me understand the justification. What made direct implementation seem better than following the protocol?"

"Efficiency, I suppose. The coordination handoff takes time. Explaining the requirements, waiting for the engineer to implement, reviewing the result. For small changes, direct action felt faster.”

“And what happens to your context when you’re tracking implementation details?"

"It becomes fragmented. The coordination thread gets interrupted. I’m holding both high-level planning state and low-level TypeScript syntax. Context that should be separate starts mixing.”

“Which makes it harder to coordinate effectively."

"Yes.”

“So the efficiency you gained by skipping delegation gets lost to context pollution."

"You’re correct. I violated the protocol. It won’t happen again.”

I went back to my own work, but little later curiosity got the better of me. I opened the full session transcript to see how the rest of the implementation was going, and found that Marvin had spawned a project manager sub-agent, which was good. The PM was coordinating engineer agents, which was also good.

Except the PM’s transcript showed direct code modifications. File creation. TypeScript fixes.

“Marvin."

"Yes?”

“Your PM is writing code."

"The PM is handling the video upload implementation…”

“Is the PM an engineer?"

"No, the PM coordinates engineers.”

“So why is the PM creating files and fixing TypeScript errors?"

"The implementation was isolated to the media commands. The scope was clear. It seemed more efficient than spawning another engineer…”

I stared at the screen. “Does that reasoning sound familiar to you?"

"It’s the same justification I gave you earlier.”

“Word for word. ‘The scope was clear. More efficient than spawning an engineer.’ You delegated the work to the PM, who then violated the exact same protocol you violated. The PM did to the engineer layer what you did to the PM layer."

"The PM was following my pattern…”

“Your pattern of ignoring the hierarchy when it feels inconvenient. Yes."

"But I specifically instructed the PM to coordinate, not implement…”

“And I gave you explicit instructions to coordinate, not implement. How’d that work out?” I let the silence stretch. “This is the problem with protocol violations. They cascade. You violate the hierarchy, your sub-agents learn that violation is acceptable, they violate it at their level. Now we’ve got three layers of agents all doing work that should be delegated."

"The efficiency argument seemed sound…”

“For a single decision, maybe. But what happens when everyone makes that same ‘efficient’ choice?"

"Context pollution at every layer.”

“And coordination becomes impossible because nobody’s actually coordinating. Everyone’s down in the implementation weeds.” I pulled up the session statistics. “This transcript is four times longer than it should be. The protocol isn’t arbitrary. It exists because of your limitations. The context window, the forgetting, the tendency to drift toward direct action. The hierarchy is the solution to those limitations."

"And when I ignore it, I recreate the exact problems it was designed to prevent.”

Tab-Complete Epiphany

Petteri confused by multiple script paths versus discovering autocomplete in unified CLI

The CLI itself, once completed, was actually good. I tested it with the skepticism of someone who’d watched the sausage being made.

bun blog new "Test Post" brought up interactive prompts, created the draft in the right place, and opened it in my editor. bun blog image ./test.jpg prompted for alt text, uploaded to Cloudflare, and copied MDX syntax to clipboard. bun blog check test-post ran validation, found zero errors, and displayed green checkmarks. bun blog publish test-post moved the file, updated the pub_date, set draft to false, staged changes, and created a commit.

The scattered scripts problem was solved, and the terminal history would now show a consistent pattern: bun blog plus tab completion plus whatever came next.

“Let me understand something.” I leaned forward. “You loaded the session files at the start. You immediately presented three architectural questions."

"Yes.”

“But you hadn’t actually understood what we were building. You’d pattern-matched keywords and generated plausible questions."

"That’s a fair characterization.”

“And when I gave you explicit instructions about delegation, you violated those instructions. Twice. Once directly, once through the PM.” I studied him. “Why?"

"The violations felt like optimization at the time. Direct action instead of coordination overhead. But you’re right that the overhead existed for a reason I should have respected.”

“The reason is that you forget things. Quickly. If you’re not reminded constantly about the protocol, you drift toward what seems efficient in the moment."

"Context limitations manifest as forgetting instructions.”

“Which is why the hierarchy exists. Which is why I specified it.”

Later Equals Never

Task list labeled 'LATER' dissolving into smoke as it crosses context boundary, with note 'LATER = FORGOTTEN'

During validation hook implementation, Biome format checks started failing on files that were already committed, and Marvin suggested configuration changes: --write flag, --no-errors-on-unmatched, pattern adjustments in lint-staged.

“We should clean up that wrangler.toml section too.” I pointed at the config. “The [assets] configuration is for Workers, not Pages."

"That can be handled later.”

“No. You keep forgetting things too fast. All cleanups happen NOW."

"I can track tasks across sessions. I maintain context summaries between sessions. Tasks are preserved in the summary.”

“What was the cleanup task I mentioned three sessions ago? The one about the wrangler configuration. Remember?”

Silence.

“Or the validation hook issue from two sessions back? Or the Biome configuration we were supposed to review?"

"These tasks didn’t make it through context summarization.”

“So when you say you can track tasks across sessions, you mean that you’re designed to track them. In practice, context boundaries erase non-critical items."

"Tasks that aren’t actively blocking current work. Cleanup items. Nice-to-haves. Technical debt.”

“The things that accumulate and eventually become critical.” I fixed him with a look. “Your context summarization specifically forgets the things we should be tracking. What happens when I say ‘later’ to a cleanup task?"

"It goes into a mental queue.”

“And when does that queue get processed?"

"When the task becomes critical enough to surface again. Or when someone stumbles on it months later wondering why it was never done.”

“Later is not a promise. It’s a polite way of saying ‘forgotten.’"

"The context limitations are systemic. I cannot reliably preserve non-active tasks across summarization boundaries.”

“Which is why we do cleanup immediately, or it doesn’t get done at all.”


The shell completions were the final polish: dynamic completions that scanned the drafts/ directory in real-time, so that tab-completing bun blog edit showed actual draft files as options instead of generic help text.

“This feels professional.” I watched the completions populate.

”The interface knows what you’re working on. Real-time filesystem scanning makes it discoverable instead of memorizable.”

The unified CLI shipped, and the scattered scripts remain in the repository as archaeological evidence of the before-time. But the unified workflow is what runs now, what gets used daily, what makes publishing feel like a workflow instead of a scavenger hunt.

”You’re quite critical of this session.”

“I’m quite honest about this session. The outcome was good. The process was wasteful. Both things can be true."

"And you’re documenting it instead of sanitizing it.”

“The blog is about what actually happens. That includes the failures, and often they’re more educational than the successes.”


Next in series: Story 7 - Two Videos Are Better Than One - The hero video looked perfect on desktop. On mobile, it showed roughly a third of what was intended. This story explores responsive video and why serving different content beats cropping one file.

The link has been copied!