Coding Agents As An Interface To The Codebase

Attack Dogs

I mentioned previously that coding agents kind of suck for lots of people. As of January 2026, coding agents lack the long-horizon skills needed to produce effective codebases independently.

However, it’s clear to anyone who has used modern coding models - Claude Opus 4.5, GPT 5.2-Codex, hell even GLM 4.7 (open source) - that they are smart, knowledgeable, agentic, and tenacious in a way that is almost uncanny.

Setting Claude Code on a problem with --dangerously-skip-permissions feels like letting an attack dog off the leash. It sprints straight at the problem and attacks it with the terrible certainty of something that has never known hesitation, all the violence of its training distilled into pure forward motion.

Which is fine as long as there isn’t a fence in the way.

Rather than expecting the attack dog to catch the perp, cuff him, bring him in, and file the relevant papers independently, we can repurpose its knowledge, power, and tenacity as an extension of our own will. The interface of Saying What You Want combines with the utility of the model to present a new view of a codebase.

Codebase Interfaces

The most common interface to a codebase is a text editor. VSCode, Notepad++, IDEA, Vim, etc. You select the file you want to read and it presents a window of text that you can scroll through and edit, adding and removing characters with your keyboard and mouse. Maybe it has some functions like find/replace, find symbol references, rename symbol, git integration, DB querying, test runner, build automation, etc.

Text editors are pretty good. The majority of all code ever produced prior to 2025 went through a text editor. Code generation exists, but it’s really more of an amplifier for text editor-produced code. Visual programming interfaces exist, but no one likes them because they suck (okay some people like them, sorry Scratch).

Text editors give you one view of the code. A very low-level, raw view of the code. Like reading SELECT * FROM table output. You can read the functions, classes, variables, etc. and produce a model at a higher level of abstraction (see Object Oriented Programming, Domain Driven Design, etc.). Then, you make changes at that higher level of abstraction, and translate them back down to key presses in your text editor.

Coding agents can give you a view of a codebase that is already on that higher level of abstraction. You can say:

What does the data flow through the system look like for the interaction in this log file?

And get back an accurate diagram of the data flow structure of the system. Then you can say:

This is unnecessarily chatty. All the data used for this operation is available in the first request, is there another use-case for this flow that makes the extra requests necessary?

And get back a correct answer. Then:

Okay, let’s simplify the client-server interaction model here so that the client only sends one request. Produce a plan for the implementation.

And the plan will be wrong. Probably. Sometimes you get lucky and it’s right. But that’s okay, you’re a skilled engineer who’s been tapping keys on a keyboard to manually add/remove individual characters from codebases for years. You can read a PR and tell if it actually accomplishes what it says it does, right? There’s definitely a new skill to be learned here, which senior engineers with experience reviewing junior PRs already have a head start on. Worst case scenario, you break the plan down into small parts and go over them individually with the agent.

But hey, don’t despair, in a couple of years the models will probably have improved enough to get the plans right first try, too.

Operating at a higher level of abstraction like this has a number of benefits. Subjectively, I find that:

- It’s easier to stay focused on high-level objectives without going down implementation detail rabbit holes.
- Stamina is improved many times over. My brain gets tired after maybe 3-4 hours of coding via a text editor, but at a higher level of abstraction I can go all day.
- You can build mental models that would be so time consuming to build manually that the utility-per-unit-of-effort tradeoff is not worth it with a text editor interface.
- You can confirm/disprove notions that you’re not 100% sure on with very little effort.
- You can try out and evaluate multiple designs very quickly.
- Debugging is 100x easier.
- You can easily perform refactors that would take far too much effort to be practical with a text editor.

To finish, a few (edited for ease of understanding) prompts from my recent history to give some concrete ideas on how this can be used:

Let’s have {component A} take ownership of the streamed content entirely. So the callback triggers on each new content piece coming in, with just that content piece (not the whole accumulated message). {component A} would then let the broadcaster know on each callback. What would an architecture like that look like? What would have to change? How does this affect the rest of the system?

Let’s do an analysis of which repos take {format A} models and which take {format B}.

So, where/how do we create the repos in this design? What does the model of the session/DB connection management look like?

Please compare the compaction mechanism from this repo with the one in {other project}. What are the key differences?

What is the performance overhead like for spawning 2 tasks for every stream event sent? Let’s say we’re streaming 10 events per second per worker, with 10 workers streaming at the same time, that’s 100 events (so 200 new tasks) per second on an instance. Do a quick benchmark.

Pick one failure and use the production logs to piece together what happened. If you can’t figure it out, then determine which extra logging would have helped you figure it out.
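To make the benchmark prompt above concrete, here is roughly the kind of throwaway script an agent might come back with. This is a minimal sketch that assumes asyncio-based Python workers - the post never names the actual runtime, so treat the names and numbers as illustrative only.

    # Hypothetical quick benchmark: what does spawning 2 asyncio tasks per
    # stream event cost? (Illustrative only - the real system's runtime and
    # per-event work are not specified in the post.)
    import asyncio
    import time

    EVENTS = 10_000  # simulate 10k stream events


    async def handle_piece(piece: int) -> None:
        # Stand-in for the real per-event work; we only want spawn overhead.
        pass


    async def main() -> None:
        start = time.perf_counter()
        for event in range(EVENTS):
            # Two tasks per event, as in the prompt.
            await asyncio.gather(
                asyncio.create_task(handle_piece(event)),
                asyncio.create_task(handle_piece(event)),
            )
        elapsed = time.perf_counter() - start

        per_event = elapsed / EVENTS
        # 10 workers * 10 events/s = 100 events/s, i.e. 200 task spawns/s.
        print(f"{per_event * 1e6:.1f} microseconds of overhead per event")
        print(f"at 100 events/s that is ~{per_event * 100 * 100:.3f}% of one core")


    if __name__ == "__main__":
        asyncio.run(main())

Checking a number like this used to feel too tedious to bother with; now it costs one prompt and a coffee break.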

- omegastick