"Q" in Coding Agents stands for Quality. Unless…


Quantity doesn't always produce quality.

Take garbage as an example. As its amount increases, it doesn't become less "garbage"; it just pollutes our environment more and more. The same goes for quickly generated AI code: more code does not equal more quality. More code does not equal more value. So, what do we do about that?

Feedback loops.

These fancy words just mean that we are dealing with complex problems and we need help. The sooner we get it, the better. The sooner we act on it, the better. We do something, reach out to fellow humans, and ask them: "What do you think?" At that moment, a human becomes the Reviewer.

Here is a short version of the game: Feedback = I ask “What do you think?” Loop = I get an answer to that question. I act based on it. We repeat the game.
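The game above can be sketched in a few lines of code. This is a minimal illustration, not a real API: the names `ask_reviewer` and `act_on` are hypothetical placeholders for whatever human or agent supplies the feedback.

```python
def feedback_loop(work, ask_reviewer, act_on, rounds=3):
    """Repeat: ask 'What do you think?', then act on the answer."""
    for _ in range(rounds):
        feedback = ask_reviewer(work)   # Feedback: "What do you think?"
        if not feedback:                # no comments left: we're done
            break
        work = act_on(work, feedback)   # Loop: act on the answer, repeat
    return work
```

The loop terminates either when the reviewer has nothing left to say or after a fixed number of rounds, which is the same budget you'd set for a human review cycle.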

How do we do it in traditional software development?

You write code. You acknowledge that not sleeping for three nights, either because you have a small baby or because you are partying hard, might (only this time) produce suboptimal code, so you ask a human being to check it out. You get comments on your work, and you either regret asking for a review in the first place, or your life choices. Welcome to the beautiful nature of working and living with humans!

On to AI-assisted development.

If we keep thinking of AI as our assistant and not our replacement, we might ask ourselves:

  • Why not add more AI power to human power?
  • Why not add more reviewers to the code? Regardless of who wrote it, you could have agents reviewing your work, with final approval coming from you or your colleagues. In the same way you prompted Claude to write code, you can ask another Claude to review it. You are still the Orchestrator, the one making decisions and pushing the Merge button. Here is an idea: you can invoke reviews through a prompt. This can be triggered automatically on CI. It can even work with n8n.
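One way to sketch the "ask another Claude to review it" idea: keep the review instructions as a saved prompt, fill in the diff under review, and pipe the result into whatever agent CLI you use. The prompt wording below is an assumption for illustration, and the CLI invocation in the comment is only an example of the general shape, not a prescribed setup.

```python
# A saved review prompt, in the spirit of Claude Code commands
# (which are just saved prompts). The wording is illustrative.
REVIEW_PROMPT = """You are a strict code reviewer.
Review the following diff. List concrete problems only:
correctness, security, naming, missing tests.

{diff}
"""

def build_review_prompt(diff: str) -> str:
    """Fill the saved prompt with the code under review."""
    return REVIEW_PROMPT.format(diff=diff)

# On CI you would feed this to your agent of choice, e.g. something
# along the lines of (hypothetical pipeline):
#   git diff main... | your-agent-cli --prompt-file review_prompt.txt
```

Because it's just a prompt plus a diff, the same step works from a terminal, a CI job, or an n8n workflow node.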

How did this work for ThinkFlow? First, I started with commands in Claude Code, which are just saved prompts. Then I decided to automate the whole process, and it took me at least a couple of weeks to build it. Maybe I got carried away and spent too much time. Probably. Shiny distractions are all over the monitor, and a dopamine-extraction loop can hide behind every terminal. Anyway, it worked like this:

  • Claude Code writes a plan; Codex reviews, Gemini reviews
  • Claude Code applies the suggestions to the plan
  • Claude Code writes code; Codex reviews, Gemini reviews
  • Claude Code writes test plans; the others review. Models keep improving, but the process stays the same and gets better over time.

And this was only part of it. Other safety nets I implemented:

  • mutation testing
  • CI
  • architecture reviews
  • Linting, unit testing, integration testing, contract testing, pre-commit hooks, code-duplication checks, … But those are topics for the next post.

Takeaway:

  • There are ways to increase Trust in your AI-produced code.
  • Use Agents to review Agents.
  • You are still the one to push the Merge button.
  • Use the power of AI not just to spit out code; use it as a quality keeper.

How about you? Share your tactics for ensuring the quality of AI-generated code. And remember: Carbon > Silicon. Carbon + Silicon = Friends.