The Review-Fix-Test Pipeline — Three Agentic-Nets That Autonomously Audit, Repair, and Validate Your Codebase

What if you could drop a single sentence like “review error handling across the system” into a place, walk away, and come back to find code reviewed, bugs fixed, and tests passing? This is the story of building three interconnected Agentic-Nets that form an autonomous code quality pipeline: one that reviews modules with Claude Code, one that fixes every issue found, and one that runs the test suite to verify the fixes. All connected through shared places, all running without human intervention.


The Idea: A Pipeline That Thinks

Most CI pipelines are dumb pipes. They run linters, execute tests, and report pass/fail. They don’t understand the code. They can’t decide which module needs attention, can’t reason about what’s wrong, and definitely can’t fix anything.

We wanted something different: a pipeline where every stage has intelligence. An agent that routes tasks to the right module. Claude Code that reads the codebase and produces structured findings. Another Claude Code session that takes those findings and implements the actual fixes. And a test runner that validates everything compiles and passes.

All of this orchestrated by Agentic-Nets — not by a monolithic script, but by three separate nets that communicate through shared token pools.


Architecture: Three Nets, Shared Places

The system lives in a single AgenticOS model (agentic-nets-reviewer) with one session containing three nets:

[Diagram] Review-Fix-Test Pipeline — Three Connected Agentic-Nets: 21 transitions, 25 places, 2 shared places bridging the nets. Net 1 (Module Reviewer: task → route agent → 8 module lanes → map → run → review output), Net 2 (Auto-Fixer: map fix agent → run fix, 4h timeout → fix result), Net 3 (Test Runner: map test agent → run test → log).

The key insight: shared places bridge the nets. p-review-output is the output of Net 1 and the input of Net 2. p-fix-result is the output of Net 2 and the input of Net 3. No message queues, no webhooks. Just shared token pools in the meta-filesystem.
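A minimal sketch of the shared-pool idea (not the AgenticOS API; the dict-based pool is purely illustrative, only the place names come from the article): one net deposits tokens into a named pool, and another net, which knows nothing about the first, consumes from the same pool.

```python
from collections import defaultdict, deque

# place id -> FIFO of tokens; a stand-in for the meta-filesystem pools
pools = defaultdict(deque)

def put(place, token):
    pools[place].append(token)

def take(place):
    return pools[place].popleft() if pools[place] else None

# Net 1 finishes a review and writes to the shared place...
put("p-review-output", {"module": "gateway", "findings": ["empty error bodies"]})

# ...and Net 2 later picks it up as input, with no direct coupling to Net 1.
token = take("p-review-output")
```

The same pattern repeats for `p-fix-result` between Net 2 and Net 3.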


Net 1: The Module Reviewer

The reviewer net has 8 parallel lanes — one per open-source module (executor, gateway, vault, blobstore, CLI, chat, deployment, monitoring). Each lane is a simple two-step pipeline:

  1. Map transition — reads a review task token, builds a Claude Code command token with the correct workingDir for that module
  2. Command transition — sends the command to the executor, which runs claude -p '...' --dangerously-skip-permissions in that module’s directory

All 8 lanes fan into a single output place: p-review-output. Whether you review the gateway or the CLI, the result lands in the same pool.
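The two-step lane can be sketched as a pure mapping from task token to command token. This is a hedged illustration: the `claude` flags are from the article, but the token fields and the module directory table are assumptions.

```python
# Hypothetical module -> working directory table (paths are illustrative)
MODULE_DIRS = {
    "gateway": "/repos/gateway",
    "cli": "/repos/cli",
    # ...one entry per module lane
}

def map_review_task(task):
    """Map step: turn a review-task token into a Claude Code command token."""
    prompt = f"Review {task['description']} in this module."
    return {
        "workingDir": MODULE_DIRS[task["module"]],
        "command": f"claude -p '{prompt}' --dangerously-skip-permissions",
    }

cmd = map_review_task({"module": "gateway", "description": "error handling"})
```

The command transition then only has to hand `cmd` to the executor; all module-specific knowledge lives in the map step.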

The Bootstrap Agent

But we didn’t want to manually decide which module to review. So we added a side net with an agent transition (t-route-task) that acts as a smart router. Drop a natural language task into p-task:

{"description": "Review error handling across the gateway and CLI", "depth": "quick"}

The agent reads this, reasons about which modules are affected, and creates correctly formatted review tokens in the right module places. For the example above, it routed to both p-gateway and p-cli — because error handling spans both modules. It also cleans up after itself: deleting the consumed task token and writing a bootstrap log.
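As a rough stand-in for that routing step, here is a keyword-based sketch (the real t-route-task is an LLM agent reasoning over the task, not a substring match; only the place names and token fields mirror the article):

```python
MODULES = ("executor", "gateway", "vault", "blobstore",
           "cli", "chat", "deployment", "monitoring")

def route(task):
    """Illustrative router: pick target module places for a task token."""
    text = task["description"].lower()
    targets = [m for m in MODULES if m in text]
    # one review token per affected module place
    return {f"p-{m}": {"description": task["description"],
                       "depth": task.get("depth", "quick")}
            for m in targets}

tokens = route({"description": "Review error handling across the gateway and CLI"})
```

For the example task, this routes tokens into `p-gateway` and `p-cli`, matching the behavior described above.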


Net 2: The Auto-Fixer

The fixer net reads from p-review-output — the same place the reviewer writes to. It has two transitions:

  1. Agent transition (t-map-fix) — reads the review result, extracts the findings from the batchResults JSON, determines the module’s working directory from the _transitionId metadata, and builds a Claude Code fix command. The prompt includes every specific issue found in the review.
  2. Command transition (t-run-fix) — executes the fix. This is the long-running one.
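The fix-mapper step can be sketched as follows. The field names `batchResults` and `_transitionId` come from the article; the transition-id-to-directory table, the token shapes, and the prompt wording are assumptions.

```python
import json

# Hypothetical transition id -> working directory lookup
DIR_BY_TRANSITION = {"t-review-gateway": "/repos/gateway"}

def map_fix(review_token):
    """Extract findings from a review result and build a fix command token."""
    results = json.loads(review_token["batchResults"])
    working_dir = DIR_BY_TRANSITION[review_token["_transitionId"]]
    issue_list = "\n".join(f"- {f}" for f in results["findings"])
    return {
        "workingDir": working_dir,
        "timeoutMs": 14_400_000,  # 4 hours, matching the executor's max
        "command": ("claude -p 'Fix the following review findings:\n"
                    + issue_list + "' --dangerously-skip-permissions"),
    }

fix_cmd = map_fix({
    "_transitionId": "t-review-gateway",
    "batchResults": json.dumps(
        {"findings": ["empty error bodies on 502 fan-out"]}),
})
```

Every specific finding ends up verbatim in the fix prompt, so Claude Code works from the reviewer's actual output rather than a generic instruction.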

The Timeout Problem

Claude Code fix sessions can run for minutes to hours. But the executor had a hard-coded 10-minute maximum timeout (MAX_TIMEOUT_MS = 600000). Any command exceeding 10 minutes would be killed.

We made this configurable via a Spring property:

# application.properties — default 4 hours (14,400,000 ms)
executor.command.max-timeout-ms=${EXECUTOR_MAX_TIMEOUT_MS:14400000}

The fix command tokens now carry timeoutMs: 14400000 (4 hours), and the executor respects it. Polling continues normally on a separate thread — long-running commands don’t block the executor from handling other transitions.
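The resulting behavior amounts to a simple clamp, sketched here under the assumption that a token's `timeoutMs` is honored up to the configured maximum and the old default applies when the token omits it:

```python
DEFAULT_TIMEOUT_MS = 600_000      # the old hard-coded 10-minute cap
MAX_TIMEOUT_MS = 14_400_000       # executor.command.max-timeout-ms (4 hours)

def effective_timeout(token):
    """Honor a token's requested timeout, clamped to the configured max."""
    requested = token.get("timeoutMs", DEFAULT_TIMEOUT_MS)
    return min(requested, MAX_TIMEOUT_MS)

effective_timeout({"timeoutMs": 14_400_000})   # a 4-hour fix session is allowed
effective_timeout({"timeoutMs": 99_999_999})   # clamped to the 4-hour max
effective_timeout({})                          # falls back to the old default
```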


Net 3: The Test Runner

The test net reads from p-fix-result — the fixer’s output. Same pattern: an agent transition builds the right test command (Java modules get ./mvnw test, TypeScript modules get npx tsup), and a command transition runs it.

One lesson learned: Java integration tests that need Docker (testcontainers) fail when Docker isn’t running. The agent’s prompt now includes -Dtest='!*IntegrationTest' for Java modules to skip those.


Live Results: Three Modules End-to-End

We ran the full pipeline on three modules. Here’s what happened:

Live Pipeline Results — Three Modules Reviewed, Fixed, and Tested

| Module | Review Findings | Fix Summary | Test |
|---|---|---|---|
| Gateway (103s review) | Empty error bodies on 502 fan-out; no @ControllerAdvice handler; silent JSON parse failures | 6 fixes across 4 files (63s): JSON error bodies, logger.debug for parse errors, 500→502 fix | PASS (10s) |
| CLI (178s review) | Missing process.exit(1) in 10+ cmds; 7 empty catch blocks (swallowed); inconsistent console.log vs logger | Fixes across 11+ files (361s): exit codes, error logging, outputInfo/outputDim migration | PASS (1s) |
| Vault (84s review) | HIGH: path injection vulnerability; raw exceptions exposed to clients; no token renewal, wide-open CORS | 8 fixes + new tests (268s): regex path validation, generic errors, retry logic, CORS removed | PASS (5s) |

Gateway

Review (103s): Found 6 issues — empty error bodies on fan-out 502 responses, no @ControllerAdvice global exception handler, silent JSON parse failures, unlogged SSE exceptions.

Fix (63s): Fixed all 6 issues across 4 files. Added JSON error messages to fan-out failures, added logger.debug() for parse exceptions, changed aggregation error status from 500 to 502.

Test: PASS (10s).

CLI

Review (178s): Found issues across 6 severity categories — missing process.exit(1) in 10+ command files, 7 empty catch blocks swallowing errors, inconsistent logging.

Fix (361s): Added exit codes to all error paths, replaced empty catch blocks with proper error logging, standardized logging across 15+ files.

Test: PASS (1s) — ESM bundle builds clean.

Vault

Review (84s): Found a HIGH severity path injection vulnerability — direct string concatenation in OpenBaoClient.buildPath() allowing path traversal. Also found raw exception messages exposed to clients, missing token renewal, and wide-open CORS.

Fix (268s): Added validatePathSegment() with regex enforcement, generic error messages, token renewal via LifecycleAwareSessionManager, removed @CrossOrigin, added retry logic with exponential backoff. Plus new tests for path traversal rejection.

Test: PASS (5s) — all 28 unit tests green.


The Numbers

| Metric | Value |
|---|---|
| Nets | 3 (reviewer, fixer, tester) |
| Transitions | 21 (1 agent router + 8 map + 8 command + 2 fixer + 2 tester) |
| Places | 25 (8 input + 8 cmd + 3 shared + 6 net-specific) |
| Shared places | 2 (p-review-output, p-fix-result) |
| Modules covered | 8 (executor, gateway, vault, blobstore, CLI, chat, deployment, monitoring) |
| Max command timeout | 4 hours (configurable) |
| Gateway: review → fix → test | 176s total |
| Vault: found + fixed security vuln | 357s total |

What Makes This Different

Shared places are the integration layer. Net 1 doesn’t know Net 2 exists. It just writes review results to p-review-output. Net 2 doesn’t know Net 3 exists. It just writes fix results to p-fix-result. Each net is independently deployable, testable, and replaceable.

The agent transitions are the intelligence. The bootstrap router decides which modules to target. The fix mapper extracts structured findings and builds targeted prompts. The test mapper knows Java needs Maven and TypeScript needs tsup. None of this is hard-coded — it’s natural language instructions evaluated by the LLM at runtime.

The executor handles the heavy lifting. With the configurable timeout (now up to 4 hours), Claude Code can take as long as it needs to understand a codebase and implement thorough fixes. The executor’s poll-based architecture means long-running commands never block other transitions.


Definition of Done

For this pipeline to be considered production-ready, every token that enters p-task must flow cleanly through all three nets and arrive in p-test-result with a PASS verdict. Specifically:

  1. Bootstrap routes correctly — the agent identifies the right module(s), creates properly formatted tokens, deletes the consumed task, and logs the routing decision.
  2. Review produces findings — Claude Code runs in the correct working directory, returns exit code 0, and the stdout contains structured findings.
  3. Fix addresses all findings — the agent extracts findings from the review, builds a targeted fix prompt, Claude Code implements the fixes with exit code 0.
  4. Tests pass — the correct test command runs for the module, exit code is 0.
  5. All intermediate places are empty — after a complete run, only p-test-result and p-bootstrap-log hold tokens. Everything else was consumed.
  6. No manual intervention — the entire flow from task drop to test result requires zero human steps.

All 6 criteria were verified against a live run: task “review executor timeout handling” flowed through all 8 transitions in sequence, produced a fix, passed tests, and left all 14 intermediate places empty.


What’s Next

p-test-result is now the natural input for a fourth net: a commit/PR creator that stages changes and opens pull requests for passing fixes. The pattern crystallization continues — what starts as AI-driven review and fix will eventually become deterministic rules as patterns emerge across repeated runs.

The pipeline is also its own best test case. We’re reviewing the modules that build the pipeline. The vault path injection fix was a real security vulnerability found and patched in under 6 minutes by three cooperating nets. That’s the promise of Agentic-Nets: not just automation, but intelligent automation that improves the system it runs on.
