The Review-Fix-Test Pipeline — Three Agentic-Nets That Autonomously Audit, Repair, and Validate Your Codebase
What if you could drop a single sentence like “review error handling across the system” into a place, walk away, and come back to find code reviewed, bugs fixed, and tests passing? This is the story of building three interconnected Agentic-Nets that form an autonomous code quality pipeline: one that reviews modules with Claude Code, one that fixes every issue found, and one that runs the test suite to verify the fixes. All connected through shared places, all running without human intervention.
The Idea: A Pipeline That Thinks
Most CI pipelines are dumb pipes. They run linters, execute tests, and report pass/fail. They don’t understand the code. They can’t decide which module needs attention, can’t reason about what’s wrong, and definitely can’t fix anything.
We wanted something different: a pipeline where every stage has intelligence. An agent that routes tasks to the right module. Claude Code that reads the codebase and produces structured findings. Another Claude Code session that takes those findings and implements the actual fixes. And a test runner that validates everything compiles and passes.
All of this orchestrated by Agentic-Nets — not by a monolithic script, but by three separate nets that communicate through shared token pools.
Architecture: Three Nets, Shared Places
The system lives in a single AgenticOS model (agentic-nets-reviewer) with one session containing three nets: a reviewer, a fixer, and a test runner.
The key insight: shared places bridge the nets. p-review-output is the output of Net 1 and the input of Net 2. p-fix-result is the output of Net 2 and the input of Net 3. No message queues, no webhooks. Just shared token pools in the meta-filesystem.
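The shared-place pattern is easy to picture in code. Here is a minimal in-memory sketch (hypothetical types and function names; in the real system, places live in the meta-filesystem, not in process memory):

```typescript
// Hypothetical sketch of shared token pools. A producer net writes to a
// named place; a consumer net reads from the same place. Neither knows
// the other exists.
type Token = Record<string, unknown>;
type Places = Map<string, Token[]>;

function put(places: Places, place: string, token: Token): void {
  // Net 1 writes here; it never needs to know who reads.
  const pool = places.get(place) ?? [];
  pool.push(token);
  places.set(place, pool);
}

function take(places: Places, place: string): Token | undefined {
  // Net 2 consumes from the same pool, fully decoupled from the producer.
  return places.get(place)?.shift();
}

const places: Places = new Map();
put(places, "p-review-output", { module: "gateway", findings: 6 });
take(places, "p-review-output"); // consumed by the fixer net
```

The design choice this illustrates: the place name is the entire contract between nets, so either side can be replaced without touching the other.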
Net 1: The Module Reviewer
The reviewer net has 8 parallel lanes — one per open-source module (executor, gateway, vault, blobstore, CLI, chat, deployment, monitoring). Each lane is a simple two-step pipeline:
- Map transition — reads a review task token, builds a Claude Code command token with the correct `workingDir` for that module
- Command transition — sends the command to the executor, which runs `claude -p '...' --dangerously-skip-permissions` in that module's directory
All 8 lanes fan into a single output place: p-review-output. Whether you review the gateway or the CLI, the result lands in the same pool.
The Bootstrap Agent
But we didn’t want to manually decide which module to review. So we added a side net with an agent transition (t-route-task) that acts as a smart router. Drop a natural language task into p-task:
{"description": "Review error handling across the gateway and CLI", "depth": "quick"}
The agent reads this, reasons about which modules are affected, and creates correctly formatted review tokens in the right module places. For the example above, it routed to both p-gateway and p-cli — because error handling spans both modules. It also cleans up after itself: deleting the consumed task token and writing a bootstrap log.
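The real router is an LLM agent transition reasoning in natural language, so no deterministic code captures it faithfully. But the shape of its output can be sketched with a toy keyword heuristic (place names follow the `p-<module>` pattern from the article; the review-token shape is an assumption):

```typescript
// Illustrative only: the real router is an LLM, not a keyword match.
// This sketch shows the *output shape* of a routing decision.
const MODULES = [
  "executor", "gateway", "vault", "blobstore",
  "cli", "chat", "deployment", "monitoring",
];

function routeTask(description: string): { place: string; token: object }[] {
  const text = description.toLowerCase();
  return MODULES.filter((m) => text.includes(m)).map((m) => ({
    place: `p-${m}`,                          // module input place
    token: { module: m, task: description },  // hypothetical token shape
  }));
}

routeTask("Review error handling across the gateway and CLI");
// routes to p-gateway and p-cli
```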
Net 2: The Auto-Fixer
The fixer net reads from p-review-output — the same place the reviewer writes to. It has two transitions:
- Agent transition (`t-map-fix`) — reads the review result, extracts the findings from the `batchResults` JSON, determines the module's working directory from the `_transitionId` metadata, and builds a Claude Code fix command. The prompt includes every specific issue found in the review.
- Command transition (`t-run-fix`) — executes the fix. This is the long-running one.
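The mapping step above can be sketched as follows. Only the `batchResults` and `_transitionId` field names come from the system; the inner JSON shape, the directory table, and the prompt wording are assumptions for illustration:

```typescript
// Sketch of t-map-fix: turn a review-result token into a fix command token.
interface ReviewToken {
  _transitionId: string;                 // e.g. "t-review-gateway" (assumed value)
  batchResults: { findings: string[] };  // assumed inner structure
}

// Hypothetical module-directory table.
const WORKING_DIRS: Record<string, string> = {
  "t-review-gateway": "/repos/gateway",
  "t-review-cli": "/repos/cli",
};

function buildFixCommand(token: ReviewToken) {
  // Number every finding so the fix prompt is explicit and complete.
  const issues = token.batchResults.findings
    .map((f, i) => `${i + 1}. ${f}`)
    .join("\n");
  return {
    workingDir: WORKING_DIRS[token._transitionId],
    command: `claude -p 'Fix each of these issues:\n${issues}' --dangerously-skip-permissions`,
    timeoutMs: 14_400_000, // 4 hours, matching the executor's configurable max
  };
}
```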
The Timeout Problem
Claude Code fix sessions can run for minutes to hours. But the executor had a hard-coded 10-minute maximum timeout (MAX_TIMEOUT_MS = 600000). Any command exceeding 10 minutes would be killed.
We made this configurable via a Spring property:
# application.properties
# Default 4 hours (14400000 ms); override via EXECUTOR_MAX_TIMEOUT_MS.
# Note: .properties files only support full-line comments, so the note
# lives above the key rather than after the value.
executor.command.max-timeout-ms=${EXECUTOR_MAX_TIMEOUT_MS:14400000}
The fix command tokens now carry timeoutMs: 14400000 (4 hours), and the executor respects it. Polling continues normally on a separate thread — long-running commands don’t block the executor from handling other transitions.
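The executor itself is Java/Spring, but the timeout resolution it now performs is simple enough to sketch (TypeScript here for consistency with the other examples; all names are hypothetical):

```typescript
// Sketch of the executor's timeout resolution after the change:
// a token may request a longer run, but never beyond the configured max.
const DEFAULT_TIMEOUT_MS = 600_000;  // the old hard-coded 10-minute cap
const MAX_TIMEOUT_MS = 14_400_000;   // from executor.command.max-timeout-ms (4 hours)

function resolveTimeout(tokenTimeoutMs?: number): number {
  // Tokens without an explicit timeout keep the old default;
  // explicit requests are clamped to the configured ceiling.
  return Math.min(tokenTimeoutMs ?? DEFAULT_TIMEOUT_MS, MAX_TIMEOUT_MS);
}
```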
Net 3: The Test Runner
The test net reads from p-fix-result — the fixer’s output. Same pattern: an agent transition builds the right test command (Java modules get ./mvnw test, TypeScript modules get npx tsup), and a command transition runs it.
One lesson learned: Java integration tests that need Docker (testcontainers) fail when Docker isn’t running. The agent’s prompt now includes -Dtest='!*IntegrationTest' for Java modules to skip those.
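The test mapper's decision boils down to a language lookup. A sketch, assuming a hypothetical language table (which modules are Java vs. TypeScript beyond gateway, vault, and CLI is a guess; the commands themselves are the ones above):

```typescript
// Sketch of the test mapper: pick the test command by module language.
// The language table is a hypothetical stand-in for the agent's reasoning.
const MODULE_LANG: Record<string, "java" | "typescript"> = {
  gateway: "java",
  vault: "java",
  cli: "typescript",
};

function testCommandFor(module: string): string {
  return MODULE_LANG[module] === "java"
    ? "./mvnw test -Dtest='!*IntegrationTest'" // skip Docker-dependent tests
    : "npx tsup";                              // TypeScript modules: clean build = pass
}
```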
Live Results: Three Modules End-to-End
We ran the full pipeline on three modules. Here’s what happened:
Gateway
Review (103s): Found 6 issues — empty error bodies on fan-out 502 responses, no @ControllerAdvice global exception handler, silent JSON parse failures, unlogged SSE exceptions.
Fix (63s): Fixed all 6 issues across 4 files. Added JSON error messages to fan-out failures, added logger.debug() for parse exceptions, changed aggregation error status from 500 to 502.
Test: PASS (10s).
CLI
Review (178s): Found issues across 6 severity categories — missing process.exit(1) in 10+ command files, 7 empty catch blocks swallowing errors, inconsistent logging.
Fix (361s): Added exit codes to all error paths, replaced empty catch blocks with proper error logging, standardized logging across 15+ files.
Test: PASS (1s) — ESM bundle builds clean.
Vault
Review (84s): Found a HIGH severity path injection vulnerability — direct string concatenation in OpenBaoClient.buildPath() allowing path traversal. Also found raw exception messages exposed to clients, missing token renewal, and wide-open CORS.
Fix (268s): Added validatePathSegment() with regex enforcement, generic error messages, token renewal via LifecycleAwareSessionManager, removed @CrossOrigin, added retry logic with exponential backoff. Plus new tests for path traversal rejection.
Test: PASS (5s) — all 28 unit tests green.
The Numbers
| Metric | Value |
| --- | --- |
| Nets | 3 (reviewer, fixer, tester) |
| Transitions | 21 (1 agent router + 8 map + 8 command + 2 fixer + 2 tester) |
| Places | 25 (8 input + 8 cmd + 3 shared + 6 net-specific) |
| Shared places | 2 (p-review-output, p-fix-result) |
| Modules covered | 8 (executor, gateway, vault, blobstore, CLI, chat, deployment, monitoring) |
| Max command timeout | 4 hours (configurable) |
| Gateway: review → fix → test | 176s total |
| Vault: found + fixed security vuln | 357s total |
What Makes This Different
Shared places are the integration layer. Net 1 doesn’t know Net 2 exists. It just writes review results to p-review-output. Net 2 doesn’t know Net 3 exists. It just writes fix results to p-fix-result. Each net is independently deployable, testable, and replaceable.
The agent transitions are the intelligence. The bootstrap router decides which modules to target. The fix mapper extracts structured findings and builds targeted prompts. The test mapper knows Java needs Maven and TypeScript needs tsup. None of this is hard-coded — it’s natural language instructions evaluated by the LLM at runtime.
The executor handles the heavy lifting. With the configurable timeout (now up to 4 hours), Claude Code can take as long as it needs to understand a codebase and implement thorough fixes. The executor’s poll-based architecture means long-running commands never block other transitions.
Definition of Done
For this pipeline to be considered production-ready, every token that enters p-task must flow cleanly through all three nets and arrive in p-test-result with a PASS verdict. Specifically:
- Bootstrap routes correctly — the agent identifies the right module(s), creates properly formatted tokens, deletes the consumed task, and logs the routing decision.
- Review produces findings — Claude Code runs in the correct working directory, returns exit code 0, and the stdout contains structured findings.
- Fix addresses all findings — the agent extracts findings from the review, builds a targeted fix prompt, Claude Code implements the fixes with exit code 0.
- Tests pass — the correct test command runs for the module, exit code is 0.
- All intermediate places are empty — after a complete run, only `p-test-result` and `p-bootstrap-log` hold tokens. Everything else was consumed.
- No manual intervention — the entire flow from task drop to test result requires zero human steps.
All 6 criteria were verified against a live run: task “review executor timeout handling” flowed through all 8 transitions in sequence, produced a fix, passed tests, and left all 14 intermediate places empty.
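The empty-places criterion is mechanically checkable. A sketch of that check (the real verification reads place contents from the meta-filesystem; here places are modeled as a name-to-token-count map):

```typescript
// Sketch of the "all intermediate places empty" check: after a clean run,
// only the two terminal places may still hold tokens.
const TERMINAL_PLACES = new Set(["p-test-result", "p-bootstrap-log"]);

function isRunClean(places: Record<string, number>): boolean {
  return Object.entries(places).every(
    ([name, count]) => count === 0 || TERMINAL_PLACES.has(name),
  );
}
```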
What’s Next
p-test-result is now the natural input for a fourth net: a commit/PR creator that stages changes and opens pull requests for passing fixes. The pattern crystallization continues — what starts as AI-driven review and fix will eventually become deterministic rules as patterns emerge across repeated runs.
The pipeline is also its own best test case. We’re reviewing the modules that build the pipeline. The vault path injection fix was a real security vulnerability found and patched in under 6 minutes by three cooperating nets. That’s the promise of Agentic-Nets: not just automation, but intelligent automation that improves the system it runs on.