AI Coding Harness Framework
2026-04-01 — present
Personal development framework that integrates AI coding agents such as Claude Code, Codex, and Amazon Q into Git-based operating infrastructure for context, rules, RAG, cost, and browser runtime. Built a local BGE-M3 vector RAG, an MCP search server, hook-based governance, a runtime surface compiler, and a cost analyzer.
System Architecture
Problem Solving
AI-agent context and rules drifted across docs, config, and local runtime homes
Made the harness source canonical and generated the Claude/Codex runtime surfaces from it automatically, adding XML runtime contracts and cross-harness diff verification
Aligned generated CLAUDE.md files and runtime contracts across kh/gp/gd harnesses
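The diff verification step above can be illustrated with a minimal sketch. The function name `surface_drift` and the `generated/` and `runtime/` labels are hypothetical and not taken from the harness; the sketch only assumes that a freshly generated surface is compared as text against the copy found in a runtime home.

```python
import difflib

def surface_drift(expected: str, actual: str, name: str = "CLAUDE.md") -> list[str]:
    """Return unified-diff lines between a generated surface and the runtime copy.

    An empty list means the runtime copy matches the generated output.
    The file labels are illustrative assumptions, not the harness's real paths.
    """
    return list(
        difflib.unified_diff(
            expected.splitlines(),
            actual.splitlines(),
            fromfile=f"generated/{name}",
            tofile=f"runtime/{name}",
            lineterm="",
        )
    )

# Identical content produces no diff; an edited runtime copy produces one.
clean = surface_drift("rule a\nrule b", "rule a\nrule b")
drifted = surface_drift("rule a\nrule b", "rule a\nrule c")
```

Running the same check against each harness would flag any surface that was edited by hand after generation.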
Growing markdown knowledge base forced agents to search context manually each time
Implemented sqlite-vec + FTS5 hybrid RAG, local BGE-M3 embeddings, RRF fusion, MCP search server, and automatic reindexing after markdown changes
Indexed 548 files / 3,347 chunks / 20.7 MB, reached 0.7s warm queries, and hit top-2/3 on all 5 gold queries
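As an illustration of the RRF fusion step named above, here is a minimal sketch of Reciprocal Rank Fusion over two ranked result lists. The function name `rrf_fuse`, the chunk IDs, and the constant `k=60` (a common default for RRF) are assumptions; the harness's actual parameters are not stated here.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per chunk, with rank starting at 1.
    k=60 is an assumed default, not a value taken from the harness.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))

# Hypothetical results: one list from vector search, one from FTS5 keyword search.
vector_hits = ["c3", "c1", "c7"]
keyword_hits = ["c1", "c9", "c3"]
fused = rrf_fuse([vector_hits, keyword_hits])
```

Chunks that appear in both lists accumulate score from each, so they rise above chunks found by only one retriever.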
RAG was an opt-in reference tool and was missed during real investigation/review prompts
Moved to a default UserPromptSubmit trigger, calling search_harness automatically for prompts of 12+ characters and porting it to gp/gd
Passed 15/15 default-on tests across 3 harnesses and hit the target in a real-prompt smoke test
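The default-on trigger can be sketched as a length gate in front of the search call. This is a simplified illustration: `auto_context` is a hypothetical name, the search function is passed in as a plain callable because the real `search_harness` signature is not shown here, and trimming whitespace before counting characters is an assumption.

```python
from typing import Callable, Optional

MIN_PROMPT_CHARS = 12  # threshold stated in the text above

def auto_context(
    prompt: str,
    search: Callable[[str], list[str]],
    min_chars: int = MIN_PROMPT_CHARS,
) -> Optional[str]:
    """Return context to inject for a prompt, or None when search is skipped.

    `search` stands in for the harness's search call; its real signature
    is an assumption here.
    """
    if len(prompt.strip()) < min_chars:
        return None  # short prompts skip retrieval
    hits = search(prompt)
    if not hits:
        return None
    return "\n".join(f"- {hit}" for hit in hits)

# Hypothetical search function that always returns one document path.
fake_search = lambda query: ["docs/rag.md"]
skipped = auto_context("fix typo", fake_search)
injected = auto_context("review the RAG indexer", fake_search)
```

Keeping the gate separate from the search call makes the threshold easy to cover in tests for each harness.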
Resume and portfolio JSON were outside the RAG scope, leaving JD customization without evidence search over those surfaces
Implemented portfolio/resume JSON chunkers, target=resume indexing, path-based auto-reindexing, and federated search
Indexed 109 resume-corpus chunks, passed 32 tests, and verified federated search across 4 DBs
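One way a JSON chunker like the one above could work is to flatten each string leaf into a path-labelled chunk, so a search hit can point back to its exact location in the resume or portfolio file. The function name `chunk_json`, the path format, and the sample document are all hypothetical; the real chunkers may group fields differently.

```python
import json
from typing import Any, Iterator

def chunk_json(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (path, text) pairs for every non-empty string leaf in a JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from chunk_json(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from chunk_json(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.strip():
        yield path, node.strip()
    # Numbers, booleans, and nulls are skipped in this sketch.

# Hypothetical portfolio fragment.
sample = json.loads(
    '{"projects": [{"title": "Harness", "highlights": ["Indexed 548 files"]}]}'
)
chunks = list(chunk_json(sample))
```

Each pair can then be embedded and stored with its path as metadata, which keeps evidence traceable during JD customization.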
Project Description
A personal harness built to operate AI coding agents as repeatable development infrastructure rather than one-off tools. It generates Claude/Codex runtime surfaces from source configuration, indexes the markdown knowledge base through local-embedding RAG, and injects relevant context at session start, prompt submission, edit, and commit time. Rules that documentation alone cannot enforce are backed by Git hooks and tests, while the cost analyzer decomposes session transcripts into 8 categories to track hidden token spend. The system later expanded the RAG target to portfolio/resume JSON, applying the same evidence-search path to resume customization and portfolio retrieval.
Highlights
- Aligned 3 harnesses through automatic Claude/Codex runtime-surface generation plus XML runtime contracts
- Indexed 548 files / 3,347 chunks with local BGE-M3 RAG and reached 0.7s warm queries
- Moved RAG to default-on search_harness calls for prompts of 12+ characters, passing 15/15 tests across 3 harnesses
- Integrated resume/portfolio JSON into the RAG corpus, indexing 109 chunks and passing 32 tests
- Documented Codex Browser + node_repl IAB trust boundaries and fallback order, then reflected runtime settings across 3 harnesses
Performance Metrics
| Metric | Before | After |
|---|---|---|
| RAG MVP Index | manual context search | 548 files / 3,347 chunks / 20.7 MB (0.7s warm query) |
| RAG Default-On Verification | opt-in trigger | 15/15 tests across 3 harnesses (prompt >=12 chars auto-search) |
| Resume RAG Corpus | 0 indexed chunks | 109 chunks (32 tests passed) |
Tech Decisions
- Chose local BGE-M3 + sqlite-vec over cloud vector DBs to prioritize immediate reindexing after markdown changes and cost control
- Chose hook-first enforcement over doc-only rules to compensate for agents not reading policy documents at the moment of need
- Chose a runtime surface compiler over manual Claude/Codex settings to reduce generated-surface drift and automate 3-harness verification
- Chose default-on search over opt-in retrieval to reduce missed evidence during real investigation, review, and edit prompts
Lessons Learned
- Learned that AI development productivity is reproducible only when runtime context, rule enforcement, evidence search, and cost observability share one operating path
- Established that document-only rules remain reference material unless read at the moment of need, so critical rules should be enforced through hooks, tests, and compiler output
- Confirmed that RAG must include trigger placement, index freshness, usage logging, and fallback paths, not just retrieval quality, to become part of an agent workflow
- Learned that tools with different runtime features and trust boundaries, such as Claude and Codex, need a canonical source plus generated surfaces to reduce drift over long-term operation