Personal · 3 harnesses · RAG MVP · 109 chunks

AI Coding Harness Framework

2026-04-01 to present

A personal development framework that integrates AI coding agents (Claude Code, Codex, and Amazon Q) into Git-based operating infrastructure for context, rules, RAG, cost, and browser runtime. It includes a local BGE-M3 vector RAG, an MCP search server, hook-based governance, a runtime surface compiler, and a cost analyzer.


System Architecture

[Figure: AI Coding Harness Framework architecture diagram]

Problem Solving

1

AI-agent context and rules drifted apart across docs, config files, and local runtime home directories

Solution Process

Made the harness source the single canonical source, generated the Claude/Codex runtime surfaces from it automatically, and added XML runtime contracts plus cross-harness diff verification

Result

Aligned generated CLAUDE.md files and runtime contracts across kh/gp/gd harnesses
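A minimal sketch of the compile-then-diff idea, assuming the canonical source is a list of rules tagged with the runtimes that should receive them. `Rule`, `render_surface`, `compile_all`, and `find_drift` are hypothetical names, the rule texts are placeholders, and the project's XML runtime contracts are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    rule_id: str
    text: str
    runtimes: Tuple[str, ...]  # runtimes that receive this rule, e.g. ("claude", "codex")


def render_surface(harness: str, runtime: str, rules: List[Rule]) -> str:
    """Render one generated surface (for example a CLAUDE.md body) for a harness/runtime pair."""
    lines = [f"<!-- generated for {harness}/{runtime}; edit the canonical source instead -->"]
    for rule in sorted(rules, key=lambda r: r.rule_id):
        if runtime in rule.runtimes:
            lines.append(f"- [{rule.rule_id}] {rule.text}")
    return "\n".join(lines) + "\n"


def compile_all(harnesses: List[str], runtimes: List[str], rules: List[Rule]) -> Dict[str, str]:
    """Generate every harness/runtime surface from the single canonical rule list."""
    return {f"{h}/{r}": render_surface(h, r, rules) for h in harnesses for r in runtimes}


def find_drift(expected: Dict[str, str], on_disk: Dict[str, str]) -> List[str]:
    """Keys whose on-disk content is missing or differs from the freshly generated content."""
    return sorted(key for key, text in expected.items() if on_disk.get(key) != text)


# Placeholder rules (hypothetical); the project's real rule set is not shown.
RULES = [
    Rule("R1", "Run the test suite before committing.", ("claude", "codex")),
    Rule("R2", "Call the search tool before answering repository questions.", ("claude",)),
]

expected = compile_all(["kh", "gp", "gd"], ["claude", "codex"], RULES)
on_disk = dict(expected)
on_disk["gp/codex"] = "hand-edited\n"  # simulate a manual edit to one generated file
print(find_drift(expected, on_disk))  # ['gp/codex']
```

Because every surface is regenerated from one list, a manual edit shows up as drift instead of silently diverging from the other harnesses.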

2

As the markdown knowledge base grew, agents had to search for context manually every time

Solution Process

Implemented hybrid RAG combining sqlite-vec vector search with FTS5 keyword search, local BGE-M3 embeddings, Reciprocal Rank Fusion (RRF), an MCP search server, and automatic reindexing after markdown changes

Result

Indexed 548 files / 3,347 chunks / 20.7 MB, reached 0.7s warm queries, and returned the expected result within the top 2 or 3 hits on all 5 gold queries
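The fusion step can be sketched as standard Reciprocal Rank Fusion over the vector and keyword result lists. This assumes the two ranked lists of chunk IDs have already been produced by the sqlite-vec and FTS5 queries (not shown), and uses k = 60, the common default from the original RRF paper; the project's actual k is not stated.

```python
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[Tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank of d), ranks starting at 1."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["a", "b", "c"]   # e.g. nearest neighbours from the vector index
keyword_hits = ["c", "a", "d"]  # e.g. FTS5 matches ordered by bm25
print([chunk_id for chunk_id, _ in rrf_fuse([vector_hits, keyword_hits])])  # ['a', 'c', 'b', 'd']
```

RRF only looks at ranks, so vector distances and bm25 scores never have to be normalized onto a common scale; a chunk found by both retrievers ("a", "c") rises above chunks found by only one.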

3

RAG was an opt-in reference tool, so agents skipped it during real investigation and review prompts

Solution Process

Switched to a default-on UserPromptSubmit hook that calls search_harness automatically for prompts of 12 or more characters, then ported it to the gp and gd harnesses

Result

Passed 15/15 default-on tests across 3 harnesses and returned the expected target in a real-prompt smoke test
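A minimal sketch of the trigger logic, assuming a Claude Code UserPromptSubmit hook that receives its input as JSON on stdin (with a `prompt` field) and whose stdout, on exit code 0, is added to the agent's context, as described in the Claude Code hooks documentation. `search_local` is a hypothetical stand-in for the search_harness call, and trimming whitespace before applying the 12-character threshold is an assumption.

```python
import json
from typing import Callable, List

MIN_PROMPT_CHARS = 12  # threshold stated in the project description


def should_search(prompt: str, min_chars: int = MIN_PROMPT_CHARS) -> bool:
    """Retrieve only for prompts of at least min_chars characters (whitespace trimmed: an assumption)."""
    return len(prompt.strip()) >= min_chars


def build_context(prompt: str, search: Callable[[str], List[str]], limit: int = 3) -> str:
    """Format the top hits as context text, or return "" so nothing is injected."""
    if not should_search(prompt):
        return ""
    hits = search(prompt)[:limit]
    if not hits:
        return ""
    return "Relevant harness context:\n" + "\n".join(f"- {hit}" for hit in hits)


def search_local(query: str) -> List[str]:
    """Hypothetical stand-in for the project's search_harness lookup."""
    return []


def handle_hook_input(raw_stdin: str, search: Callable[[str], List[str]] = search_local) -> str:
    """Parse the hook's JSON input and return the text the hook should print to stdout."""
    payload = json.loads(raw_stdin) if raw_stdin.strip() else {}
    return build_context(payload.get("prompt", ""), search)


# Wiring as a hook command (not executed here):
#   import sys; sys.stdout.write(handle_hook_input(sys.stdin.read()))
```

The length gate keeps short confirmations such as "yes" or "fix typo" from paying the retrieval cost, while anything resembling a real question gets evidence injected without the agent having to decide to search.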

4

Resume and portfolio JSON were outside the RAG scope, so tailoring a resume to a job description (JD) had no evidence search over those files

Solution Process

Implemented portfolio/resume JSON chunkers, target=resume indexing, path-based auto-reindexing, and federated search

Result

Indexed 109 resume-corpus chunks, passed 32 tests, and verified federated search across 4 DBs
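A minimal sketch of a JSON chunker and path-based index routing. The resume schema (`basics`, `projects`), the `resume/` and `portfolio/` directory names, and the `harness` target label are all hypothetical; the project's actual chunk boundaries and the federated search across its 4 databases are not shown.

```python
import json
from pathlib import PurePosixPath
from typing import Any, Dict, List, Optional


def _chunk(chunk_id: str, value: Any) -> Dict[str, str]:
    # Serialize deterministically so unchanged entries produce identical chunk text.
    return {"id": chunk_id, "text": json.dumps(value, ensure_ascii=False, sort_keys=True)}


def chunk_json(doc: Dict[str, Any], source: str) -> List[Dict[str, str]]:
    """One chunk per element of each top-level list (e.g. each project), one per other top-level key."""
    chunks = []
    for key, value in doc.items():
        if isinstance(value, list):
            for i, item in enumerate(value):
                chunks.append(_chunk(f"{source}#/{key}/{i}", item))
        else:
            chunks.append(_chunk(f"{source}#/{key}", value))
    return chunks


def index_target_for(path: str) -> Optional[str]:
    """Route a changed file to an index; the directory names here are hypothetical."""
    p = PurePosixPath(path)
    if p.suffix == ".json" and ({"resume", "portfolio"} & set(p.parts)):
        return "resume"
    if p.suffix == ".md":
        return "harness"
    return None


resume = {"basics": {"name": "A"}, "projects": [{"title": "X"}, {"title": "Y"}]}
print([c["id"] for c in chunk_json(resume, "resume.json")])
# ['resume.json#/basics', 'resume.json#/projects/0', 'resume.json#/projects/1']
print(index_target_for("data/resume/en.json"), index_target_for("notes/rag.md"))  # resume harness
```

Chunk IDs carry a JSON-pointer-style path, so a search hit can be traced back to the exact project or role entry it came from.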

Project Description

A personal harness built to operate AI coding agents as repeatable development infrastructure rather than one-off tools. It generates Claude/Codex runtime surfaces from source configuration, indexes the markdown knowledge base with a local-embedding RAG pipeline, and injects relevant context at session start, prompt submission, edit time, and commit time. Rules that documentation alone cannot enforce are backed by Git hooks and tests, while the cost analyzer decomposes session transcripts into 8 categories to track hidden token spend. The RAG corpus was later extended to portfolio/resume JSON, so resume customization and portfolio retrieval use the same evidence-search path.
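The cost analyzer's decomposition can be sketched as a per-category token tally over transcript lines. This assumes JSONL records with hypothetical `kind` and `tokens` fields; the category labels below are placeholders, since the project's actual 8 categories are not listed.

```python
import json
from collections import Counter
from typing import Dict, Iterable

# Hypothetical record kinds and category labels; the project's 8 categories are not listed here.
CATEGORY_BY_KIND = {
    "user": "user_prompt",
    "assistant": "assistant_output",
    "tool_use": "tool_call",
    "tool_result": "tool_output",
    "hook_context": "injected_context",
}


def token_breakdown(jsonl_lines: Iterable[str]) -> Dict[str, int]:
    """Sum token counts per category; unknown record kinds are counted as "other"."""
    totals: Counter = Counter()
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines in the transcript
        entry = json.loads(line)
        category = CATEGORY_BY_KIND.get(entry.get("kind", ""), "other")
        totals[category] += int(entry.get("tokens", 0))
    return dict(totals)


def shares(totals: Dict[str, int]) -> Dict[str, float]:
    """Fraction of all tokens per category (empty dict if there are no tokens)."""
    grand = sum(totals.values())
    return {k: v / grand for k, v in totals.items()} if grand else {}
```

Splitting spend this way makes costs the user never typed, such as hook-injected context and large tool outputs, visible next to the prompts themselves.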

Highlights

  • Aligned 3 harnesses through automatic Claude/Codex runtime-surface generation plus XML runtime contracts
  • Indexed 548 files / 3,347 chunks with local BGE-M3 RAG and reached 0.7s warm queries
  • Moved RAG to default-on search_harness calls for prompts of 12+ characters, passing 15/15 tests across 3 harnesses
  • Integrated resume/portfolio JSON into the RAG corpus, indexing 109 chunks and passing 32 tests
  • Documented Codex Browser + node_repl IAB trust boundaries and fallback order, then applied the resulting runtime settings to all 3 harnesses

Performance Metrics

| Metric | Before | After |
|---|---|---|
| RAG MVP index | Manual context search | 548 files / 3,347 chunks / 20.7 MB (0.7s warm query) |
| RAG default-on verification | Opt-in trigger | 15/15 tests across 3 harnesses (auto-search for prompts of 12+ characters) |
| Resume RAG corpus | 0 indexed chunks | 109 chunks (32 tests passed) |

Tech Decisions

  • Chose local BGE-M3 + sqlite-vec over a cloud vector database to prioritize immediate reindexing after markdown changes and to keep costs under control
  • Chose hook-first enforcement over doc-only rules because agents often do not read policy documents at the moment a rule applies
  • Chose a runtime surface compiler over manually maintained Claude/Codex settings to reduce drift in generated surfaces and automate verification across the 3 harnesses
  • Chose default-on search over opt-in retrieval to reduce missed evidence during real investigation, review, and edit prompts
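Immediate reindexing stays cheap with a local embedding model if only changed files are re-embedded. A minimal sketch, assuming the index keeps a path-to-content-hash map (hypothetical; how the project actually detects changes, for example via a Git hook or file watcher, is not specified), and leaving the BGE-M3 embedding call out:

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def scan_markdown(root: Path) -> Dict[str, str]:
    """Map each markdown file under root (relative POSIX path) to the hash of its content."""
    return {
        p.relative_to(root).as_posix(): content_hash(p.read_text(encoding="utf-8"))
        for p in sorted(root.rglob("*.md"))
    }


def plan_reindex(current: Dict[str, str], indexed: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (paths to re-chunk and re-embed, paths whose chunks should be deleted)."""
    changed = sorted(path for path, digest in current.items() if indexed.get(path) != digest)
    removed = sorted(path for path in indexed if path not in current)
    return changed, removed
```

Comparing hashes rather than modification times means a file that was touched but not changed is not re-embedded, which keeps per-edit reindexing proportional to real content changes.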

Lessons Learned

  • Learned that AI development productivity is reproducible only when runtime context, rule enforcement, evidence search, and cost observability share one operating path
  • Established that document-only rules remain reference material unless read at the moment of need, so critical rules should be enforced through hooks, tests, and compiler output
  • Confirmed that RAG must include trigger placement, index freshness, usage logging, and fallback paths, not just retrieval quality, to become part of an agent workflow
  • Learned that tools with different runtime features and trust boundaries, such as Claude and Codex, need a canonical source plus generated surfaces to limit drift over long-term operation
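Fallback paths and usage logging can be sketched as a wrapper around retrieval. This assumes hypothetical `vector_search`, `keyword_search`, and `log` callables; the order-preserving merge stands in for the project's RRF step, and treating any `OSError` (which includes `ConnectionError` and `TimeoutError`) as "embedding backend unavailable" is an assumption.

```python
import time
from typing import Callable, Dict, List

Search = Callable[[str], List[str]]


def search_with_fallback(query: str, vector_search: Search, keyword_search: Search,
                         log: Callable[[Dict[str, object]], None]) -> List[str]:
    """Hybrid search that degrades to keyword-only when vector search fails, logging every call."""
    started = time.perf_counter()
    mode = "hybrid"
    try:
        vector_hits = vector_search(query)
    except OSError:  # e.g. local embedding server down; ConnectionError/TimeoutError are OSError subclasses
        vector_hits, mode = [], "keyword_only"
    keyword_hits = keyword_search(query)
    merged = list(dict.fromkeys(vector_hits + keyword_hits))  # order-preserving de-duplication
    log({
        "mode": mode,
        "query_chars": len(query),
        "results": len(merged),
        "elapsed_ms": round((time.perf_counter() - started) * 1000, 2),
    })
    return merged
```

The log record is what makes the other concerns measurable: how often retrieval actually fires, how often it degrades, and how many results it returns.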

Tech Stack

TypeScript · Python · Bash · MCP · sqlite-vec · FTS5 · Ollama · BGE-M3 · Vector RAG · Git Hooks · Claude Code · Codex