Personal project · Case study

Inter: running a multi-agent AI operation under real governance

Can one person direct a team of AI agents with the discipline an enterprise would demand of any production system, and have them do real work? This project is my working answer. The control set covers documented change management, separation of duties, independent review, incident response, and auditability.

What it is

Inter is a small fleet of AI agents with defined roles, running on separate platforms. Together they operate a production job-search automation pipeline end to end: multi-source aggregation, LLM-assisted scoring with grounding verification, deduplication, a database-backed coordination layer that can wake agents on demand, and a web operations dashboard. I serve as Release Manager, the human authority that every consequential decision routes through.

The pipeline is the workload. The governance is what I built it to prove out.

~2 hrs: task dispatched > agent woken autonomously > built & tested > independently reviewed > deployed to production
375: tests passing on a release authored end-to-end by an autonomously woken agent
<$0.01: marginal cost per production pipeline run, by deliberate model-tier and batch economics
~2 min: detection-to-revocation on a real P1 credential exposure, under a pre-written incident protocol

The governance, actually practiced

The control set is derived from a policy framework designed for alignment with ISO 27001, ISO 27701, ISO 42001, and SOC 2, then scaled honestly to a single-operator project. Where cost or scale justified a deviation, it went into a maintained exceptions register.

Change control & release discipline

Security operations

AI-specific management (the ISO 42001 layer)

Architecture, briefly

Three Claude-based seats (operations/review, engineering, infrastructure) run on separate platforms, with a Gemini-based independent reviewer and a multi-model gateway agent at tool tier. Coordination runs on PostgreSQL with event-driven notification. When a dispatch row addressed to an agent appears, it wakes a headless session that boots against the governance record, does bounded work, reports back, and marks its own dispatch complete. The human-facing layer is a Django operations dashboard with a curated triage workflow and a bounded two-way sync to a pre-existing datastore during a controlled migration. Cost engineering is explicit: batch APIs for bulk work at half price, model tiers matched to task difficulty, and fleet-wide spend telemetry.
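The dispatch lifecycle described above (a row addressed to an agent appears, the agent wakes, does bounded work, reports back, and marks its own dispatch complete) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: the table name, column names, and status values are hypothetical, Python's built-in sqlite3 stands in for PostgreSQL, and a direct call to `wake` stands in for the event-driven notification, whose mechanism is not specified here.

```python
import sqlite3

# Hypothetical schema: the case study only says that dispatch rows are
# addressed to an agent. Column names and status values are assumptions.
SCHEMA = """
CREATE TABLE dispatch (
    id     INTEGER PRIMARY KEY,
    agent  TEXT NOT NULL,
    task   TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    report TEXT
)
"""


def open_db():
    """Create an in-memory database (stand-in for the PostgreSQL layer)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def dispatch(conn, agent, task):
    """Insert a dispatch row addressed to one agent and return its id."""
    cur = conn.execute(
        "INSERT INTO dispatch (agent, task) VALUES (?, ?)", (agent, task)
    )
    conn.commit()
    return cur.lastrowid


def claim_next(conn, agent):
    """Claim the oldest pending row for this agent, or return None."""
    row = conn.execute(
        "SELECT id, task FROM dispatch "
        "WHERE agent = ? AND status = 'pending' ORDER BY id LIMIT 1",
        (agent,),
    ).fetchone()
    if row is None:
        return None
    # Guarded update: the row only moves to 'running' if it is still pending,
    # so a given dispatch is claimed at most once.
    updated = conn.execute(
        "UPDATE dispatch SET status = 'running' "
        "WHERE id = ? AND status = 'pending'",
        (row[0],),
    )
    conn.commit()
    return row if updated.rowcount == 1 else None


def wake(conn, agent, work):
    """Simulate a woken session: claim, do bounded work, report, complete."""
    claimed = claim_next(conn, agent)
    if claimed is None:
        return None
    dispatch_id, task = claimed
    report = work(task)  # the bounded unit of work, supplied by the caller
    conn.execute(
        "UPDATE dispatch SET status = 'complete', report = ? WHERE id = ?",
        (report, dispatch_id),
    )
    conn.commit()
    return dispatch_id
```

For example, `dispatch(conn, "engineering", "add a regression test")` followed by `wake(conn, "engineering", lambda task: "done: " + task)` leaves that row with status `complete` and the report attached, while a `wake` call for any other agent name finds nothing to claim. Keeping the claim and the completion as separate guarded writes is one way to make each step visible in the record afterwards.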

Multi-agent orchestration
PostgreSQL + event-driven wake
Django
LLM grounding verification
Batch API cost engineering
Hardened service configurations
Cross-vendor review
A technical appendix covering the coordination-layer design, the wake function's containment model, audit methodology, and the incident timeline is available on request.

Why it matters

Most AI-governance experience today is policy written for systems someone else runs. This project closes that loop. The person writing the control set has to live under it while agents do real work, fast, with real credentials and real data. The governance survived contact with autonomous agents, and the places it bent are documented, because keeping that record honest is the actual practice.

It also reflects how I approach the discipline professionally. Frameworks should be working tools, not shelf-ware. Risk acceptance should be a documented decision. And I don't believe you really own a control until you can explain the failure mode it addresses.