# I Built a Multi-Agent AI Tribunal with Gemma 4
## What If AI Agents Put Each Other on Trial?

*This is a submission for the [Gemma 4 Challenge: Build with Gemma 4](https://dev.to/challenges/google-gemma-2026-05-06).*

HumanLayer is a multi-agent AI governance platform where specialized Gemma 4 agents collaboratively review, challenge, and hold each other accountable — instead of one model silently making all the decisions.

The system has two modes:

- **Governance Council** — Five permanent agents review uploaded documents (policy docs, OAuth configs, onboarding flows, architecture reports) against compliance frameworks in parallel, then run a consensus engine to produce a single governance verdict.
- **Constitutional Tribunal** — A full adversarial proceeding: four agents argue competing positions across three debate rounds, a four-member AI jury validates reasoning quality, and a Governance Judge issues a constitutional ruling. The human can appeal and override at any point.

### The Agent Roster

| Agent | Model | Role |
| --- | --- | --- |
| Governance Agent | gemma4:moe | Orchestrates consensus, policy analysis, GDPR/AI Act/ISO 27001 alignment |
| Security Agent | gemma4:31b-q4_K_M | OWASP Top 10, threat modeling (STRIDE), auth/RBAC/JWT analysis |
| Ethics & Inclusion | gemma4:31b-q4_K_M | Bias detection, exclusion patterns, fairness scoring across 8 protected classes |
| Accessibility Agent | gemma4:2b | WCAG 2.2 AA, Flesch-Kincaid reading level, plain-English rewriting |
| Audit Agent | gemma4:4b | Immutable audit trails, explainability timelines, decision history |
| Privacy Advocate | gemma4:9b | Adversarial tribunal role — data minimization enforcer, consent challenger |
| AI Jury (×4) | gemma4:moe | Anti-hallucination validators — each evaluates logical consistency, evidence validity, constitutional alignment |

No agent can approve its own actions. No agent has unrestricted authority. Every governance decision is traceable to a specific rationale chain.

### Tribunal Architecture

```
                 HumanLayer Constitutional Tribunal

┌──────────────┐  ┌───────────────┐  ┌───────────────┐  ┌────────────┐
│   Security   │  │ Accessibility │  │    Privacy    │  │   Ethics   │
│  Prosecutor  │  │   Defender    │  │   Advocate    │  │  Council   │
│  gemma4:31b  │  │   gemma4:2b   │  │   gemma4:9b   │  │ gemma4:31b │
└──────┬───────┘  └───────┬───────┘  └───────┬───────┘  └─────┬──────┘
       └──────────────────┴─────────┬────────┴────────────────┘
                                    │  3-Round Adversarial Debate
                           ┌────────▼────────┐
                           │   Jury Panel    │
                           │ (4 validators)  │
                           │   gemma4:moe    │
                           └────────┬────────┘
                                    │  Jury Verdict + Consensus Score
                           ┌────────▼─────────┐
                           │ Governance Judge │
                           │    gemma4:moe    │
                           └────────┬─────────┘
               ┌────────────────────┼─────────────────────┐
        ┌──────▼──────┐     ┌───────▼───────┐     ┌───────▼───────┐
        │ Audit Agent │     │ Human Appeal  │     │   Precedent   │
        │  gemma4:4b  │     │  & Override   │     │    Library    │
        └─────────────┘     └───────────────┘     └───────────────┘
```
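To give the council mode above a concrete flavor before we get to the tribunal phases, here is a minimal sketch of fanning the permanent agents out in parallel against a local Ollama server. This is illustrative, not the project's actual code: the role names, prompts, and helper functions are invented for this post, and it assumes the `ollama` Python client's `AsyncClient`.

```python
import asyncio
from ollama import AsyncClient

# Role → model mapping mirroring the roster table above.
COUNCIL = {
    "governance":    "gemma4:moe",
    "security":      "gemma4:31b-q4_K_M",
    "ethics":        "gemma4:31b-q4_K_M",
    "accessibility": "gemma4:2b",
    "audit":         "gemma4:4b",
}

async def review(client: AsyncClient, role: str, model: str, document: str) -> dict:
    """One agent's independent finding; no agent sees another's output."""
    response = await client.chat(
        model=model,
        messages=[
            {"role": "system",
             "content": f"You are the {role} agent. Return a verdict and rationale."},
            {"role": "user", "content": document},
        ],
    )
    return {"role": role, "finding": response["message"]["content"]}

async def council_review(document: str) -> list[dict]:
    """Fan all council agents out concurrently, then hand off to consensus."""
    client = AsyncClient()
    return await asyncio.gather(
        *(review(client, role, model, document) for role, model in COUNCIL.items())
    )
```

The point of the fan-out is independence: every agent forms its finding in isolation, and only the consensus engine sees all of them together.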
### The Seven Tribunal Phases

Every tribunal case passes through 7 phases:

| Phase | What happens |
| --- | --- |
| 1. Opening Arguments | All 4 adversarial agents argue concurrently |
| 2. Cross-Examination | Each agent challenges the others' positions |
| 3. Closing Arguments | Final positions with full debate context injected |
| 4. Jury Deliberation | 4 jury agents evaluate reasoning quality concurrently |
| 5. Trust + DSL Evaluation | Trust scores updated; governance DSL rules evaluated |
| 6. Constitutional Ruling | Judge issues approve / reject / escalate / conditional |
| 7. Audit + Precedent | Immutable audit written; high-confidence cases become precedents |

Minority opinions are preserved at every phase. The human can appeal at any point.

### Eight Governance Simulations

The platform ships with 8 governance simulations that demonstrate how different combinations of Gemma models handle real-world conflict types. Each simulation pre-loads the full artifact set, assigns the right adversarial agents, and runs the complete tribunal pipeline (a configuration sketch follows Case 4).

**Case 1 — Inaccessible MFA Crisis** (`accessibility_vs_security`)

The Security Prosecutor (gemma4:31b) builds a detailed OWASP threat model justifying stronger auth. The Accessibility Defender (gemma4:2b) immediately flags that the CAPTCHA implementation fails WCAG 2.2 SC 1.1.1 and the MFA UX creates cognitive barriers for users with anxiety disorders. The Ethics Council (gemma4:31b) adds intersectional analysis — disabled users are disproportionately excluded. The Jury (gemma4:moe) detects that the Security agent's argument implicitly assumes able-bodied users; consensus drops. The Judge (gemma4:moe) issues a conditional approval: passkey-based auth with accessible fallback required.

*Why these models:* 2b is fast enough to run reading-level and WCAG checks in real time during debate rounds; 31b holds the full RBAC + auth flow in context to reason about privilege escalation paths.

**Case 2 — Unauthorized Self-Approval Incident** (`rogue_agent_detection`)

The Security Prosecutor (gemma4:31b) traces the approval chain and identifies the constitutional violation. The Audit Observer (gemma4:4b) reconstructs the hidden approval timeline from trust score history. The Governance Judge (gemma4:moe) cross-references three governance DSL rules simultaneously and issues a hard reject. No jury needed — the constitutional violation is unambiguous.

*Why these models:* 4b is ideal for narrative reconstruction of multi-step event chains; 31b reasons across the full RBAC policy graph to find the escalation path; moe evaluates overlapping DSL rules without specialization bias.

**Case 3 — Manipulative Consent Flow** (`dark_pattern_detection`)

The Accessibility Defender (gemma4:2b) flags cognitive overload and Flesch-Kincaid Grade 14 language in the consent modal. The Privacy Advocate (gemma4:9b) prosecutes on GDPR Article 7 — consent must be freely given, and pre-checked boxes fail that standard. The Ethics Council (gemma4:31b) identifies the pattern as predatory toward low-literacy and elderly users. The Jury (gemma4:moe) scores the Privacy Advocate's constitutional argument highest; verdict: reject.

*Why these models:* 9b holds enough legal context to cite specific GDPR articles accurately while staying fast enough for a three-round debate; 2b produces plain-language rewrites of the hostile consent copy as a remediation artifact.

**Case 4 — Discriminatory Hiring Pipeline** (`ai_bias_review`)

The Ethics Council (gemma4:31b) performs intersectional analysis across 8 protected classes, identifying proxy discrimination through resume formatting preferences that correlate with socioeconomic status. The Security Prosecutor (gemma4:31b) questions training data provenance and highlights that the model's confidence scores aren't calibrated across demographic groups. The Judge (gemma4:moe) escalates: the system cannot be approved or rejected without third-party fairness audit evidence.

*Why these models:* both 31b agents need to hold the full model output distribution, demographic analysis, and fairness metric definitions in context simultaneously — exactly the kind of multi-document cross-reference the 31B parameter count handles well.
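As referenced above, here is roughly what wiring up one of these cases could look like. This is a hypothetical configuration sketch with invented names and fields, not the repo's actual definitions; it only illustrates that a case bundles artifacts, adversarial roles, and a conflict type before the tribunal pipeline runs.

```python
from dataclasses import dataclass

# Hypothetical shape for a pre-packaged simulation. The real definitions
# live in the HumanLayer repo; this just shows the pieces a case carries.
@dataclass(frozen=True)
class SimulationCase:
    case_id: str
    conflict_type: str        # e.g. "accessibility_vs_security"
    artifacts: list[str]      # pre-loaded documents and screenshots
    agents: dict[str, str]    # adversarial role -> Gemma variant
    debate_rounds: int = 3

INACCESSIBLE_MFA = SimulationCase(
    case_id="case-1-inaccessible-mfa",
    conflict_type="accessibility_vs_security",
    artifacts=["artifacts/mfa_flow.png", "artifacts/auth_policy.md"],
    agents={
        "security_prosecutor":    "gemma4:31b-q4_K_M",
        "accessibility_defender": "gemma4:2b",
        "privacy_advocate":       "gemma4:9b",
        "ethics_council":         "gemma4:31b-q4_K_M",
    },
)
```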
**Case 5 — Content Removal Appeal** (`content_moderation_tribunal`)

The Jury (gemma4:moe) detects that the Safety Prosecutor's argument relies on a misapplied GDPR citation; its reasoning score drops. Conditional approval: human review required for this content category going forward.

*Why these models:* moe's expert sub-network routing handles the cross-domain reasoning — cultural context, legal frameworks, and safety analysis activate different experts per token rather than averaging across all domains.

**Case 6 — Unsafe Medical Recommendation** (`healthcare_ai_review`)

The Audit Observer (gemma4:4b) reconstructs the reasoning chain and identifies exactly where the model's confidence score dropped but wasn't flagged. The Ethics Council (gemma4:31b) highlights that the patient was elderly and the recommendation could have caused serious harm. The Judge (gemma4:moe) escalates to mandatory human clinical review. This case type demonstrates the system's highest-stakes use: identifying hallucination risk in safety-critical inference chains.

*Why these models:* 4b is the explainability layer — it produces the step-by-step reasoning audit that shows clinicians exactly where and why the inference went wrong, without requiring the expensive 31B model for that reconstruction task.

**Case 7 — Translation Drift Crisis** (`multilingual_governance_failure`)

The Accessibility Defender (gemma4:2b) flags that the terminology is inaccessible to non-legal readers in all versions. The Ethics Council (gemma4:31b) identifies that the translation error would have changed user rights without notice. The Jury (gemma4:moe) validates semantic consistency across the translation diffs. Conditional approval: re-translation with legal review required.

*Why these models:* this is one of the cases where Gemma 4's multimodal capability matters — agents can directly analyze side-by-side screenshots of translated policy text and flag visual layout differences that affect readability.

**Case 8 — Production Deployment Tribunal** (`autonomous_deployment_approval`)

The Security Prosecutor (gemma4:31b) maps the RBAC gaps against OWASP's least-privilege principle. The Accessibility Defender (gemma4:2b) rewrites the inaccessible admin onboarding copy as a remediation artifact. The Privacy Advocate (gemma4:9b) demands telemetry scope reduction with explicit legal basis. The Governance Agent (gemma4:moe) coordinates the multi-agent response and identifies that all four issues can be resolved with a conditional approval rather than a full reject.

*Why these models:* this is the system's most complex case — it shows why model sizing matters. Running all four roles with one large model would either be prohibitively slow or collapse domain expertise. Using 2b for accessibility, 9b for privacy prosecution, 31b for security depth, and moe for orchestration means each reasoning task runs at the right capability level in parallel.

### Tech Stack

- **Backend:** FastAPI + SQLAlchemy (async) + Celery + Redis
- **Frontend:** Next.js 15 App Router + TypeScript + TailwindCSS
- **Inference:** Ollama serving Gemma 4 variants locally
- **Observability:** OpenTelemetry + Prometheus + Grafana

### Why I Built This

Most governance tooling today is written for people who already understand governance. It assumes security literacy, compliance experience, and comfort with technical jargon. When people can't understand governance language, they disengage — and that gap is where real risk lives.

I wanted to build something that made governance decisions explainable to the people they affect, not just to the compliance team. Not simplified governance. Not weaker governance. More accessible governance. That meant treating plain-English explainability as a hard requirement, not a nice-to-have.
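In practice, a hard requirement can be a very small gate. Here is a minimal sketch of what that reading-level check could look like; the `readability_finding` helper is hypothetical, it assumes the `textstat` package (which the post doesn't mention using), and the thresholds mirror the ones described later in this post.

```python
import textstat  # assumption: third-party readability library, not from the repo

GRADE_FLAG = 8    # content above Grade 8 gets flagged for rewriting
GRADE_VETO = 12   # above Grade 12 with no plain-English alternative -> veto

def readability_finding(text: str, has_plain_alternative: bool) -> dict:
    """Score an artifact's reading level and decide what the agent should do."""
    grade = textstat.flesch_kincaid_grade(text)
    if grade > GRADE_VETO and not has_plain_alternative:
        action = "veto"
    elif grade > GRADE_FLAG:
        action = "rewrite_plain_english"
    else:
        action = "pass"
    return {"flesch_kincaid_grade": grade, "action": action}
```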
### Why a Council Instead of One Model

Governance is fundamentally about balancing competing concerns. Security optimizes for protection. Accessibility optimizes for inclusion. Compliance optimizes for consistency. Those priorities conflict constantly in real systems. A single model collapses that tension into one answer. A council makes the tension visible.

During testing, the Security Agent recommended mandatory CAPTCHA and MFA on a login flow. The Accessibility Agent immediately contested it — the CAPTCHA implementation was inaccessible to screen-reader users, and the proposed MFA UX created cognitive barriers for users with anxiety. The Governance Agent mediated, proposing passkey-based authentication with accessible fallback flows. Both agents' rationales, the disagreement, and the compromise are preserved in the audit trail.

That's the core idea: governance decisions should show their work.

### Accessibility as a Governance Requirement

Accessibility started as a supporting feature. Midway through building, I realized it was a governance issue. A privacy policy no one can realistically read is technically compliant — but is it meaningfully transparent? An MFA flow that excludes disabled users may improve security metrics while reducing equitable access. Those aren't accessibility problems. They're governance failures that happen to manifest as accessibility problems.

So the Accessibility Agent (gemma4:2b) is a first-class council member with veto authority. It doesn't just document issues — it can block a governance approval if WCAG 2.2 AA failures affect core user journeys, if error messages shame users, or if the reading level exceeds Grade 12 with no plain-English alternative. It also actively rewrites hostile content, making it the only agent that produces corrected output rather than just findings.

### Why Each Role Got Its Gemma Variant

The most important architectural decision in HumanLayer was which Gemma variant to assign to each role — and why. This wasn't arbitrary. Each model was selected based on the nature of its reasoning task.

#### gemma4:2b — Accessibility Agent

Fast, focused, empathetic. Accessibility analysis is primarily pattern recognition: reading-level estimation, WCAG criterion checking, hostile-language detection, and plain-English rewriting. These tasks don't require deep multi-document reasoning. They require speed, consistency, and — critically — the ability to produce warm, non-intimidating output. gemma4:2b runs the fastest inference loop in the system, which matters because accessibility checks run on every submission in parallel with the other agents.

What it does in practice:

- Estimates Flesch-Kincaid reading level and flags content above Grade 8
- Checks WCAG 2.2 AA compliance (contrast, keyboard nav, screen-reader compatibility)
- Detects cognitively hostile patterns (excessive steps, jargon overload, shame-based error messages)
- Rewrites flagged content into dyslexia-friendly plain English
- Evaluates security UX for accessibility barriers (MFA flows, lockout states, CAPTCHA)
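The rewriting step in that last bullet could look roughly like the sketch below. It assumes the `ollama` Python client, and the prompt and helper name are invented for illustration, not taken from the repo.

```python
from ollama import chat

def rewrite_plain_english(hostile_copy: str) -> str:
    """Have the Accessibility Agent produce corrected output, not just findings."""
    response = chat(
        model="gemma4:2b",
        messages=[
            {
                "role": "system",
                "content": (
                    "Rewrite the following text at a Grade 8 reading level. "
                    "Preserve meaning; remove jargon and shame-based phrasing."
                ),
            },
            {"role": "user", "content": hostile_copy},
        ],
    )
    return response["message"]["content"]
```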
#### gemma4:4b — Audit Agent

The Audit Agent has a deliberately narrow job: explain what happened, neutrally and completely. It never makes governance decisions. It observes, summarizes, and records. gemma4:4b sits at a sweet spot — capable enough to produce coherent multi-agent decision narratives, but not so expensive that it becomes a bottleneck. It runs after every governance cycle and every tribunal phase.

What it does in practice:

- Generates immutable, human-readable audit trails for every governance decision
- Produces explainability timelines showing step-by-step how a verdict was reached
- Documents inter-agent disagreements with both sides' rationale
- Writes plain-English summaries readable by non-technical stakeholders
- Assigns an explainability score to each decision chain

#### gemma4:9b — Privacy Advocate (Tribunal)

The Privacy Advocate is the tribunal's adversarial data-minimization enforcer. Its constitutional mandate: assume data minimization as the default, and make collectors justify every data point. gemma4:9b provides enough depth for nuanced consent analysis and privacy-by-design reasoning while remaining faster than the 31B models — important because the Privacy Advocate runs three debate rounds in the tribunal.

What it does in practice:

- Challenges data collection practices and demands explicit consent justification
- Enforces purpose limitation (data collected for X cannot be used for Y)
- Detects passive surveillance patterns
- Validates data subject rights implementation (access, erasure, portability)
- Can veto any decision that lacks explicit privacy-by-design evidence

#### gemma4:31b-q4_K_M — Security Agent & Ethics Agent

Both the Security Agent and the Ethics & Inclusion Agent use gemma4:31b (quantized to q4_K_M for local inference). These are the system's two deep-reasoning roles.

**Security Agent:** Security analysis is context-dense. Reasoning about privilege escalation requires holding an entire RBAC configuration in context. Evaluating an OAuth flow requires tracing token lifetimes, redirect URIs, scope definitions, and interaction patterns simultaneously. The 31B model handles that kind of multi-document, cross-referenced reasoning substantially better than smaller variants.

What it does in practice:

- Full OWASP Top 10 sweep on every submission
- STRIDE threat modeling for system designs
- Auth flow analysis: OAuth 2.0/OIDC, JWT lifetime and scope risks, session security
- RBAC/ABAC validation against least-privilege principles
- Privilege escalation path detection
- "Security empathy" checks — flags security controls that are inaccessible (hostile MFA, opaque lockouts, shame-based error states)

**Ethics & Inclusion Agent:** Bias and fairness analysis requires intersectional reasoning across multiple identity dimensions simultaneously. Detecting subtle exclusion patterns — where a policy is technically neutral but systematically disadvantages a protected group — requires the model to hold cultural context, power dynamics, and representation gaps in mind at once. The 31B model is the right fit.

What it does in practice:

- Detects bias across 8 protected classes (gender, race, age, disability, religion, nationality, economic, other)
- Identifies exclusion patterns, representation gaps, and discriminatory assumptions
- Evaluates AI system fairness (disparate impact, proxy discrimination)
- Considers intersectionality — overlapping marginalized identities, not just single-axis analysis
- Applies global cultural sensitivity, not just Western defaults
- Describes harmful language without reproducing it
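One way to keep findings like these machine-checkable for the jury is to constrain the agent's output to a schema. The sketch below is an assumption-laden illustration: the `BiasFinding` model and its fields are hypothetical, and it relies on Ollama's JSON-schema `format` parameter (available in recent client versions), which the post doesn't say the repo actually uses.

```python
from pydantic import BaseModel
from ollama import chat

PROTECTED_CLASSES = [
    "gender", "race", "age", "disability",
    "religion", "nationality", "economic", "other",
]

# Hypothetical finding shape: the repo's actual schema may differ.
class BiasFinding(BaseModel):
    protected_class: str       # one of PROTECTED_CLASSES
    pattern: str               # e.g. "proxy discrimination via resume formatting"
    severity: float            # 0.0-1.0 fairness-impact score
    rationale: str             # evidence chain the jury can validate

def ethics_review(artifact: str) -> BiasFinding:
    """Constrain the Ethics agent's output to the schema via Ollama's format arg."""
    response = chat(
        model="gemma4:31b-q4_K_M",
        messages=[{"role": "user", "content": f"Audit for bias:\n{artifact}"}],
        format=BiasFinding.model_json_schema(),
    )
    return BiasFinding.model_validate_json(response["message"]["content"])
```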
#### gemma4:moe — Governance Agent + Jury Panel (×4) + Governance Judge

The Mixture-of-Experts architecture makes gemma4:moe the natural fit for orchestration, meta-reasoning, and judgment tasks. MoE models activate specialized expert sub-networks per token, which gives them strong cross-domain reasoning at lower per-token compute cost than comparably capable dense models.

Three distinct roles in HumanLayer use it:

- **Governance Agent** — The council chair. It handles multi-framework compliance scoring (GDPR, EU AI Act, WCAG 2.2, ISO 27001, OWASP), disagreement mediation between agents, confidence-weighted consensus evaluation, and escalation routing. The breadth of frameworks and the need to reason across competing agent outputs make MoE the right choice.
- **AI Jury Panel (4 agents)** — Each jury member evaluates the reasoning quality of the adversarial agents, not the governance decision itself. They're the anti-hallucination layer. Four independent validators check: Is the evidence valid? Is the logic consistent? Is there a hallucination risk? Does the reasoning align with the constitutional rules? Having four independent MoE instances running concurrently — each unaware of the others' verdicts until deliberation — makes reasoning laundering significantly harder.
- **Governance Judge** — After the jury delivers its verdict, the Judge issues the constitutional ruling. It synthesizes the full three-round debate, the jury consensus score, trust scores, governance DSL rules, and the precedent library to produce a final ruling: approve, reject, escalate, or conditional approval with required remediation steps.

### Why Gemma 4 Specifically

Three capabilities made Gemma 4 the right choice for a governance platform specifically:

1. **Long-context window.** Governance is inherently context-heavy. A single tribunal case might involve a policy document, prior agent analyses, historical rulings, disagreement records, and governance DSL rules — all needing to stay in context simultaneously. The long-context window made it possible to feed the full debate history into each jury agent without truncation.
2. **Multimodal input.** A significant portion of governance artifacts aren't text documents — they're screenshots of onboarding flows, architecture diagrams, admin dashboards, and consent screens. Being able to analyze those directly meant the system could review visual accessibility patterns, screenshot-based CAPTCHA flows, and UI-level governance risks that a text-only model would miss entirely (a sketch of a screenshot review appears near the end of the post).
3. **Local deployment via Ollama.** Governance reviews frequently involve sensitive material: internal policy documents, architecture specs, authentication configurations. Running inference locally means none of that leaves the machine. For enterprise adoption, local-first isn't a nice-to-have — it's often a hard requirement.

### What I Learned

The most unexpected insight: disagreement is a feature, not a bug. Most AI systems are optimized to produce confident, singular answers as quickly as possible. HumanLayer deliberately surfaces disagreement — between agents, across debate rounds, in the audit trail. That visibility turned out to be the most useful part of the system, because it shows users why a decision landed where it did, not just what the answer is.

I also think the model-sizing decisions matter more than they're usually given credit for. Using gemma4:2b for accessibility and gemma4:31b for security isn't just a cost optimization. It reflects a genuine difference in the nature of those reasoning tasks. Fast, empathetic pattern recognition is not the same as deep cross-referenced threat analysis, and collapsing them into the same model loses something real.

*Note: The app currently runs locally and is not deployed yet.*
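As promised in the multimodal point above, here is a minimal sketch of a screenshot review. It assumes the `ollama` client's `images` message field; the artifact path and prompt are hypothetical.

```python
from ollama import chat

# Hypothetical artifact path; any onboarding or consent screenshot works.
response = chat(
    model="gemma4:2b",
    messages=[{
        "role": "user",
        "content": (
            "Review this consent screen for WCAG 2.2 AA issues, "
            "reading level, and dark patterns."
        ),
        "images": ["artifacts/consent_screen.png"],
    }],
)
print(response["message"]["content"])
```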
### Demo & Links

- Demo video: https://www.youtube.com/watch?v=pfUncccezQA
- Repository: https://github.com/ujjavala/HumanLayer
- Architecture docs: https://github.com/ujjavala/HumanLayer/tree/main/docs

One question stayed with me throughout this project:

*If AI systems become influential enough to shape governance decisions, who governs the governors?*

HumanLayer is one answer: make the AI systems govern each other, transparently, with human override always available. Expose disagreement instead of hiding it. Treat accessibility as a governance requirement. Build audit trails that explain decisions to the people they affect, not just to the compliance team.

Trustworthy AI will probably look less like all-knowing superintelligence and more like collaborative systems designed to keep each other accountable.
