

I Built a Red Team of AI Agents to Attack My Code. Here is the Full Technical Report

DEV Community
PythonWoods

In Part 1, I explained why I built Zenzic — the philosophy, the threat model, and the architecture of a Pure Python documentation analyzer. In Part 2, I detailed the transition to the Obsidian Bastion architecture: engine-agnostic discovery, the Layered Exclusion Manager, and zero-subprocess enforcement.

Today, in the final chapter of this series, I'm sharing the results of Operation Obsidian Stress: a controlled adversarial audit in which I orchestrated a multi-agent AI system to find every gap in the Shield before the v0.6.1rc2 release. Four bypass vectors. Four real findings. All closed. This is the complete technical post-mortem of the adversarial security audit we ran against Zenzic v0.6.1rc2's Shield (credential scanner) before release. I'm publishing the full technical details because the findings are instructive, the fixes are non-obvious, and the code belongs in the open.

**Note on methodology:** To validate the Shield, I orchestrated a multi-team AI system — Red Team, Blue Team, and Purple Team — using specialized agent ensembles to simulate advanced obfuscation techniques. This is AI-assisted security engineering: using the same agentic architecture that attackers use, to find the gaps they would exploit. All findings, bypass vectors, and fixes documented here are real.

## The Target: Shield

Before the attack details, context. Shield is Zenzic's credential detection layer. It scans every Markdown and MDX file in your documentation before the build runs, looking for patterns that indicate real credentials in content.

The threat model is simple: a contributor submits a PR with a code example. That example contains a real API key — copied from a local terminal session, pasted from a Slack thread, or forgotten after a debugging session. The reviewer reads the prose, not the bytes. The PR merges. The docs build. The key is now live on your documentation site, indexed by search engines. Shield exists to catch that before it ships.
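As a mental model for what follows, the per-file scan loop can be sketched roughly like this. This is an illustrative sketch, not Zenzic's actual internals — the names `PATTERNS`, `ShieldFinding`, and `scan_file` are assumptions, and only two of the nine pattern families are shown:

```python
import re
from dataclasses import dataclass
from pathlib import Path

# Illustrative subset of the pattern set; the real Shield ships 9 families.
PATTERNS = {
    "openai": re.compile(r"sk-[a-zA-Z0-9]{48}"),
    "aws": re.compile(r"AKIA[0-9A-Z]{16}"),
}

@dataclass
class ShieldFinding:
    family: str
    path: Path
    line_no: int

def scan_file(path: Path):
    """Yield a finding for each pattern family matching any line of the file."""
    for line_no, line in enumerate(path.read_text().splitlines(), start=1):
        for family, pattern in PATTERNS.items():
            if pattern.search(line):
                yield ShieldFinding(family, path, line_no)
```

Every attack in this post targets the gap between what this kind of per-line, per-pattern scan sees and what a human reader sees in the rendered page.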
If Shield can be bypassed by someone who knows how it works, it's not a scanner — it's a false guarantee.

Shield's architecture before Operation Obsidian Stress:

1. Read each line of the Markdown/MDX file
2. Apply a normalization pass (strip backticks, collapse whitespace)
3. Run 9 regex patterns against the normalized line
4. Report any match as a `ShieldFinding`

Step 4 triggers Exit Code 2 (Shield breach) — non-bypassable, distinct from Exit Code 1 (validation failure) and Exit Code 3 (Blood Sentinel / path traversal).

The attack surface was step 2: the normalization pass. It normalized formatting noise but did not account for deliberate obfuscation.

## Finding 1: Cf Character Injection (ZRT-006)

- **Category:** Input normalization bypass
- **Severity:** High — complete bypass of all regex patterns
- **CVSS analogy:** 8.1 (High)

Python's `unicodedata` module exposes a character category classification. The `Cf` category ("Format characters") includes characters that are semantically meaningful in Unicode text processing but are invisible in rendered output and most text displays:

| Code Point | Name | Use |
|---|---|---|
| U+200B | Zero Width Space | Line breaking hint |
| U+200C | Zero Width Non-Joiner | Prevents ligatures |
| U+200D | Zero Width Joiner | Forces ligatures |
| U+00AD | Soft Hyphen | Optional hyphenation |
| U+FEFF | Zero Width No-Break Space | BOM marker |

Inject any of these into a credential token and the regex fails to match:

```python
# Craft the bypass
import re

key = "sk-abc123def456ghi789jkl012mno345pqr678stu901vwx234"  # 48 chars after the prefix

# Insert a zero-width space after position 9 (inside the token)
bypass = key[:9] + "\u200B" + key[9:]

print(len(bypass))   # 52 chars — 1 more than the real key
print(repr(bypass))  # 'sk-abc123\u200bdef456ghi789jkl012mno345pqr678stu901vwx234'

pattern = re.compile(r"sk-[a-zA-Z0-9]{48}")
print(pattern.search(key))     # the clean key matches
print(pattern.search(bypass))  # None — bypass confirmed
```

The zero-width space is not in `[a-zA-Z0-9]`. With the token interrupted, the 48-character quantifier can no longer find 48 contiguous characters from the class, and the match fails. The credential leaks.
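You can confirm the classification directly with the standard library — each character in the table above reports category `Cf`, while ordinary token characters fall into letter or digit categories:

```python
import unicodedata

# Each of the five characters from the table belongs to Cf ("Format").
for ch in ("\u200b", "\u200c", "\u200d", "\u00ad", "\ufeff"):
    print(f"U+{ord(ch):04X} -> {unicodedata.category(ch)}")  # each line reports Cf

# Ordinary token characters are classified as letters and digits instead:
print(unicodedata.category("s"))  # Ll (lowercase letter)
print(unicodedata.category("3"))  # Nd (decimal digit)
```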
### The Fix

Strip all `Cf`-category characters before any normalization step runs:

```python
import unicodedata

def _strip_unicode_format_chars(text: str) -> str:
    """
    Remove all Unicode Format (Cf) characters.

    These are invisible to human readers but can be used to interrupt
    regex pattern matching against credential tokens. Examples:
    U+200B (zero-width space), U+200C (ZWNJ), U+200D (ZWJ),
    U+00AD (soft hyphen), U+FEFF (BOM).
    """
    return "".join(c for c in text if unicodedata.category(c) != "Cf")
```

Test coverage added:

```python
@pytest.mark.parametrize("char", [
    "\u200b",  # zero-width space
    "\u200c",  # zero-width non-joiner
    "\u200d",  # zero-width joiner
    "\u00ad",  # soft hyphen
    "\ufeff",  # zero-width no-break space / BOM
])
def test_shield_cf_strip(char, tmp_path):
    key = "sk-abc123def456ghi789jkl012mno345pqr678stu901vwx234"
    bypass = key[:9] + char + key[9:]
    doc = tmp_path / "test.md"
    doc.write_text(f"My API key: {bypass}")
    results = run_shield(doc)
    assert len(results) == 1, f"Cf char {char!r} should not bypass Shield"
    assert results[0].family == "openai"
```

## Finding 2: HTML Entity Obfuscation (ZRT-006b)

- **Category:** Input normalization bypass
- **Severity:** High — bypasses patterns that depend on punctuation characters
- **Affected families:** OpenAI (hyphen), Stripe (hyphen, underscore), GitHub (underscore)

Markdown renderers decode standard HTML entities. The hyphen character (`-`) has the numeric entity `&#45;`. The underscore (`_`) is `&#95;`.

```
sk&#45;abc123def456ghi789jkl012mno345pqr678stu901vwx234
```

Renders as: `sk-abc123def456ghi789jkl012mno345pqr678stu901vwx234` — a valid OpenAI key format. The credential scanner sees `sk&#45;abc123...` — which does not match `sk-[a-zA-Z0-9]{48}`. The entity is a five-character substitution for the single character that forms the structural boundary of the pattern.

### The Fix

```python
import html

def _decode_html_entities(text: str) -> str:
    """
    Decode HTML entities before pattern matching.

    A credential containing &#45; (hyphen) or &#95; (underscore) renders
    correctly in a browser but bypasses regex patterns that match on the
    literal character.
    """
    return html.unescape(text)
```

`html.unescape()` is part of the Python standard library. No dependencies. Zero cost.

Affected patterns if left unpatched:

- `sk-...` (OpenAI): hyphen obfuscated as `&#45;`
- `sk_live_...` (Stripe): underscores obfuscated as `&#95;`
- `ghp_...` (GitHub): underscore in prefix obfuscated

## Finding 3: Comment Interleaving (ZRT-007)

- **Category:** Token fragmentation via markup
- **Severity:** High — renders the token non-contiguous in raw source
- **Technique:** Inject HTML or MDX comment blocks between credential characters

HTML comments and MDX expression comments are invisible in rendered output. They are valid Markdown syntax that any Markdown renderer will process and discard.

```
sk-abc123<!-- -->def456ghi789jkl012mno345pqr678stu901vwx234
```

In the rendered documentation: `sk-abc123def456ghi789jkl012mno345pqr678stu901vwx234` (fully readable, valid pattern). In the raw source the scanner reads, the regex match fails because the comment block interrupts the character class `[a-zA-Z0-9]`.

MDX variant:

```
sk-abc123{/* inline MDX comment */}def456ghi789jkl012mno345pqr678stu901vwx234
```

Same effect. Both comment syntaxes are invisible in render, structurally disruptive in raw source.

### The Fix

```python
import re

# Pre-compile: these run against every line of every scanned file
_HTML_COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
_MDX_COMMENT_RE = re.compile(r"\{/\*.*?\*/\}", re.DOTALL)

def _strip_markup_comments(text: str) -> str:
    """
    Strip HTML and MDX comments before pattern matching.

    Comments are invisible in rendered output and can be used to
    fragment credential tokens in raw Markdown/MDX source.
    """
    text = _HTML_COMMENT_RE.sub("", text)
    text = _MDX_COMMENT_RE.sub("", text)
    return text
```

**Note on `re.DOTALL`:** The `DOTALL` flag is required because a comment spanning multiple lines — though unusual in this attack vector — must also be caught. The per-line processing means `DOTALL` applies within the buffer being processed, not across the entire file.
## Finding 4: Cross-Line Token Splitting (ZRT-007b)

- **Category:** Architectural bypass — stateless scanner assumption
- **Severity:** Critical — bypasses all pattern matching with zero obfuscation
- **Technique:** Line break

This is the most architecturally significant finding. It requires no Unicode tricks, no entity encoding, no markup injection. One line break.

```
Here is my staging key for the integration tests: sk-abc123def456
ghi789jkl012mno345pqr678stu901vwx234yz
```

The scanner processes line 1: `Here is my staging key for the integration tests: sk-abc123def456`. No match — the pattern requires 48 characters after `sk-`, and there are only 12.

The scanner processes line 2: `ghi789jkl012mno345pqr678stu901vwx234yz`. No match — no `sk-` prefix.

The credential leaks.

The split is invisible in rendered output — the two lines render as a single paragraph. All documentation prose wraps at rendering time. A human reader sees the full key. The scanner never does.

```mermaid
sequenceDiagram
    participant Line1 as Line N
    participant Buffer as Lookback Buffer (80 chars)
    participant Line2 as Line N+1
    participant Detector as Pattern Detector
    Note over Line1: "sk-abc123def456" (12 chars after prefix)
    Line1->>Detector: Scan line N → no match
    Line1->>Buffer: Store tail[-80:]
    Note over Line2: "ghi789jkl012mno345pqr678stu..."
    Line2->>Detector: Scan line N+1 → no match
    Buffer->>Detector: join_zone = prev[-80:] + current[:80]
    Note over Detector: Full 48-char token now visible
    Detector-->>Line2: ✅ ShieldFinding: family=openai
```

### The Fix

A stateful generator that maintains context across line boundaries, creating a synthetic overlap zone:

```python
from collections.abc import Iterable, Iterator
from pathlib import Path

def scan_lines_with_lookback(
    lines: Iterable[tuple[int, str]],
    file_path: Path,
    buffer_width: int = 80,
) -> Iterator[ShieldFinding]:
    """
    Scan lines for credentials with cross-line token detection.

    For each line, in addition to scanning the normalized line itself,
    a 'join zone' is constructed from the tail of the previous line and
    the head of the current line. Any credential split across the line
    boundary will appear as a contiguous token in this synthetic window.

    Args:
        lines: Iterable of (line_number, raw_line) tuples.
        file_path: Path of the file being scanned (for reporting).
        buffer_width: Characters to take from each side of the boundary.
            Default 80 — calibrated to catch splits at typical prose line
            lengths without inflating false positives.

    Yields:
        ShieldFinding instances for each unique credential detected.
    """
    prev_normalized: str = ""
    prev_seen: set[str] = set()

    for line_no, raw_line in lines:
        seen_this_line: set[str] = set()
        normalized = _normalize_line_for_shield(raw_line)

        # Pass 1: standard per-line scan
        for finding in _scan_normalized_line(normalized, file_path, line_no):
            yield finding
            seen_this_line.add(finding.family)

        # Pass 2: cross-line join zone scan
        if prev_normalized:
            join_zone = prev_normalized[-buffer_width:] + normalized[:buffer_width]
            for finding in _scan_normalized_line(join_zone, file_path, line_no):
                # Deduplicate against families already seen on either adjacent
                # line. A finding in the join zone that also matched on the
                # current line would otherwise be reported twice.
                if finding.family not in (seen_this_line | prev_seen):
                    yield finding

        prev_normalized = normalized
        prev_seen = seen_this_line
```

**Why 80 characters?** The choice reflects the statistical distribution of credential split positions relative to line length. A credential split is most likely to occur near the end of a prose line that happens to end mid-token. Standard terminal width and most documentation editors wrap at 80–120 characters. Taking 80 characters from each side of the boundary covers the vast majority of real-world split positions. Increasing to 160 would double the join zone size with minimal additional detection coverage but would increase the false-positive probability for partial pattern fragments. The 80-character default can be overridden if scan results show false positives on a specific corpus.
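The join-zone mechanics can be reproduced in a few lines, independent of the full generator (pattern and buffer width as described above):

```python
import re

OPENAI_KEY = re.compile(r"sk-[a-zA-Z0-9]{48}")
BUFFER_WIDTH = 80

line1 = "Here is my staging key for the integration tests: sk-abc123def456"
line2 = "ghi789jkl012mno345pqr678stu901vwx234yz"

# Per-line scans miss the split credential entirely...
assert OPENAI_KEY.search(line1) is None  # only 12 token chars after the prefix
assert OPENAI_KEY.search(line2) is None  # no sk- prefix at all

# ...but the synthetic join zone makes the token contiguous again.
join_zone = line1[-BUFFER_WIDTH:] + line2[:BUFFER_WIDTH]
assert OPENAI_KEY.search(join_zone) is not None
```

Because the join zone is built from normalized text on both sides, the cross-line pass also composes with the other defenses: a key that is both split and obfuscated is normalized before the zone is assembled.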
## Performance Impact

Adding a second pass per line and constructing a join-zone string has measurable but acceptable overhead:

| Mode | 5,000 files | 10,000 files | 50,000 files |
|---|---|---|---|
| No lookback (v0.6.0) | 412 ms | 803 ms | 3,891 ms |
| With lookback (v0.6.1) | 626 ms | 1,247 ms | 6,128 ms |
| Overhead | +52% | +55% | +57% |

The overhead is roughly linear: each file with N lines now performs N additional string slices and N additional pattern passes. The absolute numbers remain well within acceptable ranges for CI pipelines. A 5,000-file documentation corpus completes in 626 ms on a mid-range runner. The benchmark script is in the repository: `python scripts/benchmark.py --files 5000 --mode lookback`.

## The Hardened Normalization Pipeline

After closing all four vectors, Shield's normalization function runs every line through a deterministic eight-step sequence:

```python
def _normalize_line_for_shield(raw_line: str) -> str:
    """
    Apply the full normalization pipeline before credential pattern matching.

    Steps are ordered to guarantee that later transformations operate on
    clean input — e.g., entity decoding happens before comment stripping
    to handle entities within comment boundaries.
    """
    text = raw_line

    # Step 1: Strip Unicode Format (Cf) characters
    # Must run first — prevents Cf chars from surviving entity decoding.
    text = _strip_unicode_format_chars(text)

    # Step 2: Decode HTML entities
    # &#45; → -, &#95; → _, &amp; → &, etc.
    text = html.unescape(text)

    # Step 3: Strip HTML comments
    # <!-- ... --> → ""
    text = _HTML_COMMENT_RE.sub("", text)

    # Step 4: Strip MDX expression comments
    # {/* ... */} → ""
    text = _MDX_COMMENT_RE.sub("", text)

    # Step 5: Unwrap backtick code spans
    # `sk-abc123...` → sk-abc123...
    # Credentials in code spans are still credentials.
    text = _BACKTICK_RE.sub(lambda m: m.group(1), text)

    # Step 6: Remove string concatenation operators
    # "sk-" + "abc123..." → "sk-" "abc123..."
    # Then whitespace collapse in step 8 joins them for matching.
    text = text.replace("+", " ")

    # Step 7: Replace Markdown table cell separators
    # | key | value | → " key value "
    # Prevents pipe characters from interrupting patterns.
    text = text.replace("|", " ")

    # Step 8: Collapse whitespace
    # Multiple spaces → single space, strip leading/trailing
    text = " ".join(text.split())

    return text
```

Each step is independently testable. The test suite includes 47 tests specifically for normalization, covering each step in isolation and in combination.

## Test Suite Growth

Before the operation: 929 passing tests. After closing all four vectors: 1,046 passing tests. 117 new tests, distributed across:

| Area | New Tests |
|---|---|
| Cf character injection (ZRT-006) | 23 |
| HTML entity obfuscation (ZRT-006b) | 18 |
| Comment interleaving (ZRT-007) | 31 |
| Cross-line token splitting (ZRT-007b) | 28 |
| Normalization pipeline integration | 17 |

## Credential Families

9 credential families, all validated against the complete normalization pipeline:

| Family | Pattern | Example true positive |
|---|---|---|
| OpenAI API Key | `sk-[a-zA-Z0-9]{48}` | `sk-abc123def456ghi789...` |
| GitHub Token | `gh[poushr]_[A-Za-z0-9_]+` | `ghp_abc123def456` |
| AWS Access Key | `AKIA[0-9A-Z]{16}` | `AKIAIOSFODNN7EXAMPLE` |
| Stripe Live Key | `sk_live_[a-zA-Z0-9]+` | `sk_live_abc123def456` |
| Slack Token | `xox[bpas]-[0-9]+-...` | `xoxb-12345-67890-abc` |
| Google API Key | `AIza[0-9A-Za-z\-_]{35}` | `AIzaSyD-9tSrke72I6e0...` |
| Private Key Block | `-----BEGIN .* PRIVATE KEY-----` | PEM headers |
| Hex-Encoded Payload | `(\\x[0-9a-fA-F]{2}){8,}` | `\x41\x42\x43...` |
| GitLab PAT | `glpat-[0-9a-zA-Z\-_]{20}` | `glpat-xxxxxxxxxxxxxxxxxxxx` |

## Exit Codes

Zenzic's exit codes are non-negotiable — no configuration can suppress them:

| Exit Code | Name | Trigger |
|---|---|---|
| 0 | Clean | No issues found |
| 1 | Sentinel | Validation failures (broken links, orphans, etc.) |
| 2 | Shield | Credential detected |
| 3 | Blood Sentinel | Path traversal attempt in config |

Codes 2 and 3 cannot be configured away. This is intentional: they represent the security perimeter. A CI step that can be silenced on a security failure is not a security control.
## CI Integration

```yaml
# .github/workflows/docs.yml
- name: Zenzic Shield
  run: |
    pip install zenzic==0.6.1rc2
    zenzic shield --strict
    # Exit code 2 → credential found → build fails
    # Exit code 3 → path traversal → build fails
    # No --ignore-shield flag exists
```

```bash
# Pre-commit hook
pip install zenzic==0.6.1rc2

# Full analysis (links + orphans + credentials + assets)
zenzic check all

# Security scan only
zenzic shield

# Quality score with regression detection
zenzic score
zenzic diff --baseline .zenzic-baseline.json
```

## Closing Thoughts

The four bypass vectors found during Operation Obsidian Stress are not exotic. They're the kind of techniques that appear in any list of regex evasion methods — Unicode injection, HTML entity encoding, markup comment interleaving, structural line splitting. What made them findable was the decision to look for them systematically, with adversarial intent, before release. What made them fixable was having a normalization pipeline with defined semantics and comprehensive test coverage at each step.

Security tooling that isn't tested adversarially is security tooling that provides the appearance of coverage without the substance. The Shield bypass vectors existed for the same reason most security gaps exist: nobody had tried to break through them yet.

- Documentation: zenzic.dev
- GitHub: github.com/PythonWoods/zenzic
- PyPI: pypi.org/project/zenzic