# Beyond Repo Scanning: How AIRI Expanded the Risk Vocabulary in STEM BIO-AI 1.7.x
This is the second half of the same 1.7.x transition. In the previous post, I wrote about calibration governance: how STEM BIO-AI keeps score authority from drifting when users simulate policy posture. That was about how the system decides. This post is about a different layer: how the system speaks about risk.

A local repository scanner can become trapped inside its own vocabulary. It can detect dependency issues, weak provenance language, shallow validation, reproducibility gaps, and risky exception handling. But if every finding stays only inside the scanner's internal language, the report remains too narrow.

That is the problem AIRI helped address in STEM BIO-AI 1.7.x. In this context, AIRI is used as a local risk-vocabulary layer built from the MIT AI Risk Repository ecosystem. The point is not to replace deterministic repository scanning with an external risk database. The point is to give local findings a broader risk vocabulary without turning that vocabulary into a truth claim.

## What the MIT AI Risk Repository provides

The MIT AI Risk Repository is a public AI risk resource from the MIT AI Risk Initiative. It helps organize fragmented AI risk language across research, policy, and industry sources. The repository includes three main parts:

- an AI Risk Database
- a Causal Taxonomy of AI Risks
- a Domain Taxonomy of AI Risks

According to the MIT AI Risk Repository site, the database collects 1,700+ risks from 74 existing AI risk frameworks and classifications. The public domain taxonomy organizes risks into 7 domains and 24 subdomains. Some of those domain taxonomy nodes include:

- 2. Privacy & Security
  - 2.1 Compromise of privacy by obtaining, leaking or correctly inferring sensitive information
  - 2.2 AI system security vulnerabilities and attacks
- 6.5 Governance failure
- 7. AI System Safety, Failures, & Limitations
  - 7.3 Lack of capability or robustness
  - 7.4 Lack of transparency or interpretability

That makes AIRI useful as a vocabulary source. But vocabulary is not truth. A local scanner should not say: "this repository caused this risk." It should say something more careful: "this local finding belongs to a broader class of AI risk language." That distinction is the design boundary.

## Where STEM BIO-AI starts from

STEM BIO-AI began as a deterministic evidence-surface scanner for bio and medical AI repositories. That core remains. The scanner looks at observable repository surfaces:

- README and docs
- code structure
- CI configuration
- dependency manifests
- changelogs
- reproducibility signals
- clinical-adjacent boundary language

But once STEM BIO-AI started producing richer audit outputs, a new question appeared: how should the system talk about the broader risk territory around a detected finding? For example:

- a fail-open exception path may have implications beyond code quality
- weak provenance language may connect to reproducibility and trust concerns
- shallow validation around sensitive inputs may point toward a wider harm surface than the repository alone makes obvious

Without a broader vocabulary, those findings remain local and narrow. AIRI helps widen the vocabulary without making the scanner less deterministic.
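To make that boundary concrete, here is a minimal sketch of the shape a finding-plus-anchor record could take. The names here (`LocalFinding`, `airi_anchors`, the example paths) are hypothetical illustrations, not STEM BIO-AI's actual API; the point is that the anchor field carries vocabulary labels, never causal claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalFinding:
    """A deterministic, repository-local finding (hypothetical shape)."""
    detector_id: str     # e.g. "fail_open_exception"
    evidence_path: str   # file where the evidence surface was observed
    description: str     # what the detector actually saw
    # Vocabulary anchors only: AIRI domain-taxonomy labels this finding
    # sits *near* -- never a claim that the risk occurred or was caused.
    airi_anchors: tuple[str, ...] = ()


finding = LocalFinding(
    detector_id="fail_open_exception",
    evidence_path="src/pipeline/ingest.py",
    description="except block swallows the error and execution continues",
    airi_anchors=("7.3 Lack of capability or robustness",),
)
```

Keeping the anchors in a separate field, rather than folding them into the finding text, is what lets the score stay deterministic while the vocabulary stays advisory.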
## Detector families, not model judgments

In this article, a detector family means a bounded local analysis surface inside STEM BIO-AI. It does not mean an AI model judging the repository. Examples include:

- code integrity detectors, such as hardcoded credential or fail-open exception checks
- AST contract detectors, such as shallow validator checks
- bio diagnostics, such as SMILES parser-guard or silent mock fallback checks
- provenance and reproducibility evidence surfaces

A detector family produces a local finding. The AIRI layer does not replace that finding. It gives the finding a broader vocabulary anchor.

This boundary matters. The AIRI layer does not:

- validate that a real-world incident happened
- prove that a repository causes a given harm
- turn a detector hit into a clinical danger claim
- replace due diligence or domain review
- override the deterministic score

Instead, it gives the system a structured way to say:

- what broader risk territory a finding may relate to
- which risk vocabulary exists around that class of concern
- where known coverage gaps remain

That is why AIRI is a risk-vocabulary layer, not a truth layer.

## How to read coverage numbers

If a report says something like `covered risks: 12 / 31`, that should not be read as "the repository is 38% safe," or "the scanner covers 38% of all AI risk." A better interpretation is: within the detector scope currently mapped into the curated AIRI runtime layer, this scan triggered findings that connect to these AIRI risk entries. That is narrower. It is also more useful.

## From loose labels to a governed local layer

The AIRI story in STEM BIO-AI changed during 1.7.x. The initial direction was simple: use AIRI to provide broader risk labels around local findings. That was useful, but not enough. If an audit system relies on an external risk source, it needs governance around that source. So STEM BIO-AI separates AIRI into three local layers:

| Local layer | Purpose |
| --- | --- |
| `airi_registry_full.v1.json` | normalized full local registry derived from the upstream AIRI snapshot |
| `airi_runtime_bundle.v1.json` | curated runtime subset used by deterministic scans |
| `airi_detector_mapping.v1.json` | detector-to-risk mapping registry plus known-gap records |

This separation prevents a common mistake: confusing the full upstream AIRI universe with the smaller curated runtime bundle used by the scanner. The scanner uses the curated runtime bundle, not the entire upstream AIRI universe. That keeps runtime outputs deterministic, reviewable, and tied to a known local snapshot.

In the current 1.7.5 state of the 1.7.x line, "governed" does not mean that every mapping has gone through an external review board. It means something narrower and more concrete:

- AIRI data is stored as versioned local artifacts
- runtime scan output uses a curated bundle, not the entire upstream universe
- detector mappings are separated from the full registry
- known gaps are recorded as part of the mapping layer
- artifact metadata surfaces AIRI registry, bundle, mapping, snapshot, and license information
- changes to registry, runtime bundle, or mapping versions require explicit version bumps

That is the current governance level. It is not final. But it is stronger than attaching a risk dataset as an unversioned appendix.
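As a sketch of how those rules could be enforced when the layers are loaded, the following assumes a hypothetical package layout and hypothetical field names (`version`, `risks`, `detector_to_risks`). The artifact file names are the ones from the table above, but STEM BIO-AI's real schemas are not reproduced here.

```python
import json
from pathlib import Path

# Hypothetical location; the real package layout may differ.
DATA = Path("stem_bio_ai/data/airi")


def load_airi_layers() -> dict:
    """Load the three AIRI layers and enforce basic governance invariants."""
    registry = json.loads((DATA / "airi_registry_full.v1.json").read_text())
    bundle = json.loads((DATA / "airi_runtime_bundle.v1.json").read_text())
    mapping = json.loads((DATA / "airi_detector_mapping.v1.json").read_text())

    # Invariant 1: every layer declares its own version
    # (no unversioned appendix).
    for layer in (registry, bundle, mapping):
        if "version" not in layer:
            raise ValueError("unversioned AIRI artifact")

    # Invariant 2: the curated runtime bundle is a strict subset of the
    # full local registry, never a parallel universe.
    registry_ids = {r["id"] for r in registry["risks"]}
    bundle_ids = {r["id"] for r in bundle["risks"]}
    if not bundle_ids <= registry_ids:
        raise ValueError("runtime bundle entry missing from registry")

    # Invariant 3: detector mappings may only point at curated bundle entries.
    for detector, risk_ids in mapping["detector_to_risks"].items():
        if not set(risk_ids) <= bundle_ids:
            raise ValueError(f"{detector} maps outside the runtime bundle")

    return {"registry": registry, "bundle": bundle, "mapping": mapping}
```

Failing loudly at load time keeps a mismatched bundle or mapping from silently shaping scan output.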
## Curation by exclusion

This is the part that matters most. AIRI is broad. STEM BIO-AI is narrow. STEM BIO-AI does not need every AIRI entry active at runtime. It needs the subset that can be responsibly connected to deterministic repository evidence. So the runtime bundle is curated by exclusion as much as by inclusion. A risk vocabulary node should stay outside the runtime bundle when:

- no local evidence surface exists
- the mapping would require causal inference
- the risk is too broad for repository-local evidence
- the mapping would confuse vocabulary with score authority

So the runtime bundle is not a summary of all AI risk. It is the subset of risk vocabulary that the scanner can use responsibly.

## A concrete example

A concrete example helps. Suppose STEM BIO-AI detects a shallow validator around sensitive or clinical-adjacent inputs. The local finding might be:

> `CC3_shallow_validator`: a `validate_*` or `check_*` function uses only length checks without structural validation.

At the repository level, this is a code-contract finding. It says:

- the function appears to validate input
- the validation is shallow
- the implementation may not enforce the boundary implied by its name

The AIRI layer should not turn that into "this repository caused privacy harm." That would be too strong. A safer mapping uses AIRI as vocabulary:

| Local detector surface | Local meaning | AIRI vocabulary anchor |
| --- | --- | --- |
| `CC3_shallow_validator` | validation function appears shallower than its name implies | 7.3 Lack of capability or robustness; possibly 2.1 Compromise of privacy... if sensitive information handling is in scope |
| fail-open exception path | code path may silently continue after failure | 7.3 Lack of capability or robustness |
| hardcoded credential signal | repository surface suggests exposed secret-like pattern | 2.2 AI system security vulnerabilities and attacks |
| weak provenance surface | repository gives weak evidence about data/source traceability | 7.4 Lack of transparency or interpretability; possibly 6.5 Governance failure |
| silent mock fallback | production-like path may fall back to simulated behavior | 7.3 Lack of capability or robustness; 7.4 Lack of transparency or interpretability |

The mapping does not prove harm. It tells the reviewer which broader AIRI vocabulary may be relevant to the local finding. That is the difference between "this detector proves a risk occurred" and "this detector finding belongs near this risk-language area." The second claim is weaker. It is also the correct claim.
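To show the flavor of the evidence behind a finding like `CC3_shallow_validator`, here is a simplified heuristic built on Python's `ast` module. It is a sketch only: the real detector's rules and thresholds are not reproduced here, and a production check would need to handle far more cases.

```python
import ast

SHALLOW_ONLY = ("len",)  # calls that count as "length-only" validation


def find_shallow_validators(source: str) -> list[str]:
    """Flag validate_*/check_* functions whose only checks call len().

    A simplified heuristic in the spirit of a shallow-validator check,
    not the actual CC3 detector.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        if not node.name.startswith(("validate_", "check_")):
            continue
        calls = [
            n.func.id
            for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
        ]
        # Shallow: the function performs checks, but every call it makes
        # is a bare len() -- no structural or schema validation in sight.
        if calls and all(name in SHALLOW_ONLY for name in calls):
            findings.append(node.name)
    return findings


sample = """
def validate_record(rec):
    return len(rec) > 0
"""
print(find_shallow_validators(sample))  # ['validate_record']
```

Note what the sketch does and does not claim: it reports that a validator-shaped function performs only length-style checks, and nothing about downstream harm.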
## Governance questions for an external source

AIRI is external. That means STEM BIO-AI needs to answer governance questions explicitly:

- Which upstream snapshot is being used?
- Which subset is active at runtime?
- Which risks are included in the curated bundle?
- Which risks are known gaps?
- Which detector maps to which AIRI entry?
- What version of the mapping is active?

This is why the AIRI work matters. It is not just adding labels. It is turning risk vocabulary into a governed local data layer. In the current governance note, the upstream source is recorded as:

- upstream source: https://airisk.mit.edu/
- upstream artifact: The AI Risk Repository V4_03
- upstream license: MIT
- local snapshot date: 2026-04-23

That provenance is not cosmetic. It allows an audit artifact to say which risk vocabulary it was using when the scan was produced.

## What is implemented today

The current AIRI layer is implemented, but bounded. Implemented surfaces include:

- AIRI-backed coverage surfaces in scan outputs
- a local curated runtime bundle
- local registry and mapping schemas
- a detector-to-AIRI mapping layer
- known-gap reporting
- provenance and bundle/source labeling

In current scan results, `airi_risk_coverage` is the main artifact surface for this layer. The public result contract includes AIRI fields such as:

- `airi_registry_version`
- `airi_bundle_version`
- `airi_mapping_version`
- `airi_bundle_scope`
- `airi_upstream_snapshot_date`
- `airi_upstream_license`
- `total_risks_in_registry`
- `total_risks_in_bundle`
- `total_risks_in_detector_scope`
- `detectors_triggered`
- `covered_risks`
- `covered_count`
- `coverage_rate`
- `known_gaps_in_bundle`
- `known_gaps_outside_bundle`

These fields matter because they let a reviewer distinguish three things that are easy to confuse: the upstream AIRI source, the local runtime bundle, and the detector mapping actually used by the scan. The important part is not only that these fields exist. The important part is that AIRI usage becomes auditable from the artifact itself. If two scans use different AIRI snapshots or mappings, that difference should not be hidden.

## Coverage is an audit surface, not a safety score

AIRI coverage in STEM BIO-AI is an audit-surface concept, not a safety percentage. It does not mean:

- the repository is safe
- the repository is unsafe
- the scanner covers all AI risk
- the covered percentage is a compliance score

It means: a local deterministic finding has been mapped to a known risk-vocabulary entry inside the curated AIRI runtime layer. That is useful because it gives reviewers a wider frame. But it does not turn local evidence into a global safety claim. This is the same discipline used elsewhere in STEM BIO-AI:

- scoring is not clinical validation
- advisory interpretation is not scoring authority
- reproducibility evidence is not automatic score authority
- AIRI coverage is not a safety percentage

Each layer has a role. Each layer has a boundary.

## How the layer evolved across 1.7.x

The 1.7.x AIRI story is not simply "we added AIRI." The actual change was a move from loose risk labeling toward governed local vocabulary, in stages:

- AIRI V4 integration appeared in scan outputs. The scanner began producing an `airi_risk_coverage` section that maps triggered detector findings to AIRI risk IDs, coverage rate, and known gaps. The same release also introduced Layer 2 AST contract detectors such as CC1, CC2, and CC3, which expanded the local detector surface available for risk-vocabulary mapping.
- AIRI became a governed local data layer. The architecture separated the full local registry, the curated runtime bundle, and the detector mapping registry. This release also replaced hardcoded AIRI detector mappings and known-gap lists with packaged local registry files. Runtime outputs began surfacing registry version, bundle version, mapping version, upstream snapshot date, license, and attribution note, and split known gaps into `known_gaps_in_bundle` and `known_gaps_outside_bundle`.
- No major AIRI architecture change. The important governance point was regression stability: a same-target self-scan comparison verified no drift in `airi_risk_coverage` alongside score, tier, code contract, detector summary, and evidence ledger count.
- No major AIRI architecture change. The release focused on runtime cleanup, stale demo wording fixes, layout stabilization, and output routing.
- AIRI presentation became clearer across demo and report outputs. The release surfaced AIRI summary material more clearly across the Hugging Face overview card and the markdown/explain report sections.
- No new AIRI data architecture change, but artifact-level governance improved more broadly through additive evidence-ledger quality fields and audit-freshness metadata. That matters because AIRI is most useful when it lives inside a report surface that already carries freshness, evidence quality, and provenance signals.
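The regression-stability point can be illustrated with a small comparison helper. The field names follow the public result contract listed above; the helper itself (`airi_drift`) and the sample values are hypothetical.

```python
# AIRI fields that should stay identical across a same-target self-scan.
PINNED_FIELDS = (
    "airi_registry_version",
    "airi_bundle_version",
    "airi_mapping_version",
    "airi_upstream_snapshot_date",
    "covered_count",
    "coverage_rate",
)


def airi_drift(before: dict, after: dict) -> dict:
    """Return the AIRI fields whose values changed between two scans."""
    return {
        field: (before.get(field), after.get(field))
        for field in PINNED_FIELDS
        if before.get(field) != after.get(field)
    }


# Same-target self-scan comparison: an empty result means no AIRI drift.
scan_a = {"airi_mapping_version": "v1", "covered_count": 12, "coverage_rate": 0.39}
scan_b = {"airi_mapping_version": "v1", "covered_count": 12, "coverage_rate": 0.39}
assert airi_drift(scan_a, scan_b) == {}
```

If two artifacts disagree on any pinned field, the difference is surfaced rather than hidden, which is exactly the auditability property described above.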
## Boundaries that stay

The important change across the line is this: AIRI moved from an attached dataset toward a versioned local risk-vocabulary layer. The AIRI layer still does not:

- verify real incidents
- prove causality
- certify repository safety
- replace domain review
- turn AIRI categories into deterministic truth claims
- collapse the full upstream AIRI universe into the runtime scanner

Those are not missing features. They are the boundaries that keep the layer useful.

## What comes next

The next useful direction is not to overload the scanner with external systems. It is to improve:

- registry provenance
- bundle governance
- mapping confidence
- known-gap clarity
- artifact-visible mapping metadata
- disciplined links to incident-oriented resources

The broader MIT AIRI ecosystem also includes related incident-oriented resources such as the AI Incident Tracker. That ecosystem is relevant context, but it is not the same thing as current runtime integration in STEM BIO-AI. A future version may choose to reference incident-oriented resources more explicitly, but deterministic scans should not ingest them casually or blur them with repository-local findings.

A future version should be able to say not only "this detector maps to this AIRI risk vocabulary area," but also "this mapping has this confidence level, this review status, this local evidence family, and this known limitation." That is the next governance step.
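One way to picture that step is a mapping record that carries its own confidence metadata. This is a hypothetical shape, not a committed schema; every field name here is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRecord:
    """Hypothetical shape for a confidence-annotated mapping entry."""
    detector_id: str        # local detector family, e.g. "CC3_shallow_validator"
    airi_risk_id: str       # curated-bundle entry the detector maps to
    confidence: str         # e.g. "high" | "medium" | "low"
    review_status: str      # e.g. "unreviewed" | "maintainer-reviewed"
    evidence_family: str    # local evidence surface backing the mapping
    known_limitation: str   # why the mapping may not hold


record = MappingRecord(
    detector_id="CC3_shallow_validator",
    airi_risk_id="7.3",
    confidence="medium",
    review_status="maintainer-reviewed",
    evidence_family="AST contract surface",
    known_limitation="cannot see validation enforced outside the repository",
)
```

The design point is that the limitation travels with the mapping, so a reviewer never sees the vocabulary anchor without its caveat.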
That is the role of AIRI in this release line. Not truth replacement. Not safety certification. Not incident proof. A governed vocabulary bridge. Local evidence first. External vocabulary second. Explicit provenance always.

## Links

- MIT AI Risk Repository: https://airisk.mit.edu/
- MIT AI Incident Tracker: https://airisk.mit.edu/ai-incident-tracker
- STEM BIO-AI repository: https://github.com/flamehaven01/STEM-BIO-AI

This AIRI-related direction in STEM BIO-AI was informed by broader public AI risk work, including the MIT AI Risk Repository ecosystem. The framing of AIRI as a broader risk-vocabulary layer, rather than a repository-local truth layer, was also strengthened by public commentary and ecosystem work from people in this space, including Peter Slattery, PhD. These references informed the vocabulary and governance direction described here. They do not imply endorsement of STEM BIO-AI or responsibility for its implementation choices.