
NIST AI RMF crosswalk for PRML — Govern, Measure, Manage subcategory map

The NIST AI Risk Management Framework 1.0 (NIST AI 100-1, January 2023) and its Generative AI Profile (NIST AI 600-1, July 2024) are among the most-cited voluntary standards for U.S. AI risk programs. Pre-Registered ML Manifests (PRML) don't replace the framework; they mechanically satisfy a specific slice of it. This page walks through the relevant subcategories one by one.

Scope and posture

PRML (v0.1 spec, Zenodo DOI 10.5281/zenodo.20177839) commits an evaluation claim — metric, comparator, threshold, dataset hash, seed, producer identity — to a SHA-256 hash before the experiment runs. Any retroactive edit breaks the hash. The format is plain UTF-8 YAML; the canonicalization rules are deterministic across four reference implementations.
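The sketch below makes the hash-commitment idea concrete in Python. It rests on two assumptions:

- It uses a stand-in canonical form (sorted-key, compact JSON), not the PRML v0.1 YAML canonicalization rules.
- It uses the field names from the paragraph above with placeholder values.

`manifest_hash` is a hypothetical helper, not the reference implementation's API.

```python
import hashlib
import json


def manifest_hash(manifest: dict) -> str:
    """SHA-256 over a stand-in canonical form of a manifest.

    Assumption: sorted-key, compact JSON encoded as UTF-8. PRML v0.1
    canonicalizes YAML under its own rules; this only illustrates the idea.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Field names follow the prose above; every value is an illustrative placeholder.
claim = {
    "metric": "f1",
    "comparator": ">=",
    "threshold": 0.78,
    "dataset": {"hash": "0" * 64},  # placeholder content hash
    "seed": 42,
    "producer": {"id": "example-team"},
}

locked = manifest_hash(claim)  # committed before the experiment runs

# Key order does not matter, because the canonical form sorts keys.
reordered = dict(reversed(list(claim.items())))
assert manifest_hash(reordered) == locked

# A retroactive edit (lowering the threshold after a weak run) changes the hash.
edited = {**claim, "threshold": 0.70}
assert manifest_hash(edited) != locked
```

Sorting keys before hashing is what makes the commitment independent of how a producer happened to order the fields. Any change to a value still shows up as a different digest.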

This is a primitive, not a programme. The AI RMF asks an organization to govern, map, measure, and manage AI risk over the system's lifecycle. PRML lives inside MEASURE and the parts of GOVERN and MANAGE that touch evaluation evidence. It does not, on its own, satisfy MAP (context, stakeholders, capabilities) or the human-process parts of GOVERN.

The three coverage tags used below:

GOVERN — relevant subcategories

| Subcat. | Text (paraphrased) | PRML mechanism | Coverage |
|---|---|---|---|
| GOVERN 1.4 | The organization defines, deploys, and documents processes governing AI design, development, and risk management. | Manifest producer.id + the locked SHA-256 are the documented process artifact for the evaluation-decision step of that lifecycle. | Partial |
| GOVERN 1.5 | Ongoing monitoring and periodic review of the risk management process and its outcomes are planned. | The prior_hash amendment chain is itself the periodic-review evidence. Each re-evaluation produces a forward-only link. | Partial |
| GOVERN 4.1 | Organizational policies and practices are in place to foster a critical thinking and safety-first mindset; AI risks are documented. | A manifest is a falsifiable commitment. Committing to f1 >= 0.78 publicly via the registry is a concrete, publicly checkable safety-first signal. | Partial |
| GOVERN 5.1 | Organizational policies and procedures are in place to address AI risks, including documentation. | The locked manifest is the documentation of the evaluation-risk-acceptance decision. SHA-256 makes it tamper-evident over the system's lifetime. | Full |
| GOVERN 6.1 | Policies and procedures are in place to address risks associated with third-party software and data. | Manifest dataset.hash binds the evaluation to a specific dataset content hash. Third-party dataset drift fails verification. | Full |
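The GOVERN 6.1 row rests on one check: recompute the dataset's content hash and compare it to dataset.hash.

This is a minimal sketch. It assumes dataset.hash is a plain hex SHA-256 over the raw dataset bytes; the spec may hash files or directories differently. `dataset_hash` and `dataset_matches` are hypothetical helpers.

```python
import hashlib


def dataset_hash(data: bytes) -> str:
    # Assumption: dataset.hash is a hex SHA-256 over the raw dataset bytes.
    return hashlib.sha256(data).hexdigest()


def dataset_matches(manifest: dict, data: bytes) -> bool:
    # True only if the bytes on disk are exactly the bytes that were committed.
    return manifest["dataset"]["hash"] == dataset_hash(data)


original = b"id,label\n1,0\n2,1\n"
manifest = {"dataset": {"hash": dataset_hash(original)}}

assert dataset_matches(manifest, original)

# A third party silently flips one label upstream: verification fails.
drifted = b"id,label\n1,0\n2,0\n"
assert not dataset_matches(manifest, drifted)
```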

MAP — relevant subcategories

PRML is largely out of scope for MAP. MAP asks the organization to understand context, capabilities, and the people affected. A manifest captures a single evaluation claim, not the surrounding mission, intended-use, or impact-assessment work. Listed for completeness:

| Subcat. | Text (paraphrased) | PRML mechanism | Coverage |
|---|---|---|---|
| MAP 2.3 | Scientific integrity and TEVV considerations are identified and documented. | PRML is a TEVV (test/evaluation/verification/validation) artifact. The manifest is the format in which a single TEVV claim becomes contestable. | Partial |
| MAP 4.1 | Approaches for mapping AI technology and legal risks are followed. | Manifest fields map directly to EU AI Act Article 12 obligations (see the Article 12 crosswalk). | Partial |

MEASURE — the centre of gravity

This is where PRML earns its keep. MEASURE asks the organization to characterise, monitor, and evaluate. A locked manifest is the unit of evaluation evidence the framework keeps referring to without quite naming.

| Subcat. | Text (paraphrased) | PRML mechanism | Coverage |
|---|---|---|---|
| MEASURE 1.1 | Approaches and metrics for measuring AI risks are followed and the methodology is documented. | Manifest metric + comparator + threshold are exactly that documented methodology, hash-bound before the run. | Full |
| MEASURE 2.3 | AI system performance or assurance criteria are measured qualitatively or quantitatively. | Manifest commits the quantitative criterion (threshold) and the dataset against which it is measured. Exit code 0/10 is the deterministic measurement signal. | Full |
| MEASURE 2.7 | AI system security and resilience are evaluated and documented. | Tamper-evidence on the claim itself (exit 3 / TAMPERED on any drift) is a security control over the evaluation surface. It does not cover the model inference path, which is a separate concern. | Partial |
| MEASURE 2.8 | Risks associated with AI system transparency and accountability are examined and documented. | Manifest producer.id is the accountability anchor. The public registry permalink is the transparency surface. The prior_hash chain documents accountability across re-evaluations. | Full |
| MEASURE 3.1 | Approaches, personnel, and documentation for measurement of AI risks are validated. | The four byte-equivalent reference implementations (Python, JS, Go, Rust) plus 20 published conformance vectors are the measurement-approach validation. | Full |
| MEASURE 3.3 | Measurable performance improvements or declines are tracked over time. | The amendment chain is exactly this: successive manifests with monotonic timestamps and explicit prior_hash pointers. | Full |
| MEASURE 4.1 | Approaches for measurement are in place that enable comparison to baselines and benchmarks. | Manifest dataset.hash + seed + metric together fix all the comparison-stable inputs. Two manifests on the same dataset hash are comparable by construction. | Full |
| MEASURE 4.2 | Measurement results regarding AI system trustworthiness are informed by input from independent assessors. | Anyone with the manifest and a reference implementation can re-derive the hash and re-run the verification offline. Independent assessment requires no producer trust. | Full |
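The amendment chain cited under MEASURE 3.3 (and GOVERN 1.5) can be checked with two rules:

- Each manifest's prior_hash equals the hash of its predecessor.
- Timestamps strictly increase.

This is a sketch under stated assumptions:

- Canonicalization uses the JSON stand-in, not the PRML rules.
- Timestamps are integer Unix seconds in a hypothetical `timestamp` field.
- The root manifest's prior_hash is `None`.

`amend` and `verify_chain` are illustrative names, not PRML API.

```python
import hashlib
import json
from typing import Optional


def manifest_hash(manifest: dict) -> str:
    # Stand-in canonicalization (assumption): sorted-key compact JSON, UTF-8.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def amend(prior: Optional[dict], timestamp: int, **claim) -> dict:
    # Hypothetical helper: a new manifest that points back at its predecessor.
    prior_hash = manifest_hash(prior) if prior is not None else None
    return {**claim, "timestamp": timestamp, "prior_hash": prior_hash}


def verify_chain(chain: list[dict]) -> str:
    """Return the terminal hash, or raise ValueError on a broken chain."""
    if not chain or chain[0]["prior_hash"] is not None:
        raise ValueError("chain must start with a root manifest")
    for prev, cur in zip(chain, chain[1:]):
        if cur["prior_hash"] != manifest_hash(prev):
            raise ValueError("prior_hash does not match predecessor")
        if cur["timestamp"] <= prev["timestamp"]:
            raise ValueError("timestamps are not monotonic")
    return manifest_hash(chain[-1])


v1 = amend(None, 1_700_000_000, metric="f1", comparator=">=", threshold=0.78)
v2 = amend(v1, 1_700_086_400, metric="f1", comparator=">=", threshold=0.80)
terminal = verify_chain([v1, v2])  # the single hash an auditor needs to check

# Editing v1 after v2 was locked breaks the forward link.
tampered_v1 = {**v1, "threshold": 0.70}
try:
    verify_chain([tampered_v1, v2])
except ValueError as err:
    print(err)  # prior_hash does not match predecessor
```

Because each link hashes the full predecessor, verifying the terminal hash transitively covers every earlier manifest in the chain.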

MANAGE — relevant subcategories

| Subcat. | Text (paraphrased) | PRML mechanism | Coverage |
|---|---|---|---|
| MANAGE 2.2 | Mechanisms are in place to sustain AI system value after deployment. | The amendment chain is the sustainment mechanism for the evaluation-claim layer. Re-evaluations link to predecessors via prior_hash. | Partial |
| MANAGE 4.1 | Post-deployment AI system monitoring is regularly conducted; documentation is current. | Each scheduled re-evaluation produces a new manifest. The chain's terminal hash is the monitoring summary an auditor can verify offline. | Partial |
| MANAGE 4.2 | Mechanisms are in place to capture, evaluate, and respond to AI system errors and failures. | FAIL (exit 10) and TAMPERED (exit 3) are the deterministic error signals. CI gating on those exit codes is the response mechanism in code. | Full |
| MANAGE 4.3 | Incidents and errors are communicated to relevant AI actors. | A failing PRML verification in CI surfaces a deterministic exit code and the diff between expected and observed hashes. The communication is the build log. | Full |
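MANAGE 4.2 and 4.3 lean on the exit-code contract: 0 for a passing claim, 10 for FAIL, 3 for TAMPERED.

The sketch below shows how a verifier could derive those codes. It is not the reference CLI, and it assumes two things:

- Canonicalization uses the same JSON stand-in as a PRML substitute.
- The comparator set is assumed; the page itself only shows `>=`.

```python
import hashlib
import json
import operator

PASS, FAIL, TAMPERED = 0, 10, 3  # exit codes as described on this page

# Assumed comparator set; only ">=" appears in the prose above.
COMPARATORS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


def manifest_hash(manifest: dict) -> str:
    # Stand-in canonicalization (assumption): sorted-key compact JSON, UTF-8.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(manifest: dict, locked_hash: str, observed: float) -> int:
    # Tamper check first: any drift in the committed claim overrides PASS/FAIL.
    if manifest_hash(manifest) != locked_hash:
        return TAMPERED
    compare = COMPARATORS[manifest["comparator"]]
    return PASS if compare(observed, manifest["threshold"]) else FAIL


claim = {"metric": "f1", "comparator": ">=", "threshold": 0.78}
locked = manifest_hash(claim)

code = verify(claim, locked, observed=0.74)  # 0.74 is an illustrative score
print(code)  # 10
```

In CI, a wrapper would pass the returned code to `sys.exit`, so a FAIL or TAMPERED result fails the job and lands in the build log. Checking the hash before the threshold means a tampered manifest can never report PASS.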

NIST AI 600-1 — Generative AI Profile

The Generative AI Profile (July 2024) describes 12 risks that are unique to or exacerbated by generative AI. Most of them sit upstream of PRML: data provenance, harmful content moderation, intellectual property handling. Two categories where PRML contributes:

| Risk | PRML mechanism | Coverage |
|---|---|---|
| Confabulation: fabrication of plausible but ungrounded outputs, including for evaluation results. | A pre-registered manifest makes a confabulated evaluation outcome detectable. The pre-run hash blocks the most common form: silently rewriting a threshold after the model underperforms. | Full |
| Information Integrity: unauthorized changes to training, fine-tuning, or evaluation data. | Manifest dataset.hash binds evaluation to specific dataset content. Any swap of the underlying dataset breaks the chain. | Full |

What this map does not give you

The honest read: of NIST AI RMF 1.0's 72 subcategories, PRML contributes evidence to the 19 mapped on this page, 11 of them tagged Full. The framework is mostly process and governance work: committee structures, training requirements, stakeholder engagement, incident playbooks, third-party assessments. PRML is the small primitive that makes the evaluation-evidence slice arithmetically verifiable instead of attested. That slice is often where AI risk programs are weakest, which is why it's worth naming.
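The coverage arithmetic can be checked directly against the tags in the four AI RMF tables on this page. The short tally below takes 72 as the AI RMF 1.0 subcategory total and leaves out the two GenAI Profile risks.

```python
from collections import Counter

# Coverage tags exactly as listed in the GOVERN, MAP, MEASURE, and MANAGE tables.
COVERAGE = {
    "GOVERN 1.4": "Partial", "GOVERN 1.5": "Partial", "GOVERN 4.1": "Partial",
    "GOVERN 5.1": "Full", "GOVERN 6.1": "Full",
    "MAP 2.3": "Partial", "MAP 4.1": "Partial",
    "MEASURE 1.1": "Full", "MEASURE 2.3": "Full", "MEASURE 2.7": "Partial",
    "MEASURE 2.8": "Full", "MEASURE 3.1": "Full", "MEASURE 3.3": "Full",
    "MEASURE 4.1": "Full", "MEASURE 4.2": "Full",
    "MANAGE 2.2": "Partial", "MANAGE 4.1": "Partial",
    "MANAGE 4.2": "Full", "MANAGE 4.3": "Full",
}
TOTAL_SUBCATEGORIES = 72  # NIST AI RMF 1.0

tally = Counter(COVERAGE.values())
print(len(COVERAGE), tally["Full"], tally["Partial"])  # 19 11 8
print(f"{len(COVERAGE) / TOTAL_SUBCATEGORIES:.0%} of subcategories touched")  # 26% of subcategories touched
```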

If your AI risk function works against the AI RMF because of a procurement requirement, an audit finding, or a board oversight commitment, PRML adds the most value at six MEASURE subcategories: 1.1, 2.3, 2.8, 3.3, 4.1, and 4.2. Those six are where qualitative attestation tends to collapse under audit, and where a deterministic hash is a cleaner artifact than any policy document.

Pairing PRML with the rest of the framework

The framework's GOVERN function expects policy text — actual written policies about how the organization governs AI. PRML doesn't write those. What it does is give the policies something to point at. A policy that says "Evaluation claims for production AI systems must be pre-registered via PRML before release" is a one-sentence policy that an auditor can verify by re-deriving the hash. A policy that says "Evaluation claims must be documented" is verifiable only by reading whatever PDF the team produced, which is the gap NIST keeps trying to close.

Pair the manifest with the rest of the programme:

Pre-register your first evaluation claim. Paste a PRML draft into registry.falsify.dev and get a SHA-256 permalink plus a README badge. No account, no server-side state beyond the hash. The reference implementations are at github.com/studio-11-co/falsify (Python, byte-equivalent JS / Go / Rust in sibling repos).