
EU AI Act Article 12 logging compliance: a working pattern for high-risk providers

The high-risk obligations in Regulation (EU) 2024/1689 become applicable on 2 August 2026. Article 12 is the one that most procurement teams underestimate, because "automated logging" reads like a checkbox and turns out to be a system design decision. This page is the technical pattern we use.

What Article 12 actually requires

Article 12(1) of the AI Act states that high-risk AI systems "shall technically allow for the automatic recording of events (logs) over the lifetime of the system." Article 12(2) requires logging capabilities that ensure a level of traceability appropriate to the system's intended purpose, for three ends: identifying situations that may present a risk within the meaning of Article 79(1), facilitating post-market monitoring under Article 72, and monitoring of operation by deployers under Article 26(5). Article 12(3) then enumerates the minimum events that must be captured for the biometric systems referred to in Annex III, point 1(a).

Three properties matter and tend to get missed: the recording must be automatic, not a manual process; it must span the lifetime of the system, not just the pilot; and it must be appropriate to the intended purpose, which is where interpretation begins.

The standard interpretation, consistent with the consolidated text in OJ L 2024/1689 and the editorial reading in our Article 12/17/18/50/72/73 crosswalk, is that the provider is responsible for both the per-request inference logs and the per-release evaluation-claim logs. Most procurement teams have a plan for the first. Almost no one has a clean plan for the second.

The gap: evaluation-claim logging is the soft target

You can buy an inference-logging pipeline. Datadog will sell you one. AWS, GCP, Azure all have offerings. None of them solve the evaluation-claim layer, because that layer is not an event stream — it is a small set of high-stakes commitments. "This model achieves 0.94 accuracy on this validation set with this seed." That sentence is what the auditor cares about, and it is the sentence a provider quietly rewrites under pressure when the number changes.

An Article 12 audit in 2027 will not ask "did you log requests?" It will ask "show me the evaluation claim you committed to before release v2.4, and show me how you know it has not been edited since."

If the answer is a Confluence page, the audit ends there.

The pattern: a pre-registered manifest, hashed before the run

PRML (Pre-Registered ML Manifest, v0.1 spec, Zenodo DOI 10.5281/zenodo.20177839) is the format we use to make evaluation-claim logging mechanically auditable. The manifest is a small UTF-8 YAML file that binds a SHA-256 hash to: the metric, the comparator, the threshold, the dataset content hash, the seed, the producer identity, and a forward-only prior_hash pointer to the previous claim in the chain.

A minimal Article-12-shaped manifest looks like this:

spec: prml/v0.1
claim_id: cv-screening-release-2026-q3
created_at: 2026-05-15T09:14:22Z
producer:
  id: company.example
  role: provider          # AI Act Art. 3(3)
metric: f1
comparator: ">="
threshold: 0.78
dataset:
  id: hr-screening-holdout-v3
  hash: sha256:0c4e...d811
seed: 42
prior_hash: null          # first link in the chain

Lock it before the eval runs. The CLI emits a sidecar containing the canonical SHA-256. From that moment, any retroactive edit to threshold, dataset hash, or seed will cause falsify verdict to exit with code 3 (TAMPERED) rather than 0 (PASS) or 10 (FAIL). The exit code is the audit signal.
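The verdict logic is nothing more than a hash comparison followed by a metric comparison. A minimal sketch in Python, assuming for illustration that the canonical form is simply the exact UTF-8 bytes of the locked manifest and that the sidecar stores the hex digest (the real canonicalization rules live in the v0.1 spec; the function names here are hypothetical, not the reference implementation's API):

```python
import hashlib

PASS, TAMPERED, FAIL = 0, 3, 10  # exit codes from the spec

def canonical_hash(manifest_bytes: bytes) -> str:
    # Assumption: canonical form == the exact UTF-8 bytes locked at commit time.
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()

def verdict(manifest_bytes: bytes, sidecar_hash: str, observed: float,
            threshold: float, comparator: str = ">=") -> int:
    # 1. Integrity: has the manifest changed since it was locked?
    if canonical_hash(manifest_bytes) != sidecar_hash:
        return TAMPERED
    # 2. Substance: does the observed metric satisfy the pre-registered claim?
    ok = observed >= threshold if comparator == ">=" else observed <= threshold
    return PASS if ok else FAIL
```

Note the ordering: a single edited byte flips the verdict to 3 before the metric is even consulted, which is exactly the property that makes the exit code usable as an audit signal.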

Mapping the manifest to Article 12 obligations

Article 12 obligation                           PRML mechanism
Automatic recording of evaluation events        lock emits manifest + sidecar; deterministic exit codes are the event record
Identifying risk situations                     FAIL (10) and TAMPERED (3) are deterministic, not interpretive
Facilitating post-market monitoring (Art. 72)   prior_hash chain accumulates over the system lifetime
Lifetime preservation                           plain UTF-8 YAML + 32-byte hash; readable in 2046 without a vendor runtime

The amendment chain is the audit log

When you re-evaluate — quarterly retraining, post-incident retest, a fix landing after a TAMPERED verdict — you do not edit the prior manifest. You write a new one with prior_hash pointing at the previous canonical hash. The chain is forward-only. The chain's terminal hash compresses the entire history into a single 32-byte value that an auditor can verify offline.

spec: prml/v0.1
claim_id: cv-screening-release-2026-q3-amend-1
created_at: 2026-08-09T11:02:51Z
producer: { id: company.example, role: provider }
metric: f1
comparator: ">="
threshold: 0.78
dataset:
  id: hr-screening-holdout-v3.1   # bias remediation rerun
  hash: sha256:9ab7...c124
seed: 42
prior_hash: sha256:a3f9...c821    # Q3 release manifest above
reason: |
  Re-evaluation triggered by deployer-reported regression
  on the 30-44 age cohort. Article 72 incident folder #IN-2026-08-007.

The auditor in 2031 does not need access to your MLOps stack. They need the YAML, the sidecar, and a copy of falsify — or any of the four reference implementations in Python, JavaScript, Go, or Rust. They recompute the canonical bytes, recompute the chain hash, compare. Integrity verification requires no provider trust.
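That offline recomputation is a short fold over the chain. A sketch of what the auditor's check could look like, assuming each manifest is presented oldest-first alongside its claimed sidecar hash and its prior_hash field (the chain-walk helper is illustrative, not the reference implementation):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()

def verify_chain(entries):
    """entries: list of (manifest_bytes, claimed_hash, prior_hash),
    ordered oldest-first. Returns the terminal hash, or raises."""
    expected_prior = None  # first link must carry prior_hash: null
    terminal = None
    for manifest_bytes, claimed_hash, prior_hash in entries:
        recomputed = sha256_hex(manifest_bytes)
        if recomputed != claimed_hash:
            raise ValueError(f"tampered manifest: {claimed_hash}")
        if prior_hash != expected_prior:
            raise ValueError("broken prior_hash link")
        expected_prior = terminal = recomputed
    return terminal  # one value summarising the whole history
```

Every manifest is re-hashed and every link re-checked, so the returned terminal hash vouches for the entire amendment history without trusting the provider's storage.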

What this does not give you

PRML covers the evaluation-claim slice of Article 12. It does not log per-request inferences, it does not implement the quality management system required by Article 17, and it does not perform the substantive accuracy assessment required by Article 15. The full crosswalk is explicit about what is full coverage (Article 12 evaluation events, Article 18 retention, Article 72 monitoring), partial coverage (Article 17 records, Article 73 incident detection), and out of scope (Articles 9, 10, 14, 15, 27).

Pair PRML with whatever inference-logging stack you already trust. The two layers don't overlap.

What to do this quarter

Twelve weeks remain until the 2 August 2026 applicability date. Concretely:

  1. Identify every high-risk system you place on the EU market under Annex III. CV screening, credit scoring, biometric ID, critical infrastructure, education access — the list is enumerated.
  2. For each, list the named evaluation claims (accuracy, F1, calibration, latency SLA, fairness metric) that you would defend in writing today.
  3. Lock each one as a PRML manifest before the next release. Sign it (producer.signature is optional in v0.1, mandatory in v0.2).
  4. Store the manifest + sidecar in an immutable bucket. Wire falsify verdict into CI so a TAMPERED exit halts the release.
  5. Anchor the chain hash somewhere external. registry.falsify.dev provides a free public anchor.
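Step 4 reduces to mapping one exit code onto a release decision. A CI-gate sketch, assuming falsify verdict takes the manifest path as its argument (the exact invocation is the CLI's to define; exit codes 0/3/10 are from the spec):

```python
import subprocess

TAMPERED, FAIL = 3, 10  # spec exit codes; 0 is PASS

def gate(returncode: int) -> str:
    """Map falsify verdict's deterministic exit code to a release decision."""
    if returncode == 0:
        return "release"
    if returncode == TAMPERED:
        return "halt: manifest edited after lock"  # the Article 12 red flag
    if returncode == FAIL:
        return "halt: pre-registered claim not met"
    return "halt: unknown verdict"

def ci_step(manifest_path: str) -> int:
    # Hypothetical invocation; the argument form is an assumption.
    proc = subprocess.run(["falsify", "verdict", manifest_path])
    decision = gate(proc.returncode)
    print(decision)
    return 0 if decision == "release" else 1
```

In the pipeline, `raise SystemExit(ci_step("manifests/release.yaml"))` makes any non-PASS verdict fail the job, so a TAMPERED manifest can never ship silently.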

Lock your first manifest. Paste a PRML draft into registry.falsify.dev and get a SHA-256 permalink with a README badge. No account. No server-side state beyond the hash. Or read the v0.1 spec and the Article 12/17/18/50/72/73 crosswalk directly.