Skip to content

Tier Policy Catalog (runtime tiers.yaml)

Overview

Aspect Details
Purpose Explain each policy key in tiers.yaml and its rationale.
Audience Operators auditing tier defaults and guard thresholds.
Supported tiers balanced, conservative (aggressive is research-only).
Source of truth Packaged runtime/tiers.yaml (override via INVARLOCK_CONFIG_ROOT).

Quick Start

# Inspect resolved tier policies in a report
invarlock report explain \
  --subject-report runs/subject/report.json \
  --baseline-report runs/baseline/report.json

Quick Comparison

Aspect Balanced Conservative
PM Ratio Limit ≤ 1.10 ≤ 1.05
Spectral σ Quantile 0.95 0.90
Spectral Deadband 0.10 0.05
Spectral Max Caps 5 3
RMT ε (all families) 0.01 0.01
VE Predictive Gate One-sided CI Two-sided CI
VE min_effect_lognll 0.0 0.016
Bootstrap Replicates ≥ 1200 ≥ 1500
Min Windows 180/180 220/220
Use Case Standard edits, dev/CI High-stakes releases

Values are from the packaged tiers.yaml; inspect with invarlock report explain.

Concepts

  • Calibrated vs policy keys: calibrated values come from pilot runs; policy keys define safety margins and floors.
  • Resolved policies are recorded in reports under resolved_policy.*.

Policy resolution order (highest → lowest)

  1. Explicit guard overrides in config (guards.* in YAML, deep‑merged).
  2. Edit adjustments (EDIT_ADJUSTMENTS, e.g. quant_rtn tweaks rmt.deadband).
  3. Profile guard overrides (runtime/profiles/<profile>.yamlguards.*).
  4. Runtime tiers.yaml (via INVARLOCK_CONFIG_ROOT or packaged data).
  5. Built‑in defaults (TIER_POLICIES).

Key override matrix

Setting Tier default Profile override Edit adjustment Config override Winner rule Confirm
spectral.sigma_quantile Config > profile > tier. resolved_policy.spectral.sigma_quantile.
rmt.deadband Config > edit > profile > tier. resolved_policy.rmt.deadband.
variance.max_calib Config > profile > tier. resolved_policy.variance.max_calib.

Metric gates (metrics.pm_ratio, metrics.pm_tail, metrics.accuracy) are resolved from tier policies (tiers.yaml + defaults) and surfaced under resolved_policy.metrics.*.

Reference

Plain language: The packaged runtime/tiers.yaml is the source of truth for tier defaults. Some values are calibrated from pilot/null runs (e.g., Spectral κ, RMT ε, VE min‑effect). The rest are policies (explicit design choices like sample-size floors, dead bands, and caps). This page is the “why map”: for every key in tiers.yaml, it explains what it controls and where to point for the rationale.

Location

  • Packaged default: runtime/tiers.yaml
  • Override: set INVARLOCK_CONFIG_ROOT and provide runtime/tiers.yaml under it (see docs/reference/env-vars.md).

Tier scope

Balanced and Conservative are the supported published assurance tiers; Aggressive is research‑oriented and explicitly outside the assurance case (see docs/assurance/00-assurance-case.md).

Catalog (what + why)

This page documents the tier keys grouped by section. Each section follows the same structure:

  • What it controls (runtime behavior)
  • Where documented (assurance notes / method write-ups)
  • Keys (key-by-key meaning)
  • Observability (where it appears in reports/evaluation artifacts)

Primary-metric gates (metrics.*)

What it controls. Run-level acceptance gates applied when generating/verifying a report (see “Quality Gates” in docs/assurance/04-guard-contracts.md).

Where documented.

  • docs/assurance/04-guard-contracts.md (gate definitions + flags)

Observability.

  • Resolved thresholds: resolved_policy.metrics.*
  • Gate flags: validation.primary_metric_acceptable, validation.primary_metric_tail_acceptable
  • CLI: invarlock report explain prints the resolved thresholds, floors, and outcomes.
metrics.pm_ratio.* (ppl-like kinds)

Keys.

  • ratio_limit_base (policy) — the baseline-relative gate for ppl-like primary metrics (ratio_vs_baseline ≤ ratio_limit_base, and when a CI exists, ratio_ci.upper ≤ ratio_limit_base). Rationale and tier intent are described in docs/assurance/04-guard-contracts.md (“Primary metric … ppl-like kinds”).
  • min_tokens (policy) — minimum total tokens (preview + final) required before enforcing the ppl ratio gate. Rationale: prevents noisy PASS/FAIL on tiny samples; keeps CI smokes meaningful while allowing small local demos.
  • min_token_fraction (policy) — dataset-scale-aware floor: when the runner knows available tokens, the effective floor becomes max(min_tokens, ceil(tokens_available * min_token_fraction)). Rationale: avoids “passing” large datasets using an unrepresentative tiny subset.
  • hysteresis_ratio (policy) — small additive slack on the ratio gate (ratio_limit_base + hysteresis_ratio). Rationale: avoids PASS/FAIL flapping when results hover near the boundary; reports mark when hysteresis was needed (validation.hysteresis_applied).

Observability.

  • Resolved policy: resolved_policy.metrics.pm_ratio
  • Evidence: primary_metric.{ratio_vs_baseline,display_ci}
  • Gate flag: validation.primary_metric_acceptable
metrics.pm_tail.* (Primary Metric Tail gate; ppl-like kinds)

What it controls. A tail-regression backstop computed on paired per-window ΔlogNLL samples vs the baseline (window-by-window logloss_subject - logloss_baseline, matched by window_id on the final schedule). It is additive to the mean/CI primary-metric gate.

Keys.

  • mode (policy)off|warn|fail.
  • warn (default): violations are recorded in the report but do not fail validation (validation.primary_metric_tail_acceptable stays true).
  • fail: violations fail validation and can trigger rollback in internal run pipelines (rollback_reason = primary_metric_tail_failed).
  • min_windows (policy) — minimum paired windows required before evaluating thresholds. Underpowered runs set primary_metric_tail.evaluated = false and do not warn/fail.
  • quantile (policy) — which percentile to monitor (default 0.95 → P95). Quantiles are computed unweighted with deterministic linear interpolation on sorted ΔlogNLL values.
  • quantile_max (policy/calibration target) — maximum allowed ΔlogNLL at the selected quantile (e.g., P95 ≤ 0.20).
  • epsilon (policy) — deadband for “tail mass”: tail_mass = Pr[ΔlogNLL > ε].
  • mass_max (policy/calibration target) — maximum allowed tail mass. Defaults to 1.0 (non-binding) until calibrated.

Observability.

  • report evidence: primary_metric_tail.{stats,policy,violations}.
  • Validation flag: validation.primary_metric_tail_acceptable (false only in fail mode).
  • CLI: invarlock report explain prints “Gate: Primary Metric Tail (ΔlogNLL)”.
metrics.accuracy.* (accuracy kinds)

Keys.

  • delta_min_pp (policy) — minimum allowed delta accuracy vs baseline (percentage points). Defaults per tier are stated in docs/assurance/04-guard-contracts.md (“accuracy kinds … defaults”).
  • min_examples (policy) — minimum n_final required before enforcing the delta accuracy gate. Rationale: avoids gating on too few examples.
  • min_examples_fraction (policy) — dataset-scale-aware floor: when available examples are known, the effective floor becomes max(min_examples, ceil(examples_available * min_examples_fraction)).
  • hysteresis_delta_pp (policy) — small slack on the delta accuracy gate (delta_min_pp - hysteresis_delta_pp). Rationale: avoids flapping near the boundary; marked in reports via validation.hysteresis_applied.

Observability.

  • Resolved policy: resolved_policy.metrics.accuracy
  • Evidence: primary_metric.{ratio_vs_baseline,display_ci}
  • Gate flag: validation.primary_metric_acceptable

Spectral guard (spectral_guard.*)

What it controls. Weight-based stability thresholds for per-family spectral monitoring.

Where documented.

  • docs/assurance/05-spectral-fpr-derivation.md (policy + FPR control)
  • docs/assurance/09-tier-v1-calibration.md (pilot numbers + recalibration)

Keys.

  • sigma_quantile (calibrated) — which baseline percentile defines the reference sigma target used for z-scoring.
  • deadband (policy) — z-score deadband δ to suppress flicker (see docs/assurance/05-spectral-fpr-derivation.md).
  • scope (policy) — which families are actively budgeted/monitored (e.g., all vs ffn), described in docs/assurance/05-spectral-fpr-derivation.md.
  • max_caps (policy) — per-run WARN/cap budget; exceeding this aborts in CI/Release (see docs/assurance/05-spectral-fpr-derivation.md).
  • max_spectral_norm (policy) — optional absolute clamp. null means “no absolute clamp”; rely on relative z-caps and the WARN budget (see docs/assurance/09-tier-v1-calibration.md “Keep these fixed … no clamp”).
  • family_caps (calibrated) — per-family κ caps (stored as raw floats in tiers.yaml; normalized to {family: {kappa: ...}} at runtime).
  • multiple_testing (policy) — the correction procedure used to interpret κ across families (bh/bonferroni, α, m); see docs/assurance/05-spectral-fpr-derivation.md.

Observability.

  • Evidence: spectral.{summary,families,family_caps,multiple_testing}
  • Resolved policy: resolved_policy.spectral
  • Gate flag: validation.spectral_stable

RMT guard (rmt_guard.*)

What it controls. Activation edge-risk stability via the ε-band acceptance rule.

Where documented.

  • docs/assurance/06-rmt-epsilon-rule.md (acceptance rule + calibration)
  • docs/assurance/09-tier-v1-calibration.md (recalibration recipe)

Keys.

  • epsilon_by_family (calibrated) — ε(f) per family for the acceptance band: edge_cur(f) ≤ (1 + ε(f)) · edge_base(f).
  • epsilon_default (calibrated) — fallback ε used when a family-specific value is missing.
  • deadband (policy) — additional tolerance used by the RMT outlier diagnostics/correction path (separate from ε-band acceptance), aligning the “ignore small changes” behavior with other guards.
  • margin (policy) — safety multiplier for the same outlier diagnostics/correction path; higher margins tolerate more deviation before flagging.

Observability.

  • Evidence: rmt.{status,stable,families,epsilon_by_family,epsilon_violations}
  • Resolved policy: resolved_policy.rmt
  • Gate flag: validation.rmt_stable

Variance guard (variance_guard.*)

What it controls. VE enablement/correction knobs including the predictive gate and min-effect semantics.

Where documented.

  • docs/assurance/07-ve-gate-power.md (power + sidedness + tier knobs)
  • docs/assurance/09-tier-v1-calibration.md (min-effect recalibration)

Keys.

  • predictive_gate (policy) — when true, VE only enables if the predictive A/B gate passes (report records variance.predictive_gate.*).
  • predictive_one_sided (calibrated policy) — one-sided improvement gate semantics (Balanced) vs two-sided CI (Conservative); see docs/assurance/07-ve-gate-power.md.
  • min_effect_lognll (calibrated) — minimum absolute improvement required for VE enablement; derived from z·σ̂/√n per tier, see docs/assurance/07-ve-gate-power.md.
  • deadband (policy) — ignores small proposed adjustments (prevents “flicker”/tiny rescaling).
  • min_abs_adjust (policy) — absolute floor on per-module |scale − 1| before a proposed scale is considered.
  • max_scale_step (policy) — per-module maximum |scale − 1| applied in a single run (caps correction aggressiveness).
  • topk_backstop (policy) — backstop that allows selecting the top candidate scale when filtering would otherwise produce no usable scales.
  • max_adjusted_modules (policy) — optional cap on how many modules receive a scale in one run (0 means “no cap”).
  • tap (policy) — module-name pattern(s) that define where VE is allowed to attach. Rationale: the tap must match the edited sublayer for provenance and reproducibility; see “Provenance & tap” in docs/assurance/07-ve-gate-power.md.

Observability.

  • Evidence: variance.{enabled,predictive_gate,ab_test,scope,proposed_scales}
  • Resolved policy: resolved_policy.variance

Troubleshooting

  • Overrides not taking effect: ensure INVARLOCK_CONFIG_ROOT points to a directory containing runtime/tiers.yaml.
  • Aggressive tier usage: this tier is research-only and outside the safety case; prefer balanced or conservative.

Observability

  • Resolved policies appear under resolved_policy.* in reports.
  • The CLI invarlock report explain prints gate decisions and policy digests.