
Glossary

TL;DR: This glossary defines key terms used across InvarLock documentation, reports, and code. The quick reference tables group terms by domain (metrics, guards, data, policy); the detailed definitions follow in alphabetical sections. Each entry includes a definition, context, and cross-references to relevant assurance documents.

Plain language: When reading a report or debugging a pipeline, use this glossary to understand what each field means and where the term originated.


Quick Reference Tables

Primary Metric Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Primary Metric | Canonical task metric for gating (ppl or accuracy) | primary_metric.* |
| BCa Bootstrap | Bias-corrected accelerated bootstrap for CIs | primary_metric.ci, primary_metric.reps |
| Ratio vs Baseline | Edited ÷ baseline metric (ppl: lower=better, acc: higher=better) | primary_metric.ratio_vs_baseline |
| Primary Metric Tail | Tail regression gate (ΔlogNLL at q95) | primary_metric_tail.* |

Guard Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Canonical Guard Chain | invariants (pre) → spectral → RMT → variance → invariants (post) | validation.{invariants_pass,spectral_stable,rmt_stable}, variance.{enabled,predictive_gate.passed} |
| κ (kappa) Threshold | Per-family spectral cap for z-score outliers | spectral.family_caps.*.kappa |
| ε (epsilon) Band | RMT acceptance threshold for edge-risk | rmt.epsilon_by_family.* |
| Guard Overhead | Performance cost of guards vs bare run | guard_overhead.* |
| Measurement Contract | Estimator + sampling policy recorded in reports | spectral.measurement_contract_hash |

Data Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Window Pairing | Aligning baseline and subject eval windows | dataset.windows.stats.paired_windows |
| Provider Digest | Hash of dataset identity (ids/tokenizer/masking) | provenance.provider_digest |
| Tokenizer Hash | Stable hash of tokenizer settings | meta.tokenizer_hash |

Policy Terms

| Term | Short Definition | Report Field |
| --- | --- | --- |
| Tier Policy | Guard threshold preset (conservative/balanced/aggressive) | auto.tier |
| Policy Digest | Stable hash of resolved policy thresholds | policy_digest.thresholds_hash |

Detailed Definitions

A–B

Baseline

The unedited reference model run used for comparison and gating.

Aspect Details
Context baseline report in Compare & evaluate workflow
Related terms Subject Run, Window Pairing, report
report fields provenance.baseline.*, baseline_ref.*
See also Compare & evaluate

Example: invarlock evaluate --baseline gpt2 --subject gpt2-quant

This command follows the default runtime-container path unless a host-side workflow uses --execution-mode host.


BCa Bootstrap

Bias-corrected and accelerated bootstrap method for estimating confidence intervals.

Aspect Details
Context Applied to paired log-loss deltas for primary metric gating
Related terms Primary Metric, Window Pairing, Confidence Interval
report fields primary_metric.ci, primary_metric.reps, dataset.windows.stats.bootstrap
See also BCa Bootstrap Derivation

Example: BCa bootstrap with 2000 replicates computes a CI on the paired ΔlogNLL, which is then exponentiated to give a ratio CI such as ci: [0.995, 1.008].
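As a rough illustration of the method, the sketch below computes a BCa interval for the mean of paired ΔlogNLL values using only the Python standard library, then exponentiates it to a ratio CI. The function name `bca_ci`, its parameters, and the synthetic deltas are assumptions made for this example; this is not InvarLock's implementation, which may differ in resampling and quantile details.

```python
import math
import random
from statistics import NormalDist, fmean

_NORM = NormalDist()

def bca_ci(deltas, reps=2000, alpha=0.05, seed=0):
    """BCa confidence interval for the mean of paired deltas (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(deltas)
    theta_hat = fmean(deltas)

    # Bootstrap replicates of the mean: resample windows with replacement.
    boots = sorted(fmean(rng.choices(deltas, k=n)) for _ in range(reps))

    # Bias correction z0: share of replicates below the point estimate.
    below = sum(b < theta_hat for b in boots) / reps
    below = min(max(below, 1.0 / (reps + 1)), reps / (reps + 1.0))  # keep inv_cdf finite
    z0 = _NORM.inv_cdf(below)

    # Acceleration from jackknife (leave-one-out) means.
    total = sum(deltas)
    jack = [(total - d) / (n - 1) for d in deltas]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den > 0 else 0.0

    def adjusted_quantile(level):
        # Shift the nominal percentile by the bias and acceleration terms.
        z = _NORM.inv_cdf(level)
        shifted = _NORM.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))
        return boots[min(reps - 1, max(0, round(shifted * (reps - 1))))]

    return adjusted_quantile(alpha / 2), adjusted_quantile(1 - alpha / 2)

# Synthetic paired deltas (edited minus baseline log-loss per window), for illustration only.
_rng = random.Random(1)
deltas = [_rng.gauss(0.002, 0.02) for _ in range(200)]
lo, hi = bca_ci(deltas)
ratio_ci = (math.exp(lo), math.exp(hi))  # CI on the ratio scale
```

Working on the log scale keeps the paired differences additive; exponentiating the two endpoints afterwards gives an interval directly comparable to ratio_vs_baseline.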


C–D

report

Structured evidence artifact summarizing an evaluation run and its validation status.

Aspect Details
Context Generated by invarlock evaluate or invarlock report --format report
Related terms Report, Evidence Bundle, Manifest
report fields schema_version, run_id, validation.*, artifacts.*
See also reports Reference

Example: evaluation.report.json with schema_version: v1 and validation.overall_pass: true
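A minimal sketch of reading the validation status from such a file is shown below. The embedded sample document is illustrative only: its run_id value and the exact set of keys are assumptions, and `overall_pass` is a hypothetical helper rather than part of the InvarLock API.

```python
import json

# Illustrative stand-in for the contents of evaluation.report.json.
SAMPLE_REPORT = """
{
  "schema_version": "v1",
  "run_id": "example-run",
  "validation": {"overall_pass": true, "invariants_pass": true},
  "artifacts": {}
}
"""

def overall_pass(report_text):
    """Return True only when validation.overall_pass is present and true."""
    report = json.loads(report_text)
    return bool(report.get("validation", {}).get("overall_pass", False))

status = overall_pass(SAMPLE_REPORT)
```

Treating a missing field as a failure is a deliberately conservative choice for this sketch.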


Compare & evaluate (BYOE)

Workflow that compares a subject model to a baseline, optionally with an external edit (Bring Your Own Edit).

Aspect Details
Context invarlock evaluate --baseline ... --subject ...
Related terms Baseline, Subject Run, report
report fields provenance.baseline.*, provenance.edited.*
See also Compare & evaluate Guide

Example: BYOE workflow evaluates an externally edited checkpoint against its unmodified baseline.


E–G

Evidence Bundle

Set of files produced for audit: reports, runtime-provenance sidecars, and supporting events or derived renderings.

Aspect Details
Context Output directory from invarlock evaluate or report --format report
Related terms Run report, evaluation report, Runtime manifest
Typical contents evaluation.report.json, evaluation_report.md, runtime.manifest.json
See also Artifact Layout

Canonical Guard Chain

The default guard chain is invariants (pre) → spectral → RMT → variance → invariants (post).

Aspect Details
Context Core safety checks in run and evaluate flows
Canonical order invariants (pre), spectral, rmt, variance, invariants (post)
Related terms Guard Chain, Guard Overhead
See also Guards Reference

Enforcement: Guards execute in canonical order for reproducibility; results are recorded in validation.invariants_pass, validation.spectral_stable, validation.rmt_stable.


Guard Chain (Canonical Order)

Fixed execution order for guard preparation and evaluation ensuring deterministic, auditable outcomes.

Aspect Details
Context Defined by guards.order in config YAML
Related terms Canonical Guard Chain, Guard Overhead
report fields Not stored directly (order is config-driven).
See also Guards Reference

Guard Overhead

Performance impact of guard checks vs bare control run (no guards).

Aspect Details
Context Measured in Release profile; gate requires ≤ +1.0% primary metric (PM) overhead
Related terms Canonical Guard Chain, Timing Summary
report fields guard_overhead.{bare_ppl,guarded_ppl,overhead_ratio,overhead_percent}
See also Guard Overhead Method

Example: overhead_percent: +0.12% indicates guards add 0.12% to primary metric.
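The relationship between the guard_overhead fields can be sketched as follows. This assumes overhead_ratio is guarded_ppl divided by bare_ppl and overhead_percent is that ratio expressed as a percentage change; the function name, the `passed` key, and the `max_percent` parameter are hypothetical.

```python
def guard_overhead(bare_ppl, guarded_ppl, max_percent=1.0):
    """Compare a guarded run against a bare control run (illustrative sketch)."""
    ratio = guarded_ppl / bare_ppl
    percent = (ratio - 1.0) * 100.0
    return {
        "bare_ppl": bare_ppl,
        "guarded_ppl": guarded_ppl,
        "overhead_ratio": ratio,
        "overhead_percent": percent,
        "passed": percent <= max_percent,  # gate: at most +1.0% by default
    }

result = guard_overhead(bare_ppl=20.0, guarded_ppl=20.024)
```

With these inputs the overhead is about +0.12%, which is inside the 1.0% gate.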


K–M

κ (kappa) Threshold

Per-family spectral cap used to flag abnormally high z-scores.

Aspect Details
Context spectral.family_caps.*.kappa in tier policy
Typical values ffn: 3.85, attn: 3.02, embed: 1.05 (Balanced tier)
Related terms Spectral Cap, z-score, Spectral Guard
See also Spectral FPR Derivation

Example: kappa=2.8 for attention family means z-scores > 2.8 are flagged.


Measurement Contract

Guard measurement procedure signature and digest recorded in reports.

Aspect Details
Context Spectral and RMT guards record estimator + sampling policy
Verified by invarlock verify --profile ci|release (plus runtime.manifest.json runtime provenance for container-backed outputs)
report fields spectral.measurement_contract_hash, rmt.measurement_contract_hash
See also Guard Contracts

Enforcement: CI/Release profiles require measurement contract match between baseline and subject.


P–R

Policy Digest

Stable hash summarizing resolved policy thresholds for auditability.

Aspect Details
Context Stored in report for policy change detection
Related terms Tier Policy, Policy Overrides, Policy Provenance
report fields policy_digest.thresholds_hash, policy_provenance.*
See also Policy Provenance
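One common way to obtain a stable hash of a thresholds mapping is to serialize it to canonical JSON and hash the bytes. The sketch below assumes SHA-256 over sorted-key JSON; the actual serialization behind policy_digest.thresholds_hash is not specified here, and `thresholds_hash` is a hypothetical helper.

```python
import hashlib
import json

def thresholds_hash(thresholds):
    """Hash a nested thresholds dict independently of key order (illustrative)."""
    canonical = json.dumps(thresholds, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

policy_a = {"spectral": {"family_caps": {"ffn": {"kappa": 3.85}, "attn": {"kappa": 3.02}}}}
policy_b = {"spectral": {"family_caps": {"attn": {"kappa": 3.02}, "ffn": {"kappa": 3.85}}}}
policy_c = {"spectral": {"family_caps": {"ffn": {"kappa": 3.85}, "attn": {"kappa": 2.8}}}}

digest_a = thresholds_hash(policy_a)
digest_b = thresholds_hash(policy_b)
digest_c = thresholds_hash(policy_c)
```

Here digest_a and digest_b match because only key order differs, while digest_c differs because a threshold value changed, which is the property a policy change detector relies on.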

Primary Metric

The canonical task metric used for gating (perplexity for LMs, accuracy for classification).

Aspect Details
Supported kinds ppl_causal, ppl_mlm, accuracy
Gating logic Ratio vs baseline must stay within tier thresholds
Related terms Primary Metric Tail, BCa Bootstrap, Window Pairing
report fields primary_metric.{kind,preview,final,ratio_vs_baseline,ci}
See also reports Reference

Example: primary_metric.kind: ppl_causal with ratio_vs_baseline: 1.003
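For a perplexity metric, the ratio can be sketched from per-window mean log-losses as below. The sketch assumes equally weighted windows and a hypothetical `max_ratio` threshold of 1.10; real tier thresholds come from the resolved policy, and `ppl_ratio` and `ratio_gate` are names invented for this example.

```python
import math
from statistics import fmean

def ppl_ratio(baseline_nll, edited_nll):
    """Edited / baseline perplexity from per-window mean NLL values."""
    # ppl = exp(mean NLL), so the ratio is exp of the difference of means.
    return math.exp(fmean(edited_nll) - fmean(baseline_nll))

def ratio_gate(ratio, max_ratio=1.10):
    """Pass when the edited model's perplexity stays within the allowed ratio."""
    return ratio <= max_ratio

baseline_nll = [3.00, 3.20, 2.80]
edited_nll = [3.01, 3.21, 2.81]
ratio = ppl_ratio(baseline_nll, edited_nll)
passed = ratio_gate(ratio)
```

A ratio above 1 means the edited model has higher perplexity than the baseline; for accuracy metrics the direction of "better" is reversed.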


Primary Metric Tail

Optional tail regression gate checking high-loss windows (e.g., q95 ΔlogNLL).

Aspect Details
Context Catches regression in hard examples even when mean is acceptable
Mode warn (default) or fail
Related terms Primary Metric, BCa Bootstrap
report fields primary_metric_tail.{evaluated,passed,warned,stats}
See also reports Reference
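A tail gate of this kind can be sketched as a percentile check on the paired ΔlogNLL values. The threshold name `max_q95`, the inclusive percentile method, and the way `passed` and `warned` are set in each mode are assumptions for illustration, not the documented behavior.

```python
from statistics import quantiles

def tail_gate(deltas, max_q95, mode="warn"):
    """Check the 95th percentile of per-window deltas against a cap (sketch)."""
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    q95 = quantiles(deltas, n=100, method="inclusive")[94]
    exceeded = q95 > max_q95
    return {
        "evaluated": True,
        "passed": not (exceeded and mode == "fail"),  # only fail mode blocks
        "warned": exceeded and mode == "warn",
        "stats": {"q95": q95},
    }

deltas = [i / 100 for i in range(101)]  # synthetic deltas 0.00 .. 1.00
warn_result = tail_gate(deltas, max_q95=0.5, mode="warn")
fail_result = tail_gate(deltas, max_q95=0.5, mode="fail")
```

The mean of these deltas could sit comfortably inside a mean-based gate while the tail check still fires, which is the situation this gate is meant to surface.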

Provider Digest

Dataset identity hash covering token IDs, tokenizer config, and masking strategy.

Aspect Details
Context Ensures baseline and subject use identical data
Related terms Window Pairing, Tokenizer Hash
report fields provenance.provider_digest.ids_sha256
See also Coverage & Pairing

Report

Run-level artifact with metrics, guard results, and metadata.

Aspect Details
Context Generated by invarlock evaluate; input to report generation
Related terms report, Evidence Bundle
File format report.json + events.jsonl
See also Artifact Layout

RMT ε (epsilon) Rule

Random Matrix Theory epsilon band used for activation edge-risk stability checks.

Aspect Details
Context rmt.epsilon_default and rmt.epsilon_by_family.* thresholds
Calibration Derived from null-sweep runs on target model families
Related terms RMT Guard, κ Threshold
report fields rmt.{epsilon_default,epsilon_by_family,stable,max_edge_ratio,max_edge_delta}
See also RMT ε Rule

RMT Guard

Guard that checks eigenvalue statistics against Random Matrix Theory bounds.

Aspect Details
Focus Activation edge-risk growth across model families
Validation validation.rmt_stable
Related terms Canonical Guard Chain, RMT ε Rule
report fields rmt.{families,stable,max_edge_delta}
See also Guards Reference

S–T

Spectral Cap

Limit on spectral z-scores per family to flag weight instability.

Aspect Details
Context Applied by spectral guard; counts violations per family
Related terms κ Threshold, z-score, Spectral Guard
report fields spectral.{caps_applied,caps_exceeded,top_z_scores}
See also Spectral FPR

Spectral Guard

Guard that monitors spectral norms and z-scores for weight matrices.

Aspect Details
Focus Baseline-relative weight matrix stability
Validation validation.spectral_stable
Related terms Canonical Guard Chain, Spectral Cap, κ Threshold
report fields spectral.{caps_applied,family_caps,top_z_scores,summary}
See also Guards Reference

Subject Run

The edited or target model run under evaluation (compared against baseline).

Aspect Details
Context subject checkpoint in Compare & evaluate
Related terms Baseline, report, Window Pairing
report fields provenance.edited.*
See also Compare & evaluate

Telemetry

Performance and resource metrics emitted with reports.

Aspect Details
Context Optional fields for performance analysis
Related terms Timing Summary, Guard Overhead
report fields telemetry.*, metrics.memory_mb_peak
See also Observability

Tier Policy

Guard threshold preset selecting the safety profile for a run.

Aspect Details
Options conservative (strictest), balanced (default), aggressive (loosest)
Source runtime/tiers.yaml
Related terms Policy Digest, Policy Overrides
report fields auto.tier, resolved_policy.*
See also Tier Policy Catalog

Timing Summary

Consolidated timing breakdown for an evaluation run.

Aspect Details
Context CLI output via print_timing_summary()
Includes Model load, dataset load, evaluation, report generation
Related terms Guard Overhead, Telemetry
See also Observability

Tokenizer Hash

Stable hash of tokenizer settings and vocabulary for reproducibility.

Aspect Details
Context Ensures baseline and subject use identical tokenization
Related terms Provider Digest, Window Pairing
report fields data.tokenizer_hash, meta.tokenizer_hash
See also Determinism Contracts

V–Z

Variance Effect (VE)

Guard that tracks variance change and applies equalization when beneficial.

Aspect Details
Context A/B test compares bare vs VE-enabled evaluation
Enabling condition CI excludes 0 AND mean Δ ≤ -min_effect_lognll
Related terms Canonical Guard Chain, Guard Overhead, Predictive Gate
report fields variance.{enabled,gain,predictive_gate.delta_ci,predictive_gate.passed}
See also VE Gate Power
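The enabling condition in the table can be written as a small predicate. This is a sketch under the assumption that the CI and mean Δ are on the log-loss scale with negative values meaning improvement; `ve_should_enable` is a hypothetical name.

```python
def ve_should_enable(delta_ci, mean_delta, min_effect_lognll):
    """Enable VE only if the CI excludes 0 and the mean gain is large enough."""
    lo, hi = delta_ci
    ci_excludes_zero = hi < 0.0 or lo > 0.0
    return ci_excludes_zero and mean_delta <= -min_effect_lognll

enabled = ve_should_enable(delta_ci=(-0.012, -0.004), mean_delta=-0.008,
                           min_effect_lognll=0.005)
```

Because the second clause requires a negative mean, a CI lying entirely above zero (a regression) never enables VE even though it excludes 0.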

Window Pairing

Alignment of baseline and subject evaluation windows for paired statistical testing.

Aspect Details
Requirements Same window IDs, zero overlap, 100% match fraction
Violation E001 pairing error in CI/Release profiles
Related terms BCa Bootstrap, Primary Metric, Provider Digest
report fields dataset.windows.stats.{paired_windows,window_match_fraction,window_overlap_fraction}
See also Coverage & Pairing

Example: paired_windows: 200, window_match_fraction: 1.0, window_overlap_fraction: 0.0
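The three statistics can be sketched from window descriptors as follows. The tuple layout (window ID, start token, end token), the choice of denominator for the match fraction, and the overlap definition are all assumptions made for this example; `pairing_stats` is a hypothetical helper.

```python
def pairing_stats(baseline, subject):
    """baseline/subject: lists of (window_id, start_token, end_token), end exclusive."""
    base_ids = {w[0] for w in baseline}
    subj_ids = {w[0] for w in subject}
    paired = base_ids & subj_ids
    # Fraction of all window IDs seen on either side that appear on both sides.
    match_fraction = len(paired) / max(len(base_ids | subj_ids), 1)

    # Count baseline windows that start before an earlier window has ended.
    spans = sorted((start, end) for _, start, end in baseline)
    overlapping = 0
    max_end = None
    for start, end in spans:
        if max_end is not None and start < max_end:
            overlapping += 1
        max_end = end if max_end is None else max(max_end, end)
    overlap_fraction = overlapping / max(len(spans), 1)

    return {
        "paired_windows": len(paired),
        "window_match_fraction": match_fraction,
        "window_overlap_fraction": overlap_fraction,
    }

windows = [("w0", 0, 128), ("w1", 128, 256)]
stats = pairing_stats(windows, windows)
```

Under the requirements above, a run is only acceptable when the match fraction is 1.0 and the overlap fraction is 0.0.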


z-score

Standardized deviation used in spectral guard scoring.

Aspect Details
Formula z = (σ_edited - μ_baseline) / std_baseline
Thresholding Compared against family-specific κ caps
Related terms Spectral Cap, κ Threshold
report fields spectral.top_z_scores, spectral.family_caps.*.kappa
See also Spectral FPR

Example: max |z| = 2.1 means the largest absolute z-score across all weight matrices is 2.1.
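The formula and the per-family thresholding can be sketched together. The κ values are the Balanced-tier figures listed under κ (kappa) Threshold; the layer names, the input tuple layout, and the use of the absolute z-score for flagging are assumptions for illustration.

```python
# Balanced-tier caps quoted in this glossary.
KAPPA = {"ffn": 3.85, "attn": 3.02, "embed": 1.05}

def spectral_z(sigma_edited, mu_baseline, std_baseline):
    """z = (sigma_edited - mu_baseline) / std_baseline."""
    return (sigma_edited - mu_baseline) / std_baseline

def caps_exceeded(layers, kappa=KAPPA):
    """Count per-family cap violations.

    layers: iterable of (name, family, sigma_edited, mu_baseline, std_baseline).
    """
    counts = {family: 0 for family in kappa}
    for _name, family, sigma, mu, std in layers:
        if abs(spectral_z(sigma, mu, std)) > kappa[family]:
            counts[family] += 1
    return counts

layers = [
    ("h0.mlp", "ffn", 5.0, 4.0, 0.5),    # z = 2.0, under the ffn cap
    ("h0.attn", "attn", 7.2, 4.0, 1.0),  # z = 3.2, over the attn cap
    ("wte", "embed", 3.0, 4.0, 0.5),     # z = -2.0, over the embed cap in magnitude
]
violations = caps_exceeded(layers)
max_abs_z = max(abs(spectral_z(s, m, d)) for _, _, s, m, d in layers)
```

The same z-score can be acceptable for one family and a violation for another, which is why caps are kept per family rather than as one global threshold.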


See Also