Glossary¶

TL;DR: This glossary defines key terms used across InvarLock documentation, reports, and code. Terms are grouped by domain (metrics, guards, data, policy) for quick reference. Each entry includes a definition, context, and cross-references to relevant assurance documents.

Plain language: When reading a report or debugging a pipeline, use this glossary to understand what each field means and where the term originated.

Quick Reference Tables¶

Primary Metric Terms¶

Term	Short Definition	report Field
Primary Metric	Canonical task metric for gating (ppl or accuracy)	`primary_metric.*`
BCa Bootstrap	Bias-corrected accelerated bootstrap for CIs	`primary_metric.ci`, `primary_metric.reps`
Ratio vs Baseline	Edited ÷ baseline metric (ppl: lower=better, acc: higher=better)	`primary_metric.ratio_vs_baseline`
Primary Metric Tail	Tail regression gate (ΔlogNLL at q95)	`primary_metric_tail.*`

Guard Terms¶

Term	Short Definition	report Field
Canonical Guard Chain	invariants (pre) → spectral → RMT → variance → invariants (post)	`validation.{invariants_pass,spectral_stable,rmt_stable}`, `variance.{enabled,predictive_gate.passed}`
κ (kappa) Threshold	Per-family spectral cap for z-score outliers	`spectral.family_caps.*.kappa`
ε (epsilon) Band	RMT acceptance threshold for edge-risk	`rmt.epsilon_by_family.*`
Guard Overhead	Performance cost of guards vs bare run	`guard_overhead.*`
Measurement Contract	Estimator + sampling policy recorded in reports	`spectral.measurement_contract_hash`

Data Terms¶

Term	Short Definition	report Field
Window Pairing	Aligning baseline and subject eval windows	`dataset.windows.stats.paired_windows`
Provider Digest	Hash of dataset identity (ids/tokenizer/masking)	`provenance.provider_digest`
Tokenizer Hash	Stable hash of tokenizer settings	`meta.tokenizer_hash`

Policy Terms¶

Term	Short Definition	report Field
Tier Policy	Guard threshold preset (conservative/balanced/aggressive)	`auto.tier`
Policy Digest	Stable hash of resolved policy thresholds	`policy_digest.thresholds_hash`

Detailed Definitions¶

A–B¶

Baseline¶

The unedited reference model run used for comparison and gating.

Aspect	Details
Context	`baseline` report in Compare & evaluate workflow
Related terms	Subject Run, Window Pairing, report
report fields	`provenance.baseline.`, `baseline_ref.`
See also	Compare & evaluate

Example: `invarlock evaluate --baseline gpt2 --subject gpt2-quant`. This follows the default runtime-container path unless a host-side workflow uses `--execution-mode host`.

BCa Bootstrap¶

Bias-corrected and accelerated bootstrap method for estimating confidence intervals.

Aspect	Details
Context	Applied to paired log-loss deltas for primary metric gating
Related terms	Primary Metric, Window Pairing, Confidence Interval
report fields	`primary_metric.ci`, `primary_metric.reps`, `dataset.windows.stats.bootstrap`
See also	BCa Bootstrap Derivation

Example: a BCa bootstrap with 2000 replicates computes a CI on the paired ΔlogNLL, which is then exponentiated to a ratio CI such as `ci: [0.995, 1.008]`.
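
The procedure can be sketched with the standard library alone. This is a minimal illustration of a textbook BCa interval on the mean of paired ΔlogNLL values, not InvarLock's implementation; the function name `bca_ratio_ci`, the synthetic deltas, and the seed are assumptions made for the example.

```python
import math
import random
from statistics import NormalDist, fmean


def bca_ratio_ci(deltas, reps=2000, alpha=0.05, seed=0):
    """BCa CI for the mean paired ΔlogNLL, exponentiated to a ratio CI (illustrative)."""
    n = len(deltas)
    norm = NormalDist()
    rng = random.Random(seed)
    theta_hat = fmean(deltas)

    # Bootstrap replicates of the mean, resampling windows with replacement.
    boots = sorted(fmean(rng.choices(deltas, k=n)) for _ in range(reps))

    # Bias correction z0: share of replicates below the point estimate.
    below = sum(b < theta_hat for b in boots) / reps
    below = min(max(below, 1.0 / reps), 1.0 - 1.0 / reps)  # keep inv_cdf finite
    z0 = norm.inv_cdf(below)

    # Acceleration from leave-one-out (jackknife) means.
    total = sum(deltas)
    jack = [(total - d) / (n - 1) for d in deltas]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6.0 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted_quantile(p):
        # Shift the nominal percentile by the bias and acceleration terms.
        z = norm.inv_cdf(p)
        adj = norm.cdf(z0 + (z0 + z) / (1.0 - accel * (z0 + z)))
        idx = min(max(int(adj * reps), 0), reps - 1)
        return boots[idx]

    lo = adjusted_quantile(alpha / 2)
    hi = adjusted_quantile(1 - alpha / 2)
    # Exponentiate log-space bounds to obtain the ratio-vs-baseline CI.
    return math.exp(lo), math.exp(hi), math.exp(theta_hat)


# Synthetic per-window deltas (subject minus baseline log-loss); not real data.
deltas = [0.002 + 0.01 * math.sin(i) for i in range(200)]
ci_lo, ci_hi, ratio = bca_ratio_ci(deltas)
print(f"ratio={ratio:.4f} ci=[{ci_lo:.4f}, {ci_hi:.4f}]")
```

Working in log space and exponentiating only at the end keeps the interval on the same multiplicative scale as `primary_metric.ratio_vs_baseline`.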

C–D¶

report¶

Structured evidence artifact summarizing an evaluation run and its validation status.

Aspect	Details
Context	Generated by `invarlock evaluate` or `invarlock report --format report`
Related terms	Report, Evidence Bundle, Manifest
report fields	`schema_version`, `run_id`, `validation.`, `artifacts.`
See also	reports Reference

Example: `evaluation.report.json` with `schema_version: v1` and `validation.overall_pass: true`

Compare & evaluate (BYOE)¶

Workflow that compares a subject model to a baseline, optionally with an external edit (Bring Your Own Edit).

Aspect	Details
Context	`invarlock evaluate --baseline ... --subject ...`
Related terms	Baseline, Subject Run, report
report fields	`provenance.baseline.`, `provenance.edited.`
See also	Compare & evaluate Guide

Example: BYOE workflow evaluates an externally edited checkpoint against its unmodified baseline.

E–G¶

Evidence Bundle¶

Set of files produced for audit: reports, runtime-provenance sidecars, and supporting events or derived renderings.

Aspect	Details
Context	Output directory from `invarlock evaluate` or `report --format report`
Related terms	Run report, evaluation report, Runtime manifest
Typical contents	`evaluation.report.json`, `evaluation_report.md`, `runtime.manifest.json`
See also	Artifact Layout

Canonical Guard Chain¶

The default guard chain is invariants (pre) → spectral → RMT → variance → invariants (post).

Aspect	Details
Context	Core safety checks in `run` and `evaluate` flows
Canonical order	`invariants` (pre), `spectral`, `rmt`, `variance`, `invariants` (post)
Related terms	Guard Chain, Guard Overhead
See also	Guards Reference

Enforcement: Guards execute in canonical order for reproducibility; results are recorded in `validation.invariants_pass`, `validation.spectral_stable`, and `validation.rmt_stable`.

Guard Chain (Canonical Order)¶

Fixed execution order for guard preparation and evaluation ensuring deterministic, auditable outcomes.

Aspect	Details
Context	Defined by `guards.order` in config YAML
Related terms	Canonical Guard Chain, Guard Overhead
report fields	Not stored directly (order is config-driven).
See also	Guards Reference

Guard Overhead¶

Performance impact of guard checks vs bare control run (no guards).

Aspect	Details
Context	Measured in Release profile; gate requires ≤ +1.0% primary metric (PM) overhead
Related terms	Canonical Guard Chain, Timing Summary
report fields	`guard_overhead.{bare_ppl,guarded_ppl,overhead_ratio,overhead_percent}`
See also	Guard Overhead Method

Example: `overhead_percent: +0.12%` indicates that guards add 0.12% to the primary metric.
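
The arithmetic behind these fields can be sketched as below. The field names come from `guard_overhead.*`; the formulas (ratio = guarded ÷ bare, percent = (ratio − 1) × 100), the `passed` key, and the function name are assumptions inferred from the field names, not a confirmed specification.

```python
def guard_overhead(bare_ppl, guarded_ppl, max_overhead_percent=1.0):
    """Compare a guarded run's primary metric with a bare control run (illustrative)."""
    overhead_ratio = guarded_ppl / bare_ppl
    overhead_percent = (overhead_ratio - 1.0) * 100.0
    return {
        "bare_ppl": bare_ppl,
        "guarded_ppl": guarded_ppl,
        "overhead_ratio": overhead_ratio,
        "overhead_percent": overhead_percent,
        # Hypothetical key: gate passes when overhead stays within +1.0%.
        "passed": overhead_percent <= max_overhead_percent,
    }


# Made-up perplexities chosen to reproduce the +0.12% example.
result = guard_overhead(bare_ppl=20.00, guarded_ppl=20.024)
print(f"{result['overhead_percent']:+.2f}% passed={result['passed']}")
# prints: +0.12% passed=True
```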

K–M¶

κ (kappa) Threshold¶

Per-family spectral cap used to flag abnormally high z-scores.

Aspect	Details
Context	`spectral.family_caps.*.kappa` in tier policy
Typical values	ffn: 3.85, attn: 3.02, embed: 1.05 (Balanced tier)
Related terms	Spectral Cap, z-score, Spectral Guard
See also	Spectral FPR Derivation

Example: `kappa=2.8` for the attention family means z-scores above 2.8 are flagged.

Measurement Contract¶

Guard measurement procedure signature and digest recorded in reports.

Aspect	Details
Context	Spectral and RMT guards record estimator + sampling policy
Verified by	`invarlock verify --profile ci|release` (plus `runtime.manifest.json` runtime provenance for container-backed outputs)
report fields	`spectral.measurement_contract_hash`, `rmt.measurement_contract_hash`
See also	Guard Contracts

Enforcement: CI/Release profiles require the measurement contract to match between baseline and subject.

P–R¶

Policy Digest¶

Stable hash summarizing resolved policy thresholds for auditability.

Aspect	Details
Context	Stored in report for policy change detection
Related terms	Tier Policy, Policy Overrides, Policy Provenance
report fields	`policy_digest.thresholds_hash`, `policy_provenance.*`
See also	Policy Provenance
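
One common way to obtain a stable hash of this kind is to serialize the thresholds canonically before hashing. The sketch below assumes SHA-256 over sorted-key JSON; the actual canonicalization used for `policy_digest.thresholds_hash` is not specified here, and the function name and sample policies are hypothetical.

```python
import hashlib
import json


def thresholds_hash(resolved_thresholds):
    """Hash a thresholds mapping via canonical JSON (sorted keys, fixed separators)."""
    canonical = json.dumps(resolved_thresholds, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical resolved thresholds: a and b differ only in key order, c changes a value.
policy_a = {"spectral": {"family_caps": {"ffn": {"kappa": 3.85}, "attn": {"kappa": 3.02}}}}
policy_b = {"spectral": {"family_caps": {"attn": {"kappa": 3.02}, "ffn": {"kappa": 3.85}}}}
policy_c = {"spectral": {"family_caps": {"attn": {"kappa": 2.80}, "ffn": {"kappa": 3.85}}}}

print(thresholds_hash(policy_a) == thresholds_hash(policy_b))  # True: same content
print(thresholds_hash(policy_a) == thresholds_hash(policy_c))  # False: threshold changed
```

Sorting keys makes the digest independent of how the policy file happens to be ordered, so only a real threshold change alters the hash.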

Primary Metric¶

The canonical task metric used for gating (perplexity for LMs, accuracy for classification).

Aspect	Details
Supported kinds	`ppl_causal`, `ppl_mlm`, `accuracy`
Gating logic	Ratio vs baseline must stay within tier thresholds
Related terms	Primary Metric Tail, BCa Bootstrap, Window Pairing
report fields	`primary_metric.{kind,preview,final,ratio_vs_baseline,ci}`
See also	reports Reference

Example: `primary_metric.kind: ppl_causal` with `ratio_vs_baseline: 1.003`

Primary Metric Tail¶

Optional tail regression gate checking high-loss windows (e.g., q95 ΔlogNLL).

Aspect	Details
Context	Catches regression in hard examples even when mean is acceptable
Mode	`warn` (default) or `fail`
Related terms	Primary Metric, BCa Bootstrap
report fields	`primary_metric_tail.{evaluated,passed,warned,stats}`
See also	reports Reference
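
A rough sketch of how such a tail gate might behave follows. The output keys mirror `primary_metric_tail.{evaluated,passed,warned,stats}`, but the threshold value, the nearest-rank quantile, and the exact `warn`/`fail` semantics are assumptions for illustration only.

```python
import math


def tail_gate(deltas, max_q95_delta, mode="warn", quantile=0.95):
    """Evaluate a tail gate on per-window ΔlogNLL values (illustrative)."""
    ordered = sorted(deltas)
    # Nearest-rank quantile: smallest value with at least `quantile` of the data at or below it.
    rank = max(math.ceil(quantile * len(ordered)), 1)
    q_value = ordered[rank - 1]
    exceeded = q_value > max_q95_delta
    return {
        "evaluated": True,
        # Assumed semantics: only `fail` mode turns an exceedance into a failure.
        "passed": not (exceeded and mode == "fail"),
        "warned": exceeded and mode == "warn",
        "stats": {"q95": q_value, "mean": sum(ordered) / len(ordered)},
    }


# Synthetic data: most windows barely move, a few hard windows regress sharply.
deltas = [0.001] * 94 + [0.30] * 6
warn_result = tail_gate(deltas, max_q95_delta=0.20, mode="warn")
fail_result = tail_gate(deltas, max_q95_delta=0.20, mode="fail")
print(warn_result["passed"], warn_result["warned"])  # True True
print(fail_result["passed"], fail_result["warned"])  # False False
```

Here the mean delta is below 0.02 while the q95 delta is 0.30, which is the situation the tail gate is meant to surface.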

Provider Digest¶

Dataset identity hash covering token IDs, tokenizer config, and masking strategy.

Aspect	Details
Context	Ensures baseline and subject use identical data
Related terms	Window Pairing, Tokenizer Hash
report fields	`provenance.provider_digest.ids_sha256`
See also	Coverage & Pairing

Report¶

Run-level artifact with metrics, guard results, and metadata.

Aspect	Details
Context	Generated by `invarlock evaluate`; input to evaluation report generation
Related terms	report, Evidence Bundle
File format	`report.json` + `events.jsonl`
See also	Artifact Layout

RMT ε (epsilon) Rule¶

Random Matrix Theory epsilon band used for activation edge-risk stability checks.

Aspect	Details
Context	`rmt.epsilon_default` and `rmt.epsilon_by_family.*` thresholds
Calibration	Derived from null-sweep runs on target model families
Related terms	RMT Guard, κ Threshold
report fields	`rmt.{epsilon_default,epsilon_by_family,stable,max_edge_ratio,max_edge_delta}`
See also	RMT ε Rule

RMT Guard¶

Guard that checks eigenvalue statistics against Random Matrix Theory bounds.

Aspect	Details
Focus	Activation edge-risk growth across model families
Validation	`validation.rmt_stable`
Related terms	Canonical Guard Chain, RMT ε Rule
report fields	`rmt.{families,stable,max_edge_delta}`
See also	Guards Reference

S–T¶

Spectral Cap¶

Limit on spectral z-scores per family to flag weight instability.

Aspect	Details
Context	Applied by spectral guard; counts violations per family
Related terms	κ Threshold, z-score, Spectral Guard
report fields	`spectral.{caps_applied,caps_exceeded,top_z_scores}`
See also	Spectral FPR

Spectral Guard¶

Guard that monitors spectral norms and z-scores for weight matrices.

Aspect	Details
Focus	Baseline-relative weight matrix stability
Validation	`validation.spectral_stable`
Related terms	Canonical Guard Chain, Spectral Cap, κ Threshold
report fields	`spectral.{caps_applied,family_caps,top_z_scores,summary}`
See also	Guards Reference

Subject Run¶

The edited or target model run under evaluation (compared against baseline).

Aspect	Details
Context	`subject` checkpoint in Compare & evaluate
Related terms	Baseline, report, Window Pairing
report fields	`provenance.edited.*`
See also	Compare & evaluate

Telemetry¶

Performance and resource metrics emitted with reports.

Aspect	Details
Context	Optional fields for performance analysis
Related terms	Timing Summary, Guard Overhead
report fields	`telemetry.*`, `metrics.memory_mb_peak`
See also	Observability

Tier Policy¶

Guard threshold preset selecting the safety profile for a run.

Aspect	Details
Options	`conservative` (strictest), `balanced` (default), `aggressive` (loosest)
Source	`runtime/tiers.yaml`
Related terms	Policy Digest, Policy Overrides
report fields	`auto.tier`, `resolved_policy.*`
See also	Tier Policy Catalog

Timing Summary¶

Consolidated timing breakdown for an evaluation run.

Aspect	Details
Context	CLI output via `print_timing_summary()`
Includes	Model load, dataset load, evaluation, report generation
Related terms	Guard Overhead, Telemetry
See also	Observability

Tokenizer Hash¶

Stable hash of tokenizer settings and vocabulary for reproducibility.

Aspect	Details
Context	Ensures baseline and subject use identical tokenization
Related terms	Provider Digest, Window Pairing
report fields	`data.tokenizer_hash`, `meta.tokenizer_hash`
See also	Determinism Contracts

V–Z¶

Variance Effect (VE)¶

Guard that tracks variance change and applies equalization when beneficial.

Aspect	Details
Context	A/B test compares bare vs VE-enabled evaluation
Enabling condition	CI excludes 0 AND mean Δ ≤ -min_effect_lognll
Related terms	Canonical Guard Chain, Guard Overhead, Predictive Gate
report fields	`variance.{enabled,gain,predictive_gate.delta_ci,predictive_gate.passed}`
See also	VE Gate Power
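
The enabling condition above can be written as a small predicate. This is a sketch under the assumption that Δ is a log-loss change where negative values mean improvement; the function name and sample numbers are hypothetical.

```python
def predictive_gate(delta_ci, mean_delta, min_effect_lognll):
    """Enable VE only if the Δ CI excludes 0 and the mean gain is large enough."""
    ci_lo, ci_hi = delta_ci
    ci_excludes_zero = ci_hi < 0.0 or ci_lo > 0.0
    large_enough = mean_delta <= -min_effect_lognll
    return ci_excludes_zero and large_enough


print(predictive_gate((-0.004, -0.001), -0.0025, min_effect_lognll=0.001))    # True
print(predictive_gate((-0.003, 0.001), -0.0010, min_effect_lognll=0.001))     # False: CI spans 0
print(predictive_gate((-0.0009, -0.0001), -0.0005, min_effect_lognll=0.001))  # False: effect too small
```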

Window Pairing¶

Alignment of baseline and subject evaluation windows for paired statistical testing.

Aspect	Details
Requirements	Same window IDs, zero overlap, 100% match fraction
Violation	`E001` pairing error in CI/Release profiles
Related terms	BCa Bootstrap, Primary Metric, Provider Digest
report fields	`dataset.windows.stats.{paired_windows,window_match_fraction,window_overlap_fraction}`
See also	Coverage & Pairing

Example: `paired_windows: 200`, `window_match_fraction: 1.0`, `window_overlap_fraction: 0.0`
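
A pairing check along these lines might look as follows. The sketch assumes the match fraction is the share of window IDs present in both runs (intersection over union), which is one plausible reading rather than the documented formula; it does not cover `window_overlap_fraction`, and both function names are hypothetical.

```python
def pairing_stats(baseline_ids, subject_ids):
    """Summarize how well two runs' evaluation windows line up by window ID."""
    baseline_set, subject_set = set(baseline_ids), set(subject_ids)
    paired = baseline_set & subject_set
    union = baseline_set | subject_set
    return {
        "paired_windows": len(paired),
        # Assumed definition: shared IDs divided by all IDs seen in either run.
        "window_match_fraction": len(paired) / len(union) if union else 0.0,
    }


def require_full_pairing(stats):
    """Raise if any window is unpaired, mirroring an E001-style pairing error."""
    if stats["window_match_fraction"] != 1.0:
        raise ValueError(f"E001: window pairing mismatch ({stats})")


stats = pairing_stats(range(200), range(200))
require_full_pairing(stats)  # identical window IDs, so nothing is raised
print(stats)

shifted = pairing_stats(range(200), range(1, 201))  # one window differs on each side
print(shifted["paired_windows"], shifted["window_match_fraction"] < 1.0)  # 199 True
```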

z-score¶

Standardized deviation used in spectral guard scoring.

Aspect	Details
Formula	`z = (σ_edited - μ_baseline) / std_baseline`
Thresholding	Compared against family-specific κ caps
Related terms	Spectral Cap, κ Threshold
report fields	`spectral.top_z_scores`, `spectral.family_caps.*.kappa`
See also	Spectral FPR

Example: `max |z| = 2.1` means the largest absolute z-score across all weight matrices is 2.1.
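
To tie the formula to the κ caps, here is a small sketch that scores a few modules and flags those exceeding their family cap. The κ values are the Balanced-tier typical values listed in this glossary; the module names, statistics, and the choice to compare |z| (rather than signed z) against κ are assumptions for illustration.

```python
def spectral_z(sigma_edited, mu_baseline, std_baseline):
    """z = (σ_edited - μ_baseline) / std_baseline, as given in the table above."""
    return (sigma_edited - mu_baseline) / std_baseline


# Typical Balanced-tier κ caps per family, taken from the κ Threshold entry.
FAMILY_KAPPA = {"ffn": 3.85, "attn": 3.02, "embed": 1.05}


def flag_outliers(modules):
    """Return (name, z) for modules whose |z| exceeds their family's κ cap."""
    flagged = []
    for m in modules:
        z = spectral_z(m["sigma"], m["mu"], m["std"])
        if abs(z) > FAMILY_KAPPA[m["family"]]:
            flagged.append((m["name"], z))
    return flagged


# Hypothetical module statistics; not taken from a real run.
modules = [
    {"name": "h.0.mlp.c_fc", "family": "ffn", "sigma": 12.1, "mu": 10.0, "std": 1.0},
    {"name": "h.0.attn.c_attn", "family": "attn", "sigma": 8.5, "mu": 5.0, "std": 1.0},
]
flagged = flag_outliers(modules)
print([name for name, _ in flagged])  # ['h.0.attn.c_attn']
```

The FFN module scores about 2.1, below its cap of 3.85, while the attention module scores 3.5, above its cap of 3.02, so only the latter is flagged.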