InvarLock Documentation¶
InvarLock is edit-agnostic (BYOE). A small built-in quantization demo
(quant_rtn, 8-bit) exists for advanced smoke and demo workflows. See
Compare & evaluate (BYOE).
Welcome to the documentation hub for InvarLock (Edit‑agnostic robustness reports for weight edits). The material below is organized so new users can ramp quickly while practitioners find detailed reference, design rationales, and assurance notes. It is aimed at checkpoint editors, CI and assurance owners, and researchers running paired evaluation on text workflows plus the included image-text path.
Start Here¶
- Getting Started – environment setup and the first evaluate → verify → report html loop.
- Quickstart – CLI highlights for common workflows.
- Compare & evaluate (BYOE) – baseline ↔ subject paired evaluation with the guard chain.
- Primary Metric Smoke – tiny examples for ppl/accuracy metric kinds.
Choose Your Path¶
- Wheel user / reviewer: start with Quickstart if you already have an evaluation.report.json bundle and want to verify, explain, or render it.
- Evaluator: start with Getting Started if you need to run invarlock evaluate and produce a fresh evaluation bundle.
- Repo maintainer: use the same user guides first, then reach for repo-only smokes, configs/, and local runtime-image flows after the core path is green.
Quick Example¶
pip install "invarlock[hf]"
# Compare & evaluate (BYOE checkpoints)
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
--baseline <BASELINE_MODEL> \
--subject <SUBJECT_MODEL> \
--adapter auto \
--profile ci
Tip: enable Hub downloads per command when fetching models/datasets:
invarlock evaluate --allow-network ...
Security-default note: evaluate uses the runtime container by default. Use
--execution-mode host only for host-side workflows that intentionally bypass that
boundary. Advanced runtime-heavy workflows live under invarlock advanced.
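To make the boundary concrete, a hedged sketch of both modes (model placeholders as above; see the CLI Reference for authoritative flags):
# Default: model-loading commands run inside the runtime container
invarlock evaluate --baseline <BASELINE_MODEL> --subject <SUBJECT_MODEL> --adapter auto --profile ci
# Host-side workflow that intentionally bypasses the container boundary
invarlock evaluate --execution-mode host --baseline <BASELINE_MODEL> --subject <SUBJECT_MODEL> --adapter auto --profile ci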
Documentation Map¶
User Guide¶
- Getting Started
- Quickstart
- Compare & evaluate (BYOE)
- Primary Metric Smoke
- Live Examples
- Configuration Gallery
- Example Reports
- Reading a report
- Troubleshooting — Error codes and common fixes
- Plugins — Extending adapters and guards
- Bring Your Own Data — Custom datasets
- Evidence Packs — Validation suite bundles
- Evidence Packs Internals — Suite architecture and preset derivation flow
Reference¶
- Reference Index
- CLI Reference
- Tier Policy Tuning CLI (Calibration) — invarlock advanced calibrate for tier policy sweeps
- Configuration Schema
- Guards
- Model Adapters
- Model Family Catalog
- reports — Schema, telemetry, and HTML export
- Tier Policy Catalog (runtime tiers.yaml)
- Datasets
- Artifact Layout
- Observability
- API Guide
- Programmatic Quickstart
- Environment Variables
Maintainer-only runbooks may exist locally and are intentionally omitted from this public docs index.
Assurance¶
- Assurance Case
- Evaluation Math Derivation
- Coverage & Pairing Plan
- BCa Bootstrap (Paired Δlog)
- Guard Contracts & Primer
- Spectral False-Positive Control
- RMT ε-Rule
- VE Predictive Gate
- Determinism Contracts
- Tier v1.0 Calibration
- Guard Overhead Method
- Policy Provenance & Digest
- Device Drift Bands
- GPU/MPS-First Guards (Decision Memo)
Note: Every assurance claim is backed by automated tests and cross-referenced in the docs. See Guard Contracts → Coverage Reference (assurance/04-guard-contracts.md) for the test index.
Calibration CSVs and proof reports referenced in these notes are produced by
local or CI runs (typically under runs/null_sweeps/** and
reports/calibration/**) and are not committed to the repository. Attach them
to change proposals or releases when you update calibration.
Security¶
- Threat Model — Assets and adversaries
- Security Architecture — Components and defaults
- Best Practices — Operational recommendations
- Release Verification — Verification of published package artifacts and source tags
- pip-audit Allowlist
Governance¶
Core Concepts¶
- Configure – describe model, dataset, edit, and guard policies in YAML.
- Execute – run invarlock evaluate under a CI or release profile; model-loading commands use the runtime container by default unless you pass --execution-mode host.
- Validate – run invarlock verify and render HTML via invarlock report html (a minimal sketch follows this list); container-backed outputs include runtime.manifest.json next to evaluation.report.json. Directory inputs to invarlock report are only accepted when they contain a canonical report.json or evaluation.report.json.
- Iterate – compare runs, adjust edit plans, and reissue reports until gates pass.
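A minimal validate-step sketch, assuming the report landed under reports/eval/ as in the Production Evaluation example below (exact paths and flags depend on your run; see the CLI Reference):
invarlock verify reports/eval/evaluation.report.json
# Render HTML from the same bundle; a directory input must contain report.json or evaluation.report.json
invarlock report html reports/eval/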
The guard suite (invariants, spectral, variance, and RMT) keeps edits inside configured acceptance envelopes even when aggressive compression is attempted.
Live Example Verification¶
- Curated CI-safe live examples are gated by make docs-live-fast and cover README.md, docs/user-guide/getting-started.md, docs/user-guide/quickstart.md, notebooks/invarlock_python_api.ipynb, and notebooks/invarlock_policy_tiers.ipynb.
- Runnable documentation surfaces can be verified locally with make docs-live-fast, python scripts/verify_live_examples.py, or make docs-live.
- The curated fast lane replays concrete Markdown CLI snippets in host mode with seeded demo evidence, then smoke-runs the curated notebook subset.
- For heavyweight notebook cells that would otherwise trigger model downloads or full evaluations, the curated lane reuses seeded demo reports and keeps the later contract-reading and verification steps live.
- make docs-live remains the broader local lane that replays runnable Markdown examples and smoke-runs notebooks under notebooks/, using the same host seeded-demo approach for heavyweight model-loading steps.
- Artifacts land under tmp/live_examples/, including per-command JSONL results, notebook stdout/stderr logs, and a machine-readable summary.json.
- Placeholder/template snippets must remain parseable, but only concrete runnable examples should be treated as copy-paste-ready.
- GitHub Actions enforce the curated deterministic subset; the full verifier remains a local or long-gate lane.
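For example, to run the curated lane locally and inspect the artifacts (a sketch, assuming a repo checkout with make available):
make docs-live-fast
# Per-command JSONL results and notebook logs land under tmp/live_examples/
cat tmp/live_examples/summary.json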
Building Docs Offline vs Online¶
- Offline (default): mkdocs builds without contacting the Internet. Mermaid diagrams are disabled by default to keep builds fully local.
- Command: mkdocs build, or run make docs without --strict.
- Online (enable networked assets explicitly): enable Mermaid diagrams (via CDN) and keep strict checks.
- Command: INVARLOCK_DOCS_MERMAID=1 mkdocs build --strict
Notes
- The configuration references CDNs (MathJax/Polyfill) via extra_javascript in the generated HTML. These are not fetched at build time; they load when you view the HTML in a browser with network access.
- The mermaid2 plugin pings the CDN; we gate it behind the INVARLOCK_DOCS_MERMAID environment variable to avoid network dependencies by default.
Support Matrix¶
| Surface | Preset included | Adapter available | Pilot calibration config present | Published assurance basis |
|---|---|---|---|---|
| GPT-2 causal LM | Yes | Yes | Yes | Yes |
| BERT / RoBERTa MLM | Yes | Yes | Yes | Yes |
| Mistral 7B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Ministral 3 causal LM (text-only eval) | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen2 7B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen2.5 7B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen2.5 14B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen3 causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| DeepSeek-R1-Distill-Qwen causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Phi-4 causal LM (text-only eval) | Yes | Yes | Yes | No, repo-included pilot config only |
| Gemma 4 E2B causal LM (text-only eval) | Yes | Yes | Yes | No, repo-included pilot config only |
| TinyLlama 1.1B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| OLMo 2 causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen3.5 causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Seq2Seq / local pairs | Yes | Yes | No | No |
Published assurance basis covers GPT-2 and BERT profiles. Repo-included presets and pilot calibration configs for additional experimental families, including Mistral 7B, Ministral 3 text-only, Qwen2 7B, Qwen2.5 7B, Qwen2.5 14B, Qwen3, DeepSeek-R1-Distill-Qwen, Phi-4 text-only, Gemma 4 E2B text-only, TinyLlama 1.1B, OLMo 2, and Qwen3.5, do not become part of the published assurance basis until supporting artifacts are attached. Access-gated vendor checkpoints are intentionally excluded from the included support matrix and preset inventory, and ungated families without clean pilot lanes remain in the model family backlog rather than the support matrix.
published_basis remains the narrow public evidence floor, while
supported_experimental means the repo ships the preset, calibration config,
targeted tests, and smoke/evidence path for the lane without claiming a
published-basis fixture set.
Image-text evaluation uses the built-in
hf_multimodal adapter and the vision_text provider. Public support remains
text-only for the Gemma 4 lane, and audio evaluation is deferred.
Machine-readable support metadata lives in contracts/support_matrix.json. It is
the canonical source of truth for normalized support tiers
(published_basis, supported_experimental, community_experimental) and for
published-basis evidence references.
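To skim that metadata from the command line, a hedged sketch (assumes jq is installed; the JSON shape is whatever the contract file defines):
# Pretty-print the canonical support metadata, including normalized tier assignments
jq . contracts/support_matrix.json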
Model evidence automation lives in
scripts/model_evidence_sweep.py, with tmux-based remote launch support in
scripts/run_model_evidence_remote.py and a nightly/manual runner workflow in
.github/workflows/model-evidence-sweep.yml.
Repo-prepared-but-not-yet-promoted lanes are tracked in
contracts/model_family_catalog.json.
For the new Gemma 4 text lane, the repo-maintained local smoke is the included
manifest dry-run (scripts/model_evidence_sweep.py --slug gemma4_e2b --dry-run).
The image-text path also includes an offline demo preset at
configs/presets/multimodal/gemma4_e2b_vision_text_256.yaml plus
tests/fixtures/vision_text/demo_manifest.jsonl for provider/config validation;
live multimodal model execution requires an installed HF stack and model
weights.
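The Gemma 4 dry-run named above can be invoked directly (assuming a local repo checkout and a Python environment with the project installed):
python scripts/model_evidence_sweep.py --slug gemma4_e2b --dry-run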
For the broader inventory of declared support, implemented-but-not-public coverage, usage-only checkpoint families, and recommended additions, see Model Family Catalog.
Common Workflows¶
Research¶
pip install "invarlock[adapters,guards,eval]"
invarlock doctor
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
--baseline gpt2 \
--subject /path/to/edited \
--adapter auto \
--profile ci \
--preset configs/presets/causal_lm/wikitext2_512.yaml
Development¶
invarlock advanced plugins adapters
invarlock advanced calibrate --help
bash scripts/verify_ci_matrix.sh
Production Evaluation¶
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate \
--baseline /path/to/baseline \
--subject /path/to/edited \
--adapter auto \
--profile release \
--preset configs/presets/causal_lm/wikitext2_512.yaml
invarlock verify reports/eval/evaluation.report.json
# expects reports/eval/runtime.manifest.json next to the report
Configuration Snapshot¶
model:
  id: gpt2
  adapter: hf_causal
  device: auto
dataset:
  provider: wikitext2
  seq_len: 768
  stride: 768
  preview_n: 240
  final_n: 240
  seed: 42
# No edit by default (Compare & evaluate/BYOE recommended), or use the built-in quant demo:
# edit:
#   name: quant_rtn
#   plan:
#     bitwidth: 8
#     per_channel: true
guards:
  spectral:
    kappa: 3.2
  variance:
    tier: balanced
eval:
  pairing:
    enforce: true
output:
  dir: runs/
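Saved as, say, my_preset.yaml (a hypothetical filename), a snapshot like this could be passed via --preset, mirroring the workflow examples above (a sketch; see the Configuration Schema for the authoritative shape):
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate \
  --baseline gpt2 \
  --subject /path/to/edited \
  --adapter auto \
  --profile ci \
  --preset my_preset.yaml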
A tiny all-model matrix smoke is also available (set RUN=1 to actually execute the matrix):
NET=1 INCLUDE_MEASURED_CLS=1 RUN=0 bash scripts/run_tiny_all_matrix.sh
Quick Links: Getting Started · CLI Reference · Primary Metric Smoke · Example Reports · Contributing