InvarLock Documentation

InvarLock is edit-agnostic (BYOE). A small built-in quantization demo (quant_rtn, 8-bit) exists for advanced smoke and demo workflows. See Compare & evaluate (BYOE).

Welcome to the documentation hub for InvarLock (Edit‑agnostic robustness reports for weight edits). The material below is organized so new users can ramp quickly while practitioners find detailed reference, design rationales, and assurance notes. It is aimed at checkpoint editors, CI and assurance owners, and researchers running paired evaluation on text workflows plus the included image-text path.


Start Here

  1. Getting Started – environment setup and the first evaluate → verify → report html loop.
  2. Quickstart – CLI highlights for common workflows.
  3. Compare & evaluate (BYOE) – baseline ↔ subject paired evaluation with guardchain.
  4. Primary Metric Smoke – tiny examples for ppl/accuracy kinds.

Choose Your Path

  • Wheel user / reviewer: start with Quickstart if you already have an evaluation.report.json bundle and want to verify, explain, or render it.
  • Evaluator: start with Getting Started if you need to run invarlock evaluate and produce a fresh evaluation bundle.
  • Repo maintainer: use the same user guides first, then reach for repo-only smokes, configs/, and local runtime-image flows after the core path is green.

Quick Example

pip install "invarlock[hf]"

# Compare & evaluate (BYOE checkpoints)
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline <BASELINE_MODEL> \
  --subject  <SUBJECT_MODEL> \
  --adapter  auto \
  --profile  ci

Tip: enable Hub downloads per command when fetching models/datasets: invarlock evaluate --allow-network ...

Security-default note: evaluate uses the runtime container by default. Use --execution-mode host only for host-side workflows that intentionally bypass that boundary. Advanced runtime-heavy workflows live under invarlock advanced.


Documentation Map

User Guide

Reference

Maintainer-only runbooks may exist locally and are intentionally omitted from this public docs index.

Assurance

Note: Every assurance claim is backed by automated tests and cross-referenced in the docs. See Guard Contracts → Coverage Reference (assurance/04-guard-contracts.md) for the test index.

Calibration CSVs and proof reports referenced in these notes are produced by local or CI runs (typically under runs/null_sweeps/** and reports/calibration/**) and are not committed to the repository. Attach them to change proposals or releases when you update calibration.

Security

Governance


Core Concepts

  1. Configure – describe model, dataset, edit, and guard policies in YAML.
  2. Execute – run invarlock evaluate under a CI or release profile; model-loading commands use the runtime container by default unless you pass --execution-mode host.
  3. Validate – run invarlock verify and render HTML via invarlock report html; container-backed outputs include runtime.manifest.json next to evaluation.report.json. Directory inputs to invarlock report are only accepted when they contain canonical report.json or evaluation.report.json.
  4. Iterate – compare runs, adjust edit plans, and reissue reports until gates pass.

The guard suite (invariants, spectral, variance, and RMT) keeps edits inside configured acceptance envelopes even when aggressive compression is attempted.


Live Example Verification

  • Curated CI-safe live examples are gated by make docs-live-fast and cover README.md, docs/user-guide/getting-started.md, docs/user-guide/quickstart.md, notebooks/invarlock_python_api.ipynb, and notebooks/invarlock_policy_tiers.ipynb.
  • Runnable documentation surfaces can be verified locally with make docs-live-fast, python scripts/verify_live_examples.py, or make docs-live.
  • The curated fast lane replays concrete Markdown CLI snippets in host mode with seeded demo evidence, then smoke-runs the curated notebook subset.
  • For heavyweight notebook cells that would otherwise trigger model downloads or full evaluations, the curated lane reuses seeded demo reports and keeps the later contract-reading and verification steps live.
  • make docs-live remains the broader local lane that replays runnable Markdown examples and smoke-runs notebooks under notebooks/, using the same host seeded-demo approach for heavyweight model-loading steps.
  • Artifacts land under tmp/live_examples/, including per-command JSONL results, notebook stdout/stderr logs, and a machine-readable summary.json.
  • Placeholder/template snippets must remain parseable, but only concrete runnable examples should be treated as copy-paste-ready.
  • GitHub Actions enforce the curated deterministic subset; the full verifier remains a local or long-gate lane.
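
Since the verifier emits per-command JSONL results under tmp/live_examples/, downstream tooling can tally them. The record schema below (an "ok" boolean per command) is an assumption for illustration; the real files may use different keys.

```python
import json

def summarize_results(jsonl_text: str) -> dict:
    """Tally per-command JSONL records, assuming each has a boolean 'ok' field."""
    ok = failed = 0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        if record.get("ok"):
            ok += 1
        else:
            failed += 1
    return {"ok": ok, "failed": failed, "total": ok + failed}

# Example with two hypothetical command records.
sample = "\n".join([
    json.dumps({"cmd": "invarlock verify demo.json", "ok": True}),
    json.dumps({"cmd": "invarlock report html demo.json", "ok": False}),
])
print(summarize_results(sample))  # {'ok': 1, 'failed': 1, 'total': 2}
```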

Building Docs Offline vs Online

  • Offline (default): mkdocs builds without contacting the Internet, and Mermaid diagrams are disabled to keep builds fully local. Command: mkdocs build, or make docs without --strict.
  • Online (enable networked assets explicitly): enable Mermaid diagrams (via CDN) and keep strict checks. Command: INVARLOCK_DOCS_MERMAID=1 mkdocs build --strict.

Notes

  • The configuration references CDNs (MathJax/Polyfill) via extra_javascript in the generated HTML. These are not fetched at build time; they load when you view the HTML in a browser with network access.
  • The mermaid2 plugin pings the CDN; we gate it behind the INVARLOCK_DOCS_MERMAID environment variable to avoid network dependencies by default.

Support Matrix

| Surface | Preset included | Adapter available | Pilot calibration config present | Published assurance basis |
| --- | --- | --- | --- | --- |
| GPT-2 causal LM | Yes | Yes | Yes | Yes |
| BERT / RoBERTa MLM | Yes | Yes | Yes | Yes |
| Mistral 7B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Ministral 3 causal LM (text-only eval) | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen2 7B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen2.5 7B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen2.5 14B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen3 causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| DeepSeek-R1-Distill-Qwen causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Phi-4 causal LM (text-only eval) | Yes | Yes | Yes | No, repo-included pilot config only |
| Gemma 4 E2B causal LM (text-only eval) | Yes | Yes | Yes | No, repo-included pilot config only |
| TinyLlama 1.1B causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| OLMo 2 causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Qwen3.5 causal LM | Yes | Yes | Yes | No, repo-included pilot config only |
| Seq2Seq / local pairs | Yes | Yes | No | No |

Published assurance basis covers GPT-2 and BERT profiles. Repo-included presets and pilot calibration configs for additional experimental families (Mistral 7B, Ministral 3 text-only, Qwen2 7B, Qwen2.5 7B, Qwen2.5 14B, Qwen3, DeepSeek-R1-Distill-Qwen, Phi-4 text-only, Gemma 4 E2B text-only, TinyLlama 1.1B, OLMo 2, and Qwen3.5) do not become part of the published assurance basis until supporting artifacts are attached. Access-gated vendor checkpoints are intentionally excluded from the included support matrix and preset inventory, and ungated families without clean pilot lanes remain in the model family backlog rather than the support matrix.

published_basis remains the narrow public evidence floor, while supported_experimental means the repo ships the preset, calibration config, targeted tests, and smoke/evidence path for the lane without claiming a published-basis fixture set.

Image-text evaluation uses the built-in hf_multimodal adapter and the vision_text provider. Public support remains text-only for the Gemma 4 lane, and audio evaluation is deferred.

Machine-readable support metadata lives in contracts/support_matrix.json. It is the canonical source of truth for normalized support tiers (published_basis, supported_experimental, community_experimental) and for published-basis evidence references.
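
Because contracts/support_matrix.json is the canonical machine-readable source, tooling can query it directly. The schema below (each surface mapped to a "tier" key) is a hypothetical sketch for illustration; consult the actual file for the real structure.

```python
import json

# Hypothetical schema standing in for contracts/support_matrix.json: each
# surface maps to one of the normalized tiers named in the docs.
sample_matrix = json.dumps({
    "gpt2_causal_lm": {"tier": "published_basis"},
    "bert_roberta_mlm": {"tier": "published_basis"},
    "mistral_7b_causal_lm": {"tier": "supported_experimental"},
})

def surfaces_in_tier(matrix_json: str, tier: str) -> list[str]:
    """Return the sorted surfaces whose normalized support tier matches."""
    matrix = json.loads(matrix_json)
    return sorted(k for k, v in matrix.items() if v.get("tier") == tier)

print(surfaces_in_tier(sample_matrix, "published_basis"))
# ['bert_roberta_mlm', 'gpt2_causal_lm']
```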

Model evidence automation lives in scripts/model_evidence_sweep.py, with tmux-based remote launch support in scripts/run_model_evidence_remote.py and a nightly/manual runner workflow in .github/workflows/model-evidence-sweep.yml. Repo-prepared-but-not-yet-promoted lanes are tracked in contracts/model_family_catalog.json. For the new Gemma 4 text lane, the repo-maintained local smoke is the included manifest dry-run (scripts/model_evidence_sweep.py --slug gemma4_e2b --dry-run). The image-text path also includes an offline demo preset at configs/presets/multimodal/gemma4_e2b_vision_text_256.yaml plus tests/fixtures/vision_text/demo_manifest.jsonl for provider/config validation; live multimodal model execution requires an installed HF stack and model weights.

For the broader inventory of declared support, implemented-but-not-public coverage, usage-only checkpoint families, and recommended additions, see Model Family Catalog.


Common Workflows

Research

pip install "invarlock[adapters,guards,eval]"
invarlock doctor
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
  --baseline gpt2 \
  --subject /path/to/edited \
  --adapter auto \
  --profile ci \
  --preset configs/presets/causal_lm/wikitext2_512.yaml

Development

invarlock advanced plugins adapters
invarlock advanced calibrate --help
bash scripts/verify_ci_matrix.sh

Production Evaluation

INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate \
  --baseline /path/to/baseline \
  --subject  /path/to/edited \
  --adapter auto \
  --profile release \
  --preset configs/presets/causal_lm/wikitext2_512.yaml
invarlock verify reports/eval/evaluation.report.json
# expects reports/eval/runtime.manifest.json next to the report
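
The expectation that runtime.manifest.json sits next to the report can be checked before invoking invarlock verify. This helper is an illustrative sketch, not part of the CLI.

```python
from pathlib import Path

def has_runtime_manifest(report_path) -> bool:
    """True if runtime.manifest.json exists alongside the evaluation report,
    as container-backed runs are expected to produce both side by side."""
    report = Path(report_path)
    return (report.parent / "runtime.manifest.json").is_file()

# Example: the check flips to True once the sibling manifest appears.
import tempfile

with tempfile.TemporaryDirectory() as d:
    report = Path(d) / "evaluation.report.json"
    report.write_text("{}")
    print(has_runtime_manifest(report))  # False
    (Path(d) / "runtime.manifest.json").write_text("{}")
    print(has_runtime_manifest(report))  # True
```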

Configuration Snapshot

model:
  id: gpt2
  adapter: hf_causal
  device: auto
dataset:
  provider: wikitext2
  seq_len: 768
  stride: 768
  preview_n: 240
  final_n: 240
  seed: 42
edit:
  # No edit by default (Compare & evaluate/BYOE recommended).
  # To use the built-in quant demo instead:
  # name: quant_rtn
  # plan:
  #   bitwidth: 8
  #   per_channel: true
guards:
  spectral:
    kappa: 3.2
  variance:
    tier: balanced
eval:
  pairing:
    enforce: true
output:
  dir: runs/

To preview the tiny all-model matrix smoke without executing it:

NET=1 INCLUDE_MEASURED_CLS=1 RUN=0 bash scripts/run_tiny_all_matrix.sh

Run with RUN=1 to execute the matrix.


Quick Links: Getting Started · CLI Reference · Primary Metric Smoke · Example Reports · Contributing