CLI Reference
Overview
| Aspect | Details |
|---|---|
| Purpose | Command-line interface for evaluation, verification, reporting, and advanced maintenance flows. |
| Audience | Operators running InvarLock from a terminal or CI. |
| Primary commands | evaluate, verify, report, doctor, advanced, version. |
| Runtime verifier | invarlock advanced runtime-verify for direct runtime manifest checks. |
| Requires | invarlock[hf] for model-loading workflows; extra backends are installed via Python extras. |
| Network | Offline by default; use evaluate --allow-network when a run needs model or dataset downloads. |
| Source of truth | src/invarlock/cli/app.py, src/invarlock/cli/commands/*.py, src/invarlock/cli/runtime_verify.py. |
Most users only need a narrow top-level surface:
- invarlock evaluate
- invarlock verify
- invarlock report html
Everything else is either diagnostics (doctor) or explicitly advanced
(invarlock advanced ...).
First-Touch Surfaces
These entrypoints are the ones users hit first when orienting themselves in a fresh install or wheel-only environment:
| Surface | Why it matters |
|---|---|
| invarlock --help | Top-level discovery of the supported public command set |
| invarlock --version | Confirms the installed package and schema pairing |
| invarlock report --help | Shows the report subcommands without requiring run artifacts |
| invarlock advanced --help | Lists the advanced maintenance namespace before drilling into subcommands |
| invarlock advanced calibrate --help | Establishes that calibration lives under advanced rather than the core loop |
| invarlock advanced runtime-verify --help | Wheel-native runtime-manifest verification for existing report bundles |
Quick Start
# Install the Hugging Face-backed evaluation stack
pip install "invarlock[hf]"
# Compare a baseline against a subject
invarlock evaluate --allow-network \
--baseline gpt2 \
--subject distilgpt2 \
--adapter auto \
--profile ci
# Validate the container-backed evaluation bundle
invarlock verify reports/eval/evaluation.report.json
# Render shareable HTML
invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html
invarlock report explain --evaluation-report reports/eval/evaluation.report.json
Security Defaults
- evaluate defaults to --execution-mode container, which delegates model-loading work into the runtime container.
- Use --execution-mode host only for host-side workflows that intentionally bypass the container boundary.
- verify expects runtime.manifest.json beside container-backed evaluation outputs and fails closed when required runtime provenance is missing.
- Network access remains opt-in through evaluate --allow-network.
Task To Command Map
| Task | Command | Output |
|---|---|---|
| Compare baseline vs subject | invarlock evaluate | reports/eval/evaluation.report.json plus runtime.manifest.json for container-backed runs |
| Validate an evaluation report | invarlock verify | Exit code plus human or JSON verification output |
| Render HTML from an evaluation report | invarlock report html | HTML file |
| Explain gate decisions from an evaluation bundle or explicit run reports | invarlock report explain | Human-readable explanation |
| Inspect environment health | invarlock doctor | Human or JSON diagnostics |
| Evidence-pack, policy, plugin, or calibration workflows | invarlock advanced ... | Advanced artifacts and diagnostics |
Artifact Outputs Matrix
| Command | Writes runs/ | Writes reports/ | Notes |
|---|---|---|---|
| invarlock evaluate | Yes (--out, default runs/) | Yes (--report-out, default reports/eval) | Produces the paired evaluation report bundle |
| invarlock verify | No | No | Reads existing evaluation report JSON |
| invarlock report html | No | Yes (--output) | Renders HTML from an existing report |
| invarlock report explain | No | No | Prefers evaluation.report.json, then auto-resolves linked run reports; also accepts explicit --subject-report and --baseline-report |
| invarlock doctor | No | No | Diagnostics only |
| invarlock advanced evidence-pack | Depends on subcommand | Depends on subcommand | Advanced evidence packaging |
| invarlock advanced policy | Depends on subcommand | No | Advanced policy-pack tooling |
| invarlock advanced plugins | No | No | Read-only plugin discovery and explanation |
| invarlock advanced calibrate | Yes | Yes | Advanced tier-policy calibration workflows |
Top-Level Command Index
| Command | Purpose |
|---|---|
| invarlock evaluate | Compare baseline and subject checkpoints with deterministic pairing |
| invarlock verify | Verify evaluation reports against schema, pairing, and runtime provenance rules |
| invarlock report | Explain, render, and validate existing report artifacts |
| invarlock doctor | Diagnose environment and configuration issues |
| invarlock advanced | Advanced evidence-pack, policy, plugin, and calibration workflows |
| invarlock version | Show the installed version |
| invarlock advanced runtime-verify | Verify an evaluation report against its sibling runtime.manifest.json |
Exit codes: 0=success, 1=generic failure, 2=usage/schema/config failure,
3=hard abort for profile-aware fail-closed paths.
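A CI wrapper can turn these exit codes into readable labels. The following is a minimal sketch built only from the four codes listed above; the helper name `describe_exit_code` and the fallback label for unlisted codes are illustrative and not part of InvarLock.

```python
# Exit-code meanings as listed in this reference.
EXIT_CODES = {
    0: "success",
    1: "generic failure",
    2: "usage/schema/config failure",
    3: "hard abort (profile-aware fail-closed path)",
}


def describe_exit_code(code: int) -> str:
    """Return a readable label for an invarlock exit code (hypothetical helper)."""
    # Codes outside the documented set get an explicit "unknown" label
    # instead of being silently treated as success or failure.
    return EXIT_CODES.get(code, f"unknown exit code {code}")
```

Keeping the mapping in one table makes it easy to log the same wording that the reference uses when a pipeline step fails.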
invarlock evaluate
Purpose: compare a baseline against a subject and emit an evaluation report.
Common options:
- --baseline: baseline checkpoint path or model ID
- --subject: subject checkpoint path or model ID
- --baseline-report: reuse a stored baseline report by passing the explicit report.json file path that captured the baseline windows
- --adapter: adapter name or auto
- --profile: ci, release, or another included profile
- --tier: tier label for policy context
- --preset: optional repo preset path
- --out: run-artifact directory
- --report-out: evaluation report directory
- --execution-mode container|host: execution policy for evaluate. container keeps model loading inside the runtime container; host allows host-side execution and produces host artifacts that should be verified with verify --runtime-provenance host.
- --edit-config: optional demo/smoke edit overlay such as quant_rtn
Example:
INVARLOCK_DEDUP_TEXTS=1 invarlock evaluate --allow-network \
--baseline gpt2 \
--subject distilgpt2 \
--adapter auto \
--profile ci \
--report-out reports/eval
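When the example above is launched from a Python wrapper rather than a shell, the environment variable and the flags have to be passed separately. The sketch below assumes only the flags and the INVARLOCK_DEDUP_TEXTS variable shown in this section; the helper name `evaluate_invocation` and its parameters are hypothetical and not part of InvarLock.

```python
import os
import shlex


def evaluate_invocation(baseline, subject, *, adapter=None, profile=None,
                        report_out=None, allow_network=False, dedup_texts=False):
    """Return (argv, env) for an `invarlock evaluate` call (hypothetical helper)."""
    argv = ["invarlock", "evaluate"]
    if allow_network:
        argv.append("--allow-network")  # network access is opt-in
    argv += ["--baseline", baseline, "--subject", subject]
    # Optional flags are emitted only when the caller sets them, so this
    # helper makes no assumption about the CLI's own defaults.
    for flag, value in (("--adapter", adapter), ("--profile", profile),
                        ("--report-out", report_out)):
        if value is not None:
            argv += [flag, value]
    env = dict(os.environ)  # copy, so the parent environment is untouched
    if dedup_texts:
        env["INVARLOCK_DEDUP_TEXTS"] = "1"  # same setting as the shell prefix
    return argv, env


if __name__ == "__main__":
    argv, _env = evaluate_invocation(
        "gpt2", "distilgpt2", adapter="auto", profile="ci",
        report_out="reports/eval", allow_network=True, dedup_texts=True,
    )
    print(shlex.join(argv))  # shell-quoted form of the command, for logging
```

The returned pair can be handed to `subprocess.run(argv, env=env)` on a machine where invarlock is installed.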
invarlock verify
Purpose: verify existing evaluation report JSON files.
Arguments:
- REPORTS...: one or more evaluation report JSON paths or directories containing canonical evaluation.report.json
Common options:
- --baseline: optional baseline report for comparison flows
- --tolerance: float tolerance for recompute checks
- --profile: profile-aware validation mode
- --runtime-provenance container|host: runtime provenance policy for the supplied report artifacts
- --json: emit a single JSON envelope
Example:
invarlock verify --json reports/eval/evaluation.report.json
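Because host-mode evaluation artifacts are meant to be verified with --runtime-provenance host, a pipeline may want to derive the verify command from the execution mode used for evaluate. The sketch below uses only the flags documented here; the helper name `build_verify_command` is hypothetical, and passing --runtime-provenance explicitly in both modes is a choice made for this example, since the flag's default is not stated in this reference.

```python
def build_verify_command(reports, *, execution_mode, json_output=False):
    """Build an `invarlock verify` argv matching the evaluate execution mode.

    Hypothetical helper: only the flag names come from the CLI reference.
    """
    if execution_mode not in ("container", "host"):
        raise ValueError(f"unsupported execution mode: {execution_mode!r}")
    # Always state the provenance policy so the command does not depend on
    # whatever default the installed CLI applies.
    argv = ["invarlock", "verify", "--runtime-provenance", execution_mode]
    if json_output:
        argv.append("--json")
    argv.extend(str(report) for report in reports)
    return argv
```

For example, `build_verify_command(["reports/eval/evaluation.report.json"], execution_mode="host", json_output=True)` yields a command suitable for a host-side CI job.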
invarlock report
Purpose: operate on existing report artifacts through explicit subcommands.
Core subcommands:
- invarlock report generate
  - Generate human-readable report output from existing run reports
  - Options: --run, --compare-run-report, --baseline-run-report, --format, --output
- invarlock report html
  - Render an evaluation report to HTML
  - Options: -i/--input, -o/--output, --embed-css, --force
- invarlock report explain
  - Explain gates and primary-metric behavior from the preferred evaluation bundle input, or from explicit subject/baseline run reports when needed
  - Options: --evaluation-report, --subject-report, --baseline-report
- invarlock report validate
  - Validate a report JSON against the v1 schema
- Directory inputs are command-specific:
  - report generate and report explain accept directories containing canonical report.json
  - report html and report validate accept directories containing canonical evaluation.report.json
  - report explain --evaluation-report accepts directories containing canonical evaluation.report.json
  - verify accepts directories containing canonical evaluation.report.json and optional baselines containing canonical report.json or evaluation.report.json
  - If a directory contains both canonical filenames, it is ambiguous and rejected; pass the exact file path instead.
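The directory rule above can be illustrated with a short sketch. This is a re-implementation of the documented behavior for explanation only, not InvarLock's actual resolver; the function name `resolve_report_input`, its `accepted` parameter, and the choice of exception types are assumptions.

```python
from pathlib import Path

# The two canonical filenames named in this reference.
CANONICAL_NAMES = ("report.json", "evaluation.report.json")


def resolve_report_input(path, accepted=CANONICAL_NAMES):
    """Resolve a file-or-directory input to one report file (illustrative only)."""
    candidate = Path(path)
    if candidate.is_file():
        return candidate  # explicit file paths are used as given
    if not candidate.is_dir():
        raise FileNotFoundError(f"no such file or directory: {candidate}")
    present = [name for name in CANONICAL_NAMES if (candidate / name).is_file()]
    if len(present) == 2:
        # Both canonical files exist: ambiguous, so require an exact path.
        raise ValueError(f"ambiguous directory {candidate}: pass the exact file path")
    matches = [name for name in present if name in accepted]
    if not matches:
        raise FileNotFoundError(f"{candidate} contains none of: {', '.join(accepted)}")
    return candidate / matches[0]
```

A command that only accepts evaluation bundles would call it with `accepted=("evaluation.report.json",)`, while the ambiguity check still looks at both canonical names.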
Example:
invarlock report html -i reports/eval/evaluation.report.json -o reports/eval/evaluation.html
invarlock report explain --evaluation-report reports/eval/evaluation.report.json
invarlock report explain \
--subject-report runs/subject/report.json \
--baseline-report runs/baseline/report.json
invarlock doctor
Purpose: environment diagnostics that remain light-import safe.
Common options:
- --json
- --profile
- --tier
- --baseline-report
- --subject-report
- --strict
- Report inputs accept an explicit JSON file path or a directory containing canonical report.json or evaluation.report.json; ambiguous directories with both canonical files are rejected and require an explicit file path.
Example:
invarlock doctor --json
invarlock advanced
Purpose: advanced and maintenance-oriented workflows that are intentionally outside the core product contract.
Subcommands:
- invarlock advanced evidence-pack
  - Inspect, build, and verify evidence packs
- invarlock advanced policy
  - Build and verify policy-pack artifacts
- invarlock advanced plugins
  - Read-only plugin discovery and explanation
- invarlock advanced calibrate
  - Tier-policy calibration and sweep tooling
Examples:
invarlock advanced evidence-pack verify <pack> --strict
invarlock advanced policy verify policy-pack.json --json
invarlock advanced plugins list --json
invarlock advanced calibrate --help
Plugins & Entry Points
invarlock advanced plugins lists built-in and optional adapters, guards,
edits, datasets, and related entry points without mutating the active Python
environment.
Available read-only flows include:
- invarlock advanced plugins list
- invarlock advanced plugins adapters
- invarlock advanced plugins guards
- invarlock advanced plugins edits
Optional backends are installed through normal Python packaging, for example:
pip install "invarlock[hf]"
pip install "invarlock[awq,gptq]"
Plugin install and uninstall commands are not part of the CLI surface.
invarlock advanced runtime-verify
Purpose: package-native runtime provenance verification for an existing evaluation report and its sibling runtime manifest.
Common options:
- --report: path to evaluation.report.json
- --manifest: path to runtime.manifest.json
- --json: emit a machine-readable runtime-verify-v1 envelope
Example:
invarlock advanced runtime-verify \
--report reports/eval/evaluation.report.json \
--manifest reports/eval/runtime.manifest.json
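Since the manifest is expected to sit beside the report, a script can derive the --manifest path from the --report path. This is a minimal sketch assuming that sibling layout; the helper name `build_runtime_verify_command` is hypothetical and not part of InvarLock.

```python
from pathlib import Path


def build_runtime_verify_command(report_path, *, json_output=False):
    """Build a runtime-verify argv using the sibling runtime.manifest.json.

    Hypothetical helper: it assumes the manifest lives in the same directory
    as the report, as in the example in this section.
    """
    report = Path(report_path)
    manifest = report.with_name("runtime.manifest.json")  # same directory
    argv = [
        "invarlock", "advanced", "runtime-verify",
        "--report", str(report),
        "--manifest", str(manifest),
    ]
    if json_output:
        argv.append("--json")
    return argv
```

If the manifest is stored elsewhere, pass the two paths to the CLI directly instead of relying on this derivation.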
JSON Output
Stable machine-readable output is available on the verification and advanced plugin surfaces.
- invarlock verify --json
- invarlock advanced plugins list --json
- invarlock advanced evidence-pack verify --json
- invarlock advanced policy verify --json
These commands emit a single JSON object suitable for CI parsing.
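In CI, the single JSON object can be captured and parsed alongside the exit code. The sketch below assumes the object is written to standard output, which this reference does not state explicitly; it also makes no assumption about the fields inside the envelope. The helper name `run_json_command` is hypothetical.

```python
import json
import subprocess


def run_json_command(argv):
    """Run a command and return (exit_code, parsed_json_or_None).

    Hypothetical helper; assumes the JSON object is printed on stdout.
    """
    # check=False: the exit code is returned to the caller instead of raising.
    proc = subprocess.run(argv, capture_output=True, text=True, check=False)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        payload = None  # stdout was empty or not valid JSON
    return proc.returncode, payload
```

A CI step could call `run_json_command(["invarlock", "verify", "--json", "reports/eval/evaluation.report.json"])` and fail the job when the exit code is nonzero or the payload is missing.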
Command Layout
- The public top level is evaluate, verify, report, doctor, advanced, and version.
- Evidence-pack, policy, plugin, and calibration workflows live under invarlock advanced ....
- Host execution for the core evaluation path is expressed as --execution-mode host.
- Internal delegated config execution uses a package-internal config-runner module, not a public CLI command.
- Optional runtime backends are installed with Python extras instead of CLI install and uninstall commands.