HELM Benchmark Methodology

Claim

The governed allow path completes in 75µs p99 in the benchmark harness, which is 73µs above the ungoverned baseline.

The deny path completes in 29µs p99. Both numbers are measured under local benchmark conditions and are scoped to the HELM execution boundary, not end-to-end network latency.

What is measured

The HELM hot path is the incremental work added by HELM versus direct execution:

Guardian.EvaluateDecision → crypto.SignReceipt → SQLite store.Append

This chain runs on every governed tool call. It is the boundary overhead — the cost of governance, not the cost of the tool itself.

Scenarios

Scenario	What it measures
baseline_no_helm	Mock tool call (JSON marshal only) — no governance
helm_allow	Full governed allow: Guardian PRG eval → Ed25519 receipt sign → SQLite persist
helm_deny	Undeclared tool → fail-closed deny with signed decision

What is included in the HELM path

PRG (Proof Requirement Graph) rule lookup and evaluation
Effect envelope validation
Decision record construction and signing (Ed25519)
Receipt construction, canonicalization, and signing (Ed25519)
SQLite WAL-mode persistence of signed receipt

What is excluded

Network I/O (upstream LLM call, proxy TCP)
Export/verify (separate operational concern, not request-path)
MCP transport overhead
Cold start / process initialization
TLS negotiation

Results

Measured on commit 4e52909d.

Scenario	p50	p95	p99	mean	σ	min	max
baseline_no_helm	0µs	1µs	2µs	0.17µs	2.3µs	0µs	126µs
helm_allow	46µs	54µs	75µs	48µs	11.1µs	40µs	409µs
helm_deny	20µs	23µs	29µs	21µs	5.9µs	18µs	317µs

Incremental overhead (allow - baseline):

p99: 73µs
mean: 48µs

Environment

Parameter	Value
Machine	Apple M-series (arm64)
OS	macOS
Go version	1.24.0
CPU cores	10
Iterations	10,000 per scenario
Warm-up	100 iterations discarded before measurement
SQLite mode	WAL, PRAGMA synchronous=NORMAL
Signing	Ed25519 (crypto/ed25519, not CGo)
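
The iteration and warm-up parameters suggest a measurement loop along the following lines. This is a sketch under assumptions, not the harness itself: `measure` is a hypothetical helper, and the timed function mirrors only what the scenario table says about baseline_no_helm (a JSON marshal with no governance).

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// measure calls fn warmup times without recording anything, then records
// the wall-clock latency of each of the next iters calls.
func measure(warmup, iters int, fn func()) []time.Duration {
	for i := 0; i < warmup; i++ {
		fn()
	}
	samples := make([]time.Duration, 0, iters)
	for i := 0; i < iters; i++ {
		start := time.Now()
		fn()
		samples = append(samples, time.Since(start))
	}
	return samples
}

func main() {
	// Stand-in for a mock tool call: JSON marshal only.
	call := map[string]string{"tool": "search", "query": "hello"}

	// 100 discarded warm-up iterations, then 10,000 measured ones.
	samples := measure(100, 10000, func() {
		if _, err := json.Marshal(call); err != nil {
			panic(err)
		}
	})
	fmt.Println("recorded samples:", len(samples))
}
```

Timing each call individually is what makes per-call percentiles possible. Calls that finish in well under a microsecond show up as 0µs when results are reported in whole microseconds, which is consistent with the baseline row.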

Reproduction

# Clone and build
git clone https://github.com/Mindburn-Labs/helm-oss.git
cd helm-oss

# Full overhead report (writes benchmarks/results/latest.json)
make bench-report

# Standard Go benchmarks (3 runs)
make bench

# Individual components (each runs in a subshell so the working directory resets)
(cd core && go test -bench=. -benchmem ./pkg/crypto/)
(cd core && go test -bench=. -benchmem ./pkg/store/)
(cd core && go test -bench=. -benchmem ./pkg/guardian/)
(cd core && go test -bench=. -benchmem ./benchmarks/)

Caveats

Local benchmark harness only. These numbers measure the HELM execution boundary in isolation, not end-to-end latency through a proxy or network stack.
In-memory SQLite. Production deployments using on-disk SQLite or Postgres will have higher store latency. WAL mode mitigates this but does not eliminate it.
Single PRG rule. The Guardian benchmark evaluates one rule. Complex policy graphs with many rules will increase eval time.
No optional gates. The benchmark runs Guardian without freeze controller, context guard, isolation checker, egress checker, threat scanner, or delegation store. Each enabled gate adds evaluation overhead.
Warm run only. Cold-start overhead (key generation, SQLite migration, PRG initialization) is excluded.

The claim is scoped: 75µs p99 on the governed allow hot path in the benchmark harness. Do not generalize to all deployment topologies without additional measurement.

Machine-readable output

benchmarks/results/latest.json contains structured results:

{
  "helm_version": "0.3.0",
  "go_version": "go1.24.0",
  "hot_path_p99_us": 75,
  "baseline_p99_us": 2,
  "overhead_p99_us": 73,
  "overhead_under_5ms": true,
  "scenarios": [...]
}

Run make bench-report on release candidates. If hot_path_p99_us exceeds 5000 (5ms), the release should be investigated. The 5ms threshold is a conservative regression gate — the expected range is 50–200µs depending on hardware.
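
A release pipeline could apply this gate with a small check like the one sketched below. The `report` struct and `checkGate` function are hypothetical; only the JSON field names and the 5000µs threshold come from this document. The `scenarios` array is ignored because its structure is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// report mirrors the top-level fields shown in latest.json.
type report struct {
	HelmVersion      string  `json:"helm_version"`
	GoVersion        string  `json:"go_version"`
	HotPathP99Us     float64 `json:"hot_path_p99_us"`
	BaselineP99Us    float64 `json:"baseline_p99_us"`
	OverheadP99Us    float64 `json:"overhead_p99_us"`
	OverheadUnder5ms bool    `json:"overhead_under_5ms"`
}

// thresholdUs is the 5ms regression gate, expressed in microseconds.
const thresholdUs = 5000

// checkGate parses a report and returns an error if it cannot be parsed
// or if the hot-path p99 exceeds the gate.
func checkGate(data []byte) (report, error) {
	var r report
	if err := json.Unmarshal(data, &r); err != nil {
		return r, fmt.Errorf("parse report: %w", err)
	}
	if r.HotPathP99Us > thresholdUs {
		return r, fmt.Errorf("hot_path_p99_us=%.0f exceeds the %dµs gate", r.HotPathP99Us, thresholdUs)
	}
	return r, nil
}

func main() {
	// Inline sample using the values shown above; a real check would read
	// benchmarks/results/latest.json from disk instead.
	sample := []byte(`{
	  "helm_version": "0.3.0",
	  "go_version": "go1.24.0",
	  "hot_path_p99_us": 75,
	  "baseline_p99_us": 2,
	  "overhead_p99_us": 73,
	  "overhead_under_5ms": true
	}`)

	r, err := checkGate(sample)
	if err != nil {
		fmt.Println("gate failed:", err)
		return
	}
	fmt.Printf("gate passed: hot path p99 %.0fµs, overhead p99 %.0fµs\n", r.HotPathP99Us, r.OverheadP99Us)
}
```

Returning an error rather than exiting directly keeps the check easy to call from a test or a larger release script.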