Diagram interlude

Receipts make execution replayable as evidence.

Every governed action needs a receipt trail that can be inspected without turning private context into public proof.

ProofGraph Event Trail: Replayable, Signed, Hash-Chained

Every action leaves a chain of signed evidence that can be replayed and verified.


Event	Timestamp	Actor	Hash	Decision
Proposal	14:32:01.442Z	agent-sprint-twin	a4f2…c891	Submitted
Policy Snapshot	14:32:01.443Z	helm-pep	e7b1…3d40	Captured
Approval State	14:32:01.510Z	helm-cpi	91c8…f712	Verified
Tool Contract	14:32:01.511Z	helm-connector	b3d0…8a21	Matched
Action	14:32:01.580Z	connector-github	f4e2…1b09	Executed
Receipt	14:32:01.581Z	helm-proof	c912…4df8	Signed
EvidencePack	14:32:01.590Z	helm-evidence	d1a7…92e3	Bundled
Replay	—	auditor	full chain	Verifier pass
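To make "hash-chained" concrete, here is a minimal sketch of how a trail like the one above could be linked and checked. It is illustrative only and does not describe HELM's actual format: the function names, the all-zero genesis value, and the use of SHA-256 over canonical JSON are assumptions, and signing is left out to keep the chaining visible.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder meaning "no previous event"


def event_hash(event: dict, prev_hash: str) -> str:
    """Hash the event's canonical JSON together with the previous hash."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(events: list[dict]) -> list[dict]:
    """Return entries that each record the hash of their predecessor."""
    chain, prev = [], GENESIS
    for event in events:
        digest = event_hash(event, prev)
        chain.append({"event": event, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; an edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if event_hash(entry["event"], prev) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


# Event names and actors are taken from the trail above; the rest is hypothetical.
trail = build_chain([
    {"event": "Proposal", "actor": "agent-sprint-twin", "decision": "Submitted"},
    {"event": "Policy Snapshot", "actor": "helm-pep", "decision": "Captured"},
    {"event": "Action", "actor": "connector-github", "decision": "Executed"},
])
```

Because each hash covers the previous one, a verifier only needs the entries themselves to detect a change anywhere earlier in the trail.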

The Auditability Gap in Autonomous Systems

When an API call is made by a human developer, the audit trail is usually straightforward: an identity, a timestamp, and a request payload. When an autonomous AI agent executes a sequence of actions, the context is far more complex.

Why did the agent make that specific decision? What data did it consider? Was it following a human instruction or its own derived logic? Traditional logging cannot answer these questions, which leaves enterprises with a critical compliance gap.

Cryptographic Provenance

HELM addresses this gap by ensuring every action taken by the system generates Signed Receipts and Replayable Evidence.

The Evidence Pack

Whenever a proposal is generated and evaluated, HELM compiles an “Evidence Pack.” This is a signed, tamper-evident record containing:

The Original Intent: The user prompt or trigger that initiated the action.
The Context: The specific source-backed context state the model had access to at the time.
The Proposal: The exact JSON/Protobuf spec generated by the model.
The Policy Evaluation: The deterministic result of the HELM runtime evaluating the proposal against organizational rules.
The Human Approval: If a human-in-the-loop (HITL) gate was involved, the cryptographic signature of the human approver.
The Execution Result: The final outcome of the action.
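The six fields above can be sketched as a small record with a signature over its canonical serialization. This is an assumption-heavy illustration, not HELM's schema: the class and field names are hypothetical, and HMAC-SHA256 with a shared demo key stands in for whatever signature scheme the system actually uses (a production design would more likely use an asymmetric signature so verifiers do not hold the signing key).

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EvidencePack:
    """Hypothetical layout mirroring the six items listed above."""
    intent: str
    context: dict
    proposal: dict
    policy_result: str
    approval_signature: Optional[str]  # None when no human gate was involved
    execution_result: str


def canonical_bytes(pack: EvidencePack) -> bytes:
    """Serialize with sorted keys so the same pack always yields the same bytes."""
    return json.dumps(asdict(pack), sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_pack(pack: EvidencePack, key: bytes) -> str:
    """HMAC-SHA256 as a stand-in for a real signature scheme (assumption)."""
    return hmac.new(key, canonical_bytes(pack), hashlib.sha256).hexdigest()


def verify_pack(pack: EvidencePack, signature: str, key: bytes) -> bool:
    """Constant-time comparison of the recomputed and stored signatures."""
    return hmac.compare_digest(sign_pack(pack, key), signature)


DEMO_KEY = b"demo-key-not-for-production"  # hypothetical key for the example

pack = EvidencePack(
    intent="Open a pull request for the sprint summary",
    context={"repo": "example/repo", "branch": "main"},
    proposal={"action": "create_pull_request", "repo": "example/repo"},
    policy_result="allow",
    approval_signature=None,
    execution_result="executed",
)
signature = sign_pack(pack, DEMO_KEY)

# Changing any field after signing invalidates the signature.
tampered = replace(pack, execution_result="failed")
```

Here `verify_pack(pack, signature, DEMO_KEY)` succeeds, while the same check against `tampered` fails, which is the property that makes an edit to the record detectable.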

Replayability for Trust and Debugging

Because every input and state transition is captured in the Evidence Pack, any execution can be deterministically replayed.

For Auditors: This gives reviewers a chain from human intent to machine execution. It is evidence, not a certification.
For Engineers: This allows developers to load a failed execution state locally, inspect exactly what the model saw, and debug the specific failure point without needing to recreate the entire non-deterministic conversation.
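As a rough sketch of what replay could look like for the deterministic part of the loop, the example below re-runs a policy check using only recorded inputs and compares the outcome with the recorded decision. The toy rule set, the record layout, and every name here are hypothetical. The model's own generation is not re-run: the recorded proposal is treated as an input, which is what makes the replay repeatable.

```python
from typing import Callable


def evaluate_policy(proposal: dict, policy: dict) -> str:
    """Toy deterministic rule: allow only listed actions on listed repos."""
    if proposal.get("action") not in policy["allowed_actions"]:
        return "deny"
    if proposal.get("repo") not in policy["allowed_repos"]:
        return "deny"
    return "allow"


def replay(
    record: dict,
    evaluator: Callable[[dict, dict], str] = evaluate_policy,
) -> dict:
    """Re-run the policy check from recorded inputs and compare the results."""
    replayed = evaluator(record["proposal"], record["policy_snapshot"])
    return {
        "recorded": record["policy_result"],
        "replayed": replayed,
        "match": replayed == record["policy_result"],
    }


# A hypothetical recorded execution, as it might be loaded from stored evidence.
record = {
    "proposal": {"action": "create_pull_request", "repo": "example/repo"},
    "policy_snapshot": {
        "allowed_actions": ["create_pull_request"],
        "allowed_repos": ["example/repo"],
    },
    "policy_result": "allow",
}

report = replay(record)
```

A mismatch between `recorded` and `replayed` would point to either a tampered record or a policy evaluator that is not actually deterministic, and both are worth surfacing to a reviewer.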

In the HELM loop, execution should not be a black box. It should leave a receipt that people can inspect.
