AI agent security

AI agent security is execution authority plus evidence.

Scanners and filters inspect inputs. The harder problem is the side effect an agent performs. HELM governs the consequential action and records a receipt, so security lives where the effect runs.

Govern the side effect. Deny the unknown by default. Prove the decision.

The gap in scanner-only security

Inspecting the input is not securing the action.

Most AI agent security stops at the text: scan the prompt, filter the response, score the risk. Useful, but the consequence appears when a tool runs. If the side effect is ungoverned, the agent is still unguarded.

Scanners inspect inputs

Prompt scanners and content filters read what goes in and out. They can flag a risky request. They do not decide whether the resulting action may run.

The side effect is the risk

A tool-using agent does not just answer. It writes records, moves money, changes access, deploys code. Security has to govern that effect, not just the text around it.

After-the-fact logs are not proof

A log says something happened. It does not say the action was checked, what the verdict was, or which policy applied. Evidence has to be bound to the decision.

Execution authority plus receipts

Govern the effect, then prove it.

Real agent security is two things working together: a decision on the side effect, and evidence bound to that decision.

Decide

Return ALLOW, DENY, or ESCALATE for a proposed action, before the effect runs.

Bind

Bind the permitted effect to the verdict that authorized it, with scope and policy.

Prove

Sign a receipt and an EvidencePack that anyone can verify offline, at any later time.
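The three steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not HELM's implementation: the `POLICY` map, the `Receipt` fields, and the function names are all hypothetical. HMAC with a demo key stands in for signing; a receipt that anyone can verify offline would need an asymmetric signature instead.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

# Hypothetical policy: action class -> verdict. Not HELM's real policy format.
POLICY = {"record_write": "ALLOW", "data_export": "ESCALATE"}
DEMO_KEY = b"demo-key-not-for-production"  # symmetric stand-in for a signing key


def _canonical(obj) -> bytes:
    """Serialize deterministically so equal content always hashes the same."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


POLICY_HASH = hashlib.sha256(_canonical(POLICY)).hexdigest()


@dataclass(frozen=True)
class Receipt:
    action_class: str
    action_hash: str
    verdict: str
    policy_hash: str
    signature: str = ""


def decide(action_class: str) -> str:
    """Decide: unknown action classes are denied by default."""
    return POLICY.get(action_class, "DENY")


def bind_and_sign(action: dict) -> Receipt:
    """Bind the verdict to the exact action and policy, then sign the record."""
    body = {
        "action_class": action["class"],
        "action_hash": hashlib.sha256(_canonical(action)).hexdigest(),
        "verdict": decide(action["class"]),
        "policy_hash": POLICY_HASH,
    }
    sig = hmac.new(DEMO_KEY, _canonical(body), hashlib.sha256).hexdigest()
    return Receipt(**body, signature=sig)


def verify(receipt: Receipt, action: dict) -> bool:
    """Prove: the signature is valid and the receipt covers this exact action."""
    body = asdict(receipt)
    sig = body.pop("signature")
    expected = hmac.new(DEMO_KEY, _canonical(body), hashlib.sha256).hexdigest()
    same_action = body["action_hash"] == hashlib.sha256(_canonical(action)).hexdigest()
    return hmac.compare_digest(sig, expected) and same_action
```

Because the receipt carries a hash of the action and of the policy, changing either after the fact makes verification fail. That is the difference between a receipt and a plain log line.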

Security by side effect

Authority is defined per action class.

HELM secures what an agent can do, by side effect. Each action class carries a default verdict, a risk level, and the evidence HELM records when an action of that class runs.

Side effect	Examples	Default verdict	Risk	Required evidence
Data export	Export a customer list, download records, push data to a destination	ESCALATE	Critical	Data hash, principal, policy hash, destination, signed receipt
Database / record write	Change a CRM, ticket, or policy-admin record	ALLOW	High	Before/after state hash, receipt, rollback semantics
IAM / access change	Grant a role, revoke a token, reset a password	ESCALATE	Critical	Delegation-chain receipt, access-change EvidencePack
Deployment / infra change	Deploy a service, update infrastructure, restart production	ESCALATE	Critical	Change receipt, CI evidence, rollback path
Code merge / PR action	Open a PR, modify code, merge a dependency bump	ESCALATE	High	PR receipt, diff hash, reviewer disposition
Refund / credit	Issue a refund, apply a credit, waive a fee	ESCALATE	High	Customer-action receipt, amount, policy, evidence
Customer communication	Send a support reply, an outbound email, or a notice	ESCALATE	Medium	Message receipt, template version, approval where required
Incident response	Quarantine a host, revoke a token, escalate a ticket	ESCALATE	Critical	Incident receipt, telemetry, disposition
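As a rough sketch, the table above maps naturally onto a lookup structure. The example below encodes three of its rows; the dictionary keys, the `ActionClass` type, and both helper functions are hypothetical names chosen for illustration, not HELM's configuration format.

```python
from typing import NamedTuple


class ActionClass(NamedTuple):
    default_verdict: str
    risk: str
    required_evidence: frozenset


# Three rows from the table above; the keys are hypothetical identifiers.
ACTION_CLASSES = {
    "data_export": ActionClass(
        "ESCALATE", "Critical",
        frozenset({"data hash", "principal", "policy hash",
                   "destination", "signed receipt"}),
    ),
    "record_write": ActionClass(
        "ALLOW", "High",
        frozenset({"before/after state hash", "receipt", "rollback semantics"}),
    ),
    "iam_change": ActionClass(
        "ESCALATE", "Critical",
        frozenset({"delegation-chain receipt", "access-change EvidencePack"}),
    ),
}


def default_verdict(action_class: str) -> str:
    """Look up the class default; anything not in the table is denied."""
    entry = ACTION_CLASSES.get(action_class)
    return entry.default_verdict if entry else "DENY"


def missing_evidence(action_class: str, provided: set) -> set:
    """Return the required evidence items that have not been supplied."""
    return set(ACTION_CLASSES[action_class].required_evidence) - set(provided)
```

Keeping the verdict and the evidence requirement in the same entry means a class cannot be allowed without also declaring what must be recorded for it.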

Where HELM fits

Scanners and identity stay. HELM decides what may execute, and records the proof.

Input scanners and filters

Inspect prompts and responses for risky content.

Identity

Prove who or what is acting.

Gateways

Route and observe tool and MCP traffic.

Observability

Reconstruct what happened from logs.

HELM

Decides whether the side effect may run, returns ALLOW / DENY / ESCALATE, and records a signed receipt.

Questions

AI agent security, in plain terms.

What is AI agent security?

It is the discipline of bounding what a tool-using agent can do and proving what it did. Input scanning and identity are part of it, but the core question is whether a proposed side effect may run. That is execution authority, paired with evidence of the decision.

Isn’t a prompt scanner enough?

A scanner inspects the request. It cannot stop the side effect that follows or prove the action was authorized. HELM checks the proposed action against policy before it runs and records a receipt, so the control and the evidence live at the moment of execution.

How does HELM secure a tool-using agent?

When the agent proposes a consequential action, HELM returns ALLOW, DENY, or ESCALATE before the effect runs, denies anything unknown or unapproved by default, and binds a signed receipt to the action. External tool output and MCP servers are treated as untrusted unless explicitly normalized and approved.
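A minimal sketch of that gating logic follows, assuming a hypothetical `ToolOutput` record that tracks whether external content has been normalized and approved. The names, fields, and the `gate` function are illustrative assumptions, not HELM's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolOutput:
    source: str               # e.g. the name of an MCP server (hypothetical)
    payload: str
    normalized: bool = False  # external content starts out untrusted
    approved: bool = False


def is_trusted(output: ToolOutput) -> bool:
    """Trust requires both steps: explicit normalization and approval."""
    return output.normalized and output.approved


def gate(action_class: str, inputs: list, verdicts: dict) -> str:
    """Return a verdict before the effect runs; deny by default."""
    if action_class not in verdicts:
        return "DENY"  # unknown action class
    if not all(is_trusted(o) for o in inputs):
        return "DENY"  # the action depends on unapproved external content
    return verdicts[action_class]
```

For example, with `verdicts = {"record_write": "ALLOW"}`, a `record_write` built from a raw MCP response is denied, while the same action built from a normalized and approved response returns `ALLOW`.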

Does this replace my identity or observability tools?

No. Identity proves who is acting and observability reconstructs history. HELM decides whether a consequential action may execute and records proof that survives outside those tools.

Terms

Plain-language terms

EvidencePack: A small bundle of records used to verify one event or review path. Use for replayable evidence slices.
ProofGraph: A record chain that helps replay and check what happened. Use for HELM proof records and replay paths.
ALLOW: HELM lets the action run. Use as a canonical verdict.
DENY: HELM blocks the action. Use as a canonical verdict.
ESCALATE: HELM stops and asks for more facts, policy, or human approval. Use as the canonical non-dispatch path for missing facts, policy hold, or approval.
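
One common way to build a record chain that can be replayed and checked is to link each record to the hash of the one before it. The sketch below shows that general technique only. Whether ProofGraph is built this way is not stated here, and every name in the code is hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash a record's payload together with the previous record's hash."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Add a record that commits to everything recorded before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "hash": _digest(prev, payload)})


def verify_chain(chain: list) -> bool:
    """Replay the chain from the start and recompute every link."""
    prev = GENESIS
    for record in chain:
        if record["prev"] != prev or record["hash"] != _digest(prev, record["payload"]):
            return False
        prev = record["hash"]
    return True
```

Editing or removing any earlier record changes its hash, so every later link stops matching and `verify_chain` returns `False`.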

Secure the action, not just the prompt.

Bring one consequential agent action to the boundary and see the verdict and the receipt.