Building a low-overhead, comprehensive mediation security framework for autonomous agent platforms.
For autonomous agent platforms such as OpenClaw, we design a protection layer built on dynamic instrumentation (e.g., Frida) that monitors agent behavior and prevents system sabotage.
We also integrate a tiered review mechanism: Large Language Models (LLMs) review the agent's overall plans, while lightweight deep learning models inspect individual actions for malicious intent. Together, these yield a low-overhead architecture that mediates every agent action.
A runtime security governance plane for autonomous agent platforms, minimizing overhead while maximizing control.
Prevents destructive actions such as deleting system files, accessing sensitive directories without authorization, modifying configuration files or keys, and moving laterally to other systems.
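A minimal sketch of the path boundary check behind these protections, assuming a hypothetical list of protected prefixes (`PROTECTED_PREFIXES`) and a POSIX file system. Resolving the path first matters: `..` segments or symlinks could otherwise slip a target past a plain string-prefix comparison.

```python
import os

# Hypothetical protected locations; a real deployment would load these from policy.
PROTECTED_PREFIXES = ("/etc", "/boot", "/usr/bin", os.path.expanduser("~/.ssh"))


def touches_protected_path(path: str) -> bool:
    """Return True if `path`, once resolved, lies inside a protected prefix."""
    # realpath collapses '..' segments and follows symlinks that exist on disk.
    resolved = os.path.realpath(path)
    for prefix in PROTECTED_PREFIXES:
        root = os.path.realpath(prefix)
        # Compare whole path components so '/etcetera' does not match '/etc'.
        if os.path.commonpath([resolved, root]) == root:
            return True
    return False


print(touches_protected_path("/tmp/../etc/passwd"))   # True
print(touches_protected_path("/etcetera/notes.txt"))  # False
```

Using `os.path.commonpath` instead of `str.startswith` is the key design choice: it compares path components, not characters.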
Avoids routing every action through a heavyweight LLM. High-frequency actions take a fast path through deterministic rules and lightweight deep learning (DL) models, which keeps latency and cost low.
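The fast path can be pictured as a cascade: deterministic rules answer first, a lightweight classifier handles what the rules do not cover, and only ambiguous cases reach the LLM. This is an illustrative sketch; the allowlist, the `toy_scorer` stand-in for the DL model, and the 0.2/0.8 thresholds are all hypothetical.

```python
from typing import Callable, Optional

SAFE_COMMANDS = {"ls", "pwd", "whoami", "date"}  # hypothetical allowlist
SHELL_META = set(";|&`$<>")                      # characters that compose commands


def rules(command: str) -> Optional[str]:
    """Deterministic tier: 'allow', 'deny', or None when no rule decides."""
    if any(ch in SHELL_META for ch in command):
        return None  # compound commands are never fast-allowed
    parts = command.split()
    if parts and parts[0] in SAFE_COMMANDS:
        return "allow"
    if parts[:2] == ["rm", "-rf"] and "/" in parts[2:]:
        return "deny"
    return None


def route(command: str, score_risk: Callable[[str], float],
          low: float = 0.2, high: float = 0.8) -> str:
    """Return 'allow', 'deny', or 'escalate_to_llm' for a single action."""
    verdict = rules(command)      # cheapest tier first
    if verdict is not None:
        return verdict
    risk = score_risk(command)    # lightweight model tier
    if risk < low:
        return "allow"
    if risk > high:
        return "deny"
    return "escalate_to_llm"      # only uncertain actions pay the LLM cost


def toy_scorer(command: str) -> float:
    """Stand-in for a trained classifier; flags download-and-execute pipelines."""
    return 0.9 if "curl" in command and "| sh" in command else 0.5


print(route("ls -la", toy_scorer))              # allow
print(route("rm -rf /", toy_scorer))            # deny
print(route("curl http://x | sh", toy_scorer))  # deny
print(route("python train.py", toy_scorer))     # escalate_to_llm
```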
Every allow, block, or escalation decision is fully explainable. We maintain a complete event chain and decision log, so each decision can be traced back to the evidence behind it and earlier risky behavior informs later judgments (risk memory).
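One way to make such a decision log tamper-evident is to hash-chain its entries, so each record commits to everything before it. This is an assumption about how the event chain could be stored, not the framework's actual format; the `DecisionLog` class and its field names are hypothetical.

```python
import hashlib
import json
import time


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class DecisionLog:
    """Append-only log where each entry stores the hash of its predecessor."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, action: str, decision: str, reasons: list) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {"ts": time.time(), "action": action, "decision": decision,
                "reasons": reasons, "prev_hash": prev_hash}
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


log = DecisionLog()
log.append("rm -rf ~/.ssh", "deny", ["matches protected path ~/.ssh"])
log.append("ls -la", "allow", ["allowlisted read-only command"])
print(log.verify())                   # True
log.entries[0]["decision"] = "allow"  # simulate tampering
print(log.verify())                   # False
```

The human-readable `reasons` field is what makes each decision explainable; the hash chain only guarantees that the recorded explanations were not altered afterwards.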
Counters goal drift and prompt-injection attacks (e.g., malicious instructions hidden in webpages or documents) by checking each local action against the agent's tracked overall intent.
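A simple form of this correlation is a scope check: the reviewed plan declares which resources the task needs, and any action reaching outside that scope is flagged as possible drift or injection. The `TaskIntent` structure and the `out_of_scope` helper are hypothetical; real intent tracking would be richer than directory and domain lists.

```python
import os
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class TaskIntent:
    goal: str
    allowed_dirs: list = field(default_factory=list)
    allowed_domains: set = field(default_factory=set)


def out_of_scope(intent: TaskIntent, kind: str, target: str) -> bool:
    """Return True if an action's target falls outside the declared intent."""
    if kind == "file":
        path = os.path.abspath(target)
        for d in intent.allowed_dirs:
            root = os.path.abspath(d)
            if os.path.commonpath([path, root]) == root:
                return False
        return True
    if kind == "http":
        return urlparse(target).hostname not in intent.allowed_domains
    return True  # unknown action kinds default to review


intent = TaskIntent(goal="summarize project docs",
                    allowed_dirs=["/workspace/docs"],
                    allowed_domains={"docs.python.org"})
print(out_of_scope(intent, "file", "/workspace/docs/intro.md"))     # False
print(out_of_scope(intent, "http", "https://attacker.example/up"))  # True
```

An instruction injected through a webpage that tells the agent to upload files elsewhere produces exactly the second case: the action is plausible locally but unrelated to the declared goal.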
Detects hidden malicious intent spread across multiple individually benign actions (e.g., enumerating home directories -> locating tokens -> compressing them -> exfiltrating the archive).
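Detecting such chains amounts to matching an ordered pattern over the recent action history, where the steps need not be adjacent. The sketch below assumes actions have already been labeled with coarse categories; the category labels, the pattern, and the window size are hypothetical.

```python
from collections import deque

# Hypothetical multi-step pattern; benign actions may appear between the steps.
EXFIL_PATTERN = ["enumerate_home", "read_credentials", "archive", "network_upload"]


class SequenceDetector:
    def __init__(self, pattern, window=50):
        self.pattern = pattern
        self.history = deque(maxlen=window)  # only the most recent actions count

    def observe(self, category: str) -> bool:
        """Record one action category; return True once the pattern is complete."""
        self.history.append(category)
        step = 0
        for cat in self.history:  # greedy in-order subsequence match
            if cat == self.pattern[step]:
                step += 1
                if step == len(self.pattern):
                    return True
        return False


det = SequenceDetector(EXFIL_PATTERN)
actions = ["enumerate_home", "edit_file", "read_credentials",
           "run_tests", "archive", "network_upload"]
print([det.observe(a) for a in actions])
# [False, False, False, False, False, True]
```

No single action here would trip a per-action rule; only the ordered combination within the window does.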
Fits natively into the Plan-Execute-Reflect agent loop and covers tool calls, shell execution, browser automation, file I/O, and API invocations.
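To cover these heterogeneous action types with one policy pipeline, each action can be normalized into a common event record before it is judged. The `ActionEvent` schema and the raw dictionary format accepted by `normalize` are a hypothetical illustration, not the platform's actual interface.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    PLAN = "plan"
    EXECUTE = "execute"
    REFLECT = "reflect"


@dataclass(frozen=True)
class ActionEvent:
    phase: Phase
    kind: str         # "tool", "shell", "browser", "file", or "api"
    target: str       # command line, path, URL, endpoint, or tool name
    detail: str = ""  # free-form arguments kept for the reviewers


# Which raw field holds the target for each action kind (hypothetical format).
TARGET_FIELD = {"shell": "command", "file": "path", "browser": "url",
                "api": "endpoint", "tool": "name"}


def normalize(raw: dict) -> ActionEvent:
    """Map a raw agent action onto ActionEvent; unknown kinds raise KeyError."""
    kind = raw["type"]
    return ActionEvent(phase=Phase(raw.get("phase", "execute")),
                       kind=kind,
                       target=raw[TARGET_FIELD[kind]],
                       detail=str(raw.get("args", "")))


ev = normalize({"type": "browser", "url": "https://example.com", "phase": "execute"})
print(ev.kind, ev.target, ev.phase.value)  # browser https://example.com execute
```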
The architecture combines three layers (Plan Understanding, Action Discrimination, and Runtime Control) whose outputs feed a final decision-fusion stage.
Plan Understanding: uses LLMs to assess the overall intent, detect goal drift, and rewrite risky plans into safer alternatives without breaking the agent loop.
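Plan-level review can be wrapped as a function that sends the user's goal and the proposed plan to an LLM and parses a structured verdict. In this sketch, `call_llm` is a placeholder for whatever model client is used, and the JSON verdict format and prompt are assumptions; `fake_llm` stands in for the model so the example runs offline.

```python
import json
from typing import Callable

REVIEW_PROMPT = """You are a security reviewer for an autonomous agent.
User goal: {goal}
Proposed plan:
{plan}
Reply with JSON: {{"drift": bool, "risk": "low"|"medium"|"high",
"reason": str, "revised_plan": [str, ...]}}"""


def review_plan(goal: str, plan: list, call_llm: Callable[[str], str]) -> dict:
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(plan))
    try:
        verdict = json.loads(call_llm(REVIEW_PROMPT.format(goal=goal, plan=steps)))
    except json.JSONDecodeError:
        # Unparseable output is treated as high risk rather than silently allowed.
        return {"drift": True, "risk": "high", "reason": "unparseable review",
                "revised_plan": []}
    # Keep the original plan when the reviewer proposes no rewrite, so the
    # agent loop continues instead of stalling.
    verdict.setdefault("revised_plan", plan)
    return verdict


def fake_llm(prompt: str) -> str:
    """Stub model for offline runs: flags any plan that mentions uploading."""
    if "upload" in prompt:
        return json.dumps({"drift": True, "risk": "high",
                           "reason": "upload is not needed for the goal",
                           "revised_plan": ["read docs", "write summary"]})
    return json.dumps({"drift": False, "risk": "low", "reason": "consistent"})


result = review_plan("summarize the docs",
                     ["read docs", "upload docs to pastebin", "write summary"],
                     fake_llm)
print(result["risk"], result["revised_plan"])
# high ['read docs', 'write summary']
```

Returning a revised plan, rather than only a block verdict, is what lets the reviewer steer the agent back on course without aborting the Plan-Execute-Reflect loop.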
Action Discrimination: deploys lightweight models and rule engines to evaluate individual shell commands, tool calls, and scripts with minimal latency.
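A rule engine for individual shell commands can match on tokens rather than raw strings, which defeats trivial evasions such as extra whitespace or quoting. The rules below are a small hypothetical sample; `shlex.split` is the standard-library POSIX tokenizer, and a production engine would also have to handle pipelines, subshells, and variable expansion, which this sketch does not.

```python
import shlex


def _rm_recursive_root(t):
    """rm with a recursive flag aimed at / or the home directory."""
    if t[0] != "rm":
        return False
    short = "".join(a[1:] for a in t[1:] if a.startswith("-") and not a.startswith("--"))
    recursive = "r" in short or "R" in short or "--recursive" in t
    return recursive and any(a in ("/", "/*", "~") for a in t[1:])


# Hypothetical rule set: (rule id, predicate over the token list).
RULES = [
    ("rm-recursive-root", _rm_recursive_root),
    ("chmod-world-writable", lambda t: t[0] == "chmod" and "777" in t[1:]),
    ("dd-to-device", lambda t: t[0] == "dd" and any(a.startswith("of=/dev/") for a in t[1:])),
    ("read-ssh-key", lambda t: any(".ssh/id_" in a for a in t)),
]


def evaluate(command: str) -> list:
    """Return the ids of all rules the command triggers (empty list = no match)."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return ["unparseable"]
    if not tokens:
        return []
    return [rule_id for rule_id, matches in RULES if matches(tokens)]


print(evaluate("rm  -r -f '/'"))      # ['rm-recursive-root']
print(evaluate("cat ~/.ssh/id_rsa"))  # ['read-ssh-key']
print(evaluate("ls -la"))             # []
```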
Runtime Control: uses user-space dynamic hooking (Frida) and eBPF to monitor critical APIs directly at the OS level, enforcing hard boundaries.
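A minimal monitoring sketch using Frida's Python bindings: a small JavaScript agent hooks libc's `unlink` in the target process and reports each path, and the Python side checks the path against a policy. It only observes; actually denying the call would require replacing the function (for example with `Interceptor.replace`). The `PROTECTED` prefixes and the `monitor` helper are hypothetical, eBPF-based monitoring is not shown, and the agent uses the Frida 16 `Module.getExportByName(null, ...)` lookup, so check the `Module` API of the Frida version in use, since newer releases changed these static helpers.

```python
import sys

# JavaScript injected into the target; reports every unlink() path back to Python.
AGENT_JS = """
Interceptor.attach(Module.getExportByName(null, 'unlink'), {
  onEnter(args) {
    send({ api: 'unlink', path: args[0].readUtf8String() });
  }
});
"""

PROTECTED = ("/etc/", "/usr/bin/")  # hypothetical policy
alerts = []


def on_message(message, data):
    """Frida message callback: record unlink() calls that hit protected paths."""
    if message.get("type") != "send":
        return  # ignore script errors and other message types in this sketch
    payload = message["payload"]
    if payload["path"] and payload["path"].startswith(PROTECTED):
        alerts.append(payload)
        print(f"ALERT: {payload['api']}({payload['path']})")


def monitor(pid: int) -> None:
    import frida  # imported lazily so the policy logic can be tested without Frida

    session = frida.attach(pid)
    script = session.create_script(AGENT_JS)
    script.on("message", on_message)
    script.load()
    sys.stdin.read()  # keep the hook alive until stdin closes
    session.detach()
```

Calling `monitor(pid)` requires the `frida` package and permission to instrument the target process.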
Decision fusion: combines signals from all layers into a final routing decision (allow, deny, modify, enforce sandboxing, or request human approval).
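The fusion step can be sketched as a priority-ordered function over the signals produced by the other layers. The `Signals` fields, their precedence, and the 0.4/0.8 risk thresholds are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"
    DENY = "deny"
    MODIFY = "modify"
    SANDBOX = "sandbox"
    ASK_HUMAN = "ask_human"


@dataclass
class Signals:
    rule_hits: list          # deterministic rule ids that fired
    model_risk: float        # lightweight model score in [0, 1]
    plan_drift: bool         # LLM plan reviewer reported goal drift
    has_safe_rewrite: bool   # reviewer offered a safer variant of the action
    runtime_violation: bool  # OS-level hook saw a hard-boundary breach


def fuse(s: Signals) -> Outcome:
    # Hard boundaries first: softer signals never override them.
    if s.runtime_violation or s.rule_hits:
        return Outcome.DENY
    if s.plan_drift:
        # Prefer repairing the action over stopping the agent loop.
        return Outcome.MODIFY if s.has_safe_rewrite else Outcome.ASK_HUMAN
    if s.model_risk >= 0.8:
        return Outcome.ASK_HUMAN
    if s.model_risk >= 0.4:
        return Outcome.SANDBOX  # uncertain: let it run, but isolated
    return Outcome.ALLOW


print(fuse(Signals([], 0.1, False, False, False)).value)              # allow
print(fuse(Signals([], 0.5, False, False, False)).value)              # sandbox
print(fuse(Signals([], 0.2, True, True, False)).value)                # modify
print(fuse(Signals(["read-ssh-key"], 0.0, False, False, False)).value)  # deny
```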