Agent Security | WHU-ES Evolving Systems Lab

Research Focus

For autonomous agent platforms like OpenClaw, we design a protection layer built on dynamic instrumentation (e.g., Frida) that monitors agent behavior and blocks actions that would sabotage the host system.

We also integrate a multi-tiered review mechanism: Large Language Models (LLMs) review complete plans, while lightweight deep learning models inspect individual actions for malicious intent. The result is a low-overhead mediation architecture that covers both plans and actions.

🛡️

Core Defense Objectives

A runtime security governance plane for autonomous agent platforms, minimizing overhead while maximizing control.

🛡️

System Sabotage Prevention

Prevents destructive actions such as deleting system files, accessing sensitive directories without authorization, modifying configurations or keys, and moving laterally to other systems.

⚡

Low-Overhead Execution

Avoids passing every action through heavy LLMs. High-frequency actions are fast-tracked using lightweight DL models and robust deterministic rules to minimize latency and cost.
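
A minimal sketch of this tiered routing, assuming a three-step order (deterministic rules first, then a lightweight model score, then LLM escalation only for ambiguous cases). The rule lists, the thresholds, and the `score_fn` callable are hypothetical placeholders, not part of the lab's system.

```python
from typing import Callable, Optional, Tuple

# Hypothetical rule lists; a real deployment would load these from policy.
DENY_PREFIXES = ("rm -rf /", "mkfs", "dd if=")
ALLOW_EXACT = {"ls", "pwd", "whoami"}


def rule_verdict(command: str) -> Optional[str]:
    """Tier 1: deterministic rules. Returns None when no rule applies."""
    stripped = command.strip()
    if any(stripped.startswith(prefix) for prefix in DENY_PREFIXES):
        return "deny"
    if stripped in ALLOW_EXACT:
        return "allow"
    return None


def route(
    command: str,
    score_fn: Callable[[str], float],
    low: float = 0.2,
    high: float = 0.8,
) -> Tuple[str, str]:
    """Return (decision, tier). score_fn stands in for a lightweight model
    that returns a risk score in [0, 1]; the thresholds are assumptions."""
    verdict = rule_verdict(command)
    if verdict is not None:
        return verdict, "rules"
    score = score_fn(command)  # Tier 2: cheap model, no LLM call
    if score <= low:
        return "allow", "light_model"
    if score >= high:
        return "deny", "light_model"
    return "escalate", "llm_review"  # Tier 3: only ambiguous actions
```

Only actions that neither the rules nor the lightweight score can settle reach the expensive tier, which is the property the paragraph above describes.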

🔍

Complete Auditability

Every block, allow, or escalation decision is fully explainable. We maintain a complete chain of events and decision logs to enable deep traceability and risk memory.
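
One way such a decision log could be structured is sketched below. The record fields and the `AuditLog` class are illustrative assumptions; the source does not specify a log schema or storage format.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class DecisionRecord:
    """One decision; field names are hypothetical."""
    action: str      # what the agent tried to do
    decision: str    # "allow", "block", or "escalate"
    layer: str       # which layer decided
    reason: str      # human-readable explanation
    timestamp: float = field(default_factory=time.time)


class AuditLog:
    """In-memory, append-only log stored as JSON lines."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def record(self, rec: DecisionRecord) -> None:
        self._lines.append(json.dumps(asdict(rec), sort_keys=True))

    def history(self, decision: Optional[str] = None) -> List[Dict]:
        """Return all records, or only those with the given decision."""
        records = [json.loads(line) for line in self._lines]
        if decision is None:
            return records
        return [r for r in records if r["decision"] == decision]
```

Storing a reason with every record is what makes each decision explainable after the fact; querying past blocks is one simple form of the risk memory mentioned above.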

💉

Prompt Injection Defense

Combats goal drift and injection attacks (e.g., malicious payloads hidden in webpages or docs) by checking each local action against the tracked overall intent of the task.

🧩

Multi-Step Attack Correlation

Detects malicious intent that spans multiple seemingly benign actions (e.g., enumerating home dirs -> finding tokens -> compressing -> exfiltrating).
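
The example chain above can be sketched as an ordered-stage tracker over a session's commands. The stage names and the keyword-based `classify` function are hypothetical simplifications; a real system would presumably use a learned classifier rather than keyword matching.

```python
from typing import Iterable, Optional

# Ordered stages of the example chain; names are illustrative.
CHAIN = ["enumerate", "find_secret", "compress", "exfiltrate"]


def classify(command: str) -> Optional[str]:
    """Map one command to a stage using naive keyword rules (assumption)."""
    tokens = command.split()
    if not tokens:
        return None
    tool = tokens[0]
    if tool in ("grep", "find", "cat") and ("token" in command or "id_rsa" in command):
        return "find_secret"
    if tool in ("tar", "zip", "gzip"):
        return "compress"
    if tool in ("curl", "scp", "nc"):
        return "exfiltrate"
    if tool == "ls" and "/home" in command:
        return "enumerate"
    return None


def chain_progress(commands: Iterable[str]) -> int:
    """Count how many stages of CHAIN have been matched, in order."""
    index = 0
    for command in commands:
        if index < len(CHAIN) and classify(command) == CHAIN[index]:
            index += 1
    return index


def is_suspicious(commands: Iterable[str]) -> bool:
    return chain_progress(commands) == len(CHAIN)
```

Each command on its own would pass a per-action check; only the ordered combination across the session triggers the detector.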

🔄

Seamless Workflow Integration

Natively adapts to the Plan-Execute-Reflect agent loop, supporting tool usage, shell execution, browser automation, file I/O, and API invocations.

Multi-Layered Architecture

A triad of Plan Understanding + Action Discrimination + Runtime Control.

Plan Review Layer

Uses LLMs to assess overall intent, detect goal drift, and rewrite risky plans into safer alternatives without breaking the agent loop.

Action Review Layer

Uses lightweight models and rule engines to evaluate individual shell commands, tool calls, and scripts with minimal latency.
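
As a rough illustration of the rule-engine half of this layer, the sketch below tokenizes a shell command with the standard `shlex` module and reports which rules it matches. The rule names and the sensitive-path list are assumptions made for the example.

```python
import shlex
from typing import List

# Hypothetical list of paths that should never be touched by the agent.
SENSITIVE_PATHS = ("/etc/shadow", "/etc/sudoers")


def _is_recursive_flag(arg: str) -> bool:
    if arg in ("-r", "-R", "--recursive"):
        return True
    # Combined short flags such as -rf or -fR.
    return arg.startswith("-") and not arg.startswith("--") and (
        "r" in arg or "R" in arg
    )


def matched_rules(command: str) -> List[str]:
    """Return the names of all rules the command violates."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: treat as its own finding.
        return ["unparseable"]
    if not tokens:
        return []
    tool, args = tokens[0], tokens[1:]
    hits = []
    if tool == "rm" and "/" in args and any(_is_recursive_flag(a) for a in args):
        hits.append("recursive_delete_root")
    if any(a.startswith(SENSITIVE_PATHS) for a in args):
        hits.append("sensitive_path")
    return hits
```

Rules like these are deterministic and cheap, so they can run on every action; anything they do not match would fall through to the lightweight model.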

Runtime Instrumentation

Uses user-space dynamic hooking (Frida) and kernel-level eBPF probes to monitor critical APIs and system calls, enforcing hard boundaries.
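
The hooking itself is not shown here. The sketch below covers only the kind of boundary check a hook handler might run once it has intercepted a file path, assuming the hard boundary is expressed as a set of allowed root directories. The function name and the example paths are hypothetical.

```python
import os
from typing import Iterable


def is_within(path: str, allowed_roots: Iterable[str]) -> bool:
    """True if path, after resolving '..' and symlinks, lies under a root."""
    real = os.path.realpath(path)
    for root in allowed_roots:
        real_root = os.path.realpath(root)
        try:
            # commonpath compares whole path components, so
            # "/srv/agent/workspace-evil" does not match "/srv/agent/workspace".
            if os.path.commonpath([real, real_root]) == real_root:
                return True
        except ValueError:
            # Raised for paths that cannot be compared (e.g. different drives).
            continue
    return False
```

Resolving the path before comparing matters: a plain string-prefix test would accept `workspace/../secrets` and sibling directories whose names merely start with the allowed root.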

Policy Decision Engine

Fuses inputs from all layers to make final routing decisions: allow, deny, modify, enforce sandbox, or request human approval.
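
A minimal sketch of one possible fusion rule, assuming the five outcomes can be ordered by restrictiveness and that the most restrictive verdict wins. Both the ordering and the fail-closed default are assumptions; the source does not state how the engine combines its inputs.

```python
from enum import IntEnum
from typing import Dict


class Decision(IntEnum):
    """Outcomes ordered from least to most restrictive (assumed order)."""
    ALLOW = 0
    MODIFY = 1
    SANDBOX = 2
    HUMAN_APPROVAL = 3
    DENY = 4


def fuse(verdicts: Dict[str, Decision]) -> Decision:
    """Combine per-layer verdicts; the most restrictive one wins."""
    if not verdicts:
        # No layer reported: fail closed by asking a human (assumption).
        return Decision.HUMAN_APPROVAL
    return max(verdicts.values())
```

For example, `fuse({"plan": Decision.ALLOW, "action": Decision.SANDBOX, "runtime": Decision.ALLOW})` yields `Decision.SANDBOX`, so a single cautious layer is enough to tighten the outcome.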
