In this Help Net Security interview, Thomas Squeo, CTO for the Americas at Thoughtworks, discusses why traditional security architectures often fail when applied to autonomous AI systems. He explains why conventional threat modeling needs to adapt to address autonomous decision-making and emergent behaviors.
Squeo also outlines strategies for maintaining control and accountability when AI agents operate with increasing autonomy.
Why do traditional security architectures expecting predictable behavior fail when applied to autonomous AI systems?
Autonomous AI systems, specifically agentic generative AI, differ fundamentally. These systems make their own inferences and decisions based on statistical patterns learned from data. They operate with a degree of autonomy that enables them to take actions, make decisions in pursuit of objectives, execute sequences of tasks, invoke tools or APIs, and adapt behavior based on context. This autonomy means the system may respond in unanticipated ways to novel inputs or situations. Failures can be emergent and hard to foresee in advance. In addition to the code AI’s behavior, model and prompts are a potential vulnerability.
What design approaches help maintain control over AI agents that learn their own paths to meet objectives?
Traditional security architectures struggle with autonomous AI agents because they make their own inferences and decisions based on learned patterns. Unlike traditional software that has static rules, agentic AI may respond unexpectedly leading to failures that are hard to foresee. This requires specific design approaches to maintain control and safety.
1. Safety harness around autonomy: Implement guardrails like timeouts, action limits, and kill switches to keep these systems within safe limits.
2. Real-time control and intervention:
- Output filtering and moderation: Check AI outputs for policy violations or sensitive content.
- Human-in-the-loop (HITL): Incorporate human checkpoints for high-impact decisions.
- Dynamic kill-switches: Add an abort capability to stop the agent’s process if needed.
- Rate limiting and quotas: Slow AI’s rate and volume to allow for detection.
- Ongoing policy updates: Update AI constraints in response to emerging threats.
3. Identity and access management (IAM): Treat the agent as an untrusted component by reducing privileges to a minimum and requiring scoped permissions. Additionally, incorporate authorization checks to verify that each AI action is within scope.
4. Continuous anomaly detection: To quickly spot AI malfunction, collect telemetry including prompts, outputs, decision processes, and tool calls.
How do you conduct threat modeling for AI systems when behavior is driven by data and can change over time?
Traditional threat modeling often assumes predictable, static behavior based on predefined rules in code. It focuses on identifying vulnerabilities in known entry points like APIs or user interfaces, and anticipating failure modes stemming from coding errors or configuration issues. However, AI agents’ behavior is driven by statistical patterns and inferences. This means they can respond unpredictably to novel inputs or situations, and their failures can be emergent and hard to foresee. The AI’s behavior is now a potential vulnerability.
To address this, threat modeling for AI systems requires:
1. Recognizing unique threat vectors: The attack surface now includes the model’s prompts and outputs, the training data, and model parameters and not just the code or infrastructure. Key threats include:
- Prompt injection: Attackers inject malicious directives that override AI’s intended instructions, making traditional input validation insufficient.
- Model inversion and data extraction: Queries that exploit the model to recover details from its training data.
- Unsafe autonomy: Poorly specified goals, emergent misbehavior, or exploitation of independent AI can cause unintended harm.
2. Augment traditional frameworks with AI-specific taxonomies: Traditional frameworks like NIST CSF should be augmented using resources specifically designed for AI systems. For example, using the MITRE ATLAS framework provides a taxonomy of techniques adversaries use against machine learning systems, which can inform the threat modeling process. Similarly, using AI-specific taxonomies of failure modes can help brainstorm how unique threats could manifest in your specific AI agent’s context.
3. Integrating into the secure development lifecycle (SDLC): Threat modeling should be conducted alongside AI systems. As the system design evolves, the threat model should be updated accordingly.
4. Focus on runtime behavior: Collect telemetry (prompts, outputs, decision traces), use automated anomaly detection, and apply validator models to pre-check outputs against policy.
5. Data integrity: Know where data comes from and implement integrity checks to mitigate threats like indirect prompt injection or data poisoning. An audit trail for model development helps trace back if suspicious behavior originates from a particular data source.
6. Adversarial testing: Actively try to “break” the AI or make it misbehave using known attack techniques to identify and fix vulnerabilities.
7. Governance and oversight: An AI Risk Committee or similar governance structure is essential for overseeing AI deployments. They vet AI use cases for potential harm and ensure appropriate controls are in place.
When AI agents interface with sensitive systems like financial APIs or patient data, how do you secure those connections without limiting the agent’s capabilities?
Traditional cybersecurity measures alone are insufficient when AI agents interface with sensitive systems like financial APIs or patient data. As agent’s behavior changes based on data, a secure-by-design approach is necessary, layering traditional practices with AI-specific controls. This involves managing the unique risks introduced by the AI’s expanded attack surface, autonomy, and decision authority.
Securing these connections without unduly limiting the agent’s legitimate capabilities relies on a combination of architectural design principles and control mechanisms rather than simply blocking access:
Identity and access management (IAM) for AI systems: Every interaction an AI agent has with other sources, including sensitive systems like patient data stores, should be authenticated and authorized. As I mentioned, AI agents should operate with limited permissions.
Authorization checks on actions: Beyond authenticating the AI’s access, the system should incorporate explicit authorization checks on the actions the AI agent attempts to perform on sensitive systems. This involves an intermediary service or policy engine verifying that a requested action (like approving a financial transaction or retrieving specific patient data) is within the defined scope of what the AI is allowed to do. This adds a layer of security on top of IAM, ensuring that even if the AI is tricked (e.g., via prompt injection) into requesting a malicious action, the request is vetted against policy before being executed by the sensitive system.
Isolation and sandboxing: Running AI services in isolated environments is crucial. For instance, containerizing the AI and using network policies to strictly control what external services it can talk to ensures that the agent can only connect to the intended sensitive APIs or databases and nothing else. This reduces the potential attack surface and limits unauthorized connections.
Data provenance and input integrity: The AI’s behavior is influenced by the data it consumes. Ensure the integrity and provenance of the inputs the AI uses to interact with sensitive systems. By establishing trusted data pipelines and implementing integrity checks on the data the AI retrieves (e.g., from internal knowledge bases), you help prevent indirect prompt injection or data poisoning that could manipulate the AI into making malicious requests to sensitive systems.
Anomaly detection: Since AI behavior is dynamic, continuous telemetry and monitoring are essential to gain visibility into its actions. This includes logging all prompts given to the AI and its outputs, especially when interacting with sensitive systems. Automated anomaly detection can learn normal patterns of interaction with sensitive systems and alert security teams to deviations, such as an unusual volume or type of requests, which could indicate misuse or malfunction.
Real-time control and intervention mechanisms: Design the system with the ability to control the AI’s actions in real time. This includes implementing output filtering and moderation. Before an AI’s output is delivered or used to trigger an action on a sensitive system, it should pass through a filter that checks for policy violations or attempts to extract sensitive information. This acts as a safety layer to ensure that even if the AI generates a risky instruction or attempts to output sensitive data obtained from a connected system, it is caught before causing harm. Human-in-the-loop checkpoints can also be designed for high-impact decisions involving sensitive data or systems.
Zero trust principles: Treat the AI agent and its interactions as inherently untrusted. This requires rigorously verifying every access attempt and action, particularly when interacting with sensitive systems. This strategy provides multifaceted protection against misuse by weaving security checks throughout the process, ensuring that even authorized connections are constantly scrutinized for malicious activity.
Data protection and privacy: While not solely about securing the connection itself, implementing encryption for data at rest and in transit is standard practice for any system handling sensitive data, including the AI’s memory, logs, and the data being transmitted to/from sensitive APIs. Additionally, privacy by design and using data loss prevention (DLP) techniques on the AI’s outputs help mitigate the risk of sensitive data leakage inadvertently obtained through legitimate access channels.
What does accountability look like when AI agents make decisions or initiate actions? Can we trace those actions clearly enough for auditing and regulatory review?
Accountability for AI agents that make decisions or initiate actions is achieved through rigorous logging, real-time monitoring, defined authorization controls, and robust governance structures. Because agentic AIs possess autonomy based on learned patterns rather than predefined rules, their behavior can be unpredictable, introducing new security challenges compared to traditional software. This necessitates building security and trustworthiness into the architecture from the ground up.
Accountability looks like designing systems where every step the AI agent takes leaves a trace, allowing reconstruction and understanding of what happened after the fact.
To make AI agent actions traceable and auditable for regulatory review and incident investigation, systems must implement comprehensive logging and auditability:
1. Prompt and response logging: Recording all inputs (prompts) given to the AI and its outputs is essential for auditing and real-time monitoring.
2. Logging the decision process: It is important to log the AI’s decision process as much as possible. If the AI uses chain-of-thought or intermediate reasoning steps, these should be captured.
3. Logging tool/API calls: If the AI consults external tools or calls APIs (e.g., for financial transactions, retrieving patient data), the inputs and outputs of these specific calls must be logged.
4. Tracking every step: Essentially, every single step or action the AI agent takes in pursuit of its objective should be recorded.
5. Correlating actions in external systems: If the AI modifies data or triggers actions in other systems (like approving a payment or updating a record), these actions should be logged in both the target system and the AI system’s logs, ideally with a correlation ID to link them.
6. Logging human oversight: Any human intervention, such as approvals or reviews of AI-suggested actions, should also be logged.
7. Including explainability aids: Logging metadata like the AI’s rationale or confidence score for a decision can help auditors understand why it acted in a certain way.
8. Securing and managing logs: Logs containing AI interactions can be sensitive and must be secured (e.g., encrypted at rest, access restricted) and retained according to policy, balancing audit needs with privacy. Logs should be immutable or appropriately secured to prevent tampering.
9. Designing for log review: The operational process must include review of these logs, especially for high-risk AI applications.