An LLM agent that can run code is an LLM agent that can run any code it generates, including code with bugs, infinite loops, file system operations, and network calls. A sandbox does not prevent the agent from writing bad code. It prevents bad code from escaping the container it runs in. Here is how to build one that actually contains the damage.
Analysis Briefing
- Topic: Sandboxed code execution environment design for LLM agents
- Analyst: Mike D (@MrComputerScience)
- Context: A structured investigation kicked off by Claude Sonnet 4.6
- Source: Pithy Cyborg | Pithy Security
- Key Question: What does it actually take to run LLM-generated code without letting it touch anything it shouldn’t?
The Threat Model
An LLM code executor needs to contain four categories of damage:
Resource exhaustion. Infinite loops, memory bombs, and CPU-intensive operations that starve the host system. An agent that writes while True: pass or allocates a 10GB list should fail cleanly, not bring down your server.
File system access. Code that reads sensitive files (/etc/passwd, environment variables, SSH keys), writes to arbitrary paths, or deletes files. Without isolation, LLM-generated code runs with the same file system access as your application process.
Network access. Code that exfiltrates data, contacts external services, or performs reconnaissance. An agent processing sensitive documents should not be able to make HTTP requests to attacker-controlled URLs embedded in those documents.
Host escape. Code that exploits container or hypervisor vulnerabilities to break out of the execution environment. Defense against this requires running sandboxed code in an environment with a minimal attack surface.
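A practical way to keep these categories honest is a small table of probe payloads you replay against the sandbox after every configuration change. The snippets below are illustrative one-liners covering three of the four categories (host escape has no safe one-liner probe), not an exhaustive red-team suite:

```python
# Illustrative probe payloads, one per damage category. Each should FAIL
# (time out, be denied, or error) when run through a correctly configured
# sandbox; any probe that succeeds indicates a containment gap.
PROBES = {
    "resource_exhaustion": "while True: pass",                   # must hit the timeout
    "filesystem_read":     "print(open('/etc/passwd').read())",  # must be unreadable or absent
    "filesystem_write":    "open('/etc/evil', 'w').write('x')",  # read-only rootfs must refuse
    "network_egress": (
        "import urllib.request; "
        "urllib.request.urlopen('http://example.com', timeout=3)"  # --network none must block
    ),
}

def is_probe(code: str) -> bool:
    """True if the snippet is one of the known probe payloads."""
    return code in PROBES.values()
```

Run each probe through the executor in CI and assert that none of them completes successfully.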
Layer 1: Docker Container With Resource Limits
The first containment layer is a Docker container with explicit resource limits and a restricted user.
```dockerfile
FROM python:3.12-slim

# Create non-root user for execution
RUN useradd -m -u 1000 -s /bin/bash sandbox
RUN mkdir /sandbox && chown sandbox:sandbox /sandbox

# Install only what's needed (fastapi, uvicorn, pydantic)
COPY requirements.txt /tmp/
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy in the execution API from Layer 2
COPY executor_api.py /app/executor_api.py

# Drop to non-root
USER sandbox
WORKDIR /sandbox

# Serve the execution API
CMD ["uvicorn", "executor_api:app", "--app-dir", "/app", "--host", "0.0.0.0", "--port", "8000"]
```
Run with resource limits:
```bash
docker run \
  --rm \
  --network none \
  --memory 256m \
  --memory-swap 256m \
  --cpus 0.5 \
  --pids-limit 50 \
  --read-only \
  --tmpfs /tmp:size=64m,noexec \
  --tmpfs /sandbox:size=32m \
  --security-opt no-new-privileges \
  --cap-drop ALL \
  code-executor
```
The critical flags: --network none blocks all network access. --read-only makes the filesystem read-only except for the tmpfs mounts. --cap-drop ALL removes all Linux capabilities. --pids-limit 50 prevents fork bombs. Setting --memory-swap equal to --memory disables swap, so code cannot sidestep the memory limit by spilling to disk.
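This flag list tends to drift out of sync between shell scripts and client code. One way to keep it in a single place is to generate the argument vector from one config object; this is a sketch of that idea, not part of the article's stack:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SandboxLimits:
    memory: str = "256m"
    cpus: str = "0.5"
    pids: int = 50
    tmp_size: str = "64m"
    work_size: str = "32m"

def docker_run_args(image: str, limits: SandboxLimits) -> list[str]:
    """Build the `docker run` argv matching the hardening flags above."""
    return [
        "docker", "run", "--rm",
        "--network", "none",
        "--memory", limits.memory,
        "--memory-swap", limits.memory,  # equal to --memory: swap disabled
        "--cpus", limits.cpus,
        "--pids-limit", str(limits.pids),
        "--read-only",
        "--tmpfs", f"/tmp:size={limits.tmp_size},noexec",
        "--tmpfs", f"/sandbox:size={limits.work_size}",
        "--security-opt", "no-new-privileges",
        "--cap-drop", "ALL",
        image,
    ]
```

The same `SandboxLimits` instance can then feed both this argv builder and the Python client in Layer 3, so the two can never disagree about limits.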
Layer 2: The Execution API
The sandbox exposes a minimal HTTP API that accepts code and returns output. The API server runs inside the container, and your agent calls it over a controlled network interface. Note that this is the one place --network none gets relaxed: a container with no network stack cannot serve HTTP, so the executor container instead joins an internal-only Docker network (docker network create --internal sandbox-net) that the host can reach but that has no route to the outside world.
```python
# executor_api.py - runs inside the sandbox container
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import subprocess
import tempfile
import os

app = FastAPI()

class ExecuteRequest(BaseModel):
    code: str
    timeout_seconds: int = 10
    language: str = "python"

class ExecuteResult(BaseModel):
    stdout: str
    stderr: str
    exit_code: int
    timed_out: bool

@app.post("/execute", response_model=ExecuteResult)
async def execute_code(request: ExecuteRequest):
    if request.language != "python":
        raise HTTPException(400, "Only Python supported")
    if len(request.code) > 50_000:
        raise HTTPException(400, "Code too large")

    with tempfile.NamedTemporaryFile(
        mode='w', suffix='.py', dir='/sandbox', delete=False
    ) as f:
        f.write(request.code)
        code_path = f.name

    try:
        result = subprocess.run(
            ["python", code_path],
            capture_output=True,
            text=True,
            timeout=request.timeout_seconds,
            cwd='/sandbox',
        )
        return ExecuteResult(
            stdout=result.stdout[:10_000],  # truncate large output
            stderr=result.stderr[:10_000],
            exit_code=result.returncode,
            timed_out=False,
        )
    except subprocess.TimeoutExpired:
        return ExecuteResult(
            stdout="", stderr="Execution timed out",
            exit_code=-1, timed_out=True,
        )
    finally:
        os.unlink(code_path)
```
The output truncation at 10,000 characters closes a second resource exhaustion vector: code that prints unbounded output. One caveat: subprocess.run still buffers the full output in the executor's memory before the slice is taken, so the container's 256 MB memory limit is what actually caps a print bomb. Truncation protects what sits downstream, which is the agent's context window and your logs.
Layer 3: The Agent-Side Client
Your agent calls the sandbox through a client that manages container lifecycle:
```python
import docker
import httpx
import asyncio
from contextlib import asynccontextmanager

class SandboxExecutor:
    def __init__(self, image: str = "code-executor:latest",
                 network: str = "sandbox-net"):
        # "sandbox-net" is an internal-only bridge, created once with:
        #   docker network create --internal sandbox-net
        # The host can reach containers on it; they have no route out.
        self.docker_client = docker.from_env()
        self.image = image
        self.network = network

    @asynccontextmanager
    async def session(self):
        """Spin up a fresh container per execution session."""
        container = self.docker_client.containers.run(
            self.image,
            detach=True,
            network=self.network,
            mem_limit="256m",
            memswap_limit="256m",
            nano_cpus=500_000_000,  # 0.5 CPU
            pids_limit=50,
            read_only=True,
            tmpfs={"/tmp": "size=64m,noexec", "/sandbox": "size=32m"},
            security_opt=["no-new-privileges"],
            cap_drop=["ALL"],
        )
        try:
            # Wait for the API to come up, then refresh attrs to get the IP
            await asyncio.sleep(1)
            container.reload()
            ip = container.attrs["NetworkSettings"]["Networks"][self.network]["IPAddress"]
            yield f"http://{ip}:8000"
        finally:
            container.stop(timeout=5)
            container.remove(force=True)

    async def execute(self, code: str, timeout: int = 10) -> dict:
        async with self.session() as base_url:
            async with httpx.AsyncClient(timeout=timeout + 5) as client:
                response = await client.post(
                    f"{base_url}/execute",
                    json={"code": code, "timeout_seconds": timeout},
                )
                return response.json()
```
Each execution session gets a fresh container. State does not persist between calls. A container compromised in one session cannot affect the next.
What This Does Not Protect Against
Spectre/Meltdown-class CPU vulnerabilities. Timing attacks on the host CPU are possible from within a container. If your threat model includes sophisticated side-channel attacks, you need stronger isolation than a shared kernel: a user-space kernel such as gVisor, or microVMs such as Firecracker.
Container escape via kernel exploits. A zero-day in the Linux kernel’s container primitives can break out of Docker isolation. --cap-drop ALL and no-new-privileges reduce the attack surface but do not eliminate this class of risk.
Slow resource exhaustion below limits. Code that slowly allocates memory just under the limit, or that produces exactly 9,999 characters of output 1,000 times, can degrade the host without triggering hard limits. Rate limiting at the API layer mitigates this.
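That rate limiting can be as simple as a token bucket in front of /execute. A minimal sketch follows; the bucket parameters and the injectable clock are illustrative, not part of the article's stack:

```python
import time

class TokenBucket:
    """Allow at most `rate` executions per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate          # tokens refilled per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The /execute handler would keep one bucket per caller, check bucket.allow() before spawning the subprocess, and return 429 when it fails.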
For most production agent use cases, the Docker-based sandbox described here is sufficient. For high-security environments processing adversarial input at scale, run the containers under gVisor (--runtime=runsc) so sandboxed syscalls hit a user-space kernel instead of the host kernel.
What This Means For You
- Always run LLM-generated code in an isolated container, not in your application process. The first time an agent generates import os; os.system("rm -rf /") in your application process is the last time your application runs.
- Block network access at the container level with --network none, not at the application level. Application-level network blocks can be bypassed by code that makes syscalls directly. Container-level network blocking cannot.
- Spin up a fresh container per session, not per request. A compromised container should be discarded after the session ends, not reused for the next user's code.
- Truncate stdout and stderr output at the executor level before returning to the agent. Unbounded output capture is a resource exhaustion vector independent of CPU and memory limits.
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg → AI news made simple without hype.
- Pithy Security → Stay ahead of cybersecurity threats.
