A secure LLM code executor runs AI-generated code in an isolated environment where the worst-case outcome is a crashed subprocess, not a compromised host. Pydantic’s Monty project demonstrated the right architecture for Python: a sandboxed execution layer that the LLM calls as a tool, with hard resource limits and no network access. Rust gives you the systems primitives to build the isolation layer with memory safety guarantees that Python cannot provide.
Analysis Briefing
- Topic: Minimal Secure LLM Code Executor in Rust
- Analyst: Mike D (@MrComputerScience)
- Context: Stress-tested in dialogue with Claude Sonnet 4.6
- Source: Pithy Cyborg | Pithy Security
- Key Question: How do you build a Rust code executor that LLMs can use without owning your host?
What Monty Gets Right and What a Rust Executor Adds
Pydantic’s Monty provides a sandboxed Python REPL that AI coding agents can call as a tool. The core insight is that code execution must be a well-defined API boundary, not an exec() call embedded in your agent loop. The model submits code to a sandboxed endpoint. The sandbox runs it with resource constraints. The sandbox returns stdout, stderr, and a return code. The model never touches the host process.
Monty’s limitation is that it is Python executing Python. The sandbox layer itself is a Python process, so the isolation guarantees are software-level: a sufficiently creative exploit in CPython or its C extensions can escape the sandbox. For most AI coding workflows this is an acceptable tradeoff. For security-aware teams, especially Pithy Security readers running agents against internal codebases, the isolation needs to go deeper.
A Rust executor adds three concrete improvements. First, the executor process itself has no garbage collector and no interpreter runtime with a large C extension surface. The attack surface of the executor is the Rust standard library and your code, not CPython’s hundreds of thousands of lines of C. Second, Rust’s ownership model prevents use-after-free and memory corruption in the executor process by construction. Third, you can use Linux’s seccomp system call filtering and namespaces from Rust to apply OS-level isolation that is independent of the language being executed.
The EDR-detecting-ctypes-shellcode scenario frames the threat model precisely: Python agents that execute AI-generated code via ctypes or subprocess calls bypass Python-level sandboxing entirely. A Rust executor with seccomp filtering blocks the syscalls that make those escapes possible at the OS level.
The Executor Architecture: Subprocess Isolation With Resource Limits
The minimal secure executor has three components. A listener accepts code execution requests over a Unix socket or HTTP. A spawner forks a subprocess to run the submitted code. A reaper collects the subprocess output and enforces timeout and resource limits.
use std::process::{Command, Stdio};
use std::time::{Duration, Instant};
use std::io::{Read, Write};
use serde::{Deserialize, Serialize};
#[derive(Debug, Deserialize)]
pub struct ExecuteRequest {
    pub language: Language,
    pub code: String,
    pub timeout_secs: Option<u64>,
    pub stdin: Option<String>,
}

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum Language {
    Python,
    JavaScript,
    Rust,
}

#[derive(Debug, Serialize)]
pub struct ExecuteResult {
    pub stdout: String,
    pub stderr: String,
    pub exit_code: i32,
    pub timed_out: bool,
    pub duration_ms: u64,
}
pub fn execute(req: &ExecuteRequest) -> ExecuteResult {
    let timeout = Duration::from_secs(req.timeout_secs.unwrap_or(10));
    let (cmd, args) = match req.language {
        Language::Python => ("python3", vec!["-c", req.code.as_str()]),
        Language::JavaScript => ("node", vec!["-e", req.code.as_str()]),
        Language::Rust => {
            // Rust requires compile + run: handled separately
            return execute_rust(&req.code, timeout);
        }
    };
    let mut child = Command::new(cmd)
        .args(&args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        // Strip the ambient environment; the child sees only a minimal PATH
        .env_clear()
        .env("PATH", "/usr/bin:/bin")
        .spawn()
        .expect("Failed to spawn child process");
    // Inject stdin if provided; dropping the handle closes the pipe (EOF)
    if let Some(input) = &req.stdin {
        if let Some(mut stdin) = child.stdin.take() {
            let _ = stdin.write_all(input.as_bytes());
        }
    }
    let start = Instant::now();
    // Poll for completion with timeout. Note: output is read only after
    // exit, so a child that writes more than the OS pipe buffer (~64 KB)
    // will block and hit the timeout; drain the pipes on background
    // threads in production.
    loop {
        match child.try_wait() {
            Ok(Some(status)) => {
                let duration_ms = start.elapsed().as_millis() as u64;
                let stdout = read_child_output(child.stdout.take());
                let stderr = read_child_output(child.stderr.take());
                return ExecuteResult {
                    stdout,
                    stderr,
                    exit_code: status.code().unwrap_or(-1),
                    timed_out: false,
                    duration_ms,
                };
            }
            Ok(None) => {
                if start.elapsed() >= timeout {
                    let _ = child.kill();
                    let _ = child.wait();
                    return ExecuteResult {
                        stdout: String::new(),
                        stderr: "Execution timed out".to_string(),
                        exit_code: -1,
                        timed_out: true,
                        duration_ms: timeout.as_millis() as u64,
                    };
                }
                std::thread::sleep(Duration::from_millis(50));
            }
            Err(e) => {
                return ExecuteResult {
                    stdout: String::new(),
                    stderr: format!("Process wait error: {}", e),
                    exit_code: -1,
                    timed_out: false,
                    duration_ms: start.elapsed().as_millis() as u64,
                };
            }
        }
    }
}
fn read_child_output(handle: Option<impl Read>) -> String {
    handle.map(|mut h| {
        let mut buf = String::new();
        let _ = h.read_to_string(&mut buf);
        // Cap output at 64 KB to prevent runaway output floods, backing
        // off to a char boundary so truncate() cannot panic mid-codepoint
        if buf.len() > 65536 {
            let mut end = 65536;
            while !buf.is_char_boundary(end) {
                end -= 1;
            }
            buf.truncate(end);
        }
        buf
    }).unwrap_or_default()
}
fn execute_rust(code: &str, timeout: Duration) -> ExecuteResult {
    use std::fs;
    use tempfile::TempDir;
    let tmp = TempDir::new().expect("Failed to create temp dir");
    let src = tmp.path().join("main.rs");
    let bin = tmp.path().join("main");
    fs::write(&src, code).expect("Failed to write source file");
    // Compile phase
    let compile = Command::new("rustc")
        .args([src.to_str().unwrap(), "-o", bin.to_str().unwrap()])
        .env_clear()
        .env("PATH", "/usr/bin:/bin:/usr/local/bin")
        .output();
    match compile {
        Ok(out) if out.status.success() => {
            // Run the compiled binary directly. For brevity this waits
            // without a deadline; production code should reuse the same
            // try_wait() polling loop as execute() to enforce `timeout`.
            let mut child = Command::new(&bin)
                .env_clear()
                .stdout(Stdio::piped())
                .stderr(Stdio::piped())
                .spawn()
                .expect("Failed to run compiled binary");
            let start = Instant::now();
            let status = child.wait().expect("Failed to wait on child");
            ExecuteResult {
                stdout: read_child_output(child.stdout.take()),
                stderr: read_child_output(child.stderr.take()),
                exit_code: status.code().unwrap_or(-1),
                timed_out: false,
                duration_ms: start.elapsed().as_millis() as u64,
            }
        }
        Ok(out) => ExecuteResult {
            stdout: String::new(),
            stderr: String::from_utf8_lossy(&out.stderr).to_string(),
            exit_code: out.status.code().unwrap_or(-1),
            timed_out: false,
            duration_ms: 0,
        },
        Err(e) => ExecuteResult {
            stdout: String::new(),
            stderr: format!("Failed to invoke rustc: {}", e),
            exit_code: -1,
            timed_out: false,
            duration_ms: 0,
        },
    }
}
The env_clear() call strips all environment variables from the child process and replaces them with a minimal PATH. This prevents AI-generated code from reading HOME, USER, AWS_ACCESS_KEY_ID, or any other ambient credentials present in the executor’s environment.
Adding seccomp Syscall Filtering for OS-Level Isolation
The subprocess approach above is solid defense-in-depth. For production deployments where the executor runs AI-generated code from untrusted sources, add Linux seccomp filtering to restrict which system calls the child process can make. This prevents sandbox escapes that use ptrace, execve to spawn new processes, or raw socket syscalls to establish network connections.
# Additional Cargo.toml dependencies
[target.'cfg(target_os = "linux")'.dependencies]
seccompiler = "0.4"
libc = "0.2"
#[cfg(target_os = "linux")]
use seccompiler::{BpfProgram, SeccompAction, SeccompFilter};

#[cfg(target_os = "linux")]
pub fn build_sandbox_filter() -> BpfProgram {
    // Allow only the minimal syscall set needed for code execution
    let allowed_syscalls = vec![
        libc::SYS_read,
        libc::SYS_write,
        libc::SYS_exit,
        libc::SYS_exit_group,
        libc::SYS_mmap,
        libc::SYS_munmap,
        libc::SYS_brk,
        libc::SYS_fstat,
        libc::SYS_close,
        libc::SYS_rt_sigreturn,
    ];
    let filter = SeccompFilter::new(
        // An empty rule vector means the syscall is matched unconditionally
        allowed_syscalls.into_iter()
            .map(|syscall| (syscall as i64, vec![]))
            .collect(),
        SeccompAction::KillProcess, // Kill immediately on disallowed syscall
        SeccompAction::Allow,
        std::env::consts::ARCH.try_into().unwrap(),
    ).expect("Failed to build seccomp filter");
    filter.try_into().expect("Failed to compile BPF program")
}
Apply the filter in the child process immediately after fork and before exec. The seccompiler crate compiles your allowlist to a Berkeley Packet Filter program that the Linux kernel evaluates on every syscall boundary. Any syscall not on the allowlist terminates the process instantly, before the syscall executes.
Combine seccomp filtering with Linux user namespaces to run the child process as a different UID with no capabilities, and network namespaces to prevent any outbound connections regardless of what the executed code attempts. This three-layer approach, subprocess isolation, environment stripping, and seccomp filtering, covers the threat surface that a pure Python sandbox cannot reach.
What This Means For You
- Always use `env_clear()` on child processes that run AI-generated code. Ambient environment variables are a silent credential leak vector and stripping them costs nothing.
- Cap stdout and stderr at a fixed size before returning them to the agent. An LLM that generates `print("A" * 10_000_000)` will fill your executor’s buffers and potentially exhaust memory. The 64 KB cap in the example above is a reasonable default.
- Apply `seccomp` filtering in production even if subprocess isolation feels sufficient. Defense in depth means each layer assumes the previous layer has been defeated. `seccomp` is cheap to apply and eliminates entire syscall-based escape classes.
- Expose the executor as an HTTP or Unix socket API, not as a library. Running the executor in a separate process means a compromised or crashed execution does not affect the agent process. This is the architectural lesson Monty embodies and it applies regardless of implementation language.
- Log every execution request with its code, language, duration, and exit code. AI-generated code execution is a high-value audit surface. Without logs you cannot detect patterns of attempted sandbox escapes or diagnose why a specific prompt chain is generating code that consistently fails.
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg → AI news made simple without hype.
- Pithy Security → Stay ahead of cybersecurity threats.
