LLM APIs fail. Groq goes down, Claude starts rate limiting, Gemini returns 503s for twenty minutes. A naive retry loop hammers the failing endpoint until your budget is gone or your users give up. A circuit breaker detects failure patterns, stops sending requests to degraded providers, and routes to fallbacks automatically.
Here is a working circuit breaker for multi-provider LLM calls in async Rust, along with the hardening you would want before trusting it in production.
Analysis Briefing
- Topic: Circuit breaker pattern for multi-provider LLM failover in Rust
- Analyst: Mike D (@MrComputerScience)
- Context: A technical briefing developed with Claude Sonnet 4.6
- Source: Pithy Cyborg | Pithy Security
- Key Question: When your primary LLM provider goes down, how does your app not go down with it?
Why a Circuit Breaker, Not Just Retry Logic
Retry logic handles transient failures: a single 500, a momentary timeout. Circuit breakers handle systemic failures: a provider that is down for minutes or an hour.
The difference matters for LLM workloads specifically because:
- LLM API calls are expensive. Retrying a 60-second timeout request five times wastes five minutes and potentially five API credits.
- Failed requests still consume time-to-first-token budget. Users waiting for a response that will not come are accumulating latency.
- Rate limit responses (429s) are often signals to back off completely, not to retry faster.
A circuit breaker transitions between three states:
Closed — Normal operation. Requests pass through. Failures are counted.
Open — The failure threshold has been reached. Requests fail immediately without hitting the provider. After a timeout, the circuit transitions to half-open.
Half-open — A probe request is sent. If it succeeds, the circuit closes. If it fails, the circuit opens again with a longer timeout.
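Before the full implementation, the transition rules can be sketched as a plain state machine. This is a deliberately simplified model: it closes the circuit after a single successful probe, where the implementation below requires two, and the open-to-half-open transition is an explicit event here rather than a check against elapsed time.

```rust
use std::time::Duration;

// Simplified three-state model of the circuit breaker.
#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Closed { failures: u32 },
    Open { timeout: Duration },
    HalfOpen,
}

pub fn on_failure(state: State, threshold: u32, base: Duration) -> State {
    match state {
        // Enough consecutive failures trip the breaker.
        State::Closed { failures } if failures + 1 >= threshold => State::Open { timeout: base },
        State::Closed { failures } => State::Closed { failures: failures + 1 },
        // A failed probe reopens the circuit with a doubled timeout.
        State::HalfOpen => State::Open { timeout: base * 2 },
        open => open,
    }
}

pub fn on_success(_state: State) -> State {
    // Simplification: one successful probe closes the circuit here;
    // the full implementation below requires two.
    State::Closed { failures: 0 }
}

pub fn on_timeout_elapsed(state: State) -> State {
    match state {
        // Timeout expired: allow a single probe through.
        State::Open { .. } => State::HalfOpen,
        other => other,
    }
}

fn main() {
    let base = Duration::from_secs(60);
    let mut s = State::Closed { failures: 0 };
    for _ in 0..3 {
        s = on_failure(s, 3, base); // three failures trip the breaker
    }
    assert_eq!(s, State::Open { timeout: base });
    s = on_timeout_elapsed(s); // timeout expires -> probe allowed
    assert_eq!(s, State::HalfOpen);
    s = on_success(s); // successful probe closes the circuit
    assert_eq!(s, State::Closed { failures: 0 });
}
```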
The Implementation
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::RwLock;
use thiserror::Error;

#[derive(Debug, Error)]
pub enum CircuitError {
    #[error("Circuit is open for provider '{provider}', try again in {retry_in:?}")]
    CircuitOpen { provider: String, retry_in: Duration },
    #[error("Provider error: {0}")]
    ProviderError(String),
    #[error("All providers failed or are open")]
    AllProvidersFailed,
}

#[derive(Debug, Clone, PartialEq)]
enum CircuitState {
    Closed,
    Open { opened_at: Instant, timeout: Duration },
    HalfOpen,
}

#[derive(Debug)]
struct CircuitBreaker {
    state: CircuitState,
    failure_count: u32,
    failure_threshold: u32,
    base_timeout: Duration,
    // The current open-timeout: doubles on each failed probe and resets when
    // the circuit closes. This is what makes the backoff actually exponential.
    current_timeout: Duration,
    consecutive_successes: u32,
    success_threshold: u32,
}

impl CircuitBreaker {
    fn new(failure_threshold: u32, base_timeout: Duration) -> Self {
        Self {
            state: CircuitState::Closed,
            failure_count: 0,
            failure_threshold,
            base_timeout,
            current_timeout: base_timeout,
            consecutive_successes: 0,
            success_threshold: 2,
        }
    }

    fn can_attempt(&self) -> Result<(), Duration> {
        match &self.state {
            CircuitState::Closed | CircuitState::HalfOpen => Ok(()),
            CircuitState::Open { opened_at, timeout } => {
                let elapsed = opened_at.elapsed();
                if elapsed >= *timeout {
                    Ok(())
                } else {
                    Err(*timeout - elapsed)
                }
            }
        }
    }

    fn record_success(&mut self) {
        self.failure_count = 0;
        match self.state {
            CircuitState::HalfOpen => {
                self.consecutive_successes += 1;
                if self.consecutive_successes >= self.success_threshold {
                    self.state = CircuitState::Closed;
                    self.consecutive_successes = 0;
                    self.current_timeout = self.base_timeout; // reset the backoff
                    tracing::info!("Circuit closed after successful probes");
                }
            }
            _ => {
                self.consecutive_successes = 0;
            }
        }
    }

    fn record_failure(&mut self) {
        self.consecutive_successes = 0;
        self.failure_count += 1;
        match &self.state {
            CircuitState::HalfOpen => {
                // Failed probe: reopen with exponential backoff by doubling the
                // current timeout (not the base), capped at 5 minutes.
                self.current_timeout = (self.current_timeout * 2).min(Duration::from_secs(300));
                self.state = CircuitState::Open {
                    opened_at: Instant::now(),
                    timeout: self.current_timeout,
                };
                tracing::warn!(
                    "Circuit reopened after failed probe, timeout: {:?}",
                    self.current_timeout
                );
            }
            CircuitState::Closed => {
                if self.failure_count >= self.failure_threshold {
                    self.state = CircuitState::Open {
                        opened_at: Instant::now(),
                        timeout: self.current_timeout,
                    };
                    tracing::warn!(
                        "Circuit opened after {} failures",
                        self.failure_count
                    );
                }
            }
            CircuitState::Open { .. } => {}
        }
    }

    fn transition_to_half_open_if_ready(&mut self) {
        if let CircuitState::Open { opened_at, timeout } = &self.state {
            if opened_at.elapsed() >= *timeout {
                self.state = CircuitState::HalfOpen;
                tracing::info!("Circuit transitioning to half-open for probe");
            }
        }
    }
}
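One detail worth calling out: a failed probe doubles the reopen timeout, capped at five minutes, so with a 60-second base the schedule walks 60s, 120s, 240s, 300s and holds at the cap. The arithmetic in isolation:

```rust
use std::time::Duration;

// Mirrors the reopen backoff: double the current timeout on every failed
// probe, capped at five minutes.
pub fn next_timeout(current: Duration) -> Duration {
    (current * 2).min(Duration::from_secs(300))
}

fn main() {
    let mut t = Duration::from_secs(60);
    let mut schedule = vec![t.as_secs()];
    for _ in 0..4 {
        t = next_timeout(t);
        schedule.push(t.as_secs());
    }
    // 60 -> 120 -> 240 -> 300 -> 300 (the cap holds)
    assert_eq!(schedule, vec![60, 120, 240, 300, 300]);
}
```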
The Multi-Provider Router
use std::collections::HashMap;
use async_trait::async_trait;

#[async_trait]
pub trait LlmProvider: Send + Sync {
    fn name(&self) -> &str;
    async fn complete(&self, prompt: &str) -> Result<String, String>;
}

pub struct CircuitBreakerRouter {
    // Providers are tried in order: put your primary first.
    providers: Vec<Arc<dyn LlmProvider>>,
    breakers: Arc<RwLock<HashMap<String, CircuitBreaker>>>,
}

impl CircuitBreakerRouter {
    pub fn new(
        providers: Vec<Arc<dyn LlmProvider>>,
        failure_threshold: u32,
        base_timeout: Duration,
    ) -> Self {
        let mut breakers = HashMap::new();
        for provider in &providers {
            breakers.insert(
                provider.name().to_string(),
                CircuitBreaker::new(failure_threshold, base_timeout),
            );
        }
        Self {
            providers,
            breakers: Arc::new(RwLock::new(breakers)),
        }
    }

    pub async fn complete(&self, prompt: &str) -> Result<String, CircuitError> {
        let mut last_error = CircuitError::AllProvidersFailed;
        for provider in &self.providers {
            let name = provider.name().to_string();
            // Check and potentially transition the circuit, then release the
            // lock before making the (slow) provider call.
            {
                let mut breakers = self.breakers.write().await;
                let breaker = breakers.get_mut(&name).unwrap();
                breaker.transition_to_half_open_if_ready();
                if let Err(retry_in) = breaker.can_attempt() {
                    tracing::debug!(
                        "Skipping provider '{}': circuit open, retry in {:?}",
                        name, retry_in
                    );
                    last_error = CircuitError::CircuitOpen {
                        provider: name.clone(),
                        retry_in,
                    };
                    continue;
                }
            }
            // Attempt the call
            tracing::debug!("Attempting provider '{}'", name);
            match provider.complete(prompt).await {
                Ok(response) => {
                    let mut breakers = self.breakers.write().await;
                    breakers.get_mut(&name).unwrap().record_success();
                    tracing::info!("Provider '{}' succeeded", name);
                    return Ok(response);
                }
                Err(e) => {
                    let mut breakers = self.breakers.write().await;
                    breakers.get_mut(&name).unwrap().record_failure();
                    tracing::warn!("Provider '{}' failed: {}", name, e);
                    last_error = CircuitError::ProviderError(e);
                }
            }
        }
        Err(last_error)
    }

    pub async fn provider_status(&self) -> HashMap<String, String> {
        let breakers = self.breakers.read().await;
        breakers
            .iter()
            .map(|(name, breaker)| {
                let status = match &breaker.state {
                    CircuitState::Closed => "closed".to_string(),
                    CircuitState::HalfOpen => "half-open".to_string(),
                    CircuitState::Open { opened_at, timeout } => {
                        let remaining = timeout.saturating_sub(opened_at.elapsed());
                        format!("open (retry in {:?})", remaining)
                    }
                };
                (name.clone(), status)
            })
            .collect()
    }
}
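Stripped of the breaker bookkeeping and the async machinery, the routing loop reduces to first-success-wins over an ordered provider list. A synchronous sketch, with closures standing in for providers (the names and error strings are illustrative):

```rust
// Closures stand in for real providers; list order is priority order.
fn route(
    providers: &[(&str, Box<dyn Fn() -> Result<String, String>>)],
) -> Result<String, String> {
    let mut last_error = "all providers failed or are open".to_string();
    for (name, call) in providers {
        match call() {
            // First provider to answer wins; later ones are never called.
            Ok(response) => return Ok(response),
            Err(e) => last_error = format!("{name}: {e}"),
        }
    }
    Err(last_error)
}

fn main() {
    let providers: Vec<(&str, Box<dyn Fn() -> Result<String, String>>)> = vec![
        // Primary is down; the router falls through to the next provider.
        ("groq", Box::new(|| Err("HTTP 503".to_string()))),
        ("gemini", Box::new(|| Ok("Paris".to_string()))),
    ];
    assert_eq!(route(&providers).unwrap(), "Paris");
}
```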
Concrete Provider Implementations
use reqwest::Client;
use serde_json::json;

pub struct GroqProvider {
    client: Client,
    api_key: String,
    model: String,
}

impl GroqProvider {
    pub fn new(api_key: impl Into<String>) -> Self {
        Self {
            client: Client::new(),
            api_key: api_key.into(),
            model: "llama-3.3-70b-versatile".to_string(),
        }
    }
}

#[async_trait]
impl LlmProvider for GroqProvider {
    fn name(&self) -> &str {
        "groq"
    }

    async fn complete(&self, prompt: &str) -> Result<String, String> {
        let response = self
            .client
            .post("https://api.groq.com/openai/v1/chat/completions")
            .bearer_auth(&self.api_key)
            .json(&json!({
                "model": self.model,
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 1024
            }))
            .timeout(Duration::from_secs(30))
            .send()
            .await
            .map_err(|e| e.to_string())?;

        if !response.status().is_success() {
            let status = response.status();
            let body = response.text().await.unwrap_or_default();
            return Err(format!("HTTP {}: {}", status, body));
        }

        let data: serde_json::Value = response.json().await.map_err(|e| e.to_string())?;
        data["choices"][0]["message"]["content"]
            .as_str()
            .map(String::from)
            .ok_or_else(|| "Missing content in response".to_string())
    }
}
Wiring It Together
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    tracing_subscriber::fmt::init();

    let router = CircuitBreakerRouter::new(
        vec![
            Arc::new(GroqProvider::new(std::env::var("GROQ_API_KEY")?)),
            // Arc::new(GeminiProvider::new(std::env::var("GEMINI_API_KEY")?)),
            // Arc::new(ClaudeProvider::new(std::env::var("ANTHROPIC_API_KEY")?)),
        ],
        3,                       // open after 3 consecutive failures
        Duration::from_secs(60), // try again after 60 seconds
    );

    match router.complete("What is the capital of France?").await {
        Ok(response) => println!("Response: {}", response),
        Err(CircuitError::AllProvidersFailed) => {
            eprintln!("All providers failed or are open");
        }
        Err(e) => eprintln!("Error: {}", e),
    }

    // Check circuit status
    let status = router.provider_status().await;
    for (provider, state) in &status {
        println!("{}: {}", provider, state);
    }

    Ok(())
}
The Cargo.toml
[dependencies]
tokio = { version = "1", features = ["full"] }
async-trait = "0.1"
reqwest = { version = "0.11", features = ["json"] }
serde_json = "1"
thiserror = "1"
tracing = "0.1"
tracing-subscriber = "0.3"
What to Add for Production
Metrics: Emit counters for circuit state transitions, provider success/failure rates, and fallback events. These are the signals that tell you which provider is having a bad day before your users do.
Per-error-type handling: A 429 (rate limit) should open the circuit faster than a 500 (server error). A timeout should open the circuit differently from an authentication error. Differentiated failure handling reduces false positives.
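A sketch of what differentiated handling could look like, with illustrative weights rather than tuned values: each error type contributes a different number of counts toward the failure threshold, and non-429 client errors do not trip the breaker at all, since they indicate a caller bug rather than a degraded provider.

```rust
// Illustrative failure weights: how many "counts" each error type contributes
// toward the circuit's failure threshold. Example values, not tuning advice.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum FailureKind {
    RateLimited, // HTTP 429: the provider is telling us to back off
    Timeout,     // no response within budget
    ServerError, // HTTP 5xx: may be transient
}

pub fn failure_weight(kind: FailureKind) -> u32 {
    match kind {
        FailureKind::RateLimited => 3, // trips a threshold of 3 immediately
        FailureKind::Timeout => 2,
        FailureKind::ServerError => 1,
    }
}

pub fn classify(status: Option<u16>) -> Option<FailureKind> {
    match status {
        Some(429) => Some(FailureKind::RateLimited),
        Some(s) if s >= 500 => Some(FailureKind::ServerError),
        None => Some(FailureKind::Timeout), // no HTTP status at all
        _ => None, // other 4xx: caller error, don't trip the breaker
    }
}

fn main() {
    // A single 429 trips a threshold of 3; a single 500 does not.
    assert_eq!(failure_weight(classify(Some(429)).unwrap()), 3);
    assert_eq!(failure_weight(classify(Some(500)).unwrap()), 1);
    assert_eq!(classify(Some(404)), None);
}
```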
Configurable per-provider thresholds: Groq’s free tier has different reliability characteristics than a paid Claude API plan. Set failure thresholds and timeouts appropriate to each provider’s SLA.
Health check endpoint: Expose the provider_status() output through an HTTP endpoint so your monitoring system can see circuit state without parsing logs.
Mike D writes Rust and builds AI infrastructure. Follow at @MrComputerScience.
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg → AI news made simple without hype.
- Pithy Security → Stay ahead of cybersecurity threats.
