Yes, but not reliably out of the box. LangGraph can orchestrate a multi-agent triage pipeline, but alert classification accuracy depends entirely on your prompts, your threat intel context, and whether a human stays in the loop for high-severity decisions.
Analysis Briefing
- Topic: Multi-Agent SOC Alert Triage With LangGraph
- Analyst: Mike D (@MrComputerScience)
- Context: Stress-tested in dialogue with Claude Sonnet 4.6
- Source: Pithy Cyborg | Pithy Security
- Key Question: Where does an AI triage agent break down in a real SOC environment?
What a Multi-Agent Triage Pipeline Actually Does
The agentic pattern for SOC triage splits responsibility across specialized nodes. A researcher agent pulls context, an analyst agent classifies severity, and a decider agent recommends action. LangGraph connects them through a stateful graph with conditional routing based on each node’s output.
That sounds clean in a diagram. In practice, each agent is an LLM call with a role-specific prompt. The “intelligence” is prompt engineering plus whatever context you inject at runtime. The graph enforces the workflow. The model provides the judgment. Neither one validates the other.
The critical design decision is what each agent receives as input. An analyst agent handed a raw log line with no enrichment will produce generic classifications. The same agent handed the log line plus the user’s recent activity, the asset’s criticality score, and three similar historical alerts will produce actionable output.
Context injection is where multi-agent SOC pipelines succeed or fail. The graph is just plumbing.
```python
# pip install langgraph langchain langchain-ollama
from langgraph.graph import StateGraph, END
from langchain_ollama import ChatOllama
from langchain_core.messages import SystemMessage, HumanMessage
from typing import TypedDict, Optional

llm = ChatOllama(model="llama3.2", temperature=0.0)

class TriageState(TypedDict):
    alert: str
    context: Optional[str]
    research: Optional[str]
    severity: Optional[str]
    recommendation: Optional[str]
    escalate: Optional[bool]

def researcher_node(state: TriageState) -> TriageState:
    prompt = f"""You are a threat intelligence researcher.
Alert: {state['alert']}
Additional context: {state.get('context', 'None provided')}
Identify: threat category, affected asset type, attack vector.
Be specific. Two to three sentences maximum."""
    response = llm.invoke([
        SystemMessage(content="You are a SOC threat researcher. Be concise and specific."),
        HumanMessage(content=prompt)
    ])
    return {**state, "research": response.content}

def analyst_node(state: TriageState) -> TriageState:
    prompt = f"""You are a SOC analyst classifying alert severity.
Alert: {state['alert']}
Research findings: {state['research']}
Classify severity as: Low, Medium, High, or Critical.
Output format: SEVERITY: <level>
REASON: <one sentence>"""
    response = llm.invoke([
        SystemMessage(content="You are a SOC analyst. Output structured severity classification."),
        HumanMessage(content=prompt)
    ])
    severity_line = [l for l in response.content.split('\n') if 'SEVERITY:' in l]
    severity = severity_line[0].replace('SEVERITY:', '').strip() if severity_line else "Unknown"
    return {**state, "severity": severity}

def decider_node(state: TriageState) -> TriageState:
    prompt = f"""You are a SOC decision agent.
Alert: {state['alert']}
Severity: {state['severity']}
Research: {state['research']}
Recommend: isolate, monitor, escalate, or close.
One sentence justification."""
    response = llm.invoke([
        SystemMessage(content="You are a SOC decision agent. Be direct and actionable."),
        HumanMessage(content=prompt)
    ])
    escalate = state['severity'] in ["High", "Critical"]
    return {**state, "recommendation": response.content, "escalate": escalate}

def route_after_analysis(state: TriageState) -> str:
    if state.get("severity") in ["High", "Critical"]:
        return "decider"
    return "decider"  # Always routes to decider; swap for a human-in-loop node in production

graph = StateGraph(TriageState)
graph.add_node("researcher", researcher_node)
graph.add_node("analyst", analyst_node)
graph.add_node("decider", decider_node)
graph.set_entry_point("researcher")
graph.add_edge("researcher", "analyst")
graph.add_conditional_edges("analyst", route_after_analysis)
graph.add_edge("decider", END)
app = graph.compile()

alert = "Suspicious login from unusual geography at 03:14 UTC. User account: svc_backup. 14 failed attempts followed by success."
result = app.invoke({"alert": alert, "context": "svc_backup is a privileged service account with domain admin rights"})
print(f"Severity: {result['severity']}")
print(f"Recommendation: {result['recommendation']}")
print(f"Escalate: {result['escalate']}")
```
This runs three sequential LLM calls against a local Llama 3.2 instance. No cloud exposure, no API costs during development, no sensitive log data leaving your network.
Where AI Triage Agents Break Down in Real SOC Environments
False confidence is the primary failure mode. An LLM will classify every alert with apparent certainty regardless of how ambiguous the underlying signal is. A human analyst knows when they don’t have enough information to decide. The model produces a recommendation either way.
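One cheap mitigation: treat any model output that does not parse cleanly into a known severity level as uncertainty, and route it to a human instead of letting the pipeline guess. A standalone sketch of that parsing-plus-routing logic (no LLM call; the function names are illustrative, not LangGraph API):

```python
# Treat unparseable or unexpected model output as uncertainty, not as a
# classification. Anything outside the known levels routes to a human.
VALID_SEVERITIES = {"Low", "Medium", "High", "Critical"}

def parse_severity(raw: str) -> str:
    """Extract a SEVERITY: line; fall back to 'Unknown' on anything else."""
    for line in raw.splitlines():
        if line.strip().upper().startswith("SEVERITY:"):
            level = line.split(":", 1)[1].strip().capitalize()
            if level in VALID_SEVERITIES:
                return level
    return "Unknown"

def route_severity(severity: str) -> str:
    # Unknown is not Low: ambiguous output goes to a human, never auto-close.
    if severity in {"High", "Critical", "Unknown"}:
        return "human_review"
    return "decider"
```

The point of the `Unknown` path is to give the pipeline a way to say "I don't know" that the model itself never will.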
Alert volume compounds this. SOC environments generate thousands of alerts daily. A multi-agent pipeline that takes three LLM calls per alert at even modest latency becomes a bottleneck fast. Local models help with cost but not necessarily with speed. A quantized 7B model doing three sequential inferences per alert will fall behind a real alert queue quickly.
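The latency math is cheap to do up front. A back-of-the-envelope sketch (the per-call latency and alert volume are illustrative placeholders; measure your own):

```python
# Back-of-the-envelope check: can one sequential pipeline keep up with the
# alert queue? All numbers below are illustrative placeholders.
def max_alerts_per_hour(seconds_per_llm_call: float, calls_per_alert: int) -> float:
    """Throughput ceiling for one worker running sequential LLM calls."""
    return 3600 / (seconds_per_llm_call * calls_per_alert)

# e.g. a quantized 7B model at ~12 s per enriched-prompt call, three calls per alert
capacity = max_alerts_per_hour(12.0, 3)   # 100 alerts/hour per worker
incoming = 5000 / 24                      # ~208 alerts/hour from 5,000 alerts/day
workers_needed = incoming / capacity      # ~2.1 parallel pipelines, before bursts
print(f"capacity={capacity:.0f}/hr, incoming={incoming:.0f}/hr, workers={workers_needed:.1f}")
```

Average rates understate the problem: alert arrival is bursty, so size for peak load, not the daily mean.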
Prompt injection is a specific threat that most SOC automation discussions ignore entirely. If your researcher agent ingests raw log data and passes it directly into an LLM prompt, an attacker who controls log content can inject instructions into your triage pipeline. This is the same attack vector by which prompt injection can silently compromise an entire agent pipeline.
Sanitize log inputs before they touch a prompt. Treat log data as untrusted user input, because that’s what it is.
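A minimal sanitizer sketch along those lines (the patterns are illustrative and nowhere near exhaustive; pattern-matching alone will not stop a determined injection, so pair it with hard delimiters and a system prompt stating that everything inside them is data, not instructions):

```python
import re

# Illustrative, not exhaustive: strip phrases commonly used to smuggle
# instructions into a prompt, then hard-delimit the untrusted data.
INJECTION_PATTERNS = [
    r"(?i)ignore (all )?(previous|prior) instructions",
    r"(?i)you are now",
    r"(?i)system prompt",
]

def sanitize_log_field(value: str, max_len: int = 500) -> str:
    value = value[:max_len]                # cap length to limit payload room
    value = value.replace("```", "'''")    # neutralize code-fence breakouts
    for pattern in INJECTION_PATTERNS:
        value = re.sub(pattern, "[REDACTED]", value)
    return value

def wrap_untrusted(value: str) -> str:
    # Delimit so the model can be told: text between these tags is data only.
    return f"<untrusted_log>{sanitize_log_field(value)}</untrusted_log>"
```

This belongs in the researcher node, before any log field is interpolated into a prompt.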
When Human-in-the-Loop Changes Everything
LangGraph supports interrupting graph execution and waiting for human input before proceeding. For High and Critical severity alerts, this isn't optional; it's the difference between a useful tool and a liability.
```python
# pip install langgraph
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.memory import MemorySaver

# Add human review node for high severity alerts
def human_review_node(state: TriageState) -> TriageState:
    print("\nHUMAN REVIEW REQUIRED")
    print(f"Alert: {state['alert']}")
    print(f"AI Severity: {state['severity']}")
    print(f"AI Recommendation: {state['recommendation']}")
    approval = input("Approve AI recommendation? (yes/no/escalate): ").strip().lower()
    return {**state, "recommendation": f"Human-approved: {approval} | AI suggested: {state['recommendation']}"}

def route_for_human(state: TriageState) -> str:
    if state.get("escalate"):
        return "human_review"
    return END

# Rebuild graph with human-in-loop and checkpointing
memory = MemorySaver()
graph_with_human = StateGraph(TriageState)
graph_with_human.add_node("researcher", researcher_node)
graph_with_human.add_node("analyst", analyst_node)
graph_with_human.add_node("decider", decider_node)
graph_with_human.add_node("human_review", human_review_node)
graph_with_human.set_entry_point("researcher")
graph_with_human.add_edge("researcher", "analyst")
graph_with_human.add_edge("analyst", "decider")
graph_with_human.add_conditional_edges("decider", route_for_human)
graph_with_human.add_edge("human_review", END)

# interrupt_before pauses execution ahead of human_review; the checkpointer
# persists state so the run can be resumed once an analyst weighs in
app_with_human = graph_with_human.compile(
    checkpointer=memory,
    interrupt_before=["human_review"]
)
```
The checkpointer persists graph state across the interruption. When a human analyst resumes the workflow, context is fully preserved. No repeated LLM calls, no lost state.
This pattern is what separates a demo from a deployable tool. AI handles triage volume. Humans handle consequential decisions. The graph enforces which is which.
What This Means For You
- Sanitize every log field before it enters a prompt, because raw log data is attacker-controlled input and prompt injection through log content is a real and underdefended attack vector in SOC automation pipelines.
- Add human-in-the-loop routing for High and Critical severity before deploying anywhere near production, because an LLM that auto-closes a Critical alert without human review is not a SOC tool; it's a liability.
- Benchmark your local model’s inference speed against your actual alert volume early, because three sequential LLM calls per alert at scale will expose latency problems that never appear in single-alert demos.
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg → AI news made simple without hype.
- Pithy Security → Stay ahead of cybersecurity threats.
