Nobody should pay $200 a month to learn how to build AI agents. Here is the complete zero-cost stack I use in 2026 to build agents that run in production, handle real traffic, and do not secretly bill you at 3am when a loop goes wrong.
Everything here has a free tier that is genuinely useful, not a 14-day trial that converts to a $99/month subscription after you have already built something on it.
Analysis Briefing
- Topic: Zero-cost production AI agent stack in 2026
- Analyst: Mike D (@MrComputerScience)
- Context: Originated from a live session with Claude Sonnet 4.6
- Source: Pithy Cyborg | Pithy Security
- Key Question: Can you actually build and deploy a production AI agent without spending a dollar?
The Stack
| Layer | Tool | Free Tier Limit | Why This One |
|---|---|---|---|
| LLM inference | Groq | 14,400 req/day on Llama 3 | Fastest free inference available |
| Fallback LLM | Gemini 2.0 Flash | 1,500 req/day | Multimodal, longer context |
| Vector store | Supabase + pgvector | 500MB, 2 projects | Full Postgres, not a toy |
| Agent framework | LangGraph (Python) | Open source | Stateful, production-grade |
| Hosting | Railway | $5 free credits/month | Actual deploys, not just localhost |
| Orchestration | GitHub Actions | 2,000 min/month | Cron, triggers, no extra tooling |
| Observability | LangSmith | 5,000 traces/month | Trace every agent step |
| Secrets | GitHub Secrets + Railway env vars | Free | No Vault needed at this scale |
Total monthly cost at moderate usage: $0. Total setup time from zero to deployed agent: under two hours if you follow this exactly.
Step 1: Inference With Groq
Groq’s free tier runs Llama 3.3 70B and Mixtral at speeds that embarrass paid OpenAI tiers. The limit is 14,400 requests per day on most models, which is more than enough for development and light production use.
```python
from groq import Groq

client = Groq()  # reads GROQ_API_KEY from the environment

def call_llm(messages: list[dict], model: str = "llama-3.3-70b-versatile") -> str:
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=1024,
    )
    return response.choices[0].message.content
```
Get your key at console.groq.com. Takes two minutes. Add it to your environment:
```bash
export GROQ_API_KEY=your_key_here
```
For Railway deploys, add it as an environment variable in the Railway dashboard. Never commit it.
Step 2: Fallback With Gemini Flash
When Groq rate limits hit, you need a fallback that does not cost money. Gemini 2.0 Flash gives you 1,500 requests per day free plus native multimodal support, which Groq does not offer on its free tier.
```python
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GEMINI_API_KEY"])

def call_gemini_fallback(messages: list[dict]) -> str:
    model = genai.GenerativeModel("gemini-2.0-flash")
    # Flatten OpenAI-style messages into a single prompt
    prompt = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    response = model.generate_content(prompt)
    return response.text
```
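Flattening the conversation into one string works, but it throws away role structure. If you want Gemini to see a proper multi-turn conversation, here is a sketch of an OpenAI-to-Gemini message converter. The role mapping (Gemini calls the assistant role `model`, and has no separate system role in chat history) is an assumption to verify against the current SDK:

```python
def to_gemini_history(messages: list[dict]) -> list[dict]:
    """Convert OpenAI-style chat messages to Gemini's content format.

    OpenAI uses roles "system"/"user"/"assistant"; Gemini's chat format
    uses "user"/"model". System messages are folded into the first user turn.
    """
    history = []
    system_parts = []
    for m in messages:
        if m["role"] == "system":
            system_parts.append(m["content"])
        else:
            role = "model" if m["role"] == "assistant" else "user"
            history.append({"role": role, "parts": [m["content"]]})
    if system_parts and history and history[0]["role"] == "user":
        # Prepend system text to the first user message
        history[0]["parts"][0] = "\n".join(system_parts) + "\n\n" + history[0]["parts"][0]
    return history
```

Pass the result to `model.generate_content(...)` in place of the flattened prompt.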
Wire both into a simple router with rate limit detection:
```python
def call_with_fallback(messages: list[dict]) -> str:
    try:
        return call_llm(messages)
    except Exception as e:
        if "rate_limit" in str(e).lower() or "429" in str(e):
            print("Groq rate limited, falling back to Gemini")
            return call_gemini_fallback(messages)
        raise
```
This is not a production circuit breaker. It is good enough for a free-tier agent that you are not getting paid to maintain at 99.99% uptime.
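If you want slightly more resilience without building a full circuit breaker, a retry-with-exponential-backoff wrapper covers transient failures too. This is a hypothetical helper, not part of either SDK:

```python
import time

def with_retries(fn, attempts: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Retry fn() with exponential backoff; re-raise after the last attempt.

    `sleep` is injectable so tests don't actually wait.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Wrap the router with it: `with_retries(lambda: call_with_fallback(messages))`.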
Step 3: Vector Store With Supabase + pgvector
Supabase’s free tier gives you a real Postgres database with pgvector enabled. 500MB is enough for tens of thousands of document chunks depending on your embedding dimensions.
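A quick back-of-envelope check on that claim. pgvector stores one 4-byte float per dimension; the per-row overhead figure here is a rough assumption, and the estimate ignores the text column, metadata, and index size:

```python
def approx_chunk_capacity(budget_mb: int, dims: int, row_overhead_bytes: int = 100) -> int:
    """Rough upper bound on document chunks that fit in a storage budget.

    Assumes 4 bytes per vector dimension plus a flat per-row overhead.
    """
    bytes_per_row = dims * 4 + row_overhead_bytes
    return (budget_mb * 1024 * 1024) // bytes_per_row

# 1536-dim embeddings in 500MB: roughly 84k chunks before text and indexes
print(approx_chunk_capacity(500, 1536))
```

At 768 dimensions the same budget roughly doubles, which is why the limits section below recommends smaller embeddings.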
Create a project at supabase.com, then enable the vector extension and create your embeddings table:
```sql
-- Run this in the Supabase SQL editor
create extension if not exists vector;

create table documents (
    id bigserial primary key,
    content text not null,
    embedding vector(1536), -- adjust to your embedding model's dimensions
    metadata jsonb,
    created_at timestamptz default now()
);

create index on documents using ivfflat (embedding vector_cosine_ops)
    with (lists = 100);
```
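ivfflat trades recall for speed: queries scan only a few of those 100 lists. If results look thin, you can raise the number of lists probed per query via pgvector's `ivfflat.probes` setting (default 1):

```sql
-- More probes = better recall, slower queries; tune per workload
SET ivfflat.probes = 10;
```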
For embeddings on the free tier, use Gemini's embedding model (free), or run nomic-embed-text through Ollama if you want to stay fully local:
```python
import google.generativeai as genai

def embed_text(text: str) -> list[float]:
    # text-embedding-004 returns 768-dimensional vectors;
    # make sure your table's vector() size matches
    result = genai.embed_content(
        model="models/text-embedding-004",
        content=text,
    )
    return result["embedding"]
```
Insert and query:
```python
import json

import psycopg2

def insert_document(conn, content: str, embedding: list[float], metadata: dict):
    with conn.cursor() as cur:
        cur.execute(
            # The explicit ::vector cast routes the Python list through
            # pgvector's array-to-vector cast
            "INSERT INTO documents (content, embedding, metadata) VALUES (%s, %s::vector, %s)",
            (content, embedding, json.dumps(metadata)),
        )
    conn.commit()

def similarity_search(conn, query_embedding: list[float], limit: int = 5) -> list[dict]:
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content, metadata, 1 - (embedding <=> %s::vector) AS similarity
            FROM documents
            ORDER BY embedding <=> %s::vector
            LIMIT %s
            """,
            (query_embedding, query_embedding, limit),
        )
        return [
            {"content": r[0], "metadata": r[1], "similarity": r[2]}
            for r in cur.fetchall()
        ]
```
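The table stores "document chunks," but nothing above actually splits documents. A minimal fixed-size chunker with overlap; the sizes are arbitrary defaults to tune for your embedding model:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping character windows.

    Character-based chunking is crude but dependency-free; swap in a
    token-aware splitter if chunk boundaries matter for your retrieval.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Run each chunk through `embed_text` and `insert_document` and the store is populated.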
Step 4: Agent Loop With LangGraph
LangGraph is open source and handles the stateful agent loop pattern better than a hand-rolled while loop. Install it:
```bash
pip install langgraph langchain-groq
```
Minimal working agent with tool use:
```python
import operator
from typing import Annotated, TypedDict

from langchain_core.messages import ToolMessage
from langchain_groq import ChatGroq
from langgraph.graph import END, StateGraph

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]

llm = ChatGroq(model="llama-3.3-70b-versatile")

def search_tool(query: str) -> str:
    """Search the document store for relevant content."""
    # Wire in your Supabase similarity search here
    return f"Search results for: {query}"

tools = [search_tool]
llm_with_tools = llm.bind_tools(tools)

def agent_node(state: AgentState) -> AgentState:
    response = llm_with_tools.invoke(state["messages"])
    return {"messages": [response]}

def tool_node(state: AgentState) -> AgentState:
    last_message = state["messages"][-1]
    results = []
    for tool_call in last_message.tool_calls:
        if tool_call["name"] == "search_tool":
            result = search_tool(tool_call["args"]["query"])
            results.append(ToolMessage(content=result, tool_call_id=tool_call["id"]))
    return {"messages": results}

def should_continue(state: AgentState) -> str:
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return END

graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent")
app = graph.compile()
```
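What LangGraph is managing here is, conceptually, the loop below. This stubbed-LLM sketch shows the control flow the graph encodes (agent → tools → agent, until no tool calls remain), not production code; the hard step cap is the part that stops runaway loops:

```python
class StubLLM:
    """Stands in for the model: first asks for a tool, then answers."""
    def __init__(self):
        self.turn = 0

    def invoke(self, messages):
        self.turn += 1
        if self.turn == 1:
            return {"tool_calls": [{"name": "search_tool", "args": {"query": "pgvector"}}]}
        return {"tool_calls": [], "content": "final answer"}

def run_agent_loop(llm, tools: dict, user_input: str, max_steps: int = 5):
    messages = [{"role": "user", "content": user_input}]
    for _ in range(max_steps):  # hard cap: this is what stops 3am runaway loops
        response = llm.invoke(messages)
        messages.append(response)
        if not response.get("tool_calls"):
            return response["content"]
        for call in response["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent exceeded max_steps")
```

LangGraph adds what this sketch lacks: state merging, checkpointing, and per-node tracing, which is why it is worth using over the hand-rolled version.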
Step 5: Deploy to Railway
Railway’s free tier gives you $5 in credits per month, which covers a lightweight agent service running occasional requests. Create a Dockerfile:
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "main.py"]
```
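The Dockerfile copies a `requirements.txt`; for the code in this post, a plausible one looks like this (pin exact versions yourself before deploying):

```text
groq
google-generativeai
psycopg2-binary
langgraph
langchain-groq
```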
Push to GitHub, connect the repo in Railway, add your environment variables in the Railway dashboard, and deploy. The entire deploy pipeline takes about four minutes the first time.
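The orchestration row in the stack table is GitHub Actions, and this is where it earns its place: a scheduled workflow can poke your Railway service on a cron with no extra tooling. A sketch, where the endpoint path and both secret names are placeholders you define yourself:

```yaml
# .github/workflows/agent-cron.yml
name: run-agent
on:
  schedule:
    - cron: "0 */6 * * *"   # every six hours
  workflow_dispatch: {}      # manual trigger from the Actions tab
jobs:
  trigger:
    runs-on: ubuntu-latest
    steps:
      - name: Call the deployed agent
        run: |
          curl -fsS -X POST "$AGENT_URL/run" \
            -H "Authorization: Bearer $AGENT_TOKEN"
        env:
          AGENT_URL: ${{ secrets.AGENT_URL }}
          AGENT_TOKEN: ${{ secrets.AGENT_TOKEN }}
```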
Step 6: Observe With LangSmith
LangSmith’s free tier gives you 5,000 traces per month. That is every agent step, every tool call, every LLM response, fully logged and queryable. You cannot debug a multi-step agent without this.
```python
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your_langsmith_key"  # prefer setting this in your shell or Railway, not in code
os.environ["LANGCHAIN_PROJECT"] = "my-free-agent"
```
Set these before your agent runs. LangSmith captures everything automatically when LangChain tracing is enabled. No instrumentation code required.
The Actual Limits You Will Hit
Groq: 14,400 requests per day sounds like a lot until you run an agent loop that makes 8 LLM calls per user request. That works out to 1,800 user requests per day before you hit the limit. Fine for a side project; not fine for anything with real traffic.
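Rather than discovering the limit via 429s, you can track it in-process and fail over to Gemini early. A sketch of a daily call budget; it compares UTC day-of-year only, so it ignores year rollover:

```python
import time

class DailyBudget:
    """Counts calls per UTC day so you can fail over before the 429s start."""
    def __init__(self, limit: int):
        self.limit = limit
        self.day = None
        self.count = 0

    def try_spend(self, now=None) -> bool:
        """Return True and consume one call if budget remains today."""
        today = time.gmtime(now if now is not None else time.time()).tm_yday
        if today != self.day:
            self.day, self.count = today, 0  # new day, reset the counter
        if self.count >= self.limit:
            return False
        self.count += 1
        return True
```

Check `try_spend()` in the router before calling Groq, and go straight to the Gemini fallback when it returns False.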
Supabase: 500MB goes fast with large embedding dimensions. Use 768-dimension embeddings instead of 1536 if you need more documents in the free tier. The search quality difference is small for most use cases.
Railway: $5 of credits per month runs a lightweight service for roughly 400 hours, depending on instance size. Deploy a small instance, not a large one.
LangSmith: 5,000 traces per month disappears in a long debugging session. Disable tracing in production once the agent is stable. Use it for development and turn it off before you ship.
What This Stack Cannot Do
It cannot handle high-throughput production traffic without money. The rate limits are real. If your agent gets traction, you will outgrow the free tiers, and the right response is to start paying for the things that are now worth paying for.
It cannot replace a real security review. This is a learning and prototyping stack. If you are handling user PII or making financial decisions, you need more than a free-tier Supabase instance and GitHub Secrets.
What it can do is get a real agent deployed and working at zero cost, which is the thing you need before you know whether any of this is worth paying for.
Mike D builds in public at @MrComputerScience. All code in this post runs. If it does not, open an issue.
Enjoyed this deep dive? Join my inner circle:
- Pithy Cyborg → AI news made simple without hype.
- Pithy Security → Stay ahead of cybersecurity threats.
