I Built a Fully Autonomous Coding Agent for Under $50/Month — Here's the Exact Setup

Suifeng023 · DEV Community

Three months ago, I watched an AI agent write, test, and deploy an entire microservice while I made coffee. That moment changed everything about how I work.

After months of experimenting, I've built a coding agent setup that handles 70% of my daily development tasks — bug fixing, code generation, testing, documentation — running 24/7 on my own infrastructure. Total cost: $47/month. Here's exactly how I did it, and how you can replicate it in one afternoon.

Don't get me wrong — GitHub Copilot is great. But it has limitations:

- It only suggests within your IDE — no terminal access, no file system operations, no deployment.
- It can't run tests or validate its own output.
- It doesn't learn from your project's specific patterns beyond what's in the current file.
- You're limited to one model — what if Claude is better at refactoring while GPT is better at generating tests?

A custom agent gives you full control over the model, the tools, and the workflow.

```
┌─────────────────────────────────────────┐
│              ORCHESTRATOR               │
│          (Python + LangGraph)           │
│                $0/month                 │
├──────────┬──────────┬───────────────────┤
│  LLM 1   │  LLM 2   │       LLM 3       │
│  Claude  │  GPT-4o  │    Gemini Pro     │
│  $20/mo  │  $20/mo  │       $7/mo       │
├──────────┴──────────┴───────────────────┤
│               TOOL LAYER                │
│   Terminal │ File System │ Browser      │
│   Git │ Docker │ npm/pip │ Linting      │
├─────────────────────────────────────────┤
│             KNOWLEDGE BASE              │
│   Project docs │ Style guide │ Tests    │
│                $0/month                 │
└─────────────────────────────────────────┘
```

The orchestrator is the brain of the operation: I use LangGraph to build a state machine that routes tasks to the right model and tool combination.

```python
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator


class AgentState(TypedDict):
    task: str
    context: str
    model_used: str
    code_output: str
    test_results: str
    iteration: int
    messages: Annotated[list, operator.add]


def route_task(state: AgentState) -> str:
    """Route to the best model based on task type."""
    task = state["task"].lower()
    if any(w in task for w in ["refactor", "optimize", "clean", "improve"]):
        return "claude"   # Claude excels at code quality
    elif any(w in task for w in ["test", "debug", "fix", "error"]):
        return "gpt4o"    # GPT-4o is great at debugging
    elif any(w in task for w in ["document", "explain", "summary"]):
        return "gemini"   # Gemini for documentation
    else:
        return "claude"   # Default for generation


def should_iterate(state: AgentState) -> str:
    """Decide if we need another iteration."""
    if state["iteration"] >= 3:
        return END
    if "PASS" in state.get("test_results", ""):
        return END
    return "generate"
```

The key insight? Different models excel at different tasks. Routing intelligently saves money and improves output quality. (I'll sketch how these pieces wire into an actual graph right after the cost breakdown below.)

Here's my exact API spending breakdown:

| Model | Provider | Cost/Month | Best For |
| --- | --- | --- | --- |
| Claude 3.5 Sonnet | Anthropic API | ~$20 | Code generation, refactoring |
| GPT-4o | OpenAI API | ~$20 | Debugging, test writing |
| Gemini 1.5 Pro | Google AI Studio | ~$7 | Documentation, large context |

Pro tip: Use Google AI Studio's free tier for Gemini — you get 60 requests/minute free, which is plenty for documentation tasks.
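Back to the orchestrator for a second: the snippets above define `route_task` and `should_iterate`, but not the graph wiring itself. Below is a rough sketch of how those pieces could be assembled with LangGraph. The node functions `generate_node` and `test_node` are placeholders of mine, and `router` and `tools` refer to the `ModelRouter` and `DevTools` instances introduced in the next sections; treat it as a starting point rather than my exact production graph.

```python
# Sketch: wiring route_task / should_iterate into a LangGraph state machine.
# Assumes AgentState, route_task, and should_iterate from above, plus
# router = ModelRouter() and tools = DevTools() from the sections below.
from langgraph.graph import StateGraph, END


def generate_node(state: AgentState) -> dict:
    """Pick a model for the task and generate (or regenerate) code."""
    model = route_task(state)
    code = router.generate(model, state["task"], state["context"])
    return {"model_used": model, "code_output": code, "iteration": state["iteration"] + 1}


def test_node(state: AgentState) -> dict:
    """Apply the generated code and run the test suite."""
    tools.write_file("output.py", state["code_output"])
    return {"test_results": tools.run_tests()}


graph = StateGraph(AgentState)
graph.add_node("generate", generate_node)
graph.add_node("run_tests", test_node)
graph.set_entry_point("generate")
graph.add_edge("generate", "run_tests")
# should_iterate returns "generate" for another pass, or END once the tests
# pass (or after three iterations), which stops the run.
graph.add_conditional_edges("run_tests", should_iterate, {"generate": "generate", END: END})
agent = graph.compile()
```

Calling `agent.invoke({...})` with an initial state dict (the task, an empty context, iteration 0, and so on) then runs the same generate → test → retry loop that the plain-Python `agent_loop` further down performs imperatively.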
Here's the router that actually calls each provider:

```python
import os

import anthropic
import openai
import google.generativeai as genai


class ModelRouter:
    def __init__(self):
        self.claude = anthropic.Anthropic()
        self.gpt = openai.OpenAI()
        genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
        self.gemini = genai.GenerativeModel("gemini-1.5-pro")

    def generate(self, model: str, prompt: str, context: str = "") -> str:
        if model == "claude":
            response = self.claude.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                messages=[{"role": "user", "content": f"{context}\n\n{prompt}"}],
            )
            return response.content[0].text
        elif model == "gpt4o":
            response = self.gpt.chat.completions.create(
                model="gpt-4o",
                messages=[
                    {"role": "system", "content": context},
                    {"role": "user", "content": prompt},
                ],
            )
            return response.choices[0].message.content
        elif model == "gemini":
            response = self.gemini.generate_content(f"{context}\n\n{prompt}")
            return response.text
```

This is where the magic happens. Your agent needs hands to interact with the codebase.

```python
import subprocess
from pathlib import Path


class DevTools:
    """Tools the agent can use to interact with the codebase."""

    def read_file(self, path: str) -> str:
        """Read a file from the project."""
        return Path(path).read_text()

    def write_file(self, path: str, content: str) -> str:
        """Write content to a file."""
        Path(path).parent.mkdir(parents=True, exist_ok=True)
        Path(path).write_text(content)
        return f"Written to {path}"

    def run_command(self, cmd: str, cwd: str = ".") -> str:
        """Execute a shell command safely."""
        # Safety: block dangerous commands
        blocked = ["rm -rf /", "sudo", "DROP TABLE", "> /dev/sda"]
        if any(b in cmd for b in blocked):
            return "BLOCKED: Dangerous command detected"
        result = subprocess.run(
            cmd, shell=True, cwd=cwd,
            capture_output=True, text=True, timeout=60,
        )
        return result.stdout + result.stderr

    def run_tests(self, test_cmd: str = "pytest") -> str:
        """Run the test suite and return results."""
        return self.run_command(test_cmd)

    def lint(self, path: str = ".") -> str:
        """Run linter on the codebase."""
        return self.run_command(f"ruff check {path}")

    def git_diff(self) -> str:
        """Show what changed."""
        return self.run_command("git diff")
```

The safety layer is crucial — you're giving an AI the ability to run arbitrary commands. Always sandbox and always validate.
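One straightforward sandboxing option, sketched below under the assumption that Docker is installed, is to route every command through a throwaway container; `run_sandboxed` here is a hypothetical helper of mine, not part of the `DevTools` class above.

```python
import subprocess


def run_sandboxed(cmd: str, project_dir: str, image: str = "python:3.12-slim") -> str:
    """Run a shell command inside a disposable Docker container.

    The project is mounted at /workspace, the container has no network
    access, and it is removed as soon as the command finishes.
    """
    docker_cmd = [
        "docker", "run", "--rm",
        "--network", "none",                 # no outbound network from the sandbox
        "-v", f"{project_dir}:/workspace",   # mount the project read-write
        "-w", "/workspace",
        image,
        "bash", "-lc", cmd,
    ]
    result = subprocess.run(docker_cmd, capture_output=True, text=True, timeout=120)
    return result.stdout + result.stderr
```

Swapping something like this in for the `subprocess.run` call inside `run_command` keeps the keyword blocklist as a first filter and makes the container the real boundary.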
Your agent needs context about your project. I use a simple approach:

```python
from pathlib import Path

from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter


class ProjectKnowledge:
    def __init__(self, project_path: str):
        self.project_path = project_path
        self.vectorstore = None

    def index_project(self):
        """Index all project documentation and code."""
        docs = []
        for ext in ["*.md", "*.py", "*.ts", "*.json"]:
            for file in Path(self.project_path).rglob(ext):
                # Skip node_modules, venv, etc.
                if any(skip in str(file) for skip in ["node_modules", "venv", ".git"]):
                    continue
                docs.append({
                    "content": file.read_text(),
                    "path": str(file),
                    "type": ext,
                })

        splitter = RecursiveCharacterTextSplitter(
            chunk_size=2000, chunk_overlap=200
        )
        texts = []
        metadatas = []
        for doc in docs:
            chunks = splitter.split_text(doc["content"])
            texts.extend(chunks)
            metadatas.extend([{"source": doc["path"]} for _ in chunks])

        # No embedding function is passed here, so Chroma falls back to its
        # built-in default; pass one explicitly if you want to control it.
        self.vectorstore = Chroma.from_texts(
            texts=texts, metadatas=metadatas
        )

    def search(self, query: str, k: int = 5) -> list:
        """Search the knowledge base for relevant context."""
        return self.vectorstore.similarity_search(query, k=k)
```

Here's the main loop that ties everything together:

```python
def agent_loop(task: str, project_path: str):
    """Main agent execution loop."""
    knowledge = ProjectKnowledge(project_path)
    tools = DevTools()
    router = ModelRouter()

    state = {
        "task": task,
        "context": "",
        "model_used": "",
        "code_output": "",
        "test_results": "",
        "iteration": 0,
        "messages": [],
    }

    # Build context from knowledge base
    knowledge.index_project()
    relevant_docs = knowledge.search(task)
    state["context"] = "\n\n".join([d.page_content for d in relevant_docs])

    while True:
        state["iteration"] += 1
        model = route_task(state)
        state["model_used"] = model

        # Generate code with the best model
        state["code_output"] = router.generate(
            model=model,
            prompt=(
                f"Task: {task}\n\n"
                f"Context:\n{state['context']}\n\n"
                f"Previous attempt: {state.get('code_output', '')}\n\n"
                f"Test results: {state['test_results']}\n\n"
                "Please provide improved code."
            ),
            context=state["context"],
        )

        # Apply the changes
        # (In production, parse the model output to extract file changes)
        tools.write_file("output.py", state["code_output"])

        # Run tests
        state["test_results"] = tools.run_tests()

        print(f"Iteration {state['iteration']}: Used {model}")
        print(f"Tests: {state['test_results'][:200]}")

        # Check if we should continue
        next_step = should_iterate(state)
        if next_step == END:
            break

    return state["code_output"]
```

After three months of daily use, here's what the setup handles:

- Bug fixes: Paste the error, get the fix. 85% success rate on first try.
- Unit test generation: "Write tests for auth/utils.py" → 40 tests in 30 seconds.
- Documentation: Generates docstrings and README sections from code analysis.
- Code review: Flags potential issues before I even open the PR.
- Feature scaffolding: "Create a CRUD endpoint for orders" → gets 80% right.
- Database migrations: Generates migration files, I just review and apply.
- Refactoring: "Split this 500-line file into modules" → solid first draft.
- Architecture decisions: I describe the problem, it proposes 3 approaches with trade-offs.
- Security audits: Runs through the OWASP checklist against the codebase.

How I keep the bill under $50:

- Cache everything. I cache LLM responses using Redis — identical queries don't hit the API twice. This alone cut my costs by 40%. (A sketch of what this looks like follows after this list.)
- Use the cheapest model first. Route simple tasks to GPT-4o-mini ($0.15/1M input tokens) instead of Claude.
- Batch your requests. Instead of asking "fix this bug" and "write tests" separately, combine them: "Fix this bug and write tests for the fix."
- Set spending limits. All three providers let you set monthly caps. I set mine at $30, $30, and $10 respectively — and I've never hit them.
- Use local models for simple tasks. Ollama + CodeLlama handles simple completions for free on my machine.
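The caching tip above does a lot of the cost-saving work, so here's roughly what it looks like. This code isn't in the original setup, so treat `cached_generate` as a hypothetical wrapper: it assumes a Redis instance reachable through a `REDIS_URL` environment variable (Upstash gives you one on the free tier) and wraps the `ModelRouter.generate` call from earlier.

```python
import hashlib
import os

import redis

# Works with a local Redis or an Upstash URL supplied via REDIS_URL.
cache = redis.Redis.from_url(
    os.getenv("REDIS_URL", "redis://localhost:6379"), decode_responses=True
)


def cached_generate(router: ModelRouter, model: str, prompt: str,
                    context: str = "", ttl: int = 7 * 24 * 3600) -> str:
    """Return a cached response for identical (model, context, prompt) requests."""
    key = "llm:" + hashlib.sha256(f"{model}|{context}|{prompt}".encode()).hexdigest()
    hit = cache.get(key)
    if hit is not None:
        return hit                        # cache hit: no API call, no cost
    response = router.generate(model, prompt, context)
    cache.set(key, response, ex=ttl)      # expire entries after a week
    return response
```

Call `cached_generate(router, model, prompt, context)` wherever you would otherwise call `router.generate`, and repeated identical queries stop hitting the APIs.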
## The $47 Breakdown (Actual Receipts)

| Service | Monthly Cost | Notes |
| --- | --- | --- |
| Claude API | $18.42 | Code generation + refactoring |
| OpenAI API | $16.87 | Debugging + test writing |
| Google AI Studio | $0.00 | Free tier covers documentation |
| VPS (DigitalOcean) | $6.00 | Runs the orchestrator 24/7 |
| Redis (Upstash free tier) | $0.00 | Response caching |
| ChromaDB (local) | $0.00 | Vector storage |
| **Total** | **$47.29** | |

To set it up yourself, start with the API keys:

- Anthropic Console → Create API key
- OpenAI Platform → Create API key
- Google AI Studio → Free API key

Then install the dependencies and pull the project:

```bash
pip install langgraph langchain anthropic openai google-generativeai chromadb redis
git clone https://github.com/your-repo/coding-agent
cd coding-agent
cp .env.example .env
# Edit .env with your API keys
```

Index your codebase and run your first task:

```python
from agent import ProjectKnowledge, agent_loop

# Index your codebase
kb = ProjectKnowledge("/path/to/your/project")
kb.index_project()

# Try your first task
result = agent_loop("Fix the login bug in auth/views.py", "/path/to/your/project")
print(result)
```

From there:

- Add project-specific tools (database queries, API calls)
- Fine-tune the routing logic for your tech stack
- Build a web UI with Streamlit for easier interaction

Lessons I learned the hard way:

- Start with one model. I jumped into multi-model routing too fast. Start with Claude alone, add others as needed.
- Build the safety layer first. I accidentally ran `rm -rf build/` instead of `rm -rf dist/` once. Sandbox everything.
- Invest in context quality. The agent is only as good as its understanding of your project. Spend time on your README and code comments.
- Log everything. I use LangSmith to trace every agent decision — invaluable for debugging and optimization.

## The Future: Where This Is Going

The coding agent space is moving fast. Here's what I'm watching:

- Claude Code and Cursor's Agent mode are making this more accessible
- Multi-agent systems (dev agent + reviewer agent + QA agent) for better quality
- Fine-tuned models on your specific codebase for better context understanding
- Self-healing systems that detect and fix production issues autonomously

But here's the thing — you don't need to wait. The setup I described works today with available tools and APIs. And for $47/month, it's cheaper than most IDE subscriptions.

Have you built your own coding agent? I'd love to hear about your setup and what tasks you've automated. Drop a comment below! 👇

If you found this useful, follow me for more practical AI engineering guides. I write about building real AI products, not just theory.