How I Architected a Multi-Agent System for Customer Support (And What I'd Do Differently)
Six months ago, I built a multi-agent customer support system that handles 10,000+ conversations daily. It reduced response time from 4 hours to under 2 minutes, and it now resolves 73% of tickets without human intervention.

But here's what the case study won't tell you: it almost failed spectacularly in week two. And the reason reveals everything about how NOT to design multi-agent systems.

Here's what I built initially:

Single Agent Architecture (FAILURE):

```
Customer Message → [Router Agent] → [Single Resolution Agent] → Response
```

Simple, right? One agent receives, one agent resolves. Within two weeks, we hit three problems:

- The agent couldn't handle different timezones and urgency levels
- Complex issues (refunds + exchanges + account problems) required different knowledge bases
- Peak hours (Mondays, 9 AM) crashed the single agent

The fix was obvious: multiple specialized agents working together. Here's what actually works:

```
                 Customer Message
                        ↓
                [ triage_agent ]                      ← Fast, stateless, decides where to route
                        ↓
        ┌───────────┬───┴───────┬───────────┐
        ↓           ↓           ↓           ↓
  [ billing ] [ shipping ] [ returns ] [ general ]    ← Specialized, stateful
        └───────────┴───┬───────┴───────────┘
                        ↓
              [ resolution_agents ]                   ← Generate response, check policies
                        ↓
             [ quality_check_agent ]                  ← Final review before sending
                        ↓
                     Response
```

Let me walk through each component.

The triage agent is the first decision point. It should be fast and stateless—no conversation history.

```python
class TriageAgent:
    SYSTEM_PROMPT = """
    You are a customer support triage specialist.
    Your ONLY job: read the incoming message and route it correctly.
    Do NOT try to solve the problem. Just classify and route.

    Categories:
    - BILLING: charges, payments, subscriptions, invoices, refunds
    - SHIPPING: delivery, tracking, addresses, delays
    - RETURNS: return policy, return requests, exchanges
    - GENERAL: account, login, password, other

    Urgency levels:
    - URGENT: money involved, legal keywords, explicit threats
    - HIGH: dissatisfaction markers, complaint patterns
    - NORMAL: standard requests

    Output ONLY this format:
    {
      "category": "BILLING|SHIPPING|RETURNS|GENERAL",
      "urgency": "URGENT|HIGH|NORMAL",
      "confidence": 0.0-1.0,
      "summary": "one sentence summary of the issue"
    }
    """

    def classify(self, message: str) -> dict:
        response = self.llm.chat([
            {"role": "system", "content": self.SYSTEM_PROMPT},
            {"role": "user", "content": message},
        ])
        return self.parse_json(response)
```

This agent has one job and does it well. It's fast because it's stateless.

Each category gets its own agent with specialized knowledge:

```python
class BillingAgent:
    def __init__(self):
        self.tools = [
            self.get_subscription_details,
            self.process_refund,
            self.update_payment_method,
            self.issue_credit,
        ]
        self.policy = load_billing_policy()

    def resolve(self, issue: dict, conversation_history: list) -> dict:
        # Check for edge cases first
        if self.is_high_risk_refund(issue):
            return self.escalate(issue, reason="refund_over_threshold")

        if self.requires_manager_approval(issue):
            return self.flag_for_review(issue)

        # Normal resolution path
        return self.generate_resolution(issue, conversation_history)

    def is_high_risk_refund(self, issue: dict) -> bool:
        refund_amount = issue.get("amount", 0)
        customer_tier = self.get_customer_tier(issue["customer_id"])
        days_since_purchase = self.get_days_since_purchase(issue)

        return (
            refund_amount > 500
            or (customer_tier == "basic" and refund_amount > 100)
            or days_since_purchase > 30
        )
```

Notice: the agent has tool access, not just text generation. It actually does things.
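One helper worth pinning down: the triage agent calls `self.parse_json`, which isn't shown above. Models routinely wrap their JSON in prose or code fences, so a defensive version pays for itself. This is a sketch of how such a helper might look, not the production implementation:

```python
import json
import re


def parse_json(raw: str) -> dict:
    """Extract the first JSON object from an LLM response.

    Tries a direct parse first; if the model wrapped the JSON in
    prose or a ```json fence, falls back to grabbing the outermost
    {...} span before giving up.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass

    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        return json.loads(match.group(0))

    raise ValueError(f"No JSON object found in response: {raw[:80]!r}")
```

The greedy `{.*}` fallback assumes a single JSON object per response, which the triage prompt's "Output ONLY this format" instruction is meant to guarantee.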
Before any response is sent to a customer, it goes through review:

```python
class QualityCheckAgent:
    def review(self, response: str, original_issue: dict, customer_tier: str) -> dict:
        checks = {
            "tone_appropriate": self.check_tone(response, customer_tier),
            "policy_compliant": self.check_policy(response, original_issue),
            "no_hallucinations": self.verify_claims(response, original_issue),
            "complete": self.check_completeness(response, original_issue),
        }

        all_passed = all(checks.values())

        if all_passed:
            # Always include "needs_revision" so callers can key off it safely
            return {"approved": True, "needs_revision": False, "response": response}
        else:
            return {
                "approved": False,
                "needs_revision": True,
                "issues": [k for k, v in checks.items() if not v],
            }
```

This catches issues before customers see them.

The magic is in how these agents coordinate:

```python
class SupportOrchestrator:
    def __init__(self):
        self.triage = TriageAgent()
        self.resolvers = {
            "BILLING": BillingAgent(),
            "SHIPPING": ShippingAgent(),
            "RETURNS": ReturnsAgent(),
            "GENERAL": GeneralAgent(),
        }
        self.quality = QualityCheckAgent()
        self.human_escalation = HumanEscalationHandler()

    async def handle(self, message: str, customer_id: str) -> str:
        # Step 1: Fast triage
        classification = self.triage.classify(message)

        if classification["urgency"] == "URGENT":
            await self.human_escalation.notify(message, customer_id)

        # Step 2: Get specialized resolver
        resolver = self.resolvers[classification["category"]]

        # Step 3: Resolve with conversation context
        resolution = resolver.resolve(
            issue=classification,
            conversation_history=self.get_history(customer_id),
        )

        # Step 4: Quality check
        quality_result = self.quality.review(
            response=resolution["response"],
            original_issue=classification,
            customer_tier=self.get_customer_tier(customer_id),
        )

        if quality_result["needs_revision"]:
            # Loop back with feedback
            resolution = resolver.revise(
                previous_response=resolution,
                quality_feedback=quality_result["issues"],
            )

        return resolution["response"]
```

Looking back, here's what I'd change: I added logging in week three. It should have been there from the start.
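While we're on hindsight, one gap in the orchestrator above is worth flagging: `revise` runs exactly once, so if the revised response fails review again it ships anyway. A bounded retry closes that hole without risking an infinite ping-pong. This is a sketch under assumptions; the `MAX_REVISIONS` cap, the wrapper function, and the `escalate` fallback are mine, not part of the production system:

```python
# Sketch: bound the quality-revision loop instead of revising once.
# MAX_REVISIONS and resolver.escalate() are assumptions for illustration.
MAX_REVISIONS = 2


def resolve_with_quality_gate(resolver, quality, issue, history):
    """Run resolve -> review, revising at most MAX_REVISIONS times.

    If the response still fails review after the cap, escalate to a
    human rather than sending an unapproved reply.
    """
    resolution = resolver.resolve(issue, history)

    for _ in range(MAX_REVISIONS):
        verdict = quality.review(resolution["response"], issue)
        if verdict["approved"]:
            return resolution["response"]
        resolution = resolver.revise(resolution, verdict["issues"])

    # Exhausted the revision budget: hand off instead of looping forever
    return resolver.escalate(issue, reason="quality_check_exhausted")
```

The cap is the point: each revision is another LLM round trip, so the budget trades latency and cost against the chance of salvaging an automated reply.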
```python
# Add this everywhere from day one
async def handle(self, message: str, customer_id: str) -> str:
    trace_id = generate_trace_id()
    start_time = time.time()

    logger.info({
        "trace_id": trace_id,
        "customer_id": customer_id,
        "message_preview": message[:100],
        "stage": "start",
    })

    try:
        result = await self._handle_impl(message, customer_id)
        logger.info({
            "trace_id": trace_id,
            "duration": time.time() - start_time,
            "success": True,
        })
        return result
    except Exception as e:
        logger.error({
            "trace_id": trace_id,
            "error": str(e),
            "stage": "failure",
        })
        raise
```

Have a fallback agent ready:

```python
FALLBACK_RESOLVER = """
You are a general support agent. The specialized agent was unavailable.
Apologize briefly, then:
1. Acknowledge the customer's issue
2. Promise a human will follow up within 4 hours
3. Create a ticket for manual resolution
"""
```

Track which resolutions worked and which didn't:

```python
# After customer interaction ends
def record_outcome(trace_id: str, customer_feedback: str):
    # Did they accept the resolution?
    # Did they escalate?
    # Did they express satisfaction?
    # Store for agent improvement
    ...
```

After 6 months in production:

- 73% of tickets resolved without human intervention
- Average response time: 1 minute 47 seconds
- Customer satisfaction: 4.2/5 (up from 3.1/5)
- Cost per ticket: $0.34 (down from $4.80)
- Peak load handling: 500 concurrent conversations

The architecture isn't magic. It's just well-designed coordination between agents that each do one thing well.

If you're building a similar system, I've documented the full architecture, including the prompts, error handling, and deployment setup, in my AI Agent Engineering Playbook. It includes the complete prompt templates and code patterns.

Building multi-agent systems is hard. But with the right architecture, it doesn't have to be painful.

#AI #MachineLearning #Architecture #Programming
