
AI News Hub

Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration

DEV Community
NEE

A single Agent, no matter how powerful, is just one executor. Real development tasks are often multi-step and multi-role: someone explores the codebase, someone designs a plan, then someone writes code and runs tests. A single Agent doing all of this alone quickly bloats its context and loses efficiency. Open Agent SDK addresses this at three levels:

- Sub-Agents: the main Agent dynamically spawns sub-agents during execution and delegates specialized tasks to them.
- Task System: tracks the progress and results of multi-step work.
- Team + Messaging: multiple Agents form a team and communicate through a mailbox system.

This article analyzes each level's implementation, then examines how they combine for task orchestration.

Sub-agent spawning isn't AgentTool directly creating a new Agent; there is a protocol layer in between. SubAgentSpawner is defined in Types/AgentTypes.swift:

    public protocol SubAgentSpawner: Sendable {
        func spawn(
            prompt: String,
            model: String?,
            systemPrompt: String?,
            allowedTools: [String]?,
            maxTurns: Int?
        ) async -> SubAgentResult

        func spawn(
            prompt: String,
            model: String?,
            systemPrompt: String?,
            allowedTools: [String]?,
            maxTurns: Int?,
            disallowedTools: [String]?,
            mcpServers: [AgentMcpServerSpec]?,
            skills: [String]?,
            runInBackground: Bool?,
            isolation: String?,
            name: String?,
            teamName: String?,
            mode: PermissionMode?,
            resume: String?
        ) async -> SubAgentResult
    }

There are two methods: a basic version (5 parameters) and an enhanced version (14 parameters). The protocol also provides a default implementation in which the enhanced version simply calls the basic one, ignoring the extra options, so existing conformers stay compatible without changes.

Why is the spawner defined in Types/ instead of Core/? Because Tools/Advanced/AgentTool.swift needs it, but Tools/ shouldn't import Core/. The protocol lives in Types/, the concrete implementation lives in Core/, and the implementation is injected through ToolContext.agentSpawner. This is a dependency inversion pattern used throughout the SDK.
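To make the forwarding default and the injection concrete, here is a minimal self-contained sketch. Every name in it (Spawner, SpawnResult, Context, EchoSpawner, runAgentTool) is hypothetical and the parameter lists are heavily trimmed; it mirrors the shape described above, not the SDK's actual code.

```swift
// Hypothetical, trimmed stand-ins for SubAgentResult, SubAgentSpawner and ToolContext.
struct SpawnResult: Sendable {
    let text: String
    let isError: Bool
}

protocol Spawner: Sendable {
    // Basic entry point: every conformer must implement this one.
    func spawn(prompt: String, model: String?, maxTurns: Int?) async -> SpawnResult
    // Enhanced entry point with extra options.
    func spawn(prompt: String, model: String?, maxTurns: Int?,
               disallowedTools: [String]?, name: String?) async -> SpawnResult
}

extension Spawner {
    // Default: forward to the basic version and drop the extra options,
    // so a conformer that only implements the basic method still compiles.
    func spawn(prompt: String, model: String?, maxTurns: Int?,
               disallowedTools: [String]?, name: String?) async -> SpawnResult {
        await spawn(prompt: prompt, model: model, maxTurns: maxTurns)
    }
}

// The "tools" layer only sees the protocol, through an injected context.
struct Context: Sendable {
    let spawner: (any Spawner)?
}

// A concrete implementation that would live in another layer ("Core").
struct EchoSpawner: Spawner {
    func spawn(prompt: String, model: String?, maxTurns: Int?) async -> SpawnResult {
        SpawnResult(text: "[\(model ?? "parent-model")] \(prompt)", isError: false)
    }
}

// Tool-side code: calls the enhanced method without knowing the concrete type.
func runAgentTool(prompt: String, context: Context) async -> SpawnResult {
    guard let spawner = context.spawner else {
        return SpawnResult(text: "Error: no spawner injected", isError: true)
    }
    return await spawner.spawn(prompt: prompt, model: nil, maxTurns: 5,
                               disallowedTools: ["Bash"], name: "explorer")
}

let demo = await runAgentTool(prompt: "list Swift files", context: Context(spawner: EchoSpawner()))
print(demo.text)  // [parent-model] list Swift files
```

EchoSpawner never implements the enhanced method, yet the call through `any Spawner` still works because the extension supplies the witness.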
DefaultSubAgentSpawner in Core/DefaultSubAgentSpawner.swift does the following:

    final class DefaultSubAgentSpawner: SubAgentSpawner, @unchecked Sendable {
        private let apiKey: String
        private let baseURL: String?
        private let parentModel: String
        private let parentTools: [ToolProtocol]
        private let provider: LLMProvider
        private let client: (any LLMClient)?

        func spawn(...) async -> SubAgentResult {
            // 1. Filter out AgentTool to prevent infinite recursion
            var subTools = parentTools.filter { $0.name != "Agent" }

            // 2. If allowedTools is specified, filter further
            if let allowed = allowedTools, !allowed.isEmpty {
                let allowedSet = Set(allowed)
                subTools = subTools.filter { allowedSet.contains($0.name) }
            }

            // 3. disallowedTools is applied last, so it overrides allowedTools
            if let disallowed = disallowedTools, !disallowed.isEmpty {
                let disallowedSet = Set(disallowed)
                subTools = subTools.filter { !disallowedSet.contains($0.name) }
            }

            // 4. Create the sub-agent and run it to completion
            let options = AgentOptions(
                apiKey: apiKey,
                model: model ?? parentModel,
                systemPrompt: systemPrompt,
                maxTurns: maxTurns ?? 10,
                tools: subTools
            )
            let agent = Agent(options: options)
            let result = await agent.prompt(prompt)

            return SubAgentResult(
                text: result.text.isEmpty ? "(Subagent completed with no text output)" : result.text,
                toolCalls: [],
                isError: result.status != .success
            )
        }
    }

Key points:

- Recursion prevention: sub-agents never receive AgentTool, which rules out Agent-in-Agent-in-Agent scenarios.
- Tool inheritance: by default, sub-agents inherit all parent tools (except AgentTool); allowedTools and disallowedTools can restrict that set.
- Blocking execution: the parent Agent awaits spawn() and does not continue until the sub-agent finishes.

AgentTool is the tool exposed to the LLM. When the LLM calls the Agent tool, it passes a prompt and parameters, and AgentTool calls the spawner to create the sub-agent.
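The three filtering steps in DefaultSubAgentSpawner can be checked in isolation. This sketch reproduces them on plain tool names (the real spawner filters [ToolProtocol] values); filterSubAgentTools is a hypothetical helper, not an SDK function.

```swift
// Reproduces the spawner's filtering order on tool names only.
func filterSubAgentTools(parentTools: [String],
                         allowedTools: [String]?,
                         disallowedTools: [String]?) -> [String] {
    // 1. Never hand the Agent tool to a sub-agent (no recursive spawning).
    var tools = parentTools.filter { $0 != "Agent" }
    // 2. A non-empty allow-list narrows the set; nil or [] means "keep all".
    if let allowed = allowedTools, !allowed.isEmpty {
        let allowedSet = Set(allowed)
        tools = tools.filter { allowedSet.contains($0) }
    }
    // 3. The deny-list runs last, so it wins over the allow-list.
    if let disallowed = disallowedTools, !disallowed.isEmpty {
        let disallowedSet = Set(disallowed)
        tools = tools.filter { !disallowedSet.contains($0) }
    }
    return tools
}

let parent = ["Read", "Glob", "Grep", "Bash", "Agent"]
print(filterSubAgentTools(parentTools: parent, allowedTools: nil, disallowedTools: nil))
// prints ["Read", "Glob", "Grep", "Bash"]
print(filterSubAgentTools(parentTools: parent,
                          allowedTools: ["Read", "Bash", "Agent"],
                          disallowedTools: ["Bash"]))
// prints ["Read"]
```

Even explicitly allowing "Agent" does not bring it back, because step 1 removes it before the allow-list is consulted.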
It has two built-in sub-agent types:

    private let BUILTIN_AGENTS: [String: AgentDefinition] = [
        "Explore": AgentDefinition(
            name: "Explore",
            description: "Fast agent specialized for exploring codebases...",
            systemPrompt: "You are a codebase exploration agent...",
            tools: ["Read", "Glob", "Grep", "Bash"],
            maxTurns: 10
        ),
        "Plan": AgentDefinition(
            name: "Plan",
            description: "Software architect agent for designing implementation plans...",
            systemPrompt: "You are a software architect. Design implementation plans...",
            tools: ["Read", "Glob", "Grep", "Bash"],
            maxTurns: 10
        ),
    ]

- Explore: codebase exploration, using Glob to find files, Grep to search content, and Read to read files.
- Plan: a software architect that studies the codebase, then produces implementation plans.

When the LLM calls AgentTool, it selects the type via the subagent_type field:

    {
      "prompt": "Explore the project structure and find all Swift source files",
      "description": "Explore codebase",
      "subagent_type": "Explore"
    }

AgentTool also supports optional parameters: model, maxTurns, run_in_background, isolation, team_name, and mode. These are passed through to the spawner.

The SDK includes a SubagentExample that demonstrates the full flow of a coordinator main Agent delegating to an Explore sub-agent:

    // Main agent system prompt
    let systemPrompt = """
    You are a coordinator agent. When given a task, you should delegate it to a sub-agent \
    using the Agent tool. The Agent tool will spawn a specialized agent (e.g., "Explore" type) \
    that can use Read, Glob, Grep, and Bash tools to investigate the codebase. \
    After the sub-agent returns its findings, summarize the results for the user.
    """

    // Register tools: core tools + AgentTool
    let agent = createAgent(options: AgentOptions(
        apiKey: apiKey,
        model: defaultModel,
        systemPrompt: systemPrompt,
        maxTurns: 10,
        tools: getAllBaseTools(tier: .core) + [createAgentTool()]
    ))

    // Send the task; the main Agent will call AgentTool to delegate to an Explore sub-agent
    for await message in agent.stream("""
        Explore the current project directory. Find all Swift source files, \
        examine the project structure, and provide a summary. \
        Use the Agent tool to delegate this task to an Explore sub-agent.
        """) {
        switch message {
        case .toolUse(let data):
            if data.toolName == "Agent" {
                print("[Sub-agent Delegation: \(data.toolName)]")
            }
        case .toolResult(let data):
            print("[Result: \(data.content.prefix(200))]")
        case .result(let data):
            print("Turns: \(data.numTurns), Cost: $\(data.totalCostUsd)")
        default:
            break
        }
    }

Execution flow: the user sends a prompt → the main Agent decides it needs to explore the codebase → it calls AgentTool → AgentTool spawns an Explore sub-agent via the spawner → the sub-agent uses Glob/Grep/Read → the results return to the main Agent → the main Agent summarizes them and responds to the user.

Sub-agents solve the "who does the work" problem. The Task system answers "how much work is done, who is doing it, and what were the results."

TaskStore is a Swift actor, which makes it concurrency-safe:

    public actor TaskStore {
        private var tasks: [String: Task] = [:]
        private var taskCounter: Int = 0

        public func create(
            subject: String,
            description: String? = nil,
            owner: String? = nil,
            status: TaskStatus = .pending
        ) -> Task {
            taskCounter += 1
            let id = "task_\(taskCounter)"
            let now = dateFormatter.string(from: Date())
            let task = Task(
                id: id, subject: subject, description: description, status: status,
                owner: owner, createdAt: now, updatedAt: now
            )
            tasks[id] = task
            return task
        }
    }

Using an actor instead of a regular class means access to its state is serialized: outside callers must await its methods, and a synchronous method such as create() runs to completion before another caller gets in, with no manual locking. Multiple Agents creating tasks at the same time therefore cannot corrupt the counter or the dictionary, and every task gets a unique ID.
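A quick way to see the actor guarantee in action: the sketch below uses a stripped-down, hypothetical MiniTaskStore (not the SDK type) and has 100 child tasks create entries concurrently through a task group, standing in for many Agents calling TaskCreate at once.

```swift
// Hypothetical, stripped-down stand-in for TaskStore (not the SDK type).
actor MiniTaskStore {
    private var subjects: [String: String] = [:]  // id -> subject
    private var counter = 0

    // A synchronous actor method has no suspension points, so the counter
    // increment and the dictionary insert run without interleaving.
    func create(subject: String) -> String {
        counter += 1
        let id = "task_\(counter)"
        subjects[id] = subject
        return id
    }

    var count: Int { subjects.count }
}

// Simulates `agents` concurrent callers, each creating one task.
func createConcurrently(agents: Int) async -> (ids: [String], stored: Int) {
    let store = MiniTaskStore()
    let ids = await withTaskGroup(of: String.self, returning: [String].self) { group in
        for i in 1...agents {
            group.addTask { await store.create(subject: "job \(i)") }
        }
        var collected: [String] = []
        for await id in group { collected.append(id) }
        return collected
    }
    let stored = await store.count
    return (ids, stored)
}

let result = await createConcurrently(agents: 100)
print(result.ids.count, Set(result.ids).count, result.stored)  // 100 100 100
```

The order in which IDs come back varies from run to run, but no ID is ever issued twice and no insert is lost.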
Tasks have 5 states, with clear transition rules:

    public enum TaskStatus: String, Sendable, Equatable, Codable {
        case pending     // Waiting to start
        case inProgress  // In progress
        case completed   // Completed
        case failed      // Failed
        case cancelled   // Cancelled
    }

State transitions are constrained: pending and inProgress can transition to any state, but completed, failed, and cancelled are terminal states that cannot change:

    private func isValidTransition(from: TaskStatus, to: TaskStatus) -> Bool {
        switch from {
        case .pending, .inProgress:
            return true
        case .completed, .failed, .cancelled:
            return false  // Terminal state, cannot transition
        }
    }

As a state diagram:

    pending ──→ inProgress ──→ completed
       │                │
       │                ├──→ failed
       │                │
       └──→ cancelled ←─┘

The diagram shows the typical paths. isValidTransition itself is more permissive: it accepts any move out of pending or inProgress, including pending → completed and inProgress → pending.

TaskStatus also has a convenient parse() method that accepts both camelCase (inProgress) and snake_case (in_progress), since LLM-generated JSON isn't always consistent:

    public static func parse(_ string: String) -> TaskStatus? {
        if let direct = TaskStatus(rawValue: string) { return direct }
        // snake_case → camelCase, e.g. "in_progress" → "inProgress"
        let camel = string
            .split(separator: "_")
            .enumerated()
            .map { $0.offset == 0 ? String($0.element) : String($0.element).capitalized }
            .joined()
        return TaskStatus(rawValue: camel)
    }

Beyond basic status tracking, a Task instance carries dependency relationships and metadata:

    public struct Task: Sendable, Equatable, Codable {
        public let id: String
        public var subject: String
        public var description: String?
        public var status: TaskStatus
        public var owner: String?        // Who's working on it
        public let createdAt: String
        public var updatedAt: String
        public var output: String?       // Result
        public var blockedBy: [String]?  // Tasks blocking this one
        public var blocks: [String]?     // Tasks this one blocks
        public var metadata: [String: String]?
    }

The blockedBy and blocks fields give the Task system built-in fields for declaring dependencies: Task A can record "I need Tasks B and C to complete before I can start." These fields only record the relationship; as the design discussion at the end explains, TaskStore does not enforce it.
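The parsing and transition rules above can be exercised outside the SDK. This is a standalone restatement of the excerpts, assuming only Foundation (for `capitalized`); the `isTerminal` and `canTransition(to:)` helpers are hypothetical names, not SDK API.

```swift
import Foundation

// Standalone restatement of the TaskStatus excerpts, for experimentation.
enum TaskStatus: String, Equatable {
    case pending, inProgress, completed, failed, cancelled

    static func parse(_ string: String) -> TaskStatus? {
        if let direct = TaskStatus(rawValue: string) { return direct }
        // "in_progress" -> ["in", "progress"] -> "in" + "Progress"
        let camel = string
            .split(separator: "_")
            .enumerated()
            .map { $0.offset == 0 ? String($0.element) : String($0.element).capitalized }
            .joined()
        return TaskStatus(rawValue: camel)
    }

    var isTerminal: Bool {
        switch self {
        case .completed, .failed, .cancelled: return true
        case .pending, .inProgress: return false
        }
    }

    // Same rule as isValidTransition: only the source state matters,
    // so the target is ignored and any non-terminal state may move.
    func canTransition(to next: TaskStatus) -> Bool {
        !isTerminal
    }
}

for raw in ["pending", "in_progress", "inProgress", "done"] {
    print(raw, "->", TaskStatus.parse(raw).map { $0.rawValue } ?? "nil")
}
print(TaskStatus.completed.canTransition(to: .inProgress))  // false
```

Unrecognized strings such as "done" parse to nil, which is why the TaskCreate tool falls back to .pending when parsing fails.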
The SDK provides three tools for the LLM to operate the Task system.

TaskCreate creates a task:

    public func createTaskCreateTool() -> ToolProtocol {
        return defineTool(
            name: "TaskCreate",
            description: "Create a new task for tracking work progress.",
            inputSchema: taskCreateSchema,
            isReadOnly: false
        ) { (input: TaskCreateInput, context: ToolContext) in
            guard let taskStore = context.taskStore else {
                return ToolExecuteResult(content: "Error: TaskStore not available.", isError: true)
            }
            // Missing or unparseable status strings fall back to .pending
            let initialStatus: TaskStatus = input.status.flatMap { TaskStatus.parse($0) } ?? .pending
            let task = await taskStore.create(
                subject: input.subject,
                description: input.description,
                owner: input.owner,
                status: initialStatus
            )
            return ToolExecuteResult(
                content: "Task created: \(task.id) - \"\(task.subject)\" (\(task.status.rawValue))",
                isError: false
            )
        }
    }

TaskList lists tasks and supports filtering by status and owner:

    // The LLM can ask for "all pending tasks" or "tasks assigned to agent-1"
    let tasks = await taskStore.list(status: status, owner: input.owner)

TaskUpdate updates a task's status, description, owner, or output:

    do {
        let task = try await taskStore.update(
            id: input.id,
            status: status,
            description: input.description,
            owner: input.owner,
            output: input.output
        )
        return ToolExecuteResult(
            content: "Task updated: \(task.id) - \(task.status.rawValue) - \"\(task.subject)\"",
            isError: false
        )
    } catch let error as TaskStoreError {
        return ToolExecuteResult(content: "Error: \(error.localizedDescription)", isError: true)
    }

Note that TaskStore.update() throws an invalidStatusTransition error for illegal transitions, such as moving a completed task back to inProgress. TaskUpdate turns that error into an error result, so the LLM sees the message and can adjust its strategy.

The Task system tracks "what to do." The Team system answers "who works with whom."
Like TaskStore, TeamStore is an actor:

    public actor TeamStore {
        private var teams: [String: Team] = [:]
        private var teamCounter: Int = 0

        public func create(
            name: String,
            members: [TeamMember] = [],
            leaderId: String = "self"
        ) -> Team {
            teamCounter += 1
            let id = "team_\(teamCounter)"
            let team = Team(
                id: id,
                name: name,
                members: members,
                leaderId: leaderId,
                createdAt: dateFormatter.string(from: Date()),
                status: .active
            )
            teams[id] = team
            return team
        }
    }

Teams have two states: active and disbanded. Deleting a Team doesn't actually remove it; its status changes to disbanded, and a disbanded Team can no longer have members added or removed.

Each member has a role:

    public enum TeamRole: String, Sendable, Equatable, Codable {
        case leader  // Team leader
        case member  // Regular member
    }

    public struct TeamMember: Sendable, Equatable, Codable {
        public let name: String
        public let role: TeamRole
    }

When TeamCreateTool creates a Team, every member gets the member role by default, and leaderId defaults to "self" (the creator):

    let members: [TeamMember] = input.members?.map { TeamMember(name: $0) } ?? []
    let team = await teamStore.create(
        name: input.name,
        members: members,
        leaderId: "self"
    )

TeamStore also supports dynamic member management. Because TeamStore is an actor, calls from outside it need await:

    // Add a member
    try await teamStore.addMember(teamId: "team_1", member: TeamMember(name: "agent-coder"))
    // Remove a member
    try await teamStore.removeMember(teamId: "team_1", agentName: "agent-coder")
    // Find which team an Agent belongs to
    let team = await teamStore.getTeamForAgent(agentName: "agent-coder")

getTeamForAgent matters for messaging: when a message is sent, the SDK needs to know which Team the sender belongs to so it can verify that the recipient is a teammate.
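The disband-instead-of-delete rule and the teammate check that getTeamForAgent enables fit in a few lines. This is a hypothetical sketch: MiniTeamStore, validateSend, and the error strings are invented for illustration, and it assumes a team lookup considers only active teams and matches on member names, which the excerpts above don't confirm.

```swift
// Hypothetical, simplified stand-ins; not the SDK's TeamStore or SendMessageTool.
enum TeamStatus: Equatable { case active, disbanded }

struct MiniTeam {
    let id: String
    var memberNames: [String]
    var status: TeamStatus
}

enum TeamError: Error, Equatable { case notFound, disbanded }

actor MiniTeamStore {
    private var teams: [String: MiniTeam] = [:]
    private var counter = 0

    func create(members: [String]) -> String {
        counter += 1
        let id = "team_\(counter)"
        teams[id] = MiniTeam(id: id, memberNames: members, status: .active)
        return id
    }

    func addMember(teamId: String, name: String) throws {
        guard var team = teams[teamId] else { throw TeamError.notFound }
        guard team.status == .active else { throw TeamError.disbanded }
        team.memberNames.append(name)
        teams[teamId] = team
    }

    // "Deleting" only flips the status; the record stays.
    func delete(teamId: String) throws {
        guard var team = teams[teamId] else { throw TeamError.notFound }
        team.status = .disbanded
        teams[teamId] = team
    }

    // Assumption: only active teams are searched.
    func getTeamForAgent(_ name: String) -> MiniTeam? {
        teams.values.first(where: { $0.status == .active && $0.memberNames.contains(name) })
    }
}

// Returns nil if the message may be sent, otherwise an error string for the LLM.
func validateSend(from sender: String, to recipient: String,
                  store: MiniTeamStore) async -> String? {
    guard let team = await store.getTeamForAgent(sender) else {
        return "Error: \(sender) is not in a team."
    }
    guard team.memberNames.contains(recipient) else {
        return "Error: \(recipient) is not a teammate. Members: \(team.memberNames.joined(separator: ", "))"
    }
    return nil
}

let teams = MiniTeamStore()
let refactorTeam = await teams.create(members: ["explorer", "planner"])
let check = await validateSend(from: "explorer", to: "coder", store: teams)
print(check ?? "ok")  // Error: coder is not a teammate. Members: explorer, planner
try await teams.addMember(teamId: refactorTeam, name: "coder")
```

Returning the member list in the error string is one way to let the LLM correct its target on the next turn.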
Besides TeamStore, an AgentRegistry tracks all active Agents:

    public actor AgentRegistry {
        private var agents: [String: AgentRegistryEntry] = [:]
        private var nameIndex: [String: String] = [:]  // name -> agentId

        public func register(agentId: String, name: String, agentType: String) throws -> AgentRegistryEntry {
            if nameIndex[name] != nil {
                throw AgentRegistryError.duplicateAgentName(name: name)
            }
            let entry = AgentRegistryEntry(...)
            agents[agentId] = entry
            nameIndex[name] = agentId
            return entry
        }

        public func getByName(name: String) -> AgentRegistryEntry? {
            guard let agentId = nameIndex[name] else { return nil }
            return agents[agentId]
        }
    }

Names are unique: no two Agents in one AgentRegistry can share a name. nameIndex is a reverse index that turns a name lookup into a single dictionary access (average O(1)).

With Teams in place, Agents need to communicate. The SDK uses a Mailbox pattern: messages aren't pushed directly to the recipient but placed in its mailbox, where the recipient picks them up.

    public actor MailboxStore {
        private var mailboxes: [String: [AgentMessage]] = [:]

        // Point-to-point send
        public func send(from: String, to: String, content: String, type: AgentMessageType = .text) {
            let message = AgentMessage(from: from, to: to, content: content,
                                       timestamp: dateFormatter.string(from: Date()), type: type)
            if mailboxes[to] == nil { mailboxes[to] = [] }
            mailboxes[to]?.append(message)
        }

        // Broadcast to every Agent that already has a mailbox
        public func broadcast(from: String, content: String, type: AgentMessageType = .text) {
            let timestamp = dateFormatter.string(from: Date())
            for (agentName, _) in mailboxes {
                let message = AgentMessage(from: from, to: agentName, content: content,
                                           timestamp: timestamp, type: type)
                mailboxes[agentName]?.append(message)
            }
        }

        // Read and clear the mailbox
        public func read(agentName: String) -> [AgentMessage] {
            guard let messages = mailboxes[agentName] else { return [] }
            mailboxes[agentName] = []  // Clear after reading
            return messages
        }
    }

There are three core operations: send (point-to-point), broadcast (to every existing mailbox), and read (fetch and clear). read is destructive: reading empties the mailbox. broadcast only reaches Agents that already have a mailbox and never creates new ones; in this excerpt, a mailbox is created only when an Agent receives its first point-to-point message.

Besides plain text (.text), message types include .shutdownRequest, .shutdownResponse, and .planApprovalResponse, special types used for team management coordination.

SendMessageTool performs five layers of validation:

    // 1. A MailboxStore must be available
    guard let mailboxStore = context.mailboxStore else { ... }
    // 2. A TeamStore must be available
    guard let teamStore = context.teamStore else { ... }
    // 3. The sender must be known
    guard let senderName = context.senderName else { ... }
    // 4. The sender must belong to a Team
    guard let team = await teamStore.getTeamForAgent(agentName: senderName) else { ... }
    // 5. The recipient must be a teammate
    let isMember = team.members.contains { $0.name == input.to }
    guard isMember else { ... }

A broadcast uses "*" as the recipient:

    { "to": "*", "message": "Phase 1 complete, starting Phase 2." }

A point-to-point message uses a specific name:

    { "to": "agent-coder", "message": "Here's the spec for module A." }

Failed validations return error messages; the LLM can see which members are available and adjust the target.

With the individual Agent, Task, Team, and Mailbox capabilities covered, how do they combine in practice?

The simplest pattern: the main Agent receives a complex task, breaks it up, and delegates each portion to a sub-agent:

    let agent = createAgent(options: AgentOptions(
        apiKey: apiKey,
        model: "claude-sonnet-4-6",
        systemPrompt: """
            You are a coordinator. Break complex tasks into subtasks, \
            delegate each to an Explore sub-agent, then synthesize results.
            """,
        maxTurns: 20,
        tools: getAllBaseTools(tier: .core) + [
            createAgentTool(),
            createTaskCreateTool(),
            createTaskUpdateTool(),
            createTaskListTool()
        ],
        taskStore: TaskStore()
    ))

The LLM might orchestrate like this:

1. TaskCreate("Analyze module A"): create a task
2. Agent(prompt: "Analyze module A", subagent_type: "Explore"): delegate to a sub-agent
3. TaskUpdate(id: "task_1", status: "completed", output: result): mark it complete
4. Repeat steps 1-3 for the other modules
5. Synthesize all results

When multiple Agents need to collaborate over a longer period, use Team + Mailbox:

    let mailboxStore = MailboxStore()
    let teamStore = TeamStore()

    let agent = createAgent(options: AgentOptions(
        apiKey: apiKey,
        model: "claude-sonnet-4-6",
        agentName: "coordinator",
        mailboxStore: mailboxStore,
        teamStore: teamStore,
        tools: getAllBaseTools(tier: .core) + [
            createAgentTool(),
            createTeamCreateTool(),
            createTeamDeleteTool(),
            createSendMessageTool(),
            createTaskCreateTool(),
            createTaskListTool(),
            createTaskUpdateTool()
        ],
        taskStore: TaskStore()  // As in the previous example, so the Task tools have a store
    ))

The LLM's orchestration might look like:

1. TeamCreate(name: "refactor-team", members: ["explorer", "planner", "coder"]): form the team
2. TaskCreate("Explore codebase", owner: "explorer"): create a task
3. Agent(prompt: "...", name: "explorer", subagent_type: "Explore"): launch the explore agent
4. SendMessage(to: "planner", message: "Exploration done, here's the summary..."): notify the planner
5. TaskCreate("Write implementation plan", owner: "planner"): create the next task
6. Continue from there...

A third pattern uses the Task system as a work queue. The main Agent creates a batch of tasks, and sub-agents claim and execute them one by one:

    Main Agent:
      TaskCreate("Fix bug #1")    → task_1 (pending)
      TaskCreate("Fix bug #2")    → task_2 (pending)
      TaskCreate("Add feature X") → task_3 (pending)

    Sub-Agent A:
      TaskList(status: "pending") → [task_1, task_2, task_3]
      TaskUpdate(task_1, status: "in_progress", owner: "agent-a")
      ... do work ...
      TaskUpdate(task_1, status: "completed", output: "Fixed by ...")

    Sub-Agent B:
      TaskList(status: "pending") → [task_2, task_3]
      TaskUpdate(task_2, status: "in_progress", owner: "agent-b")
      ... do work ...

Because TaskStore is an actor, concurrent updates to the same task are serialized and cannot corrupt its state. That is not the same as exclusive claiming, though: there is no automatic assignment, and the LLM coordinates who claims which task. Judging from isValidTransition, inProgress → inProgress is an allowed transition, so if two agents both saw task_2 as pending, the second TaskUpdate would likely succeed and overwrite the owner rather than be rejected.

This multi-agent collaboration mechanism involves several deliberate design choices.

Why can't sub-agents spawn their own sub-agents? DefaultSubAgentSpawner filters out AgentTool when creating sub-agents. This is an intentional limit: without it, an Agent spawning an Agent spawning an Agent leads to uncontrolled recursion depth and exponential token consumption.

Why is messaging pull-based instead of push-based? MailboxStore.read() is a destructive read that Agents must call explicitly to receive messages. This is much simpler than a push model: there are no callbacks to maintain and no special handling for offline Agents. The trade-off is less real-time responsiveness, but since an Agent can call tools on every turn of the Agent Loop, the extra latency of polling is acceptable.

Why doesn't the Task state machine auto-transition? The blockedBy field declares dependency relationships, but TaskStore.update() doesn't check whether prerequisite tasks are complete. So "wait for Task A before doing Task B" logic must be implemented by the LLM: it calls TaskList to check status, then decides the next step. This is a pragmatic trade-off: automatic dependency resolution could be added, but leaving the check explicit keeps the LLM in control.
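For comparison, here is what a store-side answer to the last two points could look like. This is a hypothetical design sketch, not what TaskStore does: WorkQueue, WorkItem, ClaimError, and claim(id:owner:) are invented names. A single synchronous actor method both checks and writes, so only one agent can move a task out of pending, and it refuses the claim until every blockedBy prerequisite is completed.

```swift
// Hypothetical work-queue store, NOT the SDK's TaskStore API.
enum WorkStatus: Equatable { case pending, inProgress, completed, failed, cancelled }

struct WorkItem {
    let id: String
    var status: WorkStatus = .pending
    var owner: String? = nil
    var blockedBy: [String] = []
}

enum ClaimError: Error, Equatable {
    case notFound
    case notPending(WorkStatus)
    case blocked(by: [String])
}

actor WorkQueue {
    private var items: [String: WorkItem] = [:]

    func add(_ item: WorkItem) { items[item.id] = item }

    // Check and write happen inside one synchronous actor method, so no other
    // caller can run in between: at most one agent wins a pending task.
    func claim(id: String, owner: String) throws -> WorkItem {
        guard var item = items[id] else { throw ClaimError.notFound }
        guard item.status == .pending else { throw ClaimError.notPending(item.status) }
        // Enforce blockedBy: every prerequisite must already be completed
        // (unknown IDs count as unfinished).
        let unfinished = item.blockedBy.filter { items[$0]?.status != .completed }
        guard unfinished.isEmpty else { throw ClaimError.blocked(by: unfinished) }
        item.status = .inProgress
        item.owner = owner
        items[id] = item
        return item
    }

    // Marks a task completed (transition checks omitted for brevity).
    func complete(id: String) throws {
        guard var item = items[id] else { throw ClaimError.notFound }
        item.status = .completed
        items[id] = item
    }
}

let queue = WorkQueue()
await queue.add(WorkItem(id: "task_1"))
await queue.add(WorkItem(id: "task_2", blockedBy: ["task_1"]))
do {
    _ = try await queue.claim(id: "task_2", owner: "agent-b")
} catch {
    print("claim refused:", error)
}
```

The cost of this design is that the store now encodes scheduling policy; the SDK's choice keeps that policy in the LLM's hands.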
Open Agent SDK's multi-agent collaboration consists of three layers:

- Sub-Agents: through the SubAgentSpawner protocol and AgentTool, the main Agent spawns sub-agents at runtime to delegate tasks, with built-in Explore and Plan types.
- Task System: task tracking built on the TaskStore actor, with a clear state machine (pending → inProgress → completed/failed/cancelled) in which terminal states are irreversible.
- Team + Mailbox: TeamStore manages teams and members, and MailboxStore implements mailbox-style messaging with point-to-point and broadcast delivery.

The three layers can be used independently or combined: Tasks track progress, Teams organize members, the Mailbox handles coordination, and sub-agents execute the actual work.

The next article covers the SDK's session persistence: how Agent conversation history is stored and restored, and how to continue previous work after a restart.

Deep Dive into Open Agent SDK (Swift) Series:

- Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
- Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
- Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
- Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
- Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
- Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
- Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift