Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
Most LLM wrapper libraries do three things: send a request, get a response, done. A true Agent goes further: it decides whether to call tools, executes them, feeds the results back to the LLM, and loops until it arrives at a final answer. This loop is the Agent Loop. This article analyzes the Agent Loop implementation in the Open Agent SDK (Swift) and how it uses native Swift concurrency to run the entire cycle in-process.

In one sentence: user sends a prompt → LLM returns a response → if the LLM wants to call tools, execute them → feed tool results back to the LLM → repeat until the LLM says "I'm done". Rendered as a flowchart:

```mermaid
flowchart TD
    A["User prompt"] --> B["Build messages + tools"]
    B --> C["Call LLM API"]
    C -->|end_turn / stop_sequence| D["Return result"]
    C -->|max_tokens| C2["Append 'please continue'"]
    C2 --> C
    C -->|tool_use| E["Extract tool_use blocks"]
    E --> F["Partition into read-only / mutation"]
    F --> G["Read-only tools: concurrent execution"]
    F --> H["Mutation tools: serial execution"]
    G --> I["Micro-compact large results"]
    H --> I
    I --> J["tool_result appended to messages"]
    J --> C
```

There are several key decision points in this loop:

- When to stop? Normal exit when the LLM returns end_turn or stop_sequence; forced stop at maxTurns; interruption when the budget (maxBudgetUsd) is exceeded; or user-initiated cancellation.
- How to execute tools? Read-only tools run concurrently (up to 10 at a time); mutation tools run serially, avoiding concurrent file writes.
- What if the context gets too long? Auto-compaction: an LLM call summarizes the history, freeing up space to continue.
- What if something goes wrong mid-loop? Built-in retry, fallback models, and error isolation (tool errors don't crash the loop).

The SDK provides two ways to trigger the Agent Loop:

```swift
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    maxTurns: 10
))

let result = await agent.prompt("Read Package.swift and summarize it.")
print(result.text)
print("Turns: \(result.numTurns), Cost: $\(String(format: "%.4f", result.totalCostUsd))")
```

prompt() is the "fire and wait" mode. A single call runs through all turns and returns the final QueryResult. It is best for scenarios where you don't need to see intermediate steps, such as background tasks and CLI tools.

```swift
for await message in agent.stream("Explain this codebase.") {
    switch message {
    case .partialMessage(let data):
        print(data.text, terminator: "")  // Real-time text output
    case .toolUse(let data):
        print("[Using tool: \(data.toolName)]")
    case .toolResult(let data):
        print("[Tool done, \(data.content.count) chars]")
    case .result(let data):
        print("\nDone: \(data.numTurns) turns, $\(String(format: "%.4f", data.totalCostUsd))")
    default:
        break
    }
}
```

stream() returns an AsyncStream that continuously pushes events as the LLM processes. The SDK defines 17 message types, from partialMessage (text fragments) to toolUse (tool invocations) to result (final outcome), covering every stage of the Agent Loop. Which entry point to choose depends on your UI requirements: use stream() for real-time display, prompt() when you don't need it.

Regardless of the entry point, the core logic of each turn is identical. Let's trace through the code.
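Before stepping into the individual pieces, here is a condensed, self-contained model of the control flow described above. Everything in this sketch (LoopOptions, TurnResponse, runAgentLoop, and so on) is an illustrative stand-in rather than the SDK's actual API; it only mirrors the decision points listed earlier.

```swift
// Illustrative model of the Agent Loop's control flow. All names here are
// hypothetical stand-ins; the real SDK types and helpers differ.
enum StopReason { case endTurn, stopSequence, toolUse, maxTokens }

struct LoopOptions {
    var maxTurns = 10
    var maxBudgetUsd: Double? = nil
}

struct TurnResponse {
    var stopReason: StopReason
    var toolUseBlockCount: Int
    var costUsd: Double
}

enum LoopStatus { case success, errorMaxTurns, errorMaxBudgetUsd }

func runAgentLoop(
    options: LoopOptions,
    sendTurn: (Int) async -> TurnResponse,      // stands in for the LLM call
    executeTools: (Int) async -> Void           // stands in for tool execution
) async -> LoopStatus {
    var totalCostUsd = 0.0

    for turn in 0..<options.maxTurns {
        // 1. Call the LLM for this turn (retry / fallback omitted here).
        let response = await sendTurn(turn)
        totalCostUsd += response.costUsd

        // 2. Budget check after each turn; partial output would still be returned.
        if let budget = options.maxBudgetUsd, totalCostUsd > budget {
            return .errorMaxBudgetUsd
        }

        // 3. Branch on stop_reason.
        switch response.stopReason {
        case .endTurn, .stopSequence:
            return .success                     // normal exit
        case .maxTokens:
            continue                            // "please continue" is appended (capped at 3)
        case .toolUse:
            await executeTools(response.toolUseBlockCount)  // results go back into the history
        }
    }
    return .errorMaxTurns                       // forced stop at maxTurns
}
```

The real loop additionally threads the message history, retries, hooks, and cancellation checks through each of these steps; those pieces are covered next.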
```swift
if shouldAutoCompact(messages: messages, model: model, state: compactState) {
    let (newMessages, _, newState) = await compactConversation(
        client: client,
        model: model,
        messages: messages,
        state: compactState,
        fileCache: fileCache,
        sessionMemory: sessionMemory
    )
    messages = newMessages
    compactState = newState
}
```

Before each turn, the SDK checks whether the estimated token count of the message history is approaching the context window limit. If so, it uses an LLM call to compress the history into a summary that replaces the original messages. The compaction threshold is the model's context window minus a 10,000-token buffer. After 3 consecutive compaction failures, further attempts stop to avoid wasting tokens.

```swift
response = try await withRetry({
    try await client.sendMessage(
        model: model,
        messages: messages,
        maxTokens: maxTokens,
        system: buildSystemPrompt(),
        tools: apiTools,
        ...
    )
}, retryConfig: retryConfig)
```

All LLM requests are wrapped with withRetry, which handles transient errors (network timeouts, 429 rate limits, and so on) according to the configured retry policy. If the primary model fails completely and a fallbackModel is configured, the request is retried with it:

```swift
if let fallbackModel = self.options.fallbackModel, fallbackModel != self.model {
    // Retry with fallbackModel...
}
```

The stop_reason in the LLM response determines the loop's direction:

| stop_reason | Meaning | Loop Behavior |
| --- | --- | --- |
| end_turn | LLM is done speaking | Normal loop exit |
| stop_sequence | Hit a stop sequence | Normal loop exit |
| tool_use | LLM wants to call tools | Execute tools, continue loop |
| max_tokens | Output was truncated | Append "please continue", continue loop |

The max_tokens case has a guard: at most 3 auto-continuations, preventing infinite loops.

When the LLM returns tool_use, the SDK doesn't just queue the tools sequentially. Instead, it partitions them into buckets:

```swift
// ToolExecutor.partitionTools()
for block in blocks {
    let tool = tools.first { $0.name == block.name }
    if let tool = tool, tool.isReadOnly {
        readOnly.append(item)    // Read-only bucket
    } else {
        mutations.append(item)   // Mutation bucket
    }
}
```

Read-only tools (Read, Glob, Grep, WebSearch, etc.) can safely run concurrently using TaskGroup, up to 10 at a time:

```swift
let batchResults = await withTaskGroup(of: ToolResult.self) { group in
    for item in batchSlice {
        group.addTask {
            await executeSingleTool(block: item.block, tool: item.tool, context: ...)
        }
    }
    // Collect results
}
```

Mutation tools (Write, Edit, Bash, etc.) must execute serially, one after another, to avoid concurrent write conflicts:

```swift
for item in items {
    let result = await executeSingleTool(...)
    results.append(result)
}
```

Execution order: all read-only tools first (concurrent), then all mutation tools (serial). This significantly improves performance when the LLM returns multiple tool calls in one response; if the LLM requests reading 5 files at once, all 5 reads complete in parallel. A condensed sketch of this partition-then-execute strategy appears at the end of this section.

After tool execution, results go through micro-compaction before being fed back to the LLM:

```swift
for result in toolResults {
    let processedContent = await processToolResult(result.content, isError: result.isError)
    processedResults.append(ToolResult(
        toolUseId: result.toolUseId,
        content: processedContent,
        isError: result.isError
    ))
}
```

If a tool returns content exceeding 50,000 characters (for example, after reading a large file), the SDK uses an additional LLM call to compress it. Error results are not compacted: full error information is preserved so the LLM can diagnose the failure.
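As promised above, here is a simplified, self-contained sketch of the "read-only concurrent, mutation serial" strategy. SimpleTool, ToolCall, and executeCalls are hypothetical names introduced only for this illustration; the SDK's actual ToolExecutor handles batching, tool contexts, and result types differently.

```swift
// Simplified model of "read-only tools run concurrently, mutation tools run serially".
// `SimpleTool` and `ToolCall` are illustrative types, not the SDK's real ones.
struct SimpleTool: Sendable {
    let name: String
    let isReadOnly: Bool
    let run: @Sendable (String) async -> String   // input -> output
}

struct ToolCall: Sendable {
    let tool: SimpleTool
    let input: String
}

func executeCalls(_ calls: [ToolCall], maxConcurrent: Int = 10) async -> [String] {
    // 1. Partition into read-only and mutation buckets.
    let readOnly = calls.filter { $0.tool.isReadOnly }
    let mutations = calls.filter { !$0.tool.isReadOnly }

    var results: [String] = []

    // 2. Read-only calls: run in concurrent batches of up to `maxConcurrent`.
    var index = 0
    while index < readOnly.count {
        let batch = readOnly[index..<min(index + maxConcurrent, readOnly.count)]
        let batchResults = await withTaskGroup(of: String.self) { group -> [String] in
            for call in batch {
                group.addTask { await call.tool.run(call.input) }
            }
            var collected: [String] = []
            for await result in group { collected.append(result) }
            return collected
        }
        results.append(contentsOf: batchResults)
        index += maxConcurrent
    }

    // 3. Mutation calls: strictly one at a time to avoid write conflicts.
    for call in mutations {
        results.append(await call.tool.run(call.input))
    }
    return results
}
```

Running the read-only work in bounded batches keeps memory and API pressure predictable, while the strictly serial mutation pass guarantees that two tools never write to the same file at the same time.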
After each LLM call, the SDK updates token usage and cost:

```swift
let turnCost = estimateCost(model: model, usage: turnUsage)
totalCostUsd += turnCost
costByModel[model] = CostBreakdownEntry(
    model: model,
    inputTokens: turnUsage.inputTokens,
    outputTokens: turnUsage.outputTokens,
    costUsd: turnCost
)
```

costByModel records costs grouped by model. This means that if you switch models mid-session (via switchModel()), each model's cost is tracked separately, and the final result.costBreakdown tells you exactly how much each model cost. Budget checking happens after each turn:

```swift
if let budget = options.maxBudgetUsd, totalCostUsd > budget {
    status = .errorMaxBudgetUsd
    break
}
```

When the budget is exceeded, the loop exits immediately, but any text already generated is preserved in the result: you get a partial result, not a blank one.

Swift's structured concurrency uses Task.isCancelled for cooperative cancellation. The SDK checks this flag at multiple checkpoints in the loop:

- At while-loop entry
- Between the read-only and mutation tool phases
- Inside the SSE event loop
- Before and after tool execution

```swift
// Loop entry
if Task.isCancelled || _interrupted {
    status = .cancelled
    break
}

// Between read-only/mutation
if Task.isCancelled {
    return results
}
```

stream() additionally supports cancellation via the interrupt() method: internally it cancels the Task holding the stream. After cancellation, the result is a QueryResult(isCancelled: true) with the partial text and token usage as of the cancellation moment. A caller-side example appears at the end of this section.

The SDK's error handling principle: tool execution errors don't propagate, API errors get retries, and final failures preserve partial results. During tool execution, any error is captured as ToolResult(isError: true):

```swift
static func executeSingleTool(...) async -> ToolResult {
    guard let tool = tool else {
        return ToolResult(toolUseId: block.id, content: "Error: Unknown tool", isError: true)
    }
    // ... try executing
    let result = await tool.call(input: block.input, context: context)
    return ToolResult(toolUseId: block.id, content: result.content, isError: result.isError)
}
```

Tool error results are still fed back to the LLM, which can see the error message and adjust its strategy. The Agent Loop never crashes due to a tool failure. API-level errors (network issues, 500s, etc.) trigger retries; after retries are exhausted, the fallback model kicks in; if everything fails, an errorDuringExecution status is returned.

The Agent Loop fires Hook events at critical points:

| Hook Event | Trigger Timing |
| --- | --- |
| sessionStart | Before the loop starts |
| preToolUse | Before each tool execution |
| postToolUse | After successful tool execution |
| postToolUseFailure | After failed tool execution |
| stop | When the loop ends (normal or abnormal) |
| sessionEnd | Before returning the result |

A typical use of Hooks is to intercept dangerous operations at preToolUse:

```swift
await hookRegistry.register(.preToolUse, definition: HookDefinition(
    matcher: "Bash",
    handler: { input in
        return HookOutput(message: "Bash blocked in production", block: true)
    }
))
```

Tools intercepted by Hooks are not executed; an error result is returned instead. The LLM sees "Bash blocked in production" and can find an alternative way to complete the task.
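As a usage example for the cancellation checkpoints above, here is a caller-side sketch. It assumes the `agent` instance created earlier with createAgent(...); the "Stop button" scenario and the prompt text are illustrative, not part of the SDK.

```swift
// Run the streaming loop inside a Task so the caller can cancel it from outside,
// for example when the user taps a "Stop" button in the UI.
let streamingTask = Task {
    for await message in agent.stream("Refactor the networking layer.") {
        if case .partialMessage(let data) = message {
            print(data.text, terminator: "")   // live output until cancellation
        }
    }
}

// Later, from the UI layer:
streamingTask.cancel()   // cooperative: the loop notices Task.isCancelled at its checkpoints
// The stream can also be stopped via the SDK's interrupt() method, which cancels
// the Task holding the stream, as described above.
```

Either way, the final message is a QueryResult with isCancelled set, carrying whatever text and token usage had accumulated up to that point.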
Besides prompt() and stream(), the SDK provides a third entry point, streamInput(), which accepts an AsyncStream as input:

```swift
let input = AsyncStream<String> { continuation in
    continuation.yield("What's in this project?")
    continuation.yield("Now explain the test structure.")
    continuation.finish()
}

for await message in agent.streamInput(input) {
    // Handle the response for each input
}
```

Each input element is treated as a new user message that triggers a complete prompt cycle. This is ideal for chat-style interactions: each user message is an element in the input stream, and the Agent processes them one by one with streaming output.

The Agent Loop is the heart of the entire SDK. Once you understand how it works, everything else is layered on top:

- Tool System: the "execute tools" step in the loop
- MCP Integration: connecting external tool servers when the loop starts
- Session Persistence: saving the messages array after the loop ends
- Permission Control: interception points before tool execution
- Hook System: lifecycle event callbacks in the loop

The next article dives into the Tool System: how the 34 built-in tools are organized, the design philosophy behind ToolProtocol, and how to create custom tools with defineTool.

Deep Dive into Open Agent SDK (Swift) Series:

- Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
- Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
- Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
- Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
- Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
- Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
- Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift
