

Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls


An Agent shouldn't be locked to a single LLM provider. Different tasks suit different models: simple questions can use cheap models, complex reasoning calls for expensive ones, and some scenarios even require local models. Runtime needs change too: users may want deeper thinking mid-session, discover the budget is running low and need to downgrade, or switch to a local model to save money. Open Agent SDK's approach is to define a unified LLMClient protocol, with one implementation each for Anthropic and OpenAI-compatible providers. Internally, the Agent uses Anthropic format throughout. Switching providers requires changing only one configuration parameter, and models can be switched dynamically at runtime, with adjustable thinking depth and budget control. This article analyzes the SDK's multi-provider adaptation mechanism and runtime control capabilities.

First, the protocol definition:

```swift
public protocol LLMClient: Sendable {
    nonisolated func sendMessage(
        model: String,
        messages: [[String: Any]],
        maxTokens: Int,
        system: String?,
        tools: [[String: Any]]?,
        toolChoice: [String: Any]?,
        thinking: [String: Any]?,
        temperature: Double?
    ) async throws -> [String: Any]

    nonisolated func streamMessage(
        model: String,
        messages: [[String: Any]],
        maxTokens: Int,
        system: String?,
        tools: [[String: Any]]?,
        toolChoice: [String: Any]?,
        thinking: [String: Any]?,
        temperature: Double?
    ) async throws -> AsyncThrowingStream<SSEEvent, Error>
}
```

Two core methods: one blocking, one streaming. The parameter list covers all capabilities of mainstream LLM APIs: model selection, message history, token limit, system prompt, tool definitions, tool choice strategy, thinking configuration, and temperature.

The key design decision: return values are always Anthropic-format dictionaries. Whether the underlying API is Anthropic native or OpenAI-compatible, the Agent internally receives the same structure: content arrays with `{"type": "text", "text": "..."}` or `{"type": "tool_use", "name": "...", "input": {...}}`, and a `stop_reason` of `end_turn`/`tool_use`/`max_tokens`. This means the Agent Loop's processing logic doesn't need to care which API is underneath.

Streaming returns use `AsyncThrowingStream<SSEEvent, Error>`, where SSEEvent is an enum:

```swift
public enum SSEEvent: @unchecked Sendable {
    case messageStart(message: [String: Any])
    case contentBlockStart(index: Int, contentBlock: [String: Any])
    case contentBlockDelta(index: Int, delta: [String: Any])
    case contentBlockStop(index: Int)
    case messageDelta(delta: [String: Any], usage: [String: Any])
    case messageStop
    case ping
    case error(data: [String: Any])
}
```

Eight event types cover all streaming response events from the Anthropic Messages API. The OpenAI-compatible layer's streaming output is converted to the same SSEEvent sequence.
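To make the contract concrete, here is a minimal conforming client: a hypothetical EchoClient that returns canned Anthropic-format responses without touching the network, the kind of test double this protocol makes easy. Only the LLMClient protocol and SSEEvent shapes come from the SDK; everything else is illustrative:

```swift
import Foundation

// Hypothetical test double: conforms to LLMClient but never calls an API.
public actor EchoClient: LLMClient {
    public nonisolated func sendMessage(
        model: String, messages: [[String: Any]], maxTokens: Int,
        system: String?, tools: [[String: Any]]?, toolChoice: [String: Any]?,
        thinking: [String: Any]?, temperature: Double?
    ) async throws -> [String: Any] {
        // Always answer in the Anthropic shape the Agent Loop expects.
        return [
            "content": [["type": "text", "text": "echo: \(messages.count) messages"]],
            "stop_reason": "end_turn",
            "usage": ["input_tokens": 0, "output_tokens": 0],
        ]
    }

    public nonisolated func streamMessage(
        model: String, messages: [[String: Any]], maxTokens: Int,
        system: String?, tools: [[String: Any]]?, toolChoice: [String: Any]?,
        thinking: [String: Any]?, temperature: Double?
    ) async throws -> AsyncThrowingStream<SSEEvent, Error> {
        // Emit a minimal but well-formed event sequence.
        return AsyncThrowingStream { continuation in
            continuation.yield(.messageStart(message: [:]))
            continuation.yield(.contentBlockStart(index: 0, contentBlock: ["type": "text"]))
            continuation.yield(.contentBlockDelta(index: 0, delta: ["type": "text_delta", "text": "echo"]))
            continuation.yield(.contentBlockStop(index: 0))
            continuation.yield(.messageStop)
            continuation.finish()
        }
    }
}
```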
AnthropicClient is the Anthropic-native implementation of LLMClient, using an actor for concurrency safety:

```swift
public actor AnthropicClient: LLMClient {
    private let apiKey: String
    private let baseURL: URL  // Default https://api.anthropic.com
    private let urlSession: URLSession

    public init(apiKey: String, baseURL: String? = nil, urlSession: URLSession? = nil) {
        self.apiKey = apiKey
        self.baseURL = URL(string: baseURL ?? "https://api.anthropic.com")!
        self.urlSession = urlSession ?? URLSession.shared
    }
}
```

Requests are POSTed to /v1/messages with x-api-key and anthropic-version headers:

```swift
private nonisolated func buildRequest(body: [String: Any]) throws -> URLRequest {
    var request = URLRequest(url: URL(string: baseURL.absoluteString + "/v1/messages")!)
    request.httpMethod = "POST"
    request.timeoutInterval = 300
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.setValue("2023-06-01", forHTTPHeaderField: "anthropic-version")
    request.setValue("application/json", forHTTPHeaderField: "content-type")
    request.httpBody = try JSONSerialization.data(withJSONObject: body, options: [])
    return request
}
```

Since it uses the Anthropic native API, sendMessage request and response bodies need no format conversion: request parameters are assembled directly as dictionaries, and responses are parsed directly. Streaming mode parses Anthropic SSE text as-is.

A security detail: every error message replaces the API key with *** to prevent key leakage into logs:

```swift
let safeMessage = errorMessage.replacingOccurrences(of: apiKey, with: "***")
```

AnthropicClient directly supports Extended Thinking. When the Agent configures ThinkingConfig, the thinking parameter is passed through:

```swift
if let thinking { body["thinking"] = thinking }
```

OpenAIClient is the heavy lifter. It accepts Anthropic-format parameters, converts them to OpenAI Chat Completions API format, sends the request, then converts the OpenAI response back to Anthropic format. The Agent is completely unaware of the underlying OpenAI-compatible API.

```swift
public actor OpenAIClient: LLMClient {
    private let apiKey: String
    private let baseURL: URL  // Default https://api.openai.com/v1
    private let urlSession: URLSession

    public init(apiKey: String, baseURL: String? = nil, urlSession: URLSession? = nil) {
        self.apiKey = apiKey
        self.baseURL = URL(string: baseURL ?? "https://api.openai.com/v1")!
        self.urlSession = urlSession ?? URLSession.shared
    }
}
```

Requests go to /chat/completions with Bearer-token authentication, the standard practice for OpenAI-compatible APIs. Any provider that supports the /v1/chat/completions endpoint works with this client.

Several key differences between the Anthropic and OpenAI message formats must be handled during conversion.

1. System message position. Anthropic passes the system prompt as a top-level parameter; OpenAI includes it as the first `role: "system"` message:

```swift
if let system {
    result.append(["role": "system", "content": system])
}
```

2. Tool result representation. Anthropic packages multiple tool_results in the content array of one `role: "user"` message; OpenAI requires each tool result as a separate `role: "tool"` message:

```swift
let toolResults = blocks.filter { $0["type"] as? String == "tool_result" }
if !toolResults.isEmpty {
    return toolResults.map { block in
        [
            "role": "tool",
            "tool_call_id": block["tool_use_id"] as? String ?? "",
            "content": block["content"] ?? "",
        ]
    }
}
```

3. Tool use representation. Anthropic uses `type: "tool_use"` blocks in the content array; OpenAI uses a tool_calls array at the top level of the message:

```swift
result["tool_calls"] = toolUseBlocks.enumerated().map { index, block in
    let inputDict = block["input"] as? [String: Any] ?? [:]
    let arguments = (try? JSONSerialization.data(withJSONObject: inputDict, options: []))
        .flatMap { String(data: $0, encoding: .utf8) } ?? "{}"
    return [
        "id": block["id"] as? String ?? "call_\(index)",
        "type": "function",
        "function": [
            "name": block["name"] as? String ?? "",
            "arguments": arguments,  // OpenAI requires a JSON string, not a dictionary
        ],
    ]
}
```

Note that OpenAI's arguments field must be a JSON string, not a dictionary object; the serialization happens here.
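Putting these rules together, the request-side mapping for a single assistant turn might look like the following self-contained sketch. The SDK's actual converter lives inside OpenAIClient; the sample message and variable names here are invented for illustration:

```swift
import Foundation

// Hypothetical standalone demo of the Anthropic -> OpenAI request mapping above.
let anthropicAssistant: [String: Any] = [
    "role": "assistant",
    "content": [
        ["type": "text", "text": "Let me check the weather."],
        ["type": "tool_use", "id": "toolu_01", "name": "get_weather", "input": ["city": "Tokyo"]],
    ],
]

let blocks = anthropicAssistant["content"] as? [[String: Any]] ?? []
var openAIMessage: [String: Any] = ["role": "assistant"]

// Text blocks collapse into a single content string.
openAIMessage["content"] = blocks
    .filter { $0["type"] as? String == "text" }
    .compactMap { $0["text"] as? String }
    .joined(separator: "\n")

// tool_use blocks become a top-level tool_calls array with JSON-string arguments.
openAIMessage["tool_calls"] = blocks
    .filter { $0["type"] as? String == "tool_use" }
    .enumerated()
    .map { (index, block) -> [String: Any] in
        let input = block["input"] as? [String: Any] ?? [:]
        let arguments = (try? JSONSerialization.data(withJSONObject: input))
            .flatMap { String(data: $0, encoding: .utf8) } ?? "{}"
        return [
            "id": block["id"] as? String ?? "call_\(index)",
            "type": "function",
            "function": ["name": block["name"] as? String ?? "", "arguments": arguments],
        ]
    }
// openAIMessage now matches the Chat Completions assistant-message shape.
```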
OpenAI's response structure (choices[0].message) needs conversion back to Anthropic format:

```swift
// stop_reason mapping
private static func mapStopReason(_ finishReason: String) -> String {
    switch finishReason {
    case "stop": return "end_turn"
    case "tool_calls": return "tool_use"
    case "length": return "max_tokens"
    default: return finishReason
    }
}

// usage mapping
usage = [
    "input_tokens": openAIUsage["prompt_tokens"] as? Int ?? 0,
    "output_tokens": openAIUsage["completion_tokens"] as? Int ?? 0,
]
```

Streaming conversion is more complex. OpenAI's streaming format (`data: {"choices":[{"delta":{...}}]}`) must be converted chunk by chunk into Anthropic's SSEEvent sequence:

- First chunk → messageStart
- Text delta → contentBlockDelta(type: "text_delta")
- Tool call start → contentBlockStart(type: "tool_use"); parameter delta → contentBlockDelta(type: "input_json_delta")
- End → contentBlockStop + messageDelta + messageStop

The conversion function tracks how many content blocks are open, whether the text block has been closed, and which tool-call blocks are still open, so it can generate correct index values. A safety check ensures messageStop is always emitted, even if the original stream doesn't end normally.

Connecting to different OpenAI-compatible providers only requires changing baseURL and model:

```swift
// DeepSeek
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "deepseek-chat",
    baseURL: "https://api.deepseek.com/v1",
    provider: .openai
))

// Ollama (local)
let localAgent = createAgent(options: AgentOptions(
    apiKey: "ollama",  // Ollama doesn't need a key; any value works
    model: "qwen3:8b",
    baseURL: "http://localhost:11434/v1",
    provider: .openai
))

// GLM
let glmAgent = createAgent(options: AgentOptions(
    apiKey: "xxx.glm-xxx",
    model: "glm-4-plus",
    baseURL: "https://open.bigmodel.cn/api/paas/v4",
    provider: .openai
))
```

The SDK supports dynamic model switching at runtime without recreating the Agent:

```swift
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    fallbackModel: "claude-haiku-4-5"  // Used if the primary model fails
))

// Use sonnet for a simple question first
let result1 = await agent.prompt("What is 2 + 3?")
print(result1.costBreakdown)
// [CostBreakdownEntry(model: "claude-sonnet-4-6", inputTokens: 45, outputTokens: 3, costUsd: 0.000180)]

// Switch to opus for a reasoning-intensive question
try agent.switchModel("claude-opus-4-6")
let result2 = await agent.prompt("Explain the difference between structs and classes in Swift.")
print(result2.costBreakdown)
// [CostBreakdownEntry(model: "claude-opus-4-6", inputTokens: 52, outputTokens: 156, costUsd: 0.011970)]
```

The switchModel() implementation:

```swift
public func switchModel(_ model: String) throws {
    let trimmed = model.trimmingCharacters(in: .whitespacesAndNewlines)
    guard !trimmed.isEmpty else {
        throw SDKError.invalidConfiguration("Model name cannot be empty")
    }
    let oldModel = self.model
    self.model = trimmed
    self.options.model = trimmed
    Logger.shared.info("Agent", "model_switch", data: ["from": oldModel, "to": trimmed])
}
```

There is no allowlist validation: whatever model name is passed gets used, and unsupported models error at the API level. This design choice exists because OpenAI-compatible provider model names can't be exhaustively listed.
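Because switchModel() is cheap, it combines naturally with simple routing logic. A sketch, closing over the agent created in the snippet above; the length heuristic is invented purely for illustration, and QueryResult is the result type whose costBreakdown was shown earlier:

```swift
// Hypothetical cost-aware routing on top of switchModel()/prompt().
func routedPrompt(_ text: String) async throws -> QueryResult {
    // Crude, illustrative heuristic: long prompts get the stronger model.
    let model = text.count > 500 ? "claude-opus-4-6" : "claude-haiku-4-5"
    try agent.switchModel(model)
    return await agent.prompt(text)
}

let quick = try await routedPrompt("What is 2 + 3?")  // routed to haiku
let deep = try await routedPrompt(String(repeating: "context... ", count: 100) + "Design a cache.")  // routed to opus
```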
fallbackModel is configured in AgentOptions. When the primary model fails completely (retries exhausted), the SDK automatically retries with the fallback:

```swift
if let fallbackModel = self.options.fallbackModel, fallbackModel != self.model {
    let fallbackResponse = try await retryClient.sendMessage(
        model: fallbackModel,
        messages: retryMessages,
        ...
    )
    // Temporarily switch to the fallback for cost tracking
    let originalModel = self.model
    self.model = fallbackModel
    // ... process response
}
```

CostBreakdownEntry records costs grouped by model name:

```swift
public struct CostBreakdownEntry: Sendable, Equatable {
    public let model: String
    public let inputTokens: Int
    public let outputTokens: Int
    public let costUsd: Double
}
```

If models are switched mid-query (or the fallback is triggered), QueryResult.costBreakdown contains multiple entries with per-model costs. Costs are calculated from built-in price tables:

```swift
public nonisolated(unsafe) var MODEL_PRICING: [String: ModelPricing] = [
    "claude-opus-4-6": ModelPricing(input: 15.0 / 1_000_000, output: 75.0 / 1_000_000),
    "claude-sonnet-4-6": ModelPricing(input: 3.0 / 1_000_000, output: 15.0 / 1_000_000),
    "claude-haiku-4-5": ModelPricing(input: 0.8 / 1_000_000, output: 4.0 / 1_000_000),
    // ...
]
```

These rates reproduce the figures shown earlier: the sonnet query's 45 input and 3 output tokens cost 45 × $3/M + 3 × $15/M = $0.000135 + $0.000045 = $0.000180.

Custom models can register pricing via registerModel(_:pricing:):

```swift
registerModel("glm-4-plus", pricing: ModelPricing(
    input: 0.1 / 1_000_000,
    output: 0.1 / 1_000_000
))
```

The SDK uses the ThinkingConfig enum to control the LLM's deep thinking:

```swift
public enum ThinkingConfig: Sendable, Equatable {
    case adaptive                    // Model decides whether to think
    case enabled(budgetTokens: Int)  // Specify a thinking token budget
    case disabled                    // Disable deep thinking
}
```

Three modes for different uses:

- adaptive: let the model judge; no thinking for simple questions, automatic thinking for complex ones. Most convenient for daily use.
- enabled(budgetTokens:): explicitly control the thinking budget, e.g. allocate 10,000 thinking tokens for deep analysis.
- disabled: turn off thinking entirely for maximum speed.

EffortLevel is a higher-level abstraction mapping to specific thinking token budgets:

```swift
public enum EffortLevel: String, Sendable, CaseIterable {
    case low     // 1024 tokens
    case medium  // 5120 tokens
    case high    // 10240 tokens
    case max     // 32768 tokens

    public var budgetTokens: Int {
        switch self {
        case .low: return 1024
        case .medium: return 5120
        case .high: return 10240
        case .max: return 32768
        }
    }
}
```

Set it in AgentOptions:

```swift
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    effort: .high  // 10240 thinking tokens
))
```

setMaxThinkingTokens() adjusts the thinking budget between queries:

```swift
// Simple question: fewer thinking tokens
try agent.setMaxThinkingTokens(2048)
let r1 = await agent.prompt("Summarize this file.")

// Complex reasoning: increase the budget
try agent.setMaxThinkingTokens(16000)
let r2 = await agent.prompt("Design a concurrent data structure for...")

// Disable thinking
try agent.setMaxThinkingTokens(nil)
```

A positive integer enables thinking with the specified budget; nil disables it. Zero or a negative value throws SDKError.invalidConfiguration.

ModelInfo describes each model's capabilities:

```swift
public struct ModelInfo: Sendable, Equatable {
    public let value: String
    public let displayName: String
    public let description: String
    public let supportsEffort: Bool
    public let supportedEffortLevels: [EffortLevel]?
    public let supportsAdaptiveThinking: Bool?
    public let supportsFastMode: Bool?
}
```

This lets UI layers dynamically show available options based on each model's capabilities.
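For instance, a settings screen could build its effort picker from ModelInfo instead of hard-coding levels. A small sketch; the helper is hypothetical, but the fields and the EffortLevel API are those shown above:

```swift
// Hypothetical UI-layer helper: derive effort-picker entries from ModelInfo.
func effortOptions(for info: ModelInfo) -> [String] {
    guard info.supportsEffort else { return [] }
    // Fall back to all levels when the model doesn't restrict them.
    let levels = info.supportedEffortLevels ?? EffortLevel.allCases
    return levels.map { "\($0.rawValue) (\($0.budgetTokens) thinking tokens)" }
}
```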
Skills are a special extension mechanism in the SDK, essentially "prompt templates with tool restrictions." A Skill defines a set of prompt instructions, an allowed tool subset, and an optional model override.

```swift
public struct Skill: Sendable {
    public let name: String
    public let description: String
    public let aliases: [String]                     // Aliases, e.g. ["ci"] for commit
    public let userInvocable: Bool                   // Whether users can invoke via /command
    public let toolRestrictions: [ToolRestriction]?  // Restrict available tools; nil = all
    public let modelOverride: String?                // Override the model during execution
    public let isAvailable: @Sendable () -> Bool     // Runtime availability check
    public let promptTemplate: String                // Prompt template content
    public let whenToUse: String?                    // Tells the LLM when to use this skill
    public let argumentHint: String?                 // Argument hint, e.g. "[message]"
    public let baseDir: String?                      // Absolute path to the skill directory
    public let supportingFiles: [String]             // Supporting files (references, scripts, etc.)
}
```

The SDK predefines five common Skills, accessible via the BuiltInSkills namespace:

| Skill | Aliases | Allowed Tools | Function |
| --- | --- | --- | --- |
| commit | ci | bash, read, glob, grep | Analyze git diff, generate a commit message |
| review | review-pr, cr | bash, read, glob, grep | Review code changes from 5 dimensions |
| simplify | — | bash, read, grep, glob | Review code for reuse, quality, efficiency |
| debug | investigate, diagnose | read, grep, glob, bash | Analyze errors, locate root causes |
| test | run-tests | bash, read, write, glob, grep | Generate and execute test cases |

Each Skill restricts its tool scope. commit only allows bash, read, glob, and grep; no file writing is needed. debug is also read-only (read, grep, glob, bash), diagnosing without modifying. test is the only built-in Skill that allows write, since it creates test files.

The test Skill also has a runtime availability check:

```swift
isAvailable: {
    let cwd = FileManager.default.currentDirectoryPath
    let testIndicators = [
        "Package.swift", "pytest.ini", "jest.config",
        "vitest.config", "Cargo.toml", "go.mod",
    ]
    for indicator in testIndicators {
        if FileManager.default.fileExists(atPath: cwd + "/" + indicator) {
            return true
        }
    }
    return false
}
```

The test Skill is only visible to users when a test-framework configuration file is detected.

SkillRegistry is a thread-safe skill manager that uses a DispatchQueue to protect concurrent access:

```swift
public final class SkillRegistry: @unchecked Sendable {
    private var skills: [String: Skill] = [:]
    private var orderedNames: [String] = []
    private var aliases: [String: String] = [:]
    private let queue = DispatchQueue(label: "com.openagentsdk.skillregistry")

    public func register(_ skill: Skill) { ... }
    public func find(_ name: String) -> Skill? { ... }  // Find by name or alias
    public var allSkills: [Skill] { ... }
    public var userInvocableSkills: [Skill] { ... }
}
```

Register, find, replace, and delete are all queue.sync-protected operations. Alias mappings are built automatically on registration: after registering BuiltInSkills.commit, registry.find("ci") also finds it. A custom skill can be defined and registered the same way, as shown below.
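Defining a custom skill directly in code might look like this. The sketch assumes Skill exposes its memberwise initializer; the changelog skill itself is invented:

```swift
import Foundation

// Hypothetical custom skill: draft a CHANGELOG entry from recent git history.
let changelog = Skill(
    name: "changelog",
    description: "Draft a CHANGELOG entry from recent git history",
    aliases: ["cl"],
    userInvocable: true,
    toolRestrictions: [.bash, .read, .glob],  // read-only plus git commands
    modelOverride: nil,
    isAvailable: {
        // Only offer the skill inside a git repository.
        FileManager.default.fileExists(
            atPath: FileManager.default.currentDirectoryPath + "/.git")
    },
    promptTemplate: "Summarize the latest commits as a CHANGELOG entry...",
    whenToUse: "user asks to draft or update the changelog",
    argumentHint: "[version]",
    baseDir: nil,
    supportingFiles: []
)

registry.register(changelog)  // findable as "changelog" or "cl"
```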
Skills don't all need code registration, though. SkillLoader can automatically discover skills from the filesystem: any directory containing a SKILL.md file is recognized as a skill package. Directories are scanned in priority order from low to high:

- ~/.config/agents/skills (lowest priority)
- ~/.agents/skills
- ~/.claude/skills
- $PWD/.agents/skills
- $PWD/.claude/skills (highest priority)

Same-named skills discovered later override earlier ones (last wins).

SKILL.md uses YAML frontmatter for metadata:

```markdown
---
name: polyv-live-cli
description: Manage live streaming services
aliases: live, plv
allowed-tools: Bash, Read, Write, Glob
when-to-use: user asks about live streaming management
argument-hint: [action] [options]
---

# polyv-live-cli Skill

You are a live streaming management assistant...
```

The allowed-tools entry in the frontmatter is parsed into a ToolRestriction array, restricting which tools the skill can use during execution.

SkillLoader uses a "progressive loading" strategy: only the SKILL.md Markdown body is loaded as the prompt template. For supporting files (references, scripts, templates), only the paths are recorded; the Agent reads their content on demand via the Read/Bash tools when needed.

```swift
let registry = SkillRegistry()
registry.register(BuiltInSkills.commit)
registry.register(BuiltInSkills.review)

// Discover custom skills from the filesystem
let count = registry.registerDiscoveredSkills()

// Or specify directories
registry.registerDiscoveredSkills(from: ["/opt/custom-skills"])

// Or register only allowlisted skills
registry.registerDiscoveredSkills(skillNames: ["polyv-live-cli"])
```

The ToolRestriction enum defines the restrictable tools:

```swift
public enum ToolRestriction: String, Sendable, CaseIterable {
    case bash, read, write, edit, glob, grep
    case webFetch, webSearch, askUser, toolSearch
    case agent, sendMessage
    case taskCreate, taskList, taskUpdate, taskGet, taskStop, taskOutput
    case teamCreate, teamDelete
    case notebookEdit, skill
}
```

When a Skill sets toolRestrictions: [.bash, .read, .glob], the Agent can only use those three tools while executing it. All other tool calls are intercepted.

To make Skills available to an Agent, add SkillTool to the tools list:

```swift
var tools = getAllBaseTools(tier: .core)
tools.append(createSkillTool(registry: registry))

let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    permissionMode: .bypassPermissions,
    tools: tools
))

// The Agent auto-discovers and invokes skills based on the skill list in the system prompt
let result = await agent.prompt("Use the commit skill to analyze current changes")
```

SkillRegistry.formatSkillsForPrompt() generates a skill-list snippet that is injected into the system prompt, including each skill's name, description, and trigger conditions. The LLM sees this list and knows when to invoke which skill.

maxBudgetUsd sets a cost ceiling per query:

```swift
let agent = createAgent(options: AgentOptions(
    apiKey: apiKey,
    model: "claude-sonnet-4-6",
    maxBudgetUsd: 0.05  // Maximum 5 cents
))
```

Cumulative cost is checked after each turn:

```swift
if let budget = options.maxBudgetUsd, totalCostUsd > budget {
    status = .errorMaxBudgetUsd
    break
}
```

When the budget is exceeded, the loop exits immediately. Any text and token statistics already generated are preserved in the QueryResult: you get a partial result, not a blank one.

There are two ways to interrupt an in-progress query:

```swift
// Method 1: Call interrupt()
agent.interrupt()

// Method 2: Cancel the Task
let task = Task {
    await agent.prompt("Long running query...")
}
// Later
task.cancel()
```

interrupt() internally sets the _interrupted flag and cancels the stream task. The Agent Loop checks this flag at multiple checkpoints (loop entry, between read-only and mutating tools, inside the SSE event loop, before and after tool execution) and exits immediately when it is set.
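One practical use of interrupt() is a wall-clock timeout. The sketch below is built only from the prompt()/interrupt() calls shown above; the 60-second policy and the watchdog pattern are invented for illustration:

```swift
// Hypothetical wall-clock timeout: interrupt the agent if a query runs too long.
let work = Task { await agent.prompt("Long running query...") }
let watchdog = Task {
    try await Task.sleep(nanoseconds: 60 * 1_000_000_000)  // 60 seconds
    agent.interrupt()  // the Agent Loop notices the flag at its next checkpoint
}
let result = await work.value  // partial result if the watchdog fired
watchdog.cancel()              // query finished in time: stop the watchdog
```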
The permission mode and the tool-authorization callback can also be switched at runtime:

```swift
// Switch permission mode
agent.setPermissionMode(.askForPermission)

// Set a custom authorization callback (takes priority over permissionMode)
agent.setCanUseTool { toolName, input in
    if toolName == "Bash" {
        return .deny("Bash is disabled")
    }
    return .allow
}

// Revert to permissionMode control
agent.setCanUseTool(nil)
```

The setCanUseTool callback takes priority over permissionMode, and calling setPermissionMode() clears any previously set callback.

The SDK supports configuration via environment variables, with the priority: code settings > environment variables > defaults.

| Environment Variable | Corresponding Field | Default |
| --- | --- | --- |
| CODEANY_API_KEY | apiKey | nil |
| CODEANY_MODEL | model | claude-sonnet-4-6 |
| CODEANY_BASE_URL | baseURL | nil (use provider default) |

They are merged using SDKConfiguration.resolved():

```swift
// Code-set values take priority; unset values are read from the environment
let config = SDKConfiguration.resolved(overrides: SDKConfiguration(
    apiKey: "sk-...",            // Overrides CODEANY_API_KEY
    model: "claude-sonnet-4-6"   // Overrides CODEANY_MODEL
))

// Environment variables only
let envConfig = SDKConfiguration.fromEnvironment()
```

All LLM requests are wrapped with withRetry:

```swift
public struct RetryConfig: Sendable {
    public let maxRetries: Int                 // Max retries, default 3
    public let baseDelayMs: Int                // Base delay, default 2000 ms
    public let maxDelayMs: Int                 // Max delay, default 30000 ms
    public let retryableStatusCodes: Set<Int>  // Default [429, 500, 502, 503, 529]
}
```

Retries use exponential backoff plus 25% random jitter to avoid a thundering herd. Only SDKError.apiError with a status code in the retryable set triggers a retry; other errors are thrown directly.

```swift
let delay = config.baseDelayMs * (1 << attempt)
let jitterMs = Int(Double(delay) * 0.25 * (Double.random(in: -1...1)))
let totalMs = max(0, min(delay + jitterMs, config.maxDelayMs))
```
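To make the schedule concrete, the following snippet reproduces the arithmetic above with the default config; it is illustrative only, not SDK code:

```swift
// Illustrative: delay schedule with the default RetryConfig values.
let baseDelayMs = 2_000
let maxDelayMs = 30_000

for attempt in 0..<3 {
    let delay = baseDelayMs * (1 << attempt)  // 2000, 4000, 8000
    let jitterMs = Int(Double(delay) * 0.25 * Double.random(in: -1...1))
    let totalMs = max(0, min(delay + jitterMs, maxDelayMs))
    print("attempt \(attempt): sleep \(totalMs) ms")  // ~2000±500, 4000±1000, 8000±2000
}
```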
Six articles complete, covering the full architecture of Open Agent SDK (Swift):

- Part 0: Project overview — what the SDK does, overall architecture, how to use it
- Part 1: Agent Loop internals — the complete cycle from prompt to multi-turn conversation
- Part 2: 34 built-in tools — ToolProtocol design, three-tier architecture, custom extensions
- Part 3: MCP integration — connecting external tool servers, discovery, and communication
- Part 4: Multi-agent collaboration — Team/Task models, inter-agent communication
- Part 5: Session persistence and security — session storage, permission control, Hook system
- Part 6 (this article): Multi-LLM providers and runtime controls — LLMClient protocol, OpenAI adapter, model switching, Thinking/Effort, Skills system

Starting from the Agent Loop core: the tool system is the loop's "execution" stage, MCP extends it with external tools, multi-agent is the collaboration pattern, sessions provide state persistence, security and Hooks are the governance mechanisms, and this article's multi-provider support and runtime controls supply the flexibility, letting the same Agent choose the most appropriate model and control strategy for each scenario.

Deep Dive into Open Agent SDK (Swift) series:

- Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
- Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
- Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
- Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
- Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
- Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
- Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift