
AI News Hub

Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools

DEV Community

The previous article analyzed how the Agent Loop works, including one crucial step: "execute tools." When the LLM says "I need to call Bash," the SDK actually spawns a process to run the command. But the tool system behind this is far more nuanced than simply "calling a function." How are the 34 built-in tools organized? How do you safely convert the LLM's JSON input into Swift types? How do you control which tools are available? This article starts from the protocol definition and examines the Open Agent SDK tool system layer by layer.

Every tool in the SDK conforms to `ToolProtocol`:

```swift
public protocol ToolProtocol: Sendable {
    var name: String { get }
    var description: String { get }
    var inputSchema: ToolInputSchema { get }
    var isReadOnly: Bool { get }
    var annotations: ToolAnnotations? { get }

    func call(input: Any, context: ToolContext) async -> ToolResult
}
```

Five properties and one method. Let's go through each.

- `name` is the tool's unique identifier. The LLM uses this name in `tool_use` blocks to specify which tool to invoke. All built-in tools use PascalCase naming: `Read`, `Bash`, `Glob`, `CronCreate`.
- `description` is the tool description shown to the LLM. This text is included as part of the tool definition sent to the API, and its quality directly affects when the LLM chooses to invoke the tool.
- `inputSchema` is a `[String: Any]` JSON Schema dictionary describing the input structure the tool accepts. It is passed as-is to the `input_schema` field in API calls.
- `isReadOnly` is a boolean flag telling the Agent Loop whether the tool has side effects. As covered in the previous article, the Agent Loop uses this field for bucketing: read-only tools execute concurrently, mutating tools execute serially.
`annotations` are optional behavioral hints containing four boolean fields:

```swift
public struct ToolAnnotations: Sendable, Equatable {
    public let readOnlyHint: Bool      // Read-only, no side effects
    public let destructiveHint: Bool   // May perform irreversible operations
    public let idempotentHint: Bool    // Idempotent: repeated calls produce the same result
    public let openWorldHint: Bool     // Interacts with the external world
}
```

Note that `destructiveHint` defaults to `true` — the SDK takes a "dangerous by default" stance, requiring tools to proactively declare themselves safe. These hints don't affect the SDK's own execution logic, but the LLM references them when deciding how to use tools.

The `call()` method returns `ToolResult`, the content fed back to the LLM after tool execution:

```swift
public struct ToolResult: Sendable {
    public let toolUseId: String            // Corresponds to the LLM's tool_use ID
    public let content: String              // Text content
    public let typedContent: [ToolContent]? // Multi-modal content (text, images, resource references)
    public let isError: Bool                // Whether this is an error result
}
```

There's a compatibility design between `content` and `typedContent`: when `typedContent` has a value, `content` extracts all `.text` entries and concatenates them; otherwise it returns the stored string directly. This way, older code that reads only `content` keeps working, while new code can use `typedContent` for non-text content like images.

`ToolContent` is an enum supporting three content types:

```swift
public enum ToolContent: Sendable {
    case text(String)
    case image(data: Data, mimeType: String)
    case resource(uri: String, name: String?)
}
```

Inside tool closures, `ToolExecuteResult` is used instead — structurally almost identical to `ToolResult`, just without `toolUseId` (that ID is filled in automatically by the calling layer).
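The content/typedContent compatibility rule described above can be illustrated with a self-contained sketch. This is one way such a fallback could be implemented — the SDK's actual property layout may differ:

```swift
import Foundation

// A self-contained sketch of the content/typedContent compatibility rule:
// not the SDK's actual source, just an illustration of the behavior.
enum ToolContent {
    case text(String)
    case image(data: Data, mimeType: String)
    case resource(uri: String, name: String?)
}

struct ToolResult {
    let toolUseId: String
    let typedContent: [ToolContent]?
    private let storedContent: String

    init(toolUseId: String, content: String, typedContent: [ToolContent]? = nil) {
        self.toolUseId = toolUseId
        self.storedContent = content
        self.typedContent = typedContent
    }

    // When typedContent exists, concatenate all .text entries;
    // otherwise fall back to the stored string.
    var content: String {
        guard let typed = typedContent else { return storedContent }
        return typed.compactMap { item -> String? in
            if case .text(let s) = item { return s }
            return nil
        }.joined(separator: "\n")
    }
}

let plain = ToolResult(toolUseId: "tu_1", content: "hello")
let mixed = ToolResult(
    toolUseId: "tu_2",
    content: "",
    typedContent: [.text("line 1"), .image(data: Data(), mimeType: "image/png"), .text("line 2")]
)
print(plain.content)  // "hello"
print(mixed.content)  // "line 1\nline 2"
```

Non-text entries (the image here) are simply skipped when producing the plain-text view, which is exactly why older `content`-only callers remain safe.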
`ToolContext` is context injected into each tool execution, with many fields:

| Field | Purpose |
| --- | --- |
| cwd | Current working directory |
| toolUseId | tool_use ID for this invocation |
| agentSpawner | Sub-agent spawner (used by AgentTool) |
| cronStore | Scheduled task store (used by CronTools) |
| todoStore | Todo item store (used by TodoWrite) |
| worktreeStore | Worktree store (used by WorktreeTools) |
| planStore | Plan mode store (used by PlanTools) |
| taskStore | Task management store (used by Task*Tools) |
| mailboxStore | Mailbox store (used by SendMessage) |
| teamStore | Team store (used by TeamCreate) |
| hookRegistry | Hook event registry |
| permissionMode | Permission mode |
| canUseTool | Custom permission-check callback |
| skillRegistry | Skill registry (used by SkillTool) |
| restrictionStack | Tool restriction stack |
| sandbox | Sandbox settings |
| mcpConnections | MCP connection info |
| fileCache | File cache |
| env | Custom environment variables |

With this many optional fields, the rule is simple: inject what a tool needs; everything else is `nil`. The Read tool only looks at `cwd`, `sandbox`, and `fileCache`; AgentTool only looks at `agentSpawner`; CronTools only look at `cronStore`. Each tool depends only on its own Store, unaware of and unconcerned with the others.

`ToolContext` also provides two copy methods: `withToolUseId()` updates the call ID (invoked by ToolExecutor on each tool execution), and `withSkillContext()` increments the skill nesting depth (used when SkillTool calls sub-skills).

The SDK divides its 34 tools into three tiers: Core (10), Advanced (11), and Specialist (13).
- Core Tier (10): Read, Write, Edit, Glob, Grep, Bash, AskUser, ToolSearch, WebFetch, WebSearch
- Advanced Tier (11): Agent, Skill, TaskCreate, TaskGet, TaskList, TaskOutput, TaskStop, TaskUpdate, SendMessage, TeamCreate, TeamDelete, NotebookEdit
- Specialist Tier (13): CronCreate, CronDelete, CronList, LSP, Config, TodoWrite, EnterPlanMode, ExitPlanMode, EnterWorktree, ExitWorktree, RemoteTrigger, ListMcpRes, ReadMcpRes

The tier distinction is based not on technical implementation difficulty, but on dependency complexity and use case. The 10 Core tools are the Agent's foundational capabilities — reading files, writing files, searching code, running commands. They share a common trait: they depend only on basic `ToolContext` fields (`cwd`, `sandbox`, `fileCache`) and require no Store injection.

Take the Read tool. Its input is a file path with optional offset and limit:

```swift
private struct FileReadInput: Codable {
    let file_path: String
    let offset: Int?
    let limit: Int?
}
```

The execution logic is straightforward: resolve path → check sandbox → query cache → read file → paginate → return content with line numbers. One caching detail: if `context.fileCache` is available, the tool checks the cache first, skipping disk I/O on a hit.

The Bash tool is much more complex, handling timeouts, output truncation, and background processes. Its input has five fields:

```swift
private struct BashInput: Codable {
    let command: String
    let timeout: Int?
    let description: String?
    let runInBackground: Bool?
    let dangerouslyDisableSandbox: Bool?
}
```

Key implementation details:

Timeout control. Default 120 seconds, maximum 600. A `DispatchQueue.global().asyncAfter` timer calls `process.terminate()` when time's up.

Output truncation.
Output exceeding 100,000 characters keeps only the first 50,000 + last 50,000 characters, joined with `...(truncated)...`.

Background execution. When `run_in_background = true`, the process starts and a task ID is returned immediately, without waiting for completion.

Process output collection uses `ProcessOutputAccumulator`, marked `@unchecked Sendable`: Pipe's readability handler and the termination handler both dispatch on the same run-loop queue, which prevents data races.

Bash's annotations set `destructiveHint: true`, explicitly telling the LLM this tool is destructive.

Advanced-tier tools start requiring external dependencies — AgentTool needs `agentSpawner`, the Task* tools need `taskStore`, SendMessage needs `mailboxStore` and `teamStore`. The Agent tool is representative of this tier. Its purpose is to let the LLM "dispatch a sub-agent" for complex tasks:

```swift
public func createAgentTool() -> ToolProtocol {
    return defineTool(
        name: "Agent",
        description: "Launch a subagent to handle complex, multi-step tasks autonomously.",
        inputSchema: agentToolSchema,
        isReadOnly: false
    ) { (input: AgentToolInput, context: ToolContext) async throws -> ToolExecuteResult in
        guard let spawner = context.agentSpawner else {
            return ToolExecuteResult(
                content: "Error: Agent spawner not available.",
                isError: true
            )
        }
        // Parse built-in agent types and permission mode, then spawn the sub-agent
        let result = await spawner.spawn(
            prompt: input.prompt,
            model: input.model ?? agentDef?.model,
            systemPrompt: agentDef?.systemPrompt,
            allowedTools: agentDef?.tools,
            ...
        )
        return ToolExecuteResult(content: result.text, isError: result.isError)
    }
}
```

AgentTool's input supports 11 fields: `prompt`, `description`, `subagent_type`, `model`, `name`, `maxTurns`, `run_in_background`, `isolation`, `team_name`, `mode`, `resume`. `subagent_type` can specify the built-in Explore or Plan types, or a custom name.

Note that `agentSpawner` is injected through `ToolContext` as a protocol type — AgentTool doesn't know how sub-agents are created.
It just calls `spawner.spawn()`, with the concrete implementation injected by the Core layer. This dependency inversion means the Tools layer never needs to import the Core module.

Specialist-tier tools have heavier dependencies — each needs its own dedicated Store, and their functionality is highly domain-specific. CronTools is a set of three tools — CronCreate, CronDelete, CronList — that access scheduled-task storage via `context.cronStore`:

```swift
public func createCronCreateTool() -> ToolProtocol {
    return defineTool(
        name: "CronCreate",
        description: "Create a scheduled recurring task (cron job).",
        inputSchema: cronCreateSchema,
        isReadOnly: false
    ) { (input: CronCreateInput, context: ToolContext) async throws -> ToolExecuteResult in
        guard let cronStore = context.cronStore else {
            return ToolExecuteResult(content: "Error: CronStore not available.", isError: true)
        }
        let job = await cronStore.create(
            name: input.name,
            schedule: input.schedule,
            command: input.command
        )
        return ToolExecuteResult(
            content: "Cron job created: \(job.id) \"\(job.name)\"",
            isError: false
        )
    }
}
```

All three tools use `guard let cronStore = context.cronStore` as a pre-check — if the Store isn't injected, they return an error rather than crashing.

The LSP tool is another interesting example. It uses grep to simulate common Language Server Protocol operations (go to definition, find references, symbol search) without depending on an actual language server:

```swift
case "goToDefinition", "goToImplementation":
    // 1. Extract the symbol name at the cursor position using regex
    guard let symbol = getSymbolAtPosition(
        filePath: filePath,
        line: line,
        character: character
    ) else { ... }

    // 2. Grep for definition patterns
    let pattern = "(func|class|struct|enum|protocol|typealias|let|var|export)\\s+\(symbol)"
    let results = await runGrep(
        arguments: ["grep", "-rn", "-E", pattern, cwd],
        cwd: cwd
    )
```

LSP depends only on `context.cwd` and needs no Store — the lightest tool in the Specialist tier.
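The symbol extraction that `getSymbolAtPosition` performs could look roughly like the following — a hypothetical, self-contained sketch (function name and signature are illustrative, not the SDK's actual helper):

```swift
import Foundation

// A hypothetical sketch of cursor-based symbol extraction: given a source
// line and a character offset, expand left and right over identifier
// characters to recover the full symbol name under the cursor.
func symbolAt(line: String, character: Int) -> String? {
    let chars = Array(line)
    guard character >= 0 && character < chars.count else { return nil }
    func isIdent(_ c: Character) -> Bool { c.isLetter || c.isNumber || c == "_" }
    guard isIdent(chars[character]) else { return nil }

    var start = character
    var end = character
    while start > 0 && isIdent(chars[start - 1]) { start -= 1 }
    while end < chars.count - 1 && isIdent(chars[end + 1]) { end += 1 }
    return String(chars[start...end])
}

let line = "let total = computeSum(values)"
print(symbolAt(line: line, character: 14) ?? "nil")  // "computeSum"
```

The recovered symbol is then spliced into the grep pattern shown above, which is how "go to definition" degrades gracefully into a text search.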
The SDK provides the `defineTool` factory function, letting developers create ToolProtocol-conforming tools with minimal code. It has four overloads covering different use cases.

The most commonly used overload accepts a Codable input type and a closure returning `String`:

```swift
let greetTool = defineTool(
    name: "Greet",
    description: "Generate a greeting message.",
    inputSchema: [
        "type": "object",
        "properties": [
            "name": ["type": "string", "description": "Person's name"]
        ],
        "required": ["name"]
    ],
    isReadOnly: true
) { (input: GreetInput, context: ToolContext) async throws -> String in
    return "Hello, \(input.name)!"
}

// The input type only needs to conform to Codable
struct GreetInput: Codable {
    let name: String
}
```

Internally, `defineTool` does four things:

1. Casts the LLM's `Any` input to `[String: Any]`
2. Serializes it to `Data` using JSONSerialization
3. Decodes it into your `Input` type using JSONDecoder
4. Calls your closure

If any step fails (input isn't a dictionary, serialization fails, decoding fails, the closure throws), it returns an `isError: true` result instead of crashing the Agent Loop. This means you can safely use `try` in your closures — errors are gracefully caught.

If a tool needs to mark errors explicitly (rather than throwing), use the overload returning `ToolExecuteResult`:

```swift
let divideTool = defineTool(
    name: "Divide",
    description: "Divide two numbers.",
    inputSchema: [
        "type": "object",
        "properties": [
            "a": ["type": "number"],
            "b": ["type": "number"]
        ],
        "required": ["a", "b"]
    ]
) { (input: DivideInput, context: ToolContext) async throws -> ToolExecuteResult in
    guard input.b != 0 else {
        return ToolExecuteResult(content: "Error: Division by zero.", isError: true)
    }
    return ToolExecuteResult(content: "\(input.a / input.b)", isError: false)
}
```

Most built-in tools use this overload, because many errors are logic-level (file doesn't exist, Store not injected) and aren't well represented by exceptions.
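The four-step Codable bridge described above can be sketched as a standalone function. This is an illustration of the mechanism, not the SDK's internal code (the `decodeInput` name is hypothetical; `GreetInput` is reused from the example above):

```swift
import Foundation

// A standalone sketch of defineTool's internal bridge:
// Any → [String: Any] → Data → JSONDecoder → your Input type.
func decodeInput<Input: Decodable>(_ raw: Any, as type: Input.Type) -> Input? {
    // 1. The LLM's input arrives as Any; it must be a JSON object.
    guard let dict = raw as? [String: Any] else { return nil }
    // 2–3. Round-trip through JSONSerialization into Data, then JSONDecoder.
    guard let data = try? JSONSerialization.data(withJSONObject: dict),
          let input = try? JSONDecoder().decode(Input.self, from: data) else {
        return nil
    }
    return input
}

struct GreetInput: Codable {
    let name: String
}

let ok = decodeInput(["name": "Ada"], as: GreetInput.self)
print(ok?.name ?? "decode failed")  // "Ada"

let bad = decodeInput(["nam": "Ada"], as: GreetInput.self)
print(bad == nil)  // true — here defineTool would return an isError: true result
```

Every `nil` path here corresponds to a failure point that the real `defineTool` converts into an `isError: true` tool result rather than a crash.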
Some tools don't need input parameters (e.g., list operations, health checks):

```swift
let listTool = defineTool(
    name: "ListItems",
    description: "List all items.",
    inputSchema: ["type": "object", "properties": [:]]
) { (context: ToolContext) async throws -> String in
    return "No items found."
}
```

The closure receives only `ToolContext`, ignoring input entirely.

The final overload skips Codable decoding, passing the raw `[String: Any]` dictionary directly to the closure. This is useful when input field types are dynamic — e.g., ConfigTool's `value` field can be a string, number, boolean, array, object, or null:

```swift
let configTool = defineTool(
    name: "Config",
    description: "Read or write configuration values.",
    inputSchema: configSchema
) { (input: [String: Any], context: ToolContext) async -> ToolExecuteResult in
    let key = input["key"] as? String ?? ""
    let value = input["value"]  // Any type
    // ...
}
```

LLM-sent JSON field names typically use snake_case (e.g., `file_path`, `run_in_background`), while Swift convention is camelCase. Input types map between the two with a `CodingKeys` enum:

```swift
private struct BashInput: Codable {
    let command: String
    let runInBackground: Bool?

    private enum CodingKeys: String, CodingKey {
        case command
        case runInBackground = "run_in_background"
    }
}
```

This is standard Swift Codable practice — `defineTool`'s internal JSONDecoder automatically uses `CodingKeys` for field-name conversion.

Tools aren't just thrown at the LLM wholesale; the SDK has an assembly and filtering mechanism. `assembleToolPool` merges three tool sources into a deduplicated tool pool:

```swift
public func assembleToolPool(
    baseTools: [ToolProtocol],     // SDK built-in tools
    customTools: [ToolProtocol]?,  // User-defined custom tools
    mcpTools: [ToolProtocol]?,     // MCP server-provided tools
    allowed: [String]?,
    disallowed: [String]?
) -> [ToolProtocol] {
    // 1. Merge all sources: base + custom + MCP
    var combined = baseTools
    if let customTools { combined.append(contentsOf: customTools) }
    if let mcpTools { combined.append(contentsOf: mcpTools) }

    // 2. Deduplicate by name (later entries overwrite earlier ones)
    var byName = [String: ToolProtocol]()
    for tool in combined { byName[tool.name] = tool }

    // 3. Apply filtering rules
    return filterTools(
        tools: Array(byName.values),
        allowed: allowed,
        disallowed: disallowed
    )
}
```

Deduplication uses a Dictionary — same-named tools encountered later overwrite earlier ones. The priority is therefore MCP > custom > built-in: users can replace built-in tools with custom or MCP tools of the same name.

`filterTools` implements allowlist/denylist filtering:

```swift
public func filterTools(
    tools: [ToolProtocol],
    allowed: [String]?,    // Allowlist; nil or empty means no filter
    disallowed: [String]?  // Denylist; nil or empty means no filter
) -> [ToolProtocol] {
    var filtered = tools

    // Apply the allowlist first
    if let allowed, !allowed.isEmpty {
        let allowedSet = Set(allowed)
        filtered = filtered.filter { allowedSet.contains($0.name) }
    }

    // Then the denylist (denylist takes priority over allowlist)
    if let disallowed, !disallowed.isEmpty {
        let disallowedSet = Set(disallowed)
        filtered = filtered.filter { !disallowedSet.contains($0.name) }
    }

    return filtered
}
```

When both rules are present, the denylist takes priority — even a tool on the allowlist is excluded if it also appears on the denylist.

`ToolRestrictionStack` is a stack structure used by the Skills system to control tool visibility.
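Such a stack could be reconstructed along these lines — a hypothetical sketch using plain strings instead of the SDK's tool-name type, with a serial DispatchQueue for thread safety as described below (class and member names are illustrative):

```swift
import Foundation

// A hypothetical reconstruction of a ToolRestrictionStack-style type:
// a LIFO stack of allowlists, guarded by a serial DispatchQueue.
final class RestrictionStack {
    private var stack: [[String]] = []
    private let queue = DispatchQueue(label: "restriction.stack.serial")

    func push(_ allowed: [String]) {
        queue.sync { stack.append(allowed) }
    }

    func pop() {
        queue.sync { _ = stack.popLast() }
    }

    // Empty stack → nil, meaning "all tools allowed";
    // otherwise only the top restriction list is visible.
    func currentAllowedToolNames() -> [String]? {
        queue.sync { stack.last }
    }
}

let stack = RestrictionStack()
stack.push(["Bash", "Read"])   // Skill A
stack.push(["Grep", "Glob"])   // nested Skill B
print(stack.currentAllowedToolNames() ?? [])   // ["Grep", "Glob"]
stack.pop()                    // Skill B done
print(stack.currentAllowedToolNames() ?? [])   // ["Bash", "Read"]
stack.pop()                    // Skill A done
print(stack.currentAllowedToolNames() == nil)  // true → all tools allowed
```

Returning only the top of the stack (rather than intersecting all levels) is what makes inner Skill restrictions fully override outer ones.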
When a Skill configures `toolRestrictions`, it pushes restrictions before execution and pops them afterwards:

```swift
let stack = ToolRestrictionStack()
stack.push([.bash, .read])  // Skill A: only Bash and Read
stack.push([.grep, .glob])  // Skill B (nested): only Grep and Glob
// currentAllowedToolNames now returns only Grep and Glob
stack.pop()                 // Skill B done → back to Bash and Read
stack.pop()                 // Skill A done → all tools restored
```

The stack's LIFO nature ensures correct behavior for nested Skills — inner restrictions override outer ones and are automatically restored on exit. Thread safety is ensured by an internal serial DispatchQueue. The `currentAllowedToolNames` logic is simple: an empty stack returns all tools; a non-empty stack returns only the tool names in the top restriction list.

The final step is converting tools to the format required by the Anthropic API:

```swift
public func toApiTool(_ tool: ToolProtocol) -> [String: Any] {
    var result: [String: Any] = [
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.inputSchema
    ]
    if let annotations = tool.annotations {
        result["annotations"] = [
            "readOnlyHint": annotations.readOnlyHint,
            "destructiveHint": annotations.destructiveHint,
            "idempotentHint": annotations.idempotentHint,
            "openWorldHint": annotations.openWorldHint
        ]
    }
    return result
}
```

`annotations` are included only when present — saving tokens.

Tying everything together, here's a custom tool you can run directly — fetching weather:

```swift
import Foundation
import OpenAgentSDK

// 1. Define the input type
struct WeatherInput: Codable {
    let city: String
    let unit: String?  // "celsius" or "fahrenheit"

    private enum CodingKeys: String, CodingKey {
        case city, unit
    }
}

// 2. Create the tool with defineTool
let weatherTool = defineTool(
    name: "Weather",
    description: "Get current weather for a city.",
    inputSchema: [
        "type": "object",
        "properties": [
            "city": [
                "type": "string",
                "description": "City name, e.g. 'Beijing'"
            ],
            "unit": [
                "type": "string",
                "enum": ["celsius", "fahrenheit"],
                "description": "Temperature unit, defaults to celsius"
            ]
        ],
        "required": ["city"]
    ],
    isReadOnly: true,
    annotations: ToolAnnotations(
        readOnlyHint: true,
        destructiveHint: false,
        openWorldHint: true  // Accesses an external API
    )
) { (input: WeatherInput, context: ToolContext) async throws -> ToolExecuteResult in
    let unit = input.unit ?? "celsius"
    // Call a weather API (implementation omitted)
    let weather = try await fetchWeather(city: input.city, unit: unit)
    return ToolExecuteResult(content: weather, isError: false)
}

// 3. Register with the Agent
let agent = createAgent(options: AgentOptions(
    apiKey: "sk-...",
    model: "claude-sonnet-4-6",
    customTools: [weatherTool]  // Custom tools automatically join the tool pool
))
```

This tool gets merged, deduplicated, and filtered by `assembleToolPool` along with the built-in tools, then sent to the LLM. When the LLM sees the tool definition, it invokes the tool automatically whenever it needs weather data. `defineTool`'s internal Codable bridge decodes the LLM's JSON into `WeatherInput` — you never handle JSON parsing manually.

The tool system's design philosophy can be summarized in a few keywords:

Protocol-driven. `ToolProtocol` specifies only the shape of a tool (name, description, input schema, execution method), not how tools are implemented. Built-in and custom tools follow the exact same code path.

Dependency injection. `ToolContext`'s 20+ optional fields look like a lot, but each tool reads only the fields it needs. AgentTool doesn't know CronStore exists; CronCreate doesn't know SubAgentSpawner exists.

Tiered organization. The Core/Advanced/Specialist tiers aren't code layers (their code structure is identical) but a division by dependency complexity: Core tools run independently, Advanced tools need Stores, Specialist tools need more specialized domain infrastructure.

Fault tolerance first.
`defineTool` wraps all potential failure points (type casting, serialization, decoding, execution) in do/catch blocks. Any error returns `isError: true` instead of crashing. Tool errors don't propagate through the Agent Loop — the LLM receives the error message and can adjust its strategy.

The next article covers MCP integration: how the SDK connects to external tool servers, converts MCP tools to ToolProtocol, and makes them coexist with built-in tools in the Agent Loop.

Deep Dive into Open Agent SDK (Swift) series:

- Part 0: Open Agent SDK (Swift): Build AI Agent Applications with Native Swift Concurrency
- Part 1: Deep Dive into Open Agent SDK (Part 1): Agent Loop Internals
- Part 2: Deep Dive into Open Agent SDK (Part 2): Behind the 34 Built-in Tools
- Part 3: Deep Dive into Open Agent SDK (Part 3): MCP Integration in Practice
- Part 4: Deep Dive into Open Agent SDK (Part 4): Multi-Agent Collaboration
- Part 5: Deep Dive into Open Agent SDK (Part 5): Session Persistence and Security
- Part 6: Deep Dive into Open Agent SDK (Part 6): Multi-LLM Providers and Runtime Controls

GitHub: terryso/open-agent-sdk-swift