
AI News Hub

🦀 PicoClaw Deep Dive 🤖: A Field Guide to Building an Ultra-Light AI Agent in Go 🐹

DEV Community
Truong Phung

A comprehensive, actionable guide to the principles, techniques, and architecture behind sipeed/picoclaw, written so you can build a similar system from scratch.

- 🧩 What PicoClaw Is and Why It Matters
- 🎯 Design Philosophy
- 🏗️ High-Level Architecture
- 🔄 Core Concept #1: The Agent Loop & Pipeline
- 🕹️ Core Concept #2: Steering (Mid-Loop Message Injection)
- 🤝 Core Concept #3: SubTurn (Hierarchical Sub-Agents)
- 💾 Core Concept #4: Sessions & JSONL Persistence
- 🧭 Core Concept #5: Rule-Based Model Routing
- 🪝 Core Concept #6: The Hook System
- 📡 Core Concept #7: Channel Abstraction (18+ chat platforms)
- 🤖 Core Concept #8: Provider Abstraction (30+ LLMs)
- 🛠️ Core Concept #9: Tools, Skills, and MCP
- ⚡ Resource-Efficiency Techniques (the [...]

                 │ [...] + aliases      │
                 └───────────┬──────────┘
                             ▼
    ┌────────────────────────────────────────────────┐
    │              pkg/agent (the loop)              │
    │                                                │
    │  pipeline_setup → pipeline_llm →               │
    │  pipeline_execute (tools) → pipeline_finalize  │
    │                                                │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
    │  │ steering │  │ subturn  │  │  hooks   │      │
    │  └──────────┘  └──────────┘  └──────────┘      │
    │                                                │
    │        ▲                          ▲            │
    │        │ tools                    │ MCP        │
    └────────┼──────────────────────────┼────────────┘
             │                          │
     ┌───────┴────────┐         ┌───────┴────────┐
     │ pkg/tools      │         │ pkg/mcp        │
     │ fs / shell /   │         │ isolated       │
     │ hardware /     │         │ command        │
     │ search ...     │         │ transport      │
     └────────────────┘         └────────────────┘

    ┌────────────────────────────────────────────────┐
    │ pkg/providers (factory + facades)              │
    │ anthropic / openai_compat / azure /            │
    │ bedrock / oauth / cli ...                      │
    │ cooldown · ratelimiter · fallback ·            │
    │ error_classifier                               │
    └────────────────────────────────────────────────┘

Three top-level binaries are produced from cmd/:

- picoclaw: the agent itself (CLI + headless server)
- picoclaw-launcher-tui: terminal UI launcher
- membench: internal memory benchmark used to keep the [...]

- [...] Stable, opaque, the source of truth
- Legacy (agent:main:direct:user123): backward compat, resolved transparently

The JSONL backend resolves legacy aliases to canonical keys during reads and writes, so you can rename schemes without losing history.

Per session:

- .jsonl: one providers.Message per line, append-only.
- .meta.json: { summary, created_at, updated_at, line_count, skip_offset, scope, aliases }.

Why two files: messages are append-only and crash-safe; metadata is overwritten under a per-shard mutex but small enough that a torn write is recoverable from the JSONL.

"Designed around append-first durability and stale-over-loss recovery."

The allocator turns inbound metadata into scope values:

- space → :
- chat → :
- topic → topic:
- sender → canonicalized through identity-link mappings (so that a user's Telegram ID and Slack ID map to the same logical sender)

Special case: Telegram forum topics append / to chat values when topic is not an explicit dimension, preventing topic cross-talk by default.

A 64-shard mutex array (hash key → shard) serializes per-session writes without keeping an unbounded mutex map.
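As a rough illustration of the append-only JSONL file and the 64-shard mutex array described above, here is a minimal sketch. It is not PicoClaw's actual code: `Store`, `Message`, `shardIndex`, and `demo` are hypothetical names, FNV-1a is an assumed hash choice, and the `.meta.json` sidecar is left out for brevity.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"hash/fnv"
	"os"
	"path/filepath"
	"sync"
)

const shardCount = 64 // shard count taken from the text

// Message is a hypothetical stand-in for providers.Message.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// Store keeps one append-only JSONL file per session key.
type Store struct {
	dir    string
	shards [shardCount]sync.Mutex // fixed-size lock stripe, no per-key map
}

// shardIndex maps a key to a shard; FNV-1a is an assumed hash choice.
func shardIndex(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % shardCount)
}

// Append serializes writers that share a shard and adds one JSON line.
func (s *Store) Append(key string, msg Message) error {
	mu := &s.shards[shardIndex(key)]
	mu.Lock()
	defer mu.Unlock()

	line, err := json.Marshal(msg)
	if err != nil {
		return err
	}
	path := filepath.Join(s.dir, key+".jsonl")
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o600)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(append(line, '\n'))
	return err
}

// demo appends n messages concurrently, then counts the lines that parse.
func demo(n int) (int, error) {
	dir, err := os.MkdirTemp("", "sessions")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	store := &Store{dir: dir}
	errs := make(chan error, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			msg := Message{Role: "user", Content: fmt.Sprintf("msg %d", i)}
			errs <- store.Append("sk_demo", msg)
		}(i)
	}
	wg.Wait()
	close(errs)
	for err := range errs {
		if err != nil {
			return 0, err
		}
	}

	f, err := os.Open(filepath.Join(dir, "sk_demo.jsonl"))
	if err != nil {
		return 0, err
	}
	defer f.Close()
	count := 0
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		var m Message
		if err := json.Unmarshal(sc.Bytes(), &m); err != nil {
			return count, err
		}
		count++
	}
	return count, sc.Err()
}

func main() {
	n, err := demo(100)
	fmt.Println(n, err)
}
```

Because the shard array has a fixed size, memory stays constant however many sessions exist. Two unrelated keys that hash to the same shard simply wait for each other briefly.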
This is a small but important pattern: lock striping is nearly free and avoids most session-store contention bugs.

On startup the system attempts to migrate legacy JSON sessions into JSONL. If migration fails, it falls back to the legacy SessionManager rather than crash-looping the agent.

- Make session keys content-addressed (sha256 over a canonical scope signature) so renaming dimensions doesn't break history.
- Sidecar metadata is far simpler than embedding a header line in the JSONL.
- Lock striping > one big mutex > one mutex per session. 64 shards is a good default.

pkg/routing is a two-stage pipeline:

- Agent dispatch: Router picks which agent definition handles the message (rules over channel, sender, content, command prefix, etc.).
- Model routing: once an agent is chosen, the RuleClassifier decides whether to use the agent's primary (heavy) model or a globally configured cheap light model.

    {
      "routing": {
        "enabled": true,
        "light_model": "gemini-2.0-flash",
        "threshold": 0.35
      }
    }

The classifier is intentionally language-agnostic (no keyword lists), using five structural features:

| Feature | What it measures |
| --- | --- |
| TokenEstimate | Approximate token count (CJK-aware rune counting) |
| CodeBlockCount | Number of fenced code blocks in the latest message |
| RecentToolCalls | Tool invocations in the last 6 history entries |
| ConversationDepth | Total history length |
| HasAttachments | Media references or recognized file extensions |

| Signal | Weight |
| --- | --- |
| Has attachments | 1.00 |
| Code block present | 0.40 |
| Tokens > 200 | 0.35 |
| Recent tool calls > 3 | 0.25 |
| Tokens > 50 | 0.15 |
| Recent tool calls 1–3 | 0.10 |
| Conversation depth > 10 | 0.10 |

With threshold 0.35, trivial chat stays cheap; code, attachments, or active tool use trigger heavy. Long plain prompts cross at the 200-token boundary.

pkg/agent/turn_coord.go swaps the candidate provider list to agent.LightCandidates when score [...]

| Pitfall | Fix |
| --- | --- |
| [...] | [...] to chat values when topic isn't an explicit dimension. |
| Tool side effects after a user correction | Skip remaining tools on steering arrival; emit explicit skip results. |
| Orphan SubTurn results crashing parent | 16-slot result buffer + Critical: true for must-finish work. |
| context.Background() vs parent ctx confusion | Document explicitly in your SubTurn API; default to independent timeouts. |
| API keys in plaintext config | Two files: config.json + .security.yml with stricter perms. |
| Memory regressions slipping in | Ship membench and gate it in CI. |
| MIPS LE binaries refused by kernel | Patch ELF e_flags at offset 36 after build. |
| Hooks blocking turns | Per-class timeouts: observer 200ms, interceptor 5s, approval 30s. |
| Rebuilding when adding a provider | Provider config is protocol/model strings; factory dispatches at runtime. |
| Schema drift between sessions | Lazy migration in JSONL backend; never edit applied "migrations", append new ones. |
| Routing rules buried in code | Routing is data: JSON rules + features. Hot-reload friendly. |
| 30 channels each duplicating retry logic | Centralize retry/split/rate-limit in manager.go; channels send a single chunk. |
| MCP server bug killing the agent | Spawn each MCP server in an isolated process via isolated_command_transport. |
| One mutex around the session store | 64-shard mutex array on hash(key). |

If you read these files in this order, the architecture clicks fast:

- cmd/picoclaw/main.go: the boot sequence.
- pkg/bus/types.go: the typed message contract that flows through the whole system.
- pkg/agent/definition.go: what an agent is as data.
- pkg/agent/pipeline.go → pipeline_setup.go → pipeline_llm.go → pipeline_execute.go → pipeline_finalize.go: the loop.
- pkg/agent/turn_coord.go: the brains tying routing, providers, and steering together.
- pkg/agent/steering.go: the most copy-worthy single concept in the project.
- pkg/agent/subturn.go: sub-agent semantics.
- pkg/session/manager.go + jsonl_backend.go + allocator.go: durable state.
- pkg/routing/router.go + classifier.go + features.go: cheap-first routing.
- pkg/agent/hooks.go + hook_mount.go + hook_process.go: extensibility.
- pkg/channels/manager.go + base.go + interfaces.go: channel abstraction.
- pkg/providers/factory.go + cooldown.go + fallback.go + error_classifier.go: provider reliability stack.
- pkg/tools/registry.go + toolloop.go: tool execution.
- pkg/mcp/manager.go + isolated_command_transport.go: MCP integration.
- pkg/skills/registry.go + installer.go: plugin marketplace.
- Makefile: cross-compilation matrix, ELF patching, version stamping.
- docs/architecture/*.md: official narrative for steering, subturn, sessions, routing, hooks.

- Use Go. Static binaries, small RSS, uniform across architectures.
- Typed message bus with first-class Peer, Sender, MessageID.
- Pipelined agent loop: setup → LLM → tools → finalize, with a turn state struct.
- Steering: per-session FIFO queue polled at 4 checkpoints; skipped tools get explicit results.
- SubTurns with depth ≤ 3, concurrency ≤ 5, independent timeouts, Critical flag for must-finish.
- Sessions: structured SessionScope → canonical sk_v1_ key, JSONL + .meta.json, 64-shard locking.
- Routing: classifier with 5 structural features, weighted score, light_model below threshold.
- Hooks: 5 sync points + observer events, in-process or JSON-RPC over stdio, per-class timeouts.
- Channels: each in its own sub-package, embed BaseChannel, declare optional capabilities by interface, manager owns retries/splitting/rate-limit.
- Providers: factory + facades + cooldown + ratelimiter + fallback + error_classifier, configured by protocol/model strings, secrets in .security.yml.
- Tools / MCP / Skills: in-process tools for built-ins; MCP for untrusted external tools (isolated transport); skills as installable bundles from a registry.
- Bounded queues, streaming, lazy init, -ldflags="-s -w", -trimpath, membench regression gate.
- Cross-compile to amd64/arm/arm64/riscv64/mipsle + Darwin + Windows + NetBSD; patch MIPS ELF e_flags; ship a launcher that auto-picks the binary.

Build steps 1–12 from §16 in order, validate with the patterns in §17, and you have a PicoClaw-class agent.

If you found this helpful, let me know by leaving a 👍 or a comment, and if you think this post could help someone, feel free to share it. Thank you very much! 😃
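As a closing illustration of the routing item in the checklist, the weighted scoring from the routing section can be sketched as below. This is a hedged sketch rather than the project's classifier. The field names follow the feature table, while `Score` and `PickModel` are hypothetical. It assumes that signals add up, that the two token tiers and the two tool-call tiers are mutually exclusive, that the total is capped at 1.0, and that a score exactly at the threshold selects the heavy model.

```go
package main

import "fmt"

// Features holds the five structural features named in the feature table.
type Features struct {
	TokenEstimate     int
	CodeBlockCount    int
	RecentToolCalls   int
	ConversationDepth int
	HasAttachments    bool
}

// Score adds up the published weights.
// Assumptions: signals are additive, tiers of the same signal are
// mutually exclusive, and the result is capped at 1.0.
func Score(f Features) float64 {
	if f.HasAttachments {
		return 1.0
	}
	score := 0.0
	if f.CodeBlockCount > 0 {
		score += 0.40
	}
	switch {
	case f.TokenEstimate > 200:
		score += 0.35
	case f.TokenEstimate > 50:
		score += 0.15
	}
	switch {
	case f.RecentToolCalls > 3:
		score += 0.25
	case f.RecentToolCalls >= 1:
		score += 0.10
	}
	if f.ConversationDepth > 10 {
		score += 0.10
	}
	if score > 1.0 {
		score = 1.0
	}
	return score
}

// PickModel returns the light model only when the score is below threshold.
func PickModel(f Features, threshold float64, light, heavy string) string {
	if Score(f) < threshold {
		return light
	}
	return heavy
}

func main() {
	const threshold = 0.35
	cases := []Features{
		{TokenEstimate: 12},                     // short chat
		{TokenEstimate: 60, RecentToolCalls: 2}, // light tool use
		{TokenEstimate: 201},                    // long plain prompt
		{CodeBlockCount: 1},                     // code present
		{HasAttachments: true},                  // attachment
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %s\n", c, PickModel(c, threshold, "light", "heavy"))
	}
}
```

Under these assumptions the first two cases stay on the light model and the last three go to the heavy one, which matches the statement that long plain prompts cross at the 200-token boundary.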