🦀 PicoClaw Deep Dive 🤖 — A Field Guide to Building an Ultra-Light AI Agent in Go 🐹

DEV Community

Truong Phung

Apr 28, 2026, 04:54 AM

A comprehensive, actionable guide to the principles, techniques, and architecture behind sipeed/picoclaw, written so you can build a similar system from scratch.

- 🧩 What PicoClaw Is and Why It Matters
- 🎯 Design Philosophy
- 🏗️ High-Level Architecture
- 🔄 Core Concept #1: The Agent Loop & Pipeline
- 🕹️ Core Concept #2: Steering (Mid-Loop Message Injection)
- 🤝 Core Concept #3: SubTurn (Hierarchical Sub-Agents)
- 💾 Core Concept #4: Sessions & JSONL Persistence
- 🧭 Core Concept #5: Rule-Based Model Routing
- 🪝 Core Concept #6: The Hook System
- 📡 Core Concept #7: Channel Abstraction (18+ chat platforms)
- 🤖 Core Concept #8: Provider Abstraction (30+ LLMs)
- 🛠️ Core Concept #9: Tools, Skills, and MCP
- ⚡ Resource-Efficiency Techniques (the [...]

    [...]                                  + aliases │
    └────────────────────────┬───────────────────────┘
                             ▼
    ┌────────────────────────────────────────────────┐
    │              pkg/agent (the loop)              │
    │                                                │
    │  pipeline_setup → pipeline_llm →               │
    │  pipeline_execute (tools) → pipeline_finalize  │
    │                                                │
    │  ┌──────────┐  ┌──────────┐  ┌──────────┐      │
    │  │ steering │  │ subturn  │  │  hooks   │      │
    │  └──────────┘  └──────────┘  └──────────┘      │
    │                                                │
    │       ▲                            ▲           │
    │       │ tools                      │ MCP       │
    └───────┼────────────────────────────┼───────────┘
            │                            │
    ┌───────┴────────┐           ┌───────┴────────┐
    │   pkg/tools    │           │    pkg/mcp     │
    │  fs / shell /  │           │    isolated    │
    │  hardware /    │           │    command     │
    │  search ...    │           │    transport   │
    └────────────────┘           └────────────────┘

    ┌────────────────────────────────────────────────┐
    │  pkg/providers (factory + facades)             │
    │  anthropic / openai_compat / azure /           │
    │  bedrock / oauth / cli ...                     │
    │  cooldown · ratelimiter · fallback ·           │
    │  error_classifier                              │
    └────────────────────────────────────────────────┘

Three top-level binaries are produced from cmd/:

- picoclaw: the agent itself (CLI + headless server)
- picoclaw-launcher-tui: terminal UI launcher
- membench: internal memory benchmark used to keep the [...]

- [...] Stable, opaque, the source of truth
- Legacy: agent:main:direct:user123. Backward compat, resolved transparently.

The JSONL backend resolves legacy aliases to canonical keys during reads and writes, so you can rename schemes without losing history.

Per session:

- .jsonl: one providers.Message per line, append-only.
- .meta.json: { summary, created_at, updated_at, line_count, skip_offset, scope, aliases }.

Why two files: messages are append-only and crash-safe; metadata is overwritten under a per-shard mutex but small enough that a torn write is recoverable from the JSONL.

"Designed around append-first durability and stale-over-loss recovery."

The allocator turns inbound metadata into scope values:

- space → :
- chat → :
- topic → topic:
- sender → canonicalized through identity-link mappings (so that a user's Telegram ID and Slack ID map to the same logical sender)

Special case: Telegram forum topics append / to chat values when topic is not an explicit dimension, which prevents topic cross-talk by default.

A 64-shard mutex array (hash key → shard) serializes per-session writes without keeping an unbounded mutex map. This is a small but important pattern: lock striping is essentially free and fixes 99% of session-store contention bugs.

On startup the system attempts to migrate legacy JSON sessions into JSONL. If migration fails, it falls back to the legacy SessionManager rather than crash-looping the agent.

Make session keys content-addressed (sha256 over a canonical scope signature) so renaming dimensions doesn't break history. Sidecar metadata is far simpler than embedding a header line in the JSONL. Lock striping > one big mutex > one mutex per session.
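The two session-store lessons above (content-addressed keys and lock striping) can be sketched together in Go. This is a minimal sketch under stated assumptions, not PicoClaw's actual code: the SessionScope map, SessionKey, ShardedLocks, the name=value signature format, and the FNV hash are hypothetical choices made for illustration. Only the sk_v1_ prefix, sha256, and the 64-shard count come from the article.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
	"sync"
)

// SessionScope is a hypothetical stand-in for the structured scope
// (dimension name to value); the real type may look different.
type SessionScope map[string]string

// SessionKey hashes a canonical scope signature into an opaque key.
// Dimensions are sorted first so map iteration order never changes the key.
// ASSUMPTION: the "name=value" lines joined by "\n" are an illustrative format.
func SessionKey(scope SessionScope) string {
	dims := make([]string, 0, len(scope))
	for name, value := range scope {
		dims = append(dims, name+"="+value)
	}
	sort.Strings(dims)
	sum := sha256.Sum256([]byte(strings.Join(dims, "\n")))
	return "sk_v1_" + hex.EncodeToString(sum[:])
}

const shardCount = 64

// ShardedLocks is a fixed array of mutexes: memory use is constant
// regardless of how many sessions exist.
type ShardedLocks struct {
	shards [shardCount]sync.Mutex
}

// shardIndex maps a session key to one of the 64 shards.
// ASSUMPTION: FNV-1a is used here only because it is in the standard library.
func (s *ShardedLocks) shardIndex(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return int(h.Sum32() % shardCount)
}

// WithLock runs fn while holding the shard lock that owns key.
func (s *ShardedLocks) WithLock(key string, fn func()) {
	m := &s.shards[s.shardIndex(key)]
	m.Lock()
	defer m.Unlock()
	fn()
}

func main() {
	var locks ShardedLocks
	key := SessionKey(SessionScope{"chat": "direct:user123", "space": "telegram"})

	lineCount := 0 // stands in for appending lines to the session's JSONL file
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			locks.WithLock(key, func() { lineCount++ })
		}()
	}
	wg.Wait()
	fmt.Println(key[:6], lineCount) // prints: sk_v1_ 100
}
```

Two sessions that hash to the same shard briefly wait on each other, which is the price paid for never growing a per-session mutex map.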
64 shards is a good default.

pkg/routing is a two-stage pipeline:

1. Agent dispatch: Router picks which agent definition handles the message (rules over channel, sender, content, command-prefix, etc.).
2. Model routing: once an agent is chosen, the RuleClassifier decides whether to use the agent's primary (heavy) model or a globally configured cheap light model.

    {
      "routing": {
        "enabled": true,
        "light_model": "gemini-2.0-flash",
        "threshold": 0.35
      }
    }

The classifier is intentionally language-agnostic (no keyword lists), using five structural features:

| Feature | What it measures |
| --- | --- |
| TokenEstimate | Approximate token count (CJK-aware rune counting) |
| CodeBlockCount | Number of fenced code blocks in the latest message |
| RecentToolCalls | Tool invocations in the last 6 history entries |
| ConversationDepth | Total history length |
| HasAttachments | Media references or recognized file extensions |

| Signal | Weight |
| --- | --- |
| Has attachments | 1.00 |
| Code block present | 0.40 |
| Tokens > 200 | 0.35 |
| Recent tool calls > 3 | 0.25 |
| Tokens > 50 | 0.15 |
| Recent tool calls 1–3 | 0.10 |
| Conversation depth > 10 | 0.10 |

With threshold 0.35, trivial chat stays cheap; code, attachments, or active tool use trigger heavy. Long plain prompts cross at the 200-token boundary.

pkg/agent/turn_coord.go swaps the candidate provider list to agent.LightCandidates when score [...]

- [...] to chat values when topic isn't an explicit dimension.
- Tool side effects after a user correction → Skip remaining tools on steering arrival; emit explicit skip results.
- Orphan SubTurn results crashing parent → 16-slot result buffer + Critical: true for must-finish work.
- context.Background() vs parent ctx confusion → Document explicitly in your SubTurn API; default to independent timeouts.
- API keys in plaintext config → Two files: config.json + .security.yml with stricter perms.
- Memory regressions slipping in → Ship membench and gate it in CI.
- MIPS LE binaries refused by kernel → Patch ELF e_flags at offset 36 after build.
- Hooks blocking turns → Per-class timeouts: observer 200ms, interceptor 5s, approval 30s.
- Rebuilding when adding a provider → Provider config is protocol/model strings; factory dispatches at runtime.
- Schema drift between sessions → Lazy migration in JSONL backend; never edit applied "migrations", append new ones.
- Routing rules buried in code → Routing is data: JSON rules + features. Hot-reload friendly.
- 30 channels each duplicating retry logic → Centralize retry/split/rate-limit in manager.go; channels send a single chunk.
- MCP server bug killing the agent → Spawn each MCP server in an isolated process via isolated_command_transport.
- One mutex around the session store → 64-shard mutex array on hash(key).

If you read these files in this order, the architecture clicks fast:

1. cmd/picoclaw/main.go: the boot sequence.
2. pkg/bus/types.go: the typed message contract that flows through the whole system.
3. pkg/agent/definition.go: what an agent is as data.
4. pkg/agent/pipeline.go → pipeline_setup.go → pipeline_llm.go → pipeline_execute.go → pipeline_finalize.go: the loop.
5. pkg/agent/turn_coord.go: the brains tying routing, providers, and steering together.
6. pkg/agent/steering.go: the most copy-worthy single concept in the project.
7. pkg/agent/subturn.go: sub-agent semantics.
8. pkg/session/manager.go + jsonl_backend.go + allocator.go: durable state.
9. pkg/routing/router.go + classifier.go + features.go: cheap-first routing.
10. pkg/agent/hooks.go + hook_mount.go + hook_process.go: extensibility.
11. pkg/channels/manager.go + base.go + interfaces.go: channel abstraction.
12. pkg/providers/factory.go + cooldown.go + fallback.go + error_classifier.go: provider reliability stack.
13. pkg/tools/registry.go + toolloop.go: tool execution.
14. pkg/mcp/manager.go + isolated_command_transport.go: MCP integration.
15. pkg/skills/registry.go + installer.go: plugin marketplace.
16. Makefile: cross-compilation matrix, ELF patching, version stamping.
17. docs/architecture/*.md: official narrative for steering, subturn, sessions, routing, hooks.

- Use Go. Static binaries, small RSS, uniform across architectures.
- Typed message bus with first-class Peer, Sender, MessageID.
- Pipelined agent loop: setup → LLM → tools → finalize, with a turn state struct.
- Steering: per-session FIFO queue polled at 4 checkpoints; skipped tools get explicit results.
- SubTurns with depth ≤ 3, concurrency ≤ 5, independent timeouts, Critical flag for must-finish.
- Sessions: structured SessionScope → canonical sk_v1_ key, JSONL + .meta.json, 64-shard locking.
- Routing: classifier with 5 structural features, weighted score, light_model below threshold.
- Hooks: 5 sync points + observer events, in-process or JSON-RPC over stdio, per-class timeouts.
- Channels: each in its own sub-package, embed BaseChannel, declare optional capabilities by interface, manager owns retries/splitting/rate-limit.
- Providers: factory + facades + cooldown + ratelimiter + fallback + error_classifier, configured by protocol/model strings, secrets in .security.yml.
- Tools / MCP / Skills: in-process tools for built-ins; MCP for untrusted external tools (isolated transport); skills as installable bundles from a registry.
- Bounded queues, streaming, lazy init, -ldflags="-s -w", -trimpath, membench regression gate.
- Cross-compile to amd64/arm/arm64/riscv64/mipsle + Darwin + Windows + NetBSD; patch MIPS ELF e_flags; ship a launcher that auto-picks the binary.

Build steps 1–12 from §16 in order, validate with the patterns in §17, and you have a PicoClaw-class agent.

If you found this helpful, let me know by leaving a 👍 or a comment! And if you think this post could help someone, feel free to share it. Thank you very much! 😃
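To make the routing item in the checklist above concrete, here is a minimal Go sketch of a weighted structural score with a light-model cutoff. It uses the signal weights and the 0.35 threshold quoted in the article, but the combination rule is an assumption: weights of the signals that fire are added and capped at 1.00, and tiered signals (tokens, recent tool calls) contribute only their highest matching tier. The names Features, Score, and UseLightModel are hypothetical, and scores are kept in integer hundredths to avoid floating-point surprises at the boundary.

```go
package main

import "fmt"

// Features holds the five structural features named in the article.
type Features struct {
	TokenEstimate     int
	CodeBlockCount    int
	RecentToolCalls   int
	ConversationDepth int
	HasAttachments    bool
}

// Score returns the complexity score in hundredths (35 means 0.35).
// ASSUMPTION: signals are additive and the total is capped at 100.
func Score(f Features) int {
	s := 0
	if f.HasAttachments {
		s += 100
	}
	if f.CodeBlockCount > 0 {
		s += 40
	}
	switch { // ASSUMPTION: only the highest token tier counts
	case f.TokenEstimate > 200:
		s += 35
	case f.TokenEstimate > 50:
		s += 15
	}
	switch { // ASSUMPTION: only the highest tool-call tier counts
	case f.RecentToolCalls > 3:
		s += 25
	case f.RecentToolCalls >= 1:
		s += 10
	}
	if f.ConversationDepth > 10 {
		s += 10
	}
	if s > 100 {
		s = 100
	}
	return s
}

// UseLightModel reports whether the score is strictly below the threshold,
// so a score equal to the threshold already routes to the heavy model.
func UseLightModel(f Features, threshold float64) bool {
	return float64(Score(f))/100 < threshold
}

func main() {
	const threshold = 0.35
	cases := []struct {
		name string
		f    Features
	}{
		{"trivial chat", Features{TokenEstimate: 12}},
		{"short message with code", Features{TokenEstimate: 80, CodeBlockCount: 1}},
		{"long plain prompt", Features{TokenEstimate: 201}},
	}
	for _, c := range cases {
		fmt.Printf("%-24s score=%.2f light=%v\n",
			c.name, float64(Score(c.f))/100, UseLightModel(c.f, threshold))
	}
}
```

Under these assumptions the trivial chat scores 0.00 and stays on the light model, while the code message (0.55) and the 201-token prompt (0.35) both go to the heavy model, matching the behavior the article describes for a 0.35 threshold.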