The Mental Framework for Unlocking Agentic Workflows

DEV Community
Basti Ortiz

πŸ“Œ This article was originally presented as a talk at the inaugural Claude Code Manila meetup (March 5, 2026). After several requests from the attendees to share the slides, I've decided to instead publish them as a full article here. I figured that this was a more effective medium to share my insights on reliable long-running agentic workflows. One of the most common mistakes that I see people make when they've just started out chatting with AI is dump everything into the same conversation. Why? Because "it remembers better" apparently. And true enough, Claude will remember what you put in the current session. Long-running conversations can go on seemingly forever… until the "compacting conversation" alert comes up, and suddenly Claude gets a bit dumber. The reason for this, of course, is that Claude (like all current-gen LLMs) have a limited context window. I'll spare you the details because you're likely already aware of how this context window works; otherwise you wouldn't have attended a Claude Code meetup, right? Anyway, what's important to point out here is that anecdotally, LLMs like Claude have a so-called "dumb zone". Beyond 40% of the context window, "context rot" kicks in and the model's ability to reason, recall, and perform significantly degrades. That's why Claude Code automatically compacts a conversation at 85% capacity. But what this means is that whatever you were doing or talking about prior to compaction suddenly gets compressed into a lossy summary that gets prefixed at the start of the next session. What looks like a long conversation is actually just compacted summaries stitched together. Now this is typically not a big problem in Claude Chat, where most conversations are short and follow the question-and-answer format. But where you do hit the limits is when you start to use Claude for agentic workflows. 
Namely:

- Coding sessions (with Claude Code)
- Automated long-running analyses

Just to give you a picture of how bad context rot can get if we're not careful, imagine if we used Claude Code the way we would use Claude Chat: a single long conversation for everything.

- System prompt (a few thousand tokens)
- MCPs, skills, rules, etc. (a couple thousand tokens)
- Read `CLAUDE.md` (a hundred more tokens)
- Plan Mode (a few thousand more tokens)
- Explore Mode (Glob, WebSearch, Read, Bash, etc.)
- Execute Mode (Bash, Write, etc.)
- More tool calls!

You'll probably get away with one or two features. But in the limit, compaction is inevitable. Soon, your compaction will get compacted. And your compacted compactions accumulate in the conversation preamble until the summary itself takes up more than 40% of the context window! (That's happened to me before.) Without proper context management, these workflows are outright impossible.

A single parallelized Explore agent can take up as much as 60k tokens. If those 60k tokens weren't self-contained within the sub-agent's own context window, they would've been dumped onto your main session! Imagine what that looks like in a typical Plan Mode. A medium-sized feature in a fairly large codebase typically spawns 3 parallel Explore sub-agents, each of which finishes with ~60k tokens. If that were done sequentially in the main context window, that's already ~180k tokens out of the 200k budget! We haven't even gotten to the main planning work, much less the actual task!

It gets even worse: as Claude approaches its context limits, it's trained to be lazier and lazier in an effort to preserve its remaining window. Anthropic calls this context awareness; I call it laziness. 🤣

It's therefore in your best interest to keep your context lean so you can get reliable, comprehensive, and correct outputs from Claude. So, what is the correct mental framework for agentic workflows? We start with the Principle of Least Context.
Treat the main session exclusively as an orchestrator. In an ideal world, it should know nothing about the internals of the workflow. Zero data leakage is the goal. The moment you leak data into the main context window, that's precious tokens wasted on what could've been orchestration.

In practice, this is what sub-agents are for. Sub-agents let us delegate work to a dedicated context window. This is crucial because it contains and isolates the many tool calls and file reads within their own throwaway window. From the perspective of the main orchestrator, when the sub-agent's work is done, it just reads the summary/output of that work without leaking any of the intermediate discoveries and artifacts along the way. This is the Principle of Least Context at work.

Now multiply these sub-agents in parallel, and you have a powerful primitive that lets your agent analyze (practically) infinite context without intelligence degradation. If you're processing several documents every day (e.g., emails, meeting transcripts, procurement requests, Yahoo Finance news articles, feature requirements, engagement proposals, etc.), the Principle of Least Context is the key that automates reliable batch processing for you.

Let's put this into practice. Here's an example workflow that I used to do manually once a week for half a day, but now pass to Claude Code and have done within the hour. As a software engineer, I maintain several codebases at work and for my side projects. Each of those codebases has third-party dependencies: open-source code that I didn't write, but still download and depend on for my project to work. It just makes development easier than writing everything from scratch. But these dependencies need to be kept up to date for the latest features, performance improvements, bug fixes, and security patches.
The problem is that not all dependency upgrades are safe; some can actually break existing code, which then needs to be fixed after the fact. So, Basti… how do you figure out which dependencies are safe to upgrade?

In the olden days, I would scour the GitHub repositories of every single outdated dependency. The changelogs and breaking change notices are typically on the Releases page… But sometimes they're only in the `CHANGELOG.md`… But sometimes it's named `RELEASES.md`… But sometimes the upgrade is so large that there's a separate blog post about it… But sometimes the blog posts point to a dedicated migration guide in the official docs!

Multiply this workflow by 20 dependencies, and now you have an entire half-day of research ahead of you, on top of assessing the architectural impacts on the codebase, not to mention applying the necessary code fixes for the breaking changes. And this is just for a week's worth of dependencies, by the way. If you slip up and lag behind by at least one month, it's easy to be outdated by 40+ dependencies. Boy, you're in trouble now. When it comes to dependency management, it's so much easier to keep up than to catch up.

> **bastidood/dependency-wrangler**
>
> A Claude Code plugin marketplace for dependency management tools.
>
> **Installation**
>
> ```
> /plugin marketplace add bastidood/dependency-wrangler
> /plugin install dependabump@dependency-wrangler
> ```
>
> **Plugins: Dependabump**
>
> Orchestrates dependency bumping across package managers: npm, pnpm, yarn, bun, uv, and cargo.
>
> ```
> /dependabump:bump-dependencies
> ```
>
> | Flag | Effect |
> | --- | --- |
> | `--include-major` | Include major version bumps (deferred by default) |
> | `--include-patch` | Analyze patch bumps (assumed safe by default) |
>
> The workflow detects outdated packages, scrapes changelogs, assesses codebase impact, and proposes a staged upgrade plan ordered from safest to riskiest.
>
> **Warning:** Running this workflow across ~20 dependencies can consume up to half of the 5-hour rate limits on a $100 Claude Max subscription. With ~40 parallelized dependencies, you may even hit the rate limits entirely on a single run. This is why the `--include-major` and `--include-patch` flags are disabled by default. Typically, most unexpected breaking changes occur in minor version bumps anyway. But, it's still better to not…

And that's why I automated this workflow and wrote the Dependabump plugin for Claude Code. Dependabump takes everything I just explained to you and does the changelog research for all outdated dependencies in parallel sub-agents. In a typical run, Claude Code orchestrates around ~20 parallel researcher sub-agents to explore changelogs. At its peak, the workflow achieved 40 parallel sub-agents, which accomplished a full day's manual research in 15 minutes.

But our story doesn't end there. It's easy to parallelize work, but how do you consolidate the results? The (naive) default way is to dump all the summaries to the orchestrator. But this is stupid! We're back to square one, polluting the context window with changelog summaries. The better way is to have sub-agents dump their analyses into Markdown files and only tell the orchestrator about these new files. This significantly reduces the context dump in the reduction step of the workflow. Now here's the clever part: we spawn a brand new sub-agent whose sole responsibility is to read all of these `*.md` files and perform the consolidation work in its own context window. Congratulations! You just shielded the orchestrator from the context dump. 🎉

Side channels are a general pattern that you can apply in your own agentic workflows. The Principle of Least Context urges us to push heavy context into side channels such as file systems, databases, etc. By the end of the workflow, you will have:

- Probed ~20 outdated dependencies in parallel.
- Gotten accurate codebase impact assessments.
- Produced actionable migration plans and breaking change resolutions.
- But burned through half of your Claude Max (5-hour) rate limits.
This is fine because… the alternative is a slower, less efficient, less reliable, less accurate, and less useful workflow that is constantly auto-compacting. Imagine doing this with an unoptimized workflow for emails. For financial statements? For legal documents? For meeting transcripts? Suddenly it makes more sense why we need this mental framework to produce the best outputs.

And despite all that work, the main orchestration window remains at only 20–30% of its capacity, which keeps it suitable for follow-up planning and work. Imagine combing through a million tokens while still keeping your main session at 20–30%. That is peak context management.

Okay, let's take a step back, because I glossed over a few details on how to implement effective and efficient sub-agents. A while back, the Claude Code team introduced the notion of forked skills. Simply by setting your agent skill to use `context: fork`, you can invoke it as if it were in its own context window. Because this feature blurs the line between skills and sub-agents, I want to set the record straight on what I've found to be the most effective way to use these features and how to think about them in our mental framework.

The sub-agent prompt determines the "personality" of the workflow. Like most system prompts, this is where you describe roles, goals, restrictions, and tools. This is the wrong place to put your workflow steps. A system prompt is supposed to be lean guidance, much like how we strive to trim down our `CLAUDE.md`. Less is more.

Meanwhile, the skill prompt describes the exact steps of the workflow. This is where you describe (in excruciating detail) all the processes, the edge cases, and the expected output. Being an agent skill also grants you the superpower of progressive disclosure: leverage `references/`, `scripts/`, and `assets/` directories in your skills so the agent can progressively load conditional context.
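To make this concrete, here's a minimal sketch of what a forked, progressively disclosed skill might look like. The skill name, steps, and reference files are all hypothetical illustrations (not Dependabump's actual implementation); only the `context: fork` frontmatter key comes from the Claude Code feature discussed above.

```markdown
<!-- .claude/skills/summarize-upgrade/SKILL.md — hypothetical example -->
---
name: summarize-upgrade
description: Researches one outdated dependency and writes a Markdown report.
context: fork
---

1. Find the changelog (Releases page, CHANGELOG.md, blog post, or migration guide).
2. If the project uses npm, read `references/npm.md` for registry-specific notes.
3. Write the findings to a Markdown file and report only that file path.
```

Note that the file in step 2 is only ever pulled into context when its condition actually holds; that conditional loading is progressive disclosure in action.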
In Dependabump, I use this technique to load only the package manager specifics relevant to the project. Otherwise, I would've dumped them all (npm, pnpm, yarn, bun, cargo, etc.) into the core `SKILL.md`. That's just a waste of tokens.

The interplay between skills and sub-agents becomes interesting when `context: fork` is set. Per the official Claude Code docs, `context: fork` invokes a skill in a new context window, where the designated agent serves as the system prompt while the `SKILL.md` is the task delegation message. This is exactly why the skill should contain the workflow steps. Contrast this with a sub-agent that has preloaded skills: there, we have no control over the delegation message, because when a skill is invoked from within a sub-agent, the Claude Code harness generates the message for us.[^1]

| Approach | System prompt | Task | Also loads… |
| --- | --- | --- | --- |
| Skill with `context: fork` | From agent type (e.g., Explore, Plan, general-purpose) | `SKILL.md` content | `CLAUDE.md` |
| Sub-agent with `skills` field | Sub-agent's own Markdown body | Claude-generated delegation message | Preloaded skills + `CLAUDE.md` |

So, for maximum control, I recommend the paired sub-agent + skill pattern: the sub-agent is the "personality" while the skill is the "workhorse".

⚠️ The main limitation of forked skills, however, is that they cannot be invoked in parallel, unlike sub-agents. For parallel work, I still encode the workflow in the sub-agent's system prompt. This exception is unfortunately more of a Claude Code harness limitation than a conceptual one.

To keep things organized, I follow a simple naming convention: nouns for sub-agents and verbs for skills. For example, Dependabump has a `changelog-scraper` agent and an associated `scrape-dependency-changelogs` skill. They come in pairs, so name them in pairs, too. I'll admit that this is fairly verbose, which is why I advocate for wrapping single-entry-point workflows in Claude plugins so that they stay self-contained and namespaced.
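Here's roughly what the noun-named half of such a pair could look like. This is a hypothetical sketch rather than the actual Dependabump source; the `tools` and `skills` frontmatter fields are the same ones shown in the coordinator example later in this article.

```markdown
<!-- .claude/agents/changelog-scraper.md — the "personality" (hypothetical sketch) -->
---
name: changelog-scraper
description: Researches the changelog of a single outdated dependency.
tools: [WebSearch, Read, Write]
skills: [scrape-dependency-changelogs]
---

You are a careful changelog researcher. Report only what the changelog
actually states; never guess at version compatibility.
```

The verb-named `scrape-dependency-changelogs` skill then carries the detailed workflow steps, keeping this system prompt lean.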
Then, just make sure to set your "private skills" (implementation details) as `user-invocable: false`.

```
.claude/
├── agents/
│   ├── codebase-researcher.md
│   ├── changelog-consolidator.md
│   └── impact-assessor.md
└── skills/
    ├── assess-impacts/
    │   └── SKILL.md
    ├── consolidate-changelogs/
    │   └── SKILL.md
    └── research-codebase/
        └── SKILL.md
```

Of course, you don't always have to write your own bespoke sub-agent; sometimes the builtin ones like Explore, Plan, and general-purpose are enough. But I must warn you that sub-agents do not inherit the skills of the parent conversation. If you don't explicitly list the skills available to a sub-agent, the Claude Code harness intentionally makes them inaccessible. This is by design.

```yaml
name: coordinator
description: Coordinates work across specialized agents
tools: [Bash, Glob, Read]
skills: [api-conventions, error-handling-patterns] # Only these skills are visible!
```

That means the builtin sub-agents (which have no configured skills, by the way!) have zero visibility into your entire corpus of agent skills. Yeah, you heard me right: the Explore and Plan sub-agents cannot read your skills! 🤯 So to save you from hours of debugging, I recommend defining your own custom sub-agent so you can explicitly configure the available tools and skills. The alternative is finding out late into the workflow that your sub-agent has been stuck in a failure loop because it didn't have access to a particular skill.

So what did we learn?

- Context rot undermines the reliability and diligence of your agentic workflows. Therefore, the Principle of Least Context is your best friend. Defend your orchestrator's context window with your life!
- Use side channel techniques to shield your agents from context dumps.
- Implementation-wise: use a two-pronged agent + skill approach in your workflows for maximum control over task delegation.
- Use nouns for sub-agents and verbs for skills to enforce a semantic naming convention.
- If possible, wrap these in a Claude plugin to namespace entry points and hide private skills.

There's so much more to discuss, like deeper progressive disclosure techniques, but let's save that for the next meetup. In the meantime, I hope you've learned a lot from this demo. And please do install Dependabump in your projects. Star ⭐ the GitHub repo and share it with your colleagues. I know how much of a hassle it is to keep up with dependencies. I hope my little plugin bridges that gap for you and your team so you don't have to worry about outdated dependencies ever again.

[^1]: This table was adapted from the official Claude Code documentation.