CLMA Frame Test

DEV Community

Robin King

May 4, 2026, 10:20 AM

CLMA vs Web Chat: Putting Iterative Verification to the Test

Posted on May 6, 2026 · #CLMA #MultiAgent #CodeGeneration #EventSourcing #Comparison #Python

All code is open source on GitHub: github.com/kriely/CLMA

This is a companion piece to Building CLMA: A Self-Verifying Multi-Agent Framework from Scratch. In that article, I described the framework. Here, I put it to the test: head to head against a plain web chat, same model, same problem.

The same LLM (DeepSeek) was tasked with writing the same code, with no human intervention on either side. Two questions:

Q1: Thread-safe bounded blocking queue (put/get with timeout)

Q5: Event sourcing framework for a bank account (events, replay, serialization, optimistic concurrency, business rules, freeze/unfreeze)

For Q5, the CLMA version went through 3 automated iteration rounds (Solver → Verifier → Refiner → Verifier → Refiner → Verifier → Evaluator). The web chat version was a single-shot output.

For Q1, both implementations passed all 12 test cases: basic put/get, blocking/unblocking behavior, timeout, edge cases (maxsize=1, maxsize=0), queue state queries, and invalid capacity. 12/12 pass for both. On the surface, a draw. But the engineering quality tells a different story.

    # Two separate Conditions: put and get don't contend
    self.not_empty = threading.Condition(self._lock)
    self.not_full = threading.Condition(self._lock)

    # time.monotonic(): immune to system clock adjustments
    remaining = timeout
    while self.full():
        if remaining is not None:
            if remaining [...]

    [...] None:
        if isinstance(event, Deposited):
            self.balance += event.amount
        elif isinstance(event, Withdrawn):
            self.balance -= event.amount
        elif isinstance(event, Frozen):
            self.is_frozen = True
        elif isinstance(event, Unfrozen):
            self.is_frozen = False  # ← Added by Verifier
        else:
            raise ValueError(...)

The web version has a clean architecture: a proper Event base class, a register_event decorator, a payload() abstraction, and a serialization round-trip. But it has no Unfrozen event.
    @register_event
    class AccountFrozen(Event):
        def __init__(self, aggregate_id: str):
            ...

    # ... no Unfrozen counterpart exists

The freeze() method works, but there's no unfreeze(). Once frozen, the account stays frozen forever.

| Category | CLMA | Web Chat |
| --- | --- | --- |
| Event basics (IDs, timestamps, types) | ✅ | ✅ |
| Serialization / deserialization | ✅ | ✅ |
| Event replay (deposit 100+50, withdraw 30 = 120) | ✅ | ✅ |
| Business rules (no negative, no overdraft) | ✅ | ✅ |
| Freeze → reject withdrawal | ✅ | ✅ |
| Unfreeze → allow operations again | ✅ | ❌ Missing |
| Optimistic concurrency | ✅ | ✅ |

Both frameworks pass all standard event sourcing tests. But the missing Unfrozen event in the web chat version is not a cosmetic issue; it's a domain modeling gap. In any real banking system, frozen accounts need a thaw mechanism.

The third iteration round is where the value shows. The Verifier's feedback was:

"The freeze flow is incomplete. Freezing is an operation that must be reversible. Consider adding an Unfrozen event and updating the aggregate to apply it."

A human reviewer would spot this too. But the CLMA Verifier catches it automatically, in seconds, with no developer in the loop. This is the difference between code review as a process and code review as a downloaded prompt.

| | Q1 (Blocking Queue) | Q5 (Event Sourcing) |
| --- | --- | --- |
| CLMA | 12/12 ✅ + better design | Full feature set ✅ |
| Web Chat | 12/12 ✅ + usable but less robust | Missing Unfrozen event ❌ |

For simple, well-defined problems (Q1), a single-shot chat prompt gets you 90% of the way. The CLMA advantage is marginal: better engineering choices, but the output is functionally equivalent.

For complex, multi-faceted problems (Q5) where completeness matters (domain events, edge cases, business rules), the iterative verification loop earns its keep. The 3 rounds of automated review caught a real domain modeling gap that a single prompt missed. Not because the LLM couldn't write an Unfrozen event, but because no single prompt can anticipate all the completeness conditions of a non-trivial domain.
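To make the unfreeze gap concrete, here is a minimal sketch of an event-sourced account with a reversible freeze. It is not the code from 3.py or 4.py. The event names (Deposited, Withdrawn, Frozen, Unfrozen) and the balance and is_frozen fields follow the excerpt above; everything else (the Account class, BusinessRuleError, ConcurrencyError, the expected_version parameter) is a hypothetical name chosen for illustration. It also assumes that a frozen account rejects withdrawals only, since that is the one rule the comparison states explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


@dataclass(frozen=True)
class Frozen:
    pass


@dataclass(frozen=True)
class Unfrozen:
    pass


class BusinessRuleError(Exception):
    """Hypothetical error type for a rejected command."""


class ConcurrencyError(Exception):
    """Hypothetical error type for a stale expected_version."""


class Account:
    def __init__(self) -> None:
        self.balance = 0
        self.is_frozen = False
        self.version = 0  # number of events applied so far
        self.events = []  # full history, in order

    def apply(self, event) -> None:
        """Update state from one event; shared by commands and replay."""
        if isinstance(event, Deposited):
            self.balance += event.amount
        elif isinstance(event, Withdrawn):
            self.balance -= event.amount
        elif isinstance(event, Frozen):
            self.is_frozen = True
        elif isinstance(event, Unfrozen):
            self.is_frozen = False
        else:
            raise ValueError(f"unknown event: {event!r}")
        self.events.append(event)
        self.version += 1

    def _check_version(self, expected_version) -> None:
        # Optimistic concurrency: reject the command if the caller's view is stale.
        if expected_version is not None and expected_version != self.version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {self.version}"
            )

    def deposit(self, amount, expected_version=None) -> None:
        self._check_version(expected_version)
        if amount <= 0:
            raise BusinessRuleError("deposit must be positive")
        self.apply(Deposited(amount))

    def withdraw(self, amount, expected_version=None) -> None:
        self._check_version(expected_version)
        if self.is_frozen:
            raise BusinessRuleError("account is frozen")
        if amount <= 0:
            raise BusinessRuleError("withdrawal must be positive")
        if amount > self.balance:
            raise BusinessRuleError("insufficient funds")
        self.apply(Withdrawn(amount))

    def freeze(self, expected_version=None) -> None:
        self._check_version(expected_version)
        self.apply(Frozen())

    def unfreeze(self, expected_version=None) -> None:
        self._check_version(expected_version)
        self.apply(Unfrozen())

    @classmethod
    def replay(cls, events):
        """Rebuild an account from its event history alone."""
        account = cls()
        for event in events:
            account.apply(event)
        return account
```

Replaying Deposited(100), Deposited(50), Withdrawn(30) through this sketch yields a balance of 120, matching the replay row in the table. Dropping the Unfrozen class and the unfreeze() method leaves every other behavior intact, which is why the gap is easy to miss without a reviewer asking whether freezing is reversible.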
The pattern is clear: generation quality is already good. Verification quality is where the gap is. And verification is exactly what CLMA automates.

| File | Description |
| --- | --- |
| 1.py | CLMA: bounded blocking queue |
| 2.py | Web chat: bounded blocking queue |
| 3.py | Web chat: event sourcing framework |
| 4.py | CLMA: event sourcing framework (3 iterations) |
| test_compare.py | Q1 test suite: 12 cases for both |
| test_q5_compare.py | Q5 test suite: auto-detects class names |

All comparison files are in the CLMA repository.
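For readers who want to experiment with the Q1 design choices discussed earlier, the following is a minimal sketch of a bounded blocking queue that uses two conditions on one lock and computes timeouts from time.monotonic(). It is not the code from 1.py or 2.py. The class name BoundedBlockingQueue, the return-False-on-timeout behavior of put(), the TimeoutError raised by get(), and the decision to reject maxsize below 1 are all assumptions made for illustration; the implementations in the repository may treat maxsize=0 differently.

```python
import threading
import time
from collections import deque


class BoundedBlockingQueue:
    """Illustrative bounded queue; names and error handling are assumptions."""

    def __init__(self, maxsize: int) -> None:
        if maxsize < 1:
            # Assumption: capacities below 1 are treated as invalid here.
            raise ValueError("maxsize must be at least 1")
        self._maxsize = maxsize
        self._items = deque()
        self._lock = threading.Lock()
        # Both conditions share one lock, but producers and consumers wait on
        # different conditions, so notify() wakes only the side that can proceed.
        self.not_empty = threading.Condition(self._lock)
        self.not_full = threading.Condition(self._lock)

    def put(self, item, timeout=None) -> bool:
        """Add an item; return False if the queue stays full past the timeout."""
        # A fixed deadline on the monotonic clock survives spurious wakeups
        # and is unaffected by system clock adjustments.
        deadline = None if timeout is None else time.monotonic() + timeout
        with self.not_full:
            while len(self._items) >= self._maxsize:
                if deadline is None:
                    self.not_full.wait()
                else:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        return False
                    self.not_full.wait(remaining)
            self._items.append(item)
            self.not_empty.notify()
            return True

    def get(self, timeout=None):
        """Remove and return an item; raise TimeoutError if none arrives in time."""
        deadline = None if timeout is None else time.monotonic() + timeout
        with self.not_empty:
            while not self._items:
                if deadline is None:
                    self.not_empty.wait()
                else:
                    remaining = deadline - time.monotonic()
                    if remaining <= 0:
                        raise TimeoutError("queue is empty")
                    self.not_empty.wait(remaining)
            item = self._items.popleft()
            self.not_full.notify()
            return item
```

Recomputing the remaining time from a fixed deadline on each loop pass, rather than decrementing a counter, keeps the total wait bounded by the requested timeout even when a thread wakes up and finds the queue still full or still empty.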