Automated tests are required now
Many teams still test their software by having a person click around in a staging environment and report bugs back. Or by a developer doing the same thing in a local environment. This has always been a bad practice. It's becoming an untenable one.

For a long time, the case for an automated test suite was a productivity argument. You could ship without tests (plenty of companies did, plenty still do), but you paid for it in slower iteration, scarier deploys, longer regression cycles, and the slow accretion of fear around the parts of the codebase nobody wanted to touch. Manual QA worked, in the sense that it caught some bugs some of the time. It just didn't scale: every new feature meant a longer test plan, every release meant a longer freeze, and every refactor was a gamble.

Then agents started writing meaningful amounts of the code. A coding agent will happily produce more code in an hour than a developer used to ship in a week. If you're paying attention, that should sound less like a productivity win and more like a stress test on every part of your engineering process that wasn't designed for that volume. Code review becomes a bottleneck. Manual QA becomes a much worse bottleneck; the human in the loop who has to click through forty flows after every change is now the slowest-moving part of a system whose other parts have all gotten dramatically faster.

But it's not just speed. It's signal. An agent that writes a change has no way to know whether the change works. It can read the code, it can reason about it, but it cannot verify it until something runs. If the only thing that can tell the agent whether the change works is a person opening the app in staging, then either you've put a human on the critical path of every single change, defeating the point, or the agent ships the change without proof and you find out later, in production, that something subtle broke.

Tests give the agent a way to prove its work. That's what they've always done for humans, too: the value didn't change, the volume did. But when you have agents producing code at a rate that manual QA can't possibly keep up with, "we'll just test it by hand" stops being a tradeoff and starts being a non-answer.

There's a second, quieter problem. Agents pattern-match on the codebase they're working in. If the codebase has tests, the agent will write tests, because that's the convention. If the codebase has no tests, the agent will not write tests, because that's also the convention. The codebase teaches the agent what "done" looks like. This means an untested codebase doesn't just lack tests; it actively trains every contributor, human or otherwise, that tests aren't part of the work. The longer this goes on, the more entrenched it gets, and the harder it is to break out of, because the new code being written assumes the absence of tests and is shaped in ways that make testing harder.

This is the most common reason teams stay untested: the existing codebase wasn't built for it, and the gap between zero and "tested" looks impossibly large. It isn't, and you don't have to close it all at once.

Start with characterization tests. These are tests that don't try to specify what the code should do; they pin down what it currently does. You run the existing code, you observe its outputs, you write a test that asserts those outputs. The test is now a tripwire: if you change the code's behavior, even by accident, the test will tell you. It doesn't matter if the current behavior is right or wrong; you're not making a moral claim about the code, you're making a factual one about what it does today.
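To make that concrete, here's a minimal sketch of what a characterization test can look like, written here in Python with pytest; the module, the function `legacy_shipping_cost`, and the recorded values are hypothetical stand-ins for whatever your code happens to do today:

```python
# test_characterization.py
# A characterization test: run the existing code, record what it does today,
# and assert exactly that. No judgment about whether the behavior is correct.
import pytest

from billing import legacy_shipping_cost  # hypothetical existing function


@pytest.mark.parametrize(
    "weight_kg, country, observed",
    [
        # Values captured by running the current code, not by reading a spec.
        (0.5, "US", 4.99),
        (2.0, "US", 7.49),
        (2.0, "DE", 12.80),
        (0.0, "US", 4.99),   # surprising? maybe. it's what the code does today.
    ],
)
def test_shipping_cost_matches_current_behavior(weight_kg, country, observed):
    assert legacy_shipping_cost(weight_kg, country) == observed
```

Every row in that table is an observation, not a requirement. When one breaks, the question isn't "is this correct?" but "did I mean to change this?"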
Once you have characterization tests around the parts that matter most, you've bought yourself the ability to change those parts safely. From there, you keep going. The next feature gets real tests. The next bug fix gets a regression test. The most-important, most-changed, most-feared module gets enough coverage that you can finally refactor it. You don't need 100% coverage; you need enough coverage in the right places that the work you're actually doing is protected.

The deeper objection isn't that there are no tests, it's that the code resists them. Functions are 600 lines long and reach out to half the system. Database calls are sprinkled inline. Globals are mutated from anywhere. The class you'd want to test takes a configuration object in its constructor that itself takes the entire universe. You've tried, and writing a single useful unit test required mocking nine things, and you're not even sure the test is testing what you thought it was.

This is real. Some codebases genuinely are structured in a way that makes testing painful. But "untestable" is almost always "untested in this shape," and the path from one to the other is the same iterative path you took to get coverage on the parts you could already test, just with a small refactoring step folded in.

The move you make over and over is: introduce a seam. A seam is a place where you can substitute behavior without changing surrounding code. You don't fix the whole module to test one piece of it; you isolate the piece you care about by pulling it through one well-chosen seam, and leave the rest alone for now.

A handful of techniques come up again and again (a sketch combining a few of them follows below):

- Extract a function. Pull a chunk of logic out of a larger function so it can be called on its own. The extracted function takes its inputs as parameters, returns a value, and is trivial to test. Often this single move is the entire refactor.
- Pass dependencies in. Instead of reaching for a database, a clock, or an HTTP client inside the function, accept them as arguments. The production caller passes the real thing; the test passes a fake.
- Wrap external systems behind a thin interface. Don't test directly against a library or service you don't control. Wrap it in your own small interface that says exactly what your code needs from it, and substitute a fake implementation in tests.
- Parameterize the side effect. If a function reads a file, accept the contents as a parameter. If it asks the clock for now, accept now as a parameter. The "where it comes from" question moves up one layer; the function itself becomes pure.
- Separate the decision from the action. Split "compute what should happen" from "make it happen." The decision function is pure and easy to unit-test; the action function is thin and verified by a smaller number of integration tests.

The first test in a stretch of code like this is the most expensive: you're paying the seam-creation cost, the fake-setup cost, and the "what does this function actually do" cost all at once. The second test is dramatically cheaper, because most of that work has already been done. By the fifth or sixth, you're moving at normal speed, and the code around your seam is meaningfully cleaner than it was before. That isn't a coincidence. Code that's easy to test tends to be code with explicit inputs, narrow responsibilities, and few hidden dependencies: the same properties that make code easy to read and easy to change. Working toward testability is working toward better design; the test is what tells you you've gotten there.
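Here's a sketch of what a couple of those moves can look like together. The invoice example, the repository and notifier interfaces, and the fakes are all hypothetical; the shape is the point: the decision becomes a pure function, the clock and the store become parameters, and the test passes fakes instead of standing up infrastructure.

```python
# Before, the logic reached for the database and the clock inline, so testing it
# meant standing up both. After introducing two seams, the decision is a pure
# function and the dependencies are parameters.
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Invoice:
    id: int
    due: date
    paid: bool


def select_overdue(invoices, today, grace_days=14):
    """Pure decision: which invoices should be escalated? Easy to unit-test."""
    cutoff = today - timedelta(days=grace_days)
    return [inv for inv in invoices if not inv.paid and inv.due < cutoff]


def escalate_overdue(repo, notifier, today):
    """Thin action: fetch, decide, act. Covered by a few integration tests."""
    overdue = select_overdue(repo.unpaid_invoices(), today)
    for inv in overdue:
        notifier.send_reminder(inv.id)
    return len(overdue)


# --- tests ---------------------------------------------------------------
class FakeRepo:
    def __init__(self, invoices):
        self._invoices = invoices

    def unpaid_invoices(self):
        return [i for i in self._invoices if not i.paid]


class FakeNotifier:
    def __init__(self):
        self.sent = []

    def send_reminder(self, invoice_id):
        self.sent.append(invoice_id)


def test_select_overdue_respects_grace_period():
    today = date(2024, 6, 1)
    invoices = [
        Invoice(1, due=date(2024, 5, 1), paid=False),   # overdue
        Invoice(2, due=date(2024, 5, 25), paid=False),  # inside grace period
        Invoice(3, due=date(2024, 4, 1), paid=True),    # paid, ignore
    ]
    assert [i.id for i in select_overdue(invoices, today)] == [1]


def test_escalate_overdue_notifies_each_overdue_invoice():
    repo = FakeRepo([Invoice(1, due=date(2024, 1, 1), paid=False)])
    notifier = FakeNotifier()
    assert escalate_overdue(repo, notifier, today=date(2024, 6, 1)) == 1
    assert notifier.sent == [1]
```

The production caller still passes the real repository, notifier, and today's date; only the tests substitute fakes, and the seam is the only thing that had to change.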
When even that's too hard at first, write a slower, broader test. An end-to-end test that drives the system through a real database is worse than a fast unit test in almost every way (slower, flakier, less precise about what failed), but it's better than no test. It gives you a tripwire. With the tripwire in place, you can refactor the inside toward something easier to cover with smaller tests, and you'll know if you broke anything along the way.

The honest version of "we can't test this code" is "we can't test this code without changing it." That's true, and the answer is: change it. Not all at once. Just the part you need to test today, in the smallest way that gives you a foothold. Tomorrow you'll have a foothold and a test, which is exactly the position you need to be in to take the next step.

Once tests are in place, things you couldn't reasonably do before become possible. Refactoring becomes a normal activity instead of a heroic one, because the tests catch you when you slip. Test-driven development becomes available. You can write the test first, watch it fail, make it pass, and trust the result, which is a fundamentally different experience from writing code and hoping. Designs improve, because code that's easy to test tends to be code with clean boundaries and explicit dependencies, and writing tests pushes you toward that shape whether you intended it or not.

The system gets healthier in a way that compounds. It gets more predictable, because behavior is pinned down. More robust, because regressions get caught. More well-defined, because the tests become an executable specification of what the code is supposed to do. The codebase starts answering questions instead of raising them.

The other piece of the on-ramp is what you do every time something breaks. When a bug is reported, or worse, when one slips into production, the temptation is to fix the code and move on. Don't. Write the test first: the test that reproduces the bug, fails because of it, and passes once the fix is in. Now the bug isn't just fixed, it's fenced. That exact regression cannot happen again without something explicitly noticing.

Over time, this turns the test suite into an accumulated record of every mistake the system has ever made. The bugs that have already happened are unusually likely to happen again (the same subtle interaction, the same edge case, the same off-by-one), and each one you've fenced off is a class of failures that can no longer eat your time. A team that does this consistently will find its bug reports start looking different: fewer "this used to work," more genuinely new issues.

This composes naturally with characterization tests. Both pin down what is rather than specify what should be: one captures current behavior, the other captures broken behavior that's been corrected. Together they're how a codebase that started without tests becomes one with meaningful coverage where it matters.
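As a sketch of what fencing a bug looks like, assume a hypothetical report that signups crashed for users with single-word names; the helper `split_full_name` and the failure are invented, but the move is the same regardless: reproduce first, then fix.

```python
# test_regression_names.py
# Regression test written from the bug report, before the fix.
# It reproduces the failure, so the same mistake can't come back silently.
from names import split_full_name  # hypothetical helper that had the bug


def test_single_word_name_does_not_crash():
    # Bug: users with a single-word name hit an IndexError during signup.
    first, last = split_full_name("Cher")
    assert first == "Cher"
    assert last == ""


def test_existing_two_word_names_still_work():
    # Guard the behavior the fix must not disturb.
    assert split_full_name("Ada Lovelace") == ("Ada", "Lovelace")
```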
The deeper thing tests give you is strictness. They are a forcing function: the code has to actually work, in a specific way, on specific inputs, every time the suite runs. Vague intentions don't pass tests. Hand-waving doesn't pass tests. "It worked when I tried it" doesn't pass tests. The bar is concrete and the bar is enforced automatically.

In a world where more and more of your code is being written by something that doesn't share your intuition, your context, or your sense of what "obviously shouldn't break" means, strictness is the thing that keeps the system coherent. Tests are one of the best-leveraged ways to get it. Types are another. Linters and formatters are smaller versions of the same idea. All of them push the codebase toward a state where the rules are explicit and the machine, not the reviewer's memory, enforces them.

Testing used to be a discipline you adopted to make your team more productive. It's becoming a discipline you adopt to keep your codebase functional at all. The teams that have tests are going to absorb the throughput of coding agents and turn it into shipped, working software. The teams that don't are going to drown in unverified changes and spend their time chasing bugs the suite would have caught.

You don't have to write all the tests today. You do have to start. Pick the most important module. Add characterization tests. Refactor under their cover. Move to the next module. Keep going. It's not optional anymore, and pretending it is just means the codebase will keep teaching everyone, including the agents, that testing isn't part of the job. It is.
