Can Biologists Rewrite the Genome’s Spaghetti Code?
What if biology stopped being something we study and started becoming something we design? That’s the premise of Adrian Woolfson’s new book, On the Future of Species: Authoring Life by Means of Artificial Biological Intelligence, which published on 28 April from MIT Press. He argues that advances in AI and DNA synthesis are pushing biology toward an engineering paradigm, one in which scientists can generate new genetic sequences and eventually build organisms to order. He calls this emerging capability artificial biological intelligence, or ABI, a catchall term for systems that can design, construct, and ultimately “boot up” living things.

That vision runs into a basic problem: Evolution didn’t produce clean, modular systems. It produced genomes shaped by billions of years of incremental change, with overlapping functions and little of the tidy structure that engineers rely on. Some synthetic biology researchers have tried to “refactor” genetic code (the same way engineers restructure computer code) by reorganizing genomes to make them easier to understand and manipulate. But how far can that approach go? And what would it take to make biology predictable enough to engineer?

In a conversation with IEEE Spectrum, Woolfson lays out both the promise and the limits of designing life.

You describe the genome as “spaghetti code” produced by evolution. What makes biology so inherently hostile to traditional engineering principles?

Adrian Woolfson: In human-made machines, the components are typically orthogonal. Every component has a predetermined function. And if the component breaks, guess what? You can just replace it, or in some cases repair it. But sadly, biology doesn’t work like that. In biology, we’re talking about a complex network with emergent behaviors, which are built upon tiny contributions from many, many components. Biology has this requirement to be robust and to be able to deal with damage in an efficient way.
It also always had to build upon preexisting architectures. It can never reinvent. Biological machines are this complex entanglement of history and current design, and we have design components that an engineer would find risible. If you were to take the human genome and look at it from an engineering perspective, you’d say, “My God, what an absolute mess.” Because it was built in an opportunistic, incremental manner with no foresight or intentionality.

How are synthetic biologists trying to improve this code? Can you explain how researchers are refactoring genomes?

Woolfson: Drew Endy was a pioneer. He took a bacteriophage and he said, “What if we treat this as a bit of spaghetti code, and we literally clean it up and refactor it and reorganize it into a more user-friendly configuration?” Now, sadly, he had the idea way in advance of there being technologies that made that a particularly easy thing to do. But he pioneered that computer code approach to genomes and the idea that you could refactor them. Genomes have not been refactored for around four billion years. Imagine if you had a piece of computer code that hadn’t been refactored for four billion years.

How far have researchers gotten with this effort?

Woolfson: The best example might be the synthetic yeast genome project known as Sc2.0, which was pioneered by Jef Boeke in New York City. It has taken him around 15 years, and he has slowly been assembling all these synthetic chromosomes into a single organism. What he’s done is more than refactoring; it’s redesigning really. For example, yeast has 16 chromosomes, and he has built an entirely new 17th synthetic chromosome. In separate work, he showed that you could join the 16 chromosomes up into two massive chromosomes. That’s a massive reconfiguration of the way in which the genetic material is stored. But when you start to mess around with these genomes and reconfigure them, inevitably you introduce bugs into the code.
And those bugs often impair functionality and growth. It’s not that you couldn’t redesign totally without creating a growth impediment, it’s just that you need to invest the time to identify the optimal way to do it. Of course, AI wasn’t around when Boeke started, and it makes all of that so much easier. AI is going to have a huge impact on our ability to turn DNA into a predictive engineering material.

AI-Powered Artificial Biological Intelligence

Speaking of AI, you introduce the concept of artificial biological intelligence (ABI). What specific capabilities will AI give us that we don’t have today?

Woolfson: Before AI, we didn’t have the ability to design DNA at scale. We couldn’t invent totally new DNA sequences that performed functions at the level of a biological entity. Now we have these so-called genome language models, which are a bit like the chatbots that we use to manipulate text. But instead of manipulating the 26 letters of the English alphabet, they manipulate the four letters of the language of DNA.

When we manipulate the language of DNA, we need to have a very wide context window, because unlike text, where most of the meaning is in sentences or paragraphs, in DNA distant regions can talk to one another. So we need to have AI that can discern those action-at-a-distance relationships. In the case of one particular genome language model, Evo 2, it uses an architecture that has a context window of a million base pairs. That means it can see how base pairs a million bases away from one another are interacting.

Designing the code is only half the battle. How are researchers tackling the bottleneck of physically manufacturing DNA at scale?

Woolfson: Another crucial thing that wasn’t present in the past is the ability to write DNA at scale rapidly, efficiently, at low cost, and of any degree of complexity. When you bring together these two capabilities of design and construction, you become an engineer.
We’ve achieved cost reduction with a technology called Sidewinder, which enables us to build DNA in a massively parallel manner and thereby hugely reduces the cost and improves the scalability of DNA construction. That alone makes the proposition of using DNA as an engineering material far more feasible.

Once you have designed and synthesized the DNA, what does it take to boot up a living organism?

Woolfson: That’s probably the most difficult bit. Because right now we have no idea how to build an artificial cell. Craig Venter showed that you can destroy the genome in a bacterium and put in a new one. In other words, the cell behaves like a nanocomputer and a genome behaves like software. But getting genomes into cells is not trivial.

The term “ABI” addresses the design capability and the buildout capability, but it also encompasses the ability to then boot that up into a living thing. If you have all those capabilities, you’re in full mastery of biology as a technology. And all of a sudden, DNA becomes a programmable material which you can manipulate in a predictive manner.

Biology as the Next Engineering Material

If researchers gain that mastery, what will be possible?

Woolfson: My prediction is that within 50 years, biology will be the engineering material of choice, and many of the people reading this article will become bioengineers. Biology can deliver most of the functionality that materials deliver; for example, spider silk has the tensile strength of steel. When we redesign it using AI, it might get to a point where it’s five times the tensile strength of steel. And biology, of course, has the additional advantage that it can generate intelligent materials. So imagine if you could have an intelligent form of steel. How would an engineer go about utilizing that in buildings?

What is the single hardest technical problem preventing you from designing a functional multicellular organism from scratch?
Woolfson: I think it’s our inadequate knowledge of the grammar of life. AI turns out to be a great tool for unpicking those grammatical rules. It looks at huge databases and can discern the patterns within those databases. We won’t be able to design a complex multicellular organism until we can speak the language of DNA more fluently, and to do that we need to understand the grammar, and to understand the grammar we need to interrogate more complex and more nuanced databases. We need to be grammar hunters. Every time we destroy a species, we’re destroying a page of the grammar book. We need to pull all the information together into a grammar book.

Finally, as you begin this journey into engineering life, what are the realistic failure modes?

Woolfson: I can interpret “failure mode” in two ways. One is a kind of mechanical failure: As you strip away all of this non-orthogonality, the system becomes brittle, because biological machines are designed not to fail and they’ve got all these overlapping fail-safe mechanisms.

The other way in which these things could fail is by being dangerous. We don’t understand ecosystems. They’re incredibly difficult to compute. So if we release engineered organisms into complex ecosystems, they could create havoc. And obviously, these technologies themselves are inherently dangerous in the wrong hands. So, we need to learn how to use them safely, responsibly, ethically, transparently, and equitably in a way that benefits society.
