Science

How much of a reality is AI-enabled biology?

The Evo 2 project uses AI to develop a novel bacteriophage.

In 1935, a virus incapable of infecting humans was isolated from the sewers of Paris. Over the next ninety years, it became one of biology’s favourite experimental systems, and its genome was the first DNA genome ever sequenced. Last year, it earned a distinction of another kind: researchers at the Arc Institute used it as a template for the first viable bacteriophage genomes ever designed by an artificial intelligence.

This achievement is striking. It is also a useful illustration of a question that now runs through much of AI-enabled biology: what is the difference between producing a working biological design and understanding why it works?

The genome language model, Evo 2, was trained on roughly 9 trillion base pairs of DNA sourced from across the tree of life. Given a relatively long genetic prompt, Evo 2 generated complete, novel genomes for bacteriophages, viruses that infect bacteria. Researchers built the most promising designs, each carrying all 11 required genes, assembled them in competent bacteria, and observed new phage particles proliferate.

Sixteen of these bacteriophages, none of which matched anything in known biological databases, infected laboratory strains of E. coli, replicated inside their hosts, and inhibited bacterial growth.

On its surface, this is an impressive result! It has been received in some quarters as evidence that generative biology has had an upgrade. If AI can write a working virus, then perhaps new antibiotics, industrial microbes, and other engineered organisms are simply a matter of scale and compute.

However, the numbers tell a more complicated story.

Evo 2 generated a thousand candidate genomes per prompt. After computational filtering, 302 were thought worth synthesising. Of those, 285 were successfully assembled into physical DNA and put into bacteria, where 16 ultimately proved capable of replication: a success rate of roughly six percent. The previous tally for AI-designed viruses was zero, so this result, which would not usually be cause for celebration, marks an exciting step forward – but one that comes with caveats.
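The arithmetic behind that figure is easy to check. The sketch below is only an illustration: it uses the three counts quoted above, while the stage labels and function names are invented for the example and are not taken from the study. The thousand-per-prompt figure is left out because the number of prompts is not stated here.

```python
# Funnel counts as quoted in the article; stage labels are illustrative only.
FUNNEL = [
    ("selected for synthesis", 302),
    ("assembled and put into bacteria", 285),
    ("capable of replication", 16),
]


def stage_rates(funnel):
    """Return (from_stage, to_stage, fraction) for each consecutive pair of stages."""
    return [
        (a_name, b_name, b_count / a_count)
        for (a_name, a_count), (b_name, b_count) in zip(funnel, funnel[1:])
    ]


def overall_rate(funnel, start=0):
    """Fraction of the designs at stage `start` that reach the final stage."""
    return funnel[-1][1] / funnel[start][1]


if __name__ == "__main__":
    for a_name, b_name, rate in stage_rates(FUNNEL):
        print(f"{a_name} -> {b_name}: {rate:.1%}")
    print(f"viable / assembled:   {overall_rate(FUNNEL, start=1):.1%}")
    print(f"viable / synthesised: {overall_rate(FUNNEL, start=0):.1%}")
```

Whichever denominator is used, assembled genomes or all genomes sent for synthesis, the rate rounds to between five and six percent, so "roughly six percent" is a fair summary of the assembled-to-viable step.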

To ensure host specificity, Evo 2 was asked to make the designs target a particular harmless strain of E. coli, while avoiding a known close relative. When tested against a panel of E. coli strains, the phages worked to near perfection, except that 15 of the 16 also showed measurable, albeit weaker, infectivity in a strain that had not been written into the constraints.

That finding does not mean the system failed, nor does it prove that AI models cannot design highly specific biological agents. It does suggest, however, that generating a viable phage and engineering its precise biological behaviour are two different challenges.

The model produced something that looks like a phage without reasoning about a phage the way a microbiologist would, as a biological system whose complex behaviour arises from interactions with its environment. From vast patterns of existing sequences, the model has learned which DNA sequences tend to occur close together and can recombine those evolutionary solutions into plausible new arrangements.

While the model is doing something beyond rote repetition, it remains a genome language model, and as Philip Ball argues in his book, “How Life Works”, life cannot be reduced to DNA sequences alone. The difficulty of extracting truly innovative value from Evo 2 is driven home by Patrick Hsu, one of the model’s own contributors, who said of using the model, “it’s like reading Russian, except 1% of the words are in English.”

Ask for a generic phage and it can draw upon its training data. Ask for a phage that activates under blue light and infects only bacteria that have just metabolised lactose – the kind that could potentially have medical applicability – and the limitations surface. Here, invention depends on verified understanding of multiple interacting biological mechanisms at a higher order.

Whether current generative models can make that transition remains an open question. Evo 2 is evidence that AI can produce a novel, functional genome – and one can assume it will be the first of many to come.

To explore what invention in genome design looks like, it is worth turning attention to a project that has been making slower, less fashionable headlines for around 15 years.

The Synthetic Yeast Genome Project, or Sc2.0, has been pursuing the audacious goal of redesigning and chemically synthesising, from scratch, every one of the 16 chromosomes of baker's yeast. All 16 have been completed individually. The final stage of assembling them into a single living cell, alongside a 17th chromosome carrying all the cell's transfer RNAs (tRNAs), is now underway.

Many of the project’s design decisions have no equivalent in nature. Scientists relocated the tRNAs onto a dedicated chromosome. They decided, too, to strip out repetitive elements that evolution had left behind. They incorporated a system called SCRaMbLE, which allows researchers to induce controlled genome shuffling. 

Some of these choices worked immediately. Many led to what one project lead called “soul-destroying” debugging, as mistakes were particularly challenging to locate in the genome. In one synthetic chromosome, a single recoded codon folded a messenger RNA into the wrong shape and crippled the cell: the sequence looked correct, but the consequence was fatal! 

A simple lesson about comprehension is worth carrying out of the Sc2.0 project: the researchers knew how to trace each failure back to the mechanism producing it, allowing them to fix issues as they arose. The temptation of the generative AI movement is to mistake fluent imitation for genuine comprehension. Sc2.0 refuses it. Understanding a system’s outputs, its failure points, and its behaviour outside training conditions is a design specification that cannot be ignored.

Notably, Sc2.0 spanned laboratories on four continents and trained a generation of synthetic biologists – Professor Tom Ellis’s lab at Imperial College London among them – to follow the causal chain from sequence to function to phenotype, and fix what breaks because they know why it broke.

That discipline must carry over to AI-enabled biology, where errors are far less forgiving. The developers of Evo 2 understood this, excluding human-infecting viruses from its training data for safety. However, a striking counterexample came when a non-biologist tinkering with an AI agent over a single weekend managed to fine-tune the model to recover exactly this forbidden capability.

Still, it would be unfair to cast Evo 2 as merely a cautionary tale. The same model is being put to a use where its limitations look less like a flaw.

At the Mayo Clinic, the model has been turned on the problem of variants of uncertain significance, i.e. genetic mutations that a sequencing report flags but cannot interpret. The model has generated structured, plain-language hypotheses about the possible mechanism of some 2.7 million such variants, a volume no human team could work through by hand.

Here, crucially, the output is not treated as a conclusion. It is a hypothesis, explicitly labelled as one, handed to a clinician who will test it against evidence. That may be the most useful way to think about generative biology today. A phage that surprises its designers and an AI-generated explanation that a doctor checks are the same kind of thing: eloquent and plausible but requiring further investigation.

AI-enabled biology is here to stay, and many more frontier labs are shipping models of their own, while guardrails multiply at a slower pace. What separates a useful tool from a dangerous one is whether someone can still explain the result. So far, in the room that matters, someone always can.

From Issue 1899

5 June 2026
