Go Faster

Digital viruses and digital vaccines

How 7.5kb of data brought the world to its knees, and how only the digital twin can save it.

By Stephen Ferguson

The late Nobel Laureate Sir Peter Medawar once described viruses as “bad news wrapped in protein”. In the case of the SARS-CoV-2 virus, that “bad news” has prematurely ended millions of lives and continues to cause billions of dollars’ worth of economic damage that will haunt us for many years into the future.

The virus itself is surprisingly simple - just 29 proteins wrapped around a single strand of RNA. The entire genome for the SARS-CoV-2 virus is just 30,000 letters long. You can type the whole thing on about 13 sheets of paper. In total it contains just 7.5 kilobytes of data. By comparison, the human genome is more than 3 billion letters long - which is about the same as a stack of 1000 King James Bibles or about 725 MB of data.
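The arithmetic behind those figures can be sketched in a few lines of Python. This is only an illustration, and it assumes each genetic letter is stored in the minimum of two bits (four possible bases), with the genome lengths rounded as in the text; the function name `genome_bytes` is my own.

```python
import math

# Four possible bases (A, C, G, U/T) means log2(4) = 2 bits per letter.
# Assumption: raw 2-bit packing, no compression and no metadata.
BITS_PER_BASE = math.log2(4)

def genome_bytes(num_bases: int) -> float:
    """Raw storage needed for a genome of num_bases letters, in bytes."""
    return num_bases * BITS_PER_BASE / 8

sars_cov_2 = genome_bytes(30_000)        # rounded genome length from the text
human = genome_bytes(3_000_000_000)      # "more than 3 billion" letters, rounded down

print(f"SARS-CoV-2 genome: {sars_cov_2 / 1000:.1f} kB")
print(f"Human genome: {human / 1e6:.0f} MB ({human / 2**20:.0f} MiB)")
```

With these rounded inputs the virus comes out at 7.5 kB and the human genome at 750 MB (about 715 MiB), the same order as the figure quoted above; the exact number depends on the genome length used and on whether megabytes are counted in powers of ten or two.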

It turns out that this idea of considering a virus as a unit of data storage is more than just a metaphor; the solution to this crisis depends on a whole generation of “digital vaccines”, some of which were designed in silico (on a computer) rather than in vitro (in a lab), and approved in less than a year - something that would have been unheard of even a few years ago.

However, the challenge does not end with the discovery of vaccines. The world has never before needed to implement mass immunization of its entire adult population. That amounts to over 10 billion doses, at two doses per adult, even neglecting the enormous potential for spoilage and wastage of temperature-sensitive vaccines. This represents the biggest manufacturing and logistics effort since the end of the Second World War: scaling up production from a small quantity of “working vaccine” in a lab to billions of doses and, eventually, successfully inoculated patients.

This article is the story of the virus, the vaccines, and the digital twin technology behind the effort to vaccinate the whole world.


Anatomy of a virus

So how did less than 10 kilobytes of data manage to bring the whole world to its knees? Partly because this apparent simplicity is deceptive. Designed by the forces of evolutionary selection - which work rapidly in a virus that is copying itself billions of times a second in host organisms around the world - the SARS-CoV-2 virus is a carefully engineered invasion and replication machine.

Each of its 29 proteins has a finely tuned role to play in fulfilling the objective of invading the cells of a host organism and injecting 7.5 kB of deadly data (a single strand of RNA) into the cell, hijacking the cell’s protein production machinery to produce multiple copies of the virus (which in turn can then infect other cells, and eventually other hosts, transported in mucus particles in virus-induced coughing and sneezing fits).

There are saboteur proteins that disrupt the natural production of proteins by the infected cell, forcing it to make virus proteins instead. There are copying proteins (NSP12), which help to make new copies of the RNA genome that will end up inside new copies of the virus, and camouflage proteins (NSP10, NSP16) that disguise the virus from the body’s immune system. There are even proofreading proteins (NSP14) that check for replication errors in copies of the genome, and escape proteins that drill holes in the cell membrane (ORF3a) and evade the cell’s anti-escape mechanisms (ORF7a).

However, of all these proteins, the most prominent are the so-called spike proteins, which form a “crown” of extensible bulbous protrusions from the virus surface (each about 20 nm long), and for which the “coronavirus” is named. These spike proteins are the principal means by which the SARS-CoV-2 virus invades host cells in the human respiratory system. First, they latch onto a host cell, binding with ACE2 enzymes that are found on cells in the upper airway, before penetrating the host cell’s membrane.

Without these spike proteins, the SARS-CoV-2 virus would be unable to enter the cells of the host organism and would therefore be rendered harmless - viruses can only replicate inside a host cell. Because of this, spike proteins are the target of vaccination efforts.

Some of the 29 proteins that make up the SARS-CoV-2 virus


Vaccines

Vaccines work by tricking the immune system into “recognizing” a virus that it has never been exposed to. By providing it with a “preview” of the virus structure, vaccines allow the immune system of the patient to design powerful antibodies that can be deployed to neutralize the real virus if the individual is ever infected. The World Health Organization estimates that 2-3 million lives a year are saved as a result of vaccination.

There are currently about 70 vaccines in various stages of clinical trials, with many more in preclinical trials. While the ultimate aim of each is the same, there are four different approaches to vaccination.

Inactivated or Attenuated Coronavirus Vaccines: traditional vaccines, in which the patient is inoculated with a strain of the SARS-CoV-2 virus that has been weakened through genetic engineering, or has been inactivated through exposure to chemicals, and is, therefore, unable to replicate.

Protein-Based Vaccines: rather than inoculating the patient with the complete virus, these “sub-unit” vaccines contain copies of some of the signature proteins. In the case of SARS-CoV-2, the candidate vaccines use the spike protein, sometimes connected to the shell of a nanoparticle. Since no genetic material is included (RNA or DNA) these proteins cannot replicate.

Genetic Vaccines: these work by injecting a portion of the SARS-CoV-2 genome, encoded in either RNA or DNA, surrounded by a protective lipid bubble. Once absorbed by host cells, these genetic instructions tell the cell to manufacture multiple copies of certain SARS-CoV-2 proteins (usually spike proteins) for several days. The synthetic genetic material is only a partial copy of the genome and cannot replicate.

Viral Vector Vaccines: these are similar to the genetic vaccines described above, but instead of using a bubble of lipids to carry the genetic material (which is usually DNA), they use a deactivated or weakened carrier virus (for example, adenoviruses or influenza viruses). Once absorbed by host cells, the RNA or DNA instructs the cell to manufacture multiple copies of certain SARS-CoV-2 proteins (usually spike proteins) for a number of days. The synthetic genetic material is only a partial copy of the genome and cannot replicate.

At the time of writing, only two vaccines have received widespread emergency approval and are being deployed in large-scale inoculation programs around the world. Both are mRNA genetic vaccines: one from Pfizer-BioNTech, which had an efficacy of 95 percent in phase three trials, and another from Moderna, which demonstrated an efficacy of 94.5 percent in trials. These vaccines are the first genetic vaccines ever to reach wide-scale approval. Genetic vaccines have the advantage of being entirely synthetic, and so do not require the cultivation of large numbers of host cells and viruses.

Designed in silico, genetic vaccines also have the advantage of being relatively easy to “update” should mutations in the virus reduce their efficacy. The major disadvantage of the two approved vaccines is that mRNA is fragile, which means that the vaccines must be stored and transported at very low temperatures. This presents a significant logistical challenge.

The viral vector vaccine from Oxford-AstraZeneca has also reached emergency approval in over 100 countries. It shares many of the advantages of the existing genetic vaccines, in that the genetic instructions that it carries are designed in silico and are relatively easy to update. However, because it uses a genetically engineered adenovirus (a common cold-like virus) to carry DNA into host cells, it is significantly more robust than the mRNA vaccines. This means that the Oxford-AstraZeneca vaccine can be transported and stored using conventional refrigeration facilities.

From a manufacturing perspective, all of the vaccines described above can be broadly split into two groups: the genetic mRNA vaccines, which are entirely synthetic, and all of the others, which are biological in nature and require the cultivation of enormous quantities of host cells in which viruses can replicate.

The coronavirus spike protein (red) mediates the virus entry into host cells. It binds to the angiotensin-converting enzyme 2 (blue) and fuses viral and host membranes.

Virus farming for vaccine production

Except for entirely synthetic (and still rather novel) mRNA genetic vaccines, the production of vaccines begins with the selection of a “seed strain” of a virus. That might be a strain of SARS-CoV-2 that has been weakened through genetic engineering, or a standard strain that will eventually be inactivated through exposure to chemicals as part of a downstream process. Or, in the case of a viral vector vaccine, it might be an adenovirus that has been genetically engineered to contain some of the SARS-CoV-2 genome.

In either case, the aim is to generate enough of the virus, initially to supply animal trials, and then in increasing amounts to support the various stages of clinical trials, each requiring progressively larger population sizes. Eventually, for a successful vaccine, production needs to be scaled to provide enough virus for billions of doses.

Because viruses only replicate inside the cells of a suitable host organism, non-synthetic vaccine production relies on identifying, and cultivating, a suitable living medium in which large numbers of viruses can be produced. For many viruses, that “host cell platform” for vaccine production is embryo-containing chicken eggs; however, SARS-CoV-2 vaccines typically use cultures of human kidney cells. For viral vector vaccines, these cells are genetically engineered to allow the adenoviruses to replicate in a way that they do not in regular human cells.

These “virus-infected” cells are grown inside bioreactors: nutrient-enriched mixing environments in which the temperature, oxygen and pH are precisely controlled to maximize the yield of both the cells and viruses (or, in the case of sub-unit vaccines, proteins manufactured by the virus-infected cells).

However, producing enough virus in a lab to produce a small number of vaccines is only a small part of the challenge. In order to produce billions of doses, vaccine manufacturers need to scale up those processes to an industrial scale in multiple production facilities spread around the world. Therein lies the challenge – biological processes do not scale linearly with the geometric size of the bioreactor. Maintaining the ideal cultivation conditions developed in a 10-liter bench experiment in a 4000-liter industrial bioreactor is a difficult engineering problem, and one of the main reasons behind the production delays that have been extensively reported in the press and that have caused considerable international political tension.

Of all the vaccine production facilities in the world, the largest by volume is the Serum Institute of India, which is licensed to produce the Oxford-AstraZeneca vector vaccine and the (as yet unapproved) Novavax sub-unit vaccine. It aims to produce billions of low-cost vaccine doses to supply India and low and middle-income countries. The Serum Institute recently purchased six 4000-liter single-use bioreactors from ABEC, the American manufacturer of the world’s largest bioreactors, allowing it to double the production of vaccine per unit floor space - to achieve the lowest possible cost per dose.

The challenge is that biological processes scale at different rates as the geometric size of the bioreactor is increased; the design of an industrial scale bioreactor will be significantly different from that of the laboratory scale bioreactors in which the original process was designed.
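To give a feel for why quantities “scale at different rates”, here is a minimal sketch using one common textbook rule of thumb for stirred tanks. It is not ABEC's method, which the article does not describe. It assumes geometrically similar vessels, fully turbulent flow, and constant impeller power per unit volume, so that power scales as N³D⁵ and volume as D³; the function name and dictionary keys are my own.

```python
def scale_up_ratios(small_volume_l: float, large_volume_l: float) -> dict:
    """Ratios (large / small) under constant power-per-volume scale-up.

    Assumptions (illustrative only): geometric similarity, turbulent flow,
    power ~ N^3 * D^5, volume ~ D^3, blend time ~ 1 / N.
    """
    volume_ratio = large_volume_l / small_volume_l
    length_ratio = volume_ratio ** (1 / 3)       # vessel and impeller diameter
    # Constant P/V  =>  N^3 * D^2 = const  =>  N ~ D^(-2/3)
    speed_ratio = length_ratio ** (-2 / 3)       # impeller rotation rate
    tip_speed_ratio = speed_ratio * length_ratio  # tip speed ~ N * D
    blend_time_ratio = 1 / speed_ratio            # slower stirring, slower blending
    return {
        "volume": volume_ratio,
        "length": length_ratio,
        "impeller_speed": speed_ratio,
        "tip_speed": tip_speed_ratio,
        "blend_time": blend_time_ratio,
    }

# The 10-liter bench vessel and 4000-liter production vessel from the text.
ratios = scale_up_ratios(10, 4000)
for name, value in ratios.items():
    print(f"{name:>15}: x{value:.2f}")
```

Under these assumptions, a 400-fold increase in volume makes the vessel only about 7.4 times wider, yet the impeller turns at roughly a quarter of the speed, its tip moves nearly twice as fast, and blending takes almost four times as long. No single setting keeps all of these at their bench-scale values, which is the sense in which the process does not scale linearly.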

Paul Kubera, vice president of process technology at ABEC, told us, “A typical scenario might involve a project that has moved from the laboratory bench at the tens-of-liters scale to process development, which may be operating on a few hundreds-of-liters scales. And then into production where they need to ramp up by thousands of liters in multiple units.”

The challenge with bioreactors is keeping the host-cell-platform alive. According to Kubera, “The growth of the organism must be supported – it needs food, a carbon source and to take in oxygen and give off carbon dioxide. It is critical to be able to deliver a known amount of oxygen in a given timeframe and remove carbon dioxide for all the organisms in the vessel.”

In the traditional vaccine development process, which took many years, the scale-up process was often lengthy and expensive, and based mostly around trial and error. However, ABEC, like most of the industry, now uses extensive computational fluid dynamics simulations to perfect scaling processes.

“With Simcenter, we can run a computational simulation of the laboratory configuration and confirm the same results. We can then run a large-scale simulation and be confident that the measured performance of the delivered equipment will track with expectations. As an example, we demonstrated that we can cut blend time 50 percent by using laboratory tests to screen options and Simcenter simulation to extend the results.”

Platform incompatibilities present another potential obstacle. No single facility can produce billions of doses and so industrial scale production means ensuring similar production standards across multiple facilities. ABEC assists biopharmaceutical customers in bridging different platforms, from small to large, to ensure the same results are achieved at different facilities. The challenge is to create a uniform environment that is consistent for each organism in a vessel at each of those scales across platforms.

Purely synthetic vaccine manufacture

The first two vaccines to reach wide-scale approval were both mRNA genetic vaccines, which are entirely synthetic and do not require the cultivation of viruses or host cells. This is an enormous advantage from a manufacturing point of view because synthetic processes are easier to scale than biological ones.

Rather than use an actual virus to inject DNA into the cells of the patient, mRNA vaccines (and potentially DNA genetic vaccines) carry the genetic information inside a bubble of lipid nanoparticles – the diameter of each is measured in tens of nanometers (there are a million nanometers in a single millimeter). In order to manufacture this type of vaccine, the thread of genetic material must be combined with the lipid nanoparticles in a carefully controlled mixing process. Because of the scale of the particles involved, and the fragility of the RNA, macroscale bulk mixing is not effective.

A small lab scale bioreactor used to breed viruses in the early stages of vaccine production

Instead, the manufacture of synthetic genetic vaccines relies on the emerging discipline of “microfluidics”, in which individual streams of RNA (or possibly DNA) and lipid particles are brought together at a nanoscale. Mixing is achieved using “chaotic mixers”, which employ complicated three-dimensional grooves as mixing structures. Care must be taken to ensure that individual lipid nanoparticles do not clog up these grooves and stagnate the mixing process.
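One way to see why microchannels need special mixing structures is to compare Reynolds numbers, the ratio of inertial to viscous forces in a flow. The sketch below is purely illustrative: every dimension, speed and fluid property in it is an assumed round number for a water-like liquid, not a value taken from any real vaccine process.

```python
# Assumed water-like fluid properties (illustrative values only).
DENSITY = 1000.0      # kg/m^3
VISCOSITY = 1.0e-3    # Pa*s

def channel_reynolds(width_m: float, height_m: float, velocity_m_s: float) -> float:
    """Reynolds number for flow in a rectangular channel.

    Uses the hydraulic diameter 2*w*h / (w + h) as the length scale.
    """
    hydraulic_diameter = 2 * width_m * height_m / (width_m + height_m)
    return DENSITY * velocity_m_s * hydraulic_diameter / VISCOSITY

def stirred_tank_reynolds(impeller_diameter_m: float, revs_per_second: float) -> float:
    """Impeller Reynolds number for a stirred tank: rho * N * D^2 / mu."""
    return DENSITY * revs_per_second * impeller_diameter_m ** 2 / VISCOSITY

# Hypothetical microchannel: 200 micrometers wide, 100 micrometers high, 0.1 m/s.
micro_re = channel_reynolds(200e-6, 100e-6, 0.1)
# Hypothetical production tank: 0.5 m impeller turning twice per second.
tank_re = stirred_tank_reynolds(0.5, 2.0)

print(f"Microchannel Reynolds number: {micro_re:.1f}")
print(f"Stirred tank Reynolds number: {tank_re:.0f}")
```

With these assumed numbers the microchannel sits at a Reynolds number of roughly 13, deep in the smooth, laminar regime, while the stirred tank is around 500,000 and fully turbulent. In laminar flow, neighboring streams slide past each other without turbulent eddies to fold them together, which is why grooved chaotic mixers are used to stretch and fold the streams instead.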

The whole field of microfluidics depends entirely on multiphase computational fluid dynamics simulations to design and validate the devices, and synthetic vaccine production would be impossible without digital simulation.

Unsurprisingly, these human-made “virus-like particles” are less robust than actual viruses, which have been shaped by millions of years of evolution. A consequence of this is that mRNA vaccines need to be stored at much lower temperatures than more traditional vaccines (which only require standard refrigeration). Simulation has been used extensively in designing the cold chain logistics processes that ensure that these vaccines reach the patient in good condition.

The future

In this article I have highlighted a few of the ways that simulation is helping to scale up the global vaccination effort. There are many more.

When they come to tell the story of Covid-19, and all of the terrible devastation that it caused, I hope enough attention is given to the monumental scientific and engineering effort involved in developing the many vaccines used to combat it, in months rather than years. To me it is one of the greatest illustrations of engineering innovation, and an illustration of how much we can achieve as a species if we all work together.

For manufacturing of the vaccine, large scale bioreactors are used, often measured in thousands of liters