Virtual Cells: Predict, Explain, Discover

Arthur McAuthor
Valence Labs, Recursion

Drug discovery is fundamentally a process of inferring the effects of treatments on patients, and would therefore benefit immensely from computational models that can reliably simulate patient responses, enabling researchers to generate and test large numbers of therapeutic hypotheses safely and economically before initiating costly clinical trials. Even a more specific model that predicts the functional response of cells to a wide range of perturbations would be tremendously valuable for discovering safe and effective treatments that successfully translate to the clinic. Creating such virtual cells has long been a goal of the computational research community that unfortunately remains unachieved given the daunting complexity and scale of cellular biology. Nevertheless, recent advances in AI, computing power, lab automation, and high-throughput cellular profiling provide new opportunities for reaching this goal. In this perspective, we present a vision for developing and evaluating virtual cells that builds on our experience at Recursion. We argue that to drive evidence-based discovery, virtual cells must accurately 1) predict the functional response of a cell to perturbations and 2) explain that the predicted response is a consequence of modifications to key biomolecular interactions. We then describe a lab-in-the-loop approach for generating novel insights with virtual cells, introduce key principles for designing therapeutically-relevant virtual cells, and advocate for biologically-grounded benchmarks to guide virtual cell development. Finally, we make the case that our approach to virtual cells provides a useful framework for building other models at higher levels of organization, including virtual patients. We hope that these directions prove useful to the research community in developing virtual models optimized for positive impact on drug discovery outcomes.

1 Introduction

Drug discovery is fundamentally a process of accurately inferring the effects of treatments on patients.
Unfortunately, it is notoriously costly and riddled with failure (Wong et al., 2019; Jones & Wilsdon,
2018; DiMasi et al., 2016; Paul et al., 2010). Despite decades of innovation, roughly nine of every ten
drugs that enter clinical trials today fail to receive approval, representing substantial losses in R&D
investment, unfortunate delays in addressing patient needs, and a significant deficit in our collective
understanding of human physiology and pathology. Nevertheless, the impact of each approved therapy
on the lives of patients, particularly therapies addressing unmet needs, is hard to overstate. Any
approach that meaningfully improves our ability to correctly predict the effect of treatments in
patients would therefore be of immense value to patients and drug discoverers alike.

One such approach is the computational simulation of therapeutic interventions in virtual patients, that
is, mechanistic models that account for the physiological factors necessary to accurately infer
patient-level response to treatments. Virtual patients could revolutionize drug discovery by enabling researchers to
generate and test large numbers of therapeutic hypotheses safely and economically before initiating
costly clinical trials. However, though simulation has already revolutionized a number of industries
(Winsberg, 2019; Singh et al., 2022), examples of practical and effective simulation in drug discovery are
rare due to the challenges inherent in modeling the scale and complexity of biological systems (Ideker
et al., 2001; Goldberg et al., 2018; Georgouli et al., 2023). Even the simulation of a single prokaryotic
cell is daunting (Karr et al., 2012), and simulating the full complexity of a eukaryotic cell lies beyond
current capabilities (Georgouli et al., 2023). Nevertheless, the ability to faithfully simulate the effect
of therapeutic interventions at any level of biological organization—cell, tissue, organ, patient—in a
corresponding virtual model has the potential to significantly improve drug discovery outcomes.

Box 1: Virtual Cells Predict, Explain and Discover

Predict the functional response of cells to perturbations across diverse biological contexts,
timepoints and modalities. This includes modeling gene expression, morphology, protein activity,
and other phenotypic changes under genetic or chemical interventions.

Explain these responses by identifying key biomolecular interactions, causal pathways, and
context-dependent regulatory mechanisms. Correct explanations support predictions by enabling
generalization beyond the training data and enable reasoning about counterfactuals and the
response of biological systems at higher levels of organization.

Discover new biological insights and actionable therapeutic hypotheses through lab-in-the-loop
experimentation, using the virtual cell as a world model for systematic hypothesis generation,
testing, and refinement.

1.1 The Predict-Explain-Discover capabilities of virtual models
What exactly makes virtual models so potentially valuable for drug discovery? They would enable
researchers to accurately

  1. predict the effects of interventions on the model system,
  2. explain the predicted response in terms of one or more changes to supporting mechanisms, and
  3. discover novel insights by generating and testing therapeutic hypotheses.

To better understand what we mean, we give two examples using well-known cancer treatments. These
are meant only to motivate the concepts, not to imply that they would represent novel discoveries
if made today:

Pembrolizumab. A virtual patient could be used to discover that Pembrolizumab is a likely cancer
treatment by predicting a reduction in average tumor volume of a patient in response to treatment with
the drug and explaining that the drug blocks the Programmed Cell Death Protein 1 (PD-1) immune
checkpoint.

Vorinostat. A virtual cell could be used to discover that Vorinostat is an effective cancer treatment
by predicting the up-regulation of tumor suppressor genes in response to treatment with the drug and
explaining that the drug inhibits histone deacetylase (HDAC) enzymes.

Throughout this paper, we refer to these three capabilities as the Predict-Explain-Discover, or PED,
capabilities of virtual models. We claim that it is precisely the ability of virtual models to accurately
predict outcomes and explain them mechanistically that would make them powerful tools to discover
novel therapeutic insights. Figure 1 provides an illustration of the PED capabilities for a virtual cell
(see also Box 1).
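
To make the relationship between the three capabilities concrete, the following minimal Python sketch treats predict and explain as methods of a model and discover as a search built on top of them. It is an illustration under stated assumptions, not an implementation of any actual virtual cell: the names VirtualModel, ToyVirtualCell, Hypothesis, and discover are hypothetical, the "model" is a hand-written lookup table seeded only with the Vorinostat example above, and "vehicle control" is a made-up placeholder candidate.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Hypothesis:
    """A candidate therapeutic hypothesis: what to perturb, what happens, and why."""
    perturbation: str
    predicted_response: str
    mechanism: str


class VirtualModel(Protocol):
    """Hypothetical interface: any object offering predict and explain."""

    def predict(self, perturbation: str) -> str: ...

    def explain(self, perturbation: str) -> str: ...


class ToyVirtualCell:
    """Lookup-table stand-in for a learned model (illustration only)."""

    # Assumption: a single hand-coded entry taken from the Vorinostat example.
    _KNOWN = {
        "Vorinostat": (
            "up-regulation of tumor suppressor genes",
            "inhibition of histone deacetylase (HDAC) enzymes",
        ),
    }
    _DEFAULT = ("no predicted change", "no known mechanism")

    def predict(self, perturbation: str) -> str:
        # Predict: functional response of the cell to the perturbation.
        return self._KNOWN.get(perturbation, self._DEFAULT)[0]

    def explain(self, perturbation: str) -> str:
        # Explain: mechanism proposed to underlie the predicted response.
        return self._KNOWN.get(perturbation, self._DEFAULT)[1]


def discover(
    model: VirtualModel,
    candidates: Iterable[str],
    desired_response: str,
) -> List[Hypothesis]:
    """Discover: keep candidates whose predicted response matches the goal,
    pairing each prediction with its explanation."""
    hypotheses = []
    for perturbation in candidates:
        response = model.predict(perturbation)
        if response == desired_response:
            hypotheses.append(
                Hypothesis(perturbation, response, model.explain(perturbation))
            )
    return hypotheses


if __name__ == "__main__":
    hits = discover(
        ToyVirtualCell(),
        candidates=["Vorinostat", "vehicle control"],  # hypothetical candidates
        desired_response="up-regulation of tumor suppressor genes",
    )
    for hit in hits:
        print(f"{hit.perturbation}: {hit.predicted_response} via {hit.mechanism}")
```

In this sketch, discover adds no modeling power of its own. Its output is only as trustworthy as predict and explain, which mirrors the claim that accurate prediction and mechanistic explanation are what make discovery possible.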

1.2 Virtual models without fully mechanistic simulation
Given the current difficulty of building fully mechanistic virtual model simulators, the question naturally
arises: can we build these models without resorting to fully mechanistic simulation? We argue that
four recent advances put this objective within reach today, particularly for virtual cells:

  1. modern AI and machine learning (AI/ML),
  2. modern compute infrastructure,
  3. automated labs for high-throughput cellular data generation, and
  4. the proliferation of cellular omics datasets.

We briefly describe each of these advances below:

AI/ML. While traditional computational drug discovery techniques have struggled to deal with
biological complexity (Sams-Dodd, 2005; Swinney & Anthony, 2011; Waring et al., 2015), recent