Interpreting whole genome variants
Since our founding years ago, we have ceaselessly pointed out (and built AI solutions to address) the critical importance of target selection, both for solving complex disease and for drug discovery in general. The wrong target choice accounts for a large proportion of the high failure rates in the clinic, i.e., failures due to imprecise mechanisms (especially in neurological diseases). There is sustained evidence that even a modicum of genetic support gives a target more than twice the likelihood of clinical success compared with targets that lack such genetic backing.
Thankfully, genetically-defined target selection is slowly gaining more attention. This is an encouraging development. However, there is an erroneous yet pervasive belief that a genetically-defined target can only be a target in which the genetic mutation itself is present.
This belief is likely inherited from monogenic diseases, where a single penetrant change in a single gene is the cause, and from cancer diagnostics, where an exome panel can be sufficient to identify the driver mutations that denote cancer. In practice, this belief means that the majority of efforts to find disease-causing variants analyze only the protein-coding exome. That amounts to analyzing about 1% of the genome and ignoring the remaining 99%, where the critical regulatory elements and non-coding RNAs that control the expression of protein-coding regions actually reside.
Many others focus on only 1% of the genome not because they fail to understand the critical importance of the remaining 99%. They do understand it, and they appreciate that complex, polygenic diseases are caused by changes in many different genetic entities that each contribute to the disease. But they are held back by the extreme difficulty of deciphering the great noise of the genome and by the sample size problem, hence the popular description of the whole genome problem as an “impossible” challenge.
We have previously reported on taking on these hard scientific challenges. We are here to report that the challenges do not stop at genomic noise and sample size. Being able to analyze the whole genome effectively and to study causal mutations across its entirety also calls into question the popular belief that a genetically-defined target is necessarily a target where the genetic mutation resides.
With causal variants at whole genome scale in hand, the challenge shifts to interpreting them. Our Bergspitze capability reduces the number of variants by at least an order of magnitude compared to what a simple GWAS analysis produces in the same context, but a large number of variants must still be interpreted to arrive at an actionable target selection. As is to be expected, most of these variants lie outside the exome.
We built Franklin to help with the interpretation challenge. Franklin serves as a virtual representation of the body’s universe of interactions. Through Franklin’s multi-parametric model, causal genetic variants are assigned a function and traced in depth through various genetic and biomolecular pathway graphs at an unprecedented scale, ultimately allowing us to prioritize and select actionable disease targets. Put differently, Franklin helps address questions like: Is the causal variant’s position a functional element? What kind of functional element? What does it control or affect? What is the direction of control? What is the magnitude? How does that relate to and mesh with other such interactions and events taking place at other identified nodes? How do the causal variants as a whole collectively affect the biomolecular pathway(s)? Which key nodes, when modulated, affect that flow and so can serve as actionable disease targets?
Let’s recap a few key concepts:
a) Disease-causing variants are present all over the genome (in fact, far more variants are found outside the exome than inside it). Therefore, it is important to analyze the whole genome effectively, something Bergspitze allows us to do. It is also worth pointing out that while the exome holds about 20,000 genes, there are well over 300,000 functional regions outside the exome.
b) When a causal variant is identified outside the exome, it is important to determine whether the region it resides in is a regulatory element and, if so, which genetic feature (or gene) it regulates and how the variant affects that regulation (up or down, and at what magnitude). Franklin helps us do this.
c) Variants do not necessarily cause a phenotype directly. Instead, they often achieve their effect through a network of interacting nodes in the biomolecular pathway. Deciphering that network is important, and Franklin allows us to do that as well.
The key takeaway here is that a “genetically-defined” target can also be a target that carries no mutation itself but is controlled by regulatory elements outside the exome, where the mutations reside instead. Further, such a target may not be affected by disease-causing variants directly, but rather indirectly, via other nodes in the network that the causal variants do affect.
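To make the recap concrete, here is a minimal toy sketch in Python. It is not how Franklin or Bergspitze work internally; the coordinates, gene names, effect values, and pathway edges are all hypothetical and exist only for illustration. It maps causal variants to the regulatory elements that contain them, follows the regulated genes through a small directed pathway graph, and ranks nodes by how many causal variants reach them.

```python
from collections import deque

# Hypothetical regulatory elements: (chrom, start, end, element type, regulated gene)
REGULATORY_ELEMENTS = [
    ("chr1", 1000, 2000, "enhancer", "GENE_A"),
    ("chr2", 5000, 5600, "promoter", "GENE_B"),
]

# Hypothetical causal variants: (id, chrom, position, signed effect on regulation)
CAUSAL_VARIANTS = [
    ("var1", "chr1", 1500, -0.8),
    ("var2", "chr2", 5100, +0.4),
    ("var3", "chr3", 42, +0.1),  # falls in no annotated element
]

# Hypothetical directed pathway graph: gene -> genes immediately downstream
PATHWAY = {
    "GENE_A": ["GENE_C"],
    "GENE_B": ["GENE_C"],
    "GENE_C": ["GENE_D"],
    "GENE_D": [],
}


def annotate(variants, elements):
    """Map each variant to the regulatory element that contains it, if any."""
    hits = []
    for vid, chrom, pos, effect in variants:
        for e_chrom, start, end, kind, gene in elements:
            if chrom == e_chrom and start <= pos <= end:
                hits.append(
                    {"variant": vid, "element": kind, "gene": gene, "effect": effect}
                )
    return hits


def downstream_support(hits, pathway):
    """For every node reachable from a regulated gene, collect the variants that reach it."""
    support = {}
    for hit in hits:
        seen = {hit["gene"]}
        queue = deque([hit["gene"]])
        while queue:  # breadth-first walk downstream of the regulated gene
            node = queue.popleft()
            support.setdefault(node, set()).add(hit["variant"])
            for nxt in pathway.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return support


hits = annotate(CAUSAL_VARIANTS, REGULATORY_ELEMENTS)
support = downstream_support(hits, PATHWAY)

# Rank nodes by number of supporting variants, breaking ties by name.
ranked = sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))
for gene, variants in ranked:
    print(gene, sorted(variants))
```

In this toy data, GENE_C ranks first even though no variant sits in it or in an element that regulates it directly: both annotated variants converge on it through the pathway. That is the sense in which a target can be genetically defined without carrying a mutation itself.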
Together with Bergspitze’s causal variant identification throughout the whole genome, Franklin provides an unprecedented level of genetic intelligence for actionable target selection at whole genome scale. It interprets the causal genetic variants detected by Bergspitze and anchors them to targets, building an etiology model for complex diseases. It is not only more sensitive than competing approaches but also faster and more actionable.