ABSTRACT
Understanding the etiology of congenital disorders requires interdisciplinary research and close collaborations between clinicians, geneticists and developmental biologists. The pace of gene discovery has quickened due to advances in sequencing technology, resulting in a wealth of publicly available sequence data but also a gap between gene discovery and crucial mechanistic insights provided by studies in model systems. In this Spotlight, I highlight the opportunities for developmental biologists to engage with human geneticists and genetic resources to advance the study of congenital disorders.
Introduction
As a human geneticist studying a congenital developmental disorder, I frequently receive emails from my developmental biologist colleagues asking if we have found mutations in their new favorite gene. I have a growing stack of post-it notes waiting for the day when we find a genetic variant in the DNA samples from our research participants that fill our freezers. The emails and post-it notes are evidence of collaborative partnerships between my human genetics lab and the mouse and zebrafish labs of my colleagues. Our collaboration works both ways; for every incoming query about human sequence variants there is an outgoing email from my lab asking about functional validation of sequence variants for one of our new genes.
Understanding disorders of development requires a close relationship between clinicians, geneticists, developmental biologists and others to understand the molecular mechanisms of disease pathogenesis. Ultimately, both geneticists and developmental biologists seek to understand the factors that cause cells to organize into a specific form, whether ordered or disordered. Our common goals are also mirrored in our scientific approaches. Forward genetic screens with spontaneous or mutagen-induced mutations are foundational approaches in developmental biology, while rapid reverse genetic screens are enabled by genetic engineering technologies such as CRISPR. And although most human geneticists would not describe their work as a ‘screen’ per se, advances in sequencing technology mean there are now enormous amounts of sequence and phenotype data that are often publicly available. Large genomic resources in cohorts with a particular disease have made forward human genetics more efficient, but there is also sufficient data to identify human ‘knockouts’ from large-scale genetic databases with linked health records, bringing reverse-genetics approaches to the study of human phenotypes (Alkuraya, 2015; Narasimhan et al., 2016b). Although this is still an inefficient endeavor because of the low rate of homozygous mutations in randomly-mating populations, deep phenotyping of individual ‘knockouts’ identified by population-based sequencing of consanguineous or bottlenecked populations shows the promise of this approach as databases expand (Lim et al., 2014; Narasimhan et al., 2016a; Saleheen et al., 2017). Despite the inherent similarities in the two fields and the necessity for collaboration, we are too often siloed and the potential for model organism research to advance diagnosis, mechanism and treatment of these disorders goes unrealized. 
The goal of this Spotlight is to highlight the opportunities, tools and resources (summarized in Table 1) for developmental biologists to engage with human genetics databases, researchers and patients in the study of congenital disorders.
Congenital disorders are collectively common and one of the leading causes of childhood mortality (Wallingford, 2019). In recent years, there has been a concerted effort through clinical and research-based sequencing programs to identify the genetic causes of these disorders. For example, the Gabriella Miller Kids First Pediatric Research Program (GMKF; http://commonfund.nih.gov/kidsfirst) is a pan-National Institutes of Health program to develop a large-scale publicly available data resource to identify the genetic causes of structural birth defects and childhood cancer (and to understand the strong association between them; Botto et al., 2013; Carozza et al., 2012). Since its launch in 2015, GMKF has sequenced almost 20,000 DNA samples from affected individuals and their parents and generated 1.3 petabytes of whole-genome sequencing data on congenital heart disease, diaphragmatic hernia, orofacial clefts, craniosynostosis, disorders of sex development and many others. Because of programs such as GMKF and the Centers for Mendelian Genomics (http://mendelian.org/), discoveries of new genes associated with human disease phenotypes are being made at an average rate of 263-281 new discoveries per year (Boycott et al., 2017; Posey et al., 2019) and rapidly outpace functional studies. Consequently, little is generally known about the effects of these variants or the roles of implicated genes. The number of known genes without an associated human phenotype decreases with each discovery, but examples of phenotype expansion (multiple phenotypes associated with mutation in the same gene) and multiple modes of inheritance at a single locus are increasing (Posey et al., 2019), underscoring the complexity of human genetics.
The gap between disease-gene associations and a mechanistic understanding is likely to widen: recent bibliometric analyses suggest that current research is heavily influenced by previous studies, causing a ‘rich get richer’ loop that hinders translational research (Stoeger et al., 2018). The good news is that there is enormous potential and a crucial need for new research directions to bridge this gap.
The N=1 problem
Although some congenital disorders are quite common (e.g. congenital heart disease, orofacial clefts), others are rare or ultrarare in the population, presenting their own challenges for gene discovery. Because even seemingly healthy humans carry a remarkably large number of deleterious variants, usually in the heterozygous state (Lek et al., 2016), confidently defining causal variants requires multiple unrelated individuals with a similar phenotype who have mutations in the same gene. Theoretically, three unrelated cases with homozygous or compound heterozygous variants are sufficient to confirm a link between gene and disease for autosomal recessive disorders and five affected cases with heterozygous variants are necessary for autosomal dominant disorders (Gilissen et al., 2012). This is a difficult task in rare and ultrarare disorders, for which the ‘N=1’ problem makes it difficult to establish causality when only a single individual or family has been identified. It is not impossible to publish studies on single affected individuals or families, but careful assessment of the genetic evidence is required: it is insufficient to assume a variant is disease causing simply because it is rare (Whiffin et al., 2019). In fact, it is now accepted that the previous thresholds for calling a variant rare (and therefore potentially deleterious) were too lenient and may account for many published false associations between variants and human disease (Lek et al., 2016). In the absence of independent cases, detailed functional studies in model systems are essential for providing the evidence of causality (Liegel et al., 2019; Lukacs et al., 2019).
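The theoretical case counts quoted above can be written down as a simple check. The sketch below is illustrative only: the function and dictionary names are hypothetical, the thresholds are the ones attributed to Gilissen et al. (2012) in the text, and a real assessment weighs far more evidence than a head count.

```python
# Minimum numbers of unrelated affected cases quoted in the text
# (Gilissen et al., 2012). The names below are hypothetical.
MIN_UNRELATED_CASES = {
    "autosomal_recessive": 3,  # homozygous or compound heterozygous variants
    "autosomal_dominant": 5,   # heterozygous variants
}


def enough_cases(inheritance, unrelated_cases):
    """Return True if the case count meets the theoretical minimum."""
    return unrelated_cases >= MIN_UNRELATED_CASES[inheritance]


# A single family ('N=1') falls short under either mode of inheritance.
n_of_one_recessive = enough_cases("autosomal_recessive", 1)
n_of_one_dominant = enough_cases("autosomal_dominant", 1)
```

Under these assumptions, three unrelated cases would satisfy the recessive threshold but not the dominant one.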
To overcome the ‘N=1’ problem, programs such as Matchmaker Exchange (http://matchmakerexchange.org) have been created to enable sharing of phenotypic and genotypic information to connect cases from around the world (Sobreira et al., 2017). Whether matches are made between cases or not, functional studies can provide crucial insights on gene function and the consequences of genetic variants.
But how do the clinicians or researchers with these cases get matched to the basic scientists most equipped to carry out these studies? In Matchmaker Exchange, the same sharing mechanisms that connect clinicians to each other also connect clinicians with the basic scientists working on particular genes, pathways or phenotypes in model organisms. One of the many projects connected to Matchmaker Exchange is GeneMatcher (see Table 1), a web-based tool to which clinicians or basic science researchers can submit genetic or phenotypic information (Au et al., 2015; Sobreira et al., 2015). When a match is made, submitters receive an email notification with contact information of their match to follow up on the submitted information. As of February 2020, GeneMatcher has 40,450 submissions from 8018 submitters in 89 countries. The submissions cover 12,065 unique genes but only 6834 have matches. The numbers of users and submissions are rapidly increasing and many new gene discoveries have been made (Au et al., 2015; Yigit et al., 2020).
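To make the matching idea concrete, here is a minimal sketch of gene-based matching. It is not GeneMatcher's implementation or API: the function name, the submitter labels and the gene symbols are all hypothetical, and the only rule assumed is that two different submitters naming the same gene constitute a match.

```python
from collections import defaultdict


def find_matches(submissions):
    """Group (submitter, gene) pairs by gene and keep genes that have
    at least two distinct submitters. Purely illustrative."""
    by_gene = defaultdict(set)
    for submitter, gene in submissions:
        # Normalize case so 'gene1' and 'GENE1' are treated as one gene.
        by_gene[gene.upper()].add(submitter)
    return {
        gene: sorted(people)
        for gene, people in by_gene.items()
        if len(people) >= 2
    }


# Made-up submissions: a clinician and a zebrafish lab share one gene.
submissions = [
    ("clinician_A", "GENE1"),
    ("zebrafish_lab_B", "gene1"),
    ("clinician_C", "GENE2"),
]
matches = find_matches(submissions)

# Fraction of submitted genes with a match, from the February 2020
# figures quoted in the text (6834 of 12,065 genes).
matched_fraction = 6834 / 12065
```

In this toy example only GENE1 is matched, and the figures quoted above imply that a little over half of the submitted genes (about 57%) had a match at that time.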
A similar resource connected to GeneMatcher, but searchable through its own website, is DECIPHER (Database of Genomic Variation and Phenotype in Humans Using Ensembl Resources) (Table 1) (Firth et al., 2009). The DECIPHER database is a repository of genetic variants (copy-number variants and exomes) and associated phenotypes from patients with neurocognitive developmental disorders, many of whom have additional structural defects.
Another less conventional option for identifying novel genes in need of follow-up is to connect with genetic testing laboratories. Genetic testing that used to be carried out as a several-hundred gene panel is increasingly done through targeted sequencing panels capturing thousands of genes, or as whole-exome sequencing, for technical efficiency. Only a portion of clinical tests are positive, but the genetic data for the remaining ‘unsolved’ patients are static and easily stored. Reanalysis of the existing, deidentified data from clinical laboratories has the potential to reveal novel genes and new diagnoses (Butler et al., 2018; Mattison et al., 2018).
Interpreting variants
As sequencing technologies have evolved so has the thinking about access to one's own genetic information. Patients and their families now have unprecedented access to their genetic information through traditional providers and direct-to-consumer genotyping and sequencing companies. Families are organizing websites, blog posts and social media support groups to share genetic information and observations, and to catalyze discoveries (Might and Wilsey, 2014). MyGene2 (mygene2.org) (Table 1) is a Matchmaker Exchange-associated portal on which families can create their own profiles to share their genetic information and offer support. Like GeneMatcher, MyGene2 also allows clinicians and researchers studying a rare disorder or candidate gene to create profiles for matching.
We are witnessing a democratization of the gene discovery process. Families, armed with genetic information and a desire for answers, may directly contact basic scientists for help: ‘My son has a mutation in this gene. I read your papers, does this mutation cause his condition?’ Before hanging up the phone, consider the resources at your disposal. Of course, an appropriate response will always be to refer them to a clinical geneticist at your institution. But there is value in developmental biology researchers knowing how to interpret genetic variants (Fig. 1). A first stop should be ClinVar (Table 1), a database in which genetic variants from clinical labs are catalogued and annotated using pathogenicity guidelines from the American College of Medical Genetics and Genomics (ACMGG) (Richards et al., 2015).
The second stop should be gnomAD (Table 1), a database of 71,702 genomes (gnomAD v3) or 125,748 exomes and 15,708 genomes (gnomAD v2) from anonymous individuals from around the world (Karczewski et al., 2020 preprint). There is no phenotype information available, but individuals known to be affected by severe pediatric conditions have been removed, so these databases allow us to define ‘normal’ genetic variation in adult human populations. For rare disorders, any variant found in an affected individual that has a population allele frequency greater than 0.001-0.1% is unlikely to be disease causing. As indicated earlier, analyzing allele frequencies allows us to remove variants that are too common to cause disease, but every individual carries hundreds of rare genetic variants, so rarity alone is insufficient to classify a variant as disease causing. In silico prediction tools (e.g. PolyPhen, SIFT, MutationTaster) lack sensitivity and specificity, but agreement across multiple tools is an additional piece of evidence considered under ACMGG pathogenicity guidelines.
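The allele-frequency filter described above amounts to a one-line comparison. The sketch below assumes the allele count and allele number for a variant have already been looked up in a population database; it does not query gnomAD, the function names are hypothetical, and the counts are made up. The two cut-offs are the ends of the 0.001-0.1% range quoted in the text, expressed as fractions.

```python
def allele_frequency(allele_count, allele_number):
    """Allele frequency = observed copies / chromosomes genotyped."""
    if allele_number <= 0:
        return 0.0
    return allele_count / allele_number


def too_common(af, max_af):
    """True if the variant exceeds the chosen frequency cut-off.
    Passing this filter does NOT make a variant pathogenic."""
    return af > max_af


STRICT_CUTOFF = 0.00001   # 0.001%, lower end of the quoted range
LENIENT_CUTOFF = 0.001    # 0.1%, upper end of the quoted range

# Made-up variant: 30 copies among 250,000 genotyped chromosomes.
af = allele_frequency(30, 250_000)   # 0.00012, i.e. 0.012%
fails_strict = too_common(af, STRICT_CUTOFF)
fails_lenient = too_common(af, LENIENT_CUTOFF)
```

The example variant would be excluded under the strict cut-off but retained under the lenient one, which illustrates why the choice of threshold should reflect the prevalence and inheritance of the disorder being studied.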
Another useful way to determine whether a variant is potentially causal is to ask whether the gene is under constraint (the intolerance of a gene to heterozygous variants due to natural selection). In gnomAD, this is measured as ratios of observed counts of synonymous, missense or loss-of-function variants to expected counts based on a mutational model that considers sequence context and coverage (Karczewski et al., 2020 preprint; Samocha et al., 2014). For example, haploinsufficient genes, such as those that cause dominant Mendelian syndromes, are depleted for loss-of-function variants in gnomAD, whereas genes that are dispensable, such as olfactory receptors, are unconstrained. What this means for establishing pathogenicity of a variant for a congenital disorder is that identifying a rare loss-of-function variant in a gene with high constraint is supportive evidence of pathogenicity. To give some benchmarks for these constraint scores or observed/expected ratios (o/e): a loss-of-function o/e of 0.4 indicates that only 40% of the expected loss-of-function variants were observed and that the gene is therefore likely under selection against loss-of-function variants. In clinical and research interpretation of Mendelian and rare disorders, it is suggested to use o/e<0.35 if a hard threshold is needed to identify a constrained gene (lower o/e values indicate stronger constraint).
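The observed/expected arithmetic can be illustrated with a short sketch. The function names and the variant counts are hypothetical; in practice the expected count comes from gnomAD's mutational model rather than from anything computed here, and the 0.35 cut-off is the hard threshold mentioned in the text.

```python
def observed_expected_ratio(observed, expected):
    """o/e = observed variant count / expected variant count."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def is_constrained(oe, threshold=0.35):
    """Apply the hard o/e threshold from the text (lower = more constrained)."""
    return oe < threshold


# Made-up counts: 8 loss-of-function variants observed where 20 were expected.
oe_moderate = observed_expected_ratio(8, 20)   # 0.4, the benchmark in the text
# Made-up counts: 2 observed where 20 were expected.
oe_strong = observed_expected_ratio(2, 20)     # 0.1

moderate_passes = is_constrained(oe_moderate)
strong_passes = is_constrained(oe_strong)
```

Note that the benchmark value of 0.4 suggests selection against loss-of-function variants yet falls just outside the hard 0.35 threshold, so a binary cut-off should be read as a convenience and not as a biological boundary.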
Variant interpretation requires careful assessment of multiple lines of evidence on allele frequency in large population databases, properties of the mutated gene and in silico pathogenicity predictions. The computational tools and resources described here can be helpful in prioritizing variants but, ultimately, functional studies in model systems are a crucial piece of evidence in this endeavor.
What is not solved
Since the advent of exome sequencing, clinical genetics prognosticators have discussed the day when all Mendelian disease genes will be identified. The identification of disease genes and recapitulation of the human phenotype in a model system should not be the end point for disease gene discovery. In addition to the still unknown fraction of disease attributable to the noncoding portion of the human genome (Smedley et al., 2016), we do not yet understand the extent to which allelic and locus heterogeneity, the presence of pathogenic variants at multiple loci and common variants influence Mendelian disorders. The same is true for non-Mendelian disorders, which are even more complex. A more precise understanding of variable expressivity and penetrance will be necessary to completely realize the goals of genomic medicine. There is therefore a need for the development of high-throughput assays to test human variants for individual genes and non-coding regulatory elements (Findlay et al., 2018; Kvon et al., 2020). Not only could these assays distinguish benign from pathogenic variation, but quantitative assays may also sort out differences in disease severity. The implementation of these assays is predicated on knowing how these genes are regulated, the function of their encoded proteins and the mechanisms by which they cause human disease, insights that are best gleaned from bench science and model organisms. Large-scale efforts from the International Mouse Phenotyping Consortium (Meehan et al., 2017) and the Deep Genome Project (Lloyd et al., 2020) to annotate and catalog the function of human gene orthologs in the mouse genome form a foundation for these studies. Human disease is also causally influenced and modified by environmental exposures.
For congenital disorders, the interplay of gene and environmental exposures necessarily must be studied in model systems, which may include a combination of traditional animal models and advanced in vitro models such as organoids (Lancaster and Knoblich, 2014) and microphysiological models (Wikswo, 2014).
Sequencing and gene discovery in human disease and congenital disorders will continue at a rapid pace for the foreseeable future. An interdisciplinary effort is essential to understand the etiology of congenital disorders and elucidate pathogenic mechanisms. Functional follow-up of variants is necessary, time-consuming and costly, making it even more important that geneticists and developmental biologists work together to prioritize variants for the most impact. Making the most of these publicly available sequence resources will limit duplication of effort and, as the Matchmaker model shows, the path from discovery to functional studies need not be unidirectional.
Acknowledgements
Many thanks to Tamara Caspary and Rob Lipinski for helpful discussions in preparing this manuscript, and to the many study participants and colleagues from all disciplines that make our research on craniofacial birth defects meaningful.
Footnotes
Funding
The author's research is supported by the National Institutes of Health (DE025060, DE027193). Deposited in PMC for release after 12 months.
Competing interests
The author declares no competing or financial interests.