Understanding how genes are expressed in the correct cell types and at the correct level is a key goal of developmental biology research. Gene regulation has traditionally been approached largely through observational methods, whereas perturbational approaches have lacked precision. CRISPR-Cas9 has begun to transform the study of gene regulation, allowing for precise manipulation of genomic sequences, epigenetic functionalization and gene expression. CRISPR-Cas9 technology has already led to the discovery of new paradigms in gene regulation and, as new CRISPR-based tools and methods continue to be developed, promises to transform our knowledge of the gene regulatory code and our ability to manipulate cell fate. Here, we discuss the current and future application of the emerging CRISPR toolbox toward predicting gene regulatory network behavior, improving stem cell disease modeling, dissecting the epigenetic code, reprogramming cell fate and treating diseases of gene dysregulation.
Every organism is endowed with a distinct genome sequence that holds the instructions to build a body plan consisting of thousands of different cell types capable of carrying out distinct functions and responding to changing external conditions. Understanding how the genome orchestrates cell type- and condition-specific expression of genes is one of the major challenges of developmental biology.
Deciphering gene regulation has lagged behind deciphering gene function because it is an inherently more difficult problem. Gene function relies on a simple genetic code of 64 codons that was cracked in the 1960s (Nirenberg et al., 1966) and a splicing code based on singular splice donor and acceptor consensus motifs that, although more complex than the genetic code, is also largely understood (Barash et al., 2010). It is now possible to predict with decent accuracy which coding and splicing mutations will or will not disrupt gene function (Jaganathan et al., 2019; Kircher et al., 2014). Gene regulation, meanwhile, has not been as easy to solve because it relies on more diverse distributed coding principles. There are over 1600 human transcription factors (TFs), each of which recognizes a set of sequences (motifs; see Glossary, Box 1) based on thermodynamic features of protein-DNA binding affinity (Lambert et al., 2018). This property makes the TF-binding code orders of magnitude more complex than the genetic and splicing codes. Moreover, individual TF motifs are rarely – if ever – sufficient to enable binding in the genome owing to competition with histones, which hinder accessibility of TFs to DNA; rather, TFs bind combinatorially, using principles that have not been well elucidated (Reiter et al., 2017). Compounding this issue of combinatorics, most TFs have cell-type-specific expression, so valid cofactor combinations vary substantially depending on context. The binding of a TF to an individual DNA regulatory region, in turn, tends to induce only modest impacts on gene expression. Here, again, combinatorics among gene regulatory regions (see Glossary, Box 1) is required to fully explain the expression of any given gene. 
Finally, gene regulatory interactions are only loosely spatially limited within megabase-scale topologically associating domains (TADs; see Glossary, Box 1) (Gonzalez-Sandoval and Gasser, 2016), making it difficult to determine a priori to which gene(s) a regulatory element belongs. Altogether, the regulatory counterpart of the prediction task that is relatively well solved for gene function and splicing, namely predicting how a given non-coding motif or mutation will alter gene regulation, remains unsolved.
Base editing: CRISPR-based technique in which Cas9-nickase is attached to a deaminase enzyme (cytosine deaminases or adenosine deaminases), which catalyzes the conversion of targeted C:G or A:T base pairs into T:A or G:C, respectively.
CRISPR-Cas9: Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein 9 (Cas9) form a genome editing tool, adapted from a naturally occurring bacterial defense system against viruses, in which the Cas9 nuclease is directed to its target DNA region by a single guide RNA (sgRNA) that binds both Cas9 and the target DNA.
CRISPRa/i: Gene activating/inhibiting CRISPR tools in which transcriptional activators/inhibitors are attached to dCas9.
dCas9: ‘Deactivated’ or ‘dead’ Cas9, which is mutated in the catalytic domain so that it binds but does not cleave the target DNA.
Enhancer: A regulatory DNA region, which may be located distally to target genes, that activates gene expression by binding transcriptional activators.
Epistatic relationship: Interaction between two or more different components, in which the combined function is distinct from the additive function of each individual component.
eRNA: Enhancer RNA, a short non-coding RNA transcribed from an enhancer region.
Gene regulatory regions/elements: Usually non-coding DNA segments such as enhancers, silencers, promoters, topological elements (see below) or insulators that regulate gene expression through interaction with transcription factors and non-coding RNAs.
Genome-wide CRISPR-Cas9 knockout screens: An experimental approach in which sgRNAs are designed to target every gene in the genome, facilitating discovery of key genes or genetic sequences that elicit a specific function or phenotype through Cas9-induced genetic loss-of-function.
GRN: A gene regulatory network comprises a set of interacting trans-regulatory elements, such as transcription factors, and cis-regulatory elements, such as enhancers and promoters, that control a biological process such as differentiation.
GWAS: Genome-wide association studies, which use matched genotype and phenotype information from large cohorts of human patients to associate common genetic variants with physiological and disease phenotypes.
Motif: A short sequence of DNA that recurs within the genome and has a biological function, such as binding a transcription factor.
PAM: The protospacer adjacent motif is a short DNA motif in the genome following the sgRNA-targeted sequence, which is necessary for specific recognition of the target genome by Cas9.
PIC-seq: A single-cell sequencing strategy to elucidate the sequence information of physically interacting cells.
Prime editing: A recent CRISPR-based genome editing platform, in which Cas9-nickase is attached to a reverse transcriptase. When paired with a prime editing guide RNA that is extended to include a primer and template for the reverse transcriptase, prime editing is capable of inducing all possible base-to-base conversions and short indels without requiring a double-strand break or a distinct template.
Promoter: DNA region necessary to initiate proximal transcription by binding RNA polymerases, general transcriptional machinery and transcription factors.
Regulatory sequences: Transcription factor-binding sites within regulatory regions that mediate gene regulatory function of regulatory regions or elements.
RNAi: RNA interference is a process in which transcripts are targeted by small complementary RNA, which eventually leads to transcript degradation and gene knockdown.
Silencer: A regulatory DNA region that binds repressors to decrease the likelihood of transcription.
TADs: Topologically associating domains, regions of the genome in which the contained sequences are more likely to physically interact with each other.
Topological elements: Boundary elements which constrain the physical interaction of chromosomal regions, such as TADs.
Improving our understanding of gene regulation would have a major impact on our ability to understand, and control, cellular identity and function. Stem cell differentiation relies on recapitulating the gene regulatory process that occurs during development, and improved understanding of gene regulation should lead to improved ability to manipulate stem cell differentiation and reprogramming (Cherry and Daley, 2012; Cohen and Melton, 2011; Qian et al., 2020; Xu et al., 2015; Yeo et al., 2020; Zhou et al., 2020). Beyond cell fate engineering, non-coding genomic sequence variants, which are largely thought to act through gene regulation, underlie the bulk of genetic risk of disease (Khera et al., 2018; Nishizaki and Boyle, 2017). Again, understanding gene regulatory principles would help us predict, and possibly correct, disease-inducing non-coding sequence changes.
In this Review, we advance the view that the emergence of CRISPR-Cas9 (see Glossary, Box 1) genome editing has the potential to accelerate progress towards deciphering gene regulation. Gene regulation research has traditionally been chiefly observational, with large efforts to map transcriptional differences among cell types (He et al., 2020; Regev et al., 2017; Wilbrey-Clark et al., 2020), chromatin states that correlate with gene regulatory status (Roadmap Epigenomics Consortium et al., 2015; Stunnenberg et al., 2016) and TF-binding sites (The ENCODE Project Consortium, 2012). Previous perturbational approaches have either lacked precision, such as studies of animal-wide TF knockouts, or have relied on empirically derived sets of sequence variants, such as analysis of quantitative trait loci (Aguet et al., 2017; Degner et al., 2012; McVicker et al., 2013) that are biased by evolutionary pressures, limited in their ability to dissect the function of sequences in linkage disequilibrium and lacking the scale to examine the full spectrum of possible sequence changes. CRISPR-Cas9 has begun to transform the study of gene regulation, allowing for precise, unbiased manipulation of gene regulatory sequences (see Glossary, Box 1) and the alteration of the expression of any gene at will. This ability has already led to the discovery of new paradigms in gene regulation and, as new CRISPR-based tools and methods continue to be developed and perfected, promises to lead to even more insights into how genes are regulated and cells are specified during development. Here, we introduce the different tools and discuss how they can be applied to further our understanding of gene regulation in development and differentiation. We then discuss how CRISPR-based tools can be employed to precisely control gene expression in experimental systems and disease therapeutics.
The genome editing toolbox
The CRISPR genome editing toolbox has expanded substantially over the last few years since the first demonstration of CRISPR-Cas9-nuclease genome editing in mammalian cells (Cong et al., 2013; Jinek et al., 2013; Mali et al., 2013). Adapted from a bacterial immune system, the CRISPR system has been co-opted for use in mammalian genome editing by engineering single guide RNAs (sgRNAs) that complex with a CRISPR-associated nuclease protein, most commonly Cas9, to induce double-stranded breaks (DSBs) at a specific location in the genome, specified by the sequence of the sgRNA and a neighboring protospacer-adjacent motif (PAM; see Glossary, Box 1) (Fig. 1; Fig. 2A).
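To make the targeting rule concrete, the following minimal sketch scans a DNA string for candidate SpCas9 target sites, i.e. a 20-nt protospacer immediately followed by an NGG PAM (the canonical SpCas9 PAM). The function name, the example sequence and the forward-strand-only simplification are assumptions made for illustration; practical design tools also scan the reverse strand and score efficiency and off-target risk.

```python
import re

def find_spcas9_targets(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for each forward-strand NGG site.

    Illustrative helper (hypothetical name): it only checks for a
    protospacer followed by an NGG PAM and does no efficiency scoring.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len
    return [(m.start(), m.group(1), m.group(2)) for m in re.finditer(pattern, seq)]

# Made-up example sequence: 20-nt protospacer + "AGG" PAM + 3 trailing bases.
example_seq = "GATTACAGATTACAGATTAC" + "AGG" + "TCC"
for start, protospacer, pam in find_spcas9_targets(example_seq):
    print(start, protospacer, pam)
```

Cas variants with different PAM requirements could be accommodated under the same scheme by changing the PAM portion of the pattern.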
Improvements to CRISPR-Cas specificity
Initially, CRISPR editing was restricted to sequences proximal to the PAM. Now, however, a plethora of Cas9 and Cas12a variants from distinct bacteria and phages have been characterized, each with distinct PAMs (reviewed by Manghwar et al., 2019; Pausch et al., 2020). In addition, Cas9 variants have been engineered with ‘looser’ PAM recognition that serve to broaden the targeting range (Kleinstiver et al., 2015a,b; Nishimasu et al., 2018; Walton et al., 2020). Furthermore, variants with decreased off-target editing have been developed, although they concomitantly reduce editing efficiency (Kim et al., 2020; Kleinstiver et al., 2016; Slaymaker et al., 2016). Cas9-nucleases now exist to target nearly any PAM sequence with near-complete editing efficiency in most cell types.
The spectrum of insertion/deletion mutations (indels) generated by Streptococcus pyogenes SpCas9-nuclease, the most widely used and well-characterized Cas9 protein, is highly predictable given the target sequence and cell type. Moreover, an appreciable fraction of genomic targets yield precise outcome distributions dominated by a single indel product (Allen et al., 2019; Shen et al., 2018). There is now a convenient webtool (http://indelphi.giffordlab.mit.edu/) capable of predicting the outcome distribution for any sgRNA, and the ability to predict outcome distributions enables the design of sgRNAs that maximize frameshifts or disruption of genomic features of interest (Shen et al., 2018).
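As a rough illustration of how a predicted outcome distribution can guide sgRNA choice, the sketch below summarizes a distribution of indel length changes by its frameshift fraction and by the share taken by the single most common outcome. The input format (a mapping from length change in bp to read count or predicted frequency) and the function names are assumptions for this example; it does not call inDelphi or reproduce its output format.

```python
def frameshift_fraction(outcomes):
    """Fraction of outcomes whose length change is not a multiple of 3.

    outcomes: dict mapping indel length change in bp (negative = deletion,
    positive = insertion) to a count or frequency. Hypothetical input format.
    """
    total = sum(outcomes.values())
    if total <= 0:
        raise ValueError("outcome distribution is empty")
    # Python's % returns a non-negative remainder, so -1 % 3 == 2 and -3 % 3 == 0.
    shifted = sum(freq for change, freq in outcomes.items() if change % 3 != 0)
    return shifted / total

def dominant_outcome_share(outcomes):
    """Share of the distribution taken by the most frequent outcome."""
    total = sum(outcomes.values())
    if total <= 0:
        raise ValueError("outcome distribution is empty")
    return max(outcomes.values()) / total

# Made-up distribution for one sgRNA: +1 insertion, 3-bp and 2-bp deletions.
example = {1: 50, -3: 20, -2: 30}
print(frameshift_fraction(example), dominant_outcome_share(example))
```

Ranking candidate sgRNAs by the first quantity favors gene disruption, whereas ranking by the second favors targets with a single reproducible edited genotype.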
Prime editing and base editors
Prime editing (see Glossary, Box 1) employs a fusion of Cas9-nickase and a reverse transcriptase enzyme to enable precise genotypic replacement and short insertions (≤44 nt) or deletions (≤80 nt) with a substantially improved ratio of desired to undesired editing genotypes (Anzalone et al., 2019) (Fig. 2B). However, prime editing currently has low editing efficiency (<30% of alleles) and often requires substantial optimization of each targeting strategy. Base editing (see Glossary, Box 1), in which Cas9-nickase is fused to an enzyme capable of chemically transforming nucleotide identity, has similarly been shown to mediate nucleotide transitions with high efficiency and specificity (Gaudelli et al., 2017; Komor et al., 2016) (Fig. 2B).
Beyond mutagenesis, CRISPR-Cas9 has been used to epigenetically modify genomic regions through the fusion of nuclease-dead Cas9 (dCas9; see Glossary, Box 1) variants with epigenetic modifying agents (reviewed by Pickar-Oliver and Gersbach, 2019), including epigenetic silencers (CRISPRi; Fig. 2C) and transcriptional activators (CRISPRa; Fig. 2D) (see Glossary, Box 1).
dCas9 can modestly inhibit transcription in human cells when directed to the proximal promoter or coding region of a gene by sterically hindering the transcriptional machinery or RNA polymerase processivity, respectively (Qi et al., 2013). To further enhance the efficiency of repression in mammalian cells, various transcriptional repression domains have been tethered to dCas9, among which KRAB (Krüppel-associated box) domains provide the most consistent and effective transcriptional repression (Gilbert et al., 2013). dCas9-KRAB efficiently silences transcription when targeted to the promoter (see Glossary, Box 1) (Gilbert et al., 2013) and/or enhancer (see Glossary, Box 1) (Thakore et al., 2015; Gao et al., 2014) of a target gene, and CRISPRi has been shown to provide consistent repression across most genes in the genome (Gilbert et al., 2014; Horlbeck et al., 2016).
Ectopic gene activation has been achieved by recruiting transcriptional activator domains through direct or indirect fusion to dCas9. Initial CRISPRa platforms used direct dCas9 fusion of four copies of the Herpes simplex virus protein VP16 (VP64) (Gilbert et al., 2013; Perez-Pinera et al., 2013; Maeder et al., 2013; Farzadfard et al., 2013). As these first-generation CRISPRa platforms gave weak and inconsistent gene activation, different activator domains have been fused together and recruited to dCas9 either directly or indirectly through antibody and RNA-binding protein intermediaries (known as second-generation CRISPRa systems), significantly improving the activation of target genes (Chavez et al., 2015, 2016; Konermann et al., 2015; Tanenbaum et al., 2014) (Fig. 2D). Nonetheless, CRISPR-based gene activation is still inconsistent and weak at many loci.
CRISPR-based regulation of RNA
Finally, the last few years have seen rapid advances in the use of RNA-modulating CRISPR-based approaches (reviewed by Smargon et al., 2020). Such approaches enable cleavage, editing, fluorescent tracking and pulldown of specific transcripts. In particular, the advent of RNA-modulating CRISPR enzymes of the Cas13 family (O'Connell, 2019) promises to improve our understanding of the roles of RNA in gene regulation. Cas13 enzymes have been shown to cut RNA (Abudayyeh et al., 2017), edit adenosine to inosine in RNA (Cox et al., 2017) and image RNA (Yang et al., 2019). Altogether, since the first demonstration of CRISPR-Cas9 genome editing in mammalian cells (Cong et al., 2013; Jinek et al., 2013; Mali et al., 2013), a powerful and varied toolbox has been constructed, facilitating precise and scalable manipulation of the genome.
Understanding gene regulation
Dissecting gene function
Understanding the genetic underpinnings of cell fate acquisition during development has been a longstanding goal in developmental biology. Traditionally, gene function has been unraveled based on phenotypic data from forward and reverse genetic screens in model organisms. For example, projects such as the International Mouse Phenotyping Consortium have now generated knockouts of over 6000 genes in mice (Dickinson et al., 2016; Cacheiro et al., 2019), largely using techniques that predate CRISPR-Cas9, such as ENU mutagenesis. Such organism-wide and tissue-specific knockouts of TFs have complemented techniques, such as chromatin immunoprecipitation sequencing (ChIP-seq), that measure TF binding to reveal roles for hundreds of TFs in organismal development (Spitz and Furlong, 2012). CRISPR-Cas9 has largely replaced traditional methods in the production of transgenic and knockout animals given its rapid timeline and high editing efficiency (Burgio, 2018). More transformatively, CRISPR-Cas9 has expanded the scope of efforts to understand how gene networks control cell fate acquisition.
In addition to animal models, human pluripotent stem cells (hPSCs), which can differentiate into any human lineage, provide an in vitro model system to dissect the genetic underpinnings of human cell fate acquisition. Early studies employed loss-of-function screens based on RNA interference (RNAi; see Glossary, Box 1) to dissect the transcriptional programs driving hPSC specification (Hu et al., 2009). The ease, precision and scalability of CRISPR-Cas9 have improved upon this screening approach (Fig. 3A). Such screens have used flow cytometric sorting to identify sgRNAs that enhance or inhibit specific differentiation outcomes, either through the introduction of a GFP reporter transgene or through the use of antibodies that report on expression of a lineage-specific gene (Li et al., 2019; Parnas et al., 2015; Xu et al., 2020). Through this approach, genome-wide CRISPR-Cas9 knockout screens (see Glossary, Box 1) have dissected genetic underpinnings of human pluripotency states (Fu et al., 2019; Li et al., 2018), as well as revealing novel roles for JNK/Jun signaling in endoderm differentiation and ZIC2 in cardiac differentiation (Li et al., 2019; Xu et al., 2020).
Directed differentiation paradigms also allow the interrogation of later stages of differentiation. To this end, a doxycycline (dox)-inducible Cas9 (iCas9) system has been developed to induce gene knockout at any chosen time point, which allows investigation into genes specifically required at distinct stages of differentiation (Boyle et al., 2017; Khera et al., 2018; Zhu et al., 2016) (Fig. 3A). In addition to studying particular differentiation paths, CRISPR-Cas9 can be combined with emerging gastruloid models (Moris et al., 2020) to study the genetic underpinnings of human early embryonic patterning in a scalable and representative model system, which has traditionally been difficult (Shahbazi, 2020). The flexibility of CRISPR-Cas9 approaches to introduce genetic perturbations at scale at defined developmental stages opens the door to genetic dissection of human developmental processes that were previously impervious to such probing.
Although marker gene readouts are limited in what they reveal about the causes of aberrant differentiation, a revolution in the ease, throughput and affordability of single-cell RNA-sequencing (scRNA-seq) opens the door to unraveling gene networks in finer detail. By linking single-cell transcriptomes to corresponding sgRNA perturbations, it is now possible to perform CRISPR-Cas9 screens using scRNA-seq as the readout (Adamson et al., 2016; Dixit et al., 2016; Jaitin et al., 2016) (Fig. 3A). Thus, instead of monitoring single marker genes, the response of entire gene regulatory networks (GRNs; see Glossary, Box 1) to genetic perturbations can be analyzed at high throughput. Such approaches have been employed to group gene functions by the pathway in which they act, to refine regulatory networks by comparing transcriptomic phenotypes for a group of gene knockouts, and to understand defects in stem cell differentiation (Adamson et al., 2016; Dixit et al., 2016; Genga et al., 2019; Jaitin et al., 2016). In addition, dual-sgRNA screening, in which combinations of multiple sgRNA-induced perturbations are introduced in the same cell, has been applied to investigate how genes interact with each other (epistatic relationships; see Glossary, Box 1) to alter cell state (Adamson et al., 2016) (Fig. 3A). This multiplexing capability is crucial to dissect the complicated and often redundant functions of genes within signaling pathways.
However, the quality of data provided by current scRNA-seq approaches is heterogeneous; unique transcript counts can be exceedingly low for lowly expressed genes (such as many cell fate-determining TFs) (Zhang et al., 2019a), and approaches such as CRISPR-Cas9-nuclease targeting do not produce uniformly disruptive mutations in every cell (Shi et al., 2015). As a result, proper computational analysis that accounts for such noise and dropout is crucial (Luecken and Theis, 2019), and scRNA-seq is currently best suited to applications for which phenotypes involve large changes to the transcriptome, not changes in expression of individual genes. We anticipate that the pairing of CRISPR-Cas9 and scRNA-seq will shed light on key questions such as why defined stem cell differentiation paradigms sometimes yield non-homogeneous end products (Veres et al., 2019), how genetic mutations in the same set of developmental genes cause a range of phenotypic severity from organ agenesis to incompletely penetrant adult-onset disease (Ashcroft and Rorsman, 2012), and what molecular mechanisms underlie common disease-associated genetic variants (Box 2).
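The epistatic relationships probed by dual-sgRNA screens can be expressed as a deviation from an additive expectation. The sketch below assumes that each perturbation's phenotype is summarized as a log2 fold change relative to unperturbed cells, and that the additive (no-interaction) expectation is the sum of the two single-perturbation effects; the function names and the tolerance value are illustrative choices, not parameters taken from the cited studies.

```python
def interaction_score(lfc_a, lfc_b, lfc_ab):
    """Deviation of the double perturbation from the additive expectation.

    lfc_a, lfc_b: log2 fold changes for each single perturbation.
    lfc_ab: log2 fold change when both perturbations are in the same cell.
    """
    expected = lfc_a + lfc_b  # additive model on the log scale (assumption)
    return lfc_ab - expected

def classify_interaction(score, tolerance=0.25):
    """Label a score as additive or non-additive; tolerance is arbitrary here."""
    return "additive" if abs(score) <= tolerance else "non-additive"

# Made-up values: the double perturbation is stronger than the sum of singles.
score = interaction_score(lfc_a=-1.0, lfc_b=-0.5, lfc_ab=-3.0)
print(score, classify_interaction(score))
```

In practice, the tolerance would be set from the variability of replicate or non-targeting control measurements rather than fixed in advance.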
Most common diseases (e.g. cancer, heart disease, autoimmune disease, type II diabetes) have a substantial genetic risk component. Recent research has converged on the finding that such diseases are more often caused by a combination of multiple low-impact variants (polygenic risk) than by one high-effect variant alone, as is the paradigm in rare monogenic diseases (Boyle et al., 2017; Khera et al., 2018; Nishizaki and Boyle, 2017). One common tool for identifying genetic variants associated with risk of complex diseases is the genome-wide association study (GWAS) (Tam et al., 2019), in which each variant is tested for correlation with disease in a large cohort of often >100,000 people. Although such studies identify dozens to hundreds of genomic loci associated with common diseases (e.g. Crohn's disease, coronary artery disease, type II diabetes), disease-associated variants (daVs) found from GWAS are difficult to interpret: >90% of daVs are in non-coding regions, which makes it hard to connect them to impacted genes. In addition, co-inherited daVs can be in strong linkage disequilibrium, and therefore there are often multiple linked variants that correlate equivalently to disease, complicating identification of causal daVs through GWAS data alone. As a group, daVs are enriched in cis-regulatory elements, indicating that these variants often act through regulating gene expression (Gallagher and Chen-Plotkin, 2018). CRISPR-Cas9 is a useful tool in deciphering the function of GWAS variants. CRISPR-Cas9 tools can be used to install candidate daVs to study their effects on gene regulation through functional assays such as ATAC-seq, ChIP-seq, RNA-seq, protein expression, or molecular and cellular assays (Smith et al., 2018). Understanding the gene regulatory functions of daVs is a frontier in understanding the etiology of and proposing therapies for common genetic diseases.
Developmental patterning is driven by cell-cell communication, which induces cis-regulatory binding changes that alter expression of TFs, which in turn alter cellular gene expression and function. To study cell-cell communication, a set of emerging single-cell technologies has unlocked other phenotypic measurements that can be productively paired with CRISPR-Cas9 screening (Fig. 3A). In PIC-seq (see Glossary, Box 1), physically interacting cells (PICs) are sorted by flow cytometry and their combined RNA transcriptomes are sequenced and compared with their single-cell transcriptomes. Computational modeling is then used to map interactions between cells (Giladi et al., 2020). This method could be combined with perturbing signaling pathways to determine how intercellular signaling impacts cell-cell interactions, for example in Notch pathway-driven processes (Henrique and Schweisguth, 2019) and in stem cell-niche interactions (Centonze et al., 2020; Chacón-Martínez et al., 2018).
To investigate changes in cis-regulatory binding, droplet single-cell assay for transposase-accessible chromatin using sequencing (dscATAC-seq) can be used to measure accessible chromatin at single-cell resolution (Lareau et al., 2019). If combined with CRISPR, this method could be used to investigate the effects of altering signaling pathways, TFs and cis-regulatory elements on chromatin accessibility at high throughput. The combination of CRISPR-Cas9 screening and sensitive single-cell measurement promises to elucidate gene regulatory networks in greater detail than has been possible (Davidson, 2010).
Identifying and characterizing regulatory regions
Tiled CRISPR screens
Our ability to predict how changes to non-coding regions will impact cellular gene expression and function is limited. As non-coding variation within and between species is a major driver of disease and evolution, respectively (Banerjee and Sherwood, 2017; Nishizaki and Boyle, 2017; Reilly and Noonan, 2016; Tickle and Urrutia, 2017), it is imperative to improve our interpretation of the regulatory genome.
To address this need, a new class of high-throughput assay has been developed in which CRISPR sgRNAs non-specifically tile or target specific candidate regulatory regions, followed by measurement of resulting changes in gene expression (Canver et al., 2015, 2017; Diao et al., 2016, 2017; Fulco et al., 2016, 2019; Gasperini et al., 2019; Klann et al., 2017; Korkmaz et al., 2019; Rajagopal et al., 2016) (Figs 2A and 3B). These approaches provide a platform to assess whether putative regulatory regions truly cause changes in gene expression. Such assays vary in the type of CRISPR-based manipulation induced at the target site and the readout method, tailoring the assays to measure distinct activities.
For example, Cas9-nuclease tiling assays enable pinpointing of regulatory regions within ∼20 nucleotide regions because this is the size of a typical indel (Canver et al., 2015; Korkmaz et al., 2016; Rajagopal et al., 2016; Sanjana et al., 2016). This fine resolution is helpful to resolve effects of individual binding motifs; however, indels induced by a given sgRNA are variable (Shen et al., 2018; van Overbeek et al., 2016), leading to a high degree of variability in the phenotypes that can result from the activity of each sgRNA. Paired sgRNAs can be used to create defined deletions, excising candidate regulatory regions (Diao et al., 2017), which has a benefit of precisely defining the bounds of the candidate region with the downside that Cas9-induced deletions often do not constitute the majority of edited products (Figs 2A and 3B). In addition, CRISPRi provides a tool to perform uniform epigenetic repression because, in contrast to Cas9-nuclease, a given sgRNA should induce the same phenotype in every cell (Fulco et al., 2016; Gasperini et al., 2019) (Fig. 3B). However, CRISPRi (see Glossary, Box 1) has a resolution limit of ∼200-500 bp because of the spreading of epigenetic repression, preventing dissection of the role of sequence features within regulatory elements (Fulco et al., 2019).
Conversely, CRISPRa screening methods allow probing of genomic regions capable of acting as distal enhancers (Klann et al., 2017) (Fig. 3B), but the generalizability of CRISPRa effectors is not well-established, warranting caution in the interpretation of such data. Likewise, several methods for evaluating gene expression phenotypes have been used, including cell survival and drug resistance when targeting regulatory regions associated with genes essential to such traits, fluorescent reporter gene expression, cell surface protein expression, RNA fluorescence in situ hybridization (FISH)-based FlowFISH and scRNA-seq (Fig. 3A).
Several computational pipelines have been developed to analyze data from tiled CRISPR screens. Tiled non-coding CRISPR screens differ from gene-targeting CRISPR screens based on the assumptions that can be made about expected outcomes. In gene-targeting screens, it is expected that sgRNAs that target the same gene will yield identical phenotypes. This assumption is not entirely true in practice because sgRNAs can differ in their editing efficiency (Kim et al., 2019b), their induction of frameshifts (Shen et al., 2018) and the importance of the targeted region of the gene (Shi et al., 2015); nonetheless, computational pipelines have been purpose-built to analyze gene-targeting CRISPR screening data effectively, including MAGeCK (Li et al., 2014) and BAGEL (Hart and Moffat, 2016). A distinct set of assumptions is required to facilitate interpretation of tiled non-coding CRISPR screen data. Unlike in gene-targeting screens, there are no sets of sgRNAs that can be assumed to yield identical phenotypes; instead, partial phenotypic correlation is expected among neighboring sgRNAs that target the same regulatory element. However, where such elements begin and end in the genome and how much correlation to expect among different sgRNAs targeting a given element are not well known. Moreover, the expected phenotypic correlation depends strongly on the editing mode. CRISPRi tends to induce relatively uniform epigenetic repression of an entire regulatory element, whereas Cas9-nuclease can yield variable phenotypes at neighboring sgRNAs because neighboring sgRNAs may fail to disrupt the same TF-binding motif (Hsu et al., 2018; Rajagopal et al., 2016). With these parameters in mind, two pipelines, CRISPR-SURF (Hsu et al., 2018) and RELICS (Fiaux et al., 2020), have been designed to analyze tiled non-coding CRISPR screens. 
These platforms account for the expected partial correlation among neighboring sgRNAs and allow semi-supervised adjustment of the size of genomic regions that are expected to share phenotypic outcomes.
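The idea of sharing information among neighboring sgRNAs can be illustrated with a simple sliding-window average over genomic position. This sketch is not the CRISPR-SURF or RELICS algorithm; it is a deliberately simplified stand-in, and the function name, the window size and the example values are assumptions for illustration.

```python
def smooth_tiled_scores(positions, scores, window_bp=200):
    """Average each sgRNA's score with all sgRNAs within window_bp / 2 of it.

    positions: genomic coordinates of sgRNA cut or binding sites (bp).
    scores: per-sgRNA phenotype scores (e.g. log2 enrichment).
    Returns a list of (position, smoothed_score) sorted by position.
    """
    if len(positions) != len(scores):
        raise ValueError("positions and scores must have the same length")
    pairs = sorted(zip(positions, scores))
    half = window_bp / 2
    smoothed = []
    for pos, _ in pairs:
        # Quadratic scan for clarity; fine for a small illustrative tiling set.
        nearby = [s for p, s in pairs if abs(p - pos) <= half]
        smoothed.append((pos, sum(nearby) / len(nearby)))
    return smoothed

# Made-up data: two adjacent sgRNAs and one distant sgRNA.
print(smooth_tiled_scores([0, 50, 1000], [1.0, 3.0, 10.0], window_bp=200))
```

A wider window would suit CRISPRi data, in which repression spreads over an entire element, whereas a narrow window would suit Cas9-nuclease data, in which neighboring sgRNAs may disrupt different motifs.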
Insights from tiled CRISPR screens
These methods have allowed for the identification of new types of regulatory regions and the design of enhanced predictive models of regulatory function. For example, several such screens have converged on the regulatory importance of the promoters of neighboring genes (Fulco et al., 2016; Rajagopal et al., 2016), complementing computational studies that demonstrate widespread co-expression of neighboring genes (Mihelčić et al., 2019). In addition, a number of studies have shown that disrupting CTCF TF-binding sites through CRISPR-Cas9 editing is able to reconfigure long-range chromatin looping of entire chromosomal territories (Guo et al., 2015; Khoury et al., 2020; Korkmaz et al., 2019; Tarjan et al., 2019). This evidence converges on the idea that gene expression may be more influenced by shared regulatory regions within chromosomal neighborhoods than previously believed.
In addition, such screens have refined prediction of which regulatory elements are likely to influence the expression of a nearby gene and to what extent. By analyzing thousands of enhancer-gene contacts, Fulco and colleagues developed the activity-by-contact (ABC) model, which calculates the quantitative effect of a regulatory element on expression of a nearby gene based on the strength of the region as an enhancer (measured by chromatin accessibility and H3K27 acetylation) and the estimated contact frequency between the regulatory element and target gene promoter in that cell type (determined by Hi-C, a technique that measures three-dimensional architecture of chromatin) (Fulco et al., 2019). Although predictions are highly cell-type specific and rely on additional epigenomic and Hi-C data that are not available for most cell types, the ABC model is useful for predicting the relevance of genetic variants in non-coding regions. A high-throughput enhancer-targeting CRISPRi screen that used scRNA-seq as readout similarly found that proximity to the target gene and strong Hi-C contact frequency with the target promoter are correlated with enhancer activity (Gasperini et al., 2019).
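The logic of the ABC model can be sketched in a few lines: each element's activity is multiplied by its contact frequency with the gene's promoter, and the product is normalized by the sum of such products over all candidate elements near the gene. In the sketch below, combining the accessibility and H3K27ac signals as a geometric mean follows our reading of the published description and should be treated as an assumption, as should the function names, the input format and the example values.

```python
import math

def element_activity(accessibility, h3k27ac):
    """Enhancer activity as the geometric mean of the two signals (assumption)."""
    return math.sqrt(accessibility * h3k27ac)

def abc_scores(elements):
    """Compute ABC scores for all candidate elements of one gene.

    elements: list of (name, activity, contact) tuples, where contact is the
    element-promoter contact frequency (e.g. from Hi-C).
    Score for element E = (A_E * C_E) / sum over all elements e of (A_e * C_e).
    """
    products = {name: activity * contact for name, activity, contact in elements}
    total = sum(products.values())
    if total == 0:
        return {name: 0.0 for name in products}
    return {name: value / total for name, value in products.items()}

# Made-up example: a strong distal element and a weaker, closer element.
candidates = [
    ("distal_enhancer", element_activity(4.0, 9.0), 0.5),
    ("proximal_element", element_activity(1.0, 9.0), 1.0),
]
print(abc_scores(candidates))
```

Because the scores are normalized per gene, they sum to one across the candidate elements and can be read as each element's estimated share of the regulatory input to that gene.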
Perturbational approaches promise to further advance our understanding of gene regulation. Ultimately, a predictive model of gene expression should enable prediction of the level of expression of a gene, based solely on the adjacent regulatory genomic sequence and the gene expression profile of a cell. In the ABC model, the difficult parts of such modeling are skipped over by taking into account epigenomic data and contact frequency, which integrate binding of typically dozens of TFs at a regulatory region (Partridge et al., 2020; Sherwood et al., 2014). How such factor combinations lead to transcriptional activation (Smith et al., 2013), how distal regulatory elements interact in sometimes specific ways with neighboring promoters presumably because of specific TF-mediated interactions (Zabidi et al., 2015) and how multiple distal regulatory elements interact to contribute to expression of a given gene remain outstanding challenges in the field.
A recent CRISPRi enhancer tiling study has made the intriguing observation that epigenetic inhibition of pairs of enhancers adjacent to the same gene provides more repression than would be expected from adding the repressive effects of each individual enhancer (Xie et al., 2017). This analysis was only performed at two genes, so it remains to be seen whether non-linear combinatorial activation is the exception or the rule for enhancer function. We anticipate that CRISPR-based tools will contribute greatly to understanding these questions, eventually leading to a more specific lexicon of gene regulation. Instead of merely grouping regulatory elements into rough classes such as enhancers, promoters and silencers, we anticipate that a much richer understanding of how specific TFs and combinations thereof orchestrate lineage- and stimulus-specific transcriptional dynamics will emerge.
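The comparison behind a claim of greater-than-additive repression can be sketched as follows. This is an illustrative calculation, not the analysis used in the cited study: expression is assumed to be expressed as a fraction of the unperturbed control, the null expectation is the simple sum of the two single-enhancer repressive effects (capped at complete repression), and all numbers are hypothetical.

```python
def repression(expression_fraction: float) -> float:
    """Repressive effect, given expression as a fraction of the control (1.0 = no change)."""
    return 1.0 - expression_fraction


def additive_expectation(expr_a: float, expr_b: float) -> float:
    """Repression expected for the pair if the two single effects simply add."""
    return min(1.0, repression(expr_a) + repression(expr_b))


def excess_repression(expr_a: float, expr_b: float, expr_pair: float) -> float:
    """Observed pair repression minus the additive expectation (> 0 suggests synergy)."""
    return repression(expr_pair) - additive_expectation(expr_a, expr_b)


# Hypothetical measurements: each enhancer alone, then both inhibited together.
expr_enhancer_1 = 0.9   # 10% repression
expr_enhancer_2 = 0.8   # 20% repression
expr_both = 0.5         # 50% repression when both are inhibited

excess = excess_repression(expr_enhancer_1, expr_enhancer_2, expr_both)
print(f"additive expectation: {additive_expectation(expr_enhancer_1, expr_enhancer_2):.2f}")
print(f"excess repression:    {excess:.2f}")
```

A multiplicative null model (the product of the two expression fractions) is another common choice; which null is appropriate is itself an assumption that affects whether an interaction is called non-linear.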
To improve predictive resolution from the level of kilobase-scale regulatory elements to the level of individual <10 nucleotide TF-binding sites requires editing approaches with nucleotide-level precision. Pairing tiled Cas9-nuclease screening with a computational model such as inDelphi (Shen et al., 2018) that predicts the range of indels created by Cas9-nuclease editing would improve our understanding of which genomic elements are likely to be altered by each sgRNA. Exploration of screens using other Cas species such as Cas12a and Cas3 that produce differently sized indels could also be worthwhile (Dolan et al., 2019; Morisaka et al., 2019; Zetsche et al., 2015). In addition, precise CRISPR targeting methods, such as base editing and prime editing (Anzalone et al., 2020), promise to improve our ability to monitor the regulatory consequences of precise genome modifications.
In addition to CRISPR-based approaches that alter the native genomic sequence, massively parallel reporter assays (MPRAs) provide an important complementary approach to dissect regulatory element function. MPRAs are widely used to test the transcriptional regulatory activity of thousands of candidate regulatory sequences in parallel through linking such regulatory activity with the transcription of unique barcode sequences (Arnold et al., 2013; Inoue and Ahituv, 2015; Melnikov et al., 2012). MPRAs have been used to identify differences in transcriptional output associated with pathogenic genomic variants (Tewhey et al., 2016; van Arensbergen et al., 2019). Two caveats of MPRAs are that they have traditionally been performed on plasmids, which lack the native chromatin environment of the genome, and that they have typically assessed transcriptional output but not the underlying mechanistic components of gene regulation. CRISPR-based homology-directed repair (HDR) provides solutions to both of these issues, enabling study of large numbers of variant regulatory elements in defined genome-integrated settings. Using CRISPR-based HDR, thousands of synthesized ∼100 bp DNA sequences can be integrated at a fixed locus in mouse embryonic stem cells and the binding strength of a TF at each sequence can then be measured. Recent work has demonstrated the feasibility and power of this approach: one study has found that the binding of the TF Tcf7l2 depends on the local level of chromatin accessibility provided by cofactors, as well as whether Tcf7l2 binds on the same side of the DNA helix as its cofactors Oct4 and Klf4 (Szczesnik et al., 2020). Another recent study used a similar technique to measure changes in chromatin accessibility induced by the integration of thousands of 100-bp genomic sequences, revealing that such short sequences are sufficient to induce differential accessibility between stem cells and differentiated endoderm (Hammelman et al., 2020).
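As a rough illustration of how barcode counts from such reporter assays are commonly turned into an activity estimate, the sketch below computes a depth-normalized log2 ratio of RNA to DNA counts for each barcode and summarizes each element by the median across its barcodes. The pseudocount, the use of the median, the sequencing depths and all counts are assumptions chosen for illustration; they are not taken from any of the studies cited above.

```python
from math import log2
from statistics import median


def element_activity(
    barcodes: list[tuple[int, int]],
    rna_total: int,
    dna_total: int,
    pseudocount: float = 1.0,
) -> float:
    """Median log2(RNA/DNA) across the barcodes linked to one regulatory element.

    `barcodes` holds (rna_count, dna_count) pairs; totals are library depths.
    """
    ratios = []
    for rna, dna in barcodes:
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        ratios.append(log2(rna_cpm / dna_cpm))
    return median(ratios)


# Hypothetical counts for a reference sequence and a variant of it.
library = {
    "reference": [(39, 9), (79, 19), (19, 9)],
    "variant": [(9, 9), (19, 19), (4, 9)],
}
depth_rna = 1_000_000  # hypothetical sequencing depths
depth_dna = 1_000_000

activities = {
    name: element_activity(counts, depth_rna, depth_dna)
    for name, counts in library.items()
}
effect = activities["variant"] - activities["reference"]
print(activities, f"variant effect (log2): {effect:.2f}")
```

Summarizing over several barcodes per element reduces the influence of any single barcode with an unusual sequence or integration context, which is one reason barcoded designs typically include multiple barcodes per candidate sequence.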
Sequence determinants of TF binding and chromatin accessibility can now even be measured at single-allele resolution through marking accessible bases with adenine methylation and performing PacBio long-read sequencing (Stergachis et al., 2020). This, in combination with measurements of transcriptional activation, will provide a powerful complement to CRISPR screens to dissect the sequence features governing regulatory element activity and how such elements are influenced by disease-associated non-coding variants found in genome-wide association studies (GWAS; see Glossary, Box 1; Box 2).
Post-transcriptional regulation of gene expression and regulatory RNAs
CRISPR-Cas9 tools have also been used to explore roles of non-coding RNAs and have shed light on the role of enhancer transcripts (eRNAs; see Glossary, Box 1), which are known to be produced from most active enhancers (Kim et al., 2010) (Fig. 3C). In spite of their abundance, it has been controversial whether eRNAs are causally important to gene transcription or whether they are inert byproducts of the recruitment of RNA polymerase to gene regulatory loci (Andersson et al., 2014; Henriques et al., 2018; Young et al., 2017). A recent study has found that loss of the RNA Polymerase II elongation factor SPT5 (also known as Supt5 or Supt5h) coordinately decreases the expression of a subset of genes and eRNA transcripts at linked enhancers. This eRNA loss is causal because when the CRISPRa tool dCas9-VPR is recruited to a distal enhancer in SPT5-depleted cells, it drives eRNA transcription that restores neighboring gene expression (Fitz et al., 2020). This finding has been supported by another recent study showing that eRNA induction through distal enhancer-targeted CRISPRa upregulates local gene expression in a breast cancer model (Zhang et al., 2019b). The authors suggest that eRNA transcription facilitates enhancer-promoter interaction at pre-activated enhancers, although it is currently difficult to disentangle eRNA-mediated effects from distal gene activating effects of CRISPRa recruitment. Recent work has bolstered the idea that eRNAs enhance gene activation by showing that low-level RNA production, as is produced by enhancer transcription, increases the formation of transcription-promoting condensates, which are subsequently dissolved by robust gene transcription (Henninger et al., 2021). Although enticing, none of the evidence presented above is conclusive, and CRISPR-based approaches promise to play an outsized role in deciphering the roles, if any, of eRNAs.
In addition to elucidating roles of eRNAs, RNA-targeting CRISPR approaches promise to: (1) clarify functions of long intergenic non-coding RNAs (lincRNAs) (Engreitz et al., 2016; Kopp and Mendell, 2018); (2) study roles of RNA motifs and ribonucleoprotein complexes at specific DNA targets (Shechner et al., 2015); and (3) dissect steps involved in the transcription process (Wang et al., 2019b).
Dissecting the roles of epigenetic modifications in gene expression
Epigenetic factors, such as post-translational modifications of histones, DNA methylation and non-coding RNAs, contribute to cellular gene expression, differentiation potential and response to stimuli (Allis and Jenuwein, 2016). Dissecting the role of these epigenetic modifications in gene regulation has been accelerated by the ability to site-specifically alter the epigenome using the CRISPR/(d)Cas9 toolbox (Fig. 3D). To evaluate the functional consequences of epigenetic modifiers, dCas9 has been fused to epigenetic modifying enzymes and complexes including lysine-specific histone demethylase 1A (LSD1; KDM1A) (Kearns et al., 2015), histone acetyltransferase p300 (Hilton et al., 2015), disruptor of telomeric silencing 1-like histone H3K79 methyltransferase (Dot1l), PR/SET domain 9 (Prdm9) (Cano-Rodriguez et al., 2016), enhancer of zeste 2 polycomb repressive complex 2 subunit (EZH2) (O'Geen et al., 2019), BAF nuclear assembly factor 1 (BAF), heterochromatin protein 1 (HP1; CBX5) (Braun et al., 2017), DNA methyltransferases (DNMTs), and tet methylcytosine dioxygenase 1 (TET1) (Liu et al., 2018). These studies have collectively begun to address which epigenetic modifications lead to stable epigenetic memory and which modifications are transient cues that are rapidly overridden. This is a young field; the power of controlled site-specific genomic recruitment of domains from TFs and epigenetic regulators promises to enable much finer dissection of their specific roles in epigenetic and transcriptional modulation.
Understanding 3D genomic topology
In addition to the DNA and RNA elements involved in gene regulation, the three-dimensional (3D) structure of chromosomes is known to play a major role in determining which genes are expressed. However, our knowledge of how this architecture is controlled and how the 3D genome impacts gene regulation is limited. CRISPR tools now enable manipulation of genome architecture (Fig. 3E). For example, by building and using a CRISPR tool named CLOuD9 (inducible and reversible chromatin loop reorganization using nuclease-deficient Cas9 and components of the plant hormone ABA signaling pathway) in which chromatin loops can be dynamically manipulated in living cells, recent work has demonstrated that chromatin looping in the proper biological context is sufficient to alter gene expression (Morgan et al., 2017). Similarly, a light-activated-dynamic-looping (LADL) system, in which two genomic regions anchored by gRNAs are brought into proximity via a light-induced dCas9 fusion protein, shows that ectopically inducing proximity between an enhancer and a promoter that normally do not interact leads to a modest increase in target gene expression (Kim et al., 2019a). Finally, an approach named ‘CRISPR-GO’ (CRISPR genome organization), in which chromosomal regions can be reversibly recruited to distinct sub-nuclear compartments, has found that recruiting gene regions to the nuclear periphery or to Cajal bodies is sufficient to reduce reporter gene expression (Wang et al., 2018). These approaches add an extra dimension to gene regulatory perturbation, and as their precision is increased, they promise to improve our understanding of how gene regulation is influenced by nuclear organization.
Manipulating gene regulation
CRISPR-Cas9 is not just a powerful tool to decipher gene regulation; it also provides a method to alter gene expression and cellular state in ways that can be therapeutically relevant. The first therapeutic implementation of CRISPR-Cas9 to alter gene regulation has now been shown to be effective: following the finding that an erythroid-specific distal enhancer regulating the BCL11A gene mediates fetal hemoglobin persistence (Bauer et al., 2013), a clinical trial has now shown that CRISPR-Cas9-mediated disruption of this enhancer in hematopoietic stem/progenitor cells, followed by transplantation of edited cells into patients with β-thalassemia or sickle cell disease, alleviates disease-associated clinical phenotypes (Frangoul et al., 2021). This landmark trial provides a proof-of-principle that CRISPR-based manipulation of gene regulation is more than just a research tool. CRISPR-based tools are being employed in several other settings to manipulate gene regulation in order to model and potentially treat human disease.
Modeling disease using pluripotent stem cells
Directed differentiation of induced pluripotent stem cells (iPSCs) to disease-relevant cell types has promised a revolution in understanding mechanisms of human genetic disease (Cherry and Daley, 2012). Although there have certainly been some groundbreaking insights gained through this approach (Rowe and Daley, 2019), one lesson that has emerged is that iPSCs derived from different patients – and even from the same patient – can have substantial heterogeneity at the genetic and transcriptional levels, which can make comparisons between cohorts difficult (Kilpinen et al., 2017; Popp et al., 2018; Volpato et al., 2018). A complementary approach has emerged: using CRISPR-Cas9 to induce specific genetic alterations in otherwise identical iPSC genetic backgrounds (Musunuru, 2013; Musunuru et al., 2018). By specifically inducing focal genetic changes in pluripotent stem cell lines with an otherwise fixed genetic and epigenetic background, much of the noise associated with distinct genetic backgrounds and iPSC lines can be avoided. The development of base editing and prime editing techniques makes this an attractive approach to assess whether specific variants (or combinations thereof) of uncertain significance are causal for particular disease phenotypes. It is important to recognize the strong disease-modifying effects of genetic background (Fahed et al., 2020), which merit inducing a given variant in several distinct iPSC backgrounds. Ideally, a bank of iPSCs with diverse genetic backgrounds and defined polygenic disease risk would be characterized and available for such experiments. It is also worth noting that inducing CRISPR-Cas9 alterations in a fixed background is much easier and more scalable than generating new iPSC cohorts for each new study, especially given the strong propensity for oncogenic and copy number alterations to occur during the iPSC derivation process (Kilpinen et al., 2017; Popp et al., 2018; Volpato et al., 2018).
Overall, for each project aimed at modeling disease through iPSCs, it is worth weighing the benefits and drawbacks of installing variants through CRISPR-Cas9 as opposed to the more laborious process of generating new patient-specific iPSC cohorts.
Cellular reprogramming with CRISPRa
Cellular reprogramming, the conversion of one cell type into another, has primarily been achieved by overexpressing transgenes encoding TFs (Takahashi and Yamanaka, 2006), sometimes in combination with signaling molecules (Pagliuca et al., 2014) or knockdown of genes (Qian et al., 2020). Although this reprogramming paradigm has been successfully employed to produce distinct types of neurons and other lineages (Cherry and Daley, 2012; Vierbuchen et al., 2010), the pace at which new cell types have been reprogrammed has been slow due to the technical challenges associated with screening for and validating reprogramming cocktails. In theory, CRISPRa is ideally suited to this challenge owing to the ease of targeting any gene and multiplexing candidates in a scalable manner.
CRISPRa has been employed in iPSC and neural reprogramming paradigms (Black et al., 2016; Liu et al., 2018; Weltner et al., 2018). In the CRISPRa iPSC reprogramming work (Weltner et al., 2018), reprogramming efficiency has been increased through activating not only the known iPSC-promoting TFs (POU5F1, MYC, KLF4, SOX2 and LIN28A), but also a repetitive EGA-enriched Alu-motif (EEA-motif), which is enriched in promoters of genes expressed during human embryo genome activation (Töhönen et al., 2015), thus revealing an added benefit of CRISPRa that cannot easily be replicated using transgenes. Nonetheless, in all such paradigms, CRISPRa-based reprogramming has remained substantially less potent than transgene-based reprogramming, probably because the levels of TF induction required for efficient reprogramming cannot be reached with current CRISPRa technology. Thus, efficient CRISPR-based reprogramming remains a tantalizing prospect that appears just beyond the horizon. In the near future, more robust CRISPRa technology, paired with emerging approaches to predict TF cohorts capable of reprogramming cellular function (Cahan et al., 2014; Rackham et al., 2016), promises to usher in a new age of accelerated derivation of reprogramming cocktails to match the discovery and categorization of cell types in the human body brought on by the scRNA-seq revolution (Macosko et al., 2015; Regev et al., 2017).
CRISPRa for therapeutic correction of haploinsufficiency
Finally, in addition to the immense promise of CRISPR-based gene editing in treating genetic disease (Dunbar et al., 2018), CRISPR-based epigenetic modulation could have future therapeutic applications. Although modular oligonucleotide-based gene therapy approaches to reduce gene expression, such as antisense oligonucleotides and RNAi, have already progressed to the clinic (Bennett, 2019), increasing gene expression has been more challenging. Gain-of-function gene therapy uses gene delivery, which currently cannot be used for genes that exceed the ∼5 kb packaging limit of current state-of-the-art adeno-associated virus (AAV) vectors (Wang et al., 2019a). A recent study has shown that in vivo delivery of CRISPRa components targeted to gene promoters or enhancers is able to restore target gene levels enough to rescue the obesity phenotypes in mouse haploinsufficient genetic disease models induced by heterozygous mutations in the Sim1 and Mc4r genes (Matharu et al., 2019). Importantly, this study has successfully employed Staphylococcus aureus Cas9, a smaller Cas9 species that fits into a single AAV vector along with an sgRNA, enabling AAV-mediated upregulation of genes whose size exceeds the AAV packaging limit. It is worth mentioning the caveat that a substantial fraction of people have pre-existing immunity to Cas9 (Charlesworth et al., 2019) and Cas9 has been shown to be immunogenic (Chew et al., 2016; Wang et al., 2015), so long-term therapeutic expression of Cas9 may be inadvisable clinically.
It is remarkable how fast the CRISPR-Cas9 field has progressed since it was first demonstrated in genome editing 8 years ago. A panoply of CRISPR-based tools has emerged, capable not only of cleaving DNA and RNA, but of enabling precise site-specific base alteration, epigenetic modulation, imaging and sub-nuclear rearrangement.
It is no surprise that, as each new CRISPR-based tool has emerged, it has been readily co-opted to illuminate the enigmatic code used to generate an entire organism from a fixed genome (Table 1). Genetic screens in differentiating hPSCs are filling in the blanks of which genes control the formation of each organ, enriched by the emergence of scRNA-seq to classify mutant phenotypes and approaches to correlate such phenotypes with chromatin state and the identity of neighboring cells. The complex regulatory code that dictates gene expression dynamics is beginning to be illuminated, as gene regulatory elements and their underlying TF-binding motifs can now be altered in high-throughput, allowing inference of their normal function. In addition to alterations of regulatory DNA itself, local chromatin, non-coding RNA species and 3D genome organization can all be manipulated in ever more complex ways through CRISPR-based tools. As a result, a more holistic view of the coordinated choreography involved in cell state transition is beginning to emerge.
As understanding of gene regulation improves, so does our ability to manipulate genes toward modeling and treating human disease. CRISPR-Cas9 gene regulation-modulating therapy is now in the clinic. CRISPR-Cas9 has dramatically simplified the process of modeling disease-associated mutations in pluripotent stem cells and their derivatives. More ambitiously, CRISPRa is beginning to be used to transform cellular identity through reprogramming and to correct diseases of insufficient gene expression. Such approaches are currently in their infancy, but given the pace of the CRISPR-Cas9 field, the ability to control cell fate may not be too far in the future.
In sum, the explosion of potent and precise techniques to edit genome sequence and epigenetic functionalization of this sequence is pushing us closer to true understanding of the complexities of development and the ability to precisely tweak gene regulation at will.
All figures were created with Biorender.com.
The authors acknowledge funding from the National Institutes of Health (R01HG008754, R21HG010391), American Cancer Society, American Heart Association, National Organization for Rare Disorders, Qatar Biomedical Research Institute, Nederlandse Organisatie voor Wetenschappelijk Onderzoek and the Merkin Institute for Transformative Technologies in Healthcare. Deposited in PMC for release after 12 months.
The authors declare no competing or financial interests.