ABSTRACT
Enhancers control the establishment of spatiotemporal gene expression patterns throughout development. Over the past decade, the development of new technologies has improved our capacity to link enhancers with their target genes based on their colocalization within the same topological domains. However, the mechanisms that regulate how enhancers specifically activate some genes but not others within a given domain remain unclear. In this Review, we discuss recent insights into the factors controlling enhancer specificity, including the genetic composition of enhancers and promoters, the linear and 3D distance between enhancers and their target genes, and cell type-specific chromatin landscapes. We also discuss how elucidating the molecular principles of enhancer specificity might help us to better understand and predict the pathological consequences of human genetic, epigenetic and structural variants.
Introduction
Gene expression is mainly regulated through communication between two types of non-coding regulatory elements: promoters and enhancers. Historically, promoters were defined as regulatory elements that determine where gene transcription starts (i.e. the transcription start site, TSS) and serve as binding platforms for the transcriptional machinery and general co-factors. Enhancers are key regulatory elements that bind tissue-specific transcription factors (TFs) and specifically transmit regulatory information to their target gene promoters, thus dictating where and when genes are expressed. However, these classical definitions are ambiguous, and recent genomic studies show that both elements can overlap in their functions and properties (Andersson and Sandelin, 2020). In agreement with the importance of enhancers and promoters in controlling gene expression, the malfunction of these elements can lead to aberrant gene expression profiles and disease.
Since enhancers can activate gene transcription over long genomic distances (i.e. long-range) and do not always activate their closest target genes, uncovering how enhancers specifically communicate with their correct target gene and not others is still a major challenge in the field. The genome is organized into large megabase (Mb)-scale regulatory domains, such as topologically associating domains (TADs), insulated neighborhoods and contact domains (Beagan and Phillips-Cremins, 2020; Ibrahim and Mundlos, 2020; Szabo et al., 2019; de Wit, 2020). These domains are separated by topological boundaries that favor communication between enhancers and genes located in the same domain while limiting interactions between genes and enhancers located in neighboring domains. However, on a smaller scale, additional regulatory layers control enhancer-gene communication and contribute to the establishment of precise and specific gene expression programs.
In this Review, we discuss the genetic and molecular principles that jointly regulate enhancer specificity, including TAD organization, enhancer-promoter compatibility, linear genomic distance, tethering elements and cell type-specific chromatin landscapes. Finally, we discuss how modifications in these different regulatory layers contributing to enhancer-gene communication can ultimately lead to human disease.
3D genome organization, regulatory domains and TADs
The emergence of chromosome conformation capture (3C)-related methods, especially Hi-C, has dramatically changed our understanding of 3D genome organization. Hi-C studies uncovered that vertebrate genomes are organized in large (sub-Mb scale) and dynamic self-interacting domains (e.g. TADs, contact domains and insulated neighborhoods) that can regulate enhancer function (Dixon et al., 2012; Dowen et al., 2014; Nora et al., 2012; Rao et al., 2014). For the sake of simplicity, we broadly refer to these domains as TADs. TAD boundaries restrict the search space of enhancers to loci located within the same domain. Consequently, TADs might represent fundamental regulatory units that favor communication between genes and enhancers located in the same domain, while TAD boundaries act as insulators that prevent enhancers from contacting non-target genes located in other domains. TAD formation and function are mainly achieved by the combined action of the cohesin complex and CCCTC-binding factor (CTCF) through a process referred to as loop extrusion (Mirny and Solovei, 2021). Briefly, cohesin is loaded onto chromatin and progressively extrudes a loop until it encounters a barrier or is released by the cohesin-unloading factor Wapl (Fudenberg et al., 2016; Haarhuis et al., 2017; Sanborn et al., 2015; Valton et al., 2021 preprint). In vertebrates, the most common cohesin barrier is formed by convergent CTCF-binding sites (CBSs) (de Wit et al., 2015; Rao et al., 2014). However, TSSs and transcription termination sites, as well as proteins other than CTCF, including RNA polymerase II, might also act as cohesin barriers (Banigan et al., 2022 preprint; Valton et al., 2021 preprint). Loop extrusion can thus favor interactions between loci located within the same TAD (Dekker and Mirny, 2016; Mach et al., 2022 preprint).
Nevertheless, TADs are highly dynamic structures (Gabriele et al., 2022; Mach et al., 2022 preprint) and, depending on boundary strength and chromatin composition, inter-TAD communication can also occur (Bonev et al., 2017; Paulsen et al., 2019; Szabo et al., 2018).
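The loop extrusion process described above can be illustrated with a deliberately simplified one-dimensional sketch. It assumes a single cohesin complex loaded at one genomic bin, two legs that each move one bin per step, and CTCF sites that act as orientation-dependent barriers: a forward-oriented site halts the leftward-moving leg and a reverse-oriented site halts the rightward-moving leg, so loops stabilize between convergent pairs. A fixed step cap stands in for stochastic release by Wapl. The names `CTCFSite`, `extrude` and `same_loop` are hypothetical, and the model ignores other barriers (e.g. RNA polymerase II), multiple cohesin complexes and extrusion kinetics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CTCFSite:
    position: int     # genomic bin index
    orientation: str  # ">" = forward motif, "<" = reverse motif


def extrude(load_site, sites, genome_length, max_steps=None):
    """Toy two-sided loop extrusion by a single cohesin complex.

    The left leg halts at a forward-oriented CTCF site (">") and the right
    leg at a reverse-oriented site ("<"), so a stable loop forms between a
    convergent pair (> ... <). Sites in the opposite orientation are passed.
    Chromosome ends also halt the legs. `max_steps` crudely mimics release
    by Wapl; None means cohesin is never released. Returns (left, right).
    """
    forward = {s.position for s in sites if s.orientation == ">"}
    reverse = {s.position for s in sites if s.orientation == "<"}
    left = right = load_site
    steps = 0
    while True:
        left_blocked = left in forward or left == 0
        right_blocked = right in reverse or right == genome_length - 1
        if left_blocked and right_blocked:
            break  # loop anchored at both ends
        if max_steps is not None and steps >= max_steps:
            break  # released before reaching both barriers
        if not left_blocked:
            left -= 1
        if not right_blocked:
            right += 1
        steps += 1
    return left, right


def same_loop(a, b, loop):
    """True if bins a and b both lie within the extruded loop (inclusive)."""
    left, right = loop
    return left <= a <= right and left <= b <= right


# Hypothetical locus: a convergent pair at bins 100 (>) and 300 (<), plus two
# sites in the "wrong" orientation that the legs extrude through.
sites = [CTCFSite(100, ">"), CTCFSite(120, "<"), CTCFSite(250, ">"), CTCFSite(300, "<")]
loop = extrude(150, sites, genome_length=1000)  # (100, 300)
print(same_loop(130, 280, loop))  # True: enhancer and promoter share the loop
print(same_loop(130, 350, loop))  # False: gene lies beyond the convergent boundary
```

In this toy setting, insulation emerges only from orientation-dependent blocking; removing the site at bin 300 lets the right leg run on to the next reverse-oriented site or the chromosome end, loosely analogous to the boundary deletions discussed below.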
The relevance of TADs to enhancer function is supported by several observations. First, enhancers and their target genes usually lie within the same TAD (Symmons et al., 2014). For example, major developmental genes and their cognate enhancers frequently occupy large genomic regions, previously termed genomic regulatory blocks (GRBs), which show remarkable overlap with TADs (Akalin et al., 2009; Kikuta et al., 2007; Ruf et al., 2011). In addition, disruption of TAD boundaries by structural variants (Fig. 1A) or mutations in CBSs can result in enhancer adoption/hijacking and ectopic gains in gene expression (Franke et al., 2016; Hanssen et al., 2017; Lupiáñez et al., 2015). For example, removal of a CBS located at the α-globin locus leads to fusion between the α-globin TAD and its neighboring TAD, the establishment of ectopic interactions between the α-globin enhancers and the neighboring TAD, and the upregulation of genes located in the extended TAD (Hanssen et al., 2017). Similarly, deletion of two CBSs located between the Prdm14 super-enhancer (SE) and Slco5a1 causes a fusion between these neighboring TADs and an increase in Slco5a1 expression in mouse embryonic stem cells (ESCs) (Vos et al., 2021).
Nevertheless, TAD disruption does not always lead to enhancer rewiring. For example, only a minor set of genes show changes in their expression levels after most TADs are lost following CTCF or cohesin depletion (Nora et al., 2017; Rao et al., 2017). Similarly, highly rearranged chromosomes in Drosophila in which TAD organization is severely disrupted display minor gene expression changes (Ghavi-Helm et al., 2019). In mammals, deletion of a series of CBSs at the Sox9-Kcnj2 locus results in fusion of neighboring TADs, but without any major effects on gene expression (Despang et al., 2019). Furthermore, global analysis of how genes are distributed among TADs in the mouse limb, cortical neurons and ESCs revealed that only 12% of TADs contain a single gene (Ringel et al., 2021 preprint). Typically, these single-gene TADs contain developmental genes located within insulated regulatory landscapes as a mechanism that might ensure their specific expression (Wu et al., 2021). In contrast, the vast majority of TADs (88%) identified in these cell types contain on average 2.4 tissue-specific and 3.6 ubiquitously expressed genes (Ringel et al., 2021 preprint). Importantly, these multi-gene TADs can also contain major developmental genes and their cognate enhancers, together with additional genes (‘bystander’ genes) that do not seem to be regulated by those enhancers (Akalin et al., 2009; Kikuta et al., 2007) (Fig. 1B). Therefore, genes within a TAD are not always co-regulated, which strongly supports the existence of additional regulatory layers controlling the responsiveness between genes and enhancers and, thus, the establishment of specific gene expression profiles.
Promoter architecture and enhancer-gene compatibility
Eukaryotic promoters are composed of distinct core elements, including the TATA box, the initiator (Inr), the downstream promoter element (DPE), the downstream core element (DCE), the TFIIB-recognition element (BRE) and the motif ten element (MTE); these can appear in different combinations (reviewed by Haberle and Lenhard, 2016; Maston et al., 2006). A number of studies have examined how these various elements can influence enhancer-promoter specificity.
Since the original report showing that the 72 bp repeat of SV40 DNA enhances β-globin gene expression in a distance- and orientation-independent manner (Banerji et al., 1981), enhancers have traditionally been tested and validated according to their ability to influence transcription when placed immediately upstream of a minimal promoter in episomal vectors (i.e. reporter assays) (Banerji et al., 1983; Moreau et al., 1981; de Villiers et al., 1981). In addition, enhancer activity has been investigated using regulatory sensors consisting of a minimal promoter and a reporter gene that, once integrated into the genome, can detect the regulatory activity of nearby enhancers (Akhtar et al., 2014; Gierman et al., 2007; Ruf et al., 2011). Early studies using these two types of assays in Drosophila and mammals showed that, depending on their core elements, promoters can differ in their responsiveness to specific enhancers (Butler and Kadonaga, 2001; Merli et al., 1996; Ohtsuki et al., 1998; Simon et al., 1988; Wefald et al., 1990). In Drosophila, dpp enhancers regulate the expression of only dpp, despite being closer to other genes (i.e. Slh and oaf). Notably, replacing the oaf promoter with the dpp promoter makes oaf responsive to dpp enhancers (Merli et al., 1996). This model was further tested by generating Drosophila lines with random integrations of reporter genes driven by a promoter containing both DPE and TATA motifs (Butler and Kadonaga, 2001). Selective excision of one of the two promoter motifs revealed that some enhancers can induce reporter gene expression only in combination with a specific promoter type.
More recently, the advent of next-generation sequencing has enabled the development of massively parallel reporter assays (MPRAs) in which the enhancer activity of libraries containing thousands of genomic fragments can be simultaneously measured (Arnold et al., 2013; Kheradpour et al., 2013; Neumayr et al., 2019). These libraries, which are tested in episomal plasmids (Arnold et al., 2013; Kheradpour et al., 2013; Neumayr et al., 2019) or randomly integrated in the genome (Dickel et al., 2014; Murtha et al., 2014), have revealed the striking selectivity of promoters as well as differences between Drosophila and mammalian promoters.
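MPRA readouts are usually summarized per element as the abundance of transcripts (RNA) produced from it relative to its abundance in the input library (plasmid or integrated DNA). The sketch below computes this ratio under simplifying assumptions: one count per element rather than per barcode, a single replicate and an arbitrary pseudocount. The function name and the counts are hypothetical; published analyses typically aggregate barcodes and replicates with dedicated statistical models.

```python
import math


def mpra_log2_activity(rna_counts, dna_counts, pseudocount=1.0):
    """Per-element log2 activity: RNA abundance relative to input DNA abundance.

    rna_counts / dna_counts map element IDs to read counts. Counts are
    converted to library-size-normalized frequencies (with a pseudocount to
    avoid division by zero) before taking the ratio, so activity reflects
    how much transcript each element produces per copy in the library.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    activity = {}
    for element, dna in dna_counts.items():
        rna = rna_counts.get(element, 0)
        rna_freq = (rna + pseudocount) / rna_total
        dna_freq = (dna + pseudocount) / dna_total
        activity[element] = math.log2(rna_freq / dna_freq)
    return activity


# Hypothetical counts: two fragments equally represented in the input DNA,
# one of which yields four times as many transcripts.
dna = {"fragment_A": 99, "fragment_B": 99}
rna = {"fragment_A": 399, "fragment_B": 99}
act = mpra_log2_activity(rna, dna)
# The difference is log2(400 / 100) = 2, up to floating-point rounding,
# regardless of the total sequencing depth of either library.
print(act["fragment_A"] - act["fragment_B"])
```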
In Drosophila, one type of MPRA, referred to as self-transcribing active regulatory region sequencing (STARR-seq), was used to measure the enhancer activity of millions of non-coding sequences placed downstream of either housekeeping or developmental promoters (Zabidi et al., 2015). Remarkably, the identified enhancers showed clear preferences towards one of the two promoter classes. Furthermore, a variant of the STARR-seq method, termed self-transcribing active core promoter sequencing (STAP-seq), revealed a wide range of responsiveness among thousands of Drosophila candidate core promoters tested against fixed enhancer sequences (Arnold et al., 2017). STAP-seq was also used to tether different co-activators to thousands of different Drosophila promoters (Haberle et al., 2019). In line with previous results, Drosophila promoters showed clear preferences for different co-activators, which could be connected to their highly variable enhancer responsiveness. Altogether, these MPRAs suggest that compatibility between promoters and enhancers in Drosophila is determined by both genetic and biochemical factors, which in turn can control the specific establishment of gene expression programs.
In the case of vertebrates, differences between core promoters were also observed, albeit not as clearly as in Drosophila. There are three main types of promoters in vertebrates that differ in the presence and density of CpG islands (CGIs), i.e. sequences with a high frequency of CpG dinucleotides (Carninci et al., 2006; Saxonov et al., 2006) (Fig. 1C). Promoters associated with developmental genes contain large clusters of CGIs and are usually not associated with TATA elements (Lenhard et al., 2012). Promoters of housekeeping genes tend to have a single CGI and no TATA box, while tissue-specific promoters of genes expressed in terminally differentiated cells lack CGIs but are enriched in TATA elements (Lenhard et al., 2012). A recent MPRA combining different human promoters and enhancers in K562 cells showed that enhancers and promoters are broadly compatible and combine their regulatory activities in a multiplicative manner (Bergman et al., 2022). Furthermore, this analysis showed that human housekeeping promoters are, in general, less responsive to enhancers, although this might be explained by the fact that these promoters already contain proximal regulatory elements that are intrinsically active (Bergman et al., 2022). On the other hand, context-specific/developmental promoters were generally more responsive to enhancers than housekeeping promoters, but without showing major preferences for any specific type of enhancer. Similarly, a recent MPRA study in which the activities of housekeeping and developmental core promoters inserted at different loci were measured in human K562 cells revealed that the intrinsic activity of different promoter classes is preserved across various genomic environments (Hong and Cohen, 2022). However, the genomic environment might scale the transcriptional strength of core promoters in a non-linear manner. That is, weak housekeeping or developmental core promoters might be more affected by the genomic environment than core promoters with strong transcriptional output (Hong and Cohen, 2022). Altogether, these results suggest that differences in enhancer-promoter compatibility are not as prevalent in human cells as in Drosophila. However, in another recent study using MPRAs in mouse ESCs (Martinez-Ara et al., 2022), the results were more similar to those previously reported in flies (Zabidi et al., 2015). Specifically, this approach combined all the distal regulatory elements (defined as DNase I hypersensitive sites) and promoters from three different mouse loci and observed a wide range of compatibilities: some regulatory elements specifically activated or repressed a few promoters, whereas others were rather promiscuous (Martinez-Ara et al., 2022). The apparently contradictory results obtained in human, mouse and Drosophila cells might be explained by differences in experimental design (e.g. cell type, selection of candidate enhancers and length of the promoter regions). For example, the low enhancer responsiveness reported for human housekeeping promoters is in stark contrast to previous findings in Drosophila. However, the core promoters used in the human MPRA were considerably longer than those used in the fly assays. In addition, most of the ‘housekeeping’ enhancers identified in flies overlap or are proximal (<200 bp) to TSSs. One possibility is that, at their endogenous loci, the ‘housekeeping’ enhancers described in Drosophila are part of the extended housekeeping promoters that have been proposed for humans.
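A multiplicative combination of enhancer and promoter activities, as reported by Bergman et al. (2022), implies that on a logarithmic scale each element contributes an additive term, and that specific (in)compatibility between a particular pair would show up as a deviation from that additive fit. The sketch below fits such a log-additive model to a complete enhancer-by-promoter grid using row and column means. It is a hypothetical illustration with made-up numbers and function names, not necessarily the fitting procedure used in the cited studies.

```python
import math


def fit_multiplicative(expression):
    """Fit expression[e][p] ~ exp(mu + enh[e] + prom[p]) on a complete grid.

    Taking logs turns the multiplicative model into an additive one; for a
    balanced grid the least-squares main effects are the row and column
    means of the log values minus the grand mean.
    """
    enhancers = list(expression)
    promoters = list(next(iter(expression.values())))
    logs = {e: {p: math.log(expression[e][p]) for p in promoters} for e in enhancers}
    n_cells = len(enhancers) * len(promoters)
    mu = sum(logs[e][p] for e in enhancers for p in promoters) / n_cells
    enh = {e: sum(logs[e].values()) / len(promoters) - mu for e in enhancers}
    prom = {p: sum(logs[e][p] for e in enhancers) / len(enhancers) - mu for p in promoters}
    return mu, enh, prom


def log_residuals(expression, mu, enh, prom):
    """Deviation of each pair from the multiplicative fit (log scale).

    Positive values: the pair is more active than its separate enhancer and
    promoter strengths predict (a candidate specific compatibility).
    """
    return {
        (e, p): math.log(value) - (mu + enh[e] + prom[p])
        for e, row in expression.items()
        for p, value in row.items()
    }


# Hypothetical data: purely multiplicative except for one boosted pair.
enhancer_strength = {"E1": 2.0, "E2": 8.0, "E3": 4.0}
promoter_strength = {"P1": 1.0, "P2": 5.0, "P3": 2.0}
expression = {
    e: {p: 3.0 * es * ps for p, ps in promoter_strength.items()}
    for e, es in enhancer_strength.items()
}
expression["E2"]["P1"] *= 4.0  # a specific enhancer-promoter preference
res = log_residuals(expression, *fit_multiplicative(expression))
print(max(res, key=res.get))  # ('E2', 'P1')
```

With broadly compatible elements, residuals stay close to zero and expression is well predicted by the product of enhancer and promoter strengths; pronounced pair-specific residuals would instead point to the kind of selective compatibility reported in Drosophila.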
Therefore, two major caveats of MPRAs are: (1) the inclusion of short promoter and enhancer sequences that may not contain the entire genetic information of the studied elements; and (2) the investigation of enhancers outside their endogenous genomic context, where the influence of the chromatin environment and of the genomic distance between enhancers and promoters is neglected. Overall, MPRAs can offer highly valuable insights into enhancer function but, in light of these recent and somewhat contradictory results, they should be interpreted with caution.
Linear distance
Although enhancer function was initially considered to be distance independent, several studies have recently demonstrated that modifying the linear distance between enhancers and their target gene promoters can result in changes in gene expression (Fukaya et al., 2016; Pachano et al., 2021; Rinzema et al., 2021 preprint; Zuin et al., 2022). In Drosophila, positioning the sna enhancer at 6.5 or 9 kb from a reporter gene showed that transcription decreases as the linear enhancer-gene distance increases (Yokoshi et al., 2020). Notably, a recent study in which an enhancer was inserted at more than 100 different genomic locations around a reporter gene, spanning ∼300 kb, strongly suggests that transcription decreases with increasing genomic distance (Zuin et al., 2022). Moreover, modeling of the resulting data suggests that the relationship between genomic distance, contact probability and transcription is non-linear (Zuin et al., 2022), which might be explained by a promoter being able to remain in an active transcriptional state for periods longer than the duration of its contact with an enhancer (i.e. promoter hysteresis) (Xiao et al., 2021; Zuin et al., 2022). Similarly, mobilization of the β-globin locus control region (LCR) with respect to a reporter gene containing the HBG1 promoter also showed that the linear distance between genes and enhancers inversely correlates with transcription (Rinzema et al., 2021 preprint). In addition, studies of the endogenous β-globin locus further illustrate the preference of the LCR for proximally located genes (Hanscombe et al., 1991; Peterson and Stamatoyannopoulos, 1993). Briefly, human β-globin genes are arranged in a 5′-ε-Gγ-Aγ-δ-β-3′ order downstream of the LCR. The LCR regulates the temporal expression of the globin genes during the embryonic (ε-globin), fetal (γ-globin) and adult (β-globin) stages (Hardison et al., 1997). The generation of transgenic animals in which the β-globin gene was inserted closer to the LCR led to higher gene expression (Dillon et al., 1997). Accordingly, an inversion switching the relative position of the β-globin genes with respect to the LCR resulted in preferential activation of the gene located more proximal to the LCR (Tanimoto et al., 1999). More recently, a tissue-specific gene (Gria1) was shown to be responsive to a developmental enhancer when both were placed in close proximity, but not when the same enhancer was placed 100 kb upstream of the gene within the same TAD (Pachano et al., 2021). In contrast, the activation of Shh in mouse ESCs through the forced recruitment of TFs to different enhancers is not significantly affected by linear distance (Kane et al., 2021 preprint). However, communication between the more distal enhancers and the Shh promoter might be influenced by CTCF-mediated interactions, which in turn could buffer, in a cohesin-dependent manner, the negative effect that linear distance has on gene expression (Calderon et al., 2022; Kane et al., 2021 preprint; Rinzema et al., 2021 preprint). Interestingly, motif analyses suggest that tissue-specific genes with CpG-poor promoters, such as the β-globin genes and Gria1, often have their key regulatory elements in promoter-proximal regions (Roider et al., 2009), while developmental genes with CpG-rich promoters (e.g. Shh) are frequently regulated by distal enhancers (Lenhard et al., 2012).
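To illustrate how a non-linear relationship between genomic distance, contact probability and transcription could arise, the sketch below combines two assumptions that are not taken from the cited studies: contact probability decays as a power law of genomic distance (exponent `alpha = 1`, an illustrative choice), and the promoter behaves as a two-state switch that can be turned on only during enhancer contacts but turns off slowly, so that activity outlasts each contact (a simple form of hysteresis). Function names and parameter values are hypothetical; only the shape of the response is meant to be informative.

```python
def contact_probability(distance_kb, alpha=1.0, d0_kb=1.0):
    """Hypothetical power-law decay of enhancer-promoter contact frequency.

    Returns 1.0 at or below d0_kb and (distance / d0)^-alpha beyond it.
    """
    return (max(distance_kb, d0_kb) / d0_kb) ** -alpha


def fraction_active(p_contact, k_on=50.0, k_off=1.0):
    """Steady-state fraction of time a two-state promoter spends ON.

    The promoter switches ON at rate k_on * p_contact (i.e. only when the
    enhancer is in contact) and OFF at rate k_off regardless of contact.
    When k_off << k_on, activity outlasts individual contacts and the
    response to contact frequency saturates.
    """
    on_rate = k_on * p_contact
    return on_rate / (on_rate + k_off)


for d in (10, 100, 1000):
    p = contact_probability(d)
    print(f"{d:>5} kb  contact={p:.3f}  active={fraction_active(p):.3f}")
```

Because the promoter stays on after a contact ends, halving contact frequency at short distances barely changes output, whereas at long distances, where contacts are rare, output falls almost in proportion to contact frequency.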
Together with observations using MPRAs, these results indicate that: (1) housekeeping genes show a generally low responsiveness to enhancers, regardless of distance, because their regulatory elements are embedded within or close to their promoter regions (Bergman et al., 2022); (2) tissue-specific genes with CpG-poor promoters show high responsiveness to proximal but not to distal enhancers, probably reflecting that, endogenously, these promoters are typically regulated by proximal cis-regulatory elements (Roider et al., 2009); and (3) developmental genes, which in vertebrates have large CGI clusters in their promoter regions, are often regulated by and are highly responsive to distal enhancers, sometimes separated by >1 Mb (Benko et al., 2009; Lettice et al., 2002; Lim et al., 2018; Long et al., 2020) (Figs 1C and 2). Therefore, by placing genes and enhancers further away from each other within the same TAD, it might be possible to specifically activate some genes (e.g. developmental genes with CpG-rich promoters) but not others (e.g. housekeeping genes, tissue-specific genes with CpG-poor promoters) (Akalin et al., 2009; Engström et al., 2007; Kikuta et al., 2007).
Factors that regulate enhancer-gene communication in 3D space
There are multiple examples of enhancers that can specifically control their target genes over long distances (Benko et al., 2009; Lettice et al., 2002; Long et al., 2020) or even on different chromosomes (Chang et al., 2004; Lim et al., 2018; Sanyal et al., 2012). Therefore, although genomic distance might contribute to enhancer-promoter specificity, there are additional mechanisms that enable distal enhancers to specifically encounter their target genes and execute their long-range regulatory function (Batut et al., 2022; Kane et al., 2021 preprint; Rinzema et al., 2021 preprint).
Tethering elements and GAGA
Tethering elements are short DNA sequences found in promoters and/or enhancers that can bring both of these elements into 3D proximity. Tethering elements have been described in Drosophila and mammals, although they might differ across species in their genetic composition and the proteins that mediate their function (Batut et al., 2022; Pachano et al., 2021).
In Drosophila, tethering elements were first described for the Scr-ftz locus (Calhoun et al., 2002). This region contains the T1 enhancer, which specifically activates Scr even though it is located closer to ftz. The specific activation of Scr can be explained by a 450 bp region located next to the Scr TSS that was suggested to bring the T1 enhancer into close proximity to the Scr promoter. Similarly, the Drosophila IAB5 enhancer, which is located equidistant from the abd-A and Abd-B genes, specifically interacts with Abd-B and not with abd-A (Akbari et al., 2007). The preferential physical association between IAB5 and Abd-B can be explained by the presence of a short 255 bp tethering element at the Abd-B TSS. Deletion of this element redirects the IAB5 enhancer to the abd-A promoter and induces abd-A expression. Moreover, in a transgenic configuration in which the IAB5 enhancer is placed between two genes, both genes are induced (Akbari et al., 2007). However, placing the tethering element adjacent to the 3′ located gene results in preferential expression of this gene and no expression of the 5′ gene. These tethering elements frequently contain binding sites for the transcription factor GAGA (Gaf/Trl) (Mahmoudi et al., 2002). Once bound to both promoters and enhancers, GAGA can bring them into physical proximity owing to its oligomerization capacity, thus facilitating enhancer function and gene activation (Mahmoudi et al., 2002). By mediating long-range enhancer-gene communication, these tethering elements enable the precise temporal transcription of developmental genes during early Drosophila embryogenesis (Batut et al., 2022). Interestingly, tethering elements are genetically and functionally distinct from TAD boundaries, whose main function in Drosophila might be to insulate genes from spurious interactions with enhancers and silencers located in neighboring TADs (Batut et al., 2022).
CTCF-binding sites
Besides their roles in TAD boundaries and the insulation of regulatory landscapes, CBSs also occur at multiple mouse gene promoters, where they can facilitate physical interactions with distal enhancers (Kubo et al., 2021). For example, deletion of a CBS at the Vcan promoter impairs its interactions with a distal enhancer and reduces Vcan expression levels (Kubo et al., 2021). Vcan expression can be restored via the artificial tethering of CTCF to its promoter. In addition, deletion of the human TFF1 gene promoter can ‘release’ its cognate enhancer, which can then ectopically activate another gene from the same TAD that also contains a CBS in its promoter (Fig. 3A). Interestingly, the TFF1 promoter contains a CBS that, when disrupted, reproduces the effect of deleting the entire promoter (Oh et al., 2021). It is currently unclear whether this potential role of CTCF as a tethering element that facilitates enhancer function requires direct binding to both enhancers and promoters or whether binding to one of the regulatory elements is sufficient (Kubo et al., 2021; Oh et al., 2021). Recent work suggests that convergent CBSs flanking enhancer-gene pairs facilitate communication between genes and distal enhancers by favoring the formation of small contact domains (Rinzema et al., 2021 preprint) (Fig. 3B). In contrast, in the mouse Sox2 locus, CTCF binding to either the Sox2 promoter or a distal enhancer is dispensable for their interaction (Chakraborty et al., 2022 preprint; Taylor et al., 2022 preprint). It is important to note that, in the context of promoter regions, CTCF can function not only as a tethering element but also as an insulator and as a conventional transcriptional activator (Chernukhin et al., 2007; Paredes et al., 2013; Peña-Hernández et al., 2015), thus complicating the dissection of CTCF function in enhancer-driven gene expression.
CpG islands
In mammalian cells, a recent study identified a novel role for CGIs as tethering elements that facilitate the physical and functional communication between developmental genes and CpG-rich enhancers (also known as poised enhancers) (Pachano et al., 2021) (Fig. 3C). In this context, CGIs must be present at both promoters and enhancers to function as tethering elements. This results in homotypic intra-TAD interactions that enable enhancers to preferentially activate developmental genes (i.e. genes with CGI clusters in their promoters). Interestingly, the promoter from a CGI-poor gene (i.e. a tissue-specific gene) appears to be responsive to the CpG-rich enhancer when this is placed proximally but not distally (Pachano et al., 2021). This suggests that CGIs, and perhaps tethering elements in general, do not regulate the intrinsic compatibility between promoters and enhancers, but rather facilitate the long-range communication between enhancers and their target genes. As a corollary of this, tethering elements might not be necessary for proximal enhancers and their regulatory function might not be detectable using reporter assays.
Mechanistically, the interactions between CpG-rich enhancers and their target genes in pluripotent cells are likely to be mediated by Polycomb group (PcG) complexes, which are recruited to CGIs present at both enhancers and target promoters (Crispatzu et al., 2021; Cruz-Molina et al., 2017; Pachano et al., 2021). PcG complexes, particularly PRC1, can play important architectural roles and mediate long-range interactions (Blackledge and Klose, 2021; Cheutin and Cavalli, 2014; Entrevan et al., 2016; Pachano et al., 2019). It is currently unclear whether PcG complexes can still contribute to enhancer-gene communication once the CpG-rich enhancers and their target genes become active (Loubiere et al., 2020; Zhang et al., 2021) or whether, alternatively, other CGI-bound proteins (e.g. Trithorax group (TrxG) proteins) execute the tethering function in the active state (Xiang et al., 2020). Furthermore, regardless of whether enhancers contain CGIs or not, promoters with large CGI clusters seem to be more responsive to distal enhancers than CpG-poor promoters are (Kraft et al., 2019; Pachano et al., 2021). In this case, the promoter CGIs might not act as tethering elements but rather confer a permissive chromatin state (e.g. low nucleosome occupancy and DNA hypomethylation) that increases enhancer responsiveness. Overall, mammalian CGIs and Drosophila tethering elements show remarkable functional similarities, and future work should elucidate whether these similarities arise through similar or distinct molecular mechanisms.
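Because the CGI-based distinctions discussed in this section rest on sequence composition, a minimal sketch of the widely used compositional criteria originally proposed by Gardiner-Garden and Frommer may help make "CpG-rich" and "CpG-poor" concrete: a length of at least 200 bp, a GC content above 50% and an observed/expected CpG ratio above 0.6. The sketch scores a single sequence as a whole rather than scanning the genome with sliding windows, as CGI annotations typically do, and the function names and example sequences are hypothetical.

```python
def cpg_composition(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence.

    observed/expected CpG = (number of CpG dinucleotides * length) / (C * G)
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" occurrences cannot overlap, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def is_cgi_like(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply length, GC-content and CpG observed/expected thresholds."""
    gc_fraction, obs_exp = cpg_composition(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


# Hypothetical 300 bp examples
print(is_cgi_like("CG" * 150))    # True: CpG-rich
print(is_cgi_like("CCAGG" * 60))  # False: GC-rich but CpG-depleted
```

The second example shows why GC content alone is not sufficient: the observed/expected ratio is what distinguishes CGIs from the CpG depletion typical of most of the vertebrate genome.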
Transcription factors
There are several TFs that, as part of their trans-activating function, also contribute to the establishment of enhancer-gene contacts. One classical example is Sp1, which in mammals can bind to both enhancers and promoters and, due to its oligomerization capacity, bring them into physical proximity (Mastrangelo et al., 1991). Although Sp1 can favor enhancer-gene communication over short distances (<2 kb), it is currently unknown whether it can also assist long-range enhancers. Another example of a TF involved in enhancer-promoter looping is YY1 (Beagan et al., 2017; Weintraub et al., 2017), a zinc-finger protein that binds to active enhancers and promoters in mouse ESCs and facilitates their physical and functional communication. Deletion of YY1-binding sites or degradation of YY1 leads to a loss of interactions between genes and enhancers and, consequently, to impaired gene expression. Moreover, the artificial tethering of YY1 to the Etv4 promoter bearing a mutation in a YY1-binding site rescues enhancer-promoter contacts and Etv4 expression (Weintraub et al., 2017). Another important group of TFs contributing to enhancer-gene communication in mammalian cells comprises LIM-domain TFs, such as LHX3 or ISL1, together with TFs such as GATA1 that assemble with LIM-domain proteins. Previous work has shown that GATA1 binds to both the LCR and the β-globin promoter in erythroid cells (Song et al., 2007; Tripic et al., 2009). Through its partner, the LIM-only protein LMO2, GATA1 associates with LIM domain-binding protein 1 (LDB1), which mediates long-range interactions between the LCR and the β-globin gene owing to its ability to dimerize through its self-association (SA) domain (Song et al., 2007; Tripic et al., 2009). In support of the tethering role of LDB1, targeting LDB1 to the β-globin promoter leads to interactions with endogenous LDB1 present at the LCR and to the formation of a chromatin loop (Deng et al., 2012). Notably, this forced loop is sufficient to trigger β-globin gene induction in immature murine erythroblasts. Similarly, forced tethering of LDB1 or its SA domain to an otherwise silenced fetal γ-globin promoter was shown to activate gene expression in human primary adult erythroblasts (Bartman et al., 2016; Deng et al., 2014). Moreover, Chip, a Drosophila homolog of LDB1, also interacts with LIM domain-containing proteins and has been proposed to support enhancer-promoter interactions (Morcillo et al., 1997).
Other TFs might also contribute to the specific communication between genes and enhancers through the establishment of homotypic interactions (Bouwman and De Laat, 2015; Denholtz et al., 2013; Giammartino et al., 2019). In mouse ESCs, regions bound by pluripotency TFs (i.e. KLF4, OCT4, SOX2 and NANOG) interact with each other and form distinct nuclear hubs (Bouwman and De Laat, 2015; Denholtz et al., 2013). However, these TFs seem to be necessary but not sufficient for long-range enhancer-gene interactions (Giammartino et al., 2019; Wei et al., 2013). This suggests that most TFs might recruit tethering (e.g. LDB1) or architectural (e.g. cohesin) proteins, which can then mediate specific communication between genes and enhancers (Dowen et al., 2014; Giammartino et al., 2019).
Identifying the genomic sequences that act as tethering elements and the proteins that are bound to them will be a key challenge in understanding how enhancers and genes can specifically and functionally communicate with each other (Fig. 3D). It is likely that additional tethering elements and their associated proteins are yet to be discovered. Moreover, tethering elements might also operate in a combinatorial manner, such that the communication between certain enhancer-gene pairs might involve more than one type of tethering element in order to increase specificity and/or gene expression levels.
Cell type-specific chromatin landscapes
Cell type-specific changes in chromatin modifications can, in principle, modulate the activity and function of the regulatory elements described above (i.e. promoters, enhancers and tethering elements). Assessing how such chromatin landscapes affect regulatory elements, however, is technically challenging. One limitation of episomal MPRAs, for example, is that they do not recapitulate the chromatin context in which regulatory elements function endogenously. Transgenic reporter assays, such as those frequently used in mouse or zebrafish embryos, can mitigate some of the limitations of episomal MPRAs by integrating the reporter constructs (i.e. enhancer, minimal promoter and reporter gene) into the genome. Using this type of reporter assay, the activity of the Pen enhancer, which is required for proper Pitx1 expression in the hindlimb, was investigated in mouse embryos (Kragesteen et al., 2018). Interestingly, the Pen enhancer was able to drive reporter expression not only in the hindlimb but also in the forelimb, where Pitx1 is not expressed. In the hindlimb, Pitx1 is in close physical proximity to the Pen enhancer and is therefore expressed, whereas in the forelimb, Pitx1 is inactive, located far from the Pen enhancer and instead associated with Neurog1. In the forelimb, both Neurog1 and Pitx1 are inactive and bound by PcG proteins, which can bring these two genes into physical proximity and create a chromatin environment refractory to Pen enhancer function (Kragesteen et al., 2018). This and other studies using transgenic reporter assays illustrate how the chromatin environment can affect the cis-regulatory activity of enhancers (Akhtar et al., 2014; Ruf et al., 2011; Symmons et al., 2014). However, this advantage over MPRAs can also be a limitation, as the enhancer responsiveness of the reporter gene can be strongly affected by the genomic location of the integration sites.
These position effects in transgenic reporter assays can be mitigated by using safe harbor systems (Kvon et al., 2020).
The importance of chromatin context in enhancer-gene communication is also supported by a recent study showing that the divergent expression of Fat1 and Rex1, two genes located in the same TAD, involves cell type-specific chromatin changes that affect TAD organization and DNA methylation patterns (Ringel et al., 2021 preprint). In ESCs, both Fat1 and Rex1 are expressed, owing to a TAD configuration in which each gene specifically interacts with its cognate proximal enhancers. As differentiation progresses, Fat1 remains expressed while Rex1 is silenced. Remarkably, in differentiated cells, Fat1 is regulated by enhancers that are more proximal to Rex1 than to Fat1. The loss of Rex1 responsiveness to the Fat1 enhancers might be caused by DNA hypermethylation of its promoter region (Ringel et al., 2021 preprint). Interestingly, the Rex1 promoter belongs to the CpG-poor category, which might render it more susceptible to DNA hypermethylation than the CpG-rich Fat1 promoter (Blackledge and Klose, 2011; Deaton and Bird, 2011; Long et al., 2016).
Cell type-specific changes in DNA methylation patterns might also underlie the observed variability in TAD organization across cell types (McArthur and Capra, 2021), which seems more pronounced than originally proposed (Dixon et al., 2012). This variability in TAD organization could be caused by cell type-specific changes in CTCF occupancy, as the binding of this TF to its target sequences is sensitive to DNA methylation levels (Beagan et al., 2017; Wang et al., 2012).
The medical relevance of enhancer-gene specificity
Human disorders caused by enhancer malfunction are often referred to as enhanceropathies and can involve structural variants, single nucleotide polymorphisms and epigenetic alterations (Claringbould and Zaugg, 2021; Lupiáñez et al., 2016; Rickels and Shilatifard, 2018; Smith and Shilatifard, 2014). Although some enhanceropathies are caused by the direct disruption of enhancers (Haro et al., 2021; Long et al., 2020), others involve either gains (i.e. enhancer adoption/hijacking) or losses (i.e. enhancer disconnection) in long-range enhancer-gene communication (Franke et al., 2016; Gröschel et al., 2014; Laugsch et al., 2019; Lupiáñez et al., 2015).
Structural variants
Structural variants (SVs) include deletions, inversions, duplications, translocations and insertions. Deletions can result in either the direct loss of enhancers or changes in TAD structure. For example, the Pierre Robin sequence craniofacial disorder is caused by deletions that eliminate enhancers located within the SOX9 TAD that are required for proper SOX9 expression in neural crest cells (Amarillo et al., 2013; Gordon et al., 2009; Long et al., 2020). Deletions spanning TAD boundaries can lead to the fusion of neighboring TADs, which in turn can lead to ectopic enhancer-gene interactions (i.e. enhancer adoption/hijacking) and, consequently, to pathological gains in gene expression without directly perturbing enhancer sequences (Franke et al., 2016; Gröschel et al., 2014; Lupiáñez et al., 2015) (Fig. 4A). Deletions can also affect tethering elements, as reported for a 12 kb homozygous deletion encompassing three CBSs located downstream of the limb-specific ZRS enhancer, within one of the Lmbr1 introns (Ushiki et al., 2021). This deletion impairs communication between the Shh promoter and the ZRS enhancer, which reduces Shh expression in the limb and leads to a congenital limb truncation termed acheiropodia (Ushiki et al., 2021). In contrast, inversions spanning TAD boundaries can place enhancers away from their target genes (i.e. enhancer disconnection) and/or in proximity of novel genes (i.e. enhancer adoption), which can result in losses or gains in gene expression, respectively (Laugsch et al., 2019; Lupiáñez et al., 2016; Smith and Shilatifard, 2014; Spielmann et al., 2018) (Fig. 4A). For example, an inversion found in individuals with limb abnormalities relocates EPHA4 enhancers into the WNT6 TAD (Lupiáñez et al., 2015), resulting in the ectopic and pathological expression of WNT6 in the limb.
A recent study reported an individual with branchio-oculo-facial syndrome (BOFS) carrying a large heterozygous inversion that physically disconnects TFAP2A from its cognate neural crest enhancers (Laugsch et al., 2019). Notably, this results in monoallelic TFAP2A expression in the neural crest cells of the patient and, consequently, in TFAP2A haploinsufficiency. Furthermore, the inversion places novel genes in proximity to the TFAP2A neural crest enhancers within a ‘shuffled’ TAD (Fig. 4A). However, none of these genes are responsive to the TFAP2A enhancers, and enhancer adoption does not occur within this particular genomic context. Interestingly, whereas TFAP2A is a typical developmental gene with a large CGI cluster in its promoter region, none of the genes placed in proximity to the neural crest enhancers by the BOFS inversion display this type of promoter (Laugsch et al., 2019; Pachano et al., 2021). Thus, SVs that place enhancers and genes within the same TAD do not always lead to aberrant gene expression. In such cases, additional factors controlling enhancer-gene compatibility should be taken into account to better predict pathological changes in gene expression.
Deletions of promoter sequences can also lead to oncogenic gains in gene expression through long-range regulatory mechanisms (Cho et al., 2018; Oh et al., 2021). In this case, deletions can impair physical communication between enhancers and their endogenous target genes, which in turn enables the enhancers to interact with and activate neighboring oncogenes located within the same TAD (Cho et al., 2018; Oh et al., 2021). A similar mechanism operates outside of cancer: in the absence of the α-globin gene, its cognate LCR contacts and induces the expression of an unrelated gene, NME4, which is located ∼300 kb away (Lower et al., 2009). Interestingly, up to five additional genes lie between the α-globin LCR and NME4, and their expression is not affected by the α-globin deletion, further highlighting the importance of factors contributing to enhancer-gene specificity. These examples also illustrate how active promoters can act as insulators that restrict enhancer activity, which might involve the formation of topological boundaries (Dixon et al., 2012) and/or promoter competition (Calhoun et al., 2002).
Single nucleotide polymorphisms
Genome-wide association studies (GWAS) have revealed that most single nucleotide polymorphisms (SNPs) associated with complex diseases occur within non-coding regulatory regions (Maurano et al., 2012). These SNPs often disrupt TF-binding sites within promoters or enhancers, resulting in quantitative changes in gene expression that can be either detrimental or beneficial (Rockman and Kruglyak, 2006). Below, we summarize some of these examples and direct the reader to excellent reviews on this topic for additional examples (Carullo and Day, 2019; Claringbould and Zaugg, 2021; Rickels and Shilatifard, 2018; Smith and Shilatifard, 2014).
Individuals with hereditary persistence of fetal hemoglobin (HPFH) carry SNPs that alleviate the symptoms of β-thalassemia and sickle cell anemia (Wilber et al., 2011). During development, a postnatal switch in the affinity of the LCR from the γ-globin promoter towards the β-globin promoter results in the downregulation of fetal hemoglobin in adulthood (Wilber et al., 2011) (Fig. 4B). However, individuals with HPFH show higher levels of fetal hemoglobin in adulthood because of SNPs in the γ-globin promoter that increase its affinity for the LCR (Fig. 4B). Consequently, fetal hemoglobin expression persists during adulthood, while the β-globin genes become downregulated (Wienert et al., 2017). One of the SNPs associated with HPFH (-198 T>C) generates a new binding site for the erythroid-specific TF KLF1 (Martyn et al., 2019; Wienert et al., 2015). In contrast, some cases of α-thalassemia are caused by a SNP found within a non-coding region of the α-globin locus. This SNP, which is located between the α-globin genes and the LCR, creates a new promoter that interferes with α-globin transcription (Bozhilov et al., 2021). The postnatal switch from γ-globin to β-globin expression is also regulated by BCL11A, a TF that represses γ-globin expression. Individuals carrying SNPs within a BCL11A enhancer show reduced BCL11A expression in adult erythroid cells and present HPFH (Bauer et al., 2015). Interestingly, CRISPR-Cas9 editing of the BCL11A enhancer in hematopoietic stem and progenitor cells from individuals with β-thalassemia and sickle cell disease increased fetal hemoglobin levels and alleviated disease symptoms (Frangoul et al., 2020).
A SNP in the HNE-F2 enhancer, located within the SOX9 TAD, is implicated in the etiology of Pierre Robin sequence (Benko et al., 2009). This SNP disrupts a binding site for MSX1 within the HNE-F2 enhancer and downregulates SOX9 expression in the mandibular and maxillary mesenchyme (Benko et al., 2009). Several SNPs within the limb-specific ZRS enhancer in humans and other vertebrates have been shown to drive ectopic Shh expression in the anterior limb mesenchyme, causing polydactyly (Hill and Lettice, 2013; Lettice et al., 2003; VanderMeer and Ahituv, 2011). Finally, GWAS revealed that almost all SNPs associated with Alzheimer's disease (AD) lie within intronic or intergenic regions (Kikuchi et al., 2019). However, only 30% of these SNPs map to enhancers, raising the possibility that the remaining SNPs alter the function of other types of non-coding regulatory elements. Interestingly, some of these SNPs (e.g. rs1476679 and rs1990620) overlap with CBSs and can affect CTCF recruitment, the formation of chromatin loops and the expression of genes associated with AD (Gallagher et al., 2017; Kikuchi et al., 2019).
Epigenetic alterations
DNA methylation profiles are frequently altered in cancer, which can disrupt the binding of CTCF to its cognate elements (Bell and Felsenfeld, 2000; Hark et al., 2000; Liu et al., 2016). For example, recent studies indicate that DNA hypermethylation in different cancer types can block the recruitment of CTCF to key insulators, resulting in enhancer adoption and oncogenic gains in gene expression (Flavahan et al., 2016; 2019) (Fig. 4C).
Altogether, the above examples illustrate the prevalence of human enhanceropathies and the complexity of their associated pathomechanisms. Therefore, revealing the regulatory factors that control the specific communication between genes and enhancers is an essential step towards a complete and predictive understanding of human disease.
Conclusions
During the past few decades, the field of enhancer biology has vastly expanded through the identification of enhancers and their association with target genes. Despite this tremendous progress, it is still unclear how enhancers communicate specifically with their target genes. Recent MPRAs and genome engineering approaches are starting to reveal a number of complex and diverse mechanisms that control enhancer specificity. While TAD structures might favor specific communication between genes and their enhancers, other regulatory layers further refine these interactions. One of the most striking examples is the observed lack of compatibility between developmental and housekeeping regulatory elements in Drosophila (Haberle et al., 2019; Zabidi et al., 2015; Zabidi and Stark, 2016). However, these compatibility rules do not seem to be as clear cut in mammalian cells (Bergman et al., 2022; Martinez-Ara et al., 2022), where chromatin context, genomic distance and tethering elements represent additional and important regulators of enhancer specificity.
Regarding tethering elements, recent reports have uncovered novel roles for genetic elements (e.g. CGIs and GAGA elements) and proteins (e.g. CTCF, LDB1 and YY1) that favor specific communication between enhancers and their target genes (Calhoun et al., 2002; Giammartino et al., 2019; Kubo et al., 2021; Oh et al., 2021; Pachano et al., 2021; Weintraub et al., 2017). Similar to its role in TAD boundary formation and enhancer insulation, the tethering function of CTCF is likely connected to its capacity to block cohesin-mediated loop extrusion. However, the extent to which CTCF is required for enhancer-gene communication is still unclear, as neither the deletion of CBSs nor the depletion of CTCF always affects the topology or the expression of genes containing CBSs at their promoters (Chakraborty et al., 2022 preprint; Despang et al., 2019; Taylor et al., 2022 preprint). Moreover, the molecular mechanisms by which other types of tethering elements (e.g. CGIs and GAGA elements) control enhancer specificity and function remain poorly understood. It is also unclear whether the regulatory function of tethering elements generally depends on cohesin and loop extrusion, or whether alternative mechanisms (e.g. polymerization and liquid-liquid phase separation) might also be involved. Therefore, a future goal should be to identify the full repertoire of tethering elements that contribute to enhancer function and to compare their mechanisms of action.
More broadly, fundamental questions regarding the mechanisms of enhancer function remain to be solved. Uncovering the biochemical processes whereby enhancers activate their target genes might help us to better understand the compatibility rules between enhancers and promoters, as well as how tethering elements work. We anticipate that understanding the principles that regulate enhancer specificity will help us to better predict the effect of genetic variation in human diseases. As illustrated by the examples discussed above, disease-causing genetic variants do not necessarily disrupt enhancer or gene sequences, but instead can affect the ability of enhancers to specifically activate their target genes. Therefore, elucidating the molecular and genetic principles controlling enhancer specificity should help us to uncover and predict the etiological basis of human disease.
Acknowledgements
We thank all the Rada-Iglesias lab members for useful discussions and ideas while preparing this Review.
Footnotes
Funding
Work in the Rada-Iglesias laboratory is funded by the Ministerio de Ciencia e Innovación and the Agencia Española de Investigación; by the European Regional Development Fund (PGC2018-095301-B-I00 and RED2018-102553-T); by the European Research Council (862022); and by the European Commission (H2020-MSCA-ITN-2019-860002).
References
Competing interests
The authors declare no competing or financial interests.