ABSTRACT
DNA methylation is a highly conserved epigenetic modification that plays essential roles in mammalian gene regulation, genome stability and development. Although it is primarily considered a stable and heritable epigenetic silencing mechanism at heterochromatic and repetitive regions, whole-genome methylome analysis reveals that DNA methylation can be highly cell-type specific and dynamic within proximal and distal gene regulatory elements during early embryonic development, stem cell differentiation and reprogramming, and tissue maturation. In this Review, we focus on the mechanisms and functions of regulated DNA methylation and demethylation, highlighting how these dynamics, together with crosstalk between DNA methylation and histone modifications at distinct regulatory regions, contribute to mammalian development and tissue maturation. We also discuss how recent technological advances in single-cell and long-read methylome sequencing, along with targeted epigenome editing, are enabling unprecedented high-resolution and mechanistic dissection of DNA methylome dynamics.
Introduction
DNA methylation is an epigenetic modification with profound roles in genomic stability, gene regulation, mammalian development and human disease (Bird, 2002; Greenberg and Bourc'his, 2019). In eukaryotic DNA, this chemical modification occurs primarily at the fifth carbon of cytosine (5-methylcytosine; 5mC), usually in the context of symmetrical cytosine-guanine (CpG) dinucleotides. New 5mC patterns are established by the de novo DNA methyltransferases DNMT3A and DNMT3B (Okano et al., 1999) (Fig. 1A). Upon cell division, symmetric 5mCpG patterns are faithfully copied onto daughter strands by the maintenance methyltransferase DNMT1 through its interaction with ubiquitin-like, containing PHD and RING finger domains 1 (UHRF1), which selectively recognizes hemi-methylated CpGs (methylated on the parental strand only) (Bostick et al., 2007) (Fig. 1A).
Given its widely accepted role in the transcriptional repression of gene promoters, DNA methylation was originally considered a stable epigenetic modification primarily involved in maintaining long-term epigenetic memory in the classical phenomena of X-chromosome inactivation (XCI) and genomic imprinting (Bird, 2002). However, whole-genome, high-resolution DNA methylome analysis reveals widespread tissue- and cell-type-specific 5mC patterns, which also exhibit temporal dynamics during mammalian development and tissue maturation (Cedar et al., 2022; Luo et al., 2018a). Thus, understanding the cell type-specific patterning, temporal dynamics and functional effects of these DNA modifications at various cis-regulatory elements is crucial for revealing the mechanisms underlying lineage-specific gene expression in development and disease.
Recently, emerging single-cell DNA methylome sequencing technologies have begun to characterize dynamic methylation events across highly heterogeneous cell populations, providing insights into how these events may regulate gene transcription during cell state transitions. Furthermore, the advent of genetic tools, such as unbiased forward genetic screening and CRISPR-based epigenome editing, affords unparalleled opportunities to identify novel regulators of DNA methylation and to understand the functional consequences of locus-specific DNA methylation. Here, we review recent insights into the patterning, mechanisms and functions of the DNA methylome in mammalian development and stem cell differentiation. We discuss the utility of new technologies, such as single-cell multi-omics, non-destructive long-read methylome sequencing and epigenome editing tools, in dissecting the mechanistic relationship between methylation dynamics and gene regulation.
Mechanisms of de novo and maintenance DNA methylation
De novo DNA methylation
In mammals, two major de novo DNA methyltransferases, DNMT3A and DNMT3B, are responsible for establishing new DNA methylation patterns across the genome (Okano et al., 1999). Both enzymes contain a highly conserved carboxy-terminal methyltransferase (MTase) domain and two chromatin-binding domains, PWWP and ADD (Fig. 1B). DNMT3L, a catalytically inactive DNMT, stimulates the catalytic activity of DNMT3 enzymes specifically in germ cells (Bourc'his et al., 2001). Histone H3 lysine 4 (H3K4) modifications play an important role in regulating the establishment of DNA methylation patterns (Fig. 1C). Specific binding of the ADD domain to unmodified H3K4 is required to relieve the auto-inhibitory interaction between the ADD and MTase domains and thereby promote de novo DNA methylation. The ADD domain is repelled by increasing levels of H3K4 methylation (with H3K4me3 being the most inhibitory); thus, in the presence of H3K4 methylation, the ADD domain remains bound to the MTase domain and auto-inhibits DNMT3 catalytic activity. In addition, the PWWP domain binds to H3K36me3, a histone modification that specifically marks actively transcribed gene bodies (Fig. 1C). Recent studies suggest that monoubiquitylated histone H2A lysine 119 (H2AK119ub), deposited by polycomb repressive complex 1 (PRC1), interacts with a ubiquitin-binding motif in the amino terminus of full-length DNMT3A1, thereby targeting DNMT3A1 to PRC1-regulated regions (Gu et al., 2022; Weinberg et al., 2021).
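The reader-domain logic described above can be summarized as a toy Boolean rule set. This is purely an illustrative sketch under simplifying assumptions of our own: each nucleosome is reduced to three hypothetical fields, H3K4 methylation is treated as all-or-none (whereas the text describes a graded effect, with H3K4me3 most inhibitory), and de novo methylation is assumed to need at least one recruitment cue. The names `Nucleosome`, `recruitment_cues` and `dnmt3_permissive` are invented for this example and do not come from any published model or software.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Nucleosome:
    """Hypothetical toy nucleosome carrying only the marks discussed in the text."""
    h3k4_me: int = 0          # 0 = unmodified H3K4; 1-3 = mono-, di-, tri-methyl
    h3k36_me: int = 0         # 0-3 methyl groups on H3K36
    h2ak119ub: bool = False   # PRC1-deposited H2AK119 monoubiquitylation


def add_autoinhibition_relieved(n: Nucleosome) -> bool:
    # ADD engagement with unmodified H3K4 releases ADD-MTase auto-inhibition.
    # Simplification: any H3K4 methylation counts as inhibitory (real effect is graded).
    return n.h3k4_me == 0


def recruitment_cues(n: Nucleosome, isoform: str = "DNMT3A1") -> List[str]:
    cues = []
    if n.h3k36_me >= 2:
        cues.append("PWWP-H3K36me")   # PWWP domain reads H3K36 methylation
    if n.h2ak119ub and isoform == "DNMT3A1":
        cues.append("UDR-H2AK119ub")  # only full-length DNMT3A1 is modelled with the N-terminal motif
    return cues


def dnmt3_permissive(n: Nucleosome, isoform: str = "DNMT3A1") -> bool:
    """Toy rule (assumption): relieved auto-inhibition AND at least one recruitment cue."""
    return add_autoinhibition_relieved(n) and len(recruitment_cues(n, isoform)) > 0


if __name__ == "__main__":
    print(dnmt3_permissive(Nucleosome(h3k36_me=3)))              # True
    print(dnmt3_permissive(Nucleosome(h3k4_me=3, h3k36_me=3)))   # False: H3K4me3 blocks ADD
```

In this encoding H3K4 methylation acts as a veto that overrides any recruitment cue, which mirrors the qualitative hierarchy in the text but says nothing about relative binding affinities.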
Maintenance of DNA methylation
Upon cell division, symmetrical CpG methylation is maintained by DNMT1 in conjunction with its obligate multi-domain partner UHRF1, an E3 ubiquitin-protein ligase. UHRF1 specifically binds hemi-methylated CpGs at replication forks through its SRA domain (Fig. 1D), while its Tudor domain interacts with H3K9me2/3. UHRF1 recruits DNMT1 through an interaction between its ubiquitin-like (UBL) domain and the RFTS domain of DNMT1. In the absence of this UBL interaction, the RFTS domain auto-inhibits DNMT1 MTase activity. UBL binding relieves this auto-inhibitory conformation by releasing the RFTS domain to bind histone H3 tails ubiquitylated by the UHRF1 RING finger domain (Fig. 1D), thereby enabling DNMT1 to methylate the nascent daughter strand and maintain CpG methylation patterns. DNMT1 activity or UHRF1 recruitment to the replication fork can be further stimulated by the ATPase-dependent chromatin remodeler LSH (also known as HELLS) (Han et al., 2020; Ming et al., 2020).
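The consequence of this copying mechanism can be illustrated with a toy simulation of a single CpG dyad through semi-conservative replication. The sketch assumes that maintenance is perfectly efficient when switched on and ignores de novo methylation and TET activity; the `maintenance` flag is a hypothetical stand-in for functional UHRF1/DNMT1, and all function names are illustrative.

```python
from typing import List, Tuple

# (top-strand 5mC, bottom-strand 5mC) at a single symmetric CpG site
Dyad = Tuple[bool, bool]


def replicate(dyad: Dyad, maintenance: bool) -> List[Dyad]:
    top, bottom = dyad
    # Semi-conservative replication: each daughter duplex keeps one parental
    # strand and receives an unmethylated nascent strand, so a methylated
    # parental CpG becomes hemi-methylated.
    daughters = [(top, False), (False, bottom)]
    if maintenance:
        # UHRF1 recognizes the hemi-methylated CpG and DNMT1 methylates the
        # nascent strand, restoring symmetric methylation.
        daughters = [(t or b, t or b) for t, b in daughters]
    return daughters


def simulate(dyads: List[Dyad], generations: int, maintenance: bool) -> List[Dyad]:
    for _ in range(generations):
        dyads = [d for parent in dyads for d in replicate(parent, maintenance)]
    return dyads


def methylated_strand_fraction(dyads: List[Dyad]) -> float:
    strands = [s for d in dyads for s in d]
    return sum(strands) / len(strands)


if __name__ == "__main__":
    start = [(True, True)]
    print(methylated_strand_fraction(simulate(start, 3, maintenance=True)))   # 1.0
    print(methylated_strand_fraction(simulate(start, 3, maintenance=False)))  # 0.125
```

Without maintenance, the two original methylated strands persist but are distributed among an ever-growing number of daughter dyads, each as a hemi-methylated CpG.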
Mechanisms of global and targeted DNA demethylation
Dynamic regulation of DNA methylation requires genome-wide or locus-specific removal of 5mC. This can occur through passive, replication-dependent loss of 5mC when the maintenance methylation machinery is impaired (Fig. 1A), or through enzymatic, 5mC oxidation-dependent mechanisms (Fig. 2A) (Pastor et al., 2013; Wu and Zhang, 2014).
Global DNA demethylation via passive replication-dependent dilution
Although global methylation patterns are stably maintained in somatic cells, genome-wide loss of 5mC occurs following fertilization and during germ cell specification. Global 5mC erasure generally happens when the DNA methylation maintenance machinery is impaired, precluding re-establishment of 5mCG patterns on nascent strands (Rougier et al., 1998) (Fig. 1A). Such global DNA demethylation is important for equalizing the differences between paternal and maternal methylomes in pre-implantation embryos, and for erasing parent-of-origin-specific imprints in developing germ cells (discussed below). One such mechanism involves DPPA3 (also known as PGC7 or STELLA), which is specifically expressed in germ cells and pre-implantation embryos and negatively regulates nuclear UHRF1 levels through its interaction with UHRF1 (Du et al., 2019; Li et al., 2018a). In a mechanism conserved across mammals, DPPA3 can facilitate passive DNA demethylation by impairing UHRF1 chromatin binding and DNMT1-dependent maintenance methylation (Mulholland et al., 2020). Oxidized forms of 5mC (see below) may also impede maintenance methylation (Hashimoto et al., 2012; Ji et al., 2014), thereby facilitating global passive DNA demethylation.
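Passive dilution can also be expressed as simple arithmetic. Assume, for illustration only, that a fraction m(n) of CpG strands is methylated after n divisions, that parental strands keep their 5mC, and that a hypothetical fraction e of nascent strands opposite a 5mC is re-methylated by UHRF1/DNMT1 each S phase (no de novo methylation, no TET activity). Then m(n+1) = m(n) × (1 + e) / 2, so m(n) = m(0) × ((1 + e) / 2)^n, and complete loss of maintenance (e = 0) halves the methylated fraction at every division. The sketch below evaluates this toy model; `maintenance_efficiency` is an assumed parameter, not a measured quantity.

```python
from typing import Optional


def methylation_after(m0: float, maintenance_efficiency: float, divisions: int) -> float:
    """Fraction of methylated CpG strands after `divisions` rounds of replication.

    Toy model: parental strands keep their 5mC; nascent strands are methylated
    only at a fraction `maintenance_efficiency` of sites opposite a 5mC.
    """
    if not (0.0 <= m0 <= 1.0 and 0.0 <= maintenance_efficiency <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    factor = (1.0 + maintenance_efficiency) / 2.0
    return m0 * factor ** divisions


def divisions_to_reach(m0: float, maintenance_efficiency: float, target: float,
                       max_divisions: int = 1000) -> Optional[int]:
    """Smallest number of divisions at which methylation falls to `target` or below."""
    m, n = m0, 0
    while m > target:
        if n >= max_divisions:
            return None  # e.g. perfect maintenance (e = 1) never dilutes 5mC
        m = methylation_after(m, maintenance_efficiency, 1)
        n += 1
    return n


if __name__ == "__main__":
    print(methylation_after(1.0, 0.0, 3))    # 0.125: halving per division
    print(methylation_after(1.0, 0.5, 2))    # 0.5625: partial maintenance
    print(divisions_to_reach(1.0, 0.0, 0.2))  # 3
```

Intermediate values of e give slower, partial dilution; real embryos and germ cells combine this process with active demethylation and de novo methylation, which the model deliberately omits.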
Active DNA demethylation via iterative oxidation of 5mC by TET proteins
In flowering plants (e.g. Arabidopsis thaliana), 5mC is directly excised by the DEMETER (DME)/REPRESSOR OF SILENCING 1 (ROS1) family of DNA glycosylases, coupled with base excision repair (BER), to regenerate unmodified C (Jang et al., 2014; Zhu, 2009). Orthologs of DME/ROS1 enzymes have not been identified in animals, but mammalian cells can achieve enzymatic erasure of 5mC through a more complex pathway. First, the ten-eleven translocation (TET) family of dioxygenases iteratively oxidizes 5mC to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) (He et al., 2011; Ito et al., 2010, 2011; Kriaucionis and Heintz, 2009; Tahiliani et al., 2009). Active DNA demethylation can then occur through replication-dependent dilution of oxidized 5mCs (ox-mCs) (Inoue et al., 2011; Wu and Zhang, 2014), or through thymine DNA glycosylase (TDG)-mediated excision repair of the highly oxidized bases 5fC and 5caC (He et al., 2011; Ito et al., 2011; Shen et al., 2013), ultimately restoring unmodified C (Fig. 2A,B). Thus, dynamic regulation of DNA methylation in the mammalian genome can occur through a stepwise enzymatic cascade comprising cytosine methylation, iterative oxidation and excision repair (Fig. 2A). Specific TET proteins are expressed during global 5mC removal in pre-implantation embryos and developing germ cells, suggesting that passive (impaired DNMT1/UHRF1) and active (TET-dependent) DNA demethylation mechanisms may work synergistically to rapidly remove 5mC.
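The stepwise cascade can be written as a small state machine. In this illustrative sketch, each call to the hypothetical `tet_oxidize` function advances a base by one oxidation state, and the hypothetical `tdg_ber` function returns unmodified C only for 5fC and 5caC, mirroring the TDG substrate preference described above. Replication-dependent dilution of ox-mCs is not modelled, and no enzyme kinetics are implied.

```python
from enum import Enum
from typing import List


class Base(Enum):
    C = "C"
    MC = "5mC"
    HMC = "5hmC"
    FC = "5fC"
    CAC = "5caC"


# Iterative TET oxidation: 5mC -> 5hmC -> 5fC -> 5caC (5caC is the end point)
TET_NEXT = {Base.MC: Base.HMC, Base.HMC: Base.FC, Base.FC: Base.CAC}

# Only the highly oxidized bases are excised by TDG and replaced with C via BER
TDG_SUBSTRATES = {Base.FC, Base.CAC}


def tet_oxidize(base: Base) -> Base:
    return TET_NEXT.get(base, base)  # C and 5caC are left unchanged


def tdg_ber(base: Base) -> Base:
    return Base.C if base in TDG_SUBSTRATES else base


def demethylation_path(base: Base, oxidation_steps: int) -> List[Base]:
    """States visited after up to `oxidation_steps` TET steps, then one TDG/BER attempt."""
    path = [base]
    for _ in range(oxidation_steps):
        nxt = tet_oxidize(base)
        if nxt is base:
            break  # no further oxidation possible
        base = nxt
        path.append(base)
    repaired = tdg_ber(base)
    if repaired is not base:
        path.append(repaired)
    return path


if __name__ == "__main__":
    print([b.value for b in demethylation_path(Base.MC, 3)])  # ['5mC', '5hmC', '5fC', '5caC', 'C']
    print([b.value for b in demethylation_path(Base.MC, 1)])  # ['5mC', '5hmC']
```

A path that stops at 5hmC is not resolved by TDG in this model; in cells, such a base would instead depend on replication-dependent dilution or further oxidation.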
Crosstalk between DNA methylation and histone modifications
Integrated structural, biochemical and functional analyses indicate that histone post-translational modifications play important roles in shaping DNA methylome patterning and dynamics. Genome-wide profiling and genetic studies further suggest that both positive and negative feedback between DNA methylation and specific histone modifications at distinct genomic elements contribute to gene regulation and dysregulation in normal development and human diseases (Janssen and Lorincz, 2022).
Negative feedback between DNA methylation and H3K4me at promoters and enhancers
The ‘active’ histone mark H3K4me3, which is generally present at transcriptionally permissive gene promoters, is particularly abundant at promoters of high CpG density [known as CpG-rich or CpG island (CGI)-containing promoters] (Weber et al., 2007). Methylation of H3K4 inhibits de novo DNA methylation at transcriptionally active promoters by blocking the interaction between this histone tail residue and the ADD domain of DNMT3A/3B/3L (Guo et al., 2015; Li et al., 2011; Ooi et al., 2007; Otani et al., 2009). Lower H3K4 methylation states (H3K4me1/2), which are enriched at active/poised enhancers, are also likely to protect these distal cis-regulatory regions against de novo DNA methylation through the same antagonism between H3K4 methylation and the ADD domain of DNMT3 enzymes.
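CpG density, the property that distinguishes CGI promoters, is often quantified with simple sequence statistics. The sketch below computes GC content and the observed/expected CpG ratio (CpG count × length / (C count × G count)) and applies thresholds modelled on the classic Gardiner-Garden and Frommer (1987) criteria: length of at least 200 bp, GC content above 50% and an observed/expected ratio above 0.6. These thresholds are one common convention, not necessarily the CGI definition used in the studies cited here, and the function names are illustrative.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_fraction = (c + g) / n if n else 0.0
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cgi(seq: str, min_len: int = 200, min_gc: float = 0.5,
                   min_obs_exp: float = 0.6) -> bool:
    """Heuristic CGI call on a single window (no sliding-window search)."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    print(looks_like_cgi("CG" * 150))   # True: dense CpG
    print(looks_like_cgi("CAGG" * 60))  # False: GC-rich (75%) but CpG-free
```

The second example shows why GC content alone is insufficient: a sequence can be GC-rich yet depleted of CpG dinucleotides, which is typical of most of the mammalian genome outside CGIs.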
In contrast to their exclusion of DNMT3 enzymes through H3K4 methylation, CpG-rich promoters are enriched for TET enzymes, possibly recruited through the interaction between the CXXC domains of TET1/3 and unmethylated CpGs at these genomic regions (Pastor et al., 2013; Williams et al., 2011; Wu et al., 2011b). Thus, H3K4me3-marked active promoters remain unmethylated through both inhibition of de novo DNA methylation and high rates of TET-mediated active DNA demethylation (Ginno et al., 2020; Parry et al., 2021).
Crosstalk between PRC1/2-deposited histone modifications and DNA methylation
The repressive histone modification H3K27me3 is deposited at many CGI promoters of developmental genes by polycomb repressive complex 2 (PRC2) (Cao et al., 2002). Genetic, biochemical and genome-wide profiling studies reveal an unexpected antagonistic relationship between H3K27me3 and DNA methylation (Bartke et al., 2010; Lindroth et al., 2008; Wu et al., 2010), and suggest that PRC2 binding to chromatin might be negatively regulated by DNA methylation (Bartke et al., 2010; Wu et al., 2010). The two marks generally show mutually exclusive distributions across the genome, and recent studies have uncovered the molecular basis of this antagonism.
In addition to the three core subunits (EED, SUZ12 and catalytically active EZH2), the two major PRC2 complexes (PRC2.1 and PRC2.2) contain unique subunits that contribute to chromatin binding. PRC2.1 requires polycomb-like (PCL) subunits (PHF1/PCL1, MTF2/PCL2 and PHF19/PCL3) for recruitment to CG-rich sequences (Li et al., 2017; Perino et al., 2018), but this binding is abrogated when CpG sites are fully methylated. Similarly, PRC2.2 is recruited to CGIs via the interaction between its JARID2 subunit and H2AK119ub (Blackledge et al., 2014), a repressive histone mark deposited by the PRC1 complex. Notably, recruitment of PRC1 to chromatin is partly mediated by KDM2B, a CXXC domain-containing subunit that preferentially binds unmethylated CpGs (Blackledge et al., 2014; He et al., 2013; Wu et al., 2013). Thus, both PRC2.1 and PRC2.2 bind CGIs in a DNA methylation-sensitive manner. Genetic knockout (KO) of DNMT3A in neural stem cells in vitro (Wu et al., 2010) or in the mouse brain in vivo (Li et al., 2022) leads to loss of DNA methylation and a concomitant increase in PRC2 recruitment and H3K27me3 levels across large regions overlapping and/or flanking many CGI-containing promoters. Conversely, TET1 in mouse embryonic stem cells (mESCs) can facilitate PRC2 binding and H3K27me3 deposition at CpG-rich sequences near transcriptionally poised ‘bivalent’ developmental gene promoters (Gu et al., 2018; Li et al., 2018b; Wu et al., 2011b), possibly by reducing DNA methylation levels at these regions (Fig. 2C).
Although PRC2/H3K27me3 and DNA methylation are generally enriched at different genomic regions, PRC1 may target de novo DNA methylation to polycomb-regulated regions via a selective interaction between H2AK119ub and a ubiquitin-dependent recruitment region (UDR) in the N-terminus of DNMT3A1 (Gu et al., 2022; Weinberg et al., 2021). Specific mutations in the PWWP domain of DNMT3A1 abolish its interaction with H3K36 methylation and redirect de novo methylation to H2AK119ub/H3K27me3-marked bivalent CGI promoters, which may lead to transcriptional de-repression of these developmental genes (Sendzikaite et al., 2019; Weinberg et al., 2021) (Fig. 2C). Notably, this mechanism may recruit wild-type DNMT3A1 to regions flanking H3K27me3-marked bivalent gene promoters and is required for postnatal development (Gu et al., 2022). How TET proteins are mechanistically targeted to PRC1/2-regulated regions remains less clear. However, this PRC1/H2AK119ub-dependent recruitment of DNMT3 enzymes may partially explain why regions flanking PRC2-regulated CGI promoters in mESCs are relatively enriched for 5hmCG (Ficz et al., 2011; Pastor et al., 2011; Williams et al., 2011; Wu et al., 2011a) and for 5fCG/5caCG; the latter accumulate in the absence of TDG and thus indicate dynamic DNA methylation/demethylation turnover (Shen et al., 2013; Song et al., 2013; Wu et al., 2016). These observations support a model in which the dynamic recruitment of, and bi-directional crosstalk between, PRC1/2-deposited histone marks, DNMTs and TET enzymes collectively shape the epigenetic and transcriptional states of these bivalent developmental genes (Fig. 2C).
H3K9 methylation and DNA methylation at CpG-rich regulatory regions of retrotransposons, imprinted and germline-specific genes
H3K9me3 is a repressive histone modification mainly enriched at heterochromatic and repetitive sequences. Early studies in fungi and plants revealed a positive role for H3K9 methylation in establishing DNA methylation (Jackson et al., 2002; Tamaru and Selker, 2001). In mammals, the relationship between H3K9 methylation and DNA methylation is more complex and depends on the developmental stage. As mentioned above, H3K9 methylation promotes maintenance DNA methylation via interaction with the UHRF1 Tudor and PHD domains (Fig. 1D). This positive link between H3K9me2/3 and the maintenance methylation machinery may be essential for retaining 5mCG at specific genomic regions in pre-implantation embryos and primordial germ cells (PGCs), two developmental stages in which nuclear UHRF1/DNMT1 levels are limited (e.g. due to DPPA3-mediated sequestration of UHRF1 in the cytoplasm) and the genome becomes hypomethylated. For example, in globally hypomethylated early pre-implantation embryos, the H3K9 methyltransferase SETDB1 (ESET) is recruited by Krüppel-associated box (KRAB) domain zinc-finger proteins (KRAB-ZFPs) to deposit H3K9me3 at genomic imprinting control regions (ICRs), which in turn recruits DNMT1/UHRF1. Similarly, SETDB1 and H3K9 methylation maintain DNA methylation and transcriptional silencing at a subset of evolutionarily young transposable elements (TEs), such as the intracisternal A-particle (IAP) family of long-terminal repeat retrotransposons, in early embryos and migrating PGCs (Lane et al., 2003; Leung et al., 2014; Liu et al., 2014). This mechanism thus confers resistance to global demethylation at these H3K9 methylation-marked ICRs and TEs during specific developmental stages.
Lifelong silencing of germline-specific genes in somatic tissues is dependent upon DNMT3B-mediated de novo DNA methylation of these CpG-rich, methylation-sensitive promoters during embryonic implantation (Auclair et al., 2014). H3K9 methyltransferase G9a (EHMT2) and possibly H3K9me2 facilitate the recruitment of DNMT3B to methylate a subset of germline-specific gene promoters in developing embryos (Auclair et al., 2016). Thus, unlike most H3K4me3-marked CGI promoters that resist de novo methylation, CGI promoters of germline-specific genes are targeted by both H3K9 and DNA methylation to be silenced in somatic cells.
H3K36 methylation within gene bodies and intergenic regions
H3K36 di- or tri-methylation (H3K36me2/3) interacts with the PWWP domains of DNMT3A and DNMT3B (Fig. 1C), thereby directing de novo DNA methylation to genomic domains enriched for H3K36 methylation (Dhayalan et al., 2010). H3K36me3 is deposited within actively transcribed gene bodies by SETD2 during RNA polymerase II (RNAPII)-mediated transcriptional elongation (Edmunds et al., 2008; Yoh et al., 2008). H3K36me2, which is catalyzed by several histone methyltransferases including NSD1-3 and ASH1L, is enriched at intergenic regions (Wagner and Carpenter, 2012). Interestingly, genome-wide profiling studies show that DNMT3A tends to bind H3K36me2-enriched intergenic regions (Weinberg et al., 2019), whereas H3K36me3-marked gene bodies of actively transcribed genes are preferentially targeted by DNMT3B (Baubec et al., 2015).
Gene body DNA methylation is highly conserved across most eukaryotes (Zemach et al., 2010), but the potential functions of the widespread H3K36me3-directed genic DNA methylation at actively transcribed genes remain incompletely understood. Genic DNA methylation has been suggested to regulate alternative promoters (Maunakea et al., 2010), inhibit spurious transcription (Neri et al., 2017) and affect co-transcriptional RNA splicing (Gelfman et al., 2013). However, these mechanisms appear to regulate only a subset of genes and cannot fully explain the prevalence of genic DNA methylation. Given the positive link between intragenic DNA methylation and active transcription across diverse cell types and developmental stages, H3K36me2/3-directed genic de novo DNA methylation may have additional roles in promoting transcription and establishing genomic imprints. DNMT3-dependent, non-promoter genic methylation can facilitate transcription of developmental regulators in embryos (Auclair et al., 2014) and postnatal neural stem cells (Wu et al., 2010), potentially by counteracting H3K27me3-mediated repression. In addition, SETD2/H3K36me3-dependent recruitment of DNMT3A/3L is required for establishing DNA methylation patterns at maternal ICRs in oocytes (Xu et al., 2019); similarly, NSD1/H3K36me2 directs de novo DNA methylation to all three paternal imprinting regions in prospermatogonia (Shirane et al., 2020).
Notably, missense gain-of-function mutations in the PWWP domain disrupt the interaction of DNMT3A with H3K36me3, cause ectopic hypermethylation of Polycomb-regulated developmental genes in mice, and cause microcephalic dwarfism in human patients (Heyn et al., 2019; Sendzikaite et al., 2019). Conversely, heterozygous DNMT3A mutations that cause haploinsufficiency are associated with macrocephalic overgrowth and moderate intellectual disability in Tatton-Brown-Rahman syndrome (TBRS) (Weinberg et al., 2019). Interestingly, Sotos syndrome, which is clinically comparable to TBRS, is caused by haploinsufficiency of NSD1 and shares a hypomethylated intergenic 5mC signature with TBRS. These findings suggest that loss of intergenic and/or genic 5mC, through abrogation of either H3K36me2/3 deposition or DNMT3A recruitment, results in abnormal cellular overgrowth phenotypes.
DNA methylation patterns and dynamics during development and maturation
Proper DNA methylation patterns and dynamics are required for mammalian development, as loss of either de novo or maintenance DNMTs causes embryonic or perinatal lethality (Li et al., 1992; Okano et al., 1999; Smith and Meissner, 2013). Although mice deficient in individual TET proteins can undergo largely normal embryonic development (Pastor et al., 2013; Wu and Zhang, 2014), removal of all three TET genes leads to severe gastrulation phenotypes (Dai et al., 2016), suggesting that TET-mediated 5mC oxidation is required for normal embryonic development and that different TET enzymes have overlapping functions. In addition, mutations in 5mC writers (DNMTs), readers (MBDs) and erasers (TETs) are linked to many human diseases, including growth disorders, immunodeficiency syndromes, neurodevelopmental disorders and blood cancers (Greenberg and Bourc'his, 2019; Janssen and Lorincz, 2022). Thus, DNA methylation dynamics are an integral component of the epigenetic regulation of developmental gene programs. Below, we describe how different components of mammalian DNA methylation/demethylation pathways achieve cell type- or stage-specific regulation of methylome dynamics. We focus our discussion on several well-studied developmental contexts: pre-implantation development, post-implantation development, germ cell development, postnatal neuronal development and maturation, and hematopoietic stem cell differentiation.
Pre-implantation development
During embryonic development, the paternal and maternal epigenomes undergo two distinct waves of genome-wide DNA demethylation. The first wave occurs before blastocyst formation. Although the paternal pronucleus is hypermethylated compared with the maternal pronucleus at fertilization (Fig. 3), both methylomes concomitantly undergo DNA demethylation. In mice, the paternal pronucleus undergoes genome-wide conversion of 5mC to 5hmC/5fC/5caC by maternal TET3 (Gu et al., 2011; Iqbal et al., 2011; Wossidlo et al., 2011), followed by replication-dependent ox-mC dilution from the two-cell stage until the pre-implantation blastocyst stage (Inoue et al., 2011; Inoue and Zhang, 2011). By contrast, the maternal genome is largely protected from TET3 enzymatic action but loses 5mC through passive replication-dependent dilution, owing to the cytoplasmic localization of the oocyte-specific DNMT1 isoform, DNMT1o (Howell et al., 2001). As a result of replication-dependent dilution of 5mC and ox-mC (mainly from paternal genomes) during pre-implantation embryogenesis, global methylation reaches its lowest level (∼20%) and paternal/maternal methylomes are largely equalized in the inner cell mass of blastocysts (Fig. 3). However, this model is challenged by the observation that pre-implantation embryos derived from Tet3 maternal KO oocytes also exhibit replication-associated attenuation of paternal 5mC (Guo et al., 2014a; Shen et al., 2014). The observed ox-mC dilution may, therefore, simply be a consequence of cytoplasmic DNMT1o localization. Moreover, pharmacological or genetic inhibition of Tet3 function does not affect the initial loss of paternal 5mC (Amouroux et al., 2016), suggesting that conversion of 5mC to ox-mC is not a prerequisite for the early stage of zygotic active DNA demethylation, which may instead occur through an unknown TET3-independent mechanism.
Furthermore, Tet3 maternal KO mouse embryos exhibit only minimal defects in zygotic gene activation and pre-implantation embryonic development (Inoue et al., 2012, 2015). Thus, the relative contribution of Tet3-dependent ox-mC generation to the kinetics of gamete-specific DNA methylome erasure, and its exact functions in early pre-implantation embryos, remain unclear.
Note that the mouse paternal genome accumulates low levels of de novo methylation during the initial global loss of 5mC (Amouroux et al., 2016; Richard Albert et al., 2020); this methylation is catalyzed by maternal DNMT3A and occurs at a subset of CGI promoters to constrain premature gene expression before blastocyst formation (Richard Albert et al., 2020). By comparison, human embryos exhibit three distinct waves of DNA demethylation and two waves of de novo methylation during pre-implantation stages (Zhu et al., 2018). Specifically, this de novo methylation occurs at repetitive elements and precedes their demethylation during the early pronuclear and mid-pronuclear-to-four-cell stages. These findings reveal that early pre-implantation development in mammals involves a puzzling, dichotomous regulation of DNA methylation that delicately balances DNA demethylation and de novo methylation. DNA demethylation occurs precipitously on the paternal genome up to the two-cell stage, and the paternal methylome remains asymmetrically hypomethylated compared with the maternal methylome, even beyond global genome remethylation during implantation in embryonic and extra-embryonic lineages (Zhu et al., 2018).
In other animals, such as zebrafish, paternal DNA methylation patterns are maintained throughout early embryogenesis, whereas the maternal genome simultaneously undergoes both passive DNA demethylation and de novo methylation to reach a pattern akin to the sperm methylome (Jiang et al., 2013). Thus, different species leverage divergent mechanisms to achieve global methylome reprogramming during early embryogenesis.
Post-implantation development
Following implantation, the epiblast of the mouse blastocyst transitions from naive to primed pluripotency before gastrulation (Fig. 3). In blastocyst-derived pluripotent stem cells, developmental gene promoters acquire both H3K27me3 and H3K4me3, establishing bivalent chromatin domains that represent poised transcriptional states (Bernstein et al., 2006; Zheng et al., 2016). Because 5mC occludes H3K27me3 and H3K4me3 enrichment (Brinkman et al., 2012; Guo et al., 2015; Ku et al., 2008; Statham et al., 2012; Zheng et al., 2016), bivalent domains located at DNA methylation valleys (DMVs) are kept hypomethylated by TET enzymes (Gu et al., 2018; Xiang et al., 2020). Consistent with this, TET1/2 KO mouse embryos gain 5mC at DMVs and lose H3K4me3, in line with the well-established antagonism between 5mC and chromatin proteins containing domains that bind unmethylated CGIs, such as H3K4me3 methyltransferases and the auxiliary proteins that direct PRC2 to deposit H3K27me3 (Li et al., 2017). Recently, a previously uncharacterized protein, QSER1, was shown to interact with TET1 in human embryonic stem cells to safeguard bivalent promoters and poised enhancers near developmental genes against DNMT3A/B-mediated de novo methylation (Dixon et al., 2021).
Post-implantation mouse embryos undergo DNMT3A/3B-mediated de novo methylation across the genome, and global DNA methylation is established by E6.5 in embryonic and extra-embryonic tissues (Dahlet et al., 2020). In Dnmt3a/3b double knockout (DKO) embryos, not only are CGI promoters of germline-specific genes and a small number of lineage-committed genes de-repressed, but many two-cell embryo-related genes and transposons are also concomitantly upregulated (Argelaguet et al., 2019; Dahlet et al., 2020). Global de novo DNA methylation during early lineage specification in post-implantation embryogenesis is correlated with dynamic reconfiguration of both chromatin accessibility and three-dimensional chromatin structure (Argelaguet et al., 2019; Zhang et al., 2018). Interestingly, recent single-cell multi-omics analysis links the transcriptome changes that drive germ-layer formation to concerted changes in DNA methylation and chromatin accessibility at lineage-specific enhancers, rather than at promoters (Argelaguet et al., 2019). Ectoderm-defining enhancers, specifically, are hypomethylated and accessible at the early epiblast stage, suggesting that murine epiblast cells are primed for ectodermal lineages during gastrulation (Argelaguet et al., 2019; Xiang et al., 2020). In contrast, mesoderm and endoderm cells maintain these hypomethylated ectoderm-defining enhancers but differentiate through TET-mediated DNA demethylation of their own lineage-defining enhancers, which accrue de novo 5mC before specification. Consistent with this, loss of TET1/2 in the epiblast (Zhang et al., 2018) or during induced pluripotent stem cell (iPSC) reprogramming (Sardina et al., 2018) causes tissue-specific enhancers that normally undergo active DNA demethylation to gain 5mC, and impairs iPSC reprogramming efficiency.
Similarly, in vitro cultured human peri-implantation embryos undergo extensive global remethylation at enhancers and repetitive elements following implantation (Zhou et al., 2019). However, it remains to be determined whether the changes in chromatin accessibility observed after implantation are caused by DNA methylation, precede it, or are merely correlated with it. In addition, although TET proteins regulate specific signaling pathways (e.g. the Lefty-Nodal pathway) required for normal gastrulation (Cheng et al., 2022; Clark et al., 2022; Dai et al., 2016), whether ox-mCs such as 5hmC play gene regulatory roles beyond serving as DNA demethylation intermediates in post-implantation embryonic development remains unclear (Fig. 3).
Germ cell development
A second wave of genome-wide DNA demethylation occurs during the epigenetic reprogramming of PGCs to reinstate totipotency and erase parent-specific methylation patterns at ICRs (Fig. 3). Unlike most autosomal genes, which are expressed from both alleles, imprinted genes are transcribed mono-allelically from one parental allele and are regulated by DNA methylation at ICRs (Proudhon et al., 2012; Thorvaldsen et al., 1998; Williamson et al., 2006). Of the 20 ICRs identified in humans and mice, most are methylated in the oocyte and colocalize with CGI promoters, whereas paternal ICRs are CpG-poor and primarily intergenic (Kobayashi et al., 2012; Proudhon et al., 2012). Mechanistically, paternally and maternally methylated ICRs are protected from global DNA demethylation by two KRAB-ZFPs, ZFP57 (Li et al., 2008; Quenneville et al., 2011) and ZFP445 (Takahashi et al., 2019), and their binding partner KRAB-associated protein 1 (KAP1; TRIM28) (Messerschmidt et al., 2012; Quenneville et al., 2011; Strogantsev et al., 2015). ZFP57 selectively binds a conserved ICR hexanucleotide motif (TGCCGC), and ZFP57/KAP1 recruits SETDB1 to mark ICRs with H3K9me3 and then DNMT1/UHRF1 to protect methylated ICRs in post-fertilization zygotes (Messerschmidt et al., 2012; Quenneville et al., 2011).
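As a minimal illustration of how a sequence-specific reader such as ZFP57 can be mapped onto candidate sites, the sketch below scans both strands of a DNA sequence for the TGCCGC hexanucleotide named above (its reverse complement is GCGGCA). It is a plain string search under our own simplifying assumptions: it ignores CpG methylation status, flanking sequence and chromatin context, so hits are candidate motif occurrences rather than predicted ZFP57-bound ICRs. The function names are hypothetical.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str = "TGCCGC") -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every exact motif match on either strand."""
    seq, motif = seq.upper(), motif.upper()
    rc = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:  # avoid double-counting palindromic motifs
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_motif("TGCCGCGGCA"))  # [(0, '+'), (4, '-')]
```

Reporting the strand matters because the reverse-strand hit (GCGGCA on the top strand) is the same TGCCGC motif read on the bottom strand; a real analysis would also check that the central CpG is methylated and would typically use a position weight matrix rather than an exact match.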
Global erasure of DNA methylation in mouse PGCs is a two-step process (Fig. 3). During the first phase (E6.5-E9.5), bulk 5mC is passively diluted during replication, likely owing to downregulation of Dnmt3a/b and Uhrf1 in PGCs (Kagiwada et al., 2013; Kurimoto et al., 2008; Seisenberger et al., 2012). By contrast, methylated ICRs and germline-specific genes are initially protected from global PGC demethylation (phase 1) but are subsequently oxidized by TET1/2 to generate 5hmC (phase 2: E9.5-E13.5). 5hmC is then removed during replication (Yamaguchi et al., 2013a), and PGCs reach their lowest 5mC levels at ∼E13.5 (6-8% global methylation), before sex-specific de novo methylation during gametogenesis (Hackett et al., 2012). Although TETs are dispensable for phase 1 global DNA demethylation (Vincent et al., 2013), TET1-mediated oxidation of 5mC is required for proper erasure of genomic imprints at ICRs and activation of germline-specific genes (e.g. meiotic genes in female germ cells) in PGCs (Yamaguchi et al., 2012, 2013b).
During maturation, oocytes and sperm display gamete-specific remethylation dynamics. In mammals, male germ cells remethylate rapidly via the action of DNMT3A/3L, whereas oocytes remain hypomethylated until oocyte growth and maturation (Seisenberger et al., 2012; Smallwood et al., 2011; Wang et al., 2014). In mice, male germline methylation is established during fetal stages and requires the concerted activities of DNMT3A, DNMT3L and DNMT3C for male fertility (Barau et al., 2016; Dura et al., 2022) (Fig. 3). DNMT3A/3L broadly methylates genomic elements across the male germ cell genome to drive commitment to spermatogenic differentiation by inhibiting ectopic enhancer activity (Dura et al., 2022), whereas DNMT3C silences promoters of evolutionarily young transposons to constrain their expression (Barau et al., 2016). Although DNMT3C KO mice are somatically normal, they are infertile and hypogonadal, supporting an integral role for DNA methylation-mediated retrotransposon repression in developing male germ cells. Indeed, in the murine male germline, loss of the Piwi domain-containing proteins piwi-like protein 4 (MIWI2; Piwil4) and piwi-like protein 2 (MILI; Piwil2), which associate with piwi-interacting RNAs (piRNAs) to direct de novo methylation at repetitive elements, results in promoter hypomethylation and transposon de-repression (Aravin et al., 2008; Carmell et al., 2007; Kuramochi-Miyagawa et al., 2008; Manakov et al., 2015). As MILI primarily localizes to the cytoplasm for piRNA biogenesis and loading, it was originally speculated that MIWI2 had a larger role in transposon DNA methylation (Aravin et al., 2008; De Fazio et al., 2011). However, MILI and MIWI2 each have independent roles in regulating transposable element DNA methylation, although the precise mechanism remains unclear (Manakov et al., 2015).
Interestingly, MIWI2 KO animals also exhibit spermatogenesis defects and hypogonadism, suggesting that DNMT3C may act downstream of piRNA generation to facilitate de novo methylation of transposon promoters (Carmell et al., 2007).
Compared with sperm, oocytes are globally hypomethylated and exhibit distinct gamete-specific methylation profiles (Fig. 3) (Kobayashi et al., 2012). Whereas sperm display global hypermethylation except at CpG-dense regions, oocytes acquire DNMT3A/3L-catalyzed de novo methylation primarily across transcribed gene bodies and at maternally imprinted regions (Bourc'his et al., 2001; Chotalia et al., 2009; Kaneda et al., 2010; Kobayashi et al., 2012). To maintain the oocyte genome in a hypomethylated state during oogenesis, the maternal factor DPPA3 safeguards the methylome from ectopic DNMT1o-catalyzed de novo methylation by sequestering UHRF1 from the nucleus to the cytoplasm (Li et al., 2018a). DPPA3 KO in murine oocytes results in aberrant nuclear localization of UHRF1 and DNMT1o, causing ectopic 5mC accumulation at inactive CGI promoters. Embryos derived from these ectopically hypermethylated oocytes exhibit dysregulated zygotic genome activation and mostly fail to reach the blastocyst stage, underscoring the importance of DPPA3 in establishing the normal oocyte methylome.
Postnatal neuronal development and maturation
DNA methylation-mediated gene regulation and epigenetic dynamics are involved in postnatal brain maturation, cognitive function and neurodevelopmental disorders. Specifically, adult neurons are enriched for non-CpG methylation (5mCH, where H=A, C or T) and 5hmCG (Guo et al., 2011; Lister et al., 2013), the accumulation of which coincides with neuronal maturation and the peak of synaptogenesis in mammals. The timing and relative abundance of 5mCH and 5hmCG in the brain suggest an integral regulatory role for these modifications during postnatal brain development (Gabel and Greenberg, 2013). Unlike 5mCG, which is generally high across diverse tissues and cell types (70-90% globally), 5mCH is relatively enriched in specific cell types, including adult neurons (>2%), oocytes (1-2%) and pluripotent stem cells (1-2%) (He and Ecker, 2015). Although global 5mCH levels (1-3%) are much lower than 5mCG levels (∼80%), a substantial fraction of methylated sites (∼25%) are in the CH context, as the human genome has ∼40-fold more CH sites (∼1.1 billion) than CG sites (∼28 million) (He and Ecker, 2015). Whereas 5mCH in non-brain cells is enriched in the CAG context (5mCAG), 5mCH in brain cell types typically occurs in the CAC context (5mCAC) (Schultz et al., 2015). Neuronal 5mCAC is deposited postnatally by DNMT3A1 (the long DNMT3A isoform) (Gu et al., 2022) and becomes a widespread form of DNA methylation in mature neurons of the adult brain (Lister et al., 2013; Xie et al., 2012). Recent studies using genome-wide methylome and transcriptome sequencing methods with high cellular and base resolution have demonstrated that genic 5mCA levels negatively correlate with cell-type specific gene expression, whereas genic 5hmCG positively correlates with active transcription (Colquitt et al., 2013; Guo et al., 2014b; Kozlenkov et al., 2018; Luo et al., 2017, 2018b; Mellen et al., 2012, 2017; Wu et al., 2011a).
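The arithmetic behind the ∼25% figure can be made explicit. The sketch below is illustrative only, and rests on an assumption of ours: that the ∼28 million figure counts CpG dinucleotides (so ∼56 million CpG cytosines are present across both strands), whereas the ∼1.1 billion figure already counts CH cytosines on both strands. Under that assumption, mean 5mCH levels of roughly 1-2% give a CH share of about 20-33% of all methylated cytosines; the function and constant names are hypothetical.

```python
def ch_fraction(n_cg: float, n_ch: float, mcg_level: float, mch_level: float) -> float:
    """Fraction of all methylated cytosines that lie in a CH context."""
    methylated_cg = n_cg * mcg_level
    methylated_ch = n_ch * mch_level
    return methylated_ch / (methylated_cg + methylated_ch)


CPG_DINUCLEOTIDES = 28e6               # ~28 million CG sites, as quoted in the text
CH_CYTOSINES = 1.1e9                   # ~1.1 billion CH sites, as quoted in the text
CG_CYTOSINES = 2 * CPG_DINUCLEOTIDES   # assumption: count the C on both strands of each CpG

# Ratio of the raw counts quoted in the text (the "~40-fold" comparison).
print(round(CH_CYTOSINES / CPG_DINUCLEOTIDES, 1))

# Share of methylated cytosines in CH context for a few mean 5mCH levels (5mCG = 80%).
for mch in (0.01, 0.015, 0.02):
    print(mch, round(ch_fraction(CG_CYTOSINES, CH_CYTOSINES, 0.8, mch), 3))
```

Because the CH share depends on how CpG sites are counted, the same inputs give a noticeably higher fraction if the 28 million figure is used directly as a cytosine count.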
Furthermore, in vitro studies show that these modifications have opposing interactions with repressive chromatin readers (Chen et al., 2015; Gabel et al., 2015; Mellen et al., 2012, 2017). The methyl-CpG binding protein 2 (MeCP2), a DNA methylation reader that is highly abundant in mature neurons (Skene et al., 2010), represses transcription by recruiting the Sin3A co-repressor complex and histone deacetylases (HDACs) (Nan et al., 1998). Notably, MeCP2 preferentially binds 5mCG, 5mCH and 5hmCH, but its binding is inhibited by 5hmC in the CpG context (5hmCG), underscoring the significance of context- and sequence-dependent regulation by 5mC and ox-mC (Kinde et al., 2015). Given that traditional bisulfite sequencing cannot discern 5mC from 5hmC (Huang et al., 2010), and that these modifications can have opposing effects on chromatin protein binding, it is necessary to comprehensively resolve them by integrating multiple base-resolution ox-mC sequencing methods (Schutsky et al., 2018; Stoyanova et al., 2021). Recent studies in post-mitotic neurons also revealed that TET/TDG-dependent generation of 5fC/5caC and subsequent single-strand breaks (SSBs) are preferentially enriched at neuronal enhancers, highlighting potential roles for the active DNA demethylation pathway in targeting DNA repair to cell-type specific cis-regulatory elements and in regulating their activities (Wang et al., 2022).
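The context- and modification-dependence of MeCP2 binding summarized above can be written as a small lookup table. This is a hypothetical sketch: the function and table names are ours, the entries for unmodified cytosine are an assumption, and real binding is quantitative rather than all-or-none.

```python
def cytosine_context(seq: str, pos: int) -> str:
    """Return 'CG' or 'CH' for the cytosine at 0-based `pos` on the given strand."""
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError(f"position {pos} is not a cytosine")
    if pos + 1 >= len(seq):
        raise ValueError("the next base is unknown, so the context cannot be assigned")
    return "CG" if seq[pos + 1] == "G" else "CH"


# Preferences as summarized in the text (Kinde et al., 2015); True = binding favored.
# The entries for unmodified C are an assumption added for completeness.
MECP2_FAVORED = {
    ("5mC", "CG"): True,
    ("5mC", "CH"): True,
    ("5hmC", "CH"): True,
    ("5hmC", "CG"): False,  # 5hmCG inhibits MeCP2 binding
    ("C", "CG"): False,
    ("C", "CH"): False,
}


def mecp2_favored(seq: str, pos: int, modification: str) -> bool:
    """Look up whether MeCP2 binding is favored for a modified cytosine in its context."""
    return MECP2_FAVORED[(modification, cytosine_context(seq, pos))]


print(mecp2_favored("ACGTCACT", 1, "5hmC"), mecp2_favored("ACGTCACT", 4, "5hmC"))
```

In the toy sequence, the cytosine at position 1 is in a CpG, so 5hmC there is not favored, whereas the cytosine at position 4 sits in a CA (CH) context, where 5hmC is.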
Loss-of-function mutations in MECP2 cause Rett syndrome (RTT), an autism spectrum disorder (Amir et al., 1999; Gabel et al., 2015). By binding to both 5mCG and 5mCA sites, MeCP2 is thought to regulate genes by recruiting the NCOR-SMRT transcriptional co-repressor complex or by directly influencing chromatin architecture (Lyst and Bird, 2015). Recent studies have found that MeCP2 preferentially modulates long genes that are enriched for 5mCA (Gabel et al., 2015) and contributes mechanistically to the formation of heterochromatic condensates (Li et al., 2020). Interestingly, re-expression of MeCP2 in the brain rescues severe neurological RTT-like symptoms and abrogates the premature lethality observed in MeCP2 KO mice (Guy et al., 2007). The reversibility of the RTT-like phenotype upon restoration of MeCP2 argues that RTT is not an irreversible neurodevelopmental disorder; rather, MeCP2 may be required for normal brain function throughout life (McGraw et al., 2011). Moreover, following MeCP2 KO in specific neuronal subtypes (e.g. inhibitory neurons), changes in gene expression were minimal, indicating that MeCP2 may only fine-tune the expression of genes demarcated by de novo 5mCA established in early postnatal brain development (Stroud et al., 2017). Concordant with this, RTT phenotypes and related differential gene expression are correlated with changes in DNMT3A-mediated 5mCA, but MeCP2 binds only a subset of DNMT3A-regulated genes in GABAergic neurons and has DNMT3A-independent functions that may contribute to RTT phenotypes (Lavery et al., 2020), highlighting new directions to pursue in future studies.
Hematopoietic stem cell differentiation
Hematopoietic stem cells (HSCs) undergo extensive epigenetic and transcriptional reprogramming during cell fate commitment, which is tightly regulated by DNA methylation dynamics (Challen et al., 2012; Zhang et al., 2016); mutations in DNMT3A and TET2 are frequently observed in hematological malignancies (Delhommeau et al., 2009; Figueroa et al., 2010; Fraietta et al., 2018; Langemeijer et al., 2011; Moran-Crusio et al., 2011) and age-associated clonal hematopoiesis (Challen and Goodell, 2020; Jaiswal and Ebert, 2019).
In HSCs, transcriptional priming from multipotent stages toward erythroid or myelomonocytic progenitors is controlled by the concerted programs of DNMT3A and TET2, which epigenetically regulate HSC genes, lineage-specific enhancers and transcription factor (TF) binding sites. Recent single-cell RNA-sequencing analyses of conditional Dnmt3a or Tet2 mutant mice indicate that individual KO of these enzymes in HSCs results in opposing shifts toward erythroid or myelomonocytic differentiation, highlighting transcriptional priming at the early stages of HSC multipotency (Izzo et al., 2020). Specifically, Tet2 KO HSCs show increased cellular quiescence, long-term HSC expansion and bias toward myeloid progenitors. Dnmt3a KO HSCs show erythroid lineage skewing, which is also observed in a human patient with a DNMT3A mutation (F755S) presenting with clonal hematopoiesis. Notably, TET2 KO cells exhibit 5mC enrichment at binding sites of TFs associated with erythroid differentiation (Tal1, Gata1, Klf1), which may prevent TF binding (Yang et al., 2020) and account for the depletion of mature erythrocytes.
Although DNMT3A and TET2 have opposing molecular functions, their mutations often result in paradoxically similar clinical outcomes, possibly reflecting distinct roles of these enzymes during development (antagonistic) and tissue maintenance (synergistic). Indeed, studies of Dnmt3a/Tet2 DKO HSCs show that DNMT3A and TET2 cooperate to repress lineage-specific transcription factors in HSCs (Zhang et al., 2016). The upregulation of lineage-specific genes (e.g. the erythroid genes Klf1 and Epor) in DKO HSCs coincides with attenuated promoter 5hmC, such that 5hmC enrichment inversely correlates with the degree of gene expression. In addition, HSC genes that are downregulated in DKO cells lose 5hmC at promoter and genic regions, transcriptionally priming HSCs toward other lineages. DKO HSCs show monocytic progenitor skewing, augmented self-renewal and expansion, and reductions in mature erythrocytes, recapitulating clinical phenotypes of AML patients harboring concurrent DNMT3A and TET2 mutations (Zhang et al., 2016).
New technologies for elucidating the regulatory roles of dynamic DNA methylation
Single-cell multi-omics analyses of DNA methylomes, transcriptomes and chromatin states
Single-cell DNA methylome sequencing has recently emerged as a powerful technology to identify cellular heterogeneity within tissues and cell populations, revealing temporal methylation dynamics in specific cell types. Since the first reports of single-cell reduced representation bisulfite sequencing (scRRBS, enriched for CG-rich regions) and whole-genome single-cell bisulfite sequencing (scBS-seq) (Guo et al., 2013; Smallwood et al., 2014), methodological advances in single-cell methylome techniques have increased experimental throughput and genome coverage. By leveraging split-pool combinatorial indexing (e.g. sci-MET), more efficient post-bisulfite adaptor tagging (PBAT) reactions (snmC-seq) and expanded reduced representation regions that enrich for more distal regulatory regions (scXRBS) (Luo et al., 2017; Mulqueen et al., 2018; Shareef et al., 2021) (Fig. 4A), these scalable single-cell methods have been used to profile DNA methylomes in thousands of nuclei, revealing epigenetic diversity across highly heterogeneous cell types (Liu et al., 2021). Importantly, single-cell DNA methylation landscapes provide cell-type specific information on distal cis-regulatory elements, linking genetic risks associated with human diseases to specific non-coding regions. However, compared with single-cell transcriptomic (i.e. RNA-based) and other epigenomic (i.e. chromatin accessibility- and histone modification-based) sequencing methods, current single-cell DNA methylome technologies, which are mostly plate-based, are still substantially more costly to implement and of lower throughput, impeding their wider adoption in the field.
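Split-pool combinatorial indexing replaces one well per cell with one barcode combination per cell, so throughput is bounded by barcode collisions (two cells receiving the same combination). A back-of-the-envelope estimate, assuming cells are distributed uniformly at random over all combinations, is sketched below; the round sizes (96 barcodes per round) and cell numbers are hypothetical and are not taken from the sci-MET protocol.

```python
import math


def barcode_combinations(barcodes_per_round: list[int]) -> int:
    """Number of distinct barcode combinations after successive split-pool rounds."""
    return math.prod(barcodes_per_round)


def collision_fraction(n_cells: int, n_combinations: int) -> float:
    """Expected fraction of cells that share their combination with at least one other
    cell, assuming each cell picks a combination uniformly at random."""
    return 1.0 - (1.0 - 1.0 / n_combinations) ** (n_cells - 1)


for rounds in ([96, 96], [96, 96, 96]):
    combos = barcode_combinations(rounds)
    print(len(rounds), combos, round(collision_fraction(1000, combos), 4))
```

Under these assumptions, 1,000 cells over two rounds of 96 barcodes (9,216 combinations) would give roughly 10% collisions, while adding a third round lowers the expected rate to about 0.1%, which is why extra indexing rounds scale throughput.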
Building on single-cell DNA methylome profiling methods, multi-omic approaches have been developed to jointly measure DNA methylation along with the genome [scTrio-seq (Hou et al., 2016)], transcriptome [scM&T-seq (Angermueller et al., 2016)], chromatin accessibility [scNOMe-seq (Pott, 2017); scCOOL-seq (Guo et al., 2017); scNMT-seq (Clark et al., 2018); snmCAT-seq (Luo et al., 2022)] and three-dimensional chromatin conformation [sn-m3C-seq (Lee et al., 2019); scMethyl-HiC (Li et al., 2019)] in the same single cell (Fig. 4A). Further development of single-cell multi-omic methylome mapping may enable joint profiling of DNA methylation and major histone modifications in the same cell, to better understand the crosstalk between these two epigenetic regulatory mechanisms (Fig. 4A). Collectively, integrating various regulatory layers of the cell (genome, epigenome, transcriptome and proteome) into multi-omic measurements allows cellular heterogeneity to be studied at different time scales and can reveal novel links between genomic/epigenomic regulation and its functional output.
A major limitation of these sequencing methods is their reliance on the bisulfite conversion reaction, which cannot distinguish 5mC from 5hmC, the two most abundant yet functionally opposing DNA modifications in mammalian genomes (Huang et al., 2010). In addition, two rarer, more highly oxidized 5mC bases, 5fC and 5caC, cannot be distinguished from unmodified cytosines in bisulfite sequencing (Wu et al., 2014). To resolve base ambiguity between 5mC/C and oxidized 5mC bases in heterogeneous cell populations, modification-specific endonucleases [AbaSI for glucosylated 5hmC (5ghmC) and MspJI for 5mC] have been leveraged to map 5hmC or 5mC in single cells (Mooijman et al., 2016; Sen et al., 2021) (Fig. 4A,B). Interestingly, strand-specific 5hmC mapping in single cells can facilitate lineage reconstruction in early embryos, in which 5hmCGs are diluted in a replication-dependent manner (Mooijman et al., 2016). However, these restriction enzyme-based methods rely on selective enrichment of 5hmC- or 5mC-containing genomic fragments, thus obscuring quantification of absolute modification levels. Advances in the enzymatic or chemical conversion of DNA modifications provide quantitative means to systematically measure genome-wide profiles of 5hmC [snhmC-seq (Fabyanic et al., 2021 preprint)] or more highly oxidized cytosine bases [CLEVER-seq for 5fC (Zhu et al., 2017); scMAB-seq for 5fC/5caC (Wu et al., 2017)] in single cells (Fig. 4A,B). To fully reveal the regulatory functions of DNA modification dynamics across heterogeneous cell populations or in primary tissues, novel methods are still needed to jointly profile true 5mC and oxidized bases (e.g. 5hmC) in the same single cells, in order to reveal the distinct roles of 5mC and oxidized bases as well as the enzymatic kinetics of the DNMT/TET/TDG axis in the native chromatin environment.
Non-destructive and long-read sequencing
A key limitation of most bulk or single-cell DNA methylome sequencing methods is their reliance on destructive chemical conversion reactions, such as bisulfite treatment, which randomly fragment genomic DNA, and on short-read sequencing technologies (typically a few hundred base pairs or less). These limitations preclude comprehensive analysis of allele-specific methylation states or repetitive regions in mammalian genomes. However, recent advances in third-generation sequencing platforms, such as PacBio single-molecule real-time (SMRT) (Wenger et al., 2019) and Oxford Nanopore (Deamer et al., 2016) sequencing, have enabled long-read sequencing analysis of DNA modification states (Fig. 5A). Notably, Nanopore and SMRT sequencing can directly map native DNA modifications in eukaryotic (e.g. 5mC) or prokaryotic (e.g. 6mA) genomic DNA (Kong et al., 2022; Rand et al., 2017; Simpson et al., 2017). These methods are particularly useful for advancing our understanding of the phased epigenome, as long reads (>10 kb) can be robustly assigned to haplotypes. When paired with GpC methyltransferases (e.g. M.CviPI) to exogenously mark open chromatin regions, nanopore sequencing can evaluate endogenous CpG methylation and chromatin accessibility simultaneously on long single DNA molecules (Battaglia et al., 2022; Lee et al., 2020). The recent completion of the first complete human reference genome by the Telomere-to-Telomere (T2T) Consortium using long-read sequencing technologies has resolved the remaining 8% of the genome, which includes many complex homologous regions and diverse repetitive sequences (Nurk et al., 2022). The T2T-CHM13 reference thus provides unprecedented opportunities to investigate the epigenetic dynamics and regulatory roles of DNA modifications across these complex and elusive regions of the human genome at high resolution, during normal development and in human diseases (Gershman et al., 2022; Hoyt et al., 2022).
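Because M.CviPI methylates GpC while endogenous methylation is mostly at CpG, NOMe-style analyses commonly separate the two signals by sequence context: cytosines in HCG report endogenous methylation, cytosines in GCH report accessibility, and GCG cytosines are ambiguous and usually discarded. The sketch below (a hypothetical helper, not taken from the cited tools) classifies cytosine positions on one strand; the terminal bases are skipped because their context is incomplete, and the other strand would be handled on the reverse complement.

```python
def classify_cytosines(seq: str) -> dict[str, list[int]]:
    """Split cytosine positions on one strand into HCG (endogenous CpG methylation),
    GCH (GpC accessibility marks) and GCG (ambiguous, usually discarded)."""
    seq = seq.upper()
    classes: dict[str, list[int]] = {"HCG": [], "GCH": [], "GCG": []}
    for i in range(1, len(seq) - 1):  # skip ends: context is incomplete there
        if seq[i] != "C":
            continue
        prev_is_g = seq[i - 1] == "G"
        next_is_g = seq[i + 1] == "G"
        if prev_is_g and next_is_g:
            classes["GCG"].append(i)  # could carry either signal
        elif next_is_g:
            classes["HCG"].append(i)  # CpG not preceded by G
        elif prev_is_g:
            classes["GCH"].append(i)  # GpC not followed by G
    return classes


print(classify_cytosines("ACGTGCAGCGT"))
```

Cytosines in neither context (CHH not preceded by G) are simply left out, since they carry neither the endogenous CpG signal nor the exogenous GpC mark in this scheme.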
In addition to Nanopore-based direct long-read sequencing of 5mCG, several bisulfite-free methylome sequencing methods enable base-resolution, long-read sequencing of 5mC and oxidized bases (Fig. 5B). These methods improve read alignment and genomic coverage by replacing bisulfite treatment with mild, non-destructive enzymatic or chemical reactions. APOBEC-coupled epigenetic sequencing (ACE-seq) uses β-glucosyltransferase (β-GT) to protect 5hmC as 5-glucosyl-hmC (5ghmC) and the human APOBEC3A (A3A) enzyme to deaminate unmodified C and 5mC, so that only 5hmC is read as C (Schutsky et al., 2018) (Fig. 5B). Similarly, enzymatic methyl-seq (EM-seq) (Vaisvila et al., 2021), a DNA deaminase-based method, maps 5mC and 5hmC together by making both resistant to A3A deamination, yielding a non-destructive, all-enzymatic readout equivalent to that of standard BS-seq. In EM-seq, 5mC is protected by TET-mediated oxidation to the A3A-resistant 5caC state, and 5hmC is protected by β-GT-mediated glucosylation to 5ghmC. These conversions render 5mC and 5hmC resistant to A3A, leaving only unmodified C susceptible to A3A-mediated deamination (Fig. 5B). In an alternative strategy, TET-assisted pyridine borane sequencing (TAPS), 5mC and 5hmC are first oxidized by TET proteins to 5caC and then reduced to dihydrouracil (DHU) by pyridine borane (Liu et al., 2019). DHU is then amplified and read as T; because only the minority of modified cytosines are converted, TAPS largely preserves sequence complexity, further improving the mapping rate. As these non-destructive methylome sequencing methods are compatible with both Nanopore and SMRT sequencing platforms (Liu et al., 2020; Sun et al., 2021), we anticipate that long-read methylome studies will provide information on 5mC patterning and function that cannot be accessed with short-read sequencing approaches.
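The base calls expected from each conversion chemistry described above can be tabulated to show which cytosine states each method can and cannot tell apart. The table below is a simplified sketch (names are hypothetical): it lists only C, 5mC and 5hmC for the enzymatic and TAPS methods, omits 5fC/5caC there, and assumes complete conversion.

```python
# Expected sequencing readout for each cytosine state, assuming complete conversion.
READOUT = {
    "BS-seq": {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"},
    "ACE-seq": {"C": "T", "5mC": "T", "5hmC": "C"},  # only (glucosylated) 5hmC reads as C
    "EM-seq": {"C": "T", "5mC": "C", "5hmC": "C"},   # same logic as BS-seq for C/5mC/5hmC
    "TAPS": {"C": "C", "5mC": "T", "5hmC": "T"},     # modified bases convert via DHU to T
}


def indistinguishable(method: str) -> list[set[str]]:
    """Group cytosine states that produce the same base call under `method`."""
    groups: dict[str, set[str]] = {}
    for state, call in READOUT[method].items():
        groups.setdefault(call, set()).add(state)
    return [states for states in groups.values() if len(states) > 1]


for method in READOUT:
    print(method, indistinguishable(method))
```

The output makes the ambiguity explicit: BS-seq, EM-seq and TAPS all merge 5mC with 5hmC (BS-seq additionally merges C with 5fC/5caC), whereas ACE-seq merges C with 5mC, which is why pairing a 5hmC-specific readout with a 5mC+5hmC readout is needed to separate the two marks.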
Furthermore, future work combining non-destructive long-read sequencing with single-cell analyses will significantly advance our understanding of DNA methylation dynamics across complex human genomic regions within heterogeneous cell populations.
Advances in DNA modification epigenome editing technologies
As highlighted above, decades of genetic studies have established global gene regulatory roles for 5mC modifiers (DNMT and TET enzymes) and readers (e.g. MeCP2) in various biological contexts. However, ascribing direct and causal functional roles to 5mC at regulatory elements (e.g. promoters, enhancers) via global gain- or loss-of-function studies remains a challenge. Recently, CRISPR-based epigenome editing technologies have emerged as versatile and powerful tools that provide unprecedented insights into how 5mC and oxidized bases directly regulate gene expression and chromatin organization at specific loci and in an allele-specific manner (reviewed by Lei et al., 2018; Liu and Jaenisch, 2019). By tethering the catalytic domain of various DNA methylation writers and erasers [e.g. DNMT3A (Vojta et al., 2016), Mollicutes spiroplasma CpG methyltransferase SssI (MQ1) (Lei et al., 2017), A. thaliana DNA glycosylase ROS1 (Devesa-Guerra et al., 2020) and TET1 (Choudhury et al., 2016)] to a catalytically deactivated Cas9 protein (dCas9), and targeting this to genomic regions using a well-designed single-guide RNA (sgRNA) (Fig. 6A), several studies have explored DNA methylation and demethylation dynamics in cultured cells, such as neurons, cancer cells and mESCs.
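Targeting a dCas9 fusion to a promoter or enhancer starts with finding protospacers that sit next to a PAM. The sketch below assumes a Streptococcus pyogenes dCas9, which retains the 5′-NGG-3′ PAM requirement immediately 3′ of a 20-nt protospacer; it is a hypothetical helper that lists candidate sites on both strands and does not score off-target risk, chromatin context or guide efficiency.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_protospacers(seq: str, length: int = 20) -> list[tuple[str, int, str]]:
    """List (protospacer, 0-based start on that strand, strand) for SpCas9 NGG sites."""
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(rf"(?=([ACGT]{{{length}}})[ACGT]GG)")
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            hits.append((m.group(1), m.start(), strand))
    return hits


target = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "AAAAA"  # toy region
print(find_protospacers(target))
```

Minus-strand starts are reported in reverse-complement coordinates; converting them back to forward-strand positions is a one-line offset calculation if needed.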
The first CRISPR/dCas9-based study demonstrating locus-specific 5mC editing used dCas9-DNMT3A or dCas9-TET1, delivered by lentiviral transduction to both cultured cells and the mouse brain in vivo (Liu et al., 2016). dCas9-TET1 was later applied to reactivate FMR1, which is hypermethylated and silenced in Fragile X syndrome (FXS) patients carrying an expanded CGG repeat in its 5′ untranslated region (Liu et al., 2018). Specifically, demethylation and reactivation of the FMR1 promoter were demonstrated in an iPSC-based FXS model, and the edited cells were subsequently differentiated into post-mitotic neurons and engrafted into early postnatal mouse brains. However, direct manipulation of post-mitotic neurons in vivo resulted in less effective FMR1 rescue, DNA demethylation and electrophysiological changes (Liu et al., 2018). Furthermore, the CGG expansion in FXS models provides a unique genomic environment in which the repetitive sequence allows a single sgRNA to saturate the locus with multiple copies of dCas9-TET1. The attenuated rescue efficiency observed in vivo (compared with that seen in cultured cells) underscores the need to develop more efficacious technologies and methods for epigenome editing in vivo and in post-mitotic cells.
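Why a single sgRNA can saturate an expanded CGG repeat follows from simple counting: a guide that tiles the repeat matches once every 3 nt, and each match is followed by an NGG PAM. The sketch below counts such sequence matches on the forward strand for a hypothetical 20-nt guide; the reverse strand, (CCG)n, contains no GG dinucleotide and so offers no NGG sites. The counts are an upper bound on occupancy, since neighboring sites overlap and a bound dCas9 complex occludes adjacent ones.

```python
def count_ngg_sites(seq: str, protospacer: str) -> int:
    """Count forward-strand matches to `protospacer` that are followed by an NGG PAM."""
    k = len(protospacer)
    return sum(
        1
        for i in range(len(seq) - k - 3 + 1)
        if seq[i : i + k] == protospacer and seq[i + k + 1 : i + k + 3] == "GG"
    )


PROTOSPACER = "GGC" * 6 + "GG"  # hypothetical 20-nt guide tiling the CGG repeat

# Repeat numbers are illustrative: 30 lies in the normal range, 200 at the full-mutation threshold.
for n_repeats in (30, 200):
    print(n_repeats, count_ngg_sites("CGG" * n_repeats, PROTOSPACER))
```

In a pure (CGG)n tract the number of matching sites grows as n − 7, so a full-mutation allele offers roughly an order of magnitude more target sites for the same guide than a normal-length allele.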
Although this pioneering work demonstrates the translational potential and feasibility of studying brain disorders in vivo using epigenome editing, several technical limitations are worth considering. For example, many methylation changes are measured exclusively by bisulfite sequencing, a quantitative, base-resolution method that cannot distinguish 5mC from 5hmC (Huang et al., 2010) or C from 5fC/5caC. Ascribing functional roles to individual 5mC and oxidized modifications thus remains a challenge. As individual ox-mCs may act as epigenetic modifications with distinct functions and regulation (Wu and Zhang, 2015), it is imperative to comprehensively resolve this base ambiguity.
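When a 5hmC-specific readout (e.g. ACE-seq) is available alongside BS-seq or EM-seq, a common way to resolve the 5mC/5hmC ambiguity at a site is subtraction: the BS-seq level reports 5mC + 5hmC, so subtracting the 5hmC level estimates true 5mC. The sketch below assumes complete conversion and uses hypothetical read counts; clamping to [0, 1] absorbs sampling noise that can make the difference slightly negative at low coverage.

```python
def site_level(modified_calls: int, total_reads: int) -> float:
    """Modification level at one site from read counts (e.g. C calls / (C + T))."""
    if total_reads == 0:
        raise ValueError("no coverage at this site")
    return modified_calls / total_reads


def estimate_true_5mc(bs_level: float, hmc_level: float) -> float:
    """Estimate 5mC as (5mC + 5hmC from BS-seq/EM-seq) minus 5hmC, clamped to [0, 1]."""
    return min(1.0, max(0.0, bs_level - hmc_level))


bs = site_level(40, 50)    # hypothetical: 40 of 50 BS-seq reads retain C (5mC + 5hmC)
ace = site_level(12, 60)   # hypothetical: 12 of 60 ACE-seq reads retain C (5hmC only)
print(round(estimate_true_5mc(bs, ace), 2))
```

Here the site appears 80% "methylated" by BS-seq alone, but about a quarter of that signal is 5hmC, leaving an estimated 60% true 5mC.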
Efforts to improve CRISPR-based methylome editing focus on signal amplification and combinatorial effector recruitment. By adding repetitive GCN4 peptides to dCas9 (Tanenbaum et al., 2014), multiple copies of DNMT3A (Huang et al., 2017; Su et al., 2018) or TET1 (Hanzawa et al., 2020; Morita et al., 2016) fused to anti-GCN4 single-chain variable fragments can be recruited to targeted loci, affording more efficient 5mC deposition and removal (Fig. 6B). The modular design of this ‘SunTag’ system allows its components to be optimized to minimize off-target effects and improve on-target activity (Pflueger et al., 2018). Another RNA-based multivalent system incorporates MS2 RNA stem loops into the sgRNA backbone to recruit their cognate binding partner, MS2 coat protein (MCP), fused to TET1, in addition to dCas9-TET1 (Xu et al., 2016) (Fig. 6C). Alternatively, Pumilio/FBF (PUF) domains linked to TET1, together with the DNA demethylation factors GADD45A and NEIL2 (a DNA glycosylase), can be recruited to PUF-binding sites incorporated into the 3′ end of the sgRNA to demethylate loci more efficiently (Taghbalout et al., 2019) (Fig. 6D). When combined with repressive histone-modifying activity (i.e. the KRAB domain, which recruits KAP1 and, through it, H3K9 methyltransferases such as SETDB1), an optimized dCas9-DNMT3A/3L epigenome editing approach (termed ‘CRISPRoff’) can initiate long-term heritable gene silencing in both proliferative and post-mitotic cells and can be further leveraged for genome-wide screens (Nunez et al., 2021). Future in vivo applications of these programmable epigenome-editing platforms may help to mechanistically dissect the causal relationship between 5mC and gene regulation.
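The gain from multivalent recruitment can be summarized as an upper bound on the number of effector catalytic domains per dCas9/sgRNA complex: one per direct fusion, one scFv-effector per GCN4 epitope, and two MCP-effectors per MS2 hairpin (MCP binds each hairpin as a dimer). The sketch below is a hypothetical calculator; the epitope and hairpin counts in the examples are illustrative rather than those of any specific published construct, and real occupancy will be lower.

```python
def max_effectors(gcn4_epitopes: int = 0, ms2_hairpins: int = 0, direct_fusion: bool = True) -> int:
    """Upper bound on effector domains recruited by one dCas9/sgRNA complex."""
    if gcn4_epitopes < 0 or ms2_hairpins < 0:
        raise ValueError("counts must be non-negative")
    direct = 1 if direct_fusion else 0  # e.g. dCas9-TET1 itself
    suntag = gcn4_epitopes              # one scFv-effector per GCN4 epitope
    ms2 = 2 * ms2_hairpins              # MCP binds each MS2 hairpin as a dimer
    return direct + suntag + ms2


print(max_effectors(ms2_hairpins=2))                         # dCas9-TET1 plus two MS2 hairpins
print(max_effectors(gcn4_epitopes=10, direct_fusion=False))  # SunTag-style array only
```

The comparison illustrates the design trade-off: RNA-scaffold systems add effectors without enlarging the dCas9 fusion, whereas SunTag-style arrays scale with the number of epitopes engineered into the protein.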
Conclusions
In contrast to historic views, the DNA methylome is highly dynamic during mammalian development, tissue maturation and aging. Extensive crosstalk between DNA methylation and histone modifications shapes the dynamic methylation landscape and contributes to the gene regulation of diverse molecular processes, including genomic imprinting, somatic suppression of germline-specific genes, plasticity of CGI bivalent promoters and multiple forms of human disease. However, multiple mechanistic and biological questions remain unanswered.
For example, as mammals lack the bifunctional 5mC DNA glycosylases found in plants, ox-mCs are currently thought to serve as intermediates for restoring unmodified cytosines, either through replication-dependent dilution of ox-mCs (e.g. 5hmC) or through TDG-coupled BER. However, DNA demethylation via replication-dependent dilution of ox-mCs has only been observed in cellular contexts in which the DNA methylation maintenance machinery is functionally impaired (e.g. pre-implantation embryos). Thus, ox-mC-modified dyads may not be sufficient to induce replication-dependent loss of 5mC/ox-mC (Caldwell et al., 2021), but may instead serve as substrates for DNMTs during somatic cell division. As TET proteins preferentially stall at 5hmC in many genomic regions, 5hmC may also function as a bona fide stable epigenetic mark in proliferative or post-mitotic cells (Spruijt et al., 2013). In addition, the crosstalk between TETs/ox-mCs and histone modifications is poorly understood, and it is unclear whether active DNA methylation/demethylation turnover at distal enhancers is a consequence of expression changes or plays a more instructive role in transcriptional control. Finally, many long-standing puzzles, such as how DNA methylation mechanistically relates to epigenetic clocks and functionally contributes to aging, and how piRNAs direct de novo methylation to repetitive elements in germ cells, warrant future investigation. Emerging single-cell multi-omics, long-read methylome sequencing methods and novel epigenome-editing technologies will help shed light on these unanswered questions.
Acknowledgements
We are grateful to all members of the Wu lab for helpful discussions.
Footnotes
Funding
H.W. is supported by a National Heart, Lung, and Blood Institute grant (DP2-HL142044), National Human Genome Research Institute grants (U01-HG012047, R01-HG010646), a National Cancer Institute grant (U2C-CA233285) and a Penn Epigenetics Institute pilot grant. A.W. is supported by a pre-doctoral Individual National Research Service Award from the National Institutes of Health (F31-HG011429). Deposited in PMC for release after 12 months.
References
Competing interests
The authors declare no competing or financial interests.