Trimethylation of histone H3 lysine 4 (H3K4me3) at the promoters of actively transcribed genes is a universal epigenetic mark and a key product of Trithorax group action. Here, we show that Mll2, one of the six Set1/Trithorax-type H3K4 methyltransferases in mammals, is required for trimethylation of bivalent promoters in mouse embryonic stem cells. Mll2 is bound to bivalent promoters but also to most active promoters, which do not require Mll2 for H3K4me3 or mRNA expression. By contrast, the Set1 complex (Set1C) subunit Cxxc1 is primarily bound to active but not bivalent promoters. This indicates that bivalent promoters rely on Mll2 for H3K4me3 whereas active promoters have more than one bound H3K4 methyltransferase, including Set1C. Removal of Mll1, sister to Mll2, had almost no effect on any promoter unless Mll2 was also removed, indicating functional backup between these enzymes. Except for a subset, loss of H3K4me3 on bivalent promoters did not prevent responsiveness to retinoic acid, thereby arguing against a priming model for bivalency. Instead, we propose that Mll2 is the pioneer trimethyltransferase for promoter definition in the naïve epigenome and that Polycomb group action on bivalent promoters blocks the premature establishment of active, Set1C-bound, promoters.
INTRODUCTION
In eukaryotes, transcription is regulated not only by transcription factors that bind specific DNA sequences near the regulated gene, but also by post-translational modifications of the nucleosomes that surround and encompass these DNA sequences. The modifications include methylation, acetylation and mono-ubiquitylation of histone tails that project out from the core nucleosome and serve as binding sites for chromatin proteins and complexes (Bannister and Kouzarides, 2011; Suganuma and Workman, 2011). In vertebrates, nucleosome modifications, together with cytosine methylation, influence transcriptional regulation during development, adult life and disease (Albert and Helin, 2010; Butler et al., 2012; Reik, 2007). This epigenetic level of transcriptional regulation is crucial to the multiple ways in which a genome is interpreted in multicellular organisms (Goldberg et al., 2007).
Metazoan development is regulated by programmed transcriptional hierarchies acting in synergy with epigenetic mechanisms (Fisher and Fisher, 2011; Jaenisch and Bird, 2003; Magnúsdóttir et al., 2012). The first clues about how epigenetic mechanisms regulate gene expression were discovered in Drosophila through genetic screens that juxtaposed the repressive Polycomb group (PcG) and the activating Trithorax group (TrxG) (Brock and Fisher, 2005; Ringrose and Paro, 2004; Simon and Tamkun, 2002). These genetic interactions were given biochemical relevance when it was discovered that PcG and TrxG proteins confer mutually exclusive lysine methylations on the histone H3 tail. TrxG action was first linked to histone H3 lysine 4 methylation (H3K4me) when the yeast homologue of the Drosophila TrxG protein, Ash2, was found to be a subunit of the first H3K4 methyltransferase complex, the Set1 complex (Set1C) (Roguev et al., 2001). The SET domain of Set1 methylates H3K4 and is virtually identical to the SET domain in Trithorax itself, indicating that Trithorax is also an H3K4 methyltransferase, as subsequently shown for the mammalian orthologue Mll1 (Milne et al., 2002). PcG action was linked to H3K27 methylation when the SET domain protein Enhancer of Zeste homologue 2 (EZH2) was shown to methylate H3K27 and to be a subunit of the widely conserved Polycomb repressor complex 2 (PRC2) (Cao et al., 2002; Kuzmichev et al., 2002; Müller et al., 2002).
Mammals have six Set1/Trithorax-type H3K4 methyltransferases (Glaser et al., 2006), which are all found in similar protein complexes based on the Set1C/Ash2 scaffold (Ruthenburg et al., 2007; Yokoyama et al., 2004). Hence, the core of the H3K4 methyltransferase complex is one of the most highly conserved components of the epigenetic machinery. This extreme evolutionary conservation reflects the fact that H3K4 trimethylation (H3K4me3) is the only universally conserved epigenetic modification.
The genetic opposition between PcG and TrxG in metazoan development relates to the differing consequences of H3K4 and H3K27 trimethylation for the nucleosomes surrounding promoters. H3K4me3 is bound by factors involved in transcriptional activity, such as ING proteins (Champagne and Kutateladze, 2009), TFIID/Taf3 (Vermeulen et al., 2007), JMJD2A (Huang et al., 2006), NURF/BPTF (Wysocka et al., 2006), CHD1 and U2snRNP components (Sims et al., 2007), PHF8 and Sgf29/SAGA (Vermeulen et al., 2010). This list also includes the Set1/Trithorax H3K4 methyltransferase complexes themselves (Eberl et al., 2013; Milne et al., 2010; Shi et al., 2007), suggesting that H3K4me3 sustains itself through a feed-forward maintenance cycle. Similarly, H3K27me3 recruits PRC2 itself (Hansen et al., 2008; Margueron et al., 2009) as well as PRC1, the companion PcG complex in gene repression (Bernstein et al., 2006b; Fischle et al., 2003).
As well as recruiting proteins involved in gene expression, TrxG action also prevents PcG-mediated silencing (Klymenko and Müller, 2004), potentially because H3K4me3 occludes the binding of repressive complexes, as found for the exclusion of NuRD (Zegerman et al., 2002) or of a DNA methylation complex (Ooi et al., 2007). Therefore, the finding that certain promoters, termed bivalent, feature both H3K4me3 and H3K27me3 (Azuara et al., 2006; Bernstein et al., 2006a) was unexpected and controversial (Akkers et al., 2009; Herz et al., 2009; Vastenhouw and Schier, 2012; Vastenhouw et al., 2010; Voigt et al., 2013). Recent progress in understanding bivalency includes the findings that PRC2 cannot methylate H3K27 in vitro when H3K4 is trimethylated on the same tail (Schmitges et al., 2011; Voigt et al., 2012), that bivalent promoters have the least H3K4 trimethylation of any H3K4me3 promoters (Li et al., 2010; Xu et al., 2010) and that H3K4me3 is restricted to the +1 nucleosome whereas H3K27me3 is at least partially excluded from that nucleosome and found on flanking nucleosomes (Marks et al., 2012).
Bivalent promoters were discovered in mouse embryonic stem cells (mESCs), where they were mostly found in genes involved in lineage specification. In mESCs these genes are not, or are very poorly, expressed despite having an H3K4me3 peak near the transcription start site (TSS). Hence, a role for bivalency in keeping crucial gene expression off, but poised until the appropriate developmental stage, was proposed (Azuara et al., 2006; Bernstein et al., 2006a; Ku et al., 2008; Vastenhouw and Schier, 2012).
Bivalency is a component of the epigenetic blueprint that contributes to the orchestration of gene expression hierarchies in development. Epigenetic remodelling during germ cell maturation and early embryonic development resets the epigenome back to the totipotent ground state of the zygote and pluripotency of the epiblast (Magnúsdóttir et al., 2012; Marks et al., 2012; Wray et al., 2010). Towards the end of spermatogenesis and oogenesis, transcription from the germ cell genome is shut down in preparation for meiosis. Before transcriptional activation of the embryonic genome shortly after the first cell division, the paternal epigenome must be reassembled after its substantial stripping during spermatogenesis. How the epigenome is reprogrammed during early development is still unclear. Previously, we reported that the Trithorax orthologue Mll2 is a key player in resetting the epigenome. It is required during oogenesis and early cleavage stages after fertilisation, when it is the major H3K4 trimethyltransferase (Andreu-Vieyra et al., 2010). Later in development, it is again required after gastrulation, when it is no longer the major H3K4 trimethyltransferase. Mll2-/- mouse embryos die at ∼E10.5, displaying retarded growth and development as well as widespread apoptosis (Glaser et al., 2006; Glaser et al., 2009). Here we show that Mll2 is required for trimethylation of bivalent promoters in mESCs.
RESULTS
Mll2 regulates H3K4me3 at bivalent promoters
It has been shown that Mll2 is dispensable for the self-renewal of mESCs. Overall, fewer than 100 genes were downregulated more than 2-fold when Mll2 was deleted, including just one gene, Magohb, that was entirely dependent on Mll2 for expression (Glaser et al., 2009; Ladopoulos et al., 2013). This modest impact on gene expression was obtained with two different loss-of-function alleles for Mll2 (Mll2-/- and Mll2FC/FC; supplementary material Fig. S1) in different genetic backgrounds (129Ola [E14Tg2a=E14] and C57Bl/6, respectively).
Consequently, we were surprised to discover that ∼3000 promoters showed more than 2.5-fold loss of H3K4me3 in the E14 Mll2-/- ESCs (Fig. 1A). To verify this observation, we employed tamoxifen-induced conditional mutagenesis in the C57Bl/6 ESCs to convert the Mll2 allele from F/F to FC/FC (supplementary material Fig. S1), which completely eliminates Mll2 protein by 48 hours (Glaser et al., 2009). After culturing for a further 48 hours, over 900 promoters showed greater than 2.5-fold loss of H3K4me3. Again, this is far more than the 60 genes that were downregulated by more than 2-fold at the same 96 hour time point (Glaser et al., 2009), and most (75%) of these 900 affected promoters were also affected in constitutively ablated Mll2-/- ESCs (Fig. 1A,B). This suggests that the promoters affected at 96 hours are predominantly the most sensitive direct targets of Mll2, whereas the additional changes found in Mll2-/- ESCs include less sensitive promoters and secondary, adaptive changes.
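The comparison above rests on two simple operations: calling promoters whose H3K4me3 signal falls more than 2.5-fold, and asking what fraction of one affected set also appears in another. A minimal Python sketch follows, assuming promoter-level normalised read counts held in dictionaries. The function names, the pseudocount and the toy numbers are hypothetical illustrations and are not the analysis pipeline used in this study.

```python
"""Hypothetical sketch: call >2.5-fold H3K4me3 losses and measure set overlap."""

FOLD_THRESHOLD = 2.5  # fold-change cut-off used in the text
PSEUDOCOUNT = 1.0     # assumption: avoids division by zero when a peak is lost entirely


def affected_promoters(wild_type, mutant):
    """Return IDs of promoters whose H3K4me3 drops more than FOLD_THRESHOLD-fold.

    Both arguments map promoter ID -> normalised H3K4me3 read count; a promoter
    missing from `mutant` is treated as having zero reads.
    """
    return {
        pid
        for pid, wt in wild_type.items()
        if (wt + PSEUDOCOUNT) / (mutant.get(pid, 0.0) + PSEUDOCOUNT) > FOLD_THRESHOLD
    }


def overlap_fraction(subset, reference):
    """Fraction of promoters in `subset` that are also present in `reference`."""
    return len(subset & reference) / len(subset) if subset else 0.0


# Toy counts (invented for illustration only).
wild_type = {"p1": 100, "p2": 80, "p3": 50, "p4": 200}
conditional_96h = {"p1": 20, "p2": 70, "p3": 10, "p4": 190}
constitutive_ko = {"p1": 5, "p2": 20, "p3": 40, "p4": 180}

early = affected_promoters(wild_type, conditional_96h)
late = affected_promoters(wild_type, constitutive_ko)
print(sorted(early), sorted(late), overlap_fraction(early, late))
```

In this toy case half of the promoters affected at the early time point are also affected in the constitutive knockout; for the real data the text reports 75%.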
By contrast, no H3K4me3 changes were found in E14 Mll1-/- ESCs (Fig. 1C,D), despite the fact that Mll1 protein is expressed in wild-type ESCs (Testa et al., 2003) and is highly related to Mll2, having arisen by gene duplication (FitzGerald and Diaz, 1999; Glaser et al., 2006; Huntsman et al., 1999). Consistent with the unchanged H3K4me3 peaks, loss of Mll1 had very little effect on the mRNA expression profile (Fig. 1D). As with Mll2, fewer than 100 genes were downregulated more than 2-fold when Mll1 was ablated in ESCs.
To understand the discrepancy between the large effect on promoter H3K4me3 peaks and the small effect on mRNA levels in the absence of Mll2, we examined the relationship between promoter H3K4me3 levels and mRNA expression levels (Fig. 1D). This revealed that almost all of the affected promoters expressed very little, or no, mRNA in wild-type ESCs. These promoters also showed the least H3K4 trimethylation. Because small H3K4me3 peaks and low to absent mRNA expression are characteristics of bivalent promoters, we looked at the other signature of bivalency, H3K27me3.
Fig. 2A displays all 15,610 H3K4me3 promoters that we identified in ESCs (>30 reads above background in the 2 kb surrounding TSSs) sorted according to the ratio of H3K4me3 change comparing wild-type E14 and Mll2-/- cells from most increased (Fig. 2A, lane 1, at the bottom) to most decreased (at the top). Mll2FC/FC results show a reduced, but similar, profile (Fig. 2A, lane 2). Remarkably, this criterion for sorting genes according to the impact on H3K4me3 of mutating Mll2 also approximately arranged: (1) promoters from the most to least H3K4me3 reads (Fig. 2A, lanes 3-7); (2) promoters from the least to most H3K27me3 reads (lanes 8-11); (3) genes from the most to least expressed (lanes 12, 13); (4) promoters from most to least RNA polymerase II (Pol II) occupancy (lane 14). Consequently, it also clustered the bivalent promoters at the top (Fig. 2A, lane 15). These data show that the bivalent promoters lose H3K4me3 upon loss of Mll2 and are the least H3K4 trimethylated (Fig. 2B). These points are also illustrated by typical examples and conventional ChIP of genes unaffected and affected by loss of Mll2 (Fig. 2C,D). Hence, the discrepancy between a small impact on expression levels and a much larger impact on H3K4me3 levels at thousands of promoters in Mll2-/- ESCs is explained by the realisation that Mll2 is the primary H3K4 trimethyltransferase for bivalent promoters.
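The sorting used for Fig. 2A can be sketched as ranking promoters by the log2 ratio of mutant to wild-type H3K4me3 and then reading any other per-promoter track (H3K27me3, expression, Pol II, a bivalency flag) in the same order. The Python below is a hypothetical illustration under that assumption; the pseudocount, the function names and the toy values are not taken from the study.

```python
"""Hypothetical sketch: order promoters by H3K4me3 change and read other tracks in that order."""
import math

PSEUDOCOUNT = 1.0  # assumption: keeps log2 finite when a count is zero


def log2_change(wild_type, mutant, pid):
    """log2(mutant / wild type) for one promoter; negative values mean loss."""
    return math.log2((mutant[pid] + PSEUDOCOUNT) / (wild_type[pid] + PSEUDOCOUNT))


def rank_by_change(wild_type, mutant):
    """Promoter IDs from most decreased to most increased H3K4me3 in the mutant."""
    return sorted(wild_type, key=lambda pid: log2_change(wild_type, mutant, pid))


def track_in_rank_order(ranking, track):
    """Values of another per-promoter track, listed in ranked order."""
    return [track[pid] for pid in ranking]


# Toy data (invented for illustration only).
wt = {"p1": 400, "p2": 40, "p3": 35, "p4": 300}
ko = {"p1": 390, "p2": 4, "p3": 10, "p4": 310}
bivalent = {"p1": False, "p2": True, "p3": True, "p4": False}

order = rank_by_change(wt, ko)
print(order, track_in_rank_order(order, bivalent))
```

With these toy values the two weakly marked promoters that lose H3K4me3 rank first and are the ones flagged as bivalent, mirroring the clustering described above.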
Characterisation of tagged BAC transgenes by rescue of Mll2 function
To identify Mll2 genomic binding sites in ESCs, we applied BAC transgenic protein tagging methods for ChIP-seq (Hofemeister et al., 2011). For H3K4 methyltransferases, the stop codon is a highly conserved and integral part of the post-SET domain. Previously, we found that C-terminal tagging of yeast Set1 inactivated the methyltransferase activity (Roguev et al., 2001), whereas N-terminal tagging of Mll2 unexpectedly generated a hypomorphic allele that caused sterility in homozygous females (Andreu-Vieyra et al., 2010). Hence, we explored a third strategy based on internal tagging.
Mll1 and Mll2, like their fly orthologue Drosophila Trithorax, are cleaved by the asparaginyl protease Taspase at a conserved site in the middle of the protein. However, the N- and C-terminal sections remain associated due to an intramolecular interaction between the FYR-N and FYR-C domains (Hsieh et al., 2003a; Hsieh et al., 2003b). To tag Mll2 internally, GFP was inserted next to the Taspase cleavage site, either on the N- or C-terminal side, to generate Mll2-FL N-GFP or Mll2-FL C-GFP (FL, full-length; supplementary material Fig. S1) by recombineering a BAC transgene (Hofemeister et al., 2011). The tagged BAC transgenes were introduced into the conditional Mll2F/F ESCs and tested for cleavage by Taspase and rescue of function after tamoxifen-induced elimination of endogenous Mll2 (supplementary material Fig. S2). In ESCs, endogenous Mll2 is almost completely cleaved (supplementary material Fig. S2A, lanes 1, 7). We observed physiological expression levels and Taspase cleavage of both full-length tagged Mll2s in the presence and absence of endogenous Mll2. Both the N-terminal-only (Mll2 N-GFP) and C-terminal-only (Mll2 C-GFP) tagged half-proteins were also expressed at physiological levels from the BAC transgenes. Loss or additional expression of Mll2 did not change the expression of Ash2l, a core subunit of the H3K4 methyltransferase complexes, or of the pluripotency factor Nanog (supplementary material Fig. S2A).
To evaluate the function of the tagged forms of Mll2, we examined their ability to rescue three defects in the Mll2-/- ESC phenotype. Mll2-/- ESCs show increased doubling time, loss of Magohb expression and delayed differentiation upon removal of LIF (Glaser et al., 2009; Lubitz et al., 2007). Both of the tagged full-length BAC transgenes (Mll2-FL N-GFP and Mll2-FL C-GFP) rescued all three phenotypic faults, whereas the N-terminal-only (Mll2 N-GFPonly) and C-terminal-only (Mll2 C-GFPonly) constructs did not (supplementary material Fig. S2B-D), except for partial rescue by the N-terminal-only construct in the differentiation assay (supplementary material Fig. S2D).
As well as demonstrating that the tagged BAC transgenes rescued function, we also demonstrated by generic native affinity purification using the tag followed by mass spectrometry (AP-MS) (Hofemeister et al., 2011; Hubner et al., 2010) that the tagged proteins retrieved all of the expected interaction partners (data not shown).
Characterisation of Mll2 binding sites in the ESC genome
Tagged Mll2 was used for ChIP-seq. Almost all of the ∼11,000 peaks identified mapped to TSSs (Fig. 3A), which were H3K4 trimethylated as expected (Fig. 3B). Fifteen percent also showed H3K27me3. Mll2 peaks showed a preference for CpG islands (84%, versus 70% expected if randomly distributed) (Mikkelsen et al., 2007) (Fig. 3B). We also tagged the common H3K4 methyltransferase subunit Ash2l and the Set1C-specific subunit Cxxc1 (supplementary material Fig. S3). In contrast to a recent report on Ash2l in ESCs (Wan et al., 2013), we observed Ash2l enrichment at ∼10,000 TSS promoters. These strongly correlated with Mll2 sites (88%) and CpG islands (84%), indicating that Ash2l was also localised at most Mll2-occupied TSSs. Interestingly, this correlation was less evident for the bivalent promoters (907/1652=55%). Fewer TSS binding sites were observed for Cxxc1 than for Mll2 (8271 versus 10936); however, most Cxxc1 sites (90%) were co-occupied by Mll2. Conversely, only 68% of Mll2 sites were co-occupied by Cxxc1, and only 8% of Cxxc1-bound TSSs also showed H3K27me3 (Fig. 3B). Whereas the tagged C-terminal half of Mll2 did not yield peaks or interpretable data in a ChIP-seq experiment (not shown), the N-terminal half frequently colocalised to the same sites as full-length Mll2, indicating that Mll2 predominantly binds via its N-terminus (Fig. 3B).
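The co-occupancy figures above are mutually consistent, and the link between them is simple arithmetic: if 90% of the 8271 Cxxc1-bound TSSs are shared with Mll2, the shared set contains about 7444 TSSs, which is about 68% of the 10936 Mll2-bound TSSs. A short Python check of this calculation follows; the function name is illustrative, and because the input percentage is rounded, the result is approximate.

```python
"""Worked check: infer the reverse overlap between two peak sets from reported counts."""


def reverse_overlap(n_a, n_b, frac_a_in_b):
    """Given |A|, |B| and the fraction of A also in B, return the fraction of B also in A.

    The shared sites are counted once: shared = frac_a_in_b * |A|.
    """
    shared = frac_a_in_b * n_a
    return shared / n_b


cxxc1_tss = 8271   # Cxxc1-bound TSSs (from the text)
mll2_tss = 10936   # Mll2-bound TSSs (from the text)
frac_mll2_with_cxxc1 = reverse_overlap(cxxc1_tss, mll2_tss, 0.90)
print(f"{frac_mll2_with_cxxc1:.2f}")

# Ash2l at bivalent promoters, as reported: 907 of 1652.
print(f"{907 / 1652:.2f}")
```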
To evaluate these data in more detail, we included peak quantifications in an intensity plot. Mll2 was bound to the promoters that lost H3K4me3 when Mll2 was mutated, reinforcing the conclusion that Mll2 is the main H3K4 trimethyltransferase for bivalent promoters. However, Mll2 was also bound to most H3K4me3 promoters (Fig. 3C,D). To validate the Mll2 genomic binding sites, we used a different ChIP method based on a double-crosslinking protocol (Fig. 3C) (Nowak et al., 2005; van den Berg et al., 2008). This protocol yielded a binding profile very similar to that found for conventionally crosslinked full-length and N-terminal Mll2 (Fig. 3C,D), albeit with a greater difference between the specific signals and the mock ChIP (Fig. 3C).
Ash2l was also bound to most H3K4me3-marked promoters; however, it was less frequently bound to bivalent promoters and this trend was even more pronounced for Cxxc1 (Fig. 3D), suggesting that Set1C was primarily bound to active but not bivalent promoters. These conclusions are also illustrated by representative examples (Fig. 3E). Sntb1 and Foxp2 are bivalent genes that rely on and are bound by Mll2 and not Cxxc1. Pex12 is a non-bivalent gene that does not rely on Mll2 and is bound by both Mll2 and Cxxc1. 1700016H13Rik is an example of a rare, non-bivalent gene that relies on Mll2 in ESCs; notably, it is bound by Mll2 but not by Cxxc1.
ATRA differentiation in Mll2-/- ESCs
Self-renewal of mESCs does not require Mll2 (Lubitz et al., 2007) or, as now revealed, H3K4me3 enrichment on bivalent promoters. Because bivalency is important for lineage commitment (Fisher and Fisher, 2011), we treated E14 and Mll2-/- ESCs with all-trans retinoic acid (ATRA) for 4 days to examine early events in differentiation.
About half of the 2880 TSSs (1536) that had reduced H3K4me3 in undifferentiated Mll2-/- ESCs compared with wild type (Fig. 1A) were still reduced in Mll2-/- cells after 4 days of ATRA treatment (Fig. 4A). However, the other half (1344) recovered H3K4me3 levels to within 2.5-fold of wild-type levels, despite the lack of Mll2, indicating that another H3K4 methyltransferase was acting on these bivalent promoters to restore H3K4me3 levels.
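The division of the 2880 Mll2-dependent TSSs into persistent and recovered groups can be expressed as a per-TSS test against the 2.5-fold criterion, applied to ATRA-treated wild-type and mutant signals. The Python sketch below assumes normalised promoter counts in dictionaries and a pseudocount; these, together with the names and toy values, are illustrative assumptions rather than the study's pipeline.

```python
"""Hypothetical sketch: split Mll2-dependent TSSs by whether H3K4me3 recovers after ATRA."""

FOLD_THRESHOLD = 2.5  # "within 2.5-fold of wild type" criterion from the text
PSEUDOCOUNT = 1.0     # assumption


def fold_below_wild_type(wt, mut):
    """How many fold the mutant signal lies below wild type (>1 means lower)."""
    return (wt + PSEUDOCOUNT) / (mut + PSEUDOCOUNT)


def split_after_atra(reduced_ids, wt_atra, mut_atra):
    """Return (still_reduced, recovered) for TSSs that were reduced before ATRA."""
    still_reduced, recovered = set(), set()
    for pid in reduced_ids:
        if fold_below_wild_type(wt_atra[pid], mut_atra[pid]) > FOLD_THRESHOLD:
            still_reduced.add(pid)
        else:
            recovered.add(pid)
    return still_reduced, recovered


# Toy ATRA-treated counts (invented for illustration only).
reduced = {"t1", "t2", "t3"}
wt_atra = {"t1": 120, "t2": 60, "t3": 90}
mut_atra = {"t1": 10, "t2": 50, "t3": 40}
print(split_after_atra(reduced, wt_atra, mut_atra))

# The reported split is exhaustive: 1536 still reduced + 1344 recovered = 2880.
print(1536 + 1344 == 2880)
```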
In wild-type E14 cells, we identified 136 bivalent TSSs at which H3K4me3 levels increased more than 2.5-fold after 4 days of ATRA treatment compared with undifferentiated ESCs (Fig. 4B). Of these, 120 were also elevated in Mll2-/- ESCs, suggesting that only a few bivalent promoters (16) were impaired for ATRA responsiveness by the absence of Mll2. In addition, a further 442 TSSs that are bivalent in wild-type E14 ESCs showed increased H3K4me3 in Mll2-/- ESCs (Fig. 4B,C), reflecting a ‘catch-up’ during ATRA differentiation to near wild-type H3K4me3 levels. Of the 16 bivalent promoters that were impaired, we selected six for closer examination (Fig. 4D). In several cases, H3K4me3 levels were induced by ATRA but remained more than 2.5-fold below the H3K4me3 levels at the equivalent wild-type promoter, whereas in other cases an ATRA response was absent. Because Mll2-/- ESCs showed delayed differentiation (Lubitz et al., 2007), we reasoned that the reduced ATRA response at these promoters could be due to a general delay. We therefore extended ATRA induction to 7 days and found a very similar profile, including a similar minority of retarded promoters (supplementary material Fig. S5), indicating that the problem was not simply one of delayed timing. In conclusion, the absence of Mll2 or H3K4me3 peaks on bivalent TSSs did not prevent ATRA stimulation in the majority of cases. In a minority of cases, the ATRA response was apparently diminished beyond a point of recovery.
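The categories used in this paragraph can be summarised as a per-TSS decision based on whether H3K4me3 rises more than 2.5-fold after ATRA in each genotype. The Python sketch below is a simplification under stated assumptions: it uses a pseudocount, ignores the additional comparison with wild-type levels made for the impaired promoters, and the labels, function names and toy values are hypothetical.

```python
"""Hypothetical sketch: classify bivalent TSSs by ATRA responsiveness in wild-type and Mll2-/- cells."""

FOLD_THRESHOLD = 2.5
PSEUDOCOUNT = 1.0  # assumption


def induced(before, after):
    """True if H3K4me3 rises more than FOLD_THRESHOLD-fold after ATRA."""
    return (after + PSEUDOCOUNT) / (before + PSEUDOCOUNT) > FOLD_THRESHOLD


def classify_response(wt_es, wt_atra, mut_es, mut_atra):
    """Label one TSS from its undifferentiated (es) and ATRA-treated counts."""
    wt_up = induced(wt_es, wt_atra)
    mut_up = induced(mut_es, mut_atra)
    if wt_up and mut_up:
        return "responsive"  # induced in both genotypes
    if wt_up:
        return "impaired"    # induced in wild type only
    if mut_up:
        return "catch-up"    # rises in the mutant towards wild-type levels
    return "unchanged"


print(classify_response(10, 100, 2, 50))
print(classify_response(10, 100, 5, 8))

# Reported counts: of 136 wild-type-induced TSSs, 120 were also induced in Mll2-/- cells.
print(136 - 120)
```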
ATRA induction in the absence of Mll1
As observed above (Fig. 4A), one or more other H3K4 methyltransferases must trimethylate the responsive bivalent TSSs during ATRA differentiation of Mll2-/- ESCs. To explore this issue, we knocked out both alleles of the Mll2 sister gene Mll1 by consecutive rounds of targeting in E14 ESCs (supplementary material Fig. S3). We also generated double-knockout ESCs lacking both Mll1 and Mll2 by crossing mouse lines carrying conditional alleles for both Mll1 and Mll2 with Rosa26CreERT2 (Seibler et al., 2003), followed by de novo establishment of a male Mll1F/F; Mll2F/F; R26CreERT2/+ ESC line. This line was treated with tamoxifen to knock out Mll1 and Mll2 simultaneously, generating Mll1FC/FC; Mll2FC/FC cells, which were harvested 144 hours after tamoxifen administration.
Concordant with the absence of changes in undifferentiated Mll1-/- ESCs (Fig. 1C,D), very few H3K4me3 TSS peaks (84) were changed more than 2.5-fold in ATRA-induced Mll1-/- ESCs compared with ATRA-induced wild type (Fig. 5A,B), indicating that, as in undifferentiated ESCs, Mll1 alone contributes little to promoter H3K4me3 during ATRA differentiation. Notably, most of the affected TSSs in ATRA-induced Mll1-/- or Mll2-/- ESCs were also affected in ATRA-induced double-knockout Mll1FC/FC; Mll2FC/FC ESCs (Fig. 5C; 75/84=89% and 2060/2342=88%, respectively), indicating high reproducibility of these data despite the differences in experimental origin and ESC genetic background. In addition to these >2000 TSSs, a further 1717 H3K4me3-reduced TSSs were observed in the doubly mutated cells, indicating that either Mll1 or Mll2 can catalyse H3K4 trimethylation on these promoters (Fig. 5C).
As noted above (Fig. 4A), nearly half (1344) of the H3K4me3-reduced TSSs in Mll2-/- undifferentiated ESCs were restored to near wild-type H3K4me3 levels after ATRA differentiation. Of these, more than a third (522/1344=39%; Fig. 5D) were not restored in the double-mutant cells. Hence, Mll1 is the trimethyltransferase that acted on these TSSs in the absence of Mll2. The remaining ∼60% (822) must have been restored by other methyltransferases. As expected, most of the 522 TSSs (401/522=77%) were bivalent in undifferentiated E14 ESCs. By contrast, only one-third of the 1259 additional TSSs that were decreased in the double-mutant ESCs were bivalent (429/1259=34%). However, a GO term analysis of these 1259 TSSs revealed properties similar to those of bivalent promoters (mainly differentiation and lineage commitment terms, although the regulation of apoptosis might be a notable exception; data not shown).
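The attribution in this paragraph is set logic: among TSSs that recovered H3K4me3 after ATRA in Mll2-/- cells, those that are reduced again in the Mll1/Mll2 double mutant are assigned to Mll1, and the remainder to other enzymes. A minimal Python sketch follows, with hypothetical names and toy sets, together with a check of the reported arithmetic.

```python
"""Hypothetical sketch: attribute recovered TSSs to Mll1 or to other methyltransferases."""


def partition_recovered(recovered_in_mll2_ko, reduced_in_double_ko):
    """Split TSSs that recovered after ATRA in Mll2-/- cells.

    TSSs that are reduced again once Mll1 is also removed are attributed to Mll1;
    the rest must have been trimethylated by some other enzyme.
    """
    mll1_dependent = recovered_in_mll2_ko & reduced_in_double_ko
    other_enzymes = recovered_in_mll2_ko - reduced_in_double_ko
    return mll1_dependent, other_enzymes


# Toy sets (invented for illustration only).
recovered = {"a", "b", "c", "d"}
reduced_dko = {"b", "d", "x"}
print(partition_recovered(recovered, reduced_dko))

# Reported counts: 522 of the 1344 recovered TSSs were not restored in the double mutant.
n_recovered, n_mll1 = 1344, 522
print(round(100 * n_mll1 / n_recovered), n_recovered - n_mll1)
```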
The overlapping relationships between Mll1, Mll2 and other H3K4 methyltransferases are well illustrated by selected ChIP-seq profiles (Fig. 6). The selection includes active genes in ESCs (Pou5f1, Nanog); genes induced during ATRA differentiation (Gata6, Rec8); bivalent genes that rely on Mll2 (Dkk2, Mef2c); a bivalent gene that relies on either Mll1 or Mll2 (Pcdh18) and a non-bivalent gene that relies on Mll2 (Col11a1). Notably, this unusual non-bivalent case, like 1700016H13Rik (Fig. 3E), also has bound Mll2 and a very small signal for Cxxc1.
H3K4me3 profiles on the Hox complexes exemplify the functional overlaps between the H3K4 methyltransferases (Fig. 6C) and include one of the few direct targets that we could identify for Mll1. Before ATRA induction, few H3K4me3 peaks are visible on the Hox complexes, as expected in undifferentiated ESCs. After ATRA induction, the acquisition of some H3K4me3 peaks required Mll2 (e.g. Hoxd1), some peaks required Mll1 (e.g. Hoxb5) and some required both Mll1 and Mll2 (Hoxa3, b2, c4). However, many H3K4me3 peaks did not require either Mll2 or Mll1, again indicating the contribution of another methyltransferase.
DISCUSSION
Here, we report an unexpected specialisation among the six mammalian Set1/Trithorax-type methyltransferases. Mll2 is mainly responsible for H3K4 trimethylation at ESC bivalent promoters. This was seen both shortly after removal of the protein, when ∼1000 bivalent promoters had reduced H3K4me3, and many passages after the loss of Mll2, when almost all ∼3000 bivalent promoters showed little H3K4me3. Apart from bivalent promoters, few changes in H3K4me3 were found. Furthermore, removal of its sister protein, Mll1, had little effect on any H3K4me3 TSS peaks, unless Mll2 itself was also removed. Concordantly, removal of either Mll1 or Mll2 or both together had no observable impact on bulk H3K4 methylation levels (supplementary material Fig. S4).
In agreement with the loss of H3K4me3 on bivalent promoters, we found that Mll2 was bound to bivalent promoters in wild-type ESCs. However, it was also bound to most active promoters, where its loss had no apparent effect on H3K4me3 levels or mRNA production. By contrast, the Set1C-specific subunit, Cxxc1, was bound to virtually every active promoter but much less frequently to bivalent promoters. This matches recent H3K4me3 ChIP-seq data from Cxxc1-/- ESCs, which revealed that H3K4me3 levels are greatly diminished near active TSSs but not at bivalent sites (Clouaire et al., 2012). Similar connections between Cxxc1 and active promoters have been made using RNAi knockdown in fibroblasts or with a Cxxc1 (Cfp1) mutant fly line (Ardehali et al., 2011; Thomson et al., 2010). Hence, it appears likely that the Set1 complexes account for most H3K4 trimethylation at active TSSs in many cell types, whereas the Trithorax orthologues (Mll1, Mll2) play more specialised roles.
Of the six mammalian H3K4 methyltransferases, only the localisation of Mll1 and Mll4 has been investigated genome-wide. Two studies in somatic cells found that Mll1 is bound to nearly all active promoters (Guenther et al., 2005; Scacheri et al., 2006), which matches our results for Mll2 and Cxxc1. However, few TSSs showed reduced H3K4me3 in Mll1-/- mESCs (Figs 1, 6) or mouse embryonic fibroblasts (Wang et al., 2009). Together with data from Mll4 (Guo et al., 2012), the emerging picture indicates that most active promoters are bound by more than one H3K4 methyltransferase. However, binding of Mll1 or Mll2 does not by itself indicate function. Despite localisation to most TSSs, Mll2 contributes to the expression of only a few genes. In mESCs, expression of Magohb, but not its sister gene Magoha, requires Mll2. Upon Mll2 removal, the Magohb promoter, which is a CpG island, loses H3K4me3 and acquires H3K27me3 and increased CpG methylation (Glaser et al., 2009; Ladopoulos et al., 2013). In macrophages, Mll2 is required for the expression of only a few mRNAs, including Pigp, an essential component in glycosylphosphatidylinositol (GPI) anchor synthesis (Austenaa et al., 2012). Here, we identify two additional non-bivalent genes that are reliant on Mll2 in ESCs: Col11a1 and 1700016H13Rik. Notably, these genes have little or no Cxxc1 bound at their promoters, suggesting that they are dependent on Mll2 because Set1C is not available.
Because deletion of Mll2 ablated H3K4me3 at bivalent promoters but left active promoters unaffected, we were presented with a unique opportunity to test the proposition that bivalent promoters are primed for responsiveness. Unexpectedly, we found only limited evidence to support this proposition because the ATRA response was largely unaffected, even at H3K4me3-denuded bivalent promoters. However, a minority of bivalent promoters appeared to be impaired for the ATRA response. Potentially, the impaired response of even a few genes involved in a complex lineage commitment programme could derail development and thereby explain the developmental and growth retardations observed in Mll2-/- embryos, which die at ∼E10.5 (Glaser et al., 2006). Because ATRA induction of ESCs is neither physiological nor a recapitulation of embryonic events, further work is required to determine whether our in vitro observations can be extrapolated to the embryo. However, we previously reported that Mll2-/- ESCs are impaired for neural differentiation (Lubitz et al., 2007) and further work has revealed that the expression of only a few genes is significantly perturbed (K. Neumann, K.A. and A.F.S., unpublished).
Because many bivalent promoters showed normal ATRA responsiveness in the absence of H3K4me3, we suggest that bivalency serves a purpose other than priming promoters for lineage-specific activation. The idea that CpG islands can attract epigenetic regulators without assistance from sequence-specific DNA-binding proteins (i.e. transcription factors) has recently been proposed (Blackledge and Klose, 2011; Blackledge et al., 2010; Glaser et al., 2009; Ku et al., 2008; Mendenhall et al., 2010; Thomson et al., 2010). This dogma-breaking concept has been most concisely embodied in the suggestion that Cxxc1, via its CxxC domain, which binds to unmethylated CpG dinucleotides (Lee et al., 2001), positions Set1C on CpG islands (Clouaire et al., 2012; Thomson et al., 2010). Our evidence suggests a modified version of this idea. Rather than Cxxc1/Set1C, which we find to be concentrated on active but not bivalent promoters, Mll2 with its CxxC domain appears to be a pioneer in the epigenetic definition of CpG islands as TSSs. With respect to bivalent promoters, we suggest that H3K27me3 and PcG repression serve to prevent the maturation of CpG islands into Set1C-bound active promoters (Fig. 7).
The proposition that Mll2 plays an early role in defining the epigenetic status of ESCs is supported by its requirement during oogenesis and the early cleavage stages after fertilisation. During oogenesis, Mll2 is the major H3K4 tri-, but not mono-, methyltransferase and expression from the paternal allele lessens the deleterious impact of a mutant maternal allele before blastocyst formation (Andreu-Vieyra et al., 2010). As a CxxC domain protein and the major H3K4 trimethyltransferase during late oogenesis and early development, Mll2 is the prime candidate to define CpG islands as potential promoters in the naïve epigenome. At some point in early development, presumably before the blastocyst, Mll2 stops being the major H3K4 trimethyltransferase and is not again required until after gastrulation, despite continuous expression (Glaser et al., 2006).
Although Mll1 is not required in development until definitive haematopoiesis (Ernst et al., 2004; Yagi et al., 1998), our analyses indicate that it contributes to the H3K4 methylation system in mESCs. By examining mESCs doubly mutated for Mll1 and Mll2, a further depth to the complexities of H3K4 methylation and interplay between the six Set1/Trithorax-type methyltransferases was revealed. What is the purpose of this complexity? Potentially, it establishes a multiply redundant system. If so, the redundancy appears to be imperfect and the causes of the imperfections are not obvious. Alternatively, a multiply backed-up system could be mutually reinforcing and achieve epigenetic momentum to ensure that, once a promoter becomes activated, it will remain active without reliance on the continuous reiteration of regulatory signalling. We previously proposed this mechanism for housekeeping gene promoters as well as for the maintenance of activation after lineage-specific decisions (Glaser et al., 2009). In this scenario, housekeeping gene promoters, which are invariably CpG islands, remain active by epigenetic maintenance, as do certain lineage-specific promoters that become recruited to the same status.
MATERIALS AND METHODS
General methods
Recombineering, gene targeting, BAC transgenesis, chromatin immunoprecipitation (ChIP) and generic affinity purification using the GFP tag were performed as previously described (Fu et al., 2010; Hofemeister et al., 2011).
ESC methods
The Mll2-/- (Lubitz et al., 2007) and Mll1-/- ESCs were derived from E14TG2a by consecutive rounds of gene targeting, using the same targeting construct for the second allele after its neomycin resistance gene had been exchanged for a hygromycin resistance gene by recombineering. The Mll2F/F;CreERT2/+, Mll1F/F;CreERT2/+ and Mll1F/F;Mll2F/F;CreERT2/+ ESCs were established de novo from blastocysts after germline transmission of targeted R1 ESCs and backcrossing to C57BL/6 at least five times, including a cross to a C57BL/6 Rosa26CreERT2 mouse line (Seibler et al., 2003). All ESCs were cultured in fetal calf serum (FCS)-based medium [DMEM + GlutaMAX (Invitrogen), 15% FCS (PAA), 2 mM L-glutamine (Invitrogen), 1× non-essential amino acids (Invitrogen), 1 mM sodium pyruvate (Invitrogen), 0.1 mM β-mercaptoethanol] in the presence of LIF. For all-trans retinoic acid (ATRA) inductions, LIF was replaced with 0.1 μM ATRA (Sigma).
RNA-Seq and ChIP-Seq
Total RNA was isolated from 10⁷ cells using Trizol reagent, poly(A) RNA was purified by two rounds of selection on oligo(dT) beads, and cDNA was prepared as described previously (Marks et al., 2012). Sequencing libraries for cDNA and ChIP samples were prepared using Illumina kits and sequenced on Illumina GA2 and HiSeq instruments. Single-end 36 nt reads were aligned to the mm9 (Mus musculus, 2007) genome assembly and analysed with SAMtools (samtools.sourceforge.net) and BAMtools (sourceforge.net/projects/bamtools). Sequencing data have been deposited in the NCBI GEO SuperSeries under accession number GSE52071.
Peaks in the ChIP-seq data were called using the FindPeaks tool from the Vancouver Short Read Analysis Package (vancouvershortr.sourceforge.net). Levels of H3K4me3 and H3K27me3 at promoters were calculated as the number of reads in a 3 kb promoter region centred on the TSS. Expression and histone modification levels were normalised to total read counts. Promoters were classified as bivalent when covered by >30 H3K4me3 reads and >45 H3K27me3 reads. Promoters with low H3K4me3 levels (<30 reads per promoter) were excluded from the analysis of genes affected in Mll2-/- ESCs. Gene expression was calculated as the number of reads per 1 kb of mRNA. Gene ontology (GO) and pathway analyses were performed with DAVID (david.abcc.ncifcrf.gov).
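The promoter-level quantities defined above can be illustrated with a minimal Python sketch. It is not the pipeline used for this study: it assumes that each read has been reduced to a single genomic coordinate on the same chromosome as the TSS, and that normalising to total read counts means rescaling to a common reference depth. All function names and the `reference_total_reads` parameter are hypothetical. The thresholds are those stated in the text; because the text does not say whether they apply to raw or depth-normalised counts, the sketch simply compares whatever counts it is given.

```python
"""Promoter-level quantification sketch (hypothetical helpers, not the authors' code)."""

PROMOTER_WINDOW_BP = 3000  # 3 kb promoter region centred on the TSS
K4ME3_MIN_READS = 30       # bivalency requires > 30 H3K4me3 reads
K27ME3_MIN_READS = 45      # bivalency requires > 45 H3K27me3 reads


def reads_in_promoter(read_positions, tss, window_bp=PROMOTER_WINDOW_BP):
    """Count reads in the half-open window [tss - w/2, tss + w/2).

    Assumes each read is a single coordinate on the TSS's chromosome.
    """
    half = window_bp // 2
    return sum(1 for pos in read_positions if tss - half <= pos < tss + half)


def scale_to_depth(count, sample_total_reads, reference_total_reads):
    """Rescale a raw count as if the sample had the reference sequencing depth."""
    return count * reference_total_reads / sample_total_reads


def is_bivalent(k4me3_reads, k27me3_reads):
    """Bivalent: strictly more than 30 H3K4me3 and more than 45 H3K27me3 reads."""
    return k4me3_reads > K4ME3_MIN_READS and k27me3_reads > K27ME3_MIN_READS


def passes_k4me3_filter(k4me3_reads):
    """Promoters with < 30 H3K4me3 reads are dropped from the Mll2-/- comparison."""
    return k4me3_reads >= K4ME3_MIN_READS


def reads_per_kb_mrna(read_count, mrna_length_bp):
    """Expression expressed as reads per 1 kb of mRNA."""
    return read_count * 1000 / mrna_length_bp


if __name__ == "__main__":
    # Toy data: one read every 50 bp across a 4 kb stretch, TSS at position 2000.
    k4 = reads_in_promoter(range(0, 4000, 50), tss=2000)
    print(k4, is_bivalent(k4, 50), passes_k4me3_filter(k4))
```

Keeping the thresholds as named constants makes it easy to check how sensitive the set of bivalent promoters is to the cut-offs.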
To compensate for differences in sequencing depth and mapping efficiency, the total number of unique reads was equalised across all samples, allowing quantitative comparisons such as those in the heatmaps (Fig. 3C). The number of tags per 80 bp bin was calculated, aligned on the maximum signal in the H3K4me3 profile of E14 ESCs, and plotted essentially as described (Marks et al., 2012). To quantify the heatmaps as line plots (Fig. 3D), we applied a moving average (50 sites per bin) to the profiles shown in the heatmap (Fig. 3C). For the GFP-tagged subunits, the negative control (GFP antibody ChIP in untagged E14 ESCs) was subtracted. Data were normalised to compensate for differences in ChIP efficiency between profiles, and the maximum value was set to 100%.
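The binning, smoothing and scaling steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. The following are all assumptions: depth equalisation is modelled as random subsampling of unique reads (the text does not say whether reads were subsampled or counts rescaled), each site uses a ±2 kb flank around its alignment point, control-subtracted values are clipped at zero, and each heatmap row is summarised by its total signal before the 50-site moving average. The alignment point (the H3K4me3 maximum in E14 ESCs) is passed in as `centre`, and every function name is hypothetical.

```python
"""Heatmap and line-plot processing sketch (hypothetical helpers, not the authors' code)."""
import random

BIN_BP = 80            # tags are tallied in 80 bp bins
SITES_PER_WINDOW = 50  # moving average over 50 sites for the line plots


def downsample(reads, target, seed=0):
    """Equalise depth by randomly keeping `target` unique reads (assumed method)."""
    unique = sorted(set(reads))
    if target > len(unique):
        raise ValueError("target exceeds the number of unique reads")
    return random.Random(seed).sample(unique, target)


def bin_tags(tag_positions, centre, flank_bp=2000, bin_bp=BIN_BP):
    """Count tags in fixed-width bins spanning [centre - flank, centre + flank)."""
    n_bins = (2 * flank_bp) // bin_bp
    start = centre - flank_bp
    counts = [0] * n_bins
    for pos in tag_positions:
        idx = (pos - start) // bin_bp
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def subtract_control(profile, control):
    """Subtract the GFP ChIP control (untagged E14) bin by bin, clipping at zero (assumed)."""
    return [max(p - c, 0) for p, c in zip(profile, control)]


def moving_average(values, window=SITES_PER_WINDOW):
    """Mean of every run of `window` consecutive sites, kept in heatmap order."""
    if window > len(values):
        return []
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one site
        out.append(running / window)
    return out


def scale_max_to_100(values):
    """Rescale a profile so that its largest value becomes 100 (%)."""
    peak = max(values)
    return [100.0 * v / peak for v in values] if peak > 0 else [0.0 for _ in values]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy heatmap: 200 sites, 100 random tags each, all aligned on position 2000.
    rows = [bin_tags([rng.randint(0, 4000) for _ in range(100)], centre=2000)
            for _ in range(200)]
    per_site = [sum(row) for row in rows]  # one value per heatmap row
    line = scale_max_to_100(moving_average(per_site))
    print(len(line), max(line))
```

The running sum keeps the moving average linear in the number of sites, which matters when a heatmap contains thousands of promoters.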
Antibodies
Antibodies used for western blot analysis were: goat anti-GFP (obtained from MPI-CBG, Dresden); anti-Ash2l (A300-112A-1, Bethyl Laboratories); anti-Nanog (sc1000, Calbiochem); anti-β-actin (A5441, Sigma); anti-Cfp1/CXXC1 (H-120, Santa Cruz); anti-Mll1 (A300-086A, Bethyl Laboratories); anti-H3K4me3 (ab8580, Abcam); anti-H3K4me2 (ab32356, Abcam); anti-H3K4me1 (ab8895, Abcam); anti-H3pan (07-690, Millipore); and anti-Mll2 [as described previously (Glaser et al., 2006)]. For western blots, primary antibodies were detected with HRP-coupled secondary antibodies (Pierce or ThermoFisher).
Note on nomenclature
All publications before 2006 that refer to Mll2 concern the gene called Mll2 in this article, which in the mouse is also called Wbp7. This gene is on human chromosome 19 and mouse chromosome 7. Since 2006, the name Mll2 has also been used in publications for the gene first named ALR and sometimes called Mll4. This gene is on human chromosome 12 and mouse chromosome 15. An attempt to establish a new nomenclature (Allis et al., 2007) has not resolved the problem, because each of the two genes has been called Kmt2b in some publications and Kmt2d in others.
Acknowledgements
We thank Michelle Meredyth, Davi Torres Coe, Ashish Gupta and Katrin Neumann for discussions.
Author contributions
S.D. and H.M. conceived and performed the experiments and analysed the data; H.H. generated the cell lines and conceived and performed the experiments; A.K., G.C. and K.A. generated the cell lines; S.S. analysed the data; H.G.S. conceived the experiments and analysed the data; A.F.S. conceived the experiments, analysed the data and wrote the paper.
Funding
This work was supported by funding from the EU 6th Framework Program HEROIC to H.G.S. and A.F.S.; the EU 7th Framework Program SyBoSS to A.F.S.; the Deutsche Forschungsgemeinschaft (DFG) SPP1356 Program Pluripotency and Cellular Reprogramming to K.A. and A.F.S.; and a DFG grant [KR2154/3-1] to A.K. and A.F.S. Deposited in PMC for immediate release.
References
Competing interests
The authors declare no competing financial interests.