We explored functional roles of two H3K9-specific histone methyltransferases of Drosophila melanogaster, SetDB1 (also known as Eggless) and Su(var)3-9. Using the DamID approach, we generated the binding profile for SetDB1 in Drosophila salivary gland chromosomes, and matched it to the profile of Su(var)3-9. Unlike Su(var)3-9, SetDB1 turned out to be an euchromatic protein that is absent from repeated DNA compartments, and is largely restricted to transcription start sites (TSSs) and 5′ untranslated regions (5′UTRs) of ubiquitously expressed genes. Significant SetDB1 association is also observed at binding sites for the insulator protein CP190. SetDB1 and H3K9 di- and tri-methylated (me2 and me3)-enriched sites tend to display poor overlap. At the same time, SetDB1 has a clear connection with the distribution of H3K27me3 mark. SetDB1 binds outside the domains possessing this modification, and about half of the borders of H3K27me3 domains are decorated by SetDB1 together with actively transcribed genes. On the basis of poor correlation between the distribution of SetDB1 and H3K9 methylation marks, we speculate that, in somatic cells, SetDB1 may contribute to the methylation of a broader set of chromosomal proteins than just H3K9. In addition, SetDB1 can be expected to play a role in the establishment of chromatin functional domains.
Post-translational modifications of histones are known to play essential role in eukaryotic genome organization. Multiple enzymes add or remove methyl, acetyl, phosphate and other groups at various amino acid residues in histones, thereby producing specific modification patterns that lie in the heart of local or chromosome-wide phenomena, such as heterochromatin formation, transcription activation, dosage compensation, etc. (Kouzarides, 2007; Zhang et al., 2015a).
Interplay between the amino acid residue and modifying enzyme can be both simple and quite complex. For instance, in fruit flies, E(z) is the sole enzyme responsible for H3K27 methylation (Ebert et al., 2004). In mice two enzymes – Ezh1 and Ezh2 – have been demonstrated to play this role (Shen et al., 2008). In budding yeast S. cerevisiae, H3K4 is exclusively methylated by Set1 (Briggs et al., 2001), whereas in fruit flies and mammals, up to three and six enzymes fulfill this function, respectively (Eissenberg and Shilatifard, 2010). Similarly, the number of proteins targeting H3K9 residues differs among the species. In fission yeast S. pombe, Clr4 protein is the only known H3K9-specific histone methyltransferase (HMTase) (Nakayama et al., 2001). In Drosophila, Su(var)3-9, SetDB1 [also known as Eggless (Egg)] and G9a have been described to have H3K9-centered activity (Schotta et al., 2002; Stabell et al., 2006a,b), whereas in mammals at least eight enzymes – SUV39H1, SUV39H2, SetDB1/ESET, G9a, GLP, CLLD8/SetDB2, Prdm3 and Prdm16 – are known (Falandry et al., 2010; O'Carroll et al., 2000; Pinheiro et al., 2012; Rea et al., 2000; Schultz et al., 2002; Tachibana et al., 2002, 2005).
Exactly why multiple enzymes are needed to add a methyl group to the same residue is not well understood, but clearly implicates the existence of a specific ‘niche’ for each of the HMTases involved. One important distinction between the enzymes is linked with the number of methyl groups being added. For instance, in Drosophila and mammals, PR-Set7/SET8 add a single methyl group at H4K20, whereas activity of Suv4-20/Suv4-20h1/Suv4-20h2 results in the tri-methylation of the same amino acid residue (Fang et al., 2002; Nishioka et al., 2002; Schotta et al., 2004). Adding further layer of complexity, distinct HMTases may operate in distinct chromosome compartments. In mouse cells, G9a is largely responsible for H3K9me2 and partially for H3K9me1 marks in euchromatin, whereas Suv39h1 and Suv39h2 serve to produce H3K9me3 in heterochromatin (Rice et al., 2003). In fruit flies, di- and tri-methylation (denoted me2 and me3, respectively) of H3K9 in the pericentric regions of large chromosome arms is mediated by Su(var)3-9. In contrast, on the dot chromosome 4, this methylation is controlled by SetDB1 (Schotta et al., 2002; Seum et al., 2007a; Tzeng et al., 2007). Besides, various HMTases may have various expression profiles in tissues and throughout development. Both Suv39h1 and Suv39h2 display overlapping transcription patterns during mouse embryogenesis. However, Suv39h1 continues to be broadly active in somatic cells, whereas Suv39h2 transcripts remain specifically expressed in adult testes (O'Carrol et al., 2000).
Functional specialization of histone-modifying proteins is probably best exemplified by the phenotypes of appropriate loss-of-function mutations. Mice deficient for either Suv39h1 or Suv39h2 display normal viability and fertility, however Suv39h double-null mice are poorly viable and have abnormal development. Yet, certain double-null animals can survive to adulthood (Peters et al., 2001). This is unlike the situation with embryonic lethality of G9a- or SetDB1/ESET-mutant animals (Dodge et al., 2004; Tachibana et al., 2002). In Drosophila, mutations in orthologous genes behave differently. Whereas Su(var)3-9 mutants are viable and fertile, albeit to a lesser extent than wild-type flies (Mis et al., 2006; Schotta et al., 2002), ablating G9a function has little if any effect on flies (Seum et al., 2007b). Mutation in the egg gene encoding SetDB1 results in significantly reduced viability, with a very low percentage of homo- or hemi-zygous mutant escapers surviving to imago and presenting morphological abnormalities as well as female sterility (Clough et al., 2007, 2014; Wang et al., 2011; Yoon et al., 2008).
The existence of several enzymes targeting the same histone residue prompts the analysis of how their activities are coordinated. In mammals, a subset of HMTases, namely Suv39h1, G9a, GLP and SETDB1, are physically associated in a common protein complex (Fritsch et al., 2010). In fruit flies, Su(var)3-9 and SetDB1 are also functionally related. Specifically, in female germline cells, one enzyme progressively substitutes for another during development; SetDB1 is mostly confined to germarium cells, whereas Su(var)3-9 replaces SetDB1 in the maturing egg chambers (Yoon et al., 2008). SetDB1 has a key role in inactivation of transposable elements, which is necessary to initiate a repressive heterochromatin state, while Su(var)3-9 assists in local bidirectional spreading of the methyl mark (Sienski et al., 2015). Similarly, during the first embryonic cell cycles, SetDB1 initiates H3K9 methylation and heterochromatin formation, whereas Su(var)3-9 is responsible for the maintenance of this histone tail mark throughout development (Seller et al., 2019). Intriguingly, the situation when SetDB1 activity precedes that of SUV39H1/2 has also been described for human cells, where a protein complex containing SetDB1 mono-methylates K9 on non-nucleosomal histone H3 and provides H3K9me1 for subsequent tri-methylation by SUV39H1/2 in pericentric regions (Loyola et al., 2009). Our earlier data are in line with this scenario as well. Previously, using a Drosophila model, we identified a list of genes targeted by Su(var)3-9 in the somatic cell chromosomes, as well as in the male and female germline cells (Maksimov et al., 2018). We established that Su(var)3-9 binding with most of the single copy euchromatic genes is dependent on SetDB1, unlike the situation observed for repeated DNA sequences in heterochromatin (Maksimov and Koryakov, 2019).
First described in 2006 (Stabell et al., 2006a), the Drosophila egg gene encoding SetDB1 still presents multiple puzzles. Most of reports on SetDB1 in fruit flies explore its contribution to female germline cell differentiation and silencing of transposable elements, as well as to the intricate biology of the dot chromosome 4. Whereas in mammals detailed analysis of SetDB1 binding has been performed (Bilodeau et al., 2009; Fei et al., 2015a; Frietze et al., 2010), this is not the case for fruit flies. To fill this gap, we constructed the genome-wide profile of SetDB1 in the chromosomes of somatic cells using the DamID technique (van Steensel and Henikoff, 2000). In order to uncover the specialization of two H3K9-specific HMTases Su(var)3-9 and SetDB1, we compared the distribution of these proteins. Specifically, we analyzed which sequences are predominantly bound by SetDB1, whether SetDB1 and Su(var)3-9 targets overlap, and how SetDB1 may contribute to transcription and formation or maintenance of H3K9me2/3-enriched chromosomal domains.
SetDB1 and Su(var)3-9 associate with different types of genomic sequences
Having established the SetDB1–Dam (hereafter SetDB1) binding profile in the salivary gland chromosomes, we proceeded to compare it to the Su(var)3-9–Dam [hereafter Su(var)3-9] profile reported previously (Maksimov et al., 2018). Overall distributions of the proteins along the chromosomes are different. Whereas Su(var)3-9 is predominantly restricted to heterochromatin, most SetDB1 peaks are found in euchromatin, and very few are in pericentric heterochromatin. One notable exception is chromosome 4, where both proteins display prominent binding (Fig. 1A).
The Drosophila genome can be operationally subdivided into two basic types of sequences, unique and repeated. Unambiguous anchoring of SetDB1 to the repeated DNA compartment presents a challenge, as our bioinformatic pipeline results in the clear alignment to the unique portion of the genome and misses the sequences with multiple alignments. Consequently, most of the repeated DNA fragments are masked from the downstream analyses. Nonetheless, one can infer SetDB1 binding to such regions of the genome. For instance, Piwi-interacting RNA (piRNA) clusters are represented by large repeated blocks, and our pipeline clearly detects strong Su(var)3-9 binding at least at the edges of such clusters (as exemplified by the region 42AB of the 2R chromosome arm) (Maksimov et al., 2018). Yet, SetDB1 is distinctly absent from this and other piRNA clusters (Fig. 1B).
Alternatively, we can calculate the fraction of repeated DNA sequences ‘pulled down’ by SetDB1–Dam versus Dam-only in the raw sequencing data prior to alignment to the genome. It is important to keep in mind that chimeric SetDB1–Dam protein can target the chromatin specifically, via its SetDB1 part, and non-specifically, via its Dam part, whereas a separate Dam protein binds the chromatin non-specifically (see Materials and Methods). Based on the reasonable assumption that Dam binds the chromatin equally well regardless of whether it is part of the fusion protein or not, we processed the raw data and established the percentage of select DNA sequences in the total pool of reads from Dam and SetDB1–Dam datasets. The values obtained were compared to estimate the specificity of SetDB1 binding to repeated DNA sequences of interest. Four types of sequences were queried: 359 bp satellite, telomeric TART and TAHRE repeats, and transposable element 1360 (hoppel). The results obtained for SetDB1 and Su(var)3-9 are very different. The numbers of repeated DNA reads in SetDB1-Dam datasets are on par with those found for Dam (Fig. 1C). This contrasts the situation with Su(var)3-9, where we previously observed significant over-representation of repeated DNA sequences in Su(var)3-9-Dam versus Dam samples (Maksimov et al., 2018). We can therefore conclude that the binding of SetDB1–Dam that we detect in the repeated DNA compartment is largely non-specific and is due to the Dam module binding. Hence, the SetDB1 portion of the chimeric protein is unable to specifically bind the chromatin encompassing the above repeats.
Next, we asked whether there is significant overlap between SetDB1- and Su(var)3-9-binding regions. K-means clustering analysis (Ramírez et al., 2016) allowed us to subdivide SetDB1 binding sites into three groups (clusters 1–3, Fig. 2A). As it follows from our data, vast majority of SetDB1 sites lack detectable Su(var)3-9 co-binding nearby (clusters 2 and 3), which is restricted to the cluster 1. Importantly, Su(var)3-9 colocalization with SetDB1 in cluster 1 is far from being perfect, as it tends to be spread around rather than matching exactly in terms of binding. We proceeded to analyze whether there are any chromosomal biases in cluster localization (Fig. 2B), and indeed we observed that most of the cluster 1 sequences map to the dot chromosome 4, with the rest of the sequences scattered along the large chromosome arms. Chromosome 4 signals are virtually absent from the clusters 2 and 3, and most of the sequences belong to the autosomes. One can therefore conclude that SetDB1 and Su(var)3-9 have a predilection for binding distinct types of sequences; Su(var)3-9 is found predominantly in heterochromatic repeats, whereas SetDB1 prefers single copy euchromatic sequences. It is only in chromosome 4 that the two proteins tend to colocalize.
SetDB1 and Su(var)3-9 associate with distinct parts of genes, with their gene targets having distinct expression patterns
Our analysis of SetDB1 distribution within and between gene units in euchromatin indicates that this protein is depleted from intergenic regions. Inside genes, SetDB1 is significantly enriched around transcription start sites (TSSs), in 5′UTRs and less so in introns (due to the presence of alternative TSSs, see below) (Fig. 3A). Previously, we reported that, in euchromatin, Su(var)3-9 also tends to be attracted more to genes than to intergenic regions; however, it is enriched in exons and coding sequences (CDSs) (Maksimov et al., 2018).
To demonstrate distinct behavior of SetDB1 and Su(var)3-9 at promoter regions, we calculated the number of binding peaks per 100 bp around TSSs and averaged this value for 1000 genes. To avoid possible overlap with transcription end sites in short genes, only those spanning above 3 kb were taken into account. Our analysis indicates that the probability of SetDB1 binding begins to increase 2 kb upstream of the TSS and is the highest at the TSS of a metagene. In contrast, Su(var)3-9 binding progressively decreases starting at −2 kb relative to the TSS, and reaches the minimum at the TSS (Fig. 3B).
Previously, we found that Su(var)3-9 binding is significantly overrepresented at genes having a multiple CDS composition (Maksimov et al., 2018). This is, however, not the case for SetDB1, and the observed to expected ratios for genes having multiple CDSs or just a single one are 1.16 and 0.96, respectively (P=0.022, binomial test). Yet, SetDB1 notably prefers the genes with alternative TSSs where it binds 1.4 times more frequently and ‘avoids’ the genes with a single TSS, where it binds 1.3 times less frequently compared to the expected rate (P=3×10−66; binomial test). This peculiarity strongly influences the choice of SetDB1 targets. To calculate this number, we considered that alternative TSSs can be found within the introns along the entire length of a gene. Hence, we defined SetDB1 targets as the genes that display significant SetDB1 binding within the region spanning −300 bp upstream of TSS until the very end of 3′UTR. In total, 4048 such SetDB1-positive genes were found in the salivary gland chromosomes (Table S1).
Using RNA-seq, we assayed expression of SetDB1 targets in the salivary gland chromosomes of wild-type larvae (Table S2) and compared it to the entire gene set and a subset of Su(var)3-9-bound genes. SetDB1 targets turned out to be more actively expressed compared to both the genome average and Su(var)3-9 targets (Fig. 4A). In order to better understand which types of genes contribute to this activity, the genes were binned into five groups, ranging from genes inactive in salivary glands [fragments per kilobase of transcript per million mapped reads (FPKM)<1] to strongly expressed ones (FPKM>1000). Overall, half of the genes are inactive in salivary glands, and a quarter of transcriptome is expressed at low levels. About one fifth of the genes are moderately expressed, and only a handful of genes are highly active. The same trend is observed for Su(var)3-9 targets, with an even higher proportion of silent or low-expression genes. The distribution of SetDB1 targets is very different though. The percentage of expressed genes is significantly higher, and silent genes constitute only ∼15% of the dataset (Fig. 4B).
Using modENCODE tissue expression data (Brown et al., 2014) and Gene Cluster 3.0 software (de Hoon et al., 2004), expression profiles for SetDB1- and Su(var)3-9-bound genes were compared. Fig. 4C shows that Su(var)3-9 targets encompass both ubiquitously expressed and tissue-specific groups of genes, largely restricted to testes, ovaries, nervous and digestive systems. In contrast, SetDB1 targets are essentially expressed everywhere, and can be referred to as housekeeping genes.
Our previous work has uncovered an interesting fact – Su(var)3-9 binding to genes is SetDB1-dependent, and about two thirds of Su(var)3-9 targets in salivary gland chromosomes are lost in egg mutants (Maksimov and Koryakov, 2019). Yet, the SetDB1 profile obtained here indicates that the interplay with Su(var)3-9 is not as straightforward as this might suggest. All Su(var)3-9 targets can be subdivided into four broad categories depending on whether they share SetDB1 binding or are dependent on SetDB1 (Fig. 5A). A total of 125 genes bind Su(var)3-9 regardless of whether SetDB1 is present. There are 152 genes that become Su(var)3-9 targets only in the presence of SetDB1. One example of such genes is shown in the Fig. 5B, where the binding peaks for the two proteins overlap. A total of 261 gene are never bound by SetDB1, yet they share a feature of lost Su(var)3-9 binding in egg mutants (as is illustrated by Fig. 5C). A total of 91 gene are bound both proteins, but the peak positions do not match, and Su(var)3-9 binding does not depend on the availability of SetDB1. Fig. 5D illustrates this situation; SetDB1 is strongly enriched at alternative TSSs, whereas Su(var)3-9 prefers exons. Thus, the very dependence of Su(var)3-9 binding on SetDB1 does not necessarily imply direct SetDB1 binding to the same target; likewise the mere presence of SetDB1 does not indicate that Su(var)3-9 binding at the same gene is SetDB1-dependent.
Our RNA-seq data obtained for salivary gland cells are consistent with overall low expression of Su(var)3-9 gene targets, although FPKM values may vary several-fold for different genes. However, if one restricts the analysis to each of the four separate gene sets described above, they tend to belong to different areas of the distribution range. For instance, two groups of genes shared by Su(var)3-9 with SetDB1 are most active, while two groups of SetDB1-negative Su(var)3-9-positive genes – including those that are SetDB1-dependent – are less transcriptionally active (Fig. 5E).
As noted above, Su(var)3-9 targets in salivary gland chromosomes encompass both ubiquitously expressed and tissue-specific genes (Fig. 4C). Our analysis of Su(var)3-9 and SetDB1 co-shared genes separately from the SetDB1-negative subset indicates that the pattern shown in Fig. 4C falls into two contrasting categories. Namely, almost all ubiquitously active Su(var)3-9 targets are also bound by SetDB1, whereas tissue-specific genes are largely composed of the Su(var)3-9-only subset of targets (Fig. 5F). One can generalize that a typical SetDB1-binding site in the chromosomes of somatic cells is represented by an alternative TSS and 5′UTR of a moderately active, ubiquitously expressed gene. At the same time, euchromatic Su(var)3-9 binding site is probably best described as exonic or an alternative CDS of a silent or low-expressed tissue-specific gene. When the gene hosts both proteins, it likely fits the criteria of SetDB1-bound genes rather than a Su(var)3-9 target.
SetDB1 distribution is poorly correlated with H3K9 methylation
Comparison of SetDB1, Su(var)3-9 and H3K9me2 enrichment profiles uncovered several patterns based on their presence or absence, and functional dependence (Fig. S1). For instance, in some places the presence of Su(var)3-9 and H3K9me2 depends on SetDB1 (region I); other sites are described as lacking SetDB1, yet having H3K9me2 that is dependent on Su(var)3-9 (region III). The opposite situation when Su(var)3-9 is absent, and H3K9me2 depends on SetDB1 is referred to as region IV. Finally, sites positive for SetDB1 and negative for Su(var)3-9 and H3K9me2 are defined as region II.
In order to analyze the interplay between SetDB1 and H3K9me2/3 genome-wide, several approaches were pursued. We explored whether there is (1) overlap between SetDB1 peaks and methylation marks, (2) dependence of H3K9me2/3 enrichment values on significance of SetDB1 binding, and (3) the relationship between SetDB1 binding and large domains enriched for H3K9me2/3 modifications. The heat map of relative distribution of H3K9me2/3 and SetDB1 displays three major clusters (Fig. 2A). Cluster 1 has the strongest H3K9me2/3 enrichment, cluster 2 is weakly enriched, and cluster 3 completely lacks these modifications. SetDB1 binding sites that overlap with high H3K9me2/3 enrichment peaks are also bound by Su(var)3-9 and generally map to the chromosome 4 (Fig. 2B). Colocalization between the SetDB1 and H3K9me2/3 is notably imperfect, as the latter generally follows the distribution of Su(var)3-9 and is uniformly spread around SetDB1 peaks.
Next, we proceeded to explore whether the H3K9me2 and me3 enrichment level (Figueiredo et al., 2012) is in any way dependent on significance of SetDB1 binding in each GATC fragment of the genome (see Materials and Methods) and compared the results obtained with our previous data for Su(var)3-9 (Maksimov et al., 2018). Plots shown in Figs S2 and S3 illustrate that the regions lacking Su(var)3-9 are also devoid of H3K9me2/3. This trend holds true both genome wide and domain wide (euchromatin, heterochromatin and chromosome 4). In the case of SetDB1, the situation is drastically different. There is no correlation between H3K9me2 enrichment and SetDB1 binding throughout the genome and euchromatin – regardless of the SetDB1 presence or absence, the methylation level is overall quite low. In contrast, in heterochromatin and chromosome 4, H3K9me2 enrichment is very strong, yet it, again, poorly depends on the presence of SetDB1 (Fig. S2). The same is equally applicable to H3K9me3 mark (Fig. S3). The only region of the genome where positive trend between SetDB1 binding and H3K9me3 enrichment is observed is chromosome 4 (Fig. S3).
To assess how egg and Su(var)3-9 mutations may affect the H3K9 methylation pattern across large chromatin domains, hidden Markov model (HMM) analysis was performed. Using this approach, H3K9me3-enriched domains were identified in the salivary gland chromosomes of third instar larvae (Table S3). Downstream K-means clustering uncovered the relative distribution of H3K9me2 across these domains as well as changes in H3K9me2/3 profiles in egg and Su(var)3-9 mutant backgrounds (Fig. 6A). It must be emphasized that despite its large size, pericentric heterochromatin in each chromosome arm is a single domain, therefore it is represented by a single line in the figure. Consequently, the figure generally provides a bird's eye view of the situation in euchromatin. Genomic regions enriched for H3K9me2 in wild-type chromosomes display extensive overlap with H3K9me3 HMM domains, and comprise three main clusters of sequences. Su(var)3-9 mutation has little, if any, influence on H3K9me3, but profoundly reduces H3K9me2 levels in clusters A and B. These same clusters are also visible in egg mutants – H3K9me2/3 levels increase in cluster A and decrease in cluster B. Importantly, cluster A not only has higher methylation levels, but also covers a broader territory enriched in H3K9me2/3.
Next, we established the distributions of SetDB1 and Su(var)3-9 within these clusters. Specifically, we calculated the enrichment value of each GATC fragment within each HMM domain for SetDB1 or Su(var)3-9 (Fig. 6B). Cluster A has most of the Su(var)3-9, yet very little SetDB1. Upon Su(var)3-9 depletion, H3K9me2 levels drops, whereas SetDB1 depletion has an opposing effect. Cluster B is rich in both SetDB1 and Su(var)3-9, and depletion of either protein results in reduced H3K9me2. Cluster C displays a distribution of SetDB1 and Su(var)3-9 that is overall similar to that of the cluster A, the difference being that H3K9 methylation remains unchanged in the egg and Su(var)3-9 mutant backgrounds. Thus, SetDB1 and H3K9me2/3 display complex interactions. There are genomic regions where H3K9 methylation is defined by SetDB1 activity; however, there are also regions where methylation overlaps with SetDB1 yet is SetDB1 independent. Finally, multiple regions exist where SetDB1 binding to chromatin is not accompanied by H3K9 methylation.
SetDB1 and Su(var)3-9 are found in distinct chromatin partitions
Despite the equivocal correlations between SetDB1 and H3K9me2/3, a clear-cut connection between SetDB1 and H3K27me3 profiles was observed. The fragment of the chromosome arm 3L shown in Fig. 7A demonstrates that SetDB1 peaks are virtually invariably found between H3K27me3-rich domains (highlighted yellow). Whenever SetDB1 and H3K27me3 peaks appear overlapping (red arrow on Fig. 7A), a zoom-in view indicates that SetDB1 binding does not match H3K27me3 (Fig. 7B).
To identify the H3K27me3-rich regions, HMM analysis and previously published data for salivary gland chromosomes were used (Sher et al., 2012) (Table S3). The HMM domains identified match well with the H3K27me3 profile, yet may include small local gaps in H3K27 methylation that encompass standalone SetDB1 peaks (see the example in Fig. 8A, red arrows). Nonetheless, compared with the expected value, the observed frequency of SetDB1 binding within H3K27me3 domains is ∼9 times lower, and it is 1.6 times higher for the regions outside H3K27me3 domains (P<0.001, χ-squared test) (Fig. 7C,D). An opposite situation is observed for the Pc protein, used here as a control (Posukh et al., 2017), as it is found within H3K27me3 domains significantly more frequently than outside H3K27me3 domains (Fig. 7C,D). Su(var)3-9 is concentrated in pericentric heterochromatin and is underrepresented in euchromatin, where most of the H3K27me3-rich regions reside. Hence, Su(var)3-9 is generally absent from both H3K27me3-rich regions and the regions in between (Fig. 7C,D). The H3K27me3 enrichment level displays a negative correlation with the significance of SetDB1 binding (Fig. S4). Most of the H3K27me3 is found where SetDB1 is absent, and whenever SetDB1 appears, the H3K27me3 enrichment level immediately becomes negative.
SetDB1 localization outside H3K27me3 domains as well as close to TSSs is reminiscent of the binding pattern reported for the insulator protein CP190 (Bartkuhn et al., 2009; El-Sharnouby et al., 2017; Nègre et al., 2010; Stadler et al., 2017). We therefore asked whether SetDB1 and Su(var)3-9 binding patterns may overlap with CP190 in the salivary gland chromosomes. Indeed, CP190 and SetDB1 profiles appear to mirror each other (Fig. 7A,B) and are found in between H3K27me3 domains (Fig. 7C,D). Our analysis indicates that SetDB1 colocalizes with CP190 significantly more often than would be expected by chance, whereas Su(var)3-9 signals near CP190 peaks are essentially random (Fig. 7E). Further, we assessed whether SetDB1 binding sites may correspond with the two classes of insulators found in the Drosophila genome (Nègre et al., 2010). SetDB1 binds the chromatin of class I insulators (BEAF-32, CP190 and CTCF) 2.2 times more often than expected (P=1.67×10−269, binomial test), and it is found near the class II insulators [Su(Hw)] 2.1 times less frequently compared to the expectation (P=1.48×10−31, binomial test) (Table S4).
Insulators and insulator proteins are known to contribute to the three-dimensional organization of chromosomes, which is typically assessed by a number of chromosome conformation capture approaches. This analysis has been successfully applied to multiple organisms and was instrumental to the identification of genomic regions where DNA sequences physically interact with each other more frequently than with sequences outside them. In Drosophila, these regions have been described under different names. For instance, they have been referred to as active and silenced physical domains (Hou et al., 2012; Sexton et al., 2012), or inactive topologically associated domains (TADs), separated by short boundaries or longer transcriptionally active inter-TADs (Eagen et al., 2015; Stadler et al., 2017; Ulianov et al., 2016). Each of these types of chromosomal domains are associated with specific sets of proteins and histone modifications (such as H3K27me3 found in TADs), and domain boundaries may be decorated by insulator proteins, such as CP190 (Hou et al., 2012; Sexton et al., 2012; Stadler et al., 2017; Ulianov et al., 2016).
Taken together, the above data may argue for the possible contribution of SetDB1 in partitioning the chromatin fiber into TADs or physical domains. We analyzed the mutual positioning of SetDB1- and Su(var)3-9-bound regions relatively to TADs and inter-TADs in salivary gland chromosomes, based on the published dataset by Eagen et al., (2015). The pattern observed was very similar to the relative distribution of SetDB1 and Su(var)3-9 versus HMM-based domains of H3K27me3 enrichment, and there are significantly more SetDB1 peaks in inter-TADs compared to in TADs. It must be specified again, that TADs cover euchromatin and distal portions of pericentric heterochromatin, whereas most of the heterochromatic and chromosome 4 sequences are missing from the analysis. It is largely for this reason that Su(var)3-9 is generally absent from both TADs and inter-TADs (Fig. 7F).
We then proceeded to the comparison of SetDB1, Su(var)3-9 and CP190 distribution in the chromosomes of salivary glands versus the four types of physical domains in the chromosomes of fly embryos (Sexton et al., 2012). These included ‘null’ domains, which were not enriched for any available mark, transcriptionally ‘active’ domains, domains bound by PcG proteins and associated with H3K27me3 (‘PcG’), and domains bound by HP1 and Su(var)3-9 and associated with H3K9me2 (‘HP1’). We observed that active domains are significantly enriched and PcG domains are significantly depleted for SetDB1 and CP190 (P<0.001, χ-squared test). In turn, HP1 domains are enriched with Su(var)3-9, with the rest of domains being depleted for Su(var)3-9 (Fig. 7G).
Mutual distributions of SetDB1, Su(var)3-9, and CP190 relatively to TADs/inter-TADs or physical domains are indicative of the fact that these proteins fall into or contribute to the formation of distinct types of functional partitions that persist throughout cell division and differentiation. For instance, Su(var)3-9 participates in the formation of pericentric heterochromatin, which remains stable in both embryonic and larval chromosomes. SetDB1 and CP190 associate with ubiquitously expressed genes that in turn form inter-TADs in salivary gland chromosomes and active physical domains in the chromosomes from cells of fly embryos.
H3K27me3-rich domains are flanked by SetDB1-binding sites
Functional domains of chromatin are known to be flanked by border elements whose nature still appears controversial. Despite the widely held belief that insulator proteins and transcriptional activity are required for the formation of borders around H3K27me3-rich domains, neither of these two factors are universally present across all H3K27me3 borders nor do they seem to be absolutely indispensable for their establishment (El-Sharnouby et al., 2017; Li and Zhou, 2013; Schwartz et al., 2012; van Bortle et al., 2012). Visual analysis of SetDB1-binding profile prompted us to test the idea whether SetDB1 may serve as one of the factors that may contribute to the formation of border elements. SetDB1 is found exactly at the borders of H3K27me3 domain around the Antp gene (Fig. 8A), as well as in the local methylation ‘trough’ within the domain (Fig. 8A, red arrows). Notably, CP190 is only detectable at the proximal border of the domain (left part of the figure) and is absent from the distal border. CP190 binding sites are present within H3K27me3-rich domain (Fig. 8A, blue arrows), and most fall into positions of the known insulators (Nègre et al., 2010). Chromosome arm 3L encompasses two juxtaposed H3K27me3 domains around the dar1 and Awh loci (Fig. 8B). No SetDB1 or CP190 is present at the domain borders around dar1; in contrast both proteins are present around Awh, although their binding sites do not match exactly. By overlaying the transcription profile, one can see that SetDB1-binding peaks are found on the domain borders only when active transcription is also present (Fig. 8A,B).
In order to analyze the mutual distribution of SetDB1 and H3K27me3 domain borders genome wide, the coordinates of each of the H3K27me3 HMM domain borders were recorded and the nearest significant SetDB1 peak positions were identified. Next, the positions of domain borders, transcription levels and CP190 peaks were aligned to the above subset of SetDB1 peaks (Fig. 8C). As it follows from our analysis, half of the borders of HMM-based H3K27me3 domains (forming the cluster 1) harbor SetDB1-binding site less than 5 kb away, and a gene or genes that are actively expressed. The other half of domain borders (cluster 2) also encompasses active genes, but those are unrelated to SetDB1 peaks. Thus, SetDB1 is not obligatory for H3K27me3 border formation, yet when combined with active transcription, is present in about half of the borders. Importantly, it is on the border of the domain that transcription and SetDB1 binding are the strongest, and both factors become far less pronounced as one moves away from the H3K27me3 domain. This contrasts the situation observed for CP190, which lacks a clear distribution pattern around the domain border.
Much as in other multicellular organisms, fruit flies have several enzymes capable of methylating the K9 residue in the histone H3 tail. Each of these enzymes is uniquely specialized, and their depletion has distinct effects on the organism. Nonetheless, better understanding of the roles played by each of the H3K9-specific HMTases is needed. Previously, we established the distribution profiles for Su(var)3-9 HTMase in the chromosomes from various Drosophila tissues and performed in-depth analysis of the data obtained (Maksimov et al., 2018). Construction of the SetDB1 profile provided us with the opportunity to compare the distributions of the two enzymes and infer the functional niches they may occupy in the chromosomes of somatic cells.
The genome-wide distribution of SetDB1 in Drosophila was previously known based on the low-resolution immunostaining of larval polytene chromosomes. Despite minor discrepancies between the results from different groups, the consensus was that SetDB1 localizes to the chromosome 4 and multiple euchromatic sites, and that very little SetDB1 binding is detectable in the pericentric heterochromatin (Lundberg et al., 2013; Seum et al., 2007a; Stabell et al., 2006a). Immunostaining of ovarian somatic cell nuclei indicated that SetDB1 is not colocalized with the DAPI-positive heterochromatic granules (Osumi et al., 2019). In human cells, SetDB1 is found predominantly in euchromatic regions of interphase nuclei (Schultz et al., 2002). Thus, SetDB1 can be classified as an euchromatic protein, which is in line with our data. In contrast, Su(var)3-9 is largely concentrated in the pericentric heterochromatin (Maksimov et al., 2018; Schotta et al., 2002). Hence, given that the two enzymes recognize the same H3 residue, they have distinct activities as modifiers of position effect variegation and influence the transcription of heterochromatin-embedded genes. Su(var)3-9 mutations are strong suppressors of variegation (Tschiersch et al., 1994), whereas egg mutants (unless the reporter gene is integrated into chromosome 4) display weak, if any, suppressor activity (Seum et al., 2007a). An egg mutation has no effect on heterochromatic gene expression, whereas it is significantly reduced in Su(var)3-9 mutants. It is only in the context of chromosome 4, where both enzymes display strong enrichment and somewhat colocalize, that both Su(var)3-9 and egg mutations result in upregulated transcription (Lundberg et al., 2013).
Numerous studies have addressed the key role of SetDB1 in inactivation of transposable elements via piRNA-based mechanisms, which occur due to H3K9 methylation in piRNA clusters or in the very transposons, and is consistent with SetDB1 binding to repeated DNA sequences (Osumi et al., 2019; Rangan et al., 2011; Sienski et al., 2015). However, we show that SetDB1 does not bind to the repeats but, nonetheless, these data can be perfectly reconciled. Previously published studies used ovarian cells as a model, where piRNA system is operational. In germline cells H3K9 methylation, Su(var)3-9 binding and transcription of piRNA clusters are SetDB1 dependent (Maksimov and Koryakov, 2019; Rangan et al., 2011). Salivary glands, in contrast, are represented by the terminally differentiated somatic cells, where piRNAs are silent, and SetDB1 ‘ignores’ piRNA clusters, where H3K9 methylation is exclusively Su(var)3-9 dependent (Maksimov and Koryakov, 2019). This observation is further supported by the transcription analysis of transposable elements in second-instar larvae that are mutant for Su(var)3-9 or egg – whereas Su(var)3-9 significantly increased transposon expression levels, egg had no effect (Lundberg et al., 2013).
Both SetDB1 and Su(var)3-9 display binding to unique euchromatic genes. Previous microarray analysis of transcription in egg and Su(var)3-9 mutants indicated that most of the genes responding to these mutations are the same, which led the authors to conclude that SetDB1 and Su(var)3-9 display extensive overlapping functions (Lundberg et al., 2013). It must be noted that, a very soft criterion for differential expression was used in that study, and once increased the number of affected genes, as well as the number of common genes between the two datasets are strongly reduced. We report here that most of the Su(var)3-9 targets do not overlap with SetDB1-binding sites. Even if a gene is co-bound by both proteins, the binding positions are rarely exactly overlapping. Indeed, Su(var)3-9 tends to associate with exons and CDSs (Maksimov et al., 2018), whereas SetDB1 predominantly associates with 5′UTRs and TSSs. Interestingly, analogous data were reported for SetDB1 in mice, as significant SetDB1 enrichment was detected in the promoter regions (Bilodeau et al., 2009; Fei et al., 2015a).
Whereas Su(var)3-9 is a well-known repressor of euchromatic genes, the exact role of SetDB1 – activating or repressing – is presently unclear. Microarray analysis of egg mutants indicates that multiple genes are upregulated, however many genes are downregulated as well (Lundberg et al., 2013) (i.e. the expression changes are bi-directional). Using mammalian cell lines, artificial recruitment of SetDB1-containing protein complex was demonstrated to result in transgene inactivation (Ayyanathan et al., 2003; Li et al., 2006; Schultz et al., 2002). However, genome-wide analysis suggested that nearly half of the genes displaying SetDB1 binding in promoter regions are actively transcribed (Bilodeau et al., 2009). This result is in line with our data showing that ∼85% of SetDB1 targets in the salivary gland chromosomes are expressed.
Despite SetDB1 being originally described as an H3K9-specific HMTase, its link with H3K9 methylation is not straightforward. Mouse studies have established that 45.7% of SetDB1-binding sites did not have nearby H3K9me3 signal peaks (Fei et al., 2015a). As it follows from the fig. 1B published by Frietze et al. (2010), in human cells only 68% of SetDB1 binding sites overlap with H3K9me3 peaks. Similarly, our data indicate that SetDB1 binding is not invariably correlated with H3K9me2/3. This can be attributable to the observed localization of SetDB1 around TSSs that in turn display a very peculiar positioning of nucleosomes. A nucleosome-free region is typically found upstream of TSSs, and the ‘depth’ of reduced nucleosome coverage in this region is expression dependent; the more the gene is active, the lower is the chance of finding a nucleosome upstream of TSS (Martin et al., 2017; Mavrich et al., 2008). Clearly then, taking into account that most of SetDB1 targets are transcriptionally active, the peak of SetDB1 binding will coincide with the lowest probability of finding there a nucleosome, which is a substrate for methylation. This is indirectly supported by colocalization of SetDB1 and CP190, which is predominantly associated with TSSs of active genes, where nucleosome density is known to be profoundly reduced (Bartkuhn et al., 2009; El-Sharnouby et al., 2017; Nègre et al., 2010). The above factors agree well with the idea that SetDB1 may actually methylate not only histone H3, but also additional protein(s). In fact, HMTases have been described to methylate non-histone substrates (Lanouette et al., 2014; Moore and Gozani, 2014; Zhang et al., 2012). Specifically, SetDB1 has been reported to methylate HIV Tat protein (van Duyne et al., 2008), as well as human p53 (Fei et al., 2015b) and AKT (Guo et al., 2019; Wang et al., 2019).
In our work, SetDB1 has been found at about half of H3K27me3 domain borders, which is indicative of its participation in partitioning of chromatin into functional domains. The assumed role of SetDB1 in the formation of boundary elements may explain several of our findings. For instance, some of the H3K9me2/3-enriched regions tend to be SetDB1 negative, and in egg mutants, H3K9 methylation in such regions does not decrease, but increases, with domains becoming somewhat wider. Next, ∼40% of Su(var)3-9 targets are never shared with SetDB1, yet egg mutation results in the disappearance of Su(var)3-9 from such regions. Also, we previously described a small subset of genes that are not associated with Su(var)3-9 in wild-type animals, but become Su(var)3-9 targets in egg mutants (Maksimov and Koryakov, 2019). These facts can be explained by the indirect, rather than direct, effects of the egg mutation on histone methylation or protein binding, for instance, via repositioning of domain borders that is accompanied with domain merging or splitting. As a result, a fraction of genomic regions may experience a novel epigenetic environment. This interpretation is in agreement with the studies of TAD and SetDB1 interplay in the mammalian neurons. A subset of very large TADs in the nuclei of mouse neurons critically required the SetDB1-containing complex, and underwent structural disintegration after SetDB1 ablation (Jiang et al., 2017). Finally, deletion or overexpression of Setdb1 in mouse and human neurons was reported to alter higher-order chromatin organization (Bharadwaj et al., 2014).
To conclude, Su(var)3-9- and SetDB1-binding sites partially overlap in the chromosomes of somatic cells, yet these proteins appear to have distinct functions. Su(var)3-9 is a repressor that locates predominantly to the pericentric heterochromatin and is indispensable for the methylation of H3K9 in the nucleosomes associated with repeated DNA sequences. In the context of euchromatic genes, Su(var)3-9 may be required for fine-tuning transcription and alternative splicing, as discussed previously (Maksimov et al., 2018). SetDB1 is mostly present in euchromatin, where it appears to contribute to transcription initiation and formation of chromatin domain borders. We speculate that SetDB1 may have more functions than just H3K9 methylation, and be active against some non-histone substrates. Whereas Su(var)3-9 has the same function across different cell types and developmental stages, SetDB1 may have distinct role(s) in somatic cells compared to embryonic or germline cells. Indeed, SetDB1 was demonstrated to be required for the establishment of H3K9 methylation during the first rounds of embryo cell divisions (Seller et al., 2019), yet it has very little, and a largely indirect, effect on H3K9 methylation in differentiated somatic cells.
MATERIALS AND METHODS
DNA constructs and fly genetics
First, we cloned a genetic construct encompassing a cDNA of D. melanogaster egg fused in frame with the Dam DNA methyltransferase gene from E. coli and placed under the control of a minimal hsp70 promoter from the pUAST vector (Brand and Perrimon, 1993). A LoxP-flanked stop-cassette was strategically placed between the promoter and the SetDB1–Dam module to prevent leaky expression of the construct during Drosophila transformation and establishment of transgenic stocks. A detailed description of the stop-cassette was published previously in Maksimov et al. (2014). Using site-specific attP/attB phiC31-mediated recombination, this construct was integrated into a single attP18 landing site on the X chromosome. The fly stock lacking the stop-cassette, was generated by standard genetic crosses with the nanos-CRE flies (Laktionov et al., 2014) and selecting for appropriate progeny. As a background control, a fly stock bearing Dam module (rather than SetDB1–Dam) integrated at the same landing site was used (Maksimov et al., 2018).
A CP190 profile was obtained for salivary gland chromosomes isolated from third-instar male larvae as a part of the separate project that will be published elsewhere. All experimental procedures and constructs were similar to those described above for SetDB1, except for the landing site used for the integration of CP190–Dam and Dam sequences, which was attP40 on chromosome 2.
We followed the tissue-specific DamID protocol described in detail in Maksimov et al. (2016). Below we outline only the concept of the procedure. Transgenic animals used expressed either Dam, or the SetDB1–Dam or CP190–Dam (SetDB1/CP190–Dam) fusion protein. The Dam protein is of bacterial origin and lacks specific binding sites in the Drosophila genome. The SetDB1/CP190–Dam protein associates with the chromatin either specifically via the SetDB1/CP190 part, or non-specifically, via Dam. In both Dam and SetDB1/CP190–Dam, the Dam portion functions to methylate adenine nucleotides in the GATC sites nearest to the position where binding took place. In Drosophila, naturally occurring adenine methylation is present mainly in embryos, but it exists out of the GATC context (Zhang et al., 2015b). Hence, whenever GATC site is methylated at an adenine, this can have only resulted from nearby binding of either Dam or SetDB1/CP190–Dam. Thus, DNA methylation at GATC sites is used as a readout of protein distribution. Importantly, expression of Dam-only constructs serves to provide a control for non-specific Dam binding. Expression of SetDB1/CP190–Dam fusion protein produces a composite signal of specific SetDB1 or CP190 and non-specific Dam binding. Using a custom bioinformatic pipeline, the non-specific Dam signal can be ‘subtracted’ from the composite signal derived from SetDB1/CP190–Dam expression. Thus, whenever SetDB1 or CP190 binding is referred in the text, this is applied to the specific binding of the chimeric protein.
For SetDB1 DamID, salivary glands were isolated from female third-instar larvae; the CP190 DamID dataset was produced from male third-instar larvae. Flies and larvae were kept at all times at 18°C on standard fly food. Organs were excised and collected in PBS. For each sample, 25 salivary glands were collected; three biological replicate experiments were undertaken for the Dam and SetDB1–Dam samples, while two biological replicate experiments were undertaken for the Dam and CP190–Dam samples. Collected material was processed for genomic DNA isolation using phenol/chloroform extraction, followed by a special protocol for selective amplification of the fragments that are flanked by methylated GATC sites on both ends (GATC fragments). The DamID libraries obtained were sequenced using Illumina MiSeq platform according to the manufacturer's instructions. The reads were fed into the custom bioinformatic pipeline that filters out the DNA reads unrelated to the DamID procedure, and identifies the regions of specific SetDB1 or CP190 binding. As a result, a genome-wide distribution profile is constructed, which shows the significance values of SetDB1 or CP190 enrichment (above the x-axis) or Dam enrichment (below the x-axis) at each genomic GATC fragment [as a −log10(P-value) scale]. Black horizontal lines above the x-axis denote threshold values with a false discovery rate (FDR)<5%. Only the peaks exceeding the threshold value were used in all of our downstream analyses.
Total RNA from 15 salivary glands was isolated using TRIzol reagent (Invitrogen) according to the manufacturer's protocol; 500 ng of RNA was used as a starting material for poly(A)-RNA enrichment using a TruSeq RNA Sample Preparation Kit v2 (Illumina) and to construct RNA-seq libraries using the NEBNext Ultra II Directional RNA Library Prep Kit for Illumina (New England Biolabs). The Illumina MiSeq (75 bp paired-end reads) platform was used. Three technical replicates were run. Around 2.5 million paired-end reads were obtained for each sample. RNA-seq raw data were trimmed for quality using Trimmomatic SE and aligned to BDGP6 (Ensembl 95) using HISAT2 aligner (Kim et al., 2015). RNA coverage in 1 kb bins was obtained using bamCoverage tool from deepTools package (Ramírez et al., 2016). Reads aligned to genomic locations were counted using FeatureCounts tool from the Subread package (Liao et al., 2014). Signal normalization was performed in DESeq2 (Love et al., 2014).
Identification of histone methyl-mark enriched chromosomal domains
To identify the regions of prominent H3K9me3 and H3K27me3 enrichment, an approach based on hidden Markov model (HMM) was implemented. ChIP-chip data were log-converted [log2(IP/input)] and smoothed using the rolling mean algorithm (window size 200). The profiles obtained this way for each of the chromosomes were then processed as discrete time series for HMM analysis using R package depmixS4 (Visser and Speekenbrink, 2010). The genome was binned into three hidden states, which was visually the most optimal solution. Points of state transition have partitioned the genome into easily interpretable fragments. If the fragment had average value of log2(IP/input) above 0 for H3K27me3 and above 0.2 for H3K9me3, it was considered as enriched for that particular modification. Whenever the domains were found next to each other, they were merged together. Coordinates of H3K27me3-enriched domains were corrected. To do so, using computeMatrix and plotHeatmap functions from the deepTools2 suite (Ramírez et al., 2016), a heat map of H3K27me3 enrichment was plotted for the original domain borders. Next, each border was moved so that the transition points between the H3K27me3-enriched and depleted regions were found along the same line of the heat map.
The authors gratefully acknowledge the resources provided by the ‘Molecular and Cellular Biology’ core facility of the IMCB SB RAS.
Conceptualization: D.E.K.; Formal analysis: D.A.M., S.E.R., P.P.L., D.E.K.; Investigation: D.A.K., D.A.M., S.E.R., P.P.L., D.E.K.; Writing - original draft: D.E.K.; Writing - review & editing: D.E.K.; Supervision: D.E.K.; Project administration: D.E.K.; Funding acquisition: S.E.R., P.P.L., D.E.K. Authors D.A.K. and D.A.M. contributed equally to this work as first authors and are listed in alphabetical order.
This research was supported by the grants from the Russian Foundation for Basic Research #19-04-00352 (D.A.K., D.A.M., and D.E.K.), #19-34-90108 (S.E.R.), #20-34-70141 (P.P.L.), Russian Program for Basic Research #0310-2019-0003 (D.E.K.) and 0310-2019-0005 (D.A.K., D.A.M., S.E.R. and P.P.L.), a grant from the Ministry of Education and Science of Russian Federation #14.Y26.31.0024 (D.A.K., D.A.M., S.E.R., and P.P.L.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The authors declare no competing or financial interests.