CTCF is a ubiquitous transcription factor that is involved in numerous, seemingly unrelated functions. These functions include, but are not limited to, positive or negative regulation of transcription, enhancer-blocking activities at developmentally regulated gene clusters and at imprinted loci, and X-chromosome inactivation. Here, we review recent data acquired with state-of-the-art technologies that illuminate possible mechanisms behind the diversity of CTCF functions. CTCF interacts with numerous protein partners, including cohesin, nucleophosmin, PARP1, Yy1 and RNA polymerase II. We propose that CTCF interacts with one or two different partners according to the biological context, applying the Roman principle of governance, 'divide and rule' (divide et impera).
Introduction
CCCTC-binding factor (CTCF) is a ubiquitously expressed 11-zinc-finger vertebrate protein that binds to thousands of sites in the genome in a sequence-specific manner and performs myriad functions. Initially, CTCF was described as a transcriptional repressor of the Myc gene; later studies, however, recognized its involvement in very diverse functions, including enhancer blocking, X-chromosome inactivation, gene imprinting and promoter activation or repression (Fig. 1A) (for reviews, see Ohlsson et al., 2001; Gaszner and Felsenfeld, 2006; Wallace and Felsenfeld, 2007; Filippova, 2008).
How can one ubiquitous protein perform so many functions, which are often seemingly unrelated? The answer might lie in the context-dependent interactions of CTCF with diverse protein partners (Fig. 1B,C), but what determines which partner is chosen for each occasion? At this point we do not have clear answers to these questions, but several possibilities can be considered. First, CTCF uses its 11 zinc fingers in a combinatorial way (Ohlsson et al., 2001) to recognize and bind to a variety of DNA sequences (see below). The discriminate usage of a subset of zinc fingers for DNA binding might create, out of the remaining fingers, specific platforms for interaction with other proteins. A second possible mechanism for the control of partner choice and affinity of the CTCF-partner interaction is the different post-translational modifications of the partner and/or of CTCF itself, which might be used under different cellular circumstances. At least one example has been reported in which post-translational modifications of CTCF affect its interaction with a partner protein - in this case, RNA polymerase II (Pol II) (Chernukhin et al., 2007) (see below for further details).
In this Commentary, we first discuss the key features of CTCF, including its DNA-binding specificity and its role in linking intra- and interchromosomal sites. We next focus our attention on several protein partners of CTCF that are known to have important cellular functions, or that have been very recently identified [CTCF partners that have been identified in proteomic analysis only, such as lamin A/C, importins, topoisomerase II (Topo II) and others (Yusufzai et al., 2004), will not be covered]. In doing so, we will attempt to disentangle the complex knot of CTCF interactions with other proteins, and to understand how these interactions determine the functions of this fascinating protein.
Key characteristics of CTCF
CTCF is a single polypeptide chain of 727 amino acid residues, the secondary structure of which can be subdivided into three distinct domains - an N-terminal region, a central domain containing 11 zinc fingers, and a C-terminal region (reviewed by Ohlsson et al., 2001). The protein sequence is highly conserved among birds and mammals, being 100% identical in the zinc-finger domain. The three domains contain sites for distinct post-translational modifications: the N-terminus is poly(ADP-ribosyl)ated (Yu et al., 2004), whereas the C-terminal domain contains several sites for phosphorylation by casein kinase 2 (Klenova et al., 2001; El-Kady and Klenova, 2005). A recent study reported that CTCF is also modified by SUMOylation (covalent addition of the small ubiquitin-like protein SUMO) at two sites in the polypeptide chain. This modification might contribute to the repressive function of CTCF on the Myc P2 promoter (MacPherson et al., 2009). The three distinct domains of CTCF also provide interaction platforms for various proteins (Fig. 1C), including CTCF itself (e.g. Pant et al., 2004; Yusufzai et al., 2004; Ling et al., 2006). The ability of CTCF to dimerize and/or multimerize might underpin its ability to link sites within and between chromosomes (looping and bridging, respectively) (Williams and Flavell, 2008; Zlatanova and Caiafa, 2009) (see also below).
The CTCF gene is cell-cycle-regulated, with its expression peaking at S-G2 phase (Klenova et al., 1998). CTCF is characterized by a relatively uniform nuclear distribution in interphase, with prominent binding sites at the periphery of the nucleolus. CTCF also binds to the nuclear matrix, a proteinaceous meshwork in the nucleus that stabilizes nuclear architecture and mechanically supports nuclear processes. This interaction indicates a possible functional connection between CTCF-dependent insulator elements and the nuclear matrix (Dunn et al., 2003). [Insulators are short, specific nucleotide sequences that collaborate with proteins to define boundaries between neighboring, but functionally distinct, genomic domains (Gaszner and Felsenfeld, 2006; Wallace and Felsenfeld, 2007).] The interactions of CTCF with the matrix, as well as with the nucleolus, might occur through the nuclear phosphoprotein nucleophosmin (Yusufzai and Felsenfeld, 2004; Yusufzai et al., 2004) (and see below). CTCF also associates with the centrosomes and the midbody at the end of mitosis, suggesting that it has non-nuclear functions, such as cell-cycle control (Zhang et al., 2004).
DNA-binding specificity and genome-wide distribution of CTCF
CTCF was originally described as a transcriptional repressor of the chicken, mouse and human Myc genes (Lobanenkov et al., 1990; Klenova et al., 1993; Filippova et al., 1996). Since then, CTCF-binding sites have been found in numerous genes, and binding of CTCF to these sites has been implicated in complex transcriptional regulation pathways. Early attempts to define a consensus CTCF-binding DNA sequence were unsuccessful and the diversity of identified binding sequences indicated that CTCF had an exceptional degree of flexibility in terms of binding-site recognition. This flexibility was attributed to combinatorial usage of the 11 zinc fingers in the central part of the molecule, and led to the description of CTCF as a 'multivalent' transcription factor (Ohlsson et al., 2001). Recent genome-wide chromatin immunoprecipitation (ChIP) experiments utilized microarrays (ChIP-on-chip) (Kim et al., 2007) or Solexa sequencing technology (ChIP-seq) (Barski et al., 2007) to identify the DNA sequences immunoprecipitated by anti-CTCF antibodies. The studies identified ∼14,000 CTCF-binding sites in the human genome, which enabled the derivation of a ∼20 bp consensus CTCF-binding sequence (Kim et al., 2007); notably, however, 18% of the sites identified by ChIP experiments did not conform to the consensus sequence, in agreement with earlier observations of CTCF-binding sites on individual genes (Ohlsson et al., 2001). A very similar consensus sequence was simultaneously derived by purely computational approaches in a search for regulatory motifs in conserved non-coding elements in the human genome (Xie et al., 2007). The total number of CTCF-binding sites identified by this computational approach was close to 15,000.
Are there any characteristic features of CTCF distribution that can be gleaned from these genome-wide studies? The CTCF-binding sites correlate with genes but are not close to promoters (Kim et al., 2007; Xie et al., 2007). They often flank groups of genes that are transcriptionally co-regulated, suggesting that the majority of CTCF-binding sites function as insulators. Another recent study identified domains in the human genome that are associated with the nuclear-lamina structure and, more specifically, with lamin B (Guelen et al., 2008). These so-called lamina-associated domains (LADs), which have an average size of ∼550 kb, cover 40% of the genome and contain gene-poor regions in a repressive chromatin environment. Computational analysis indicated that 22% of LADs have CTCF-binding sites on one side, and 2% are flanked by two binding sites. The CTCF-binding sites center at 5-10 kb outside the LAD borders; however, these sites of CTCF accumulation do not coincide with the sites of promoter enrichment in these regions, in agreement with Kim et al. (Kim et al., 2007) and Xie et al. (Xie et al., 2007).
Linking intra- and interchromosomal sites
As discussed above, CTCF appears to be able to link discrete domains on the same or different chromosomes. An important series of studies has used various modifications of the chromosome conformation capture (3C) technique (Dekker et al., 2002) to identify chromatin regions that contact each other physically in the nucleus. Ling and co-workers (Ling et al., 2006) studied the transcriptional control of imprinted genes (only one of the two alleles of such genes is expressed in a given cell, and the expression is determined by the parental origin of the allele, i.e. the maternal and paternal alleles are differentially expressed). They found that the so-called imprinting control region (ICR) that borders and governs the expression of the maternal allele of the H19 gene (which encodes an RNA molecule of unknown function) specifically interacts with the paternal allele of an intergenic region between two other imprinted genes, Wsb1 (WD repeat and SOCS-box-containing protein 1) and Nf1 (neurofibromin). Notably, the interacting regions are located on two different mouse chromosomes: the H19 ICR on chromosome 7, and Wsb1 and Nf1 on chromosome 11. Importantly, the interaction is dependent on the presence of CTCF and intact CTCF-binding sites. A further study identified 114 unique sequences from all chromosomes that interact with the same H19 ICR region, with some preference for interchromosomal interactions (Zhao et al., 2006). Imprinted loci are highly represented among the interacting DNA regions, and the pattern of interactions changes during differentiation ex vivo (in embryonic stem cells) and in vivo (when comparing the embryoid body with the neonatal liver). Notably, the physical proximity of sites depends on intact CTCF target sites, implicating CTCF in mediating these interactions [for further examples and discussion, see Zlatanova and Caiafa (Zlatanova and Caiafa, 2009)].
Simonis and colleagues (Simonis et al., 2006) studied the β-globin gene locus in its transcriptionally active (fetal liver) and inactive (fetal brain) state. When active, the locus preferentially interacts with other transcribed loci, whereas the inactive locus prefers to partner with transcriptionally silent regions. This study did not directly address CTCF involvement in bringing genomic loci together; however, such an involvement is to be expected because CTCF is known to have a role in enhancer blocking in the β-globin gene clusters through a mechanism that involves loop formation, i.e. bringing distant DNA regions together (Bell et al., 1999; Farrell et al., 2002; Splinter et al., 2006).
CTCF-interacting proteins - a different partner for each occasion?
The number of proteins recognized to interact with CTCF under specific circumstances is growing steadily and will, undoubtedly, continue to grow. In general, CTCF partners can be divided into several functional groups (Fig. 1B). The group of DNA-binding proteins [transcription factors (activators and/or repressors depending on the context) and cofactors] includes, but is probably not limited to, Y-box-binding protein 1 (YB1) (Chernukhin et al., 2000), Yin and yang 1 (Yy1) (Fig. 2), Kaiso (Defossez et al., 2005), and regulatory factor X (RFX) and MHC class II transactivator (CIITA) (Majumder et al., 2006; Majumder et al., 2008). The second category of partners includes chromatin proteins (both structural proteins and enzymes). Table 1 provides a summary of the most important characteristics of each specific partner, and the main findings concerning the functional significance of the partnership, for the first two groups of interactors. A third group includes important multifunctional proteins, such as poly[ADP-ribose] polymerase 1 (PARP1), nucleophosmin and Topo II. Finally, there are other identified partners that do not belong to any of these groups and will be separately considered as 'miscellaneous'. In the following subsections, we describe the interactions of CTCF with several partner proteins, and show how these give rise to distinct functions of CTCF. Please note, however, that these cases represent only a few examples of the CTCF-protein interactions that occur at specific genomic loci; in addition, the issue of whether CTCF actually recruits the partner protein in question to the site has not been addressed in most of the examples described.
| Protein partner | Function | Main observation | Reference |
|---|---|---|---|
| DNA-binding proteins | | | |
| YB1 | Multifunctional DNA- and RNA-binding factor implicated in regulation of DNA replication, DNA repair, transcription and RNA processing; interacts with Yy1 | Co-immunoprecipitates with CTCF in vivo; interacts with CTCF zinc-finger domain; cooperates with CTCF in transcriptional repression of Myc | Chernukhin et al., 2000 |
| | | CTCF interferes with the binding of YB1 to transcription control elements (variable-number tandem-repeat domains) in intron 2 of the gene encoding the serotonin transporter 5-HTT, which has been implicated in CNS-related disorders | Klenova et al., 2004 |
| Yy1 | Zinc-finger transcription factor | Paired CTCF-Yy1 binding sites are highly clustered at the Tsix domain of the X-chromosome inactivation center (see text and Fig. 2 for details); in transient co-transfection experiments, Yy1 specifically interacts with CTCF (mainly through the CTCF N-terminus) to transactivate Tsix (to a greater extent than either protein alone) | Donohoe et al., 2007 |
| Kaiso | Member of the pox-virus and zinc-finger (POZ) family of zinc-finger transcription factors, which are implicated in development and cancer; possesses dual specificity of DNA binding (binds to methylated CpGs or to the non-methylated sequence TGGCAGGA) | Binds to CTCF bait in yeast two-hybrid screen; interaction is through the CTCF C-domain; binds to the unmethylated consensus sequence close to the CTCF-binding site in the human 5′ β-globin insulator and reduces CTCF enhancer-blocking activity | Defossez et al., 2005 |
| | | Replaces CTCF at the promoter of RB1, the gene encoding human retinoblastoma-associated protein (Rb), when the CTCF-binding site becomes methylated; binding of Kaiso results in transcriptional repression of RB1 | De La Rosa-Velázquez et al., 2007 |
| RFX and CIITA | RFX is a transcription factor that binds to proximal promoters of all MHCII genes (and is required, but not sufficient, for expression); CIITA is a transcriptional co-activator that controls expression by recruiting chromatin remodelers and transcription factors | CTCF directly interacts with both RFX and CIITA, probably forming a trimeric complex; the complex is involved in loop formation between the promoters of the HLA-DRB1 and HLA-DQA1 genes and the intergenic element XL9 (which contains a CTCF-binding site) to allow expression of the genes | Majumder et al., 2006; Majumder et al., 2008 |
| Chromatin proteins | | | |
| H2A and H2A.Z | Structural components of nucleosomes; H2A.Z is a non-allelic histone H2A variant that replaces H2A in nucleosomes at specific genome locations (Zlatanova and Thakar, 2008) | Identified as CTCF cofactors by CTCF-affinity chromatography followed by mass-spectrometry analysis | Yusufzai et al., 2004 |
| | | Co-immunoprecipitate with CTCF in vivo | Guastafierro et al., 2008 |
| | | Co-localize with CTCF genome-wide | Barski et al., 2008 |
| | | CTCF positions 20 nucleosomes around H2A-binding sites (genome-wide); these nucleosomes are highly enriched for H2A.Z and 11 post-translational histone modifications | Fu et al., 2008 |
| Suz12 | Essential component of polycomb repressor complex 2 (PRC2), which methylates histone H3 at lysine 27 | Binds specifically to the maternal allele of promoters P2 and P3 of the repressed Igf2 allele at the imprinted Igf2/H19 locus (H3K27 becomes methylated at the maternal allele); Suz12 directly interacts with CTCF both in vivo and in vitro | Li et al., 2008; Han et al., 2008 |
| SIN3A | Transcriptional co-repressor | Binds to CTCF via the zinc-finger domain; recruits histone deacetylase activity | Lutz et al., 2000 |
| CHD8 | Member of the chromodomain helicase family, which is implicated in chromatin assembly and control of gene expression | Binds to the CTCF zinc-finger domain used as bait in a yeast two-hybrid screen; associates with known CTCF-binding sites (H19 ICR, 5′ HS5 of the LCR of β-globin gene cluster, and the promoters of BRCA1 and Myc); knockdown of either CTCF or CHD8 results in loss of ICR insulator activity at luciferase reporter plasmids; CHD8 acts through CTCF at reporter plasmids and the endogenous ICR site; loss of CHD8 induces CpG hypermethylation and histone hypo-acetylation in the vicinity of CTCF-binding sites at BRCA1 and Myc promoters | Ishihara et al., 2006 |
| Taf1/Set | Molecular chaperone; component of the INHAT complex that inhibits histone acetyltransferases | Identified as a CTCF cofactor by CTCF-affinity chromatography followed by mass-spectrometry analysis | Yusufzai et al., 2004 |
| CP190 | Centrosome-binding protein that also binds to Drosophila polytene chromosomes; essential for viability but not required for cell division | CP190-binding sites significantly overlap with those of CTCF in Drosophila; CP190 is required for proper CTCF binding to chromatin; CTCF localizes at the borders of interbands and bands on polytene chromosomes; CP190 directly interacts with CTCF in vivo | Mohan et al., 2007; Gerasimova et al., 2007 |
| Cohesin | Four-subunit complex (Smc1, Smc3, Scc1 and Scc3) that forms a ring-like structure in sister-chromatid cohesion; implicated in proper chromosome segregation and homologous-recombination-dependent DNA-damage repair | Cohesin colocalizes with CTCF at: the control region of the major latency-associated transcript (LAT) gene of Kaposi sarcoma-associated herpesvirus (and dissociates upon lytic-cycle induction); ICR of imprinted mouse Igf2/H19 locus; and the Myc promoter | Stedman et al., 2008 |
| | | Cohesin colocalizes with CTCF in the human genome; CTCF recruits cohesin to specific sites; cohesin is required for insulator function at H19 ICR and human β-globin locus at reporter plasmids; cohesin and CTCF are bound to the same (maternal) DNA molecules; controls transcription at the Igf2/H19 imprinted locus in both G1 and G2 cells (although cohesion does not occur in G1 cells) | Wendt et al., 2008 |
| | | Cohesin colocalizes with CTCF in mammalian cells (conventional ChIP and ChIP-on-chip) (70% of all identified cohesin and CTCF sites are co-occupied by both proteins); CTCF recruits cohesin to specific sites; insulator function of cohesin on transfected insulator plasmid is lost by siRNA-mediated depletion of either CTCF or Rad21 | Parelho et al., 2008 |
| | | Interacts with CTCF at the Myc insulator; recruitment of cohesin to chromosomal sites (Igf2/H19 and DM locus) depends on the presence of CTCF; colocalizes (within 1 kb) with CTCF in the human genome (ChIP-on-chip); some chromosomal sites interact exclusively with CTCF or cohesin | Rubio et al., 2008 |
Yy1 is a CTCF partner with a role in X-chromosome inactivation
Yy1 is a ubiquitous four-zinc-finger transcription factor that has been implicated in biological processes such as embryogenesis, differentiation, cell proliferation and tumorigenesis (Gordon et al., 2006). Homozygous Yy1 mouse mutants die early in development, whereas heterozygous animals are characterized by severe growth retardation and neurological defects (Gordon et al., 2006). It has been hypothesized that overexpression and/or activation of Yy1 are linked to loss of control of cell proliferation, although the molecular mechanisms remain elusive. Among the numerous potential mechanisms are effects on p53 expression and/or activity (Gordon et al., 2006) and stimulation of PARP1 activity (Griesenbeck et al., 1999). PARP1 stimulation might be of special interest, because PARP1 has been identified as a CTCF interaction partner (Yusufzai et al., 2004) and poly(ADP-ribosyl)ated forms of CTCF have been implicated in the control of transcription of imprinted genes and ribosomal DNA (Yu et al., 2004; Torrano et al., 2006; Caiafa and Zlatanova, 2009) (see below). Vertebrate Yy1 has also been implicated in polycomb group (PcG)-mediated functions because it can repress transcription in Drosophila and functionally compensates for loss of its Drosophila homologue, PHO (Atchison et al., 2003; Wilkinson et al., 2006). Yy1 recruits the PcG complex to DNA, resulting in methylation of histone H3K27 (Wilkinson et al., 2006); the introduction of methyl groups onto Lys27 in the tail of histone H3 is thought to be a mechanism through which PcG proteins repress expression of genes involved in embryonic development.
Yy1 has recently been identified as a CTCF cofactor that has a role in X-chromosome inactivation. Although the mechanism still remains unclear, it is worth noting that another CTCF partner, histone variant H2A.Z, has also been implicated in the inactivation process (Donohoe et al., 2007) (Fig. 2). In mammals, gene-dosage compensation between females (XX) and males (XY) occurs through a random inactivation of one of the two female X chromosomes. The inactivation process is complex and occurs through at least three genetically separable stages: (1) 'counting' of the X-chromosome-to-autosome ratio to ensure the inactivation of only one of the two X chromosomes; (2) 'choice' of the chromosome to be inactivated; and (3) the actual inactivation process, which is initiated by coating the designated inactive chromosome with the non-coding Xist RNA (Avner and Heard, 2001; Clerc and Avner, 2006; Erwin and Lee, 2008). CTCF has been implicated in the initial pairing of the two X chromosomes through their X-inactivation centers (Avner and Heard, 2001; Clerc and Avner, 2006; Erwin and Lee, 2008), in the 'choice' decision (e.g. Xu et al., 2007), and in the inactivation process itself (Pugacheva et al., 2005). CTCF is also involved in the function of boundary (insulator) elements that separate inactivated genes from rare 'escapee' genes that remain transcriptionally active in the context of the inactive X chromosome (Filippova et al., 2005). The interactions of Yy1 and CTCF are described in more detail in Table 1.
Next, we describe the role of CTCF in X-chromosome inactivation in more detail. The physical map of the region that specifies the sequences of the three non-coding RNAs involved in the inactivation process is presented in Fig. 2. On the future active X chromosome, Xite (X-inactivation intergenic transcription element) prolongs the antisense transcription of Tsix [X (inactive)-specific transcript, antisense], which in turn blocks transcription of Xist [X (inactive)-specific transcript] (Fig. 2); both CTCF and Yy1 transactivate Tsix. On the future inactive X chromosome, repression of Xite downregulates Tsix transcription, which in turn induces Xist transcription to initiate the inactivation process. In mouse cells, the Xist-Tsix region is characterized by the presence of ∼40 potential CTCF-binding sites, which are frequently paired with binding sites for Yy1 (Donohoe et al., 2007) (Fig. 2). CTCF directly interacts with Yy1, as shown in co-immunoprecipitation experiments; the high-affinity interaction between the two proteins involves mainly the N-terminus of CTCF (Donohoe et al., 2007). Finally, transient cotransfection experiments indicate that CTCF and Yy1 together confer higher transactivation on Tsix than either protein alone (Donohoe et al., 2007). The physical and functional interaction of CTCF with Yy1 during X-chromosome inactivation provides a clear example of how a specific function of CTCF is mediated by a specific protein partner.
Cohesin partners CTCF in gene regulation
The cohesin complex has a central role in holding the two sister chromatids in close contact from the time of DNA replication in S phase to the time of their separation at the onset of mitotic anaphase (reviewed by Hirano and Hirano, 2006; Hirano, 2006). Cohesin function is essential for genome stability and repair; several human developmental disorders, such as Cornelia de Lange syndrome and Roberts syndrome, are associated with mutations in cohesin components or the machinery that loads cohesin on chromatids.
The cohesin complex comprises four subunits: Smc1 and Smc3 are members of the structural maintenance of chromosomes (SMC) protein family, whereas Scc1 and Scc3 (subunit of the cohesin complex 1 and 3) are thought to participate in the formation of a ring structure around the two chromatids (Fig. 3; and see below). Two other non-SMC proteins, Scc2 and Scc4, are required in mammals to load the cohesin complex onto DNA. SMC proteins are large polypeptides of very unusual three-dimensional organization, in which two long α-helices fold back on themselves in an antiparallel orientation to form a rigid coiled-coil domain that has a hinge domain at one end and an ATP-binding 'head' domain at the other (Fig. 3A). Two SMC monomers dimerize at their hinge region to produce long V-shaped molecules. These dimers can form several alternative structures - rings, filaments and rosettes - through intra- and intermolecular interactions. The cohesins are proposed to form ring structures around the two sister chromatids (Haering et al., 2002).
Recently, a cohesion-independent function of cohesins has been recognized in yeast, Drosophila and mammals: they have been detected in post-mitotic cells that lack chromatid cohesion and have been implicated in gene regulation (for reviews, see Göndör and Ohlsson, 2008; Peric-Hupkes and van Steensel, 2008; Uhlmann, 2008; Gause et al., 2008). Four recent papers have reported a strong functional connection between cohesins and CTCF (Table 1). First, cohesin proteins and CTCF colocalize both at specific loci (Stedman et al., 2008; Rubio et al., 2008), including the Myc insulator element (MINE) (Gombert et al., 2003) (Fig. 3B) and genome-wide (Parelho et al., 2008; Rubio et al., 2008; Wendt et al., 2008). Second, CTCF recruits cohesin to specific sites, including the DM1 locus, which has a CTG repeat that is expanded in individuals with myotonic dystrophy (Fig. 3C) (Rubio et al., 2008; Cho et al., 2005). Third, in transient transfection experiments, the activity of insulator elements depends on the presence of cohesin proteins (Parelho et al., 2008; Wendt et al., 2008) (such effects have yet to be demonstrated on endogenous sites).
These studies, exciting as they are, raise a plethora of important questions. For example, what are the molecular interactions that are responsible for the colocalization of CTCF and cohesin? Although ∼70% of all sites identified as CTCF- and cohesin-binding sites are occupied by both proteins (Parelho et al., 2008), it is clear that there are sites occupied exclusively by CTCF or by cohesin (Rubio et al., 2008). Moreover, downregulation of CTCF does not interfere with mitosis (Parelho et al., 2008; Wendt et al., 2008), suggesting that the cohesion function of cohesin is independent of CTCF. A second question is whether the structure of cohesin is different at CTCF-dependent and CTCF-independent binding sites. Fluorescence recovery after photobleaching (FRAP) experiments suggest that this might be the case; they indicate the existence of two pools of cohesin at interphase (an immobile fraction that is irreversibly bound to chromatin and a dynamic fraction) (Gerlich et al., 2006). The existence of the two distinct cohesin pools is consistent with available biochemical data (Hirano and Hirano, 2006), which suggest the existence of two forms of chromatin-bound cohesin: the ring form that embraces two DNA helices tightly and steadily without interacting directly with DNA, and a less tightly bound form that interacts with DNA in a more conventional manner. The second structure might require other DNA-binding proteins, such as CTCF. We propose that the ring structure is involved in cohesion, whereas the conventional structure participates in gene regulation. Whether long-range chromosomal interactions (loops) are involved in gene regulation through CTCF and cohesin also remains to be directly addressed.
Thus, the interactions between CTCF and cohesin provide another important example of how different CTCF partners may underlie distinct CTCF functions. The cohesin complex should clearly be considered as an interaction partner that mediates the involvement of CTCF in gene regulation.
PARP1 partners CTCF in DNA methylation
Poly(ADP-ribose) polymerases (PARPs) are enzymes that catalyze the formation of poly(ADP-ribose) chains (PARs) on chromatin proteins, including themselves (D'Amours et al., 1999; Schreiber et al., 2006; Kraus, 2008). PARPs use the coenzyme NAD+ as a source of ADP-ribose moieties to synthesize protein-bound polymers of variable size (ranging from 2 to more than 200 units) and structural complexity (linear or branched); these polymers introduce negative charges onto the acceptor proteins, thus affecting their interactions with DNA and/or other proteins. The intracellular levels of PARs are under tight control; this involves dynamic formation of polymers by members of the PARP family (Ame et al., 2004) and their removal by poly(ADP-ribose) glycohydrolase (PARG) (Bonicalzi et al., 2005; Caiafa et al., 2008).
Heteromodification and automodification are the two processes through which PARPs introduce covalently bound ADP-ribose polymers onto other proteins or onto themselves, respectively. Automodification of PARPs is generally activated by nicks on DNA. PAR polymers on PARP1, which are attached at up to 28 sites in the automodification domain, are usually very long (up to 200 ADP-ribose units) and heavily branched (Juarez-Salinas et al., 1982). In addition, PARs (both protein-free and covalently linked to proteins) are capable of strong non-covalent binding (Malanga and Althaus, 2005) to specific proteins, the activity of which is then modulated by the bound polymers.
A PARP has been identified among the partners of CTCF in a proteomic search carried out on purified CTCF complexes (Yusufzai et al., 2004). Yu and colleagues (Yu et al., 2004) demonstrated that CTCF undergoes covalent poly(ADP-ribosyl)ation in the N-terminal domain. These authors found that the control of gene imprinting by CTCF is lost upon inhibition of PARP activity, and therefore suggested that PARylated CTCF is directly involved in the control of imprinting. PARylated CTCF has also been implicated in the control of ribosomal gene expression (Torrano et al., 2006; Caiafa and Zlatanova, 2009). Importantly, it has been recently shown that transient ectopic overexpression of CTCF induces PAR accumulation, PARP1 expression and PARylation of CTCF (Guastafierro et al., 2008). In vitro data from this paper have shown that CTCF can activate automodification of PARP1, even in the absence of nicked DNA; this finding is of great interest, because so far a burst of PARylation of PARP1 has generally been found only following introduction of DNA strand breaks. The persistence of high PAR levels over time affects the DNA methylation machinery: DNA-methyltransferase activity is inhibited, with the consequence that the genome becomes diffusely hypomethylated (Caiafa et al., 2008). Thus, the data of Guastafierro and co-workers (Guastafierro et al., 2008) provide, for the first time, evidence that CTCF is involved in the crosstalk between PARylation and DNA methylation, through its activation of PARP1 (which, in turn, leads to inhibition of DNA methylation) (Reale et al., 2005).
Nucleophosmin is a CTCF partner at insulator sites
Nucleophosmin is an abundant nuclear-matrix phosphoprotein, a large fraction of which is localized to the peripheral region of the nucleolus. It has been implicated in embryonic development and maintenance of genomic stability, mainly through its role in centrosome duplication (Grisendi et al., 2005). At the molecular level, nucleophosmin mediates diverse functions, including rDNA transcription, pre-ribosomal RNA processing, mRNA polyadenylation, and the stress response. It also participates in transport functions, chaperoning ribosomal subunits and/or histones from the cytoplasm to the nucleus and nucleoli. A recent study of the role of nucleophosmin in transcriptional regulation of rDNA has indicated that nucleophosmin is associated with the gene locus, maintaining an open chromatin conformation over the active copies of the rRNA genes by removing histones from the promoter (Murano et al., 2008).
Nucleophosmin was identified as a CTCF partner in a proteomic search (Yusufzai et al., 2004), and was the only protein in the soluble CTCF complex that was present in stoichiometric amounts. ChIP analysis of the two known insulator sites that flank the chicken β-globin gene locus confirmed the presence of CTCF at these sites. Remarkably, nucleophosmin was also present at both sites (Fig. 4) (Yusufzai et al., 2004). In human cell lines carrying multiple integrated copies of the chicken HS4 insulator (one of the insulators upstream of the β-globin gene locus), the insulator sites were preferentially localized to the nucleolar periphery. As in the case of the endogenous insulator sites at the β-globin gene locus (see above), CTCF colocalized with nucleophosmin at these integrated insulator sites; importantly, the peripheral nucleolar localization of insulator sites was dependent on the integrity of CTCF-binding sites. Thus, it was suggested that insulators are recruited to the periphery of the nucleolus through the strong interaction of CTCF with nucleophosmin (Yusufzai et al., 2004). It should be noted that these data concern only the relatively small portion of CTCF that is located in the nucleolus; a large fraction of CTCF is not bound to the nucleolus, and might not be associated with nucleophosmin (Yusufzai et al., 2004).
Finally, a recent study focused on chromosome translocations involving the immunoglobulin heavy chain (IgH) gene locus in certain cancer cells (Liu et al., 2008). Interestingly, CTCF and nucleophosmin colocalized at the 3′ regulatory elements of the IgH gene locus only in cells carrying the chromosome translocation; moreover, the cells could be growth arrested by nucleophosmin short hairpin RNA. The exact molecular mechanism behind these observations awaits further research.
The studies described here provide evidence that the insulator function of CTCF is mediated by its specific tethering to subnuclear sites, which in turn depends on its interactions with nucleophosmin. Thus, the insulator function of CTCF - similar to its functions in X-chromosome inactivation, gene regulation and DNA methylation - might require its interaction with a partner protein specific to that function.
Is RNA polymerase II a CTCF partner in transcriptional regulation?
The function of CTCF in transcriptional regulation is not well understood. However, a recent report has identified direct interactions between CTCF and the large subunit of Pol II (Chernukhin et al., 2007); we will discuss this paper in detail, as it contains data of potential relevance to the role of CTCF in transcriptional regulation.
In vitro, CTCF interacts equally well with the hypophosphorylated and the hyperphosphorylated forms of Pol II, which are known to be involved in transcription initiation and elongation, respectively (Chernukhin et al., 2007). In vivo, however, CTCF exhibits a significant preference for interaction with the hypophosphorylated Pol II form. This interaction is mediated by the C-terminal domain of CTCF (Fig. 1C), which contains the sites for phosphorylation of CTCF (Klenova et al., 2001; El-Kady and Klenova, 2005). Preliminary data (Chernukhin et al., 2007) indicate that in-vitro-phosphorylated CTCF has a lower affinity for Pol II, suggesting that the CTCF-Pol-II interaction might be subject to regulation by CTCF phosphorylation.
In an attempt to gain insight into the functional significance of the reported CTCF-Pol-II interaction, serial ChIP analysis (using anti-CTCF antibodies, followed by anti-Pol II antibodies, as bait) was used to interrogate the in vivo presence of the CTCF-Pol-II complex on the β-globin insulator (see above) (Chernukhin et al., 2007). Interestingly, CTCF colocalizes with Pol II at the insulator only in proliferating chicken erythroblasts that do not express the globin genes. In differentiated cells that transcribe two of the four globin genes in the cluster, the association of both proteins with the insulator is lost. The mechanisms behind these events remain to be determined. Further experiments in human choriocarcinoma cells transfected with wild-type or mutated H19 ICR (see above) demonstrated that the binding of Pol II to the ICR requires functional CTCF target sites. Finally, a single CTCF-binding site fused to a promoterless luciferase reporter gene conferred transcriptional activity on the gene in stably integrated constructs. This observation suggested that CTCF is a functional equivalent of TATA-box-binding protein (TBP), and thereby allows accurate transcription initiation at some promoters. This is certainly an interesting notion that deserves to be directly addressed in further experiments.
ChIP-on-chip experiments using a previously constructed library of CTCF-binding sites from mouse fetal liver (Mukhopadhyay et al., 2004) were used to identify sites that are co-occupied by CTCF and Pol II in proliferating and resting NIH 3T3 cells (Chernukhin et al., 2007). Only about 10% of the CTCF sites represented on the microarray interacted with Pol II. Of note, 15 out of the 26 sequences that bound to both CTCF and Pol II were not present in the mouse genome database, which contains almost exclusively euchromatic sequences. Thus, CTCF-Pol-II binding probably also occurs at heterochromatic sequences. Finally, the protein complex was also identified in intergenic regions that are 1.5-15 kb from the nearest gene. Chernukhin and colleagues (Chernukhin et al., 2007) suggest that the CTCF-Pol-II complexes at these sites remain intact until a signal for the release of Pol II is received; the released Pol II then initiates transcription of the neighboring genes from cryptic promoters.
An earlier study that is relevant to Pol-II- and CTCF-mediated insulator function showed that the presence of the chicken insulator HS4 on chromatinized episomes (ectopic, unintegrated DNA constructs that acquire characteristics of chromatin organization in the host cell) in human cells leads to accumulation of Pol II at the enhancer in the β-globin gene locus control region (Zhao and Dean, 2004). This suggested that, as part of its insulator function, CTCF blocks the transfer of Pol II from the enhancer to the promoter. Whether and how these observations relate to the more recent data (Chernukhin et al., 2007) remains to be seen.
More recently, a possible link between CTCF binding and Pol II occupancy was revealed in a genome-wide study (Barski et al., 2007), in which a tantalizing high-resolution profiling of histone methylation patterns in the human genome was undertaken. In addition to mapping 20 histone lysine and arginine methylations, the authors addressed the genome-wide localization patterns of Pol II, histone H2A.Z (see Table 1) and CTCF. Out of the ∼20,000 CTCF-binding sites, more than 6000 were in transcribed regions. Unfortunately, the CTCF sites that lie close to Pol II sites were excluded from further analysis to avoid complications in the interpretation of the methylation data, which was the main objective of that study.
The picture that emerges from the study by Chernukhin and colleagues (Chernukhin et al., 2007) is extremely complex; the authors suggest several possible functions of the CTCF-Pol-II complex that are context dependent. It is clear that numerous new questions (concerning the mechanism of a possible TBP-like function for CTCF, the presence and distribution of CTCF-Pol-II complexes at different genomic regions, etc.) arise from this study, and that significant experimental effort will be required to address them.
The CTCF-partner network
Above, we have presented and discussed evidence that connects CTCF with individual protein partners, particularly Yy1, cohesin, PARP1, nucleophosmin and Pol II. We have pointed out that the interactions of CTCF with each protein partner occur in a specific biological context. However, it has not escaped our attention that some of the partners are known to interact with each other, thus creating a rather complex network (Fig. 5). For example, nucleophosmin is a recognized partner of PARP1 (Meder et al., 2005), and PARP1 interacts with Yy1 (Oei and Shi, 2001a; Oei and Shi, 2001b). In addition, Yy1 directly interacts with another recognized CTCF partner, YB1 (Chernukhin et al., 2000; Li et al., 1997). It is clear that more research is needed to identify the possible protein interactions in the CTCF network, and to understand the biological contexts in which they work.
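As an illustration only, the interactions named in the preceding paragraph can be written down as a small undirected graph, which makes it easy to list the CTCF partners that are also reported to interact with each other. The sketch below is in Python; the function names (build_network, partner_crosstalk) are hypothetical, and the edges are restricted to those cited in the text. It is not a reconstruction of Fig. 5 and does not encode biological context or the strength of the evidence.

```python
from itertools import combinations

# Proteins described in the text as CTCF partners.
CTCF_PARTNERS = {"Yy1", "cohesin", "PARP1", "nucleophosmin", "Pol II", "YB1"}

# Partner-partner interactions cited in the text (undirected).
PARTNER_PARTNER_EDGES = [
    ("nucleophosmin", "PARP1"),  # Meder et al., 2005
    ("PARP1", "Yy1"),            # Oei and Shi, 2001a; Oei and Shi, 2001b
    ("Yy1", "YB1"),              # Li et al., 1997
]


def build_network():
    """Return an adjacency map: protein name -> set of interacting proteins."""
    network = {"CTCF": set()}
    for partner in CTCF_PARTNERS:
        network.setdefault(partner, set())
        network["CTCF"].add(partner)
        network[partner].add("CTCF")
    for a, b in PARTNER_PARTNER_EDGES:
        network[a].add(b)
        network[b].add(a)
    return network


def partner_crosstalk(network):
    """Return pairs of CTCF partners that also interact with each other."""
    partners = sorted(network["CTCF"])
    return [(a, b) for a, b in combinations(partners, 2) if b in network[a]]


network = build_network()
crosstalk = partner_crosstalk(network)
for a, b in crosstalk:
    print(f"{a} <-> {b}")
```

In this toy representation, cohesin and Pol II are connected to the network only through CTCF, simply because the text cites no interactions between them and the other partners; this should not be read as evidence that no such interactions exist.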
Concluding remarks
The data discussed in this Commentary show that CTCF possesses extreme flexibility, not only in terms of the diversity of its binding sites but also with respect to its numerous binding partners. It seems that CTCF performs its numerous functions by using different binding partners in different biological contexts. Two points deserve special mention. First, even with one and the same partner, CTCF is obviously performing a multiplicity of (sometimes seemingly antagonistic) functions. The CTCF-Pol-II interaction might provide a good example of such functional diversity, because such complexes might perform different functions depending on whether they are located in euchromatic or heterochromatic regions. Second, the various partners seem to interact with each other directly or indirectly, which is likely to contribute to the fine-tuning of CTCF function (Fig. 5). There is no doubt that new CTCF protein partners will be identified in the future; they will probably endow CTCF with distinct functions in distinct biological contexts, as the ones that are already recognized appear to do. Will we ever be able to understand this complexity? Is `divide and rule' the key to success in nature, as well as in society?
J.Z. is supported in part by NSF grant 0504239; P.C. is partially financed by Ministero della Salute, Italy.