Methylation of DNA in mammalian cells serves to demarcate functionally specialized regions of the genome and is strongly associated with transcriptional repression. A highly conserved family of DNA-binding proteins characterized by a common sequence motif is widely believed to convert the information represented by methylation patterns into the appropriate functional state. This family, the MBD family, has been characterized at both the biochemical and genetic levels. A key issue, given their highly similar DNA-binding surfaces, is whether the individual MBD proteins bind differentially to distinct regions within the genome and, if so, by what mechanism. Somewhat surprisingly, some MBD family members, such as MeCP2, have considerable selectivity for specific sequences. Other family members, such as MBD2, appear to bind with somewhat relaxed specificity to methylated DNA. Recent genetic and molecular experiments have shed considerable light on these and other issues relevant to the chromosomal biology of this interesting protein family.
The MBD family
Eukaryotic chromosomes are the repositories of the genetic information necessary to direct the synthesis of cellular components. Information is embedded in them at multiple levels. In addition to the genetic information contained in the sequence of nucleotide bases, chromosomes contain an `epigenetic code' that provides information crucial to regulation of the DNA itself. One component of this results from a system that covalently modifies cytosine residues by methylation at the 5 position of the pyrimidine ring (Bestor, 1990; Bird and Wolffe, 1999; Jaenisch and Bird, 2003). In almost all cases, mammalian DNA methylation occurs solely within the context of a simple palindromic sequence, CG, in which both cytosine residues are methylated. The methyl groups protrude into the major groove of DNA, providing novel functional moieties available for molecular interactions within this key surface of the double helix. The methylated fraction of the genome includes such interesting loci as imprinted genes, the inactive X chromosome, and transposable elements and their relics. These regions are strongly repressed and DNA methylation is believed to play an integral role in establishment and/or maintenance of this repression.
The DNA methylation pattern is believed to be `read' by a conserved family of proteins, the MBD family (Jaenisch and Bird, 2003; Wade, 2001a). These proteins share a common motif, the methyl CpG binding domain (MBD) (Hendrich and Bird, 1998). This was initially identified in MeCP2 more than a decade ago (Nan et al., 1993). The structures of MBD motifs from three different MBD proteins have been solved and their overall similarity indicates that all MBD-containing proteins are likely to adopt a similar fold (Heitmann et al., 2003; Ohki et al., 1999; Wakefield et al., 1999). The MBD forms a wedge-shaped structure (Fig. 1) composed of a β-sheet superimposed over an α-helix and loop. Amino acid side chains in two of the β-strands along with residues immediately N-terminal to the α-helix interact with the cytosine methyl groups within the major groove, providing the structural basis for selective recognition of methylated CpG dinucleotides (Ohki et al., 2001; Wade and Wolffe, 2001).
The MBD family has five known members in mammals (Hendrich and Tweedie, 2003). MeCP2 was identified biochemically in the early 1990s (Lewis et al., 1992). Subsequently, information gleaned from EST and genomic sequencing projects led to the identification of four additional proteins (Cross et al., 1997; Hendrich and Bird, 1998). The primary structures of these proteins bear little resemblance to each other outside the MBD motif (Fig. 2). An exception to this general rule is the case of MBD2 and MBD3, which have substantial sequence similarity. Interestingly, mammalian MBD3, unlike its amphibian counterpart, fails to selectively recognize methylated DNA owing to substitution of a critical tyrosine residue within the MBD motif with phenylalanine (Fraga et al., 2003). Four members of the MBD family are believed to function, at least in part, in transcriptional repression (Bird and Wolffe, 1999; Hendrich and Tweedie, 2003; Wade, 2001b). The fifth MBD protein, MBD4, has DNA N-glycosylase enzymatic activity and probably functions in DNA repair (Hendrich et al., 1999). In most cases, the MBD proteins are expressed ubiquitously (Hendrich and Bird, 1998; Meehan et al., 1992).
The MBD family represents an important class of chromosomal proteins. Its members associate with protein partners that play active roles in transcriptional repression and/or heterochromatin formation. Indeed, the maintenance of transcriptional silence in the methylated fraction of the genome appears to be an issue of fundamental importance to mammals. While the general properties of MBD proteins firmly tie the family to transcriptional repression, important questions regarding their biological action remain unanswered.
Genomic distribution of binding sites - random or specific?
In the absence of detailed biochemical characterization of all the determinants influencing the association of MBD proteins with genomic chromatin, one can picture two possible scenarios for nuclear distribution of members of the MBD family. If their binding to chromatin is influenced solely by the interaction of the MBD motif with methylated CpG sequences - the `random interaction' model - then one might expect a given methylated locus to be randomly associated with different MBD proteins within a population of cells (see Fig. 3A). Obviously, the randomness of association would be strongly influenced both by the relative abundance of individual MBD proteins within a given cell and by the affinity of a given MBD motif for methylated CpG. By contrast, if the association of individual MBD proteins with a given methylated locus is influenced by other factors - the `specific interaction' model - one might expect non-random association (see Fig. 3B).
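The contrast between the two models can be made concrete with a small numerical sketch. It assumes, purely for illustration, that a methylated site is always occupied by exactly one MBD protein and that each protein competes with a weight equal to its relative abundance multiplied by its relative affinity. The function names, the abundance and affinity values, and the 50-fold locus-specific bonus are all hypothetical and are not taken from any measurement.

```python
from typing import Dict


def occupancy_fractions(abundance: Dict[str, float],
                        affinity: Dict[str, float]) -> Dict[str, float]:
    """Expected fraction of cells in which one methylated site is bound by
    each protein, under the simplifying assumption of pure competition
    (weight = relative abundance * relative affinity)."""
    weights = {name: abundance[name] * affinity[name] for name in abundance}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def with_locus_bonus(affinity: Dict[str, float], protein: str,
                     factor: float) -> Dict[str, float]:
    """Return a copy of the affinities in which one protein gains an extra,
    locus-specific contribution (e.g. flanking sequence or a partner)."""
    adjusted = dict(affinity)
    adjusted[protein] = adjusted[protein] * factor
    return adjusted


# Hypothetical relative values; not experimental data.
abundance = {"MeCP2": 2.0, "MBD1": 1.0, "MBD2": 1.0}
affinity = {"MeCP2": 1.0, "MBD1": 1.0, "MBD2": 1.0}

# Random interaction model: only abundance and methyl-CpG affinity matter.
random_model = occupancy_fractions(abundance, affinity)

# Specific interaction model: an additional determinant favours MeCP2 here.
specific_model = occupancy_fractions(
    abundance, with_locus_bonus(affinity, "MeCP2", 50.0))

print(random_model)
print(specific_model)
```

With equal affinities, occupancy simply mirrors relative abundance, so every protein is found at the locus in some fraction of cells. Once one protein gains an additional binding determinant at that locus, it is expected to dominate the site across the population.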
Any factor other than MBD-methyl-CpG interaction that influences binding energy should influence the observed randomness. Indeed, several lines of evidence support the conclusion that the interaction of some MBD proteins with chromosomal DNA is influenced by other determinants. First, consider the documented interactions of members of this family with naked DNA. Although discovered as a factor that has high affinity for methyl-CpG-containing DNA (Lewis et al., 1992), MeCP2 was also subsequently identified as a high-affinity binder to a matrix attachment region (MAR) from the chicken lysozyme gene (Weitzel et al., 1997), which indicates that its interaction with DNA might be influenced by the DNA sequence as well as DNA methylation. Subsequent experiments established that MeCP2 has more than one DNA-binding interface, a feature it shares with MBD1 (Fujita et al., 2000; Meehan et al., 1992). The recent finding that consensus binding sites for MeCP2 include additional sequence information outside the methylated CpG dinucleotide (Klose et al., 2005) reinforces the concept that the determinants for interaction of the MBD family with naked DNA may not be as simple as previously believed. The additional contacts could permit members of the MBD family to discriminate specific methylated regions of the genome.
MBD family members might also have locus-specific functions imparted through association with other nuclear factors. In fact, several members of the MBD family are components of protein complexes. MeCP2 has been reported to interact with the transcriptional corepressor Sin3a (Jones et al., 1998; Nan et al., 1998) and with the SWI/SNF component Brahma (Harikrishnan et al., 2005), which indicates association with chromatin-modifying enzymes that actively regulate transcription. However, recent biochemical characterization of native MeCP2 from rat brain has failed to substantiate these findings and indicates that MeCP2 may exist primarily as a monomer (Klose and Bird, 2004). Nevertheless, a subset of MBD proteins stably associates with other factors in nuclei. MBD3 is a bona fide component of the chromatin-remodeling enzyme Mi-2/NuRD (Wade et al., 1999; Zhang et al., 1999). MBD2 is also reported to associate with NuRD subunits (Feng and Zhang, 2001; Le Guezennec et al., 2006). The biochemistry of MBD1 is somewhat less certain. The protein was initially reported to be a component of the MeCP1 complex (Cross et al., 1997), although this finding has subsequently been questioned (Ng et al., 1999). The biochemical details of the interaction of MBD1 with other proteins have yet to be established, but clearly the protein does stably interact with other nuclear factors, including proteins involved in histone methylation (Ichimura et al., 2005; Sarraf and Stancheva, 2004). While the biochemical description remains incomplete for some family members, it nonetheless seems highly plausible that association of these proteins with other nuclear factors affects their nuclear distribution and function.
Finally, one must also consider the interaction of MBD proteins with their native substrate, chromatin. Surprisingly, the interaction of MBD proteins with nucleosomal DNA has not been extensively addressed in the literature. Wolffe and co-workers found that the MBD motif of MeCP2 can stably associate with nucleosomal DNA if the methylated cytosine groups are accessible and not located on the surface of DNA directly facing the histone octamer (Chandler et al., 1999). Hansen and colleagues have recently provided a glimpse into the interactions between MeCP2 and a chromatin fiber. They found that MeCP2 has a very high affinity for chromatin, and this is independent of the methylation status of the underlying DNA (Georgel et al., 2003). Further, the presence of MeCP2 in a near 1:1 ratio with nucleosomes results in the formation of a novel, highly compact chromatin structure, which suggests that at least part of the impact of MeCP2 results from its effects on long-range chromatin architecture (Georgel et al., 2003). This study underscores the importance of considering the interactions of MBD family proteins with the protein components of the chromatin fiber. To what extent such interactions influence the distribution of other MBD family members remains a matter of conjecture.
Genetic analysis of MBD family members - one gene, one MBD?
Ultimately, genetic analysis provides the potential for understanding the biological roles of the MBD protein family in vivo. Animals lacking MeCP2 (Chen et al., 2001; Guy et al., 2001), MBD1 (Zhao et al., 2003), MBD2 (Hendrich et al., 2001), MBD3 (Hendrich et al., 2001) and MBD4 (Millar et al., 2002) have now been established. In contrast to animals lacking DNA methyltransferases, which fail to develop or die shortly after birth (Jaenisch and Bird, 2003), most MBD-deficient animals do not have dramatic phenotypes. The exception is MBD3: null mutants fail to survive embryogenesis (Hendrich et al., 2001). In all other instances, the animals survive to adulthood, although with varying abnormalities (Chen et al., 2001; Guy et al., 2001; Hendrich et al., 2001; Millar et al., 2002; Zhao et al., 2003). At least two possible explanations for this disparity between the phenotypes of DNA-methyltransferase-mutant animals and those of MBD-mutant animals are conceivable. First, loss of a single MBD family member might be compensated for by the action of other methyl-CpG-binding proteins. For instance, the random interaction model would predict that loss of any one MBD family member should not have dramatic consequences. Alternatively, the MBD family may not constitute the only proteins able to `read' DNA methylation. Indeed, there is excellent evidence that other proteins are capable of binding specifically to methylated DNA (Filion et al., 2006; Prokhortchouk et al., 2001).
Detailed analyses of animal models have identified specific molecular defects associated with loss of individual MBD proteins. For instance, MBD2 deficiency is associated with subtle but important changes in the abundance of transcripts for certain cytokines crucial to the process of T-lymphocyte differentiation (Hutchins et al., 2002). Lack of MBD2 is also associated with a decreased incidence of tumors of the colon promoted by mutation of the adenomatous polyposis coli gene (APC) (Sansom et al., 2003). In addition, lack of MeCP2 is associated with specific neurological defects in the mouse that mimic the symptoms observed in the human neurological disorder Rett Syndrome, which is caused by mutation in the human MECP2 gene (Amir et al., 1999). These could result from aberrant expression of genes now known to be regulated by MeCP2, including some known to be important in neuronal development or differentiation (Ballas et al., 2005), signaling (Chen et al., 2003; Martinowich et al., 2003) and stress responses (Nuber et al., 2005). Defective genomic imprinting of the DLX5 locus in mice lacking MeCP2 may also contribute to their phenotype and be relevant to Rett Syndrome (Horike et al., 2005). Finally, loss of MBD1 function is associated with neuronal defects, potentially related to the subtle upregulation of a specific class of endogenous retroelements (Zhao et al., 2003). These examples of regulation of specific classes of transcripts support the specific interaction model for MBD protein function.
In many instances, loss of a specific MBD protein does not result in rampant, unrestrained expression of the corresponding target genes. Rather, in most cases, the resulting loss of repression is rather subtle. This could result from partial loss of repression in all cells in a given population or from more dramatic changes in expression in a small number of cells. In at least one instance, that of cytokine expression during T-lymphocyte differentiation (Hutchins et al., 2002), the loss of MBD2 appears to affect only some cells within the population. Loss of any given MBD protein might therefore not lead to rapid gene activation but simply increase the probability that a silenced gene is reactivated. This process must overcome not only barriers imposed by DNA modification, but also barriers inherent to local chromatin architecture. MBD family proteins could thus act as a `locking' mechanism, ensuring that genes repressed through the action of other components of chromatin-modification systems remain silenced.
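The two explanations for subtle derepression are indistinguishable in a bulk measurement, which a toy calculation makes explicit. The sketch below uses invented numbers only: 1000 cells, expression scaled from 0 (silent) to 1 (fully active), and a 10% effect size. None of these values comes from the studies cited above, and the function names are hypothetical.

```python
from statistics import mean
from typing import List


def uniform_partial(n_cells: int, level: float) -> List[float]:
    """Every cell shows the same partial loss of repression."""
    return [level] * n_cells


def rare_full(n_cells: int, fraction_on: float) -> List[float]:
    """A small fraction of cells is fully derepressed; the rest stay silent."""
    n_on = round(n_cells * fraction_on)
    return [1.0] * n_on + [0.0] * (n_cells - n_on)


def fraction_expressing(cells: List[float]) -> float:
    """Fraction of cells with any detectable expression."""
    return sum(1 for level in cells if level > 0) / len(cells)


# Hypothetical populations chosen so that both have the same bulk signal.
cells_a = uniform_partial(1000, 0.1)   # all cells at 10% of full expression
cells_b = rare_full(1000, 0.1)         # 10% of cells at full expression

mean_a = mean(cells_a)
mean_b = mean(cells_b)
expressing_a = fraction_expressing(cells_a)
expressing_b = fraction_expressing(cells_b)

print(mean_a, expressing_a)
print(mean_b, expressing_b)
```

Both populations give the same population-average signal, yet they differ in how many cells express the gene at all. Only measurements with single-cell resolution can separate the two scenarios, and the second scenario corresponds to the view that loss of an MBD protein raises the probability of reactivation rather than activating the gene in every cell.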
An obvious approach to addressing the issue of MBD redundancy is to make double- and triple-mutant animals. MBD3-/- animals fail to develop. When MBD3+/- animals are crossed with MBD2-null animals, MBD3 heterozygotes appear at a lower frequency than expected, indicating that loss of MBD2 function exacerbates the loss of one allele of MBD3 (Hendrich et al., 2001). However, as indicated above, mammalian MBD3 does not interact specifically with methylated DNA. A more relevant double mutant animal is the MECP2-/-, MBD2-/- double mutant. These animals are indistinguishable from MECP2-null animals, indicating a lack of detectable genetic interaction (Guy et al., 2001). There are different possible interpretations of these data. The fact that MBD2 and MECP2 mutant animals have different phenotypes supports the specific interaction model, implying that each factor regulates a set of genes not regulated by the other. Alternatively, the lack of genetic interaction at the level of gross phenotype (the double-mutant animal is no worse than the MECP2-null animal) can be construed as evidence for the random interaction model (Fig. 3). More data will be required to resolve this issue. The creation of animals lacking MeCP2, MBD1 and MBD2 would directly address the possibility that MBD1 compensates for the loss of MBD2, loss of MeCP2, or both. To date, such an animal has not been reported in the literature.
Molecular analysis of MBD proteins
The emerging genetic studies have identified several loci as candidates for directed action of one, and only one, MBD family member. This conclusion is supported in part by molecular studies. Chromatin immunoprecipitation experiments used in combination with morpholino-based protein depletion have provided an interesting picture of the distribution of selected MBD proteins on methylated DNA in primary human cells (Klose et al., 2005). Following a biochemical selection for DNA sequences associated with MeCP2, Klose et al. asked whether this was the only MBD protein associated with the enriched clones. MeCP2, and not MBD1 or MBD2, was the only MBD protein found at most sites examined (11 of 12). Thus, they judged the in vivo overlap of these proteins to be minimal. Following depletion of MeCP2 using a morpholino approach, they found that MBD2 occupied ∼50% of the sites previously bound only by MeCP2. Converse experiments revealed a number of methylated chromosomal sites bound by MBD2 and not MeCP2. Following depletion of MBD2 by morpholino, MeCP2 was present at only a very small percentage (3 of 25) of sites previously occupied by MBD2. These results provide a glimpse into the behavior of MBD proteins on methylated DNA in primary human cells. Although the number of sequences sampled is not large, the data are entirely consistent with MeCP2 following the specific interaction model. Surprisingly, the data are also entirely consistent with MBD2 following the random interaction model.
Several independent reports, largely performed on transformed cell lines, have suggested that most MBD proteins follow a random distribution. For instance, both MeCP2 and MBD2 increase in abundance during muscle cell differentiation concurrently with global changes in pericentric heterochromatin. Importantly, exogenous expression of either MeCP2 or MBD2 can mimic this effect, arguing for functional redundancy of these two proteins in this particular differentiation event (Brero et al., 2005). Studies of the inactive, methylated estrogen receptor alpha locus in human breast cancer cell lines revealed the presence of MeCP2, MBD1 and MBD2. Treatment with inhibitors of DNA methyltransferases and histone deacetylases led to release from the inhibition imposed by these MBD family members and gene reactivation (Sharma et al., 2005). A genome-wide study of binding sites for various MBD family members by chromatin immunoprecipitation in both transformed and primary human cells revealed genes that appeared to be associated with more than one MBD protein as well as genes that appeared to be associated with a single MBD family member (Ballestar et al., 2003). Neither the random nor the specific interaction model (Fig. 3) can thus explain all the data, leaving the possibility that both models are correct in some, but not all, cases.
A long-held hypothesis regarding the MBD family of proteins is that their function is dedicated to reading DNA methylation. Evaluation of the ability of these proteins to interact with other nucleic acids has revealed that MBD1, MBD2 and MeCP2 have the capacity to bind RNA with nanomolar affinity (Jeffery and Nakielny, 2004). In addition, MeCP2 has been recently reported to interact with RNA-binding proteins and to regulate mRNA splicing (Young et al., 2005). These results remind us that the full range of biological functions of the MBD family may not yet be adequately described.
Summary and perspectives
How epigenetic information is duplicated during the cell cycle and how this information is read and translated into a functional state in a chromosome remain unanswered questions. The discovery of a family of proteins with the capacity to `read' the DNA methylation mark has provided important insights into how this information is deciphered. Although we still cannot decisively pinpoint mechanisms by which these proteins `translate' the information content into chromosome function, much progress has been made in this area. The availability of animal models lacking MBD family members has proven to be an important tool for deciphering the complex interplay of MBD proteins with specific genes. Future experiments must address more fully the biochemistry of individual MBD family members and settle the question of how their binding sites are distributed within the genome. Current data seem to indicate that, at some loci, the action of a unique MBD protein is essential for proper regulation. At other genes, functional redundancy seems likely. Sorting out the answers to these questions will require further experimentation, using existing tools as well as novel reagents and models.
The authors acknowledge financial support from the Intramural Research Program of the National Institute of Environmental Health Sciences, NIH. We express our gratitude to Drs Lisa Chadwick, Brian Chadwick, Chris Geyer and Karen Adelman for comments on the manuscript. We apologize to our many colleagues whose work could not be cited here owing to space considerations.