Double-stranded RNAs (dsRNAs) and their 'diced' small RNA products can guide key developmental and defense mechanisms in eukaryotes. Some RNA-directed mechanisms act at a post-transcriptional level to degrade target messenger RNAs. However, dsRNA-derived species can also direct changes in the chromatin structure of DNA regions with which they share sequence identity. For example, plants use such RNA species to lay down cytosine methylation imprints on identical DNA sequences, providing a fundamental mark for the formation of transcriptionally silent heterochromatin. Thus, RNA can feed backwards to modulate the accessibility of information stored in the DNA of cognate genes. RNA triggers for DNA methylation can come from different sources, including invasive viral, transgene or transposon sequences, and in some cases are derived from single-stranded RNA precursors by RNA-dependent RNA polymerases. The mechanism by which RNA signals are translated into DNA methylation imprints is currently unknown, but two plant-specific types of cytosine methyltransferase have been implicated in this process. RNA can also direct heterochromatin formation in fission yeast and Drosophila, but in these organisms the process occurs in the absence of DNA methylation.
Chromatin, a complex of genomic DNA and histone proteins, allows chromosomes to be packaged into the relatively small volume of the nucleus. Along each chromosome, some regions are more loosely packaged into transcriptionally active euchromatin, whereas other regions are more tightly packaged into transcriptionally silent heterochromatin (Elgin and Grewal, 2003). Heterochromatin typically includes centromere-associated repeats, repeated gene arrays encoding ribosomal RNAs (rDNA) and transposable elements (Lippman et al., 2003; Vongs et al., 1993; Yoder et al., 1997). In the case of centromere and rDNA repeats, heterochromatin probably stabilizes these structures against rearrangements (Grewal and Klar, 1997; Maloisel and Rossignol, 1998; Xu et al., 1999). In the case of transposable elements, which are parasitic invaders of the host genome, heterochromatin suppresses their transcription and movement (Miura et al., 2001; Singer et al., 2001; Walsh et al., 1998).
In mammals and plants, heterochromatin is associated with cytosine methylation. Cytosine methyltransferases that add this covalent DNA modification can use hemi-methylated DNA as a substrate, providing a mechanism to maintain DNA methylation and heterochromatin after each round of DNA replication (Bestor, 2000). Heterochromatin is also associated with certain patterns of post-translational modifications on histone protein N-terminal 'tails', including a lack of acetylation on histone H3 and H4 tails, and methylation of histone H3 at the lysine 9 position (H3 mK9) (Gendrel et al., 2002; Peters et al., 2003; Soppe et al., 2002; Tariq et al., 2003). These histone modification patterns are thought to constitute a chemical code for heterochromatin assembly (Jenuwein and Allis, 2001; Lachner et al., 2003). The H3 mK9 mark guides cytosine methylation patterns in mice, Arabidopsis thaliana and the fungus Neurospora crassa (Jackson et al., 2002; Lehnertz et al., 2003; Malagnac et al., 2002; Tamaru and Selker, 2001; Xin et al., 2003). However, in Arabidopsis this relationship is complex because some mutations that reduce cytosine methylation also reduce H3 mK9 (Lippman et al., 2003; Soppe et al., 2002; Tariq et al., 2003). Furthermore, in Arabidopsis loss of histone deacetylase function can cause a decrease in cytosine methylation and H3 mK9 (Aufsatz et al., 2002b; Lippman et al., 2003; Probst et al., 2004).
A key question is what directs heterochromatin formation to only certain regions of the eukaryotic genome. In some cases, features inherent in the DNA sequence, its secondary structure or its organization in the nucleus might guide packaging into heterochromatin. In the case of the inactive X chromosome in female mouse cells, a cis-acting Xist RNA molecule coats the affected chromosome in a non-sequence-specific manner and triggers cytosine methylation and heterochromatin assembly (Avner and Heard, 2001; Lee, 2003). In plants, RNA signals can also target cytosine methylation and heterochromatin formation very precisely to identical DNA sequences. Below, we discuss the process of RNA-directed DNA methylation (RdDM) in plants and compare it with potentially related processes in other eukaryotes.
DNA methylation triggered by RNA viruses
The first demonstration that RNA can trigger the cytosine methylation of identical genomic DNA sequences came from experiments using an RNA viroid, a short circular infectious RNA species with a high degree of secondary structure, in tobacco (Wassenegger et al., 1994). Wassenegger et al. engineered tobacco plants to carry viroid-identical DNA sequences in their genomes on integrated transgenes. These target transgene sequences become efficiently cytosine methylated in strains in which the viroid is actively replicating but are not methylated in replication-deficient controls. More recently, several other plant RNA viruses have been shown to trigger methylation of identical DNA sequences during the course of infection (Jones et al., 1998; Jones et al., 1999; Wang et al., 2001). Because RNA viroids and viruses produce only RNA species during their replication cycles, these experiments provide clear evidence that RNA can communicate directly with matching DNA sequences.
Clues to the nature of the RNAs that attract the DNA methylation machinery came from the observation that RdDM often occurs together with RNA interference (RNAi) (Fig. 1). RNAi, which occurs in plants, animals and some fungi, is triggered by the presence of double-stranded RNAs (dsRNAs) (reviewed in Finnegan and Matzke, 2003). These dsRNAs are cleaved by the dicer class of ribonuclease into small 21-26 nucleotide RNAs. These small interfering RNAs (siRNAs) are taken up by an RNA-induced silencing complex (RISC), which includes an argonaute protein (Hammond et al., 2001) that acts as an RNA-binding factor (Song et al., 2003). The siRNA-RISC complex then directs the degradation of transcripts with sequences complementary to the siRNAs, with argonaute probably mediating transcript cleavage (Liu et al., 2004; Song et al., 2004).
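The dicing and target-recognition steps described above can be illustrated with a toy sketch. It assumes, purely for illustration, that dicer cuts one strand into consecutive fixed-size 21-nucleotide fragments and that a transcript counts as a target only if it contains a perfect complement of the siRNA; real dicer products range from 21 to 26 nucleotides, and the rules for target recognition are not specified here. The function names are hypothetical.

```python
# Toy sketch of dicing and siRNA-guided target recognition (illustrative only).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def dice(strand: str, size: int = 21) -> list[str]:
    """Cut one strand into consecutive fragments of `size` nucleotides.

    Assumption: fixed-size, end-to-end cuts; any shorter remainder is dropped.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [strand[i:i + size] for i in range(0, len(strand) - size + 1, size)]


def is_target(transcript: str, sirna: str) -> bool:
    """True if the transcript contains a perfect complement of the siRNA."""
    return reverse_complement(sirna) in transcript.upper()
```

Under these assumptions, every fragment diced from the antisense strand of a dsRNA recognizes the corresponding sense transcript, which is the property that lets siRNAs act as sequence-specific guides.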
The coincidence of RNAi and RdDM has been demonstrated in plant RNA virus systems. For example, when an RNA virus carrying green fluorescent protein (GFP) sequences infects tobacco that expresses GFP from an integrated transgene, GFP siRNAs are produced, GFP transcripts become degraded by RNAi and the GFP DNA sequences become methylated (Jones et al., 1999; Vaistij et al., 2002). The simplest view is that GFP siRNAs and/or their dsRNA precursors trigger GFP methylation as well as RNAi.
Support for the idea that dsRNA-derived species guide RNAi and DNA methylation comes from experiments using plant transgenes designed to produce high levels of dsRNA: typically a strong promoter that drives transcription through a perfect inverted repeat (IR) of the target sequence. For example, if such a transgene carrying an IR segment of the β-glucuronidase gene (GUS) is introduced into an Arabidopsis strain that already carries an expressed GUS reporter transgene, GUS transcripts are degraded by RNAi and GUS sequences become methylated (Béclin et al., 2002).
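To make the notion of a perfect inverted repeat concrete, the following minimal sketch checks whether the two ends of a transcript are reverse complements of each other, so that the transcript could fold back on itself to form dsRNA. The function names and the `arm_length` parameter are assumptions introduced for illustration, and any spacer between the two arms is simply ignored.

```python
# Minimal sketch: does a transcript carry a perfect inverted repeat (IR)?
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return rna.upper().translate(_COMPLEMENT)[::-1]


def is_perfect_inverted_repeat(transcript: str, arm_length: int) -> bool:
    """True if the last `arm_length` bases are the exact reverse complement
    of the first `arm_length` bases, so the two arms can pair as dsRNA."""
    if arm_length <= 0 or len(transcript) < 2 * arm_length:
        return False
    transcript = transcript.upper()
    left_arm = transcript[:arm_length]
    right_arm = transcript[-arm_length:]
    return right_arm == reverse_complement(left_arm)
```

For example, a transcript made of the arm `AUGGCA`, a short spacer and the arm `UGCCAU` passes the check, whereas a single mismatch in either arm fails it.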
DNA methylation factors that read RNA signals
In mammalian genomes, methylation occurs almost exclusively on cytosines in the symmetric dinucleotide context 5′-CG-3′. In plant genomes, CG is also the predominant methylation context. For example, centromere and rDNA repeat arrays carry mainly CG methylation (Vongs et al., 1993). However, in plant genomes methylation can also occur on cytosines in other contexts, including the symmetric context 5′-CNG-3′ and asymmetric contexts. This difference in methylation patterning reflects the different types of cytosine methyltransferase present in mammals versus plants. In mammals, Dnmt1 is the major cytosine methyltransferase responsible for maintaining CG methylation (Bestor, 2000). Plants have a Dnmt1 orthologue, MET1, which also maintains the majority of CG methylation (Finnegan et al., 1996; Kankel et al., 2003; Ronemus et al., 1996; Saze et al., 2003). But, in addition, they have two other structurally distinct cytosine methyltransferases, the CMT and DRM classes, which are not found in mammalian genomes (Cao et al., 2000; Finnegan and Kovac, 2000; Henikoff and Comai, 1998). The CMT class is primarily responsible for maintenance of CNG methylation, but also contributes to methylation in other contexts at some loci (Bartee et al., 2001; Cao and Jacobsen, 2002a; Lindroth et al., 2001; Papa et al., 2001). The DRM class is primarily responsible for establishing new methylation imprints (Cao et al., 2003; Cao and Jacobsen, 2002b) and has non-CG specificity in vitro (Wada et al., 2003).
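The three sequence contexts distinguished above (CG, CNG and asymmetric) can be assigned mechanically from the bases that follow each cytosine. The sketch below is one way to do so for a single strand read 5′ to 3′; the function names are hypothetical, cytosines on the opposite strand are not considered, and cytosines too close to the 3′ end to be classified are reported as "unknown".

```python
# Sketch: classify each cytosine on one DNA strand (read 5'->3') by context.
def cytosine_context(seq: str, i: int) -> str:
    """Return 'CG', 'CNG', 'asymmetric' or 'unknown' for the cytosine at index i."""
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    first = seq[i + 1] if i + 1 < len(seq) else None
    second = seq[i + 2] if i + 2 < len(seq) else None
    if first is None:
        return "unknown"      # no downstream base available
    if first == "G":
        return "CG"           # symmetric dinucleotide context
    if second is None:
        return "unknown"      # cannot distinguish CNG from asymmetric
    if second == "G":
        return "CNG"          # symmetric trinucleotide context
    return "asymmetric"


def classify_cytosines(seq: str) -> list[tuple[int, str]]:
    """List (index, context) for every cytosine in the sequence."""
    return [(i, cytosine_context(seq, i))
            for i, base in enumerate(seq.upper()) if base == "C"]
```

Applied to `ACGTCAGTCTTA`, the three cytosines fall into the CG, CNG and asymmetric classes, respectively.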
A notable feature of plant RdDM is that it affects cytosines in all possible sequence contexts (Aufsatz et al., 2002b; Jones et al., 1999; Melquist and Bender, 2003; Pélissier et al., 1999; Wang et al., 2001). For example, the RNA-viroid-infected tobacco system displays methylation on almost every available cytosine in the target transgene sequence (Pélissier et al., 1999). This dense methylation patterning suggests that trigger RNAs efficiently recruit DRM and/or CMT enzymes to their target DNA sequences to establish and maintain a high proportion of non-CG methylation, on top of the CG methylation patterns maintained by MET1 (Fig. 1). Interestingly, in Arabidopsis a subset of non-CG methylation is dependent on the function of the H3 K9 methyltransferase KYP/SUVH4 (Jackson et al., 2002; Lippman et al., 2003; Malagnac et al., 2002). This finding raises the possibility that H3 mK9 might be the first epigenetic modification made in response to dsRNA-derived species, with DNA methylation accumulating later in the process. Alternatively, the initial DNA methylation imprint established in response to an RNA signal might trigger transcriptional silencing, causing enrichment for H3 mK9, which then aids in recruitment of cytosine methyltransferases during maintenance of the silent state.
In systems where an initial RdDM imprint is established, and then the RNA trigger is removed, the target sequence either retains a reduced level of mostly CG methylation (Jones et al., 2001) or loses its methylation (Aufsatz et al., 2002a), probably depending on the length and cytosine content of the target sequence. In cases where some CG methylation is retained, this residual methylation is not stable, which presumably reflects the incomplete efficiency of MET1 in remethylating hemimethylated CG sites after DNA replication (Jones et al., 2001).
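The instability of residual CG methylation can be illustrated with a deliberately crude model. It assumes that each methylated CG site is independently remethylated with a fixed probability after every round of DNA replication and is otherwise lost from the lineage. Neither this model nor the efficiency value used below comes from the studies cited; they serve only to show how an imperfect maintenance step compounds over successive replications.

```python
# Toy model (an assumption, not a measured result): passive loss of CG
# methylation when maintenance after each replication is imperfect.
def retained_fraction(efficiency: float, replications: int) -> float:
    """Expected fraction of methylated CG sites still methylated after
    `replications` rounds, if each site is maintained independently
    with probability `efficiency` per round."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if replications < 0:
        raise ValueError("replications must be non-negative")
    return efficiency ** replications
```

With a hypothetical efficiency of 0.9, only about 81% of sites remain methylated after two replications, and the loss continues to compound unless new methylation is re-established.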
Pathways that generate RNA signals
In addition to RNA viruses and IR transgenes, many transgenes that do not carry IR sequences can nonetheless trigger both DNA methylation and RNAi by providing template RNAs that are converted into dsRNA (Fig. 2). Genetic screens in Arabidopsis using such transgenes as reporters have implicated putative RNA-processing factors including a predicted RNA-dependent RNA polymerase (RdRP), SGS2/SDE1 (recently re-named RDR6 (Xie et al., 2004)), and the AGO1 argonaute protein (Fagard et al., 2000; Morel et al., 2002; Mourrain et al., 2000) in this pathway. Thus, the pathway probably involves the RdRP-catalyzed synthesis of antisense RNA on template sense strands (Schiebel et al., 1998; Tang et al., 2003). The requirement for the AGO1 argonaute protein in the dsRNA synthesis pathway suggests that argonautes might mediate siRNA interactions not only in RISC but at other steps during RNAi.
An intriguing idea is that siRNAs hybridized to a target transcript might enhance the activity of an RdRP on that transcript, either directly by acting as primers or indirectly by making the template more generally accessible to the polymerase, thus amplifying levels of dsRNA and siRNAs (Martienssen, 2003; Vaistij et al., 2002) (Fig. 2). Studies by Vaistij et al. support the second model, showing that when siRNAs corresponding to only the 5′ end of a target transcript are introduced into a plant they can sometimes promote the spread of siRNAs and DNA methylation into sequences matching the entire extent of the target transcript (Vaistij et al., 2002). This spread of siRNA production depends on the RDR6 RdRP. The ability of siRNAs to promote new dsRNA synthesis has a number of interesting implications, including the possibility that even an 'off-target' transcript with only a short extent of sequence identity to a pool of siRNAs can be drawn into the dsRNA synthesis pathway (Garcia-Perez et al., 2004).
The Arabidopsis genome encodes six predicted RdRPs (Mourrain et al., 2000), ten predicted argonautes (Morel et al., 2002) and four predicted dicer-like (DCL) proteins (Finnegan et al., 2003; Schauer et al., 2002). Genetic studies of these RNA-processing factors suggest that they act in parallel pathways, sometimes with overlapping functions, probably depending on the localization and structure of the RNA substrate (Table 1). For example, whereas the RDR6 RdRP and the AGO1 argonaute are key factors for RNAi induced by non-IR transgenes (Dalmay et al., 2000; Fagard et al., 2000; Morel et al., 2002; Mourrain et al., 2000), the RDR2 RdRP and the AGO4 argonaute are required for the accumulation of siRNAs from particular endogenous sequences (Xie et al., 2004; Zilberman et al., 2003). Correspondingly, each of these factors is required for the maintenance of non-CG methylation on DNA sequences that have identity to the substrate RNA, illustrating the link between dsRNA and RdDM. In addition, AGO4 is required for the maintenance of DNA methylation triggered by IR transgenes, although in this case the production of siRNAs is not affected (Zilberman et al., 2004).
Similarly, the DCLs control distinct systems (Table 1). DCL1 is needed for processing a specialized class of small RNAs called micro-RNAs (miRNAs) that are derived from endogenous short dsRNA-encoding genes and that control plant development (Finnegan et al., 2003; Kasschau et al., 2003; Papp et al., 2003; Park et al., 2002; Reinhart et al., 2002). AGO1 has also been implicated in miRNA function (Kidner and Martienssen, 2004; Vaucheret et al., 2004). DCL2 is required for accumulation of siRNAs derived from infecting turnip crinkle virus (Xie et al., 2004). DCL3 is required for accumulation of siRNAs from the same endogenous transcripts that are targets for RDR2 (Xie et al., 2004). In addition, like RDR2, DCL3 is required for establishing and maintaining DNA methylation on the genes that correspond to substrate transcripts, although it has only partial effects on the maintenance of non-CG methylation (Chan et al., 2004; Xie et al., 2004). However, mutations in DCL genes have not been recovered in forward genetic screens for mutations that block transgene-induced RNAi, which suggests that the plant DCLs might function redundantly in processing transgene-derived siRNAs. Moreover, mutations in DCL genes have not been recovered in screens for mutations that reduce silencing from promoters targeted by RdDM. Therefore, it is currently unclear whether the DCL genes function redundantly in generating signals for RdDM, or whether in some cases RdDM is triggered by dsRNA precursors rather than diced siRNAs.
In plant transgene systems designed to trigger RNAi, DNA methylation typically accumulates only on the target gene coding sequences and does not extend into upstream promoter sequences (Dalmay et al., 2000; Mourrain et al., 2000). Nuclear run-on analysis shows that the unmethylated promoters continue to drive transcription through the downstream methylated sequences. This observation suggests that RdDM of coding sequences might be merely a neutral side-effect of RNAi. In keeping with this view, mutations in the DNA methylation machinery have not been recovered from RNAi mutant screens with transgene reporters. However, when mutations in the methylation machinery are crossed into a strain carrying a non-IR transgene reporter for RNAi, the progeny exhibit reduced efficiency of RNAi (Morel et al., 2000). This suggests that methylation of the transgene reporter reinforces RNAi by diverting some reporter transcripts into a dsRNA processing pathway.
Although it remains unclear whether RNA-directed methylation of coding sequences has functional consequences, it is evident from both transgene and RNA virus experiments that dsRNA can also trigger promoter methylation, which results in transcriptional silencing. For example, transgenes carrying highly transcribed IRs of target promoter sequences can direct DNA methylation and transcriptional silencing to both transgene and endogenous promoter targets in tobacco, Arabidopsis and petunia (Melquist and Bender, 2003; Mette et al., 2000; Sijen et al., 2001). Similarly, an RNA virus carrying 35S promoter sequences can methylate and transcriptionally silence a 35S-GFP transgene target in tobacco (Jones et al., 1999). Thus, targeted promoter methylation potentially provides an effective means of reverse genetics in plants. However, note that not all genomic sequences might be equally susceptible to RdDM. In some cases, in which an RNA virus carrying particular endogenous tobacco gene coding sequences infects tobacco, it induces RNAi but not DNA methylation of the target gene (Jones et al., 1999; Thomas et al., 2001). This finding could reflect unique features of some endogenous genes that protect them from RdDM or delay their efficient methylation during the time-frame of a viral infection.
RNA-directed DNA methylation as a genome defense mechanism
High levels of dsRNA produced by viral infections or highly transcribed transgenes can provoke DNA methylation. But RdDM can also affect endogenous genes expressed at lower levels. For example, the PAI tryptophan-biosynthesis genes in Arabidopsis are methylated at CG and non-CG cytosines in strains that carry a transcribed IR of PAI sequences, and this methylation depends on RNA produced from the IR (Luff et al., 1999; Melquist and Bender, 2003; Melquist et al., 1999). The majority of PAI transcripts are polyadenylated in the center of the IR; only a few extend into palindromic sequences to make dsRNA (Melquist and Bender, 2003). Furthermore, PAI siRNAs are not detectable by the standard gel blot methods used to detect such species derived from viruses or transgenes, and full-length PAI transcripts are not efficiently degraded by RNAi, which suggests that PAI siRNA accumulation is very low (Melquist and Bender, 2003; Melquist and Bender, 2004). These results indicate that RdDM can be triggered by trigger RNA levels lower than those needed for RNAi, as long as the source of RNA is continuously expressed over several plant generations. As discussed above, complete deletion of the PAI IR dsRNA source results in unstable, mostly CG, methylation on the remaining PAI sequences in the genome (Bender and Fink, 1995; Jeddeloh et al., 1998). However, in strains whose levels of PAI dsRNA are strongly reduced but not abolished, the residual methylation is stabilized (Melquist and Bender, 2003; Melquist and Bender, 2004). This suggests that, once an RdDM imprint is established, even very low levels of the RNA trigger can greatly improve the maintenance of the imprint. By extension, other sequences in plant genomes that carry mostly CG methylation and that have only low expression levels might nonetheless have originally been targeted for methylation by RNA signals.
Some cases of endogenous gene methylation, such as that displayed by the PAI genes, are accidental consequences of unusual rearrangements that produce trigger RNAs. But transposons represent a class of resident genomic sequence that is probably the intended target of RdDM as a means of defending the host genome against deleterious effects. Three lines of evidence argue that transposon sequences in plant genomes are methylated by RNA-based mechanisms. First, methylated transposons usually display both CG and non-CG methylation, which is indicative of RdDM (Kato et al., 2003; Lindroth et al., 2001; Lippman et al., 2003). Second, small RNAs corresponding to transposon sequences have been detected in Arabidopsis and tobacco (Hamilton et al., 2002; Lippman et al., 2003; Llave et al., 2002; Xie et al., 2004), which indicates that the transposon sequences can generate dsRNA. Third, mutations in specific RNA-processing factors block production of small RNAs and cytosine methylation of the retrotransposon element AtSN1 in Arabidopsis (Hamilton et al., 2002; Xie et al., 2004; Zilberman et al., 2003).
What features of transposon RNAs might make them uniquely susceptible to being processed as DNA methylation signals? As transposons invade host sequences, they land in sites where they can be fortuitously transcribed from nearby endogenous promoters (Fig. 2). Perhaps these aberrant read-through transcripts contain cues such as long untranslated segments or unusual secondary structures that make them favored templates for RdRP-catalyzed synthesis of dsRNA, as has been suggested from studies in worms (Sijen and Plasterk, 2003). Moreover, tandemly duplicated transposon sequences have the potential to amplify small RNA-primed dsRNA synthesis (Martienssen, 2003). Alternatively, fortuitous read-through transcripts of both sense and antisense strands of transposon sequences could pair with each other to make dsRNA. Support for the idea that the trigger RNAs are derived from species that originate outside the transposon sequences comes from the observation that methylation covers transposon ends as well as internal transposon coding sequence regions (Kato et al., 2003). Such end-to-end methylation has the potential to suppress elements that move by a cut-and-paste mechanism not only by transcriptional silencing of internal transposon promoters but also by blocking transposase action at transposon ends.
In fact, RNA-directed defense against transposons provides a raison d'être for both CG and non-CG methylation systems in plants. For example, the cut-and-paste CACTA class of transposon in Arabidopsis is methylated at CG and non-CG cytosines and does not move to new sites at a detectable frequency in wild-type plants (Kato et al., 2003; Miura et al., 2001). Removing just CG methylation by mutation of MET1, or removing just non-CG methylation by mutation of the major CMT gene CMT3, can partially reactivate CACTA transcription but is insufficient to activate CACTA movement to new sites (Kato et al., 2003). By contrast, if both CG and non-CG methylation patterns are removed by a met1 cmt3 double mutation, or by mutation of a chromatin-remodeling helicase, DDM1, that helps maintain global methylation patterns, then the CACTA elements become fully transcriptionally active and mobile (Kato et al., 2003; Miura et al., 2001). In some cases, the new transposon insertions cause pleiotropic mutations (Miura et al., 2001). This system illustrates that CG and non-CG methylation act redundantly to reinforce silencing at RNA-directed targets.
RNA-directed heterochromatin formation in fission yeast
Although RdDM involves plant-specific factors such as the DRM and CMT cytosine methyltransferases, a potentially related process of RNA-directed heterochromatin formation has recently been elucidated in fission yeast (Volpe et al., 2002). This organism lacks cytosine methylation, and instead uses histone H3 mK9 as a mark of heterochromatin. The heterochromatin protein SWI6 binds to histones containing the H3 mK9 mark to effect transcriptionally repressed states at centromeres and at the silenced mating-type locus. Strikingly, mutations in factors involved in generating small RNAs, including an argonaute, an RdRP and a dicer, block H3 mK9 and heterochromatin formation at centromeres and impair centromere function (Volpe et al., 2003; Volpe et al., 2002). Moreover, small RNAs corresponding to particular centromere sequences can be detected (Reinhart and Bartel, 2002). These small centromeric RNAs localize to their cognate DNA sequences and recruit the RNA-induced initiation of transcriptional gene silencing complex (RITS), which contains the argonaute protein, the heterochromatin-associated protein Chp1 and the novel protein Tas3 (Verdel et al., 2004). This complex provides a link between small RNAs and histone methylation.
Intriguingly, a similar link between RNAi, H3 mK9 and heterochromatin proteins has recently been demonstrated for silenced genes in Drosophila (Pal-Bhadra et al., 2004), suggesting that this fundamental mechanism is conserved in animal cells. In the Drosophila system, mutations in the RNAi genes aubergine, homeless and piwi all cause a loss of H3 mK9 and of binding of heterochromatin proteins to a transcriptionally silenced transgene reporter. Whether a similar link between RNAi and heterochromatin exists in mammals is being actively investigated (Fukagawa et al., 2004; Morris et al., 2004). But a specialized RNA-based mechanism guides heterochromatin formation and silencing to one of the two X chromosomes in female mouse embryonic cells to adjust the dosage of X-encoded gene expression to be equivalent to that in male cells (Avner and Heard, 2001; Lee, 2003). This X-inactivation mechanism involves a specialized 17 kb non-coding RNA molecule, Xist, which is X encoded. Xist is preferentially expressed from only one of the two female X chromosomes and builds up in cis along the chromosome from which it was transcribed. The Xist-coated chromosome is packaged into heterochromatin, which suggests that Xist recruits histone-modifying enzymes and cytosine methyltransferases. Thus, unlike the plant and yeast RNA-based mechanisms, the Xist mechanism involves a very long RNA trigger and a whole chromosome target.
A notable difference between plant RdDM and fission yeast RNA-directed heterochromatin formation is that the yeast heterochromatin can spread significantly outwards from the initiating region that produces the small RNAs, for example from a coding region into upstream promoter sequences (Schramke and Allshire, 2003). By contrast, sequencing of methylation patterns at the boundaries of plant RdDM-target regions shows that methylation is very precisely directed only to the regions of RNA-DNA sequence identity (Luff et al., 1999; Pélissier et al., 1999; Wang et al., 2001). The precision of this patterning suggests that an RNA-DNA paired species could initiate the DNA methylation imprint. However, because each nucleosome is wrapped by only approximately 150 base pairs of DNA (Elgin and Grewal, 2003), it is also possible that, as in yeast and flies, RNA first guides the H3 mK9 mark, which subsequently guides DNA methylation.
Conclusion and Perspectives
Plants use dsRNA, produced by viruses, transgenes and transposons, as the starting point for a variety of defense mechanisms, including RNA degradation by RNAi and heterochromatin formation through cytosine methylation of identical DNA sequences. How RNAs are aligned with matching DNA sequences to recruit cytosine methyltransferases remains unknown. But the identification of DRM and CMT cytosine methyltransferases as mediators of the non-CG methylation associated with RNA signals now provides biochemical tools to isolate associated protein components. RNA has also been demonstrated to direct heterochromatin formation in fission yeast and to guide specific mammalian silencing processes such as X chromosome inactivation. Thus, plant RdDM represents part of a continuum of RNA-based mechanisms for targeted gene silencing in eukaryotes.