In eukaryotes, motifs such as silencers, enhancers and locus control regions act over thousands of base pairs to regulate adjacent genes; insulators limit such effects, and barriers confine repressive heterochromatin to particular chromosomal segments. Recent results show that many of these motifs are nongenic transcription units, and two of them directly contact their targets lying further down the chromosome to loop the intervening DNA: the barriers (scs and scs') flanking the 87A7 heat-shock locus in the fly contact each other, and a locus control region touches the β-globin gene in the mouse. I hypothesize that the act of transcription underlies the function of these regulators; active polymerizing complexes tend to cluster into `factories' and this facilitates molecular contact between the transcribed regulator and its distant (and transcribed) target.
In bacteria, most regions regulating transcription lie within a few tens of base pairs of the transcriptional start site. They generally bind repressors and activators, and the bound regulators contact the transcription machinery to vary initiation rates up to 1000-fold. In higher eukaryotes, an additional but ill-defined nuclear `context' acts over thousands of base pairs to regulate transcription by another 10,000-fold or more (Ivarie et al., 1983). Although a histone `code' forms part of this context (Jenuwein and Allis, 2001; Kurdistani and Grunstein, 2003), there must be additional layers of control; yeast mutants lacking the N-terminal tails that carry the code are viable (Grunstein, 1990), and depleting histone H4 has little effect on the expression of most genes (Wyrick et al., 1999). Here, I discuss recent evidence showing that many regulatory motifs acting at a distance are transcription units, and that clusters of such units organize the chromatin fibre into loops. I suggest that the act of transcription of the regulatory motifs underlies the way they work and constitutes an under-recognized part of this context. I restrict discussion to a few of the best-characterized regulatory motifs, although many other examples could be cited.
An important distinction is made between genes (which usually encode proteins) and transcription units (which might be copied into noncoding transcripts). It is now estimated there are tenfold more transcription units than genes in the human genome (Kapranov et al., 2002), and very little is known about the former. Most of these extra transcription units do not encode recognizable structural RNAs like rRNA or tRNA, and most are not found in current transcriptomes [as they are copied into unstable transcripts that lack the poly(A) tails usually used to select RNA molecules for analysis]. A small fraction of these extra units will be pseudogenes, as sequence analysis reveals that many of the ∼20,000 human pseudogenes have functional promoters. [Note that transcription of one pseudogene (Makorin1-1p) regulates the expression of its coding counterpart (Hirotsune et al., 2003).] A larger fraction is probably made up of repeated sequences, because many of the retrotransposons in those repeats also contain functional promoters. However, most are probably just `junk' and destroyed rapidly by the nonsense-mediated decay pathway (Iborra et al., 2001). I will argue that nature utilizes neighbouring transcription units (whether they be genic or nongenic) in gene regulation.
Heterochromatin: a repressive context
Heterochromatin was originally defined as the part of a mitotic chromosome like a centromere that remains condensed and dark-staining during interphase. It is characterized by the inaccessibility of its DNA to nucleases, methylation of cytosines in CpG doublets, methylation of lysine 9 in histone H3, a regular nucleosomal spacing, hypoacetylation of nucleosomes, and the presence of specific proteins, such as HP1 (Richards and Elgin, 2002). It contains few genes, and so was traditionally thought to be transcriptionally inert; however, we now know it contains many active transcription units. For example, wheat centromeres provide extreme examples of the deepest constitutive heterochromatin, and yet the density of transcription sites in this heterochromatin equals that of euchromatin (Abranches et al., 1998). [This was shown by permeabilizing cells in which centromeres and telomeres were found at opposite ends of nuclei, extending nascent transcripts in Br-UTP, and then immunolabelling the resulting Br-RNA.] In addition, many genes in the facultative heterochromatin of the so-called `inactive' X chromosome of female mammals turn out to be active (Sudbrak et al., 2001), and the heterochromatin on the human Y chromosome contains many transcription units (Skaletsky et al., 2003). Recent work even shows that transcription from within the heterochromatin of fission yeast is required to maintain the heterochromatic state (Volpe et al., 2002).
Heterochromatinization inactivates most genes (Gerasimova and Corces, 2001; Donze and Kamakaka, 2002; Schedl and Broach, 2003). During his famous experiments on the effects of X-rays on Drosophila, Muller noticed some mutants with mottled red and white eyes (Sturtevant and Beadle, 1962). This was subsequently traced to the translocation of the white gene from euchromatin to heterochromatin (Baker, 1968). Translocation – and not mutation of white – causes this variegation, because the phenotype reverts on relocation of the gene back to euchromatin. white encodes a membrane protein that transports pigment precursors into eye cells, and so deficient eyes appear shades of orange to white. white has a weak promoter without a canonical TATA box that is sensitive to adjacent sequences. Inactivation of white in some cells in the primordial eye disc early in development coupled to the subsequent inheritance of that inactivity leads to patches with and without pigment. Inactivation can be accompanied by the spread of heterochromatin hundreds of kilobases into juxtaposed euchromatin, with a consequential alteration in the banding pattern in the giant chromosomes of salivary glands. This spread propagates down one fibre and laterally to others through various modifications (e.g. histone hypoacetylation) until some `barrier' is encountered (Donze and Kamakaka, 2002; Schedl and Broach, 2003). Genetic screens in Drosophila originally revealed that almost any euchromatic gene can be inactivated by such position effects (Lindsley et al., 1960; Baker, 1968), and this is now being amply confirmed in a wide range of higher eukaryotes by experiments using reporters encoding the green fluorescent protein (GFP) (Spector, 2003).
Current models for the way regulatory motifs act at a distance are of three non-exclusive types (Gerasimova and Corces, 2001; Schedl and Broach, 2003). They involve altering the balance between euchromatin and heterochromatin, and the motifs are seen as binding sites for proteins that induce (1) chemical modifications and/or structural alterations that propagate down the fibre, (2) looping to bring regulatory elements into contact with distant promoters, or (3) relocation of elements to permissive (or restrictive) nuclear compartments.
Transcription units as barriers, boundaries, insulators, enhancers, LCRs
I now describe examples of the best-characterized regulatory motifs; significantly, all prove to contain promoters and/or binding sites for transcription factors.
The first barrier segregating euchromatin from heterochromatin was defined in the Drosophila 87A7 heat-shock locus (Fig. 1A) (Kellum and Schedl, 1991). This locus possesses flanking `specialized chromatin sequences' (scs and scs') that each contain nuclease-hypersensitive sites and a nuclease-resistant core to which specific transcription factors bind (SBP/Zw5 binds to scs; BEAF32/DREF binds to scs'). When flies are transformed with wild-type white, some flies have wild-type eyes, and others have mutant ones. When white integrates into euchromatin it is expressed, and when integrated into heterochromatin it is repressed. But when it is flanked by scs and scs', most flies have wild-type eyes. Both scs and scs' contain promoters (Avramova and Tikhonov, 1999). Many similar barriers have now been identified (Gerasimova and Corces, 2001; West et al., 2002); hypersensitive site 4 (HS4) of chicken β-globin (see below) protects white from fly heterochromatin, and active transcription units insulate GFP-encoding reporters from human heterochromatin (Sutter et al., 2003).
The Drosophila bithorax complex (BX-C) contains many well-studied enhancers, silencers and insulators that ensure that three homeotic genes – Ubx, abd-A and Abd-B – are expressed appropriately along the anterior-posterior axis of the developing fly (Fig. 1B) (Martin et al., 1995). A significant number of these regulators extend over many kilobases, and there has been no satisfactory explanation of why they are so long. Moreover, it has only recently been recognized that it is the act of transcription that underlies their activity. For example, first, deleting an unrelated gene (Glut3) encoding a glucose transporter that happens to be embedded in the locus alters abd-A expression and transforms the first abdominal segment into the second segment (Martin et al., 1995). Second, P element insertions downstream of abd-A also transform the first abdominal segment into the second and third segments; the P element promoter drives transcription across intergenic regions regulating segmental development, and blocking it reverts the mutant phenotype (Bender and Fitzgerald, 2002). Third, disrupting the transcription of each of the iab-4 to iab-8 enhancers transforms the segments they control into more posterior ones (Drewell et al., 2002a; Hogga and Karch, 2002). Fourth, activating transcription of the cellular memory modules (CMMs) associated with the bxd, Mcp and Fab7 regulators correlates with the alteration in epigenetic state (Rank et al., 2002). Fifth, transcription of the Mcp and Fab8 insulators correlates with their activity (Drewell et al., 2002a).
Saccharomyces cerevisiae lacks cytologically detectable heterochromatin (it is too small to be visible), but insertions in a few sites are subject to position effects. Their DNA is resistant to nucleases and their nucleosomes are hypoacetylated, but CpGs plus lysine 9 of histone H3 are unmethylated, and SIR (silent information regulator) proteins replace HP1. The most-studied region is HMR, which contains silent copies of the mating-type specific gene, MATa (Fig. 1C). A reporter gene such as URA3 is silenced when inserted into HMR (Richards and Elgin, 2002). Silencing depends on context, because excised DNA rings soon reactivate (Cheng and Gartenberg, 2000). Barriers marking the limit of the silent domain contain promoters, and mutating the tRNAThr promoter in the right-hand one reduces barrier activity (Donze and Kamakaka, 2001). [tRNA genes also limit the spreading of heterochromatin in fission yeast (Partridge et al., 2000).]
HS4, which lies 5′ of the chicken β-globin locus, is the best-characterized vertebrate boundary (Burgess-Beusse et al., 2002). It marks the border between nuclease-sensitive and nuclease-resistant chromatin, blocks enhancer action and screens reporters against position effects. Although it has not been shown to be a promoter in vivo, it is nevertheless a CpG island that bears the histone code (i.e. H3 hyperacetylation and methylation of lys4) characteristic of one (Litt et al., 2001).
Enhancers promote the activity of linked promoters (Hertel et al., 1997). They invariably contain binding sites for transcription factors and some are transcribed, including those regulating BX-C (above) and a subset known as locus control regions (LCRs). LCRs enable inserted transgenes to be expressed independently of position effects at physiological levels in a manner that is tissue specific and dependent on copy number (Li et al., 2002). The first LCR was found in the human β-globin locus (Fig. 1D). In transgenic mice that contain only the minimal β-globin gene, inserted transgenes were subject to position effects; however, adding distant hypersensitive sites (HS1-HS5) led to high-level expression in erythroid cells regardless of the insertion site. Both human and mouse LCRs contain multiple transcription units (Ashe et al., 1997; Routledge et al., 2002), and active RNA polymerase II can be immunoprecipitated bound to HS2 (Johnson et al., 2001). Other transcribed LCRs include those controlling the expression of α-globin, keratin 18, adenosine deaminase, growth hormone and major histocompatibility complex (MHC) class II genes (Li et al., 2002). Significantly, a point mutation in the promoter of the keratin 18 LCR – an Alu repeat – destroys some of its insulating activity (Willoughby et al., 2000).
Transcription of all these motifs regulates the activity of linked genes. Other well-known mammalian examples include H19 (a noncoding unit that combines with transcribed enhancers to control IGF2 imprinting) (Drewell et al., 2002b), and XIST and Tsix (noncoding units maintaining inactivity of one of the X chromosomes in females) (Plath et al., 2002).
Repression through antisense transcription and transcriptional interference
One factor confounds the analysis of these motifs – the assays used alter flanking sequences, but the function examined depends on them. Thus, scs and scs' cannot insulate white from some heterochromatin on the X chromosome (Kellum and Schedl, 1991), HMR barriers protect one reporter (i.e. URA3) but not other genes with powerful upstream activation sequences (e.g. TEF1 and TEF2) (Bi and Broach, 1999), CHA1 – which flanks the HML mating-type locus – becomes a robust barrier when induced by serine (Donze and Kamakaka, 2001), and inverting the β-globin LCR destroys much of its activity (Tanimoto et al., 1999). I now argue that moving a motif from one place to another usually changes the flanking transcription units, and this generates much of the complexity seen.
Transcripts copied from noncoding strands can pair with their coding counterparts to regulate gene activity post-transcriptionally – for example, through the operation of the RNA interference, alternative splicing, and editing pathways (Gottesman, 2002; Storz, 2002). Such downstream effects are not discussed here. However, the act of antisense transcription inevitably and directly reduces sense transcription, because both strands cannot be copied simultaneously. Such antisense inhibition is probably significant as at least 300 human genes contain embedded antisense units that are copied into transcripts stable enough to yield expressed sequence tags (ESTs) (Shendure and Church, 2002).
Transcription of a gene also interferes with the expression of a nonoverlapping but neighbouring one. This kind of transcriptional interference was first seen in mammals in clones harbouring a single (integrated) copy of a retroviral vector encoding resistance to both neomycin and azaguanine (Emerman and Temin, 1986). Expression of the 3′ resistance gene was suppressed when selection required expression of the 5′ gene, and vice versa. Moreover, few cells grew in both neomycin and azaguanine, showing that only one gene is usually active. Interference can spread at least 10 kbp, but the upper limit remains unknown. It is generally missed in cells with many integrants (and in diploid cells) because one copy might express one gene at any moment, and a second copy another. However, it is likely to be common in human cells, because ∼20% of genes on chromosomes 21 and 22 lie within 1 kbp of another (in a head-to-head orientation) (Adachi and Lieber, 2002), and even more common when (unrecognized) transcription units are considered. Such interference also occurs between different kinds of unit; thus, tRNAThr (a polymerase III unit) silences URA3 (a polymerase II unit; above), and other yeast tRNA genes reduce the transcription of polymerase II units lying within ∼400 bp between three- and sixty-fold (Bolton and Boeke, 2003). Although we might imagine that two mammalian genes lying within a few kbp on a chromosome could each be loaded with many polymerases, this evidence suggests that only one of the two is transcribed at any moment.
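The way multiple integrants mask interference can be illustrated with a toy calculation. The sketch below is an idealization, not a model taken from the experiments cited: it assumes that each integrated copy expresses exactly one of its two resistance genes at any moment, that either gene is chosen with a fixed probability, and that copies behave independently. The function name and the copy numbers tried are hypothetical.

```python
def fraction_expressing_both(copies: int, p_first: float = 0.5) -> float:
    """Probability that a cell carrying `copies` integrants expresses both genes.

    Assumption (illustrative only): every copy expresses exactly one of the
    two genes, the first with probability `p_first`, independently of the
    other copies.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    if not 0.0 <= p_first <= 1.0:
        raise ValueError("p_first must lie between 0 and 1")
    # The cell fails to express both genes only if all copies chose the same one.
    all_first = p_first ** copies
    all_second = (1.0 - p_first) ** copies
    return 1.0 - all_first - all_second


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(f"{n} copies: {fraction_expressing_both(n):.3f}")
```

Under these assumptions a single-copy clone never expresses both genes at once, whereas a cell with several copies almost always appears to express both, which is why interference is easily overlooked.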
An enduring idea in biology sees the chromatin fibre as looped, with attachment points as the barriers that segregate contexts. But despite wide acceptance, there remains little agreement as to which proteins and DNA motifs constitute the molecular ties. There is good reason for this. Chromatin is poised in a metastable state, and the buffers used for biochemical purification can irreversibly change the structure. Consequently, results obtained with isolates like `matrices', `scaffolds' and `nucleoids' are not widely accepted (e.g. Belmont, 2002). Nevertheless, we can envisage two kinds of tie (Fig. 2) (Cook, 2001). `Structural' ties would persist from one interphase to the next, and probably involve conserved DNA repeats. However, 30 years of research (including sequencing whole genomes) has failed to uncover any such motifs. `Functional' ties would depend on which part of the genome was being transcribed, and even two daughter cells would possess different arrays of everchanging attachments. I believe we must now look to this elusive class.
Images of lampbrush chromosomes are often thought to provide the best evidence for looping. These chromosomes can be isolated from oocytes of many animals (but not mammals). During the first meiotic division, duplicated homologs pair, and long loops can be seen extending microns away from axial chromomeres. Unusually, these chromosomes are transcribed, and transcripts are attached to both loops and chromomeres (Snow and Callan, 1969). But these loops only become visible on dispersing chromatin in unphysiological buffers, and none are seen in sections of whole oocytes where chromatin appears as a granular aggregate; therefore, the long loops may be created during isolation as active units are stripped off chromomeres (Cook, 2001).
Supercoiling in linear eukaryotic DNA provides additional evidence for looping. Supercoils cannot be maintained in linear DNA without looping, yet lysing cells in >1 M NaCl releases superhelical loops that are visible in the electron microscope, and nascent transcripts are associated with attachment points (but not loops) (Jackson et al., 1984). Here, artifactual aggregation could induce looping, and so this evidence is also inconclusive. Many experiments involving nuclease digestion are also consistent with looping. Cutting an unlooped fibre should release long fragments that are then shortened, but the expected long fragments are not seen; rather, kinetics fit the release of short fragments from loops. Although initial experiments used unphysiological buffers that generated/destroyed attachments (Jackson et al., 1990), more recent ones with `physiological' buffers confirm looping and show that transcription units mediate attachment (Fig. 2B) (Jackson et al., 1990; Jackson and Cook, 1993; Jackson et al., 1996).
Two powerful new methods now provide excellent evidence for looping (Fig. 3) and for a model described below in which the loops are anchored by transcription units. `Chromosome conformation capture' (3C) (Dekker et al., 2002) shows that a distant (transcribed) LCR contacts the (transcribed) β-globin gene in mouse erythroid cells but not in brain cells (where the gene is inactive) (Tolhuis et al., 2002). Moreover, (transcribed) scs contacts (transcribed) scs', and looping was confirmed by chromatin immunoprecipitation – antibodies against Zw5 (bound to scs) precipitate scs' (Blanton et al., 2003). The β-globin loop is also seen by RNA tagging and recovery of associated proteins (RNA TRAP) (Carter et al., 2002).
Although there is no decisive evidence for looping in living cells, the above data can be interpreted simply if two distant and active transcription units come together to tie the intervening fibre into a loop.
Combining models for loop structure and action at a distance
Models for action at a distance often involve looping (Merika and Thanos, 2001; Labrador and Corces, 2002; West et al., 2002; Schedl and Broach, 2003). Thus, a polymerizing complex might be attracted to an `enhanceosome', before initiating to track out into a loop. But then transcription units should be detached in the experiment illustrated in Fig. 2 (but they are not), and they should not usually be seen together in the experiments illustrated in Fig. 3 (but they are). Moreover, a tracking polymerase generates a transcript that is entwined about the template once for every ten base pairs transcribed, but no satisfactory method for untwining the transcript has yet been suggested (Cook, 1999). However, all the above results fit comfortably with one current model for chromatin structure (Cook, 1999; Cook, 2002) in which transcription complexes strung along the chromosome cluster to form a `factory', and this would loop intervening DNA (Fig. 4). Here, each active polymerase in the factory reels in its template and extrudes its transcript (Fig. 5). Because the template rotates as it moves through an immobile polymerase, the transcript does not become entwined about the template (Iborra et al., 1996; Cook, 1999). Support for this model comes from several approaches. First, active polymerases (plus their templates and transcripts) resist detachment in the experiment illustrated in Fig. 2B, which places them at (or very close to) attachment points (Jackson and Cook, 1993; Jackson et al., 1996; Cook, 1999). Second, because there are more active molecules of RNA polymerase II in a HeLa cell than transcription sites, and because only one polymerase is typically engaged on a transcription unit, each site (diameter ∼50 nm) must contain several different units (Cook, 1999). The new results using 3C and RNA TRAP (Fig. 3) provide further support for the core element of this model – that two (or more) active units cluster to loop the intervening DNA; thus, the (transcribed) LCR contacts (transcribed) β-globin (Carter et al., 2002; Tolhuis et al., 2002), and (transcribed) scs contacts (transcribed) scs' (Blanton et al., 2003).
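The scale of the entwining problem faced by a tracking polymerase follows from simple arithmetic. The sketch below takes the figure of one turn per ten base pairs given above; the function name and the unit lengths tried are illustrative assumptions.

```python
def transcript_turns(unit_length_bp: int, bp_per_turn: float = 10.0) -> float:
    """Turns of transcript wound about the template by a tracking polymerase.

    Uses the one-turn-per-ten-base-pairs figure from the text; the unit
    lengths passed in are hypothetical examples.
    """
    if unit_length_bp < 0:
        raise ValueError("unit_length_bp must be non-negative")
    if bp_per_turn <= 0:
        raise ValueError("bp_per_turn must be positive")
    return unit_length_bp / bp_per_turn


if __name__ == "__main__":
    for length_bp in (1_000, 10_000, 100_000):
        turns = transcript_turns(length_bp)
        print(f"{length_bp} bp transcribed -> {turns:.0f} turns to untwine")
```

A tracking polymerase would therefore have to resolve about a thousand turns for every 10 kbp transcribed, whereas an immobile polymerase that rotates its template avoids the problem altogether.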
In this model, a distant regulator (a polymerizing complex) can easily contact the polymerase it regulates. Regulators would have to associate with the appropriate factories before they could exert their effects, because different factories specialize in transcribing different genes. Thus, some contain only RNA polymerase I, others contain only RNA polymerase II, and still others contain only RNA polymerase III (Pombo et al., 1999). Factories containing one kind of polymerase specialize even further. For example, a subset of polymerase II factories are rich in PSE-binding transcription factor (PTF) and Oct1, and they associate with particular chromosomes – presumably those carrying transcription units regulated by these factors (Pombo et al., 1998). In addition, genes encoding the histones – and U1, U2 and U3 snRNA – often lie near Cajal bodies (Spector, 2003), again presumably associated with factories dedicated to their transcription. It may even be that an individual factory could become dedicated to the transcription of one gene plus its flanking nongenic transcription units (for example, the β-globin gene, its LCR, plus other nongenic units). Only three types of `functional' tie currently need to be invoked, although more may have to be added. These would involve (1) nontranscribed motifs (promoters, and a subset of what are currently called enhancers and silencers) bound to their target proteins in a factory, (2) transcribing DNA (including genic introns and exons, silencers, insulators, barriers, and a subset of what are currently called enhancers/LCRs) bound to active polymerases in a factory, and (3) appropriately modified nucleosomes out in the body (or bight) of a loop to other complementary nucleosomes in heterochromatin and/or the lamina (Polioudaki et al., 2001). 
The first would be transient, the second would persist for as long as the unit was transcribed (which is in the order of minutes) (Kimura et al., 2002) and the third could be almost permanent (because histones H3 and H4 in deep heterochromatin only exchange over a period of many hours) (Kimura et al., 2001).
This organization has important consequences (Cook, 2002). In the nucleus of a HeLa cell the concentration of soluble RNA polymerase II is ∼1 μM, but the local concentration in a factory is 1000-fold higher. Because a promoter can diffuse ∼100 nm in 15 seconds, one lying near a factory is likely to initiate; moreover, when released at termination it will still lie near a factory, and the movement and modifications (e.g. acetylation) accompanying elongation will leave it in an `open' conformation. Another promoter out in a long loop (i.e. in the bight) is less likely to initiate because the promoter concentration falls off with the cube of the distance from the factory. Moreover, a long tether will buffer it from transcription-induced movement, making it prone to deacetylation, deposition of HP1 and incorporation into heterochromatin (Fig. 4). And because heterochromatin has an affinity for both the lamina (Polioudaki et al., 2001) and other heterochromatic regions (e.g. flanking ribosomal cistrons), inactive genes will inevitably be drawn to the periphery or to nucleoli (Cockell and Gasser, 1999; Galy et al., 2000; Spector, 2003). The context around a promoter will then be self-sustaining: productive collisions of an active promoter with the factory will attract factors increasing the frequency of initiation, and the longer an inactive promoter remains inactive the more it becomes embedded in heterochromatin.
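The cube-law dependence on distance can be made concrete with a rough calculation. The sketch below assumes that a promoter tethered a distance r from a factory is equally likely to lie anywhere within a sphere of radius r, so that its effective concentration is one molecule per sphere volume. This geometric simplification, the function name and the distances tried are assumptions made for illustration, not values from the text.

```python
import math

AVOGADRO = 6.02214076e23      # molecules per mole
LITRES_PER_CUBIC_NM = 1e-24   # 1 nm^3 = 1e-27 m^3 = 1e-24 litres


def effective_concentration_molar(radius_nm: float) -> float:
    """Molar concentration of one molecule confined to a sphere of given radius.

    Assumption (illustrative only): the tethered promoter is uniformly
    distributed throughout the sphere centred on the factory.
    """
    if radius_nm <= 0:
        raise ValueError("radius_nm must be positive")
    volume_litres = (4.0 / 3.0) * math.pi * radius_nm ** 3 * LITRES_PER_CUBIC_NM
    return 1.0 / (AVOGADRO * volume_litres)


if __name__ == "__main__":
    for radius in (50, 100, 200, 400):
        micromolar = effective_concentration_molar(radius) * 1e6
        print(f"r = {radius} nm: about {micromolar:.3g} micromolar")
```

Whatever the absolute values, each doubling of the distance lowers the effective concentration eightfold, so a promoter out in the bight of a long loop competes poorly with one held close to the factory.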
The probability that promoters collide productively with a factory is increased by increasing promoter mobility (by `opening' chromatin), increasing promoter-factory affinity (through binding of appropriate factors; Fig. 6A,B) and reducing promoter-factory distance (by shortening the tether; Fig. 6C,D) (Iborra et al., 1996). However, transcriptional interference will occur if the tether becomes too short (Fig. 6E). Some active transcription units will also be barriers separating euchromatin from heterochromatin (Fig. 6F). A promoter embedded in a long heterochromatic loop then becomes active by progressively activating units deeper and deeper into the loop; this `opens' chromatin, subdivides the loop into ever smaller ones and brings the promoter closer to the factory. Only then can it compete effectively with others in the vicinity for polymerases in the factory. Here, binary switches do not activate genes flanked by discrete barriers; rather, a surrounding pattern of transcription encodes a fuzzy logic that specifies the probability of initiating.
Many models invoke looping to explain how the chromatin fibre is organized during interphase and how regulatory motifs act at a distance. However, the two sets of models usually involve different molecular ties, largely because cell biologists working on different nuclear isolates have identified different ties – and then molecular biologists distrust the conflicting models they suggest. The model described here integrates aspects of both sets. It is based on the remarkable facts that clusters of active polymerases seem both to mediate action at a distance and to constitute the ties. It has several advantages. First, it is general, applicable to the range of motifs discussed and from yeast to man (although distance from the factory and nongenic transcription play larger roles as genome size increases). Second, it is consistent with current views on how everchanging interactions underlie self-assembling, but persistent, architectures (Misteli, 2001). Third, it suggests new possibilities. For example, it is difficult to explain why some eukaryotic genes and their regulatory motifs are so long (e.g. in BX-C). But, if this model applies, the activation of a long unit will tether the promoter close to a factory for longer, and this gives more time for that promoter to reinitiate; then, a longer elongation time is offset by a shorter time between initiations. Similarly, RNA polymerase II transcribes well beyond the poly(A) signal, but this apparently irrelevant transcription will nevertheless continue to tether the promoter close to the factory and so increase the rate of re-initiation (within any constraints imposed by transcriptional interference); polymerases paused anywhere in a transcription unit will have similar effects. Fourth, it highlights the shortcomings of current assays being used to analyze regulatory motifs, and helps to explain why the results obtained have been so difficult to interpret. 
These assays usually involve moving a test sequence to a different genomic region where a different (and usually unknown) set of flanking transcription units will inevitably exert their effects. In future, the first step in this kind of analysis should be the characterization of all transcription units in both the original and final locations. By the same token, analysis of the regulation of a gene should begin with the systematic characterization of all the flanking transcription units. Fifth, the model is testable. Thus, (1) most regulatory motifs should be transcription units, (2) point mutations knocking out their promoters should abolish their activity, and (3) adding/deleting promoters should affect expression of neighbouring genes.
The challenge now is to predict whether a gene is likely to be active from the DNA sequence. Unfortunately, we are still a long way from being able to do so in higher eukaryotes. At the very least, we need to know the location of all flanking transcription units, their relative activities, the concentration of the required transcription factors, how the length of the tether affects initiation rates and how heterochromatinization reduces DNA mobility. Only then can we predict which factories might be in the vicinity, and how the complex balance of conflicting forces that we know as context might be resolved.
I thank my colleagues for helpful discussions and Cancer Research UK and The Wellcome Trust for support.