ABSTRACT
We have used a chromatin immunopurification approach to identify target genes regulated by the homeotic gene Ultrabithorax. A monoclonal antibody against the Ultrabithorax gene product is used to immunopurify in vivo Ultrabithorax protein binding sites in embryonic chromatin. The procedure gives an enrichment of sequences with matches to a consensus homeodomain binding site. In one case we have shown that an immunopurified sequence lies within a 4 kb fragment that acts in vivo as a homeotic response element. We anticipate that this approach will enable us to identify further targets, allowing the analysis of their regulation and function. The chromatin immunopurification strategy may be of general application for the identification of direct in vivo targets of DNA-binding proteins.
INTRODUCTION
A decade or so of intense research on the generation of patterns of gene expression in the early Drosophila embryo has provided the elements of a hierarchical system of transcriptional regulation (reviewed by Akam, 1987; Ingham, 1988). In the anteroposterior axis, the gradient of the bicoid transcription factor is interpreted by the gap genes to produce overlapping bands of expression of gap gene products and this pattern is, in turn, interpreted by the pair-rule genes to produce stripes. The overlapping stripe patterns of expression of the family of pair-rule genes are then used to establish the more refined stripes of expression of segment polarity genes such as engrailed. Transcriptional regulation by the products of gap, pair-rule and segment-polarity genes generates the expression patterns of the next class of genes in the hierarchy, the homeotic genes. The homeotic genes are expressed in overlapping domains in the anteroposterior axis in rather complex patterns such that different segments in the developing fly express specific combinations of homeotic gene products (Fig. 1; Lewis, 1978; reviewed by Akam, 1987). The products of the homeotic genes are transcription factors and thus extend the cascade of transcriptional regulation; however at this point our knowledge of the hierarchy of control breaks down and we are largely ignorant of the nature of the genes at the next level. The analysis of mutant phenotypes that played such an important role in the identification and ordering of genes in the hierarchy thus far, unfortunately provides few clues as to the identity of the genes directly subordinate to the homeotics.
Loss of function of an individual homeotic gene causes a host of morphogenetic consequences for the development of particular segments. For example, mutations in the Ultrabithorax (Ubx) gene of the bithorax complex cause the repeat units, parasegments 5 and 6, to develop morphologies similar to the normal parasegment 4 (Lewis, 1978; Sanchez-Herrero et al., 1985). Such mutants die in the larval stages and show transformations not only in the epidermis, as visualised by alterations in the differentiation and patterning of cuticular elements, but also in the peripheral and central nervous systems and in the somatic and visceral musculature (Hartenstein, 1987; Teugels and Ghysen, 1985; Hooper, 1986; Bienz and Tremml, 1988). Analysis of the developmental consequences of homeotic mutations indicates a wide range of cellular processes under homeotic gene control including mitotic rate, cell growth, movement and invagination, cell-cell interactions and cellular differentiation. From this it was proposed, as long ago as 1975, that the homeotic genes functioned to regulate large batteries of “downstream” or target genes (Garcia-Bellido, 1975).
We are interested in the identification of these target genes for several reasons. Firstly, in order to understand how homeotic genes control morphogenesis we need to identify the molecules through which they act. If we can follow the regulatory command chain downstream from the homeotic genes this should lead us to the molecules regulating the cellular processes of morphogenesis. Secondly, the specific combinations of homeotic gene products in different segments presumably produce a unique set of target gene activities in each segment. How they do this raises interesting questions of transcriptional regulation: do the different homeotic genes regulate the same or different sets of target genes? Do they compete for binding sites in target gene regulatory sequences? How important are interactions between homeotic gene products or between homeotic gene products and other transcription factors for binding site specificity? How do homeotic genes control the activities of characteristic sets of target genes in different tissues? The solution to all these questions depends on the identification of regulatory sequences of target genes that respond directly to homeotic gene control. The third reason for interest in the identification of direct target genes is that this is not a problem restricted to homeotic genes; in many systems, for example for transcription factors involved in oncogenesis, it is important to identify in vivo targets and we can hope, using Drosophila homeotic genes as a model system, to develop strategies for the identification of direct targets of DNA-binding proteins which will be of general application.
Looking for targets
As the homeotic genes control segmental identity, the genes that they control might be expected to show patterns of expression that vary from segment to segment. A number of cloned genes exhibit such patterns. For some, particularly cell type differentiation genes, the basis for this may be trivial; for example the myosin heavy chain gene exhibits a pattern of expression that varies from segment to segment, reflecting the segment-specific arrangements of muscles. Clearly myosin expression may simply be a secondary consequence of earlier processes under homeotic control that specify the patterns of muscle cell development within each segment. Other genes may be more interesting; for example the decapentaplegic (dpp) gene is expressed in a number of different tissue types at different stages in development, and is expressed in the visceral mesoderm in a parasegmentally modulated pattern which includes a specific band of expression in parasegment 7. The homeotic gene Ubx is normally expressed in parasegment 7 in the visceral mesoderm and Ubx function is required both for the formation of a specific morphological feature, the second midgut constriction, and for the expression of dpp (Immergluck et al., 1990; Reuter et al., 1990). The dpp gene shows homology to the TGF-β family of vertebrate growth factors and thus provides a clue to at least one class of downstream functions. However, it is not known whether dpp is directly or indirectly regulated by Ubx.
The technique of P-element mediated enhancer detection makes it possible to screen for genes of interest on the basis of their patterns of expression (O’Kane and Gehring, 1987). Screening for genes whose pattern of expression shows segment-to-segment modulation should produce candidate downstream genes. Wagner-Bernholtz and colleagues (1991) have carried out such a screen, looking for genes whose expression differs between antennal and leg imaginal discs, and have identified the spalt major gene as a direct or indirect target of the homeotic gene, Antennapedia (Antp). We have carried out a similar screen on embryos looking for segment-to-segment modulations in expression. Screening several hundred β-galactosidase-expressing lines produced a small number of interesting lines. Two of these are illustrated in Fig. 2. In both cases transcription units lying close to the P-element insertion site exhibited patterns of RNA expression similar to the β-galactosidase expression from the enhancer detector line. The B5 line identified a transcript with a complex expression pattern. In the central nervous system, it is expressed most abundantly in the thoracic neuromeres T1, T2 and T3, and in the epidermis it also shows different patterns from segment to segment including a prominent patch of expression specific to T1. The transcript identified by the line T6 is more restricted to the nervous system where, like B5, it is most prominently expressed in the thoracic neuromeres. The T6 insertion has a wings held-out phenotype when homozygous and fails to complement eagle mutations, suggesting that T6 may be allelic to eagle (Lindsley and Zimm, 1985). The expression patterns of both B5 and T6 are altered in mutants defective in the genes of the bithorax-complex and thus are directly or indirectly controlled by homeotic genes.
Such downstream genes provide useful molecular markers for segment-specific morphogenesis and may provide insights into the gene functions regulated by the homeotics. This analysis could also potentially provide an answer to a major question concerning downstream genes: how many are there? In the largest published P-element mediated enhancer detection screen 7 lines out of about 4,000 showed segment-segment differences not ascribable simply to tissue-type differentiation (Bier et al., 1989). This suggests a rather low number of downstream genes by these criteria but these figures should be treated with extreme caution as it is clear that the P-element mediated enhancer detection technique gives a biased sampling of the genome.
Enhancer detection can usefully provide downstream genes, but it does not immediately lead to direct targets of homeotic control. A strategy for the identification of such direct targets could be based on the location of in vitro binding sites for homeotic gene products in genomic DNA. However, homeodomain proteins bind rather promiscuously to DNA in vitro, and analysis of the DNA sequence requirements for binding a Ubx product in vitro revealed little sequence preference outside a core TAAT motif (Ekker et al., 1991). We have chosen to pursue an in vivo approach designed to isolate DNA sequences occupied by Ubx protein in embryonic chromatin (Gould et al., 1990).
The immunopurification strategy
The approach is outlined in Fig. 3. Briefly, embryonic nuclei are prepared according to the low salt method of Wu (1984); the nuclei are digested with HaeIII and then the soluble chromatin fragments are released by lysis with EDTA. This soluble chromatin is then subjected to an immuno-affinity purification using anti-Ubx antibody in order to enrich for chromatin fragments containing bound Ubx protein. DNA is extracted from the control and experimental immunomatrix beads and cloned into M13 phage. Several experiments gave a small (approximately two-fold) increase in number of recombinant phage in the experimental (+Ab) versus the control (-Ab). To ask whether specific sequences were more highly enriched than bulk DNA we sequenced clones from the +Ab and -Ab pools. We then screened the sequences for matches to the homeodomain consensus binding sites derived from footprint analysis with engrailed protein (Desplan et al., 1988). The results of this screen are shown in Fig. 4. There is a clear enrichment for matches to the homeodomain binding consensus in the +Ab population. In addition, the enrichment is highly sequence-specific; if the preference for A over T in position four of the consensus is reversed then no enrichment is seen (Fig. 4B).
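The sequence screen described above can be illustrated with a minimal sketch. It is not the procedure actually used in this work: the consensus string, the score threshold, the function names and the toy sequences are all assumptions chosen for illustration. Each clone is scanned on both strands for its best ungapped match to a consensus, and the fraction of clones reaching a threshold is compared between the +Ab and -Ab pools.

```python
# Illustrative sketch only: consensus, threshold and sequences are assumptions,
# not data or parameters taken from the study.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def best_match_score(seq, consensus):
    """Highest number of positions matching the consensus over every
    ungapped window of the sequence, checking both strands."""
    seq = seq.upper()
    k = len(consensus)
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for i in range(len(strand) - k + 1):
            window = strand[i:i + k]
            score = sum(a == b for a, b in zip(window, consensus))
            best = max(best, score)
    return best


def fraction_matching(pool, consensus, min_score):
    """Fraction of sequences in a pool whose best match reaches min_score."""
    if not pool:
        return 0.0
    hits = sum(best_match_score(s, consensus) >= min_score for s in pool)
    return hits / len(pool)


# Placeholder consensus (assumed for illustration).
CONSENSUS = "TCAATTAAAT"

# Made-up toy clones standing in for sequenced +Ab and -Ab inserts.
plus_ab = ["GGCTCAATTAAATGC", "ACGTTCAATTAAAGC", "GGGGCCCCGGGGCCCC"]
minus_ab = ["GGGGCCCCGGGGCCCC", "ACGTACGTACGTACGT", "GGCTCAATTAAATGC"]

if __name__ == "__main__":
    for name, pool in (("+Ab", plus_ab), ("-Ab", minus_ab)):
        frac = fraction_matching(pool, CONSENSUS, min_score=9)
        print(f"{name}: {frac:.2f} of clones match at >= 9/10 positions")
```

A specificity control of the kind described in the text could be sketched by rerunning `fraction_matching` with a consensus altered at a single position and checking whether the difference between the pools disappears.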
This sequence analysis suggests that the immunopurification successfully enriches for Ubx binding sites and the next question is whether these binding sites lie adjacent to transcriptional units under Ubx control. To address this, we took the six clones containing the top scoring matches to the consensus sequence and used these as probes to isolate flanking phage-lambda genomic clones. The lambda clones were then used as probes for in situ hybridization on embryos to detect any transcript expression. In four cases we identified embryonic transcripts within the range of the lambda clones. The distributions of RNA expression from two of these transcription units are shown in Figs 5 and 6. Transcript #48 has a rather complex and dynamic pattern. It is initially expressed in a ventral stripe roughly corresponding to the mesodermal anlage at the cellular blastoderm stage. After invagination of the ventral furrow expression shuts off in the mesoderm but expression is activated in the lateral and dorsal ectoderm. In the extended germ band there is a second wave of expression in the mesoderm and it is not until germ band retraction that the first clearly segmentally modulated pattern appears. This expression (Fig. 5) is in epidermal stripes and in the ventrolateral epidermis there is a weak stripe in the first thoracic segment (T1), a very weak stripe in T2, a weak stripe in T3 and then stronger stripes in the abdominal segments. At later stages the #48 transcript is expressed in a segmentally modulated pattern in the visceral mesoderm with predominant expression in parasegment 7, the repeat unit under Ubx control in this tissue. Transcript #35 appears initially in segmentally repeated subsets in the visceral mesoderm, then towards the end of the extended germ band stage it is expressed in the somatic mesoderm predominantly in the thoracic segments (Fig. 6). Following the expression in the somatic mesoderm reveals that the labelled cells are precursor cells for specific sets of ventral and pleural muscles. In the central nervous system expression also appears around the time of germ band retraction but it initially shows little segment-to-segment modulation in intensity. However, by stage 16 the most prominent labelling is in the thoracic neuromeres.
Thus, both transcription units #35 and #48 show segmentally modulated patterns of expression consistent with regulation by the homeotic gene Ubx. In the case of #35, expression is low in the Ubx expression domain suggesting a repressive control, and with #48 expression is high in the Ubx domain indicating a positive control. The alterations in the expression patterns of #35 and #48 in Ubx mutant embryos are also consistent with a direct regulatory role of the Ubx gene on their expression.
Regulation of the #35 transcription unit
The immunopurification procedure appears to have successfully led us to transcription units plausibly under the direct regulation of the homeotic gene Ubx. However the case would be considerably more compelling if we could demonstrate that the immunopurified DNA fragments, which are putative in vivo binding sites of the Ubx protein, are indeed components of regulatory elements under Ubx control. We have approached this question for the #35 transcription unit (Gould and White, 1992).
We first asked whether the 110 bp #35 immunopurified fragment had led us to a regulatory region capable of mediating homeotic control. The 110 bp fragment lies some 7 kb to the 3′ side of the #35 transcription unit and thus is not in a region that one would immediately expect to harbour a control element. However, we took a 4 kb restriction fragment, which included the 110 bp immunopurified sequence, and tested its ability to drive expression of β-galactosidase from a minimal heat-shock promoter in transformed fly lines. The expression pattern produced by this construct is illustrated in Fig. 7. The 4 kb fragment drives β-galactosidase expression in a subset of the #35 expression pattern and, importantly, shows a highly segment-to-segment modulated pattern of expression. The β-galactosidase activity mimics two parts of the endogenous #35 expression pattern with expression in gnathal sense organs and in specific muscle precursor cells in the thoracic segments. No β-galactosidase activity is seen in the central nervous system as presumably the control sequences responsible for expression in this tissue lie outside the 4 kb fragment used here.
The pattern of expression driven by the 4 kb fragment shows that the immunopurification strategy has led us to a sequence that acts as a homeotic response element. Indeed we have shown, by crossing the 4 kb construct into homeotic mutant backgrounds, that the segment-to-segment modulation in the expression of the 4 kb construct is dependent on homeotic gene control. Several homeotic genes appear to be involved, with Antennapedia (Antp) and Sex combs reduced (Scr) acting to activate expression and Ubx and abdominal-A (abd-A) acting as repressors.
We also wanted to determine whether the 110 bp clone 35 sequence, immunopurified as an in vivo Ubx binding site, was an important component of the homeotic response element within the 4 kb fragment. We constructed a precise deletion of the 110 bp sequence from the 4 kb fragment and transformed this deletion construct (called 4kb-35) into flies. The effect of the deletion was to reduce the intensity of expression predominantly in the somatic mesodermal cells of the thoracic segments; the strong, consistent expression in these cells seen with the 4 kb construct was replaced by weak and variable expression with the 4kb-35 lines. Thus the 110 bp fragment appears to be an important component of the regulatory sequences contained within the 4 kb fragment. It does not, however, define a Ubx specific regulatory module. It might have been anticipated, for example, that the removal of the 110 bp sequence, containing a putative in vivo Ubx binding site, would have been equivalent to removal of Ubx function. In that case the expression from the deletion construct would have been similar to the expression of the entire 4 kb construct in a Ubx mutant background. However, in the former, the thoracic expression is reduced and in the latter the expression is derepressed and extends into anterior abdominal segments.
Our interpretation is that the 110 bp sequence is involved in the regulation by several homeotic genes and its deletion affects not only Ubx-dependent repression, but also positive controls by Antp and Scr. Also, as deletion of the 110 bp sequence weakens rather than abolishes the thoracic expression, it is likely that there are other sites of homeotic control within the 4 kb fragment.
DISCUSSION
The chromatin immunopurification strategy appears to have successfully guided us to a homeotic response element and provides us with a method for the identification of further target genes, allowing us to study both their regulation and function. Whilst we have concentrated so far on the validation of the approach as a means for the identification of direct targets, we can ask at this juncture what we have learnt of the nature of target genes and of their regulation.
The sequence of the #35 gene reveals that it is a member of the leucine-rich repeat family and it contains 10 leucine-rich repeats (Gould and White, 1992). It contains a signal sequence but no transmembrane domain. The COOH-terminal has a hydrophobic sequence suggesting that the molecule may be linked to the membrane via a glycolipid anchor (Ferguson and Williams, 1988). We have expressed the #35 product in tissue culture cells and shown that it is cleavable from the membrane by glycosylinositol specific phospholipase C. The presence of leucine-rich repeats does not immediately provide a specific function for the #35 protein: leucine-rich repeats appear to be involved in protein-protein interactions but occur in proteins of a wide variety of functional types (Takahashi et al., 1985; Kataoka et al., 1985; Titani et al., 1987). However, in Drosophila, two cell-surface, leucine-rich repeat molecules, chaoptin and Toll, have been shown to be capable of mediating cell-cell adhesion (Krantz and Zipursky, 1990; Keith and Gay, 1990). Chaoptin is required for the organization of the Drosophila eye and like the #35 gene product it is linked to the cell surface via a glycosyl-phosphatidylinositol anchor. We have shown that the #35 gene product, when expressed in the non-adherent Drosophila S2 cell line, is capable of mediating homophilic cell-cell adhesion. This becomes more interesting when taken together with results from Nose et al. (1992), who independently isolated the #35 gene. Using an antibody to the gene product, they have demonstrated that the #35 protein is expressed on a subset of motoneurons and also on their specific target muscles both prior to, and immediately following, innervation. This strongly suggests that the #35 gene product has a role in the formation of specific neuromuscular contacts and the #35 gene has been named connectin. However, at present we have no mutations in the connectin gene to test its requirement for neuromuscular development.
The #48 gene sequence reveals a transmembrane protein with limited sequence similarity to the EGF-receptor family. Its developmental role is unknown but several lethal complementation groups have been mapped in the vicinity of the #48 transcription unit at 97D.
The preliminary analysis of these two target genes allows us to make a couple of points. Firstly, the targets we have identified are not transcription factors, and thus moving downstream of homeotic genes we are moving away from the cascade of transcriptional regulation responsible for the early patterning processes in Drosophila development. Secondly, as one of the characterized gene products is a cell surface molecule capable of mediating cell-cell adhesion and the other is a transmembrane protein, it appears that the identification of target genes will provide a useful entry point into processes of cell-cell recognition and communication that must be important in regulating morphogenesis.
What do these genes tell us about how homeotic genes control the activities of target genes? It seems that neither of the genes that we have identified is uniquely controlled by the Ubx homeotic gene. These genes were identified on the basis of their proximity to putative Ubx in vivo binding sites; however, their patterns of expression, in wild-type and homeotic mutant backgrounds, indicate that several homeotic genes participate in their regulation. Some homeotic genes activate and others repress; in the case of expression from the 4 kb connectin regulatory element, Scr and Antp activate transcription whereas Ubx and abd-A repress. Also, with respect to different target genes, a particular homeotic gene can act as either an activator or a repressor; thus, Ubx activates #48 gene expression but represses expression from the (#35) connectin gene. With the identification of DNA sequences that respond to homeotic control we should be able to analyse the molecular basis of these interactions. We want to define the functional homeoprotein binding sites and to ask whether different homeodomain proteins act on the same or different sites. There are currently two complicating factors: firstly, the homeotic genes interact and so a mutation in a particular homeotic gene may alter the expression from a homeotic response element indirectly through its effect on another homeotic gene; secondly, the homeotic response element that we have defined is 4 kb long and will need to be dissected into smaller functional units to allow the identification of binding sites and their analysis by mutagenesis.
Will the chromatin immunoaffinity purification approach be generally useful as a method for the isolation of target genes? The method depends on the availability of specific antibody and on a stable interaction between the DNA-binding protein and its binding site in vivo in chromatin and under conditions allowing specific antibody binding. In addition we relied on a second selection based on sequence matches to an in vitro-defined binding consensus. Measurements of the affinity of binding have been made for the Antp homeodomain. It binds with relatively high affinity (KD in the range 10⁻⁹ to 10⁻¹⁰ M) to a consensus homeodomain binding site and the interaction is also relatively stable, with the half-life of the homeodomain-DNA complex estimated to be approximately 90 min (Affolter et al., 1990). However, in comparing Antp with other DNA-binding proteins it should be recognised that it is difficult to extrapolate from the in vitro to the in vivo situation and the binding of many transcription factors in chromatin may be stabilized by protein-protein interactions. The immunopurification approach may also be useful for determining whether a DNA-binding protein interacts directly in vivo with a particular sequence by assaying the enrichment of the sequence on immunopurification. Enrichment could be assayed by representation in a library derived from the immunopurified DNA or by directly assaying the amount of a given sequence in the immunopurification by, for example, PCR.
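To give a feel for what a half-life of about 90 min implies for the stability of the complex during a purification, the figures quoted above can be converted into rate constants. The following is a rough sketch that assumes simple one-step binding with first-order dissociation, an assumption not made in the text; under it, k_off = ln 2 / t½ and k_on = k_off / KD. The function names and the example purification times are hypothetical.

```python
import math


def off_rate_from_half_life(half_life_s):
    """First-order dissociation rate constant (per second) from a half-life."""
    return math.log(2) / half_life_s


def fraction_bound_after(t_s, k_off):
    """Fraction of complexes still intact after t_s seconds, assuming
    first-order dissociation and no rebinding."""
    return math.exp(-k_off * t_s)


def on_rate(k_off, kd_molar):
    """Association rate constant (per molar per second) implied by
    KD = k_off / k_on for simple one-step binding."""
    return k_off / kd_molar


HALF_LIFE_S = 90 * 60  # approximately 90 min, as quoted in the text

if __name__ == "__main__":
    k_off = off_rate_from_half_life(HALF_LIFE_S)
    print(f"k_off ~ {k_off:.2e} per second")
    for kd in (1e-9, 1e-10):  # KD range quoted in the text
        print(f"KD = {kd:.0e} M implies k_on ~ {on_rate(k_off, kd):.2e} per M per s")
    for minutes in (30, 90, 180):  # hypothetical purification times
        frac = fraction_bound_after(minutes * 60, k_off)
        print(f"after {minutes} min: {frac:.2f} of complexes remain")
```

These numbers describe an isolated homeodomain-DNA complex in vitro under the stated assumption; as noted above, binding in chromatin may be further stabilized by protein-protein interactions.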
Whilst the approach may be useful for the identification of target genes of homeotic control in other organisms, and it would be particularly interesting to identify target genes in vertebrates, it may also be possible to use the target genes isolated in Drosophila to search for homologues in other organisms. As there is growing evidence that the Hox gene clusters in vertebrates serve similar functions to their Drosophila homeotic gene counterparts (reviewed by McGinnis and Krumlauf, 1992), it seems likely that the functions of target genes may also be evolutionarily conserved.
ACKNOWLEDGEMENTS
This work was supported by the Medical Research Council and the Wellcome Trust. The T6 transformant line was kindly provided by Maria Leptin and Michael Wilcox.