ABSTRACT
Central nervous system midline cells constitute a discrete group of Drosophila embryonic cells with numerous functional and developmental roles. Corresponding to their separate identity, the midline cells display patterns of gene expression distinct from the lateral central nervous system. A conserved 5 base pair sequence (ACGTG) was identified in central nervous system midline transcriptional enhancers of three genes. Germ-line transformation experiments indicate that this motif forms the core of an element required for central nervous system midline transcription.
The central nervous system midline element is related to the mammalian xenobiotic response element, which regulates transcription of genes that metabolize aromatic hydrocarbons. These data suggest a model whereby related basic-helix-loop-helix-PAS proteins interact with asymmetric E-box-like target sequences to control these disparate processes.
INTRODUCTION
Development of the nervous system in a bilaterally symmetric organism critically depends on cells that lie on its midline. Midline cells in various metazoans have been shown to be important as functional neurons and glia, as sources of signals for migrating commissural growth cones and for patterning adjacent tissues during embryogenesis (reviewed in Nambu et al., 1993). Consequently, these cells often display patterns of gene expression distinct from lateral neuroectoderm. Single gene mutants and lineage analysis in the fruit fly Drosophila melanogaster have begun to decipher the genesis and function of CNS midline cells (Nambu et al., 1993).
CNS midline gene expression can be temporally divided into three embryonic components: (1) initial mesectodermal (MEC) expression in the blastoderm and gastrula (stages 5-7), (2) midline precursor (MLP) expression (stages 8-11) and (3) later midline expression (stages 12-17), which is restricted to subsets of differentiated cells such as the midline glia (MLG) and neurons. MEC expression is controlled by dorsal/ventral patterning genes (Nambu et al., 1990; Kosman et al., 1991; Leptin, 1991; Rao et al., 1991), whereas MLP expression requires activity of the single-minded (sim) gene, which encodes a basic-helix-loop-helix-PAS (bHLH-PAS) transcription factor (Nambu et al., 1990, 1991). Later midline expression is more complicated and depends genetically on sim and other transcription factors (Nambu et al., 1993).
Previous experiments identified CNS midline enhancers from the sim, Krüppel, slit (sli), Toll (Tl) and rhomboid (also known as veinlet) genes (Nambu et al., 1990; Hoch et al., 1992; Ip et al., 1992b; Wharton and Crews, 1993). In this paper, we focus on three of these genes: sim, sli and Tl. Genetic analysis has shown that sim positively autoregulates its own expression in MLP cells beginning around stage 8 (Nambu et al., 1991). Germ-line transformation experiments demonstrated that a 3.7 kilobase (kb) fragment of the sim gene, fused to the E. coli lacZ gene, drives β-galactosidase expression in CNS midline cells in a pattern indistinguishable from the endogenous sim gene (Nambu et al., 1991). The Tl gene contains a 900 base pair (bp) fragment that drives CNS midline expression of β-galactosidase beginning at stage 9 when coupled to a lacZ enhancer-tester vector (Wharton and Crews, 1993). The sli gene has a 380 bp fragment that drives β-galactosidase expression later in development in MLG (Wharton and Crews, 1993).
Genetic evidence suggests that these enhancers are targets of the sim gene product (Nambu et al., 1990, 1991). The DNA-binding properties of SIM have not been demonstrated, but an educated guess can be entertained by comparison to related bHLH transcription factors. In many examples studied, the bHLH domain mediates dimerization and DNA binding to the consensus E-box (5′ CANNTG 3′) (Murre et al., 1989). The adjacent SIM PAS domain is thought to mediate protein-protein interactions (Huang et al., 1993). The SIM bHLH and PAS domains closely resemble those of two subunits of the mammalian Aromatic Hydrocarbon Receptor Complex (AHRC) (Hoffman et al., 1991; Nambu et al., 1991; Burbach et al., 1992; Ema et al., 1992). The AHRC is a ligand-dependent transcription factor that induces expression of enzymes involved in drug and carcinogen metabolism (Whitlock, 1990). The AHRC interacts with the consensus xenobiotic response element (XRE) (5′ TNGCGTG 3′) present in multiple copies in responsive gene enhancers (Denison et al., 1988). The nuclear form of the AHRC is composed of two bHLH-PAS proteins, Aromatic Hydrocarbon Receptor (AHR) and AH Receptor Nuclear Translocator (ARNT), that bind to the XRE as a heterodimer (Reyes et al., 1992; Matsushita et al., 1993; Dolwick et al., 1993; Probst et al., 1993).
Sequence analysis of the sim, sli and Tl regulatory regions revealed a conserved sequence ACGTG that resembles the XRE. Mutation of these sequence elements followed by germ-line transformation revealed that this element is required for CNS midline transcription for all three genes. Multimerization of a 20 bp DNA fragment containing the ACGTG sequence is sufficient for MLP transcription. These experiments further reinforce the functional similarity between control of CNS midline transcription and aromatic hydrocarbon metabolism. They also suggest a model for how related bHLH-PAS domain proteins control transcription of these two processes.
MATERIALS AND METHODS
sim gene deletion constructs
Deletion constructs of the 3.7 kb sim early promoter region were generated using an Erase-a-base kit (Promega). The 5′ deletion fragments of 2.8, 2.1, 1.5 and 0.9 kb were cloned into Casper-AUG-β-gal (Thummel et al., 1988), and the 5′-ends sequenced to localize the deletion site.
Isolation of Drosophila virilis sim gene
The Drosophila melanogaster 2.8 kb fragment containing the sim early promoter is sufficient for MEC and MLP expression (see Results), so the corresponding DNA from Drosophila virilis was sought for comparison. The Drosophila virilis sim homolog was cloned from a genomic library (kindly provided by John Tamkun) using a sim bHLH-PAS cDNA probe. Two overlapping genomic clones were isolated and restriction mapped. The Drosophila virilis region corresponding to the Drosophila melanogaster 2.8 kb early promoter region was sequenced and aligned to the Drosophila melanogaster sequence using the University of Wisconsin GCG program COMPARE (Devereux et al., 1984). This analysis identified 13 conserved sequence blocks, 20 to 60 bp long, of greater than 90% identity, interspersed with nonconserved sequence. The order of the conserved sequence blocks was preserved between the two species, but the spacing of the blocks varied.
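The block criteria stated above (20 to 60 bp, greater than 90% identity) can be expressed as a small check on two pre-aligned, ungapped blocks. This is only a sketch of the criteria, not the GCG COMPARE algorithm; the function names and the example sequences in the usage note are hypothetical.

```python
def percent_identity(block_a, block_b):
    """Percent of identical positions between two pre-aligned, ungapped blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must be pre-aligned to the same length")
    if not block_a:
        raise ValueError("blocks must not be empty")
    matches = sum(1 for a, b in zip(block_a.upper(), block_b.upper()) if a == b)
    return 100.0 * matches / len(block_a)


def is_conserved_block(block_a, block_b, min_len=20, max_len=60, min_identity=90.0):
    """Apply the stated criteria: 20-60 bp long and strictly more than 90% identical."""
    length_ok = min_len <= len(block_a) <= max_len
    return length_ok and percent_identity(block_a, block_b) > min_identity
```

For example, two hypothetical 20 bp blocks differing at one position are 95% identical and pass, whereas two mismatches give exactly 90% and fail the "greater than 90%" threshold.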
DNA sequence analysis
The sim 2.8 kb, sli 1.0 HV and Tl 1.4 RR (Wharton and Crews, 1993) restriction fragments that include the 380 bp sli and 900 bp Tl sequences were bidirectionally deleted using an Erase-a-base kit and sequenced by the dideoxynucleotide chain termination method. These sequences were examined for the presence of elements related to the XRE (TNGCGTG). Because of SIM’s arginine residue at basic region position 13, which specifies a CG dinucleotide pair in E-box response elements (Dang et al., 1992; Ferre-D’Amare et al., 1993), sequences containing CG dinucleotide pairs in the sim, sli and Tl enhancers were aligned using University of Wisconsin GCG program FIND (Devereux et al., 1984). These sequences were then examined for the presence of asymmetric E-box-like motifs.
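The kind of motif search described here can be sketched as a scan of both DNA strands with a regular expression. This is a minimal illustration, not the GCG FIND program; the function names are hypothetical, and the only real sequence used in the usage note is the 20 bp Tl site 4 fragment given under "Multimerization of CME".

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_motif(seq, pattern):
    """Find all (possibly overlapping) matches of a regex motif on both strands.

    Returns a sorted list of (start, strand, match), where start is the 0-based
    coordinate of the leftmost matched base on the given (forward) strand.
    """
    seq = seq.upper()
    regex = re.compile(f"(?=({pattern}))")  # lookahead permits overlapping hits
    hits = [(m.start(), "+", m.group(1)) for m in regex.finditer(seq)]
    for m in regex.finditer(reverse_complement(seq)):
        match = m.group(1)
        start = len(seq) - m.start() - len(match)  # map back to forward coordinates
        hits.append((start, "-", match))
    return sorted(hits)


# Motifs from the text written as regular expressions.
CORE_MOTIF = "ACGTG"                 # conserved 5 bp core
EXTENDED_CONSENSUS = "[GA][TA]ACGTG" # (G/A)(T/A)ACGTG
XRE = "T.GCGTG"                      # TNGCGTG
```

Applied to the Tl site 4 fragment `AAATTTGTACGTGCCACAGA`, `find_motif(seq, CORE_MOTIF)` reports a single forward-strand hit starting at position 8, and the extended consensus matches `GTACGTG` starting at position 6.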
Site-directed mutagenesis
Oligonucleotide site-directed mutagenesis (Amersham) was performed on the 900 bp Tl 950 Rd fragment (Tl950 refers to the name of the construct published in Wharton and Crews, 1993, whereas the size of the fragment from sequence analysis is 900 bp). Sites were either mutated individually or all four sites were altered together. Each NACGTG motif was mutated to the BamHI restriction site GGATCC. The mutant clones residing in Bluescript II KS were cut with NotI and KpnI, sequenced to verify the mutated sequence and cloned into the lacZ enhancer-tester vector C4PLZ (Wharton and Crews, 1993). Three independent transformants of a construct that harbored a specific mutation in Tl site 2 failed to produce any tissue-specific staining. Since a deletion of Tl sites 2 and 3 shows normal salivary gland placode staining and low levels of midline staining (Wharton, 1992), the Tl site 2 mutation may represent an artifact associated with the mutagenesis, cloning or transformation. Site-directed mutagenesis of sli fragment 380 dNc (Wharton and Crews, 1993) replaced its single AAACGTG motif with the HindIII restriction site TAAGCTT. The mutant fragment was sequenced and cloned into NotI-KpnI cut C4PLZ. Site-directed mutations in the 2.8 kb sim fragment replaced each NACGTG motif with a BamHI site. The mutant construct was sequenced and cloned into Casper-AUG-β-gal (Thummel et al., 1988).
Multimerization of CME
Two 24-mers that duplicate opposite strands of the 20 bp encompassing Tl site 4 were synthesized. The coding strand oligonucleotide was 5′ ctagAAATTTGTACGTGCCACAGA 3′ and the complementary strand was 5′ ctagTCTGTGGCACGTACAAATTT 3′ (non-Tl sequence that introduces an XbaI half-site is in lower case; ACGTG motif is underlined). These oligos were annealed and multimerized with T4 DNA ligase. Multimers of 3 and 4 were visualized with UV light, isolated on a 15% polyacrylamide gel and purified by standard techniques. Multimer fragments were cloned into XbaI-cut Bluescript II KS‐ and sequenced. Clones containing 3 and 4 copies of Tl site 4 were cloned into NotI-KpnI cut C4PLZ followed by germ-line transformation. The P[4X950Tl4] element was crossed into a sim null mutant (simH9) background and stained for β-galactosidase immunoreactivity (Nambu et al., 1990). The P[4X950Tl4] element was also crossed into a strain containing multiple copies of a sim cDNA fused to an hsp70 heat-shock promoter (Nambu et al., 1991). Heat induction was similar to that described previously (Nambu et al., 1991; Franks and Crews, 1994): 2- to 4-hour-old embryos were treated for 1 hour at 37°C, allowed to recover for an additional 2-4 hours at 25°C and then stained for β-galactosidase immunoreactivity.
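The design of the two oligonucleotides can be checked with a few lines of code: the upper-case 20 bp portions should be exact reverse complements, and each strand carries the same 5′ ctag extension. The sketch below also builds an idealized multimer; it assumes every unit ligates head-to-tail, which is a simplification because the ctag overhang is self-complementary and would in principle allow either orientation. Function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Oligonucleotides as given in the text (lower case = XbaI half-site extension).
CODING = "ctagAAATTTGTACGTGCCACAGA"
COMPLEMENTARY = "ctagTCTGTGGCACGTACAAATTT"


def split_oligo(oligo):
    """Return (lower-case 5' extension, upper-case Tl-derived sequence)."""
    extension = "".join(c for c in oligo if c.islower())
    core = "".join(c for c in oligo if c.isupper())
    return extension, core


def tandem_multimer(copies):
    """Top strand of an idealized multimer, ASSUMING all units are head-to-tail."""
    extension, core = split_oligo(CODING)
    return (extension.upper() + core) * copies
```

Under the head-to-tail assumption, `tandem_multimer(4)` is 96 nt long and contains four copies of ACGTG on the top strand, one per 24 nt repeat unit.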
Germ-line transformation and detection of lacZ expression
P-element constructs were introduced into the Drosophila germ line by microinjection as described by Rubin and Spradling (1982) with pπ25.7wc used as a source of transposase (Karess and Rubin, 1984). w1118 flies were used as the transformation strain, since all P elements used in this study contained the white mini-gene. Embryos were stained for β-galactosidase protein with a monoclonal antibody to β-galactosidase (Promega) (Nambu et al., 1991) or for lacZ transcripts with a lacZ antisense RNA probe (Tautz and Pfeifle, 1989; Kasai et al., 1992). At least three identically staining independent transformants for each construct were analyzed to control for position effects.
RESULTS
Conserved sequence motif in CNS midline enhancers
Given the functional relationship between SIM, the AHRC proteins and other HLH proteins, the midline enhancers from sim, sli and Tl were sequenced and searched for sequence motifs related to the XRE (TNGCGTG) and E-boxes (CANNTG). DNA corresponding to 2.8 kb of the sim gene was also sequenced in a related species Drosophila virilis, since conserved flanking sequences between homologous genes often identify regulatory elements (Bray and Hirsh, 1987). A candidate sequence motif, ACGTG, was found five times in sim within the 2.8 kb fragment, four times in the 900 bp Tl fragment (referred to as 950Tl; Wharton and Crews, 1993) and once in the 380 bp sli fragment (Fig. 1). Four of five sites in sim were conserved between Drosophila melanogaster and Drosophila virilis.
Sequence alignment of ACGTG motifs. The sequence ACGTG was found 10 times in the three CNS midline enhancers. The sim gene contains five motifs within a 3.7 kb fragment, and sim motifs 2-5 are identical between Drosophila melanogaster and Drosophila virilis. Four ACGTG motifs were found within the 900 bp Tl fragment. The sli 380 bp fragment contains a single motif. At each nucleotide position, nine out of ten sequences yield the extended consensus (G/A)(T/A)ACGTG.
Germ-line transformation shows requirement of conserved sequence element for CNS midline transcription
Germ-line transformation was used to test whether these motifs were required for CNS midline transcription. Specific mutations were introduced into the ACGTG sequences in DNA fragments of sim, sli and Tl that drive CNS midline transcription. Mutagenized fragments were fused to P element promoter-fusion (Thummel et al., 1988) or enhancer-tester vectors (Wharton and Crews, 1993), and introduced into germ-line DNA by microinjection. Embryos collected from homozygous transformed strains were stained using anti-β-galactosidase antibody or by in situ hybridization with a lacZ probe.
Tl
The P[950Tl] construct drives β-galactosidase expression in the CNS midline cells during embryonic stages 9-13 and is also expressed in the salivary gland placode, epidermis and gut (Wharton and Crews, 1993) (Fig. 2A,E). Specific mutagenesis of a single site, either 1, 3 or 4, results in a significant decrease in CNS midline expression (Fig. 2B,E). When all four sites are mutated, CNS midline expression is absent (Fig. 2C,E). Expression in other tissues, such as the salivary gland placode, is unaffected in all constructs, providing an internal control. This experiment demonstrates that ACGTG motifs contribute to Tl expression within MLP cells.
Tl CMEs are necessary and sufficient for CNS midline expression. (A-D) Embryos containing P[950Tl/lacZ] constructs are stained with an antibody against β-galactosidase followed by HRP immunohistochemistry. Ventral surface is shown with anterior to the left. Scale bar, 50 µm. (A) Staining of P[950Tl] stage 10 embryo shows MLP (arrowhead) and salivary gland placode expression (arrow). (B) Staining of P[950Tl.mut1] stage 10 embryo, which has one of the four CMEs mutated, shows weak MLP expression (arrowhead) although salivary gland placode expression (arrow) is unaffected. Constructs P[950Tl.mut3] and P[950Tl.mut4] stain similarly. (C) Staining of P[950Tl.mut1-4] stage 10 embryo, which has four CMEs mutated, shows an absence of MLP expression (arrowhead), although salivary gland placode expression (arrow) is unaffected. (D) Staining of P[4X950Tl4] shows strong and specific MLP expression in a stage 10 embryo. This strain harbors four multimerized 20 bp fragments containing Tl site 4 driving lacZ. (E) Genomic map of the Tl gene (Wharton and Crews, 1993) showing location of the 950Tl fragment and staining summary of each construct. Raised blocks represent exons with coding sequences filled and non-coding sequences unfilled. The locations of the translational start and termination sites are marked with an ‘M’ and an ‘X’, respectively. The 950Tl fragment that confers CNS midline expression lies approximately 1.5 kb upstream from the start of transcription (arrow). Shown below are the different constructs analyzed in this paper. The first five constructs utilized the 950Tl fragment fused to C4PLZ, which contains a weak P-element promoter fused to lacZ. This fragment has four CMEs, labeled 1-4. Specific mutations were generated at the sites marked with an ‘X’. The bottom two constructs utilized multiple copies of a 20 bp fragment incorporating Tl site 4. P[4X950Tl4] had four copies tandemly linked to C4PLZ and P[3X950Tl4] had three copies. Each box with an enclosed ‘4’ refers to a single copy of Tl site 4. Expression was monitored in CNS midline precursor cells (MLP) and the salivary gland placode (SG). High levels of expression are indicated by ‘+’, weak levels by ‘+/‐’ and absence of expression by ‘‐’.
sli
The 380 bp sli fragment drives β-galactosidase expression in MLG from stages 11-17 (Wharton and Crews, 1993) (Fig. 3A,C). This construct is also expressed in cells along the midline of the frontal sac. Mutation of the single ACGTG motif in sli eliminates MLG expression although expression in the frontal sac is unaltered (Fig. 3B,C).
Mutation of the single sli CME results in loss of MLG expression. (A,B) Embryos containing P[sli/lacZ] constructs are stained as in Fig. 2. Sagittal view is shown with anterior to the left. Scale bar, 50 µm. (A) Staining of P[380sli] stage 13 embryo shows strong late midline staining, including MLG (arrowhead) and midline frontal sac expression (arrow). This unmutated P element has an intact CME (labeled ‘1’ in the box below). (B) Staining of P[380sli.mut1] stage 13 embryo, which has its CME mutated (marked with an ‘X’), shows a complete absence of CNS midline expression (arrowhead) although frontal sac expression (arrow) remains strong. (C) Genomic map of the sli gene (Wharton and Crews, 1993) showing location of the 380 bp fragment downstream of the start site of transcription (arrow) and staining summary of each construct. Each fragment was fused to C4PLZ, transformed and assayed for midline glia (MLG) and frontal sac (FS) expression.
sim
Previous experiments demonstrated that a 3.7 kb fragment of sim containing the early promoter (PE) drives both blastoderm MEC and sim-dependent MLP transcription (Nambu et al., 1991; Kasai et al., 1992). Progressive 5′ deletions of this fragment reveal that high levels of MLP expression can be driven by fragments 2.8, 2.1 and 1.5 kb upstream of exon 2, while MLP expression is strongly reduced in a 0.9 kb fragment (Fig. 4E). MEC expression is at wild-type levels only in the 3.7 and 2.8 kb fragments (Fig. 4E). The unmutated 2.8 kb sim-lacZ construct maintained high levels of CNS midline transcripts at stage 11 and later (Fig. 4A). In contrast, when specific mutations were introduced together into the ACGTG motifs at sites 1, 2, 3 and 4 of the 2.8 kb fragment, MLP expression was completely abolished by stage 11 (Fig. 4B). As an internal control, MEC and early MLP expression (stages 5-9) were identical to those of the unmutated fragment (Fig. 4C,D). This pattern resembles sim gene transcription in sim mutant embryos. These results suggest that sim autoregulation, but not initial sim transcription, requires the ACGTG motifs.
Mutation of the sim CMEs results in loss of sim autoregulation. (A-D) Embryos containing P[sim/lacZ] constructs are hybridized with a labeled lacZ RNA probe followed by alkaline phosphatase histochemistry. Ventral surface is shown with anterior to the left. Scale bar, 50 µm. (A) Staining of P[2.8sim] stage 11 embryo shows MLP expression (arrowhead). This P element has all 5 CMEs intact. (B) Staining of P[2.8sim.mut1-4] stage 11 embryo, which has four of the CMEs mutated, shows absence of MLP expression (arrowhead). (C,D) Staining of P[2.8sim.mut1-4] stage 5 embryo (C) and stage 9 embryo (D) shows high levels of lacZ transcripts, indicating that the initial sim-independent transcription of the sim gene is unaffected. (E) Genomic map of the sim gene (Nambu et al., 1990) showing location of the different DNA fragments and expression summary of each construct. Location of the late promoter (PL) and early promoter (PE) are indicated with arrows. The 3.7 kb fragment that confers CNS midline expression lies upstream of exon 2 and contains PE. Regions highly conserved between Drosophila melanogaster and Drosophila virilis within the 2.8 kb fragment are shown as open boxes. The location of the ACGTG motifs are indicated below and numbered 1-5; all but motif 1 are identical between Drosophila melanogaster and Drosophila virilis. Shown are the different constructs analyzed. The first five constructs are 5′ deletion fragments that contain PE and are fused to Casper-AUG-β-gal. The P[3.7sim], P[2.8sim], P[2.1sim] and P[1.5sim] genes contained all 5 CMEs (numbered 1-5), whereas P[0.9sim] lacks the first three sites. The P[2.8sim.mut1-4] gene contains specific mutations in sites 1-4 (labeled with ‘X’s). Stage 11 embryos from each transformed line were analyzed by in situ hybridization with a lacZ probe for sim-dependent midline precursor expression ‘MLP’ and stage 5-7 embryos were analyzed for mesectoderm ‘MEC’ staining.
In summary, these experiments indicate that the ACGTG motif is required for CNS midline expression in both the MLP cells early in neurogenesis and later in the MLG.
Minimal CNS midline element is sufficient for CNS midline transcription
To test whether the ACGTG motif by itself was capable of directing midline expression, multimers of three or four 20 bp fragments incorporating Tl site 4 were cloned into an enhancer-tester lacZ vector. Upon germ-line transformation and anti-β-galactosidase staining, embryos harboring these constructs expressed β-galactosidase in CNS midline cells beginning at stage 9 (Fig. 2D,E). Indeed, midline expression from these constructs appeared strikingly similar to sim expression: initially in all MLP cells, and later becoming restricted to MLG and a subset of ventral neurons (data not shown) (Crews et al., 1988). Four copies of this element drove strong MLP expression, while three copies had weaker expression (Fig. 2E), suggesting a positive relationship between element dosage and expression level. This effect is similar to the dosage dependence of ACGTG copy number in the P[950Tl] mutagenesis experiments. Midline expression of P[4X950Tl4] was absent in a sim‐ background and expanded throughout the lateral neuroectoderm upon heat induction of a P[hsp70/sim] fly strain, demonstrating the dependence of its expression on sim gene function (data not shown). Similar results were previously obtained with midline enhancers of sim, sli, Tl (Nambu et al., 1990, 1991) and other CNS midline-expressed genes (Nambu et al., 1991; Muralidhar et al., 1993).
DISCUSSION
Transcriptional control of CNS midline development
These results define an ACGTG-containing CNS Midline Element (CME), which is required for sim-dependent CNS midline gene expression. For a gene to be activated in MLP, its regulatory region simply requires multiple CMEs and a promoter element. This is exemplified by the MLP expression of the sim and Tl genes. Each gene has 4-5 CMEs scattered over less than 1.5 kb of DNA and both genes show strong, comparable MLP expression. Mutations of individual CMEs within the Tl gene indicate that each CME contributes quantitatively to the overall level of expression. Comparison of the 1.5 kb sim fragment (5 CMEs) and the 0.9 kb sim fragment (2 CMEs) also suggests a correlation between CME copy number and midline expression. This relationship is further reinforced by the experiments in which either three or four CMEs were multimerized and fused to a heterologous promoter. Constructs with four CMEs reproducibly showed higher levels of expression than the constructs with three elements.
The sli gene is expressed in MLP and later in MLG, similar to the expression of sim. Previously, a 380 bp fragment was shown to be sufficient for MLG, but not MLP, expression. In this paper, we show that the 380 bp sli fragment contains a single CME that is required for MLG expression. Thus, a related control element is utilized for both MLP and MLG transcription. However, it is unknown whether the same transcription factors interact with the CME in MLP and MLG.
Although we have identified one component of sli MLG expression, the complete array of elements required for sli MLP and MLG transcription remains unknown. The CME multimerization experiments revealed that multiple copies of the Tl4 CME are able to drive transcription in MLPs. Since the 380 bp sli fragment has only a single CME, additional CMEs residing elsewhere within the sli gene may be required for sli MLP transcription. Additional elements may also be necessary for sli MLG transcription, although it is possible that the single sli CME is sufficient for MLG expression. For instance, there may be sites for additional transcriptional repressors and activators that direct MLG transcription. Further mutational dissection of the sli 380 bp fragment and analysis of the sli CME should help resolve these issues.
bHLH-PAS transcription factor interactions
As shown in Fig. 1, sequence alignment of the ten ACGTG motifs indicates that additional nucleotides are favored, yielding a 7 bp consensus [5′ (G/A)(T/A)ACGTG 3′] whose 3′ proximal residues resemble both the XRE (TNGCGTG) and a half-site of the symmetric E-box motif (CANNTG). bHLH proteins usually recognize the E-box as heterodimers, each partner forming sequence-specific contacts within the major groove of one half-site (Blackwell and Weintraub, 1990; Ferre-D’Amare et al., 1993). Examination of the amino acid sequences of the AHRC bHLH-PAS subunits suggests a model for how an AHR/ARNT bHLH heterodimer recognizes the XRE that may be relevant in understanding the function of SIM and the CME. ARNT possesses basic region amino acids identical to both MYC and MAX in positions important for sequence-specific DNA-protein contacts: a histidine at basic region position 5, glutamate at 9 and arginine at 13 (Fig. 5A) (Ferre-D’Amare et al., 1993). AHR, by contrast, possesses a lysine and serine at positions 5 and 9, respectively, suggesting that AHR would not interact with an E-box half site. Thus, ARNT would recognize the E-box half site, GTG, and AHR would recognize the non-E-box half of the XRE, TNGC (Fig. 5B). Note that AHR and ARNT both possess arginine 13, which mutational and crystallographic analyses (Dang et al., 1992; Halazonetis and Kandil, 1992; Ferre-D’Amare et al., 1993) show specifies a CG dinucleotide pair adjacent to the axis of symmetry in the cognate binding site for bHLH dimers; this dinucleotide is also present at the center of the XRE core sequence.
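The half-site argument can be made concrete by splitting each site at the proposed axis, three bases from its 3′ end, and comparing the pieces. The sketch below writes (G/A) and (T/A) with the IUPAC codes R and W. The XRE, E-box and CME sequences come from the text; CACGTG as the MYC/MAX site is not given in the text and is supplied here as an assumption based on the commonly cited MYC/MAX E-box. Function names are hypothetical.

```python
# IUPAC nucleotide codes used below: R = A or G, W = A or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "W": "AT", "N": "ACGT"}
_BASE_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

SITES = {
    "E-box": "CANNTG",         # consensus from the text
    "MYC/MAX site": "CACGTG",  # assumption: not stated in the text
    "XRE": "TNGCGTG",          # consensus from the text
    "CME": "RWACGTG",          # (G/A)(T/A)ACGTG from the text
}


def half_sites(site):
    """Split a site at the proposed axis, three bases from the 3' end."""
    return site[:-3], site[-3:]


def central_pair(site):
    """Return the two positions flanking the axis (one from each half site)."""
    left, right = half_sites(site)
    return left[-1] + right[0]


def is_symmetric(site):
    """True if the site equals its own reverse complement, position by position."""
    for code, partner in zip(site, reversed(site)):
        complement = {_BASE_COMPLEMENT[b] for b in IUPAC[partner]}
        if set(IUPAC[code]) != complement:
            return False
    return True
```

With these definitions, the XRE and CME share the GTG 3′ half site and a central CG pair with the MYC/MAX site, but only the E-box and MYC/MAX site are symmetric; the 5′ halves of the XRE (TNGC) and CME (RWAC) differ from the E-box half site, which is the asymmetry the model assigns to AHR and SIM.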
Hypothetical model of bHLH-DNA interactions occurring at xenobiotic and CNS midline response elements. (A) Basic region amino acid sequences of human MAX (Blackwood and Eisenman, 1991), MYC, ARNT (Hoffman et al., 1991), murine AHR (Ema et al., 1992; Burbach et al., 1992) and Drosophila SIM (Nambu et al., 1990). Conserved amino acid residues that make sequence-specific DNA contacts in the MAX crystal structure are boxed (Ferre-D’Amare et al., 1993). (B) Nucleotide sequences of the E-box, MYC/MAX-binding site, XRE and CME with proposed binding proteins shown adjacent to each half site. One DNA strand is shown from 5′ to 3′. Nucleotide positions from the axis of symmetry are numbered above consensus. Axis of half-site is denoted by vertical dashed line. According to the MAX crystal structure, histidine 5 of MAX basic region makes contact with the G opposite C at position 3, glutamate 9 interacts with both the C and A at positions 3 and 2, and arginine 13 interacts with the G opposite C at position 1. AHR and SIM possess distinct amino acids at MAX positions 5 and 9 compared with MAX, MYC and ARNT. The existence of ARNTrel, a putative partner of SIM, is hypothesized from the sequence of the CME and SIM’s relatedness to the AHRC proteins.
This model also provides a hypothesis for recognition of the CME by SIM. In this scheme, the CME is composed of a GTG E-box half site and a diverged (G/A)(T/A)AC sequence. sim also encodes a bHLH-PAS domain protein that is more related in primary sequence and genomic structure to AHR than ARNT is to AHR, suggesting that SIM and AHR diverged from a common ancestral gene (Burbach et al., 1992; Ema et al., 1992; Schmidt et al., 1993). SIM, like AHR, possesses amino acid residues at positions 5 and 9 that differ from those of MYC, MAX and ARNT, suggesting that it is unlikely to bind a GTG half site. SIM also possesses an arginine at position 13, suggesting a CG pair adjacent to the axis of symmetry of its cognate binding site, and that sequence is also present at the center of the CME. By analogy to the AHRC, we propose that SIM binds the (G/A)(T/A)AC half site of the CME and that a Drosophila ARNT-related protein forms a heterodimer with SIM to interact with the GTG half site of the CME (Fig. 5). However, it is worth noting that asymmetric E-box-like sequences can be recognized in vitro (and probably in vivo) by homodimers of the bHLH proteins E47 (Blackwell and Weintraub, 1990), TWIST (Ip et al., 1992a, b; Kasai and Crews, unpubl.), ENHANCER OF SPLIT (Tietze et al., 1992), HES-5 (Akazawa et al., 1992) and HAIRY (Van Doren and Posakony, personal communication). Thus, the occurrence of an asymmetric E-box-like sequence is not definitive evidence that the corresponding bHLH transcription factors act as heterodimers.
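The residue-based reasoning behind this model can be restated as a small lookup, sketched below in Python purely as an illustration. The table records only the residues at basic region positions 5, 9 and 13 that are stated in the text for MAX, MYC, ARNT and AHR; SIM is deliberately omitted because its residues at positions 5 and 9 are not given here. The names (`BASIC_REGION`, `predicts_ebox_half_site`, `specifies_central_cg`) are hypothetical, and the rules are a simplified reading of the hypothesis, not a validated predictor of DNA binding.

```python
# Residues at basic region positions 5, 9 and 13, as stated in the text.
# SIM is omitted: its residues at positions 5 and 9 are not specified here.
BASIC_REGION = {
    "MAX": {5: "H", 9: "E", 13: "R"},
    "MYC": {5: "H", 9: "E", 13: "R"},
    "ARNT": {5: "H", 9: "E", 13: "R"},
    "AHR": {5: "K", 9: "S", 13: "R"},
}


def predicts_ebox_half_site(protein):
    """True if positions 5 and 9 carry the His/Glu pair found in MAX.

    Under the model, this pair is what contacts the GTG E-box half site.
    """
    residues = BASIC_REGION[protein]
    return residues[5] == "H" and residues[9] == "E"


def specifies_central_cg(protein):
    """True if position 13 is arginine, proposed to specify the central CG."""
    return BASIC_REGION[protein][13] == "R"


if __name__ == "__main__":
    for name in BASIC_REGION:
        half_site = "GTG" if predicts_ebox_half_site(name) else "non-E-box"
        print(name, half_site, specifies_central_cg(name))
```

Under these assumptions ARNT is assigned the GTG half site and AHR the non-E-box half, while both retain the arginine proposed to specify the central CG pair, which mirrors the assignment proposed in Fig. 5B.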
In summary, this paper provides evidence that a family of related bHLH transcription factors and cognate DNA sequence elements control CNS midline development and carcinogen metabolism. The CME is required for CNS midline transcription throughout Drosophila embryonic development. MLP expression requires multiple CMEs and a promoter element. The control of later CNS midline expression in differentiated glial cells can also involve the CME, but may include additional regulatory elements. The sim gene is genetically required for all CNS midline transcription and, for the reasons described in this paper, it is likely that SIM acts directly through the CME to exert its control over CNS midline transcription and development. Future experiments will seek to identify ARNT-related or other bHLH-PAS dimerization partners for the SIM protein, biochemically analyze their predicted interactions with the CME and further understand the multiplicity of control elements that restrict midline expression in differentiated neurons and glia.
Acknowledgements
S. T. C. would like to acknowledge the intellectual contributions of Eric Davidson in stimulating his interest in the control of gene expression during development. We would like to thank Al Courey, Oliver Hankinson and Song Hu for a critical reading of the manuscript, Jim Posakony for advice and the Crews laboratory for support throughout this project. K. A. W. was supported by an NIH Medical Scientist Training Program (GM-08042). R. G. F. is a predoctoral trainee in the UCLA Interdepartmental Program for Neuroscience and is supported in part by US Public Health Service National Research Service Award GM 07104 and by a Dr Ursula Mandel Scholarship. Y. K. is a postdoctoral trainee supported by DHHS PHS National Institutional Research Service Award T 32 CA09056 and an NIH postdoctoral grant. This work was supported by grants from the Lucille P. Markey Charitable Trust, Jonsson Cancer Center Foundation (J911209) and the NIH (HD25251) to S. T. C.