ABSTRACT
The Sex comb on midleg (Scm) gene is a member of the Polycomb group (PcG) of genes in Drosophila melanogaster. The PcG genes encode transcriptional repressors required for proper spatial expression of homeotic genes. We report the isolation of new Scm mutations and the molecular characterization of the Scm gene. Scm mRNA is expressed maternally, at peak levels in early embryos and then at lower levels throughout the remainder of development. Scm encodes a putative zinc finger protein of 877 amino acids. Scm protein is similar to polyhomeotic, another member of the PcG, both in the zinc finger region and in a separate C-terminal domain of 60 amino acids, which we term the SPM domain. Sequence analysis of an Scm mutant allele suggests a functional requirement for the SPM domain. Scm protein also bears homology in multiple domains to a mouse protein, Rae-28 (Nomura, M., Takihara, Y. and Shimada, K. (1994) Differentiation 57, 39-50) and to a fly tumor suppressor protein, the product of the lethal(3)malignant brain tumor gene (Wismar, J. et al., (1995) Mech. Dev. 53, 141-154). Possible functional relationships among these proteins and potential biochemical roles for Scm protein in PcG repression are discussed.
INTRODUCTION
The homeotic genes of the Antennapedia and bithorax complexes control development along the anterior-posterior (A-P) axis in Drosophila (Lewis, 1978; Kaufman et al., 1980). Each homeotic protein is expressed in a precise domain along this axis (White and Wilcox, 1985; Celniker et al., 1989; Karch et al., 1990), which corresponds to its realm of function. The homeotic genes are first activated at about 2 hours of embryogenesis and their restricted expression patterns are maintained throughout subsequent development.
The initial A-P domains of homeotic expression are set by gap gene products, such as hunchback and Krüppel, which directly repress homeotic gene transcription (Zhang and Bienz, 1992; Qian et al., 1993; Shimell et al., 1994). This mode of repression is transient, however, since the early expression patterns of the gap proteins decay by about 4 hours of embryogenesis. The Polycomb group (PcG) genes encode a second set of regulators that maintain repression when the gap proteins disappear. Most of the PcG products are expressed throughout embryonic, larval and pupal development, and are required continuously to maintain restricted homeotic expression (Duncan and Lewis, 1982; Franke et al., 1992; Martin and Adler, 1993; Jones and Gelbart, 1993; Lonie et al., 1994).
Mutations in PcG genes cause ectopic expression of multiple homeotic products along the A-P axis (McKeon and Brock, 1991; Simon et al., 1992). Thirteen genes have been identified as members of the PcG (see Simon, 1995 for review) by the segmental transformation phenotypes that result from this mis-expression. The large number of PcG products may reflect their common function in multiprotein complexes. In support of this idea, the Polycomb, polyhomeotic and Polycomblike proteins are chromatin components that co-localize to the same sites in polytene chromosomes (Franke et al., 1992; Lonie et al., 1994). Furthermore, co-immunoprecipitation experiments detect association of the Polycomb and polyhomeotic proteins in embryonic nuclear extracts (Franke et al., 1992).
Little is known about the biochemical roles of individual PcG proteins or how they accomplish repression. Molecular analyses of six PcG genes, Polycomb (Pc), polyhomeotic (ph), Posterior sex combs (Psc), Polycomblike (Pcl), Enhancer of zeste [E(z)] and extra sex combs (esc), have identified domains likely to be used for protein-protein interactions (Messmer et al., 1992; Jones and Gelbart, 1993; Sathe and Harte, 1995; Gutjahr et al., 1995; Simon et al., 1995). However, catalytic domains or sequence-specific DNA-binding domains have not been found. In addition, no sequence homologies have been detected among these PcG proteins.
The Sex comb on midleg (Scm) gene (Jürgens, 1985; Breen and Duncan, 1986) encodes one of the PcG repressors (McKeon and Brock, 1991; Simon et al., 1992). Breen and Duncan (1986) showed that embryos lacking both maternal and zygotic Scm product die with most segments transformed into copies of the eighth abdominal segment. This null phenotype, which is among the strongest seen in single PcG mutants, shows that the Scm product is a central component in PcG repression.
To begin molecular analysis of the role of Scm product, we have cloned the Scm gene. We describe the expression of Scm mRNA during development and we present the Scm cDNA sequence. The predicted Scm protein shares domains with polyhomeotic protein (DeCamillis et al., 1992), which suggests a structural similarity between two key PcG repressors.
MATERIALS AND METHODS
Drosophila strains
Df(3R)GB104 is a deficiency that removes 85D11-85E10 (Lindsley and Zimm, 1992), including the Scm locus. ScmXF24 is a null or strong hypomorphic allele isolated by X-ray mutagenesis and ScmET50 is an EMS-induced allele (Jürgens, 1985). Although ScmET50 is a lethal allele, it appears to be hypomorphic since ET50 mutant embryos show normal expression of abdA protein (D. B. and J. S., unpublished results). Germline transformation was performed using a y Df(1)w67c2 recipient strain. The balancer stock w; T(2;3) ApXa/CyO; TM2 was used to assess the chromosomal linkage of transgenic inserts.
Screens for mutations by P element excision
P excision screens were performed using either of two stocks, P1246 or P1092, that contain single P elements inserted at 85E1-5. Both elements contain a rosy+(ry+) marker gene and are viable over lethal Scm mutations and DfGB104. The F1 screen for Scm mutations as dominant suppressors of zeste is described in Fig. 1. The F2 screen for lethal mutations in 85E was performed as follows. P[ry+]/Sb Δ2-3 ry+ dysgenic males were generated as shown in Fig. 1 and were crossed to ry502Fab7 females. P[ry−]*/ry502Fab7 males that had experienced excision or inactivation of the ry+ marker were selected and crossed singly to DfGB104/TM3 females to test for lethal mutations. Mutations that were lethal over DfGB104 were recovered in the Stubble, non-Fab siblings (genotype P[ry−]*/TM3). Four independent lethal mutations were recovered among the progeny of 120 ry− males.
Fig. 1. A screen for Scm mutations as dominant suppressors of zeste. Imprecise P excision events were selected as mutants that darkened eye color in z1 animals. Since both loss-of-function Scm mutations and a deficiency for the Scm locus, Df(3R)GB104, cause moderate darkening of z1 eye color (Wu et al., 1989; D. B. and J. S., unpublished results), the screen should identify small deficiencies that remove or inactivate Scm. The use of the z1wis chromosome as a sensitive tester for modifiers of zeste has been described (Jones and Gelbart, 1990). The screen was performed using either of two P elements inserted at 85E, P1246 or P1092. Δ2-3 is a P transposase source on the third chromosome (Robertson et al., 1988). * indicates DNA alterations at the P element insertion site. Three independent Suppressor of zeste [Su(z)] mutations were isolated, two from the P1246 screen and one from the P1092 screen. Each Su(z) mutation darkened the eye color in z1wis males from light orange to red.
Germline transformation and tests for rescue of Scm lethality
The germline transformation construct Cas-Scm7 contains a 7.5 kb segment of genomic DNA that includes the Scm transcription unit plus approximately 2 kb of 5′ upstream DNA and 1.5 kb downstream of the poly(A) addition site. Cas-Scm7 was constructed by inserting a 3 kb NheI-XbaI genomic fragment and the contiguous 4.5 kb XbaI fragment into the XbaI site of the pCaSpeR4 transformation vector (Thummel and Pirrotta, 1991).
Germline transformants were generated essentially as described (Rubin and Spradling, 1982). Two primary transformant lines with Cas-Scm7 inserts on the second chromosome were isolated. Five additional second chromosome insert lines were generated by mobilization of the original inserts using the Δ2-3 transposase source (Robertson et al., 1988). All seven lines were tested for rescue of lethality in ScmXF24/ScmET50 trans-heterozygotes as follows. w67c2; P[Cas-Scm7]/+; ScmXF24e/+ males were generated and mated to w67c2; +/+; ScmET50e/TM3 Sb e females. Rescue was assessed by scoring for surviving progeny with the genotype w67c2; P[Cas-Scm7]/+; ScmXF24e/ScmET50e. The presence of the Scm transgene was assessed using the linked w+ marker. Thus, rescued individuals were identified phenotypically as red-eyed, ebony and non-Stubble. The sibling class lacking the rescue construct (w67c2; +/+; ScmXF24e/ScmET50e) was absent in all cases.
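The expected progeny classes of the rescue cross can be enumerated as a simple check on the scoring scheme. The sketch below is illustrative only: it assumes Mendelian segregation with equal viability of all classes (which does not hold for the unrescued lethal class), ignores the X chromosome because both parents carry w67c2, and uses variable names that are not from the original work.

```python
from fractions import Fraction
from itertools import product

# Gametes contributed by each parent for chromosomes 2 and 3
# (genotypes as described in the rescue cross above).
father_second = ["P[Cas-Scm7]", "+"]
father_third = ["ScmXF24 e", "+"]
mother_second = ["+", "+"]
mother_third = ["ScmET50 e", "TM3 Sb e"]


def cross():
    """Return each progeny genotype class with its expected Mendelian fraction."""
    combos = list(product(father_second, father_third, mother_second, mother_third))
    classes = {}
    for f2, f3, m2, m3 in combos:
        key = (f"{f2}/{m2}", f"{f3}/{m3}")
        classes[key] = classes.get(key, Fraction(0)) + Fraction(1, len(combos))
    return classes


classes = cross()
rescued = classes[("P[Cas-Scm7]/+", "ScmXF24 e/ScmET50 e")]
unrescued = classes[("+/+", "ScmXF24 e/ScmET50 e")]
print(rescued, unrescued)
```

Under these assumptions the rescued class (red-eyed, ebony, non-Stubble) and its transgene-free sibling class are each expected at one eighth of the progeny, so the absence of the latter is informative.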
Isolation of genomic and cDNA clones and Southern blots
Plasmid rescue of genomic DNA flanking the P1092 element was performed as described (Wilson et al., 1989) using the P-lArB HindIII site. A 0.5 kb genomic fragment was gel-purified from the rescue plasmid and used as probe to screen a λEMBL3A Drosophila genomic library (Tamkun et al., 1992). A 6.6 kb genomic EcoRI fragment was used to screen a 0 to 4-hour embryonic cDNA library (Brown and Kafatos, 1988). For Southern blots, 3-5 μg of genomic fly DNA was restriction digested, fractionated on 0.7% agarose gels, denatured by base treatment and blotted to Nytran nylon membranes. Hybridization conditions for Southern blots and for screening the genomic and cDNA libraries were essentially as described (Li et al., 1994).
Analysis of mRNA expression
In situ hybridizations to whole-mount embryos were performed essentially as described (Jiang et al., 1991) using a 3 kb antisense riboprobe or a 4 kb sense riboprobe prepared by in vitro transcription with digoxigenin uridine triphosphate. The RNA blot was prepared, hybridized and washed as described (Li et al., 1994). The probe was a 32P-labelled 6.6 kb EcoRI genomic fragment.
Antibody staining of embryos
Embryos were fixed and stained with a rabbit polyclonal antibody against abdA protein as described (Karch et al., 1990). The secondary antibody was goat anti-rabbit IgG coupled to horseradish peroxidase (BioRad).
DNA sequencing
The nucleotide sequence of a 4.1 kb Scm cDNA, Sc9, was determined using dideoxy chain termination with Sequenase 2.0 polymerase (United States Biochemical). The complete cDNA sequence was determined on both strands by a combination of direct sequencing with internal Scm primers and sequencing of Sc9 subclones and ExoIII deletion derivatives in the plasmid pBluescript II KS+. Sequence determination of the 3′ end of a 3.8 kb Scm cDNA, Sc1, used a primer that anneals to 3′-flanking pNB40 vector sequence (Brown and Kafatos, 1988).
Homology searches and secondary structure predictions
Homology searches of the GenBank database were performed using the BLASTP program (Altschul et al., 1990). Initial searches with full-length Scm protein yielded numerous proteins related only by the presence of serine-rich regions. Subsequent searches were performed using successive segments of Scm protein from non-serine-rich regions. Chou-Fasman secondary structure predictions of the SPM domain were obtained using the PEPTIDESTRUCTURE program from the Wisconsin GCG package. The PHDsec program (Rost and Sander, 1993) was also used to predict SPM domain structure. The input for PHDsec was a multiple sequence alignment of the SPM domains from Scm, ph, Rae-28 and l(3)mbt generated by the PILEUP program from the GCG package.
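The segment-wise search strategy described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the window size, the serine threshold and the toy sequence are all hypothetical, and the resulting segments would still have to be submitted to BLASTP separately.

```python
def successive_segments(protein, size, step):
    """Split a protein sequence into successive windows of `size` residues."""
    return [protein[i:i + size] for i in range(0, len(protein), step)]


def serine_fraction(segment):
    """Fraction of residues in the segment that are serine."""
    return segment.count("S") / len(segment) if segment else 0.0


def non_serine_rich(protein, size=100, step=100, max_serine=0.2):
    """Keep only segments whose serine content does not exceed the threshold."""
    return [
        seg
        for seg in successive_segments(protein, size, step)
        if serine_fraction(seg) <= max_serine
    ]


# Hypothetical toy sequence (NOT the Scm sequence): a serine-rich stretch
# flanked by two ordinary stretches.
toy_protein = "MKTAYIAKQR" + "SSSSGSSSSS" + "LVDECHWNPF"
queries = non_serine_rich(toy_protein, size=10, step=10, max_serine=0.2)
print(queries)
```

Filtering low-complexity stretches before searching serves the same purpose as the manual segment selection: it avoids hits that reflect composition rather than homology.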
PCR and sequence analysis of the ScmXF24 mutant lesion
Template DNA was prepared from ScmXF24/TM3 or cp in ri pp adults. cp in ri pp is the background chromosome upon which ScmXF24 was isolated (Jürgens, 1985 and personal communication). Internal Scm primers, Scm10 (5′-GCTGGATGGAAGTGACT-3′) and Scm12 (5′-GAATCACGAGCAGTTGG-3′), were used to amplify approximately 2 kb fragments that span the region containing the XF24 deletion. The PCR products were digested with SalI and NruI and inserted into SalI-EcoRV cut pBluescript II KS+. PCR clones derived from the XF24 or TM3 chromosomes were distinguished by the 0.2 kb difference in insert sizes. The relevant regions of three independent XF24 and cp in ri pp PCR clones were sequenced and compared.
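Basic properties of the two primers can be checked with a few lines of code. The sketch below computes length, G+C count and a rough melting temperature using the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C); this rule is an assumption introduced here for illustration and is not stated to have been used in the original work.

```python
# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "Scm10": "GCTGGATGGAAGTGACT",
    "Scm12": "GAATCACGAGCAGTTGG",
}


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    return seq.count("G") + seq.count("C")


def wallace_tm(seq):
    """Approximate Tm in degrees C for a short oligo: 2*(A+T) + 4*(G+C)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(name, len(seq), gc_count(seq), wallace_tm(seq))
```

By this estimate the two 17-mers are matched in length and G+C content, which is the usual aim when designing a primer pair.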
RESULTS
Isolation of Scm mutations by imprecise P element excision
The Scm gene was localized within cytological region 85E1-10 of the third chromosome by deficiency mapping (Jürgens, 1985; Breen and Duncan, 1986; Lindsley and Zimm, 1992). We further delimited its location by creating small deficiencies within 85E. We performed P element excision screens (Fig. 1 and Materials and Methods) using two lines that contain P inserts in 85E1-5, as judged by in situ hybridization (data not shown).
The first screen (Fig. 1) exploited the fact that loss-of- function Scm mutations are dominant suppressors of zeste1 eye color (Wu et al., 1989). Three independent Su(z) mutations were isolated from approximately 1500 progeny scored. Each mutation was both homozygous lethal and lethal in trans to the recessive lethal Scm alleles, ScmXF24 and ScmET50 (Jürgens, 1985). Further evidence for Scm allelism was derived from in situ hybridization (not shown) and Southern analyses (see below), which revealed DNA alterations at 85E in all three mutants.
We compared homeotic mis-expression patterns in these Su(z) mutants to that in pre-existing Scm mutants. Embryos homozygous for the embryonic lethal ScmXF24 mutation (Jürgens, 1985) mis-express abdominal-A (abdA) protein in many cells anterior to the normal expression domain (compare Fig. 2B and A). The pattern of abdA mis-expression, which is most prominent in the central nervous system, is similar to that seen in an Scm apparent null mutant, ScmD1 (Simon et al., 1992). Fig. 2C shows abdA expression in an embryo homozygous for one of the new 85E Su(z) mutations. The pattern of ectopic abdA resembles the ScmXF24 pattern (Fig. 2B), consistent with a null or strong hypomorphic mutation in Scm. All three Su(z) mutations showed similar abdA mis-expression patterns. Given these mis-expression patterns, the failure to complement Scm mutations and the detection of DNA alterations at 85E, we conclude that the three Su(z) mutations impair or inactivate the Scm gene. These mutations are designated ScmP11, ScmP12 and ScmP22.
Fig. 2. abdA expression in Scm mutant embryos. Embryos were stained with abdA antibody and are approximately 12 hours old. (A) Wild-type embryo; (B) ScmXF24 homozygote; (C) ScmP12 homozygote. ScmP12 was isolated in the screen described in Fig. 1. The arrow in A indicates the anterior edge of parasegment 7. Embryos here and in subsequent Figs are oriented with anterior to the top.
In addition to the F1 Su(z) screen, an F2 noncomplementation screen for P-derived lethals in 85E was performed (see Materials and Methods). Four independent lethal mutations in the interval defined by DfGB104 were isolated. Complementation tests established that three are Scm+ and one, designated ScmP23, is Scm−.
Isolation of Scm locus DNA
The frequency of Scm mutations recovered in these screens suggested that the P elements used are located near the Scm gene. One of the 85E P insert lines, P1092, contains the P-lArB construct, which is equipped for isolation of flanking genomic DNA by plasmid rescue (Wilson et al., 1989). Plasmid rescue from line P1092 yielded the expected circularized vector plus 0.5 kb of captured genomic DNA, which was used as probe to isolate clones from a Drosophila λEMBL3A genomic library. A composite restriction map (Fig. 3) spanning 23 kb of genomic DNA was derived from analysis of the overlapping phage inserts. In situ hybridization to polytene chromosomes using either the original 0.5 kb genomic fragment or the phage clones as probes revealed unique signals at 85E (data not shown).
Fig. 3. Molecular map of the Scm locus. The bold arrow below the DNA map represents the Scm transcription unit. Triangles indicate the positions of the P1246 and P1092 elements. The approximate extents of deletions in P-derived Scm mutants are shown above the DNA map. These were determined by genomic Southern analysis (not shown). Dotted lines represent uncertainty in the deletion endpoints. The genomic fragment used for germline transformation rescue is shown above the DNA map. An expanded map of the 6.6 kb EcoRI fragment is shown at the bottom. The position of the ScmXF24 deletion and the difference between the two Scm cDNA forms are indicated. Sequence analysis shows that the 5′ ends of all five Scm cDNAs lie within 60 bp of each other at a position about 400 bp from the upstream EcoRI site. Restriction sites: R, EcoRI; B, BamHI; S, SalI; N, NruI.
Identification of the Scm gene
Southern blot analysis with probes derived from the genomic phage clones established the locations of the two P elements used in our screens (Fig. 3 and data not shown). Additional Southern analysis of the P-derived mutants showed that mutations that failed to complement Scm involved deletion of genomic DNA to the left of the P element insert sites (summarized in Fig. 3). The extents of these deletions suggested that at least part of the Scm gene is located within the adjacent 6.6 kb EcoRI fragment (Fig. 3). This fragment was therefore used as a probe to search for genomic DNA alterations associated with X-ray induced Scm alleles (Jürgens, 1985; Breen and Duncan, 1986). Fig. 4 shows that the 6.6 kb fragment detects an approximately 200 bp deletion in DNA from ScmXF24 mutants (lanes 2 and 5). The precise location of this deletion, as determined by sequence analysis (see below), is shown on the expanded restriction map (Fig. 3, bottom). The background chromosome for the XF24 allele (G. Jürgens, personal communication) shows the wild-type restriction pattern (lanes 3 and 6).
Fig. 4. Mapping of the ScmXF24 mutant lesion. The genomic Southern blot was probed with the 6.6 kb genomic EcoRI fragment (Fig. 3). Lanes 1 and 4 contain DNA from DfGB104/TM3 adults, lanes 2 and 5 contain DNA from ScmXF24/TM3 adults and lanes 3 and 6 contain DNA from cp in ri pp adults (the XF24 background chromosome). DNA samples were doubly digested with the restriction enzymes indicated. Arrows indicate DNA fragments shifted in the XF24 mutant. Sizes of fragments in kb are indicated at the right.
We next searched for cDNAs that traverse the regions deleted in the XF24 and P-derived mutants. Five independent cDNAs that span the ScmXF24 deletion were isolated from a 0 to 4-hour embryonic library (Brown and Kafatos, 1988). Three of the cDNA clones have 4.1 kb inserts and two contain 3.8 kb inserts. Restriction mapping and cross-hybridization studies showed that the 4.1 kb and 3.8 kb cDNAs are produced from the same transcription unit (Fig. 3, bottom). Since the XF24 deletion is completely contained within this transcription unit, and since it removes protein-coding sequence (see below), this transcription unit corresponds to the Scm gene. The similarity between the genomic and cDNA restriction maps indicates that, if there are introns in Scm, they must total less than about 0.3 kb.
A 7.5 kb segment of genomic DNA that encompasses the Scm transcription unit (Fig. 3) was inserted into the pCaSpeR4 transformation vector (Thummel and Pirrotta, 1991) and the resulting construct, Cas-Scm7, was tested for transformation rescue of Scm mutants. Seven independent transformants with inserts on the second chromosome were recovered and tested for rescue of lethality in ScmXF24/ScmET50 animals. Six of the seven inserts produced viable adults of genotype P[Cas-Scm7]/+; ScmXF24/ScmET50 (see Materials and Methods for rescue crosses). These rescue results provide further evidence that the transcription unit encoding the 4.1 and 3.8 kb cDNAs (Fig. 3) is the Scm gene. The 7.5 kb genomic fragment may not provide complete Scm function, however; the rescued ScmXF24/ScmET50 trans-heterozygotes were not healthy enough to maintain as stocks and, in at least one line, the rescued adults showed homeotic phenotypes such as extra sex combs and wing-to-haltere transformations. It is possible that additional regulatory DNA not present on the 7.5 kb fragment is needed for full Scm function.
Expression of Scm mRNA during development
Fig. 5A shows the temporal profile of Scm mRNA accumulation during development. In agreement with genetic studies that establish a maternal contribution to Scm function (Breen and Duncan, 1986), Scm mRNA is expressed in ovaries. The highest levels are seen in 0- to 2-hour embryos, presumably reflecting the maternal product. Scm mRNA is present at lower levels during later embryonic and larval stages, and a modest increase is seen during the pupal stage (Fig. 5A, Lane P, compare to rp49 signal). This temporal profile resembles that seen for most other PcG products, which are expressed maternally and then continuously during subsequent development (Paro and Zink, 1992; Martin and Adler, 1993; Jones and Gelbart, 1993).
Fig. 5. Expression of Scm mRNA during development. (A) The northern blot was hybridized with either an Scm genomic probe (top) or an rp49 probe, derived from a ribosomal protein gene (bottom), as a control for amounts of RNA loaded. Each lane contains approximately 25 μg of total RNA isolated from ovaries (ov), embryos at the indicated hours of development, larvae (L) or pupae (P). (B-D) Embryos hybridized with an antisense Scm riboprobe. (B) 9-hour embryo (germ-band retracted). (C) 12-hour embryo (at dorsal closure). (D) Approximately 16-hour embryo. The medial structure with stronger staining in C and D is the central nervous system.
Since the size of the mRNA detected is approximately 4 kb, the 4.1 kb and 3.8 kb cDNAs represent full-length or nearly full-length products. Although this northern analysis fails to resolve multiple mRNA species in the 4 kb size range, the relative mobilities of the hybridizing species are consistent with maternal expression of both the 4.1 and 3.8 kb mRNA forms and zygotic expression of primarily the 4.1 kb form.
The spatial distribution of Scm mRNA in embryos was determined by whole-mount in situ hybridization. Fig. 5B shows an approximately 9-hour-old embryo hybridized with an antisense riboprobe. At this stage, Scm mRNA is distributed throughout the embryo. The uniform pattern persists until about 12 hours of development, after which expression is more concentrated in the central nervous system (CNS) (Fig. 5C,D). This parallels the time when homeotic gene expression also becomes enriched in the CNS compared to other tissues (White and Wilcox, 1985; Celniker et al., 1989; Karch et al., 1990). The even distribution of Scm mRNA along the A-P axis and its concentration in the embryonic CNS resemble the expression patterns of other PcG products (Franke et al., 1992; Martin and Adler, 1993).
The Scm cDNA sequence
Fig. 6 shows the nucleotide sequence of one of the 4.1 kb Scm cDNAs, clone Sc9. A single long open reading frame starts at nucleotide position 435 and ends at position 3065. The sequence that flanks the first in-frame ATG (AACC) is similar to the Drosophila consensus sequence for translation start sites (C/A AA A/C) (Cavener, 1987). The predicted Scm protein is 877 amino acids long with a relative molecular mass of 94×10³ and a pI of 9.4. A potential nuclear localization signal, starting at amino acid 52 (RQRGRPAKR), is indicated by asterisks.
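The reported reading frame coordinates can be checked against the predicted protein length with simple arithmetic. The sketch below assumes that both positions are inclusive, 1-based nucleotide coordinates; the text does not state whether position 3065 includes the stop codon, so the comparison is a consistency check only.

```python
# Coordinates of the open reading frame in the Sc9 cDNA, as given in the text.
ORF_START = 435
ORF_END = 3065


def orf_length_nt(start, end):
    """Length in nucleotides of an inclusive, 1-based interval."""
    return end - start + 1


length_nt = orf_length_nt(ORF_START, ORF_END)
codons, remainder = divmod(length_nt, 3)
print(length_nt, codons, remainder)
```

The interval spans 2631 nucleotides, a whole number of codons (877), which matches the stated protein length if position 3065 is the last nucleotide of the final sense codon.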
Fig. 6. The Scm cDNA sequence. The complete nucleotide sequence of the Sc9 cDNA is shown. Lower case letters indicate 5′ and 3′ untranslated regions and upper case indicates the open reading frame. Asterisks denote a potential nuclear localization signal. Single bold underlines indicate three potential zinc fingers. Cysteine and histidine residues in these fingers are in bold type. The two mbt repeats are italicized and enclosed in boxes. The SPM domain at the C terminus is also enclosed in a box. Arrows mark the extent of the XF24 deletion. An alanine-rich region is indicated by the single dashed underline. The alternative poly(A) addition sites are indicated by bold nucleotides in the 3′-UTR and matches to the poly(A) addition consensus upstream of these sites are indicated by double lines. GenBank accession number for the Scm cDNA sequence shown is U49793.
Restriction mapping and sequence analysis showed that the difference between the 4.1 and 3.8 kb cDNAs results from use of alternative polyadenylation sites. The poly(A) tail of the longer cDNA begins at position 4046 whereas the poly(A) tail of the shorter cDNA begins at position 3744 (Fig. 6). There is a consensus poly(A) addition signal (double solid lines, Fig. 6) located 25 bp upstream of the 3.8 kb cDNA poly(A) tail. There are only imperfect matches to the poly(A) addition signal (double dashed lines) in the region immediately upstream of the 4.1 kb cDNA poly(A) tail.
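The relationship between the two poly(A) sites, and the search for an upstream addition signal, can be illustrated as follows. This is a sketch under stated assumptions: the canonical AATAAA hexamer is taken as the consensus signal, the distance is measured from the first base of the hexamer to the first base of the tail, and the test sequence is a hypothetical toy construct, not the Scm 3′-UTR.

```python
POLYA_SIGNAL = "AATAAA"  # canonical hexamer, assumed here as the consensus

# Positions at which the poly(A) tails begin, as given in the text.
LONG_TAIL_START = 4046
SHORT_TAIL_START = 3744
size_difference = LONG_TAIL_START - SHORT_TAIL_START


def upstream_signals(seq, tail_start, window=40, signal=POLYA_SIGNAL):
    """Distances (bp) from each signal start to tail_start (0-based index)."""
    lo = max(0, tail_start - window)
    region = seq[lo:tail_start]
    hits = []
    i = region.find(signal)
    while i != -1:
        hits.append(tail_start - (lo + i))
        i = region.find(signal, i + 1)
    return hits


# Hypothetical toy sequence: the hexamer starts 25 bp before the tail.
toy = "G" * 10 + POLYA_SIGNAL + "C" * 19 + "A" * 12
toy_tail_start = 10 + len(POLYA_SIGNAL) + 19
print(size_difference, upstream_signals(toy, toy_tail_start))
```

The 302 bp separation between the two sites agrees with the approximately 0.3 kb size difference between the 4.1 kb and 3.8 kb cDNAs.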
Shared domains in Scm protein
Database searches revealed that Scm is one of a set of four proteins that are related by multiple domains of similarity (Fig. 7A). The first of these related proteins, polyhomeotic (DeCamillis et al., 1992), is another Drosophila PcG protein. Rae-28 is a mouse protein that was identified in a screen for retinoic-acid inducible cDNAs (Nomura et al., 1994). l(3)mbt [lethal(3)malignant brain tumor] is the product of a Drosophila tumor suppressor gene (Wismar et al., 1995).
Fig. 7. Domains in Scm protein and related proteins. (A) Diagrams depict full-length Scm and Rae-28 proteins and the portions of ph and l(3)mbt that contain homology domains. Zn indicates potential zinc fingers, mbt indicates mbt repeats and SPM indicates the domain shared at the C termini of the four proteins. S/T indicates serine/threonine-rich regions, Q indicates glutamine-rich regions and ALA indicates an alanine-rich region. H-I is an additional homology domain shared by ph and Rae-28 (Nomura et al., 1994). The SPM domain corresponds to the H-II homology region described by these workers. (B) Sequence alignment of the shared zinc fingers. The cysteine residues are in bold type. (C) Alignment of the SPM domains. The ‘5′-HLH’ domain in the tel oncoprotein (Golub et al., 1994) is aligned below the SPM consensus sequence. Residues in tel that match the consensus are in bold type. Two subregions with the highest similarity to the consensus are underlined. (D) Alignment of the mbt repeats in Scm and l(3)mbt. (B-D) Alignments and consensus sequences were determined using the PILEUP program from the Wisconsin GCG package and manual comparisons. Shading indicates amino acid residues that match the consensus sequences. Numbers to the right indicate the amino acid positions in each protein.
These four proteins all contain putative Cys2-Cys2 zinc fingers that define a distinct zinc finger subclass, which is marked by identical spacing between the cysteine pairs and conservation of residues that flank the cysteines (Fig. 7B). There are two such zinc fingers located near the N terminus of Scm protein and a single finger in the ph, Rae-28 and l(3)mbt proteins. The spacing between cysteines is distinct from the Cys2-Cys2 fingers of known DNA-binding proteins such as the nuclear hormone receptors (Coleman, 1992). Scm also contains a third potential zinc-binding region (Zn3 in Fig. 7A), which differs from the N-terminal fingers and is not shared in ph, Rae-28 or l(3)mbt. This third region can be arranged as a Cys2-Cys2 finger, but the presence of additional cysteine and histidine residues between the outer cysteine pairs (Fig. 6) may reflect alternative forms of a zinc-binding domain.
All four proteins contain a second homology domain located at each C terminus (Fig. 7A,C). This domain is approximately 60 amino acids long and is shared with 34-64% identity among the four proteins. We refer to this homology domain as the SPM domain, after the three fly proteins. Chou-Fasman secondary structure analysis of the four individual proteins, together with analysis of the multiple sequence alignment (Rost and Sander, 1993), predicts that the SPM domain is largely α-helical. There is also weaker homology of the SPM domain to a domain found in members of the ets family of transcription factors (Wasylyk et al., 1993). The corresponding region from a representative ets family member, the tel oncoprotein (Golub et al., 1994), is aligned with the consensus sequence in Fig. 7C. The overall identity to Scm in this region is only 30%. However, the homology within two distinct subregions (underlined in Fig. 7C), the conserved spacing between them and the 58% overall similarity suggest a structural relationship to the SPM domain.
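The identity and similarity percentages quoted for the SPM domain are pairwise comparisons over aligned positions. The following is a minimal sketch of that arithmetic, not the procedure used with PILEUP: the two short sequences are hypothetical placeholders rather than Scm or ph residues, and the conservative-substitution groups in `SIMILAR_GROUPS` are an illustrative assumption (the text does not state which similarity scheme was applied).

```python
# Hypothetical conservative-substitution groups (an assumption for illustration).
SIMILAR_GROUPS = ["AG", "ST", "DE", "NQ", "KRH", "ILVM", "FYW"]


def _aligned_pairs(a, b):
    """Return residue pairs from two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]


def percent_identity(a, b):
    """Percentage of ungapped aligned positions with identical residues."""
    pairs = _aligned_pairs(a, b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def percent_similarity(a, b):
    """Percentage of ungapped positions that are identical or in the same group."""
    pairs = _aligned_pairs(a, b)
    if not pairs:
        return 0.0

    def similar(x, y):
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    return 100.0 * sum(similar(x, y) for x, y in pairs) / len(pairs)


# Hypothetical aligned fragments ("-" marks a gap), not real protein sequence.
seq_a = "MKLAVE-DT"
seq_b = "MRLGVEADS"
print(percent_identity(seq_a, seq_b))    # 5 identical of 8 ungapped columns
print(percent_similarity(seq_a, seq_b))  # identical plus conservative pairs
```

Because similarity counts conservative substitutions as well as exact matches, it is always at least as large as identity, which is why the tel comparison can show 30% identity alongside 58% similarity.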
A third homology region, consisting of repeat units each about 100 amino acids in length, is found in the Scm and l(3)mbt proteins (Fig. 7A,D). There are two tandem copies of these ‘mbt’ repeats in Scm and three copies in l(3)mbt (Wismar et al., 1995). To date, our database searches have failed to identify other proteins with mbt repeats.
Scm protein also contains a region with a high density of alanine residues (Fig. 7A). Starting at amino acid position 748, there is a stretch of 29 residues composed of 52% alanine (dashed underline in Fig. 6). Alanine-rich regions have been associated with transcriptional repression domains in the Drosophila engrailed, even-skipped and Krüppel proteins (Han and Manley, 1993).
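A compositional bias of this kind can be located by scanning a protein sequence with a fixed-width window and recording the fraction of a given residue in each window. The sketch below assumes a plain one-letter amino acid string. The function names and the toy sequences are hypothetical, and the 15-of-29 count is an inference from the rounded 52% figure, not a number stated in the text.

```python
def residue_fraction(segment, residue="A"):
    """Fraction of positions in `segment` occupied by `residue`."""
    if not segment:
        raise ValueError("segment must not be empty")
    return segment.count(residue) / len(segment)


def richest_window(sequence, width, residue="A"):
    """Return (0-based start, fraction) of the first window richest in `residue`."""
    if width <= 0 or width > len(sequence):
        raise ValueError("width must be between 1 and the sequence length")
    best_start, best_fraction = 0, -1.0
    for start in range(len(sequence) - width + 1):
        fraction = residue_fraction(sequence[start:start + width], residue)
        if fraction > best_fraction:
            best_start, best_fraction = start, fraction
    return best_start, best_fraction


# Toy 29-residue stretch with 15 alanines (hypothetical, not Scm sequence).
toy_stretch = "A" * 15 + "S" * 14
print(round(100 * residue_fraction(toy_stretch)))  # rounds to 52

# Toy scan: the alanine run starts at 0-based index 4.
print(richest_window("SSSSAAAASSSS", 4))
```

Note that the scan reports 0-based positions, whereas the amino acid positions in the text are 1-based.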
An Scm mutant lesion alters the SPM domain
The ScmXF24 mutation is a deletion internal to the Scm transcription unit (Figs 3, 4). To determine its precise location, DNA fragments spanning the region were amplified by PCR from ScmXF24/TM3 adult DNA and from the XF24 parental chromosome (see Materials and Methods). Sequence analysis showed that the DNA between the arrows in Fig. 6 is deleted and is replaced by 4 nucleotides (CTGA) of unknown origin. As a consequence, the C-terminal 49 amino acids of Scm protein, including most of the SPM domain, are removed and replaced by 26 novel amino acids encoded by extension of the open reading frame into the 3′-UTR. Since ScmXF24 is a null or strong hypomorphic mutation, its location suggests that the SPM domain is important either for function or stability of Scm protein in vivo. Alternatively, the loss of function could result from the substitution of an abnormal peptide tail.
DISCUSSION
PcG proteins with shared domains
The PcG proteins are classified together based upon their common role in homeotic gene repression (Simon, 1995 for review). Previous molecular analyses of individual PcG proteins have shown that they are diverse in sequence and structure. Here we report that the Scm and ph proteins bear homology in their C-terminal SPM domains (Fig. 7), and they share the same subclass of zinc fingers. This provides an example of two fly PcG proteins, encoded by unlinked loci, that are functionally and structurally related. Another possible example of related PcG proteins involves the Psc and Su(z)2 proteins (Brunk et al., 1991), which are the products of neighboring genes. However, the role of Su(z)2 in the PcG is unclear; although Su(z)2 mutations enhance mutations in other PcG genes (Adler et al., 1989; Wu and Howe, 1995), homeotic gene expression is normal in embryos that completely lack Su(z)2 product alone (Soto et al., 1995).
Possible biochemical role for a domain shared in Scm and ph
The most extensive homology between the Scm and ph proteins is in the SPM domains (Fig. 7A, C), which share 41% identity and 61% similarity. Clues about the possible function of this domain derive from its predicted α-helical content and the presence of related domains in members of the ets family of transcription factors (Wasylyk et al., 1993). In the ets proteins, such as tel (Fig. 7C) and ets-1, this region has been likened to the helix-loop-helix (HLH) domains found in MyoD, myc and the products of the Drosophila achaete-scute complex (Seth and Papas, 1990; Golub et al., 1994). Indeed, similarity between part of the SPM domain in ph and the second helix of the HLH consensus sequence was noted previously (DeCamillis et al., 1992). The HLH domains of MyoD and closely related classes form protein-protein contact surfaces used for homo- and heterodimerization (Murre et al., 1994 for review). Although the roles of the ets HLH domains are less well-characterized, there is evidence that they may be sufficient for functional interactions. Chromosome translocations that fuse just the tel HLH domain to other unrelated proteins cause various forms of leukemia (Golub et al., 1994, 1995). It has been suggested that the tel HLH domain causes abnormal or unregulated dimerization of the fusion proteins expressed in the disease state.
Although the SPM domain appears to be a distant relative of these various HLH domains, we suggest that it may also play a role in protein-protein contact. The SPM domain could help in assembly or stabilization of multiprotein complexes of PcG proteins (Franke et al., 1992; Rastelli et al., 1993). In particular, it could be used for homomeric Scm association or for heteromeric association of Scm and ph. We note that Scm is among those PcG genes that exhibit extragenic noncomplementation with ph mutations (Cheng et al., 1994).
Zinc fingers and target site recognition by PcG proteins
The localization of PcG proteins at specific chromosomal sites (Franke et al., 1992; Rastelli et al., 1993; Lonie et al., 1994) implies a mechanism to target the proteins to specific genes. Target site recognition involves DNA sequence elements, at some level, since regulatory DNA from homeotic loci attracts PcG proteins when inserted at ectopic chromosomal sites (DeCamillis et al., 1992; Chiang et al., 1995). However, none of the PcG proteins yet tested exhibit sequence-specific DNA-binding activity alone in vitro (Franke et al., 1992; Rastelli et al., 1993).
The presence of zinc fingers in Scm protein (Fig. 7) raises the possibility that Scm plays a role in direct DNA binding. However, the zinc finger motifs do not closely resemble those in known DNA-binding proteins such as the steroid hormone receptors or yeast GAL4 (Coleman, 1992). Two of the Scm fingers, Zn1 and Zn2 (Fig. 7), are similar to the single zinc finger in ph protein. Since ph protein shows nonspecific DNA-binding activity (H. Brock, personal communication), a single finger of this type (Fig. 7B) appears insufficient for sequence recognition. Further studies will test if two tandem fingers of this type, or an additional putative zinc-binding motif (Zn3, Figs 6, 7), provide Scm protein with sequence-specific binding activity.
A variety of zinc finger motifs are present in PcG proteins besides Scm and ph (Brunk et al., 1991; Jones and Gelbart, 1993; Lonie et al., 1994; Simon, 1995 for review). The functions of these domains are unknown. Perhaps these zinc fingers are used for protein-protein contact rather than DNA binding, as has been shown for the zinc-binding LIM domain (Schmeichel and Beckerle, 1994). Another possibility is that they are used for DNA contact, but that sequence-specific DNA binding requires the association of two or more PcG proteins in a complex. Finally, PcG proteins could be targeted through association with other classes of proteins such as the DNA-binding gap gene products (Zhang and Bienz, 1992).
Vertebrate genes related to fly PcG genes
The mouse gene Rae-28 encodes a protein related to fly Scm and ph (Fig. 7; Nomura et al., 1994). This is intriguing in light of accumulating evidence that fly PcG proteins have functional homologs in vertebrates. The mouse gene bmi-1 bears homology to the fly Psc gene (Brunk et al., 1991) and mouse embryos that either lack or over-express bmi-1 display homeotic phenotypes (van der Lugt et al., 1994; Alkema et al., 1995). Another mouse gene, M33, bears homology to fly Pc (Pearce et al., 1992). The striking demonstration that M33 expressed in flies can rescue defects in Pc mutants shows that M33 is a functional Pc homolog (Müller et al., 1995). Furthermore, Xenopus bmi-1 and M33 homologs have been isolated and their protein products interact in vitro (Reijnen et al., 1995). Taken together, these studies suggest that the regulators and the mechanism responsible for restricted expression of homeotic/Hox genes may be conserved in evolution. Rae-28 was identified in a screen for mouse genes induced by retinoic acid (Nomura et al., 1994) and its function in mouse embryos is not known. Since the homologies among Rae-28, Scm and ph are limited to several small domains (Fig. 7), functional tests are needed to determine if Rae-28 plays a PcG role in mice. We note that Pc and M33 are related only in two small domains at the N terminus and C terminus (Pearce et al., 1992), yet they are clearly functional homologs (Müller et al., 1995).
Connections between PcG proteins and tumor suppressors
The protein in the databases with the most extensive homology to Scm is the product of the fly tumor suppressor gene l(3)mbt (Fig. 7). Recessive mutations in l(3)mbt cause overproliferation of imaginal precursors in the larval brain and imaginal discs (Gateff et al., 1993). The presence of multiple shared domains raises the possibility that the Scm and l(3)mbt proteins may be functionally related. The report that the fly PcG gene multi sex combs is allelic to another fly tumor suppressor gene, l(1)malignant blood neoplasm (Santamaria and Randsholt, 1995), also supports a relationship between fly PcG proteins and tumor suppressors.
It is perhaps not surprising to find connections between tumor suppressors and transcriptional repressors like PcG proteins. The recessive nature of tumor suppressors implies a normal role in negative control of growth and proliferation. Although this can occur at many levels, some tumor suppressors are thought to act as transcriptional repressors. For example, the retinoblastoma protein has been implicated in transcriptional repression of growth-promoting genes (Weintraub et al., 1992).
Roles of Scm protein during development
Scm protein represses multiple homeotic genes during embryonic stages (Breen and Duncan, 1986; McKeon and Brock, 1991; Simon et al., 1992). Analysis of pupal lethal Scm alleles (Wu et al., 1989; T. Wu, personal communication) shows that Scm is also required during postembryonic times. These genetic data and the continuous developmental expression of Scm mRNA (Fig. 5A) imply a long-term role for Scm product, like most other PcG products, in homeotic repression.
Although Scm is best characterized for homeotic gene control, it is also likely involved in other processes, as are many of the PcG proteins (Simon, 1995 for review). Scm is a regulator of the segmentation gene engrailed (Moazed and O’Farrell, 1992) and genetic studies suggest a role in dorsal-ventral development (Adler et al., 1991). The suppression of zeste1 eye color by Scm mutations may reflect an Scm role in white gene expression. This suppression does not require unusual Scm alleles, since it occurs with deficiency for the Scm locus and apparent Scm null mutations (Fig. 3). Although the mechanism of zeste1 suppression is unclear, it is intriguing that a subset of PcG products, including Scm, E(z) and Psc, share the zeste interaction (Wu et al., 1989; Jones and Gelbart, 1990; Wu and Howe, 1995). Investigation of the physical interactions between Scm protein and its PcG cohorts should help define how transcription is modulated at homeotic loci and at other loci under PcG control.
ACKNOWLEDGEMENTS
We thank Ian Duncan, Gerd Jürgens and Ting Wu for sharing Scm mutant stocks and information about them. We are especially grateful to Ting and to Rick Jones for advice about Suppressor of zeste screens. We thank Hugh Brock for many helpful discussions and Elisabeth Gateff and Jasmine Wismar for communicating results prior to publication. We thank Kathy Matthews and co-workers at the Drosophila stock center in Bloomington, Indiana for numerous fly stocks used in this work. We thank Madeline Serr and Tom Hays for the loan of a developmental northern blot, Karen Lunde for generating the germline transformation construct and Leah Rein for assistance with embryo stainings. The cDNA and genomic fly libraries were generously provided by Nick Brown and John Tamkun. We thank Bob Herman and Tom Hays for critical comments on the manuscript. This work was supported by NSF grant IBN-9304936 and a University of Minnesota McKnight Land-Grant Professorship to J. S.; D. B. was supported in part by NIH training grant GM07323.