ABSTRACT
A MyoD family gene was identified in the ascidian Ciona intestinalis and designated CiMDF (Ciona intestinalis Muscle Determination Factor). Expression of CiMDF was restricted to the muscle cells of the developing embryo and the body-wall muscle of adults. Northern blots showed that two differentially regulated CiMDF transcripts were expressed during development. A 1.8 kb transcript (CiMDFa) appeared first and was gradually replaced by a 2.7 kb transcript (CiMDFb). These transcripts encoded essentially identical MyoD family proteins, except for a 68-amino-acid C-terminal sequence present in CiMDFb but absent from CiMDFa. Although both CiMDFa and CiMDFb contained the cysteine-rich/basic helix-loop-helix domain (Cys-rich/bHLH) present in all MyoD family proteins, only CiMDFb contained the region near the C terminus (Domain III) characteristic of this gene family. Genomic Southern blots indicated that C. intestinalis has only one MyoD family gene, suggesting that CiMDFa and CiMDFb result from differential processing of primary transcripts. The existence of two MyoD family proteins that are differentially expressed during ascidian embryogenesis has novel parallels to vertebrate muscle development and may reflect conserved myogenic regulatory mechanisms among chordates.
INTRODUCTION
The discovery of myogenic regulatory genes of the MyoD family has led to a significantly better understanding of the molecular mechanisms regulating muscle development. These genes encode structurally related, sequence-specific transcriptional activators that bind to E-box consensus promoter elements (CANNTG) and are essential for the expression of many muscle-specific genes (Olson, 1990; Weintraub et al., 1991). The MyoD gene family consists of four members in vertebrates: MyoD (Davis et al., 1987), myogenin (Edmondson and Olson, 1989; Wright et al., 1989), MRF-4/herculin/Myf-6 (Rhodes and Konieczny, 1989; Miner and Wold, 1990) and Myf-5 (Braun et al., 1989). These genes are active only in skeletal muscle or its precursors during vertebrate embryogenesis and have a distinct spatiotemporal pattern of expression (Olson, 1990). In vitro cell culture assays (Olson, 1990; Weintraub et al., 1991) and transgenic/gene knockout mice (Braun et al., 1992; Rudnicki et al., 1992; Hasty et al., 1993; Nabeshima et al., 1993; Zhang et al., 1995) suggest that, while a degree of redundancy exists among the vertebrate members of this family, MyoD and Myf-5 act upstream of myogenin and MRF-4 in a genetic pathway that is indispensable for skeletal muscle development (Weintraub, 1993; Rudnicki and Jaenisch, 1995; Olson et al., 1996).
Members of the MyoD gene family have also been found in a variety of invertebrates, including nematodes (Krause et al., 1990), insects (Michelson et al., 1990), sea urchins (Venuti et al., 1991), ascidians (Araki et al., 1994) and cephalochordates (Araki et al., 1996). The wide phylogenetic distribution of the MyoD gene family, together with its demonstrated importance to vertebrate skeletal myogenesis, suggested that studies of this gene family in simple model systems would provide important insights into questions of muscle development and evolution. Ascidians (Subphylum Urochordata, Class Ascidiacea) represent such an evolutionarily advantageous model system. They are accessible to experimental manipulations and molecular procedures and have only three prominent muscle types (tail muscle of the larva and the adult body-wall and cardiac muscles).
Larval tail muscle of the ascidian Ciona intestinalis consists of 36 large, mononucleate cells arranged in two bilaterally symmetrical bands running the length of the tail on either side of the notochord (Satoh and Jeffery, 1995; Meedel, 1997). Tail muscle cells originate from either the primary or secondary lineages (Meedel et al., 1987). The primary lineage consists of 28 muscle cells that are distributed throughout the anterior and middle of the tail (Nishida and Satoh, 1983, 1985; Nishida, 1987). These cells develop autonomously under the influence of maternal cytoplasmic determinants that are localized to specific blastomeres during cleavage (Whittaker, 1982; Nishida, 1992). The secondary lineage contains 8 muscle cells located at the posterior tip of the tail, which appear to be specified by a conditional mechanism that requires intercellular interactions (Meedel et al., 1987; Nishida, 1990).
Body-wall muscle cells of the adult are multinucleate and lack striations (Nevitt and Gilly, 1986). In C. intestinalis, this muscle is divided into an outer longitudinal system, which is organized into a small number of well-defined bands, and a relatively diffuse inner system consisting of a meshwork of transverse and oblique strands that encircle the body (Millar, 1953). Muscle associated with the heart (i.e., cardiac muscle) is composed of mononucleated cells that possess myofibrils with a typical sarcomeric arrangement (Schulze, 1964). Biochemical and molecular level studies of these two ascidian muscles suggest specific evolutionary relationships with particular vertebrate muscle types (Meedel and Hastings, 1993; Meedel, 1997).
This study describes the identification and cDNA cloning of a MyoD family gene (CiMDF) of C. intestinalis. CiMDF is a single-copy gene and is not a member of a larger divergent gene family. Northern blot and RT-PCR analyses demonstrated that CiMDF transcripts were absent from eggs and early cleavage stage embryos, suggesting that this gene is unlikely to encode a maternal determinant specifying muscle. In adults, CiMDF was expressed only in the body-wall muscle, implying an underlying relationship between this muscle type and vertebrate skeletal muscle. In addition, two differentially regulated CiMDF transcripts appeared during embryogenesis, and both transcripts were also present in adult body-wall muscle. These transcripts encode two putative MyoD family proteins, demonstrating that ascidians, like vertebrates, use more than one MyoD family protein during myogenesis. The use of similar molecular strategies in different Chordata subphyla may reflect evolutionary selection pressures uniquely associated with chordate myogenesis.
MATERIALS AND METHODS
Animal material
Adult Ciona intestinalis were collected from the Sandwich Marina in the Cape Cod Canal and held under constant illumination in a recirculating marine aquarium at 14°C. Eggs were obtained from the oviducts and collected into 0.5 M KCl, 50 mM MgCl2, 10 mM EGTA, pH 7.6 to prevent fertilization. Eggs were repeatedly washed in Millipore-filtered sea water and fertilized with sperm from several individuals. Embryos were raised at 18°C and synchrony of these cultures exceeded 90%.
Generation of a Ciona intestinalis MyoD family probe by PCR with degenerate primers
Genomic DNA of C. intestinalis was isolated from sperm of individual animals (Meedel and Hastings, 1993) and used in PCR reactions with degenerate primers (gift of Michael Krause) corresponding to the cysteine-rich (WACKACK; sense primer) and second helix region (KVEILRN; antisense primer) of the Cys-rich/bHLH domain of mouse MyoD (Davis et al., 1987). Sequences of the primers were: sense primer, [5′] TGGGCNTGYAARGCNTGYAA [3′] (128-fold degenerate); antisense primer, [5′] RTTNCKNARDATYTCNACYTT [3′] (6144-fold degenerate); N = G, A, T, C; R = A, G; Y = T, C; K = G, T; D = A, G, T. PCR reaction conditions were: 1 μg genomic DNA, 3 μM sense primer, 30 μM antisense primer, 0.2 mM of each dNTP, 10 mM Tris-HCl pH 8.3 (25°C), 50 mM KCl, 1.5 mM MgCl2, 0.01% (w/v) gelatin and 2.5 units of Taq polymerase (Perkin Elmer Cetus; Norwalk, CT). The reactions were carried out in a volume of 50 μl using a Model 100 Thermocycler (MJ Research; Watertown, MA). The reaction program consisted of an initial denaturation at 94°C for 4 minutes followed by 30 cycles of 94°C (30 seconds), 60°C (2 minutes) and 72°C (2 minutes).
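As an arithmetic check on the stated degeneracies, the fold-degeneracy of a primer is the product, over all positions, of the number of bases each IUPAC code stands for. The sketch below is an illustration only, not part of the original methods; the function name `primer_degeneracy` is hypothetical.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    fold = 1
    for code in primer.upper():
        fold *= IUPAC_DEGENERACY[code]  # multiply choices at each position
    return fold


sense = "TGGGCNTGYAARGCNTGYAA"       # WACKACK region
antisense = "RTTNCKNARDATYTCNACYTT"  # KVEILRN region (reverse complement)
print(primer_degeneracy(sense))      # 128
print(primer_degeneracy(antisense))  # 6144
```

Both values agree with the degeneracies given above (4 × 2 × 2 × 4 × 2 = 128 for the sense primer).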
Embryo cDNA libraries
RNA was isolated using a water-saturated phenol extraction protocol described previously (Meedel and Whittaker, 1978). Poly(A)+ RNA was purified from 6 hour postfertilization embryos with two selection cycles using oligo(dT)-cellulose (Collaborative Research; Bedford, MA) and from 9 hour postfertilization embryos with two selection cycles using oligo(dT)25-conjugated magnetic beads (Dynal, Inc.; Lake Success, NY). Oligo(dT)-primed cDNA libraries were constructed from 6 hour embryo RNA using the Librarian cDNA kit and the lambda vector λgt10 (Invitrogen; San Diego, CA), and from 9 hour embryo RNA using the Super Script System (Gibco BRL; Grand Island, NY) and the lambda vector λZAPII (Stratagene; La Jolla, CA).
Molecular cloning and DNA sequence analysis
Library screening, preparation of λ phage DNA and generation of 32P-labeled probes were done using standard procedures (Ausubel, 1996). Phage cDNA inserts were subcloned in pBluescript KS(+) (Stratagene; La Jolla, CA). Sequences not included on cDNAs were cloned by rapid amplification of cDNA ends (RACE) using the Amplifinder RACE kit (Clontech; Palo Alto, CA) and primers designed from existing cDNA sequence. Double-stranded templates for automated DNA sequence analysis were prepared using Qiagen plasmid DNA isolation kits (Qiagen Inc.; Chatsworth, CA), and cycle-sequencing reactions were analyzed on an Applied Biosystems automated DNA sequencer (Model 373A). Nucleotide and protein sequences were analyzed with MacVector 4.5.3 software (IBI Sequence Analysis Software, Kodak Life Sciences Products; New Haven, CT) and the online Wisconsin Genetics Computer Group (GCG) package, version 8.1.
Northern blot and genomic Southern blot analyses
Embryo and adult tissue RNAs were fractionated by electrophoresis on 1.2% agarose gels containing formaldehyde, transferred to Gene Screen(+) (NEN-DuPont; Boston, MA) and hybridized/washed as described elsewhere (Horton et al., 1996). Hybridization of northern blots with RNA probes necessitated increasing the pretreatment and hybridization temperatures to 55°C, including yeast tRNA (100 μg/ml) and heparin (100 μg/ml) in the prehybridization/hybridization solutions, and increasing the posthybridization wash temperature to 72°C.
Restriction enzyme digests of individual animal genomic DNA (3-5 μg/digest) were size-fractionated on 1% agarose gels in TAE buffer and transferred to Gene Screen(+) according to the manufacturer’s instructions. Blots were hybridized at either 55°C (low criterion) or 68°C (high criterion) as described elsewhere (Larson et al., 1995).
Reverse transcription/PCR amplification
cDNA was synthesized at 45°C in 20 μl reactions containing 10 μg of total RNA and 200 units of Moloney murine leukemia virus reverse transcriptase as recommended by the manufacturer (Gibco BRL; Grand Island, NY); the sequence of the primer (Md 104) used for reverse transcription was [5′] CTCCACTTTTGGAAGTCTTTGG [3′]. Aliquots (1 μl) were removed for PCR, which was done using Md 104 as the antisense primer and a CiMDF-specific sense primer: [5′] AGTCGCATTATCACCACACCAGC [3′]. This primer combination was designed to produce a 212 bp CiMDF-specific PCR product. The reaction program consisted of an initial denaturation step of 94°C for 4 minutes, followed by 25 cycles of 94°C (30 seconds), 55°C (2 minutes), and 72°C (2 minutes). Aliquots of PCR reactions were electrophoresed on 4% NuSieve agarose gels (SeaKem; Rockland, ME), blotted to Gene Screen Plus and probed with a 32P-labeled, nested oligonucleotide: [5′] TGGGCATGCAAAGCGTGTAAAC [3′]. Blots were hybridized at 56°C in 6× SSC, 10× Denhardt’s, 10 mM sodium phosphate (pH 6.8), 1 mM EDTA, 100 μg/ml yeast tRNA, 0.1% SDS. Posthybridization washes were done at 56°C in 6× SSC, 2% SDS (2× 5 minutes) and in 6× SSC, 0.2% SDS (2× 5 minutes).
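The first 20 nucleotides of the nested oligonucleotide probe are one of the 128 sequences encoded by the degenerate sense primer described above, consistent with the probe lying in the Cys-rich (WACKACK) region. This can be confirmed with a simple IUPAC-aware scan. The helper names below are hypothetical, and the scan is an illustrative sketch rather than a procedure used in this study.

```python
# Bases allowed at a position for each IUPAC code.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_degenerate(seq, primer):
    """True if plain A/C/G/T seq is one of the sequences encoded by primer."""
    if len(seq) != len(primer):
        return False
    return all(base in IUPAC_BASES[code]
               for base, code in zip(seq.upper(), primer.upper()))


def find_degenerate_sites(target, primer):
    """0-based start positions in target where the degenerate primer matches."""
    n = len(primer)
    return [i for i in range(len(target) - n + 1)
            if matches_degenerate(target[i:i + n], primer)]


nested_probe = "TGGGCATGCAAAGCGTGTAAAC"
degenerate_sense = "TGGGCNTGYAARGCNTGYAA"
print(find_degenerate_sites(nested_probe, degenerate_sense))  # [0]
```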
Whole-mount in situ hybridization
Embryos at 11 hours of development (early tail formation stage) were dechorionated using enzymes from crab (Cancer sp.) stomach juice (Berrill, 1932) and fixed for 30 minutes at room temperature in sea water containing 9% formaldehyde. Fixed, dechorionated embryos were transferred to phosphate-buffered saline (PBS) containing 9% formaldehyde and 67 mM EGTA, dehydrated through a graded series of methanol/PBS solutions, and stored at –70°C in 100% ethanol. Whole-mount in situ hybridization using a digoxigenin-labeled antisense RNA probe was carried out on rehydrated embryos using the pretreatment, hybridization and washing, and staining procedures of Tautz and Pfeifle (1989). Two modifications were made to these methods: fixed embryos were incubated in 150 mM NaCl, 1% NP-40, 0.5% sodium deoxycholate, 0.1% SDS, 1 mM EDTA, 50 mM Tris-HCl (pH 8.0) for 90 minutes at room temperature in place of Proteinase K digestion, and, after the hybridization reaction, washed embryos were treated for 1 hour at 37°C with 50 μg/ml RNase A in 3× SSC, 0.1% Tween 20 to remove unhybridized probe.
RESULTS
Characterization of MyoD family transcripts in early embryos
Muscle gene expression is initiated in ascidian embryos at or slightly before gastrulation (Meedel and Whittaker, 1983; Tomlinson et al., 1987), suggesting that the MyoD gene family should also be active at these times. Therefore, we screened a gastrula stage (6 hour) cDNA library with a cloned PCR product of genomic DNA representing the Cys-rich/bHLH region of a putative C. intestinalis MyoD family gene. Fig. 1 shows the sequence of a complete MyoD family transcript, compiled from sequence data of a nearly full-length clone (1.7 kb) and RACE amplification of an additional 74 nucleotides of 5′ sequence. The sequence contains a potential open reading frame of 524 amino acids flanked by a 5′-untranslated region of 49 bp and a 3′-untranslated region of 135 bp. The 3′-untranslated region includes a polyadenylation motif (Fitzgerald and Shenk, 1981). It is also unusually U-rich and contains the pentanucleotide AUUUA (nt 1739-1743); both of these features have been associated with rapid mRNA degradation (Chen and Shyu, 1995). The AUG at the beginning of the open reading frame is surrounded by consensus sequences for translation initiation (Kozak, 1991) and is preceded by an in-frame stop codon (nt 35-37). Alignment of the encoded polypeptide with the MyoD family proteins of other species showed that substantive amino acid similarities are limited to the Cys-rich/bHLH domain (boxed in Fig. 1). This domain is 71% identical with mouse MyoD (Davis et al., 1987) and 90% identical with the MyoD family gene of another ascidian, Halocynthia roretzi (AMD-1; Araki et al., 1994), which was characterized from genomic and adult body-wall muscle cDNA clones. Additional sequence comparisons revealed that the CiMDF bHLH region is no more similar to any one of the four mouse MyoD family proteins than to the others.
The basic domain of CiMDF also contains the Ala-Thr dipeptide (amino acid residues 398, 399) characteristic of all MyoD family proteins and critical for myogenic function (Brennan et al., 1991; Davis and Weintraub, 1992). The predicted CiMDF protein is the largest MyoD family member thus far identified, with most of the difference in size due to the larger number of amino acid residues between the N terminus and the Cys-rich/bHLH domain of CiMDF.
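The ORF assignment described above (the longest ATG-initiated reading frame, preceded by an in-frame stop codon, with an AUUUA motif in the 3′-untranslated region) can be illustrated with a minimal single-strand scan. This is a sketch, not the MacVector/GCG analysis actually used; the function names and the toy sequence are hypothetical, and the scan ignores Kozak context and the opposite strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def _to_dna(seq):
    # Accept RNA or DNA input; work in DNA letters throughout.
    return seq.upper().replace("U", "T")


def longest_atg_orf(seq):
    """Return (start, end) of the longest ATG-initiated ORF on this strand.

    start is the 0-based index of the A of ATG; end is the index just past
    the stop codon. Returns None if no complete ATG...stop ORF is found.
    """
    seq = _to_dna(seq)
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after a stop gives the longest ORF here
            elif start is not None and codon in STOP_CODONS:
                if best is None or i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


def has_upstream_inframe_stop(seq, start):
    """True if a stop codon occurs upstream of start in the same frame."""
    seq = _to_dna(seq)
    return any(seq[i:i + 3] in STOP_CODONS for i in range(start - 3, -1, -3))


def motif_positions(seq, motif="ATTTA"):
    """1-based start positions of motif (AUUUA is written ATTTA in cDNA)."""
    seq = _to_dna(seq)
    return [i + 1 for i in range(len(seq) - len(motif) + 1)
            if seq.startswith(motif, i)]


# Hypothetical toy transcript: leader with an in-frame TAA, short ORF, U-rich tail.
toy = "CCTAA" + "ATGGCTGCTTAA" + "CATTTAAAAA"
orf = longest_atg_orf(toy)
print(orf)                                     # (5, 17)
print(has_upstream_inframe_stop(toy, orf[0]))  # True
print(motif_positions(toy))                    # [19]
```

If applied to the full Fig. 1 cDNA, a scan of this kind would be expected to agree with the analysis reported here: an ORF of 1,575 nt (524 codons plus the stop codon) beginning at nt 50, immediately after the 49 bp 5′-untranslated region.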
Nucleotide and deduced amino acid sequence of CiMDFa. The nucleotide sequences of the 5′ and 3′ untranslated regions of the mRNA are shown in lowercase letters and the protein coding sequences in uppercase letters. The open reading frame is divided into triplet codons with the inferred amino acid sequence below. Nucleotides are numbered on the right and amino acids on the left. The conserved Cys-rich/bHLH region of the encoded polypeptide is enclosed by the box. A consensus polyadenylation signal sequence is underlined. Asterisks (***) mark in-frame stop codons. This sequence is deposited in the GenBank database under accession number U80079.
CiMDF expression
CiMDF transcripts differentially accumulate during embryogenesis. Whole-mount in situ hybridization showed that CiMDF transcripts were restricted to the two lateral bands of muscle cells on either side of the notochord (N) and did not extend to the epidermal cells (ep) of the dorsal, ventral, or lateral margins of the developing tail (Fig. 2). The whole mounts presented in Fig. 2 also showed that CiMDF transcripts accumulated to nearly identical levels in cells of the primary and secondary lineages. In addition, in situ studies with earlier-stage embryos confirmed a previous report by Satoh et al. (1996) showing that MDF transcripts accumulate in all blastomeres committed to the muscle lineage (data not shown). CiMDF transcripts (1.8 kb) were first observed at 5 hours postfertilization (Fig. 3A). The cDNA sequence shown in Fig. 1 presumably encodes this mRNA and is labeled CiMDFa in Fig. 3A. The prevalence of CiMDFa transcripts increased during the next 2 hours; however, by 9 hours of development they were no longer detectable and had been replaced by a 2.7 kb mRNA (CiMDFb). This larger transcript first appeared at 6 hours of development, reached a maximum at 9 hours and decreased steadily thereafter. CiMDFb transcripts were still detected at 16 hours postfertilization, approximately 2 hours before hatching. The onset of CiMDF expression was also determined using RT-PCR (Fig. 3B). The greater sensitivity of this assay demonstrated that CiMDF expression began approximately 4 hours postfertilization. These RT-PCR assays detected no CiMDF transcripts in early cleavage stage embryos or among the maternal RNAs stored in eggs.
Localization of CiMDF transcripts in 11-hour-old embryos by whole-mount in situ hybridization. Embryos are viewed from dorsal (top) and lateral (bottom) perspectives. A, anterior; P, posterior; D, dorsal; V, ventral; N, notochord; nf, neural folds; ep, epidermis. The scale bar in each photograph represents 100 μm.
Expression of CiMDF during embryogenesis and in representative adult tissues. (A) Northern blot of staged embryo RNAs and adult body-wall muscle RNA hybridized with 32P-labeled probes that include sequences encoding the CiMDF Cys-rich/bHLH region (pcm180). Each lane contains 10 μg of total RNA; equal amounts of RNA were loaded in each lane as determined by ethidium bromide staining of ribosomal RNA bands (data not shown). Arrows identify the 1.8 kb transcript (CiMDFa) and the 2.7 kb transcript (CiMDFb). (B) RT-PCR analysis of CiMDF expression. Staged embryo RNAs and representative adult tissue RNAs were reverse transcribed and used in PCR. Size-fractionated PCR products were blotted and hybridized to a 32P-labeled nested oligonucleotide to detect the CiMDF-specific product. Control reactions without reverse transcriptase were done to ensure that PCR products were mRNA dependent and were not the result of genomic DNA amplification. Positive control reactions included RT-PCR reactions with genes whose expression in eggs, cleaving embryos and adult tissues was established from northern blot analyses. Experiments in which tadpole-stage RNA (16 hours postfertilization) was mixed with RNA from eggs or cleaving embryos demonstrated that failure to detect CiMDF mRNAs before 4 hours of development was due to the absence of CiMDF transcripts at these times.
Northern blots revealed both CiMDFa and CiMDFb transcripts in body-wall muscle (Fig. 3A). Their presence in this muscle was confirmed by RT-PCR (Fig. 3B). CiMDF transcripts were not detected by RT-PCR in any other adult tissue examined. Their absence from the heart is particularly noteworthy because this organ contains striated muscle.
A single MyoD family gene exists in ascidians
Restriction fragment length polymorphism (RFLP) analysis by genomic Southern blot was used to determine whether CiMDF is a single-copy gene in C. intestinalis or a member of a closely related ascidian MyoD gene family. The sequence polymorphisms previously shown to exist in wild populations of C. intestinalis (Meedel and Hastings, 1993) were used to estimate gene copy number through the distribution of allelic CiMDF restriction fragments (rfs) derived from HaeII digests of individual DNAs (Fig. 4A). These digests produced three different CiMDF rfs in the individuals shown. Animals no. 1 and no. 4 produced rfs of different sizes; however, each individual was homozygous at this locus (i.e., the maternal and paternal HaeII fragment lengths were the same). Individual no. 2 displayed two HaeII rfs, one identical to the fragment of animal no. 1 and the other unique among these three individuals. This band pattern (i.e., no individual possessed more than two HaeII rfs, and no fragment was shared by all three animals examined) is most easily explained by the presence of a single CiMDF gene with a limited number of alleles. The observation that no individual animal displayed more than two restriction fragments (i.e., the maternal and paternal alleles of one gene) was extended to digests with three other restriction enzymes and to genomic DNA from two additional individuals (unpublished observations). Finally, several individuals showed only a single restriction fragment, too small to accommodate both CiMDFa and CiMDFb (4.6 kb total, not including possible introns), which also suggests that only one CiMDF gene exists.
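The copy-number argument rests on a simple constraint: a diploid animal carries at most two alleles of a single-copy gene, so a single locus predicts no more than two distinct hybridizing fragments per individual. The sketch below encodes that check for the Fig. 4A band pattern. Fragment sizes are not given in the text, so alleles are represented by hypothetical labels, and the function names are also hypothetical. The check cannot by itself exclude duplicate loci whose fragments co-migrate, which is why additional enzymes and individuals were examined.

```python
def max_fragments_per_individual(genotypes):
    """Largest number of distinct hybridizing fragments seen in any one animal."""
    return max(len(set(fragments)) for fragments in genotypes.values())


def consistent_with_single_locus(genotypes):
    """One diploid locus allows at most two distinct fragments per animal."""
    return max_fragments_per_individual(genotypes) <= 2


def distinct_alleles(genotypes):
    """All fragment classes observed across the sampled animals."""
    return set().union(*(set(f) for f in genotypes.values()))


# Hypothetical allele labels reproducing the pattern described for Fig. 4A.
fig4a = {
    "no. 1": ["allele_a", "allele_a"],  # homozygous
    "no. 2": ["allele_a", "allele_c"],  # shares one fragment with no. 1
    "no. 4": ["allele_b", "allele_b"],  # homozygous, different from no. 1
}
print(consistent_with_single_locus(fig4a))  # True
print(sorted(distinct_alleles(fig4a)))      # ['allele_a', 'allele_b', 'allele_c']
```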
Determination of CiMDF gene-copy number by genomic Southern blot hybridization. (A) Genomic DNA isolated from the sperm of three different individuals (1, 2, 4) was digested with HaeII, size fractionated on a 1% agarose gel and blotted onto GeneScreen(+). The blot was probed at high criterion with a 32P-labeled, random-primed probe representing the CiMDF Cys-rich/bHLH region (pcm180). (B) HaeII digests of sperm DNA from two individuals (D, 6) were size fractionated, blotted in duplicate and hybridized with 32P-labeled pcm180 at either high or low criterion.
The conclusion that CiMDF is a single-copy gene does not eliminate the possibility that other, divergent MyoD family genes are present. This possibility was rendered unlikely by comparing genomic Southern blots hybridized at high (hybridization: 0.75 M Na+, 68°C; final wash: 0.1× SSC, 68°C) and low (hybridization: 0.75 M Na+, 55°C; final wash: 1× SSC, 55°C) criterion with a probe encoding the conserved Cys-rich/bHLH region of CiMDF (Fig. 4B). Reducing blot stringency did not yield additional hybridizing bands (i.e., related Cys-rich/bHLH-containing genes). Furthermore, low-criterion genomic Southern blots using a nearly full-length cDNA clone (CiMDFa, see Fig. 1) also failed to detect additional hybridizing genomic fragments, thus indicating that the only conserved sequence present is the Cys-rich/bHLH region (data not shown). Under the low-criterion conditions used, any additional genes that escaped detection would have to be more distantly related to CiMDF than, for example, the four vertebrate MyoD family genes are to each other.
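The stringency comparison can be made semi-quantitative with the widely used Meinkoth and Wahl approximation for DNA-DNA hybrids, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%GC) - 0.61(%formamide) - 500/L, together with the rule of thumb that Tm falls by about 1°C per 1% mismatch. This is only an approximate sketch: the GC content and length of the probe are not given in the text, so the 50% GC and 300 bp values below are hypothetical placeholders, no formamide is assumed (none is listed), and 1× SSC is taken as roughly 0.195 M Na+. The function names are hypothetical.

```python
import math


def ssc_to_sodium(ssc_strength):
    """Approximate Na+ molarity of an SSC solution (1x SSC ~ 0.195 M Na+)."""
    return 0.195 * ssc_strength


def tm_dna_hybrid(na_molar, gc_percent, probe_length, formamide_percent=0.0):
    """Empirical Tm (deg C) of a perfectly matched DNA-DNA hybrid."""
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.61 * formamide_percent - 500.0 / probe_length)


def tolerated_mismatch_percent(na_molar, temp_c, gc_percent, probe_length):
    """Rule of thumb: ~1 deg C per 1% mismatch, so (Tm - T) approximates the
    percent mismatch a hybrid can carry and remain stable at temperature T."""
    return max(0.0, tm_dna_hybrid(na_molar, gc_percent, probe_length) - temp_c)


# Hypothetical probe parameters (NOT reported in the text).
GC_PERCENT, PROBE_LENGTH = 50.0, 300

# Final washes from the text: 0.1x SSC at 68 deg C (high) vs 1x SSC at 55 deg C (low).
high = tolerated_mismatch_percent(ssc_to_sodium(0.1), 68, GC_PERCENT, PROBE_LENGTH)
low = tolerated_mismatch_percent(ssc_to_sodium(1.0), 55, GC_PERCENT, PROBE_LENGTH)
print(f"high-criterion wash tolerates ~{high:.0f}% mismatch")
print(f"low-criterion wash tolerates ~{low:.0f}% mismatch")
```

With these placeholder values the high-criterion wash tolerates roughly 4% mismatch and the low-criterion wash roughly 34%, which illustrates why moderately diverged relatives would be expected to survive the low-criterion wash if they existed.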
Developmentally regulated CiMDF transcripts encode different proteins
The 2.7 kb transcript (CiMDFb) in older embryos was characterized by isolating cDNA clones from a 9 hour postfertilization library. A nearly full-length cDNA clone representing CiMDFb showed that CiMDFb and CiMDFa mRNAs appear to initiate at the same transcription start site. The amino acid sequence of the CiMDFb open reading frame is shown aligned to the CiMDFa open reading frame to maximize sequence identity (Fig. 5). This alignment demonstrated that the proteins were essentially collinear up to but not including the translation stop associated with CiMDFa; the nucleotide sequences of the two transcripts diverge at this point (nt 1627 of Fig. 1). As a consequence, the CiMDFb open reading frame is extended relative to CiMDFa and a CiMDFb-specific 3′-untranslated region (809 bp) is also created. The C-terminal extension of CiMDFb encodes 68 amino acids that include the motif designated ‘Domain III’ (Rhodes and Konieczny, 1989). This domain has been implicated in the effector functions of MyoD family proteins (Schwarz et al., 1992) and is conserved among vertebrate MyoD family genes (Rhodes and Konieczny, 1989; Fujisawa-Sehara et al., 1990). This conservation extends to nematodes (Krause et al., 1990), arthropods (Michelson et al., 1990), and echinoderms (Venuti et al., 1991). Since the two transcripts were collinear from the 5′-ends through the open reading frame of CiMDFa, the alternative transcripts probably arise from differential processing of primary transcripts during embryogenesis. Possible processing events include differential splicing or alternative 3′-end formation (Lou et al., 1996). A mechanism involving differential splicing is supported by the presence of an exon-intron border at the translational stop codon of another ascidian MyoD family gene, AMD-1 (Araki et al., 1994).
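As a consistency check, the lengths reported for the CiMDFa cDNA and for the CiMDFb extension can be summed and compared with the transcript sizes estimated from northern blots. This sketch assumes that both mRNAs share the 49 nt 5′-untranslated region (same transcription start site), that the CiMDFb open reading frame is the 524 codons of CiMDFa plus the 68-codon extension, and that poly(A) tails are excluded; the function name is hypothetical.

```python
def transcript_length(utr5, codons, utr3):
    """mRNA length in nt, excluding poly(A): 5' UTR + sense codons + stop + 3' UTR."""
    return utr5 + 3 * codons + 3 + utr3


cimdf_a = transcript_length(utr5=49, codons=524, utr3=135)
cimdf_b = transcript_length(utr5=49, codons=524 + 68, utr3=809)
print(cimdf_a, cimdf_b)  # 1759 2637
```

The totals (1,759 and 2,637 nt) fall slightly short of the 1.8 kb and 2.7 kb northern estimates, a difference plausibly accounted for by the poly(A) tails, which are not included.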
Sequence comparison of the deduced proteins encoded by CiMDFa and CiMDFb. The open reading frames derived from each cDNA are aligned to achieve a maximum overall fit. Vertical lines between the sequences represent amino acid identities; shaded amino acids indicate allelic differences between the cDNAs. The conserved sequences comprising the Cys-rich/bHLH and Domain III regions of MyoD family genes are enclosed in a double-lined box and a single-lined box, respectively. Amino acids are shown in groups of 10 and numbered on the right. *Translation stop. The nucleotide sequence of the CiMDFb cDNA is deposited in the GenBank sequence database under accession number U80080.
Comparison of the overlapping portions of the CiMDFa and CiMDFb protein sequences revealed 9 amino acid differences (Fig. 5; gray boxes), none of them in the Cys-rich/bHLH region. Since CiMDF is a single-copy gene, these differences presumably reflect sequence polymorphisms among the animals used to construct the cDNA libraries. A total of 56 nucleotide changes were noted where the cDNA sequences representing CiMDFa and CiMDFb overlap, indicating a sequence polymorphism level among individuals in this population of 3.5% (56/1609). This is nearly equal to the 4% sequence divergence found in wild populations of subtidal sea urchins (Britten et al., 1978).
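The polymorphism level quoted above is simply the number of differing positions divided by the length of the aligned overlap. The sketch below shows that calculation; it assumes a gap-free alignment of equal-length sequences (insertions or deletions would need separate handling), and the function names and toy sequences are hypothetical.

```python
def count_differences(seq_a, seq_b):
    """Mismatched positions between two gap-free, equal-length aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def percent_difference(differences, aligned_length):
    return 100.0 * differences / aligned_length


# Toy alignment (hypothetical) with two differing positions.
print(count_differences("ACGTACGT", "ACGAACGA"))  # 2

# Values reported for the CiMDFa/CiMDFb overlap.
print(round(percent_difference(56, 1609), 1))     # 3.5
```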
Conservation of chordate MyoD family genes
An alignment of representative examples of vertebrate (mouse) and ascidian (C. intestinalis and H. roretzi) MyoD family proteins is shown in Fig. 6A. Collectively, the ascidian myogenic factors are much larger than a typical vertebrate MyoD family protein, owing to dramatic increases in the number of amino acids preceding the Cys-rich/bHLH domains. Sequence comparisons revealed no obvious similarities other than the conserved motifs indicated. Alignment of the His-rich/Cys-rich/bHLH regions of these proteins (Fig. 6B) demonstrated that, except for the His-rich sequence, which is conserved only between the ascidian species, this region is highly conserved among the three chordate species. In addition, only mouse MyoD and CiMDFb contained Domain III (hatched boxes, Fig. 6A), which was strongly conserved between these proteins (11 of 13 amino acids, Fig. 6B).
Structural comparison of ascidian MyoD family members with mouse MyoD. (A) A linear diagram of mouse MyoD is shown to scale with corresponding figures of CiMDFa, CiMDFb and AMD-1. The Cys-rich/bHLH domains (black boxes) are used to align the proteins. Domain III regions of mouse MyoD and CiMDFb are shown as hatched boxes. The number of amino acids in each polypeptide is shown to the right of each figure. (B) Amino acid alignments of the His-rich/Cys-rich/bHLH and Domain III regions from mouse MyoD and the MyoD family proteins of C. intestinalis and H. roretzi. Amino acid identities with the mouse MyoD sequence are shown as residues enclosed by boxes. Amino acid differences between the two ascidian MyoD family sequences are shaded.
Ascidian MyoD family orthologues display little sequence identity
Although extensive sequence identity is evident throughout the protein alignments of MyoD family orthologues from different vertebrate species, sequence identity between the ascidian myogenic factors CiMDFa/b and AMD-1 was limited to the His-rich/Cys-rich/bHLH domain (Fig. 7A). The degree of divergence between these ascidian myogenic factors was remarkable and was also evident at the nucleotide sequence level (~30%). This result raised anew the question of whether C. intestinalis has two MyoD family genes, CiMDF and an AMD-1 orthologue. The genomic blot data presented earlier (Fig. 4) appear to rule out this possibility by demonstrating that CiMDF is a single-copy gene in C. intestinalis and that other divergent MyoD family genes are not present. Nevertheless, in view of the nearly complete sequence divergence of CiMDF and AMD-1, we attempted to determine whether C. intestinalis has an AMD-1 homologue by low-criteria Southern blots of C. intestinalis genomic DNA, using probes corresponding to either the Cys-rich/bHLH domain of CiMDF or the bHLH domain of AMD-1 (Fig. 7B). These species-specific probes shared 145 bp of sequence and were only 75% identical. A single 2.9 kb hybridizing band appeared in each blot. Although these data do not rule out the possibility that two 2.9 kb fragments exist in C. intestinalis, given the sequence polymorphism evident in Figs 4 and 5, it is most likely that there is only a single C. intestinalis MyoD family gene and, therefore, that CiMDF and AMD-1 are orthologues.
The MyoD family genes in C. intestinalis and H. roretzi are highly divergent orthologues. (A) Alignment matrix (MacVector 4.5.3) comparing the amino acid sequences of AMD-1 (y-axis) with CiMDF (x-axis). The stringency of this alignment was >60% using a window size of 8 amino acids. (B) HaeII digests of C. intestinalis sperm DNA from a single individual were blotted in duplicate and hybridized at low criteria either with a probe corresponding to the Cys-rich/bHLH domain of CiMDF (lane 1) or with a probe corresponding to the bHLH domain of AMD-1 (lane 2). This AMD-1 probe was generated by primer-specific PCR of H. roretzi genomic DNA (gift of Hiroki Nishida) and represents nucleotides 3024-3168 of the genomic clone reported previously (Araki et al., 1994).
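Panel A is a sliding-window dot plot. The sketch below assumes a plain identity score (MacVector can also score windows with a substitution matrix, which this sketch does not attempt). It marks cell (j, i) when an 8-residue window starting at position i of the x-axis sequence and one starting at position j of the y-axis sequence share more than 60% identical residues, i.e. at least 5 of 8. The function names and the toy sequence are hypothetical.

```python
from __future__ import annotations


def dot_plot(x: str, y: str, window: int = 8, stringency: float = 0.60) -> list[list[bool]]:
    """Identity-based sliding-window dot plot (hypothetical helper).

    grid[j][i] is True when x[i:i+window] and y[j:j+window] share more than
    stringency * window identical residues, compared position by position.
    """
    cols = len(x) - window + 1
    rows = len(y) - window + 1
    if cols <= 0 or rows <= 0:
        return []  # at least one sequence is shorter than the window
    grid = []
    for j in range(rows):
        row = []
        for i in range(cols):
            matches = sum(1 for k in range(window) if x[i + k] == y[j + k])
            row.append(matches / window > stringency)
        grid.append(row)
    return grid


def render(grid: list[list[bool]]) -> str:
    """Text rendering: '*' for a hit, '.' otherwise; one line per y-axis window."""
    return "\n".join("".join("*" if hit else "." for hit in row) for row in grid)


# Comparing a toy sequence with itself yields the expected main diagonal.
toy = "MKLVSTPQRW"
print(render(dot_plot(toy, toy)))
```

In such a plot, conserved segments appear as diagonal runs of hits; for CiMDF against AMD-1 the text reports conservation confined to the His-rich/Cys-rich/bHLH region.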
DISCUSSION
Gene duplication has played an important role in vertebrate evolution. Most duplications occurred after the lineage leading to vertebrates diverged from the other chordate subphyla (Chan et al., 1990; Holland et al., 1994; Ruddle et al., 1994). This pattern does not hold for all gene families, however (Holland et al., 1995; Araki et al., 1996), which underscores the importance of evaluating gene number in any family of interest.
Southern blots of genomic DNA strongly suggested that only a single MyoD family gene (CiMDF) exists in C. intestinalis, and that CiMDF and the MyoD family gene of H. roretzi (AMD-1) are orthologous. Because extensive sequence conservation is seen throughout the proteins encoded by orthologous MyoD family genes of vertebrates (Atchley et al., 1994), it was surprising that the deduced proteins of CiMDF and AMD-1 were similar only in their His-rich/Cys-rich/bHLH regions. These observations imply that the differences between CiMDF and AMD-1 reflect species-specific variation, and that ascidian MyoD family proteins are not subject to the same selective constraints as vertebrate MyoD family orthologues. Nevertheless, selective constraints on the ascidian MyoD family proteins do exist beyond the demonstrated conservation of the His-rich/Cys-rich/bHLH domains, and they are revealed by analysis of the polymorphic nucleotide substitutions in the cDNA sequences of CiMDFa and CiMDFb. In the regions encoding the amino acids that precede the His-rich/Cys-rich/bHLH domains, the nucleotide differences between these two sequences are disproportionately silent (i.e., synonymous) substitutions. The normalized ratio of synonymous to non-synonymous changes (Ks/Ka) between these cDNAs is 13.6, about 14-fold higher than the value of 1 expected by chance alone (Li and Graur, 1991). Therefore, sequence polymorphism in this region of CiMDF is not accumulating by neutral drift, implying that selective pressures act on these domains.
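The Ks/Ka comparison rests on counting synonymous and non-synonymous sites and differences between two aligned coding sequences. The estimator used here is not specified, so the following is only a sketch of one standard approach, the Nei-Gojobori (1986) counting method, without a Jukes-Cantor correction. The function names and test sequences are hypothetical. Each codon's synonymous sites are its synonymous single-base neighbours divided by 3 (changes to stop codons count as non-synonymous). Codons that differ at two or three positions are scored by averaging over the mutational pathways that avoid stop codons. Input is assumed to be uppercase-convertible A/C/G/T only, gap-free and in frame.

```python
from __future__ import annotations

from itertools import permutations

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Synonymous single-base neighbours / 3; mutations to stop count as non-synonymous."""
    syn = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] != "*" and CODE[mutant] == CODE[codon]:
                syn += 1
    return syn / 3


def codon_differences(c1: str, c2: str) -> tuple[float, float]:
    """(synonymous, non-synonymous) differences averaged over stop-free pathways."""
    diff_pos = [p for p in range(3) if c1[p] != c2[p]]
    if not diff_pos:
        return 0.0, 0.0
    total_s = total_n = 0.0
    n_paths = 0
    for order in permutations(diff_pos):
        current, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # this pathway passes through a stop codon
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            total_s, total_n, n_paths = total_s + s, total_n + n, n_paths + 1
    if n_paths == 0:
        raise ValueError(f"no stop-free pathway from {c1} to {c2}")
    return total_s / n_paths, total_n / n_paths


def ks_ka(seq1: str, seq2: str) -> tuple[float, float]:
    """Return (pS, pN) for two aligned, gap-free, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be equal in length and a multiple of 3")
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if CODE[c1] == "*" or CODE[c2] == "*":
            raise ValueError("internal stop codon")
        sites = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += sites
        N += 3 - sites
        ds, dn = codon_differences(c1, c2)
        Sd += ds
        Nd += dn
    if S == 0 or N == 0:
        raise ValueError("no synonymous or no non-synonymous sites")
    return Sd / S, Nd / N


# Hypothetical toy pair: two silent changes (CTG/CTA, AAA/AAG) and one replacement (GCT/GAT).
p_s, p_n = ks_ka("ATGCTGAAAGCT", "ATGCTAAAGGAT")
print(f"pS={p_s:.3f} pN={p_n:.3f} pS/pN={p_s / p_n:.2f}")
```

The ratio pS/pN plays the role of the normalized Ks/Ka discussed above. A value near 1 is expected when substitutions are neutral with respect to the encoded protein, whereas values well above 1 indicate that amino acid-changing substitutions are being removed. Real analyses would use much longer sequences and usually apply a multiple-hit correction.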
We suggest the name ‘MDF’ (Muscle Determination Factor) for the ascidian MyoD family gene and propose that orthologues be distinguished by initials representing genus and species; hence CiMDF (Ciona intestinalis MDF) and HrMDF (Halocynthia roretzi MDF). This nomenclature acknowledges the extensive differences among ascidian orthologues and indicates that ascidians have only one MyoD family gene. The designation ‘MDF’ also recognizes that the ascidian MyoD family gene is equally related to all four vertebrate genes.
CiMDF expression and embryonic muscle development
Development of the primary muscle lineage in ascidians depends on maternally expressed cytoplasmic determinants (Satoh and Jeffery, 1995; Meedel, 1997). CiMDF is unlikely to encode such a determinant, because transcripts from this gene were not detected by RT-PCR until 4 hours after fertilization (the 32- to 64-cell stage in C. intestinalis). This result is consistent with previous RT-PCR analyses of MyoD family expression during H. roretzi development (Araki et al., 1994; Satou et al., 1995), but differs from a recent report by the same group, using the same method, showing that MDF transcripts were present in fertilized eggs and early cleavage-stage embryos of H. roretzi (Satoh et al., 1996). Regardless of the explanation for this discrepancy, our inability to detect MDF transcripts by RT-PCR in fertilized eggs and early embryos supports the contention that these transcripts are absent or extremely rare at these developmental stages.
We have not excluded the possibility that inactive forms of CiMDFa or CiMDFb protein are present in eggs and function as a maternal determinant(s). However, the gene expression data presented here suggest that the maternal determinant(s) probably activate CiMDF transcription, directly or indirectly, and perhaps the transcription of other myogenic regulators as well, thereby initiating myogenesis at about 4 hours of development. Subsequent changes in CiMDF expression during development appear to result from differential processing of CiMDF primary transcripts and a decline in steady-state transcript levels after 11 hours.
The two CiMDF transcripts encode putative MyoD family proteins, CiMDFa and CiMDFb, which are essentially identical except for a 68 amino acid C-terminal extension present in CiMDFb. This extension contains the serine-rich region identified as Domain III (Rhodes and Konieczny, 1989; Fujisawa-Sehara, 1990), which is conserved among all known MyoD family proteins except the ascidian myogenic factors HrMDF and CiMDFa. It seems likely that only one of the two transcripts has been cloned in H. roretzi and that the other mRNA (the CiMDFb equivalent) remains to be characterized. The presence of Domain III in only one of the two putative MyoD family proteins of C. intestinalis (CiMDFb) is provocative. Vertebrate MyoD family proteins use Domain III, together with a nearby, less conserved serine-rich region, to form a two-part transcriptional activation domain that is necessary for muscle-specific gene expression (Schwarz et al., 1992). This finding suggests that Domain III of CiMDFb may form a similar activation domain, which may be necessary to ensure high-level transcription of muscle-specific genes. In this model, CiMDFb would be a potent activator of muscle gene transcription, whereas CiMDFa, which lacks Domain III, would be a relatively weak activator and may fulfill other regulatory functions. For example, CiMDFa may play an autoregulatory role that ensures the continuation of muscle development by leading to the expression of CiMDFb. The presence of E-boxes in the HrMDF gene is consistent with this possibility (Araki et al., 1994).
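E-boxes are the CANNTG consensus promoter elements bound by MyoD family proteins. As an illustrative sketch only, the hypothetical helper below lists E-box positions in a DNA string. Because the reverse complement of CANNTG is again CANNTG, a single-strand scan also captures every site on the opposite strand. The zero-width lookahead lets overlapping sites be reported. The input is a made-up fragment, not HrMDF or CiMDF sequence.

```python
from __future__ import annotations

import re

# E-box consensus CANNTG; the zero-width lookahead allows overlapping matches.
EBOX = re.compile(r"(?=(CA[ACGT]{2}TG))")


def find_eboxes(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, site) for every E-box in seq (hypothetical helper).

    CANNTG is its own reverse complement as a pattern, so scanning one strand
    finds the sites present on both strands.
    """
    return [(m.start(), m.group(1)) for m in EBOX.finditer(seq.upper())]


# Made-up fragment containing two overlapping E-boxes.
print(find_eboxes("ttCACATGTGaa"))
```

A consensus match marks only a potential binding site; whether a given E-box is actually bound and used for autoregulation would have to be tested experimentally.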
The temporal differences in the expression of CiMDFa and CiMDFb also suggest that these proteins may regulate muscle development in the primary and secondary lineages, respectively. This idea is consistent with the finding that muscle differentiation begins earlier in the primary lineage than in the secondary lineages (for example see Meedel et al., 1987). It is not consistent, however, with the finding that CiMDFb transcripts are present at comparable levels in the primary and secondary muscle cells at 11 hours of development (see Figs 2 and 3A), which suggests a role for CiMDFb in both lineages. The whole-mount results do not address whether CiMDFa has a unique role in the primary muscle lineage early in embryogenesis. Unfortunately, this possibility cannot be resolved easily by in situ hybridization. The CiMDFa-specific 3′-untranslated region is only 130 nucleotides long and 64% A+T, and it yields a probe of such low sensitivity that its usefulness is limited even in northern blot experiments (data not shown). The sensitivity and specificity needed to determine the lineage distribution of CiMDFa transcripts can probably be achieved by applying RT-PCR to partial embryos (for example see Meedel and Whittaker, 1984).
C. intestinalis embryos appear to generate muscle by a mechanism that involves the biphasic expression of two putative MyoD family transcription factors. This mechanism has parallels to vertebrate muscle development, where paralogous genes produce different MyoD family proteins with diverse but overlapping expression patterns and functions. As in other organisms, however, myogenesis is a complex process that certainly requires the activity of many regulatory genes in addition to those of the MyoD family. These genes are likely to include those of the MADS family, such as MEF-2, which appear to be indispensable for muscle development (Edmondson et al., 1992; Molkentin et al., 1995; Lilly et al., 1995).
Chordate muscle relationships
Patterns of MyoD family gene activity can be used to evaluate evolutionary relationships between muscle types. Vertebrate MyoD family genes are active only in skeletal muscle, and thus the presence of these transcripts in larval tail muscle and adult body-wall muscle of C. intestinalis implies a relationship between these ascidian muscle types and vertebrate skeletal muscle. This conclusion supports the widely held belief that ascidian tail muscle and vertebrate skeletal muscle are homologous tissues (Bone, 1989; Meedel, 1997), and is also consistent with inferences about the suspected kinship of ascidian body-wall muscle and vertebrate skeletal muscle (Nevitt and Gilly, 1986; Meedel and Hastings, 1993). The absence of MyoD family transcripts in both ascidian and vertebrate cardiac muscles suggests that the morphological and functional similarities between these cell types (Meedel, 1997) also extend to the molecular requirements for muscle specification.
Evolution of the MyoD family
The data presented here suggest that chordates have uniquely evolved a mechanism(s) of muscle development that requires the expression of multiple MyoD family proteins. Fig. 8 diagrams this hypothesis as an evolutionary tree that includes representative phyla for which the MyoD family gene(s) have been characterized. All non-chordate invertebrates examined use a single, structurally conserved MyoD family gene to encode a single MyoD family protein. As shown in Fig. 8, multiple MyoD family proteins are found only in chordates. Although gene duplication events gave rise to multiple genes encoding distinct MyoD family proteins in vertebrates and amphioxus, our results show that a representative of the evolutionarily older subphylum, Urochordata, differentially expresses distinct MyoD family proteins from a single gene. This suggests that selection pressures had already produced a molecular mechanism(s) of myogenesis that used multiple MyoD family proteins before gene duplication gave rise to multiple MyoD family genes. Presumably, these mechanisms provided a competitive advantage for early chordates; in doing so, they also provided the molecular foundation for regulatory networks that vertebrates would eventually exploit to control skeletal muscle development. Since ascidian embryonic and adult body-wall muscles are not substantially more complex than other invertebrate muscle types (Squire, 1986), the selection pressures on early chordates were probably not for larger, more complex muscles. Instead, these pressures were more likely regulatory, perhaps associated with the rapid yet coordinated expression of muscle genes in specific cell lineages during embryogenesis.
These regulatory constraints may still be evident in ascidians and vertebrates through the spatial and temporal regulation of MyoD family proteins during embryogenesis (e.g., differential regulation of CiMDFa and CiMDFb in ascidians, or early versus late functions of the vertebrate MyoD/Myf5 and myogenin/MRF4 gene pairs).
The evolution of MyoD family genes/proteins. The evolutionary tree (relative time-line not drawn to scale) shows phyla for which MyoD family gene data exist and includes pseudocoelomic animals (Nematoda) and coelomic animals of protostome (Arthropoda) and deuterostome (Echinodermata and Chordata) origins. The MyoD family genes are drawn to scale, relative to mouse MyoD, and are shown with the conserved Cys-rich/bHLH (solid black boxes) and Domain III regions (hatched boxes). The MyoD family gene of the Precambrian ancestor is shown as identical to C. elegans CeMyoD, as a first approximation of the structure of a MyoD family gene in this hypothesized animal. The data for amphioxus are derived from PCR fragments and do not include a complete description of the encoded proteins. The dashed line represents uncertainty regarding the possible existence of additional genes in amphioxus.
Studies of the MyoD gene family in other invertebrate chordates are necessary to further evaluate this hypothesis; examination of MyoD family gene expression in amphioxus should be particularly informative. In addition, functional studies of CiMDF proteins, and of the developmental mechanism(s) generating multiple MyoD family proteins in ascidians, should also yield insights into the evolutionary pathways that resulted in the diverse muscle types found in chordates.
ACKNOWLEDGEMENTS
We thank Jane Loescher for her technical efforts and support, and Anne Kristensen of the Mayo Clinic Scottsdale Molecular Biology/Flow Cytometry Core Facility for DNA sequencing. Important insights and advice during the study were provided by J. R. Whittaker and Bob Crowther. Critical reviews of the manuscript were provided by Drs Barbara J. Wold, Eric D. Wieben and Kenneth E. M. Hastings. We also wish to thank Marvin Ruona (graphic artist) and Beverly K. Pratley (program assistant). This work was supported through funds provided by the Mayo Foundation, the Rhode Island College Faculty Research Committee and the American Philosophical Society.