ABSTRACT
Little is known about gene action in the preimplantation events that initiate mammalian development. From cDNA collections made at each stage from egg to blastocyst, 25,438 3′-ESTs were derived. They represent 9718 genes, half of them novel. Thus, a considerable fraction of mammalian genes is dedicated to embryonic expression. This study reveals profound changes in gene expression, including the transient induction of transcripts at each stage. These results raise the possibility that development is driven by the action of a series of stage-specifically expressed genes. The new genes, 798 of which have been placed on the mouse genetic map, provide entry points for analyses of human and mouse developmental disorders.
INTRODUCTION
Preimplantation development of mammalian embryos is marked by many critical and unique events, including the start of zygotic transcription, the first cell differentiation and the initiation of specific cell-cell adhesion (reviewed in Hogan et al., 1994; Pedersen, 1986; Rossant, 1986; Watson et al., 1992). Analysis of these processes is fundamental for the understanding of organ formation, for practical techniques such as veterinary cloning of animals (Wakayama et al., 1998; Wilmut et al., 1997), and for clinical applications such as in vitro fertilization (IVF) and the assessment of fetal well-being. In spite of its importance, very little is known about the molecular events during this early phase of development.
At a macro level, the outline of the process has become clear for several organisms. During preimplantation development, the embryo, confined within the zona pellucida, does not change in overall size. Rather, the increase in cell number is compensated by a decrease in cell size, giving rise to the term ‘cleavage stages’. In most species studied, including Drosophila, C. elegans, Xenopus, sea urchins and fish, the morphological changes and regional differentiation of embryonic cells at this stage rely mainly on maternally stored mRNAs and proteins (Wieschaus, 1996). Mammals, however, significantly modify this program. Some oocyte mRNAs are translated, but fertilization triggers massive mRNA degradation (Piko and Clegg, 1982). Two major events then occur. The first is the transcriptional activation of the zygotic genome. The timing of this transition is regulated by a ‘zygotic clock’ (Nothias et al., 1995; Schultz, 1993) and varies among species (the late one-cell stage in mouse, the 4- to 8-cell stage in human and the 8- to 16-cell stage in sheep; Schultz, 1993). The second major event, compaction, occurs at the 8- to 16-cell stage, when cells that were previously loosely associated begin to adhere in the tightly organized cell mass of the morula. This is the starting point for cell differentiation into the inner cell mass (ICM), which eventually gives rise to the embryo, and the trophectoderm, which eventually gives rise to the placenta. By the 32- to 64-cell (blastocyst) stage, the two cell types are clearly distinguishable. After hatching from the zona pellucida, the blastocyst implants in the uterus.
A primary obstacle that has delayed molecular analysis of this developmental program is the difficulty of collecting and analyzing large numbers of eggs and embryos. In early studies, the expression patterns of a limited number of genes, observed by several methods, led to the idea that, notwithstanding the dynamic changes, gene expression is monotonic: once a gene has been activated, it is not switched off, and the encoded protein accumulates as development proceeds to the blastocyst stage (reviewed in Kidder, 1992; Schultz and Heyner, 1992; Watson et al., 1992). Later analyses with high-resolution two-dimensional protein gels revealed dynamic changes in the quantities of many proteins during the 1- to 4-cell stages (Latham et al., 1991) and the 8-cell to blastocyst stages (Shi et al., 1994), but only a limited number of the corresponding genes have been identified so far (Schultz, 1999). Similarly, attempts to construct cDNA libraries (Adjaye et al., 1997, 1998; Rothstein et al., 1992, 1993; Sasaki et al., 1998; Taylor and Piko, 1987) and to examine gene expression patterns by mRNA differential display (Oh et al., 1999; Schultz, 1999) have yielded only short transcript fragments and fragmentary information.
Aiming at a global survey of gene expression and at estimating the number of preimplantation-specific genes, we adapted techniques to generate cDNA libraries from each stage of preimplantation mouse development, carried out large-scale sequencing of cDNAs from each stage, and placed 798 of the novel genes on the mouse genetic map. The results support the inferences that (1) a significant fraction of the genome is dedicated to genes expressed specifically in early development, adding considerably to the nascent catalogue of mammalian genes; (2) genes coexpressed at the same stage tend to cluster in the genome; and (3) the expressed genes include cohorts acting in a stage-specific manner, which suggests a ‘hit-and-run cascade’ model for the developmental process.
MATERIALS AND METHODS
Mouse preimplantation embryo collection
Eggs and embryos were collected by standard methods (Hogan et al., 1994). C57BL/6J female mice were superovulated and mated with C57BL/6J males. Unfertilized eggs were collected without mating. Embryos at all other stages were collected after killing the pregnant mice at 0.5, 1.5, 2.5 and 3.5 days post coitum (d.p.c.), and were staged by visual inspection under a stereomicroscope. To avoid undesirable effects of in vitro culture, all embryos up to the blastocyst stage were collected directly by flushing the oviduct and uterus.
Construction of stage-specific cDNA libraries
Seven cDNA libraries, one from each of seven stages of preimplantation development, were constructed essentially as previously described (Takahashi and Ko, 1994), except that the normalization step and the mechanical shortening of cDNA inserts were omitted. In brief, for the unfertilized egg library, total RNA was extracted from 1528 unfertilized eggs, and double-stranded cDNA was synthesized from 2.7 μg of total RNA with a kit (Life Technologies, SuperScript II) and an oligo(dT)NotI primer (5′-pGACTAGTTCTAGATCGCGAGCGGCCGCCC15(T)-3′). The double-stranded cDNAs were treated with T4 DNA polymerase and purified by ethanol precipitation. The cDNAs were then ligated to the Lone-linker LL-Sal3 (LL-Sal3A: 5′-pGCTATTGACGTCGACTATCC-3′; LL-Sal3B: 5′-pGGATAGTCGACGTCAAT-3′), purified by phenol/chloroform extraction, and separated from free linkers with a Centricon 100. Next, the cDNAs were amplified by long-range high-fidelity PCR using Ex Taq polymerase (Takara) on a Perkin-Elmer GeneAmp PCR System 9600 under the following conditions: initial denaturation at 94°C for 20 seconds; 25 cycles of 94°C for 10 seconds and 68°C for 10 minutes (plus 20 seconds for each additional cycle); and a final extension at 72°C for 10 minutes. The amplified cDNAs were purified by phenol/chloroform extraction and Centricon 100, double-digested with SalI and NotI, purified again by phenol/chloroform extraction, and ethanol-precipitated. They were then size-selected on a size fractionation column (Life Technologies; fractions 8 to 10), ethanol-precipitated, and cloned into the SalI/NotI sites of the pSPORT1 plasmid vector. E. coli DH10B cells were transformed with the ligation mixture by chemical methods.
The other libraries were constructed essentially in the same manner. Double-stranded cDNA was synthesized from 5.4 μg of total RNA extracted from 1137 fertilized eggs (fertilized egg library), from 1.2 μg of total RNA extracted from 397 embryos (2-cell library), from 2.6 μg of total RNA extracted from 32 embryos (4-cell library), and from 4.3 μg of total RNA extracted from 230 embryos (8-cell library). For the 16-cell library, double-stranded cDNA was synthesized with an oligo(dT)GC primer (5′-pGACTAGTTCTAGATCGCGAGCGGCCGCGC15(T)-3′) from 2.1 μg of total RNA extracted from 42 embryos. For the blastocyst library, double-stranded cDNA was synthesized with an oligo(dT)-1 primer (5′-GAGAGAGACTAGTTCTAGATCGCGAGCGGCCGC18(T)-3′) from 1.5 μg of total RNA extracted from 40 embryos.
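The long-range PCR program given for the unfertilized egg library can be tabulated to see how long each step runs. This is a minimal sketch, assuming that the 20-second increment is added once per cycle from cycle 2 onward (a common auto-extension setting that the text does not spell out) and ignoring ramp times between temperatures; the constant and function names are hypothetical.

```python
# Hypothetical tabulation of the long-range PCR program described in the text.
INITIAL_DENATURE_S = 20        # 94 °C, 20 s
CYCLES = 25
DENATURE_S = 10                # 94 °C, 10 s in every cycle
BASE_EXTENSION_S = 10 * 60     # 68 °C, 10 min in cycle 1
AUTO_EXTENSION_S = 20          # assumed: +20 s for each additional cycle
FINAL_EXTENSION_S = 10 * 60    # 72 °C, 10 min


def extension_time(cycle: int) -> int:
    """Return the 68 °C hold in seconds for a 1-based cycle number."""
    if not 1 <= cycle <= CYCLES:
        raise ValueError(f"cycle must be between 1 and {CYCLES}")
    return BASE_EXTENSION_S + AUTO_EXTENSION_S * (cycle - 1)


def total_hold_time() -> int:
    """Sum of all programmed holds in seconds (ramp times are ignored)."""
    cycling = sum(DENATURE_S + extension_time(c) for c in range(1, CYCLES + 1))
    return INITIAL_DENATURE_S + cycling + FINAL_EXTENSION_S


if __name__ == "__main__":
    print(f"cycle 1 extension:  {extension_time(1)} s")             # 600 s
    print(f"cycle {CYCLES} extension: {extension_time(CYCLES)} s")  # 1080 s
    print(f"total hold time:    {total_hold_time()} s")             # 21870 s
```

Under this reading the last extension step lasts 18 minutes, and the programmed holds alone add up to just over 6 hours.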
Single-pass sequencing of cDNA clones
Single-pass cDNA sequencing was conducted as described (Ko et al., 1998). The 96-well microtiter plates were thawed, and cDNA clones were inoculated into 1 ml deep-well 96-well plates (Beckman). Plasmids were prepared from the cDNA clones with Qiagen’s 96-well REAL-prep system and resuspended in 50 μl TE buffer (pH 8.0); 5 μl of DNA was used for each cycle-sequencing reaction. The first 2000 blastocyst ESTs were sequenced using standard dye-primer chemistry (Perkin-Elmer-ABI). The ESTs from all other libraries, and the remaining 4000 blastocyst ESTs, were sequenced using ET dye-primer chemistry (Amersham). All sequencing reactions were performed on an ABI Prism 877 Integrated Thermal Cycler (Perkin-Elmer-ABI).
Sequence data analyses
Clustering of 3′-EST sequences was done using the Blast2 program (Altschul et al., 1990). The criteria for identifying the unique gene set will be described elsewhere (A. H. and H. D., in preparation). In brief, all the 3′-ESTs were searched against each other for sequence similarity. For each EST, the hits were sorted by score, and each hit’s score was compared with the highest score in the list, which is generated by the EST’s match to itself. Hits scoring more than 70% of this self-score were assigned to the same group as the query EST; all ESTs below this threshold were assigned to other gene sets.
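The 70% self-score rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ unpublished procedure: the pairwise scores are taken as given (in the study they came from Blast2), and groups are merged transitively with a union-find structure, which is our assumption about how overlapping groups were combined. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

THRESHOLD = 0.70  # fraction of the query's self-hit score, as stated in the text


def cluster_ests(scores: Dict[Tuple[str, str], float]) -> List[List[str]]:
    """Group ESTs whose hit score exceeds 70% of the query's self-score.

    scores[(query, subject)] holds a BLAST-like score and scores[(q, q)] is the
    query's self-hit. Groups are merged transitively (an assumption).
    """
    ests = sorted({q for q, _ in scores} | {s for _, s in scores})
    parent = {e: e for e in ests}

    def find(x: str) -> str:
        # Follow parent links to the group representative, halving paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (query, subject), score in scores.items():
        self_score = scores.get((query, query))
        if query != subject and self_score and score > THRESHOLD * self_score:
            parent[find(query)] = find(subject)  # same gene: merge the groups

    groups: Dict[str, List[str]] = {}
    for e in ests:
        groups.setdefault(find(e), []).append(e)
    return sorted(groups.values())
```

For example, with self-scores of 100 for "a" and 90 for "b", a hit of 85 between them (above both 70 and 63) puts them in one group, while a third EST hitting "a" with a score of 30 stays on its own. A transitive merge means that two ESTs can end up in one group without hitting each other directly; a stricter scheme would require all pairs to pass the threshold.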
Estimation of gene expression levels by EST frequency
The EST data set from each cDNA library was subjected to Blast2 analysis against the set of 9718 unique genes, and the frequency of EST appearance for each gene was tabulated. For a library of 3000 ESTs, the sample proportions and their 95% confidence intervals are as follows: 0 EST matches, sample proportion 0 (0, 0.0012); 1 EST match, sample proportion 0.0003 (0, 0.0018); 2 EST matches, sample proportion 0.0007 (0.0001, 0.0023); 9 EST matches, sample proportion 0.0030 (0.0013, 0.0055). Therefore, differences among 0, 1 and 2 matches are not statistically significant. According to Fisher’s exact test, 7 EST matches in one library are required for a statistically significant difference from 1 EST match in another library (one-sided; P=0.035). Similar results were obtained with the formula developed to test the significance of differential gene expression in EST/SAGE projects (Audic and Claverie, 1997; Claverie, 1999). Applied to libraries of 3000 ESTs each, this formula indicates differential expression with a probability greater than 0.96 and less than 0.97 for the following combinations of EST matches (one library versus another): 5 versus 0, 7 versus 1, 9 versus 2, 11 versus 3, 13 versus 4, and so on.
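The statistics quoted above can be reproduced with standard-library Python. This is a sketch under stated assumptions: the text does not name its confidence-interval method, so exact (Clopper-Pearson) binomial bounds are assumed here. They give (0, 0.0012) for zero matches, but other rows may differ slightly from the published values. The Audic-Claverie probability p(y|x) is implemented from its published closed form, with equal library sizes of 3000; all function names are hypothetical.

```python
import math
from typing import Tuple

N = 3000  # ESTs per library, the library size used in the text


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo: float = 0.0, hi: float = 1.0, steps: int = 100) -> float:
    """Root of a function that increases on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int = N, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided binomial interval for the proportion x/n (assumed method)."""
    lower = 0.0 if x == 0 else _bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else _bisect(lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


def fisher_one_sided(a: int, b: int, n1: int = N, n2: int = N) -> float:
    """One-sided Fisher exact P of seeing >= a of the a + b matches in library 1."""
    total = a + b
    tail = sum(math.comb(n1, k) * math.comb(n2, total - k)
               for k in range(a, min(total, n1) + 1))
    return tail / math.comb(n1 + n2, total)


def audic_claverie(y: int, x: int, n1: int = N, n2: int = N) -> float:
    """p(y | x): probability of y counts in library 2 given x in library 1."""
    r = n2 / n1
    return (r**y * math.factorial(x + y)
            / (math.factorial(x) * math.factorial(y) * (1 + r) ** (x + y + 1)))


def differential_confidence(x: int, y: int, n1: int = N, n2: int = N) -> float:
    """P(Y < y | x): confidence that library 2 (y counts) exceeds library 1 (x)."""
    return sum(audic_claverie(k, x, n1, n2) for k in range(y))


if __name__ == "__main__":
    print(clopper_pearson(0))                # upper bound is about 0.00123
    print(round(fisher_one_sided(7, 1), 3))  # 0.035
    for high, low in [(5, 0), (7, 1), (9, 2)]:
        print(high, low, differential_confidence(low, high))
```

With equal library sizes the Audic-Claverie confidences are exact fractions: 31/32 for 5 versus 0, 247/256 for 7 versus 1 and 1981/2048 for 9 versus 2, all between 0.96 and 0.97. The Fisher test gives P ≈ 0.035 for 7 versus 1 and P ≈ 0.06 for 6 versus 1, which is why 7 matches is the threshold.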
RT-PCR analyses
For each stage, 10 embryos were collected under the microscope and stored in 10 μl BGJb medium (Life Technologies), then lysed directly in 0.05% NP40. Samples were serially diluted in fivefold steps and subjected to RT-PCR. Reverse transcription and PCR amplification were performed using the EZ rTth RNA PCR kit (Perkin-Elmer) in 50 μl reaction mixtures containing 25 mM manganese acetate, 0.2 units rTth DNA polymerase, 10 ng/μl primers (for the Alpha03732 gene: 5′-GTTCCAGGAGACTAAGTTTCCGTG-3′ and 5′-AGGCTGTCCATCAGAAAGTTGCT-3′; for the gamma-actin gene: 5′-TTCCTGCGCAGATCGCAA-3′ and 5′-GTGACAATGCCGTGTTCGATAGG-3′), 10 mM dNTPs, 250 mM Bicine (pH 8.2), 575 mM potassium acetate and 10 μl RNA. Reactions were incubated at 60°C for 30 minutes for reverse transcription and at 94°C for 1 minute for preheating, followed by 40 cycles of PCR at 94°C for 15 seconds and 58°C for 30 seconds and a final extension at 58°C for 7 minutes. The PCR products were electrophoresed on a 3% agarose gel, and the gel was stained with SYBR Green and scanned with a STORM PhosphorImager (Molecular Dynamics).
Genetic mapping of new ESTs
New ESTs were mapped on the mouse genetic map using The Jackson Laboratory BSS Interspecific Backcross Panel (Rowe et al., 1994). PCR primer pairs were designed within approximately 350 bp of the 3′ end of each cDNA sequence to increase the chance of detecting sequence polymorphisms between C57BL/6J and M. spretus (Takahashi and Ko, 1993). Primers were designed in batches, semi-automatically, on a Sun workstation (UNIX platform). The UNIX version of the PRIMER program developed by the WI/MIT Mouse Genome Center (http://www-genome.wi.mit.edu) was used as the core engine of our primer design program; the front end and the back end were written by our group. A total of 4500 primer pairs were developed during the course of this work; they are available from Research Genetics (http://www.resgen.com/). To test for sequence polymorphisms, genomic DNA from C57BL/6J, from M. spretus, and an equimolar mixture of the two was amplified with each primer pair. The PCR products were separated on a customized electrophoresis system using 10% non-denaturing polyacrylamide gels run for 1 hour at 250 V. The gels were stained with ethidium bromide and photographed on a UV transilluminator. Only primer pairs that produced heteroduplex bands were used in the mapping study; approximately 1000 primer pairs fell into this category.
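The idea of designing primers only within the 3′-terminal ~350 bp can be illustrated with a toy picker. This is a hypothetical sketch, not the WI/MIT PRIMER program or the authors’ front end: it uses the rough Wallace rule for melting temperature, and the primer length, Tm window and minimum product size are invented parameters. Real primer design software also checks self-complementarity, 3′ stability and many other criteria that are omitted here.

```python
from typing import Optional, Tuple

WINDOW_BP = 350          # 3'-terminal region used for design (from the text)
PRIMER_LEN = 20          # hypothetical fixed primer length
TM_MIN, TM_MAX = 58, 62  # hypothetical Wallace-rule Tm window (deg C)
MIN_PRODUCT = 100        # hypothetical minimum amplicon length (bp)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def wallace_tm(oligo: str) -> int:
    """Wallace rule: 2 deg C per A/T plus 4 deg C per G/C (short oligos only)."""
    gc = oligo.count("G") + oligo.count("C")
    return 2 * (len(oligo) - gc) + 4 * gc


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_pair(est: str) -> Optional[Tuple[str, str, int]]:
    """Pick the outermost acceptable primer pair inside the 3'-terminal window.

    Returns (forward, reverse, product_length), or None if no pair qualifies.
    """
    region = est.upper()[-WINDOW_BP:]
    hits = [i for i in range(len(region) - PRIMER_LEN + 1)
            if TM_MIN <= wallace_tm(region[i:i + PRIMER_LEN]) <= TM_MAX]
    if not hits:
        return None
    fwd, rev = hits[0], hits[-1]  # leftmost forward site, rightmost reverse site
    product = rev + PRIMER_LEN - fwd
    if product < MIN_PRODUCT:
        return None
    return region[fwd:fwd + PRIMER_LEN], revcomp(region[rev:rev + PRIMER_LEN]), product
```

Restricting the window to the 3′ end follows the rationale in the text: polymorphisms between C57BL/6J and M. spretus were expected to be more frequent there. The heteroduplex screen that followed is an experimental step and is not modelled.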
PCR reactions were assembled with a Biomek 1000 robotic workstation (Beckman). Genotypes of The Jackson Laboratory BSS Interspecific Backcross DNAs (94 N2 animals plus the C57BL/6JEi and SPRET/Ei parental DNAs) were scored by visual inspection and analyzed with the Map Manager computer program (Manly, 1993).
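In a backcross, the genetic distance between two markers is estimated from the fraction of N2 animals in which the two markers show discordant genotypes. Map Manager performed the actual analysis in this study. The sketch below only illustrates the underlying count, and its genotype codes ('H', 'P', '-') are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def recombination(m1: Sequence[str], m2: Sequence[str]) -> Tuple[int, int, float, float]:
    """Recombinants between two markers typed on the same backcross (N2) animals.

    Each call is one of two genotype codes (hypothetical: 'H' heterozygous,
    'P' homozygous for the recurrent parent's allele); '-' marks a missing call.
    An animal is recombinant when its two markers carry different codes.
    Returns (recombinants, informative animals, fraction r, standard error of r).
    """
    pairs = [(a, b) for a, b in zip(m1, m2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no animal was typed for both markers")
    k = sum(a != b for a, b in pairs)
    n = len(pairs)
    r = k / n
    return k, n, r, sqrt(r * (1 - r) / n)  # binomial standard error of r
```

With 94 N2 animals, a single recombinant corresponds to about 1.1 cM (100 × 1/94), which sets the resolution of the panel. Map Manager also orders loci and may apply a mapping function; this sketch does neither.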
Data and cDNA clones access
All cDNA clones reported in this paper are available from the American Type Culture Collection (ATCC: http://www.atcc.org/) or the RIKEN DNA Bank (http://www.rtc.riken.go.jp/DNA/HTML/engsearch.html and http://www.rtc.riken.go.jp/DNA/mouse_info.html). cDNA sequence information is available through the Entrez and BLAST servers at NCBI, NIH (http://www.ncbi.nlm.nih.gov/). Detailed map locations of genes are accessible through The Jackson Laboratory Backcross DNA Mapping Resource [http://www.jax.org/resources/documents/cmdata/bkmap/BSS.html]. Information about gene clustering and expression profiles is available at the ERATO Doi Project home page (http://www.bioa.jst.go.jp/pge/). PCR primer pairs are available from Research Genetics (http://www.resgen.com/). Finally, detailed information about cDNA clones, sequences and PCR primer pairs, as well as library-specific BLAST searches, is available at the Laboratory of Genetics home page (http://lgsun.grc.nia.nih.gov).
RESULTS
Construction of cDNA libraries and characterization
cDNA libraries were constructed from each of seven mouse preimplantation stages (Fig. 1). The cDNAs were directionally cloned, with an average insert size of about 1.5 kb. cDNA clones from each library were arrayed in 96-well microtiter plates, and about 400 bp was sequenced from the 3′ end of each clone. All 25,438 expressed sequence tags (ESTs) obtained were deposited in the public sequence database and have been available to the scientific community since the summer of 1997 [GenBank accession numbers: C75935-C81630; C85044-C88357, AU014577-AU024803, AU040095-AU046300].
By comparing the EST sequences to a repetitive sequence database, 2279 ESTs containing repeat sequences were identified (Beta clusters in Table 1). ESTs with low-complexity sequence were also identified and excluded from further analyses (Gamma clusters in Table 1). The remaining 23,155 ESTs (Alpha clusters in Table 1) were condensed into a set of 9718 unique genes by sequence similarity searches against one another. Similar genes were sought by BlastN searches of the non-redundant (nr) public sequence database. Only 10% of the genes (955) showed close matches and were identified as known (named) genes (e.g. Table 2; for more complete information, refer to http://www.bioa.jst.go.jp/pge/). Furthermore, when similar ESTs were sought by BlastN searches of NCBI’s public EST database (dbEST; Boguski et al., 1993), only 55% of the genes (5300) showed close matches to at least one EST from human, mouse, or rat. Considering the large number of ESTs in dbEST (Adams et al., 1991; Hillier et al., 1996; Hwang et al., 1997; Marra et al., 1998, 1999; Okubo et al., 1992), more than 1×10⁶ for human, 4×10⁵ for mouse and 2×10⁵ for rat, the rate of new gene discovery in these cDNA libraries is very high. This supports the notion that many genes expressed in mammalian preimplantation stages have not been isolated from other sources.
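The rounded percentages above can be checked directly from the quoted counts. In this short sketch, the number of Gamma (low-complexity) ESTs is not stated in the text. It is derived by subtraction, assuming that the Alpha, Beta and Gamma classes partition all 25,438 ESTs; the variable names are hypothetical.

```python
# Counts quoted in the text. GAMMA is derived by subtraction, assuming the
# Alpha, Beta and Gamma classes partition all ESTs.
TOTAL_ESTS = 25_438
BETA_REPEAT = 2_279
ALPHA = 23_155
UNIQUE_GENES = 9_718
NAMED_GENES = 955
DBEST_MATCHES = 5_300

GAMMA_LOW_COMPLEXITY = TOTAL_ESTS - BETA_REPEAT - ALPHA
named_fraction = NAMED_GENES / UNIQUE_GENES
dbest_fraction = DBEST_MATCHES / UNIQUE_GENES
ests_per_gene = ALPHA / UNIQUE_GENES

if __name__ == "__main__":
    print(f"Gamma ESTs (derived): {GAMMA_LOW_COMPLEXITY}")  # 4
    print(f"named genes:          {named_fraction:.1%}")    # 9.8%
    print(f"dbEST matches:        {dbest_fraction:.1%}")    # 54.5%
    print(f"no dbEST match:       {1 - dbest_fraction:.1%}")  # 45.5%
    print(f"Alpha ESTs per gene:  {ests_per_gene:.2f}")     # 2.38
```

The roughly 45% of genes with no dbEST match is consistent with the statement that about half of the unique genes are novel.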
Global changes of gene expression patterns during preimplantation development
For each unique gene, the number of reads from each cDNA library was summed. Table 2 shows an example of this summation for ‘named genes’; complete results are available at http://www.bioa.jst.go.jp/pge/. Because the frequency of ESTs in a particular cDNA library corresponds roughly to the expression level of the gene (Okubo et al., 1992), the data compiled here provide a first approximation of gene expression levels at each stage.
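The per-gene summation amounts to a two-way count of (gene, library) assignments. This sketch assumes that each EST has already been assigned to one of the 9718 unique genes by the Blast2 search. The rescaling to a nominal 3000-EST library is our assumption, added because library sizes differ (for example, about 6000 blastocyst ESTs were sequenced). The stage labels and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# Library labels for the seven stages; the label strings are hypothetical.
STAGES = ("unfertilized", "fertilized", "2-cell", "4-cell",
          "8-cell", "16-cell", "blastocyst")


def tabulate(hits: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, int]]:
    """Sum EST reads per gene and library from (gene_id, stage) assignments."""
    counts = Counter(hits)
    table: Dict[str, Dict[str, int]] = {}
    for (gene, stage), n in counts.items():
        if stage not in STAGES:
            raise ValueError(f"unknown library: {stage}")
        row = table.setdefault(gene, {s: 0 for s in STAGES})  # zero-filled row
        row[stage] = n
    return table


def per_3000(table: Mapping[str, Mapping[str, int]],
             library_sizes: Mapping[str, int]) -> Dict[str, Dict[str, float]]:
    """Rescale counts to a nominal 3000-EST library so stages are comparable."""
    return {gene: {s: 3000 * n / library_sizes[s] for s, n in row.items()}
            for gene, row in table.items()}
```

For instance, two 2-cell reads and one blastocyst read for two genes give a table with zeros at every other stage; if the 2-cell library held 6000 ESTs, the two reads rescale to 1.0 per 3000.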
To assess changes in gene expression, the 9718 unique genes were grouped into four main patterns based on EST frequency at each stage (Fig. 2). Because a single EST among the approximately 3000 obtained from each developmental stage may be present by chance, particularly for genes with a very low level of expression, the initial analyses focused mainly on genes with relatively abundant expression, i.e. genes represented by more than two independent clones in a cDNA library. Although this criterion is still statistically weak for individual genes (see Materials and Methods), the groupings provide an indication of global changes.
The majority of genes (75%) showed low expression throughout preimplantation development (Group A). A very small fraction (0.11%) showed constitutive expression throughout preimplantation development (Group B). Some genes (3.25%) showed complex expression patterns (Group C) that may reflect up-and-down regulation but are probably also affected by sampling statistics. The remaining genes (22%) showed single-peak expression (Group D). This group consists of genes undergoing (1) gradual degradation of maternally stored mRNAs (3.44%, Groups D1-D4) and (2) constitutive expression once the gene is activated at a certain stage (2.14%, Groups D9, D14, D19, D22, D24 and D25). It also includes an unexpectedly large number of genes (17.2%) that show apparently stage-specific expression.
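The four-way grouping can be approximated in code. This is a hypothetical reading: the text gives only the ‘more than two clones’ threshold and the group descriptions, not the exact rules for subgroups D1-D25. The sketch therefore assigns a gene to Group A if no stage exceeds the threshold, Group B if every stage does, Group D if the stages above the threshold form one contiguous run (a single peak), and Group C otherwise.

```python
from typing import Sequence

THRESHOLD = 2   # "more than two independent clones" (from the text)
N_STAGES = 7    # unfertilized egg through blastocyst, in developmental order


def classify(counts: Sequence[int]) -> str:
    """Assign a gene's per-stage EST counts to Group A, B, C or D (hypothetical rules)."""
    if len(counts) != N_STAGES:
        raise ValueError(f"expected {N_STAGES} stage counts")
    above = [c > THRESHOLD for c in counts]
    if not any(above):
        return "A"  # low expression throughout
    if all(above):
        return "B"  # constitutive expression
    first = above.index(True)
    last = N_STAGES - 1 - above[::-1].index(True)
    # One unbroken run of expressed stages counts as a single peak.
    return "D" if all(above[first:last + 1]) else "C"
```

Under these rules, a gene expressed only at the 2-cell stage, or expressed from the unfertilized egg through the 4-cell stage and then lost, falls into Group D, whereas a gene that is up at the unfertilized egg stage, down at the fertilized egg stage and up again at the 2-cell stage falls into Group C.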
Stage-specific expressed genes
Expression patterns with a sharp peak at only one stage were observed for each stage. For successive stages, fertilized egg-specific genes (Group D5) comprised 2.50% of all genes; 2-cell-specific genes (Group D10), 2.90%; 4-cell-specific genes (Group D15), 2.55%; 8-cell-specific genes (Group D20), 3.57%; morula-specific genes (Group D23), 3.98%; and blastocyst-specific genes (Group D25), 1.66%. Table 3 shows examples of such stage-specific genes, selected for expression levels that are statistically significant and comparable to those of classical constitutively expressed genes such as actin.
Of course, EST frequencies in each cDNA library correlate only roughly with expression levels, and two possible artifacts could cause these results to deviate from the actual expression levels: (1) distortion of relative abundance during PCR amplification of cDNA mixtures, and (2) statistical sampling variation. Two lines of evidence, however, argue that these artifacts do not substantially affect our results. First, the expression patterns of known genes that have previously been studied independently are consistent with those reported here. For example, the EST frequency of S-adenosylhomocysteine hydrolase increases significantly at the 16-cell stage and decreases again at the blastocyst stage (Table 3). This gene has been identified as the causative gene for the mouse lethal nonagouti (aˣ) mutation, and our results are consistent with its reported expression in preimplantation stages (Miller et al., 1994). Another example is connexin31, a gap-junction protein, whose blastocyst-specific expression based on EST analysis (Table 3) is consistent with a previous report (Reuss et al., 1997). High expression of radixin at the morula stage is also consistent with its function as a cell adhesion molecule and its reported expression pattern (Funayama et al., 1991), and the blastocyst-specific appearance of placental lactogen II (Jackson et al., 1986) is consistent with the placenta-specific expression of this gene.
Second, reverse transcriptase (RT)-PCR analyses of staged embryos confirmed the expression patterns of selected genes, e.g. the gene Alpha03732 (Fig. 3A). Low levels of transcripts were detected in unfertilized and fertilized eggs, but transcripts were most abundant at the 2-cell stage, confirming the 2-cell-stage expression indicated by EST frequency. Gamma-actin, used as a control, showed comparable transcript levels at all stages (Table 3, Fig. 3B).
Genetic mapping of novel ESTs
Primer pairs were designed and synthesized for 4500 ESTs, selected from the large fraction of new ESTs that did not match ‘named genes’ in the public sequence database. The PCR products of all primer pairs were tested for sequence polymorphisms between C57BL/6J and M. spretus by a heteroduplex assay (Ko et al., 1998). About 800 primer pairs detected sequence polymorphisms, and these were genotyped on The Jackson Laboratory BSS Interspecific Backcross Panel. The map locations of these ESTs, along with those of known genes (Ko et al., 1994) and ESTs (Ko et al., 1998) that we previously mapped on the same panel, are shown in Fig. 4. Fig. 4 also indicates the expression pattern of each gene, assessed by counting the representation of its ESTs at each stage of development. More detailed map information, including the raw data and the positions of these markers relative to other markers in this mapping cross, is available at http://www.jax.org/resources/documents/cmdata/bkmap/BSS.html. Interestingly, some genes with similar developmental expression patterns appear to cluster on the mouse genetic map. For example, there are clusters of blastocyst-specific genes on chromosomes 1 and 8, and of unfertilized egg-specific genes on chromosomes 7 and 16 (Fig. 4).
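One simple way to look for such clusters is to order the mapped genes along a chromosome and report runs of neighbours that share an expression pattern. The sketch below is hypothetical: the marker names, positions and pattern labels in the usage example are invented. A run of neighbours is only descriptive; judging whether a cluster exceeds chance would need, for example, a permutation test, which is not shown.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

Marker = Tuple[str, float, str]  # (name, position in cM, expression pattern)


def coexpression_runs(markers: Sequence[Marker],
                      min_len: int = 2) -> List[Tuple[str, List[str]]]:
    """Return runs of adjacent markers on one chromosome that share a pattern."""
    ordered = sorted(markers, key=lambda m: m[1])  # order by map position
    runs = []
    for pattern, group in groupby(ordered, key=lambda m: m[2]):
        names = [name for name, _, _ in group]
        if len(names) >= min_len:
            runs.append((pattern, names))
    return runs


if __name__ == "__main__":
    chr_markers = [("est1", 10.0, "blastocyst"), ("est2", 12.5, "blastocyst"),
                   ("est3", 20.0, "2-cell"), ("est4", 30.0, "blastocyst")]
    print(coexpression_runs(chr_markers))  # [('blastocyst', ['est1', 'est2'])]
```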
DISCUSSION
Large-scale cDNA sequencing projects have produced more than 1 million ESTs (Adams et al., 1991; Hillier et al., 1996; Hwang et al., 1997; Marra et al., 1998, 1999; Okubo et al., 1992). However, most cDNA libraries have been derived from adult organs and tissues. Because many genes are expressed only at limited times and places, and often at low levels, the gene catalogue remains incomplete. In particular, limited information has been obtained about transcripts in the preimplantation stages of mammalian development (Adjaye et al., 1997, 1998; Rothstein et al., 1992, 1993; Sasaki et al., 1998). Optimization of a PCR-based cDNA library construction method (Takahashi and Ko, 1994) provided the seven cDNA libraries used here, and although the libraries were not normalized, the rate of new gene discovery was high (9718 unique genes from a total of 25,438 ESTs). This very likely reflects the high complexity of mRNA species in preimplantation embryos. Furthermore, about 50% of the 9718 unique genes were seen for the first time in this study, presumably because preimplantation mammalian embryonic stages have not been used extensively in other EST projects.
A discussion of the genes showing interesting expression patterns is largely outside the scope of this paper, but a few examples illustrate the usefulness of the data. For example, ERCC3, a subunit of TFIIH that has helicase activity and is involved in nucleotide excision repair, including repair of transcribed DNA (Weeda et al., 1990), is particularly abundant at the 2-cell stage (Table 2). This coincides with the initiation of zygotic transcription, when many nuclear genes start to be expressed. Another example is the BMAL1 gene, which was recently identified as the heterodimeric partner of CLOCK and plays a critical role in the mammalian circadian rhythm (Darlington et al., 1998). Its transcripts were present in unfertilized eggs and at the 2-cell and 16-cell stages (Table 2, http://www.bioa.jst.go.jp/pge/). Although further analyses are required, this pattern suggests that the gene is expressed intermittently, skipping some developmental stages. This finding raises the possibility that the timing of cell division in cleavage-stage mouse embryos is controlled by the same pathway as the circadian rhythm in the adult mouse.
The newly mapped genes will provide a valuable resource for positional cloning of mouse genes. Given the extensive conservation of gene organization between mouse and human, the mouse data should also help the positional cloning of human genes. In addition, many of these genes are apparently unique to early mammalian embryos, so the gene-mapping efforts presented here complement the ongoing large-scale EST mapping projects in human and mouse. Finally, approximately 2500 of the EST PCR primer pairs described here are now being mapped on the T31 radiation hybrid panel at the MRC UK Mouse Genome Center (Paul Denny and Steve Brown, personal communication, http://www.mgc.har.mrc.ac.uk/est_maps.htm). The 798 genes mapped here on The Jackson Laboratory Backcross Panel will help to anchor the genetic and radiation hybrid maps.
The map also provides a genome-wide view of the distribution of genes, together with information on their expression levels at each stage (Fig. 4). One significant feature of the map is that genes with similar expression patterns appear to cluster in the mouse genome. This supports the previous suggestion that mammalian genomes have evolved such that coexpressed genes often tend to cluster (Ko et al., 1998). Physical proximity may provide the embryo with an efficient means of coordinately regulating the expression of many genes.
Two general methods for global expression analysis, high-resolution two-dimensional protein gel analysis and mRNA differential display, have been applied to preimplantation mouse development. Comprehensive two-dimensional gel analyses of the 1-, 2- and 4-cell stages have identified about 1500 protein spots, some 38 of which show a transient, 2-cell stage-specific increase (Latham et al., 1991). Another study examined 1674 protein spots in compacted 8-cell and blastocyst-stage mouse embryos and identified 43 spots present only at the 8-cell stage and 75 present only at the blastocyst stage (Shi et al., 1994). Because two-dimensional protein gels reflect protein biosynthesis and/or the modification of pre-existing proteins (Oh et al., 1999), the overall pattern changes could reflect differential regulation at the translational or post-translational level. Another inherent drawback is the difficulty of isolating the gene that corresponds to each protein spot; in fact, only a few protein spots have been identified at the gene level (Schultz, 1999). mRNA differential display, in contrast, has been used successfully to identify some 2-cell, 8-cell and blastocyst-specific cDNA fragments (Zimmermann and Schultz, 1994). However, only one new gene (eIF-4C) has been identified as 2-cell stage-specific by this method (Davis et al., 1996; Schultz, 1999). Differential display has inherent difficulties for gene identification, because the isolated cDNA fragments are usually too short to provide a clear identity and too fragmentary for recovering full-length cDNA clones. In this respect, the subtracted cDNA library method has been the most successful at isolating stage-specifically expressed genes (Oh et al., 1999), although it provides little information about global changes in gene expression patterns. Four new genes that are present in 1-cell and 2-cell embryos but absent from 8-cell embryos have been identified in this manner (Oh et al., 1999). Considering these facts, the EST approach may have advantages over the other methods, because (1) cDNA clones are readily available for detailed analysis of the genes; (2) the expression levels of genes are monitored as the relative abundance of mRNAs; and (3) it provides information about global changes in gene expression patterns.
The EST data sets from each stage-specific cDNA library provide the first gene-based index of overall gene expression patterns during preimplantation development. For early cleavage-stage embryos, the data presented here confirm earlier analyses based on smaller numbers of genes (reviewed in Kidder, 1992; Schultz and Heyner, 1992; Watson et al., 1992). For example, the results support the previous inference that most maternally stored mRNAs in unfertilized eggs are degraded by the 2-cell stage (Piko and Clegg, 1982). The genes identified here include many known and unknown genes that may contribute to a more detailed understanding of processes such as RNA degradation. Approximately 50% of all the genes in our collection initiated expression at the 1-cell and 2-cell stages, which is also consistent with previous studies (Piko and Clegg, 1982). Because most studies of zygotic gene activation have analyzed the transcriptional activity of exogenously injected genes (reviewed in Kaneko and DePamphilis, 1998), the endogenous genes found here provide an additional means of analyzing zygotic gene activation. An obvious extension of these studies will be to ask whether specific cohorts of genes are activated earlier than others.
In contrast, the data presented here suggest that the paradigm for late cleavage-stage embryos needs to be revised. The existing paradigm is that, although the onset of expression varies among individual genes, almost all of them continue to be expressed once expression starts (Kidder and McLachlin, 1985; Levy et al., 1986; Kidder, 1992; Schultz and Heyner, 1992; Watson et al., 1992). Such findings have led to the view that ‘the transcription of most genes in preimplantation development is not temporally linked with the morphogenetic transitions they participate in’ (Kidder, 1992). However, the present finding of many apparently stage-specifically expressed genes challenges this conventional view of the regulation of gene expression during early mouse development. Preimplantation development usually takes 4 to 6 days, so each cell cycle is relatively long, leaving time for new and unusual gene expression patterns to arise from the selective degradation and activation of many genes.
The existence of many stage-specifically expressed genes has two primary implications. First, development may involve significant changes in gene expression during the transition from 2 cells to 8 cells, even though this process appears to be simply a series of cell divisions. Second, stage-specifically expressed genes may actively promote the advancement of embryos from one stage to the next. Given the rapid and selective turnover of these particular mRNAs, there is likely to be selection for their rapid degradation. The requirement for the function of each such gene is apparently transient, which suggests a ‘hit-and-run’ type of mechanism for the expression of gene cascades. It will be of interest to see, for example, whether specific inactivation of stage-specific genes causes developmental arrest during oocyte formation or cleavage.
Acknowledgments
We would like to thank Drs David Schlessinger, Dan Longo, and Ramaiah Nagaraja for critical reading of the manuscript, Ms Shoshana Stern for assisting with the artwork, and Ms Mary Barter for assisting with the genetic map figure. This work was supported by the Science and Technology Agency of Japan through the ERATO/JST program, and received additional support from an NICHD/NIH grant (R01HD32243) and the NIA/NIH intramural research program.