Platyhelminthes are excellent models for the study of stem cell biology, regeneration and the regulation of scale and proportion. In addition, parasitic forms infect millions of people worldwide. Therefore, it is puzzling that they remain relatively unexplored at the molecular level. We present the characterization of ∼3000 non-redundant cDNAs from a clonal line of the planarian Schmidtea mediterranea. The obtained cDNA sequences, homology comparisons and high-throughput whole-mount in situ hybridization data form part of the S. mediterranea database (SmedDb; http://planaria.neuro.utah.edu). Sixty-nine percent of the cDNAs analyzed share similarities with sequences deposited in GenBank and dbEST. The remaining gene transcripts failed to match sequences in other organisms, even though a large number of these (∼80%) contained putative open reading frames. Taken together, the molecular resources presented in this study, along with the ability to abrogate gene expression in planarians using RNA interference technology, pave the way for a systematic study of the remarkable biological properties displayed by Platyhelminthes.
INTRODUCTION
The phylum Platyhelminthes (flatworms) consists of approximately 50,000 different species that populate a remarkable variety of niches (Littlewood and Bray, 2001). In addition to free-living forms, it encompasses parasitic organisms responsible for inflicting debilitating diseases upon hundreds of millions of people throughout the world (see World Health Organization fact sheet 115 at http://www.who.int/inf-fs/en/fact115.html). Platyhelminthes are considered by many to occupy an important position in the evolution of the Metazoa (Adoutte et al., 1999; Henry et al., 2000; Tyler, 2001; Willmer, 1994), and the panoply of developmental properties displayed by these organisms has attracted the attention of generations of biologists (Newmark and Sánchez Alvarado, 2002). For example, the ability of freshwater planarians to regenerate completely from small body fragments has been known for over two centuries (Morgan, 1898; Randolph, 1897), and the life cycles of some digenetic trematodes involve as many as three different hosts as well as both sexual and asexual strategies for their reproduction (Brusca and Brusca, 1990; Hyman, 1951). Yet, as important, abundant and diverse as platyhelminthes are, little is known about the molecular events that guide their sophisticated and often plastic biological properties.
Moreover, many members of this phylum possess large populations of undifferentiated mesenchymal stem cells, the study of which could contribute significantly to fundamental biomedical research in the areas of tissue regeneration, stem cell maintenance and degenerative disorders. In most free-living species these stem cells, which are often referred to as neoblasts, are used for the regeneration of missing body parts and/or the replacement of cells that are lost during the course of physiological turnover (Gschwentner et al., 2001; Ladurner et al., 2000; Newmark and Sánchez Alvarado, 2000). Similarly, free mesenchymal cells in parasitic flukes are known to produce complete larval forms (Brusca and Brusca, 1990; Hyman, 1951), and in the cestode Taenia crassiceps complete cysts can be reconstituted from individual cells (Toledo et al., 1997). Thus, platyhelminthes also provide a unique opportunity for studying the mechanisms that underlie the control of cellular pluripotentiality.
To address many of these unsolved problems, we and others (Agata and Watanabe, 1999) have chosen to reintroduce the freshwater planarian as an experimental model. We report the establishment of a clonal line of a diploid, asexual form of the planarian Schmidtea mediterranea (Turbellaria, Tricladida), along with the isolation and sequence characterization of ∼3000 non-redundant, expressed sequence tags (ESTs) from this organism. Furthermore, we show the suitability of using planarians for high-throughput mapping of gene expression patterns in the whole animal, and introduce the S. mediterranea Database (SmedDb) in which the primary data, computational analyses and expression data reside (http://planaria.neuro.utah.edu). RNA interference (Sánchez Alvarado and Newmark, 1999) and the ability to label the S. mediterranea neoblasts specifically (Newmark and Sánchez Alvarado, 2000) will permit the identification and characterization of genes involved in regenerative processes, ranging from the control of stem cell proliferation and differentiation to the regulation of polarity, growth, scale and proportion.
MATERIALS AND METHODS
Planarian culture
Clonal lines of the diploid, asexual strain of S. mediterranea from Barcelona, Spain (Benazzi et al., 1972) were generated in the laboratory by allowing individuals to undergo numerous fission cycles, feeding the regenerated fission progeny three to five times each week, and then amputating the fission progeny into multiple pieces when they had reached full size. The animals were fed on homogenized baby beef liver paste and maintained as previously described (Newmark and Sánchez Alvarado, 2000).
cDNA library preparation
Heads and 2-3 day regeneration blastemas were isolated from individuals of asexual clonal line CIW4. Amputated tissue was immediately frozen in liquid nitrogen and stored at -80°C until use. Total RNA was isolated using TriZol reagent (BRL/Life Technologies); poly(A)⁺ RNA was prepared using oligo d(T) cellulose (BRL/Life Technologies). Standard procedures were used to synthesize and size-select the oligo d(T) primed cDNAs; the resulting cDNAs were directionally cloned into the EcoRI and XhoI restriction sites of pBluescript II SK (+) and electroporated into DH10B cells. Unamplified cDNA libraries were replica plated on nitrocellulose filters and grown on LB/carbenicillin plates. One set of replicate filters was used for hybridization to identify abundant clones; these were excluded from subsequent rounds of analysis. The second set of filters served as master filters for recovery of non-redundant clones; these filters were stored on LB/glycerol plates at -80°C. Non-redundant clones were picked and grown overnight at 37°C in Magnificent Broth (MacConnell Research) with 100 μg/ml carbenicillin. Plasmid isolations were performed using a MiniPrep24 machine (MacConnell Research).
Sequence analysis, bioinformatics and the S. mediterranea EST database (SmedDb)
Sequencing reactions were performed using Big Dye Terminator chemistry and the resulting products were run on an ABI Prism 377 DNA sequencer. The sequencing strategy is outlined in Fig. 1A. The sequences obtained were compared against one another using stand-alone BLAST (Altschul et al., 1990) to measure internal redundancy and to identify unique clones. Statistical analysis of the frequency distribution of unique sequences indicates that the non-redundant clones identified represent a significant proportion of the complexity of the libraries (50-55%; see http://planaria.neuro.utah.edu for details). To allow management of the data and access to it through an internet browser, unique sequences were deposited in a server database running Cold Fusion 4.5 (Allaire) and batch analyzed at GenBank for homology comparisons using BLASTcl3 running either nucleotide-nucleotide (BLASTn) or translated (BLASTx) searches. In addition, dbEST was queried using BLASTn, BLASTx and tBLASTx. Given the number of new sequences being continuously deposited into GenBank, SmedDb has been programmed to update the BLAST results for all planarian ESTs on command and/or automatically once a week.
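The redundancy check described above amounts to grouping clones whose sequences hit one another in an all-against-all BLAST comparison. The following is a minimal sketch of such a grouping step, assuming the pairwise hits have already been parsed into (query, subject, E-value) tuples. The function name, the tuple layout and the E-value cutoff of 1e-20 are illustrative assumptions, not the parameters used to build SmedDb.

```python
from collections import defaultdict


def cluster_clones(clone_ids, hits, max_evalue=1e-20):
    """Group clones into clusters of mutually similar sequences.

    clone_ids  : iterable of clone names (every name in `hits` must be listed)
    hits       : iterable of (query, subject, evalue) tuples from an
                 all-against-all BLAST run (hypothetical, pre-parsed input)
    max_evalue : assumed cutoff at or below which two clones count as redundant
    """
    parent = {cid: cid for cid in clone_ids}

    def find(x):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if query != subject and evalue <= max_evalue:
            # Merge the two clusters (single-linkage grouping).
            parent[find(query)] = find(subject)

    clusters = defaultdict(list)
    for cid in parent:
        clusters[find(cid)].append(cid)
    return sorted(sorted(members) for members in clusters.values())


# Made-up clone names and E-values, for illustration only.
example = cluster_clones(
    ["c1", "c2", "c3", "c4"],
    [("c1", "c2", 1e-50), ("c2", "c1", 1e-50), ("c3", "c4", 0.5)],
)
print(len(example), "non-redundant groups:", example)
```

Each resulting group would contribute one representative to the non-redundant set. Single-linkage grouping is the simplest choice and can chain unrelated clones through a shared domain, so a stricter cutoff or an alignment-length filter may be preferable in practice.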
Whole-mount in situ hybridization
Planarians (3-5 mm in length) that had been starved for at least 1 week were treated with 2% HCl for 5 minutes on ice, and then fixed on ice for 2 hours in Carnoy's fixative (EtOH:CHCl₃:acetic acid, 6:3:1) (Umesono et al., 1997). After 1 hour in methanol at -20°C, the planarians were bleached in 6% H₂O₂ in methanol at room temperature. Bleached planarians were loaded into incubation columns in an Insitu Pro hybridization robot (Abimed/Intavis, Germany) and processed as described (Sánchez Alvarado and Newmark, 1999) with modifications to accommodate the liquid handling characteristics of the machine.
GenBank accession numbers
GenBank accession numbers were: AY066058-AY066260; AY066262-AY066313; AY066315-AY066438; AY066440-AY067204; AY067206-AY068336; AY068339-AY068349; and AY068675-AY069025.
RESULTS
The S. mediterranea database
Two tissue-specific cDNA libraries made from the heads (H) and head blastemas (HB) of S. mediterranea clonal line CIW4 were used to generate expressed sequence tags (ESTs). Previous sequence analyses of 54 different cDNA clones obtained by subtractive hybridization (Sánchez Alvarado and Newmark, 1998) demonstrated that the average length of 3′-untranslated sequences (3′-UTRs) in this species is ≤350 nucleotides. Therefore, to maximize output, minimize the sequencing effort and reduce the complexity of the computational analyses, we chose first to sequence only from the 3′-ends of all cDNA clones isolated from the H and HB libraries. Only clones that failed to display an open reading frame (ORF), or had no homology to GenBank sequences, were selected for 5′-end sequencing (Fig. 1A). A total of 5561 cDNA clones representing 2979 non-redundant gene products were sequenced and characterized as shown in Fig. 1A. Of these, 972 cDNAs were sequenced from their 5′ ends and subjected to the same bioinformatic protocol (Fig. 1A). Analysis of the EST collection revealed that ∼69% of the entries share similarities with sequences deposited in GenBank and dbEST. The remaining 31% bear no similarity to known sequences in other organisms, even though the majority contained putative ORFs (∼80%). The proportion of these possibly S. mediterranea- or platyhelminth-specific sequences is similar to the percentage of genes found to be species-specific in the proteomes of S. cerevisiae, C. elegans and D. melanogaster (Rubin et al., 2000).
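The selection rule for 5′-end sequencing can be expressed compactly. The sketch below is an illustration under stated assumptions rather than the protocol of Fig. 1A: it treats the longest stop-free stretch in any of the six reading frames as a crude stand-in for a putative ORF, and the 50-codon threshold and all function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_open_stretch(seq):
    """Length, in codons, of the longest stop-free run in any of six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0  # a stop codon closes the current stretch
                else:
                    run += 1
                    best = max(best, run)
    return best


def needs_5prime_read(seq, has_database_hit, min_codons=50):
    """Apply the rule: no putative ORF, or no database homology.

    min_codons is an assumed threshold, not a value taken from the paper.
    """
    has_orf = longest_open_stretch(seq) >= min_codons
    return (not has_orf) or (not has_database_hit)


# A six-nucleotide toy sequence: two open codons, far below the threshold.
print(longest_open_stretch("ATGGCC"), needs_5prime_read("ATGGCC", True))
```

Because ESTs are partial reads, the sketch does not require a start codon; a real ORF caller would typically also weigh codon usage and frame consistency with the poly(A) tail.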
The high percentage of planarian ESTs with putative orthologs in the public databases allowed us to further organize SmedDb into functional categories (Fig. 1B). The categories employed are derived from the expressed gene anatomy database (EGAD; http://www.tigr.org) and the gene ontology (http://www.geneontology.org) functional classification systems. Each entry in SmedDb consists of the cDNA name, similarity description and expression pattern, if available (see below). Selecting an entry in the database provides additional information such as the sequence sent for analysis, the assigned functional category, in situ hybridization data and the corresponding BLAST results linked to Entrez-PubMed (see http://planaria.neuro.utah.edu). Examples of SmedDb entries placed into functional categories are shown in Table 1. At least 77 transcription factors, 130 DNA replication/modification molecules and 97 receptors, channels and other membrane-associated proteins were putatively identified.
| Category | Subcategory | Clone ID | Description |
|---|---|---|---|
| RNA metabolism | Transcription factors | H.119.4D | Class IV POU-homeodomain protein |
| | | H.17.9E | Smad4 |
| | | H.38.3f(T3) | LIM/homeobox protein LIM (HRLIM) |
| | | H.8.6C | Homeobox protein DTH-2 |
| | | H.110.1c | Pre B-cell leukemia transcription factor 2 (Pbx-2) |
| DNA replication/modification | Chromosome/nuclear structure | H.90.1e(T3) | Sirtuin 6 |
| | | HB.19.8F | Histone acetyltransferase MORF |
| | | H.25.11e(T3) | Maleless gene product |
| | | H.10.6c(T3) | Karyopherin α 3 |
| | Apoptosis | H.105.11H | Apoptosis inhibitor 2 |
| | | H.8.7G | Caspase 6 precursor |
| | | H.57.1d | Probable Bax inhibitor 1 |
| Cell-cell communication | Receptors | E-99 | FGF homologous factor receptor |
| | | H.103.12E | Cysteine-rich fibroblast growth factor receptor |
| | | H.111.10F | GABA(A) receptor-associated protein |
| | Other membrane proteins | H.110.2E | Mechanosensory protein 2 |
| | | H.119.4E | Endothelin converting enzyme |
| | | H.44.6a(T3) | Serrate 2 |
| Intracellular signaling | Channels/transporters | H.102.1B | Delayed rectifier K⁺ channel |
| | | H.90.5b(T3) | Cu²⁺ transporter (Menkes disease-associated protein) |
| | | H.16.11G | Na⁺/K⁺-ATPase α-subunit |
| | Transduction | H.2.10h(T3) | Rab GDP-dissociation inhibitor |
| | | H.31.11B | GTP-binding regulatory protein Gs α chain |
| | | H.56.3A | cAMP-dependent protein kinase catalytic subunit |
Complete lists for each category and their respective subcategories can be found and searched in SmedDb (http://planaria.neuro.utah.edu).
POU, Pit, Oct, Unc DNA-binding domain; Smad, similar to mothers-against decapentaplegic; LIM, Lin11, Isl1, Mec3 protein-binding domain; DTH, Dugesia tigrina homeobox; Pbx, pre-B-cell leukemia transcription factor; MORF, monocytic leukemia zinc finger protein-related factor; Bax, Bcl2 associated X gene; FGF, fibroblast growth factor; GABA, gamma amino-butyric acid; GDP, guanosine diphosphate; GTP, guanosine triphosphate; cAMP, cyclic adenosine monophosphate.
Interestingly, when the planarian ESTs with significant homologies to GenBank are ranked by lowest expectation value (E), we find that 64% of the entries in SmedDb have highest overall similarities to vertebrate rather than to invertebrate sequences (Fig. 1C). When comparative BLASTx analyses between SmedDb and the proteomes of C. elegans, D. melanogaster and H. sapiens were performed, a set of 124 S. mediterranea ESTs with significant similarity only to proteins found in the human genome was revealed. Sixty-three of these are similar to human genes encoding proteins of unknown function. Noteworthy is the presence in S. mediterranea of thymidine phosphorylase/endothelial cell growth factor 1 (BLASTx E=5×10⁻³⁰), acyl-CoA dehydrogenase (BLASTx E=2×10⁻²¹), epoxide hydrolase (BLASTx E=5×10⁻²⁹) and formiminotransferase cyclodeaminase (BLASTx E=4×10⁻⁴²). These genes were recently postulated to be present in the human genome as a result of direct horizontal gene transfer (HGT) between bacteria and vertebrates based on their absence in the genomes of C. elegans and D. melanogaster (Lander et al., 2001). However, the presence of these transcripts in planarians suggests that these loci are most probably not shared by bacteria and vertebrates via HGT, but rather by descent through common ancestry (Kyrpides and Olsen, 1999; Stanhope et al., 2001).
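The two comparisons in this paragraph (ranking each EST's hits by lowest E-value, and isolating ESTs that match only the human proteome) can be sketched as follows. The input layouts, function names and the significance cutoff of 1e-6 are assumptions made for illustration; they are not the settings behind the 64% and 124-EST figures reported above.

```python
from collections import Counter


def best_hit_groups(hits_by_est):
    """Tally ESTs by the taxonomic group of their lowest-E-value hit.

    hits_by_est maps an EST name to a list of (group, evalue) tuples,
    e.g. ("vertebrate", 1e-30). ESTs without hits are skipped.
    """
    tally = Counter()
    for hits in hits_by_est.values():
        if hits:
            group, _ = min(hits, key=lambda hit: hit[1])
            tally[group] += 1
    return tally


def human_only(proteome_evalues, max_evalue=1e-6):
    """Return ESTs whose only significant proteome match is human.

    proteome_evalues maps an EST name to {species: best BLASTx E-value}.
    max_evalue is an assumed significance cutoff.
    """
    selected = []
    for est, evalues in proteome_evalues.items():
        significant = {sp for sp, e in evalues.items() if e <= max_evalue}
        if significant == {"H. sapiens"}:
            selected.append(est)
    return sorted(selected)


# Made-up EST names and E-values, for illustration only.
print(best_hit_groups({
    "est1": [("vertebrate", 1e-30), ("invertebrate", 1e-10)],
    "est2": [("invertebrate", 1e-5)],
}))
print(human_only({
    "est1": {"H. sapiens": 1e-20, "C. elegans": 0.5},
    "est2": {"H. sapiens": 1e-20, "D. melanogaster": 1e-15},
}))
```

As the Discussion notes, a best-hit tally of this kind reflects database composition as much as phylogeny, so it is a screening step rather than evidence of relatedness.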
High-throughput in situ hybridization
The ∼3000 independent ESTs available in SmedDb provide a wealth of material for studying the flatworms. One such use will be for identifying cell type- and region-specific markers. Thus, we have used whole-mount in situ hybridization to begin to determine the spatial expression patterns of SmedDb entries; to date, results from nearly 300 clones have been deposited in SmedDb, and more are being added regularly as they become available. The analysis has revealed some surprising complexities in the spatial expression patterns of many of the genes represented in the EST collection (Fig. 2). We find, for example, that the morphologically simple cephalic ganglia of flatworms display a diverse array of expression domains, some of which are depicted in Fig. 2A (see figure legend for explanation). In addition, other organ-system-specific genes have been identified that label the gastrovascular system, the dorsal epithelium, the excretory system and the pharynx (Fig. 2B from top to bottom). We also find transcripts expressed in various subsets of cells, including the planarian neoblasts in which piwi, a transcript found in many metazoan stem cells (Benfey, 1999), can be detected (Fig. 2C, bottom picture). Striking expression patterns defining both dorsal and ventral boundaries have been observed as well. This is illustrated by the lateral view of in situ hybridization results using clone H.8.1f, which has no known homolog in the available databases (Fig. 2D).
Cell loss during de-growth
The identification of cell type-specific markers from the large-scale in situ hybridization screen provides new tools for studying morphallaxis, a classic problem first defined by Morgan in 1898 (Morgan, 1898). Morphallaxis refers to the remodeling that occurs when small fragments of planarians (or other organisms, like Hydra) restore their appropriate proportion and pattern without adding new tissue. In addition to this remodeling during regenerative events, planarians show a high degree of plasticity in their ability to either grow or de-grow, depending upon environmental conditions. During periods of prolonged starvation, planarians will shrink (Lillie, 1900; Schultz, 1904; Berninger, 1911; Child, 1911; Abeloos, 1930): a 20 mm long worm can be reduced to less than 1 mm over the course of several months. This change in body size is due to an overall reduction in total cell number, as opposed to a reduction in cell size (Baguñà and Romero, 1981; Romero and Baguñà, 1991). Previous studies of this phenomenon have used techniques in which planarians are macerated into a suspension of individual cells. Using this method, roughly 13 different cell types from organisms in varying stages of growth and de-growth were classified and quantitated (Baguñà and Romero, 1981; Romero and Baguñà, 1991). Because the flatworms were dissociated into single cells in these studies, the distribution of the cells could not be monitored in the whole animal as it changed in size. Furthermore, the morphological criteria alone underestimated the true number of different cell types in the planarian.
cDNA clone H.112.3c shows weak sequence similarity to degenerin 1 from C. elegans and is expressed in a subset of cells near the anterior margin of the planarian (Fig. 3A); these cells are likely to be involved in chemoreception through ciliated pits that lie at the ciliated anterior margin in this genus (Farnesi and Tei, 1980). The number of H.112.3c-expressing cells can be counted easily in organisms of different sizes after whole-mount in situ hybridization. Remarkably, the number of these cells increases linearly with length (Fig. 3B), suggesting that even for cell types comprising a small percentage of the body (∼0.03%), their total numbers are regulated as the animal grows and shrinks. How these organisms can 'count' different cell types relative to total body size remains a complete mystery.
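A linear relationship of this kind is usually assessed with an ordinary least-squares fit of cell count against body length. The sketch below shows one way to do that; the function name is hypothetical, and the numbers in the usage line are synthetic placeholders chosen to lie on a perfect line, not measurements from Fig. 3B.

```python
def fit_line(lengths_mm, cell_counts):
    """Ordinary least-squares fit of cell_counts = slope * length + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(lengths_mm)
    if n < 2 or n != len(cell_counts):
        raise ValueError("need at least two paired observations")
    mean_x = sum(lengths_mm) / n
    mean_y = sum(cell_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in lengths_mm)
    if sxx == 0:
        raise ValueError("all lengths are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(lengths_mm, cell_counts))
    syy = sum((y - mean_y) ** 2 for y in cell_counts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With constant counts the line fits exactly, so report r_squared as 1.
    r_squared = 1.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic placeholder values (NOT data from this study).
slope, intercept, r_squared = fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(slope, intercept, r_squared)
```

An r_squared close to 1 across animals of different lengths would be consistent with the linear scaling described above, whereas systematic curvature in the residuals would argue against it.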
DISCUSSION
Considering that flatworms comprise the fourth largest phylum on Earth (Brusca and Brusca, 1990) and that many of its members have challenged scores of biologists and biomedical researchers, it is puzzling that the molecular biology of the Platyhelminthes has remained largely unexplored. The problems of regeneration, de-growth and proportion regulation remain as puzzling today as they were over 200 years ago. Furthermore, diseases such as schistosomiasis, which is caused by members of this phylum, continue to be global public health problems with no signs of abating. Thus, deciphering the molecular principles underpinning the biology of these organisms should not only improve our knowledge of the phylum, but also contribute to the fields of developmental biology and biomedicine.
The establishment of a clonal line (CIW4) of the freshwater planarian S. mediterranea and the identification of nearly 3000 non-redundant cDNAs from this line will aid the molecular study of the most salient biological properties of this taxon. Nearly 70% of all S. mediterranea clones share significant homologies to sequences deposited in GenBank (Fig. 1B), and a large number of these have highest similarity to the deuterostome branch of the metazoans (Fig. 1C). These results either indicate a closer proximity of the phylum to the deuterostome lineage, as recently proposed by Tyler (Tyler, 2001), or, more likely, reflect the poor representation of invertebrate sequences in current databases. The latter possibility is illustrated by the identification in planarians of cDNAs encoding proteins that until recently were thought to be present only in bacteria and vertebrates, based on a comparative analysis of the human, fly and nematode genomes (Lander et al., 2001). The presence of thymidine phosphorylase/endothelial cell growth factor 1, acyl-CoA dehydrogenase, epoxide hydrolase and formiminotransferase cyclodeaminase in S. mediterranea suggests that these loci reached the vertebrates by common ancestry and not by horizontal gene transfer as originally proposed (Lander et al., 2001). Therefore, even though the proteomes of both C. elegans and D. melanogaster have been deposited in GenBank, limiting sequence comparisons to these two invertebrates is not sufficient to draw sound phylogenetic conclusions, especially on the basis of BLAST results alone. Only rigorous phylogenetic analyses can closely approximate phyletic relationships, and we expect that the sequences in SmedDb will contribute to the production of higher resolution intra- and inter-phyletic metazoan relationships.
Although S. mediterranea shares a large number of genes with the human, fly and nematode genomes, several planarian cDNAs with significant similarities to human sequences were not identified in the C. elegans or D. melanogaster genomes by BLAST searches. At least 63 of these cDNAs are similar to human genes encoding proteins of unknown function. Therefore, S. mediterranea is likely to expand and complement the repertoire of organisms used for the study of genes and pathways involved in various aspects of human biology and disease.
The high-throughput in situ hybridization analyses reported here will serve as a first step in deciphering the roles of genes encoding proteins of unknown function. The tissue- or cell type-specific expression patterns of these genes may provide hints as to their function. For example, cDNA clones H.14.5b and H.12.6g share similarity with human genes for which no function is known (hypothetical protein XP_044953.1; E=5e-9 and unnamed protein product AK022687; E=1e-12, respectively), and are expressed in neurons of both the planarian central and peripheral nervous system (see http://planaria.neuro.utah.edu). Our previous demonstration that double-stranded RNA can be used to inhibit gene expression in planarians (Sánchez Alvarado and Newmark, 1999) provides the means for testing gene function on a large scale, thus allowing the functional characterization of novel, evolutionarily conserved gene products.
Furthermore, cell type-specific markers identified by large-scale in situ screens provide useful reagents for examining the processes of patterning, differentiation and remodeling in intact and regenerating planarians. We have shown the use of such a marker (H.112.3c) to quantify cell number changes as planarians alter their size, and found that these animals also regulate accordingly the numbers of a specific cell type (Fig. 3). This maintenance of pattern and proportion is a fascinating corollary to the regenerative abilities displayed by these organisms. In addition, little is known about the heterogeneity of the stem cell population in planarians and markers such as piwi (H.2.12c) will provide necessary reagents for analyzing the processes by which neoblasts differentiate to give rise to the ∼30 cell types in the animal. The tools described make these daunting problems more amenable to molecular dissection.
Finally, BLASTn and BLASTx queries also revealed that ∼31% of the cDNAs obtained do not share sequence similarities with the available databases. This lack of similarities with GenBank and dbEST is not due to the divergences commonly found in untranslated sequences, because only ∼20% of these cDNAs lack a putative ORF. These results suggest that some of these sequences may correspond to platyhelminth-specific genes. Therefore, in addition to its obvious advantages for studying the problem of regeneration, the easily manipulable planarian provides a free-living counterpart likely to complement current research efforts on the parasitic forms, in particular Schistosoma mansoni and S. japonicum, for which abundant sequence data are being obtained (Snyder et al., 2001). Given that the parasitic flatworms are difficult experimental subjects, the ability to identify flatworm-specific genes through comparisons to S. mediterranea sequences should help identify candidate molecules for therapeutic intervention. Furthermore, the in situ hybridization data being generated in S. mediterranea will help identify genes expressed in cell types unique to the platyhelminthes, providing additional potential therapeutic targets. The combination of sequence comparisons, gene expression patterns and RNAi technology provides new experimental possibilities for studying the free-living and parasitic members of this phylum. Thus, the SmedDb resources will be useful to a wide gamut of developmental and biomedical endeavors.
Acknowledgements
We thank our many colleagues at the Carnegie Institution, Department of Embryology for their support and encouragement during the early phases of this project. We are also indebted to Ms Allison Pinder at the Carnegie's sequencing core for overseeing the initial stages and training of personnel for automated sequencing procedures; and to Ms Jennie Bentz for the design of the initial version of our web page and for suggesting the use of Cold Fusion to manage SmedDb. Our thanks are also extended to Mr Nestor Oviedo for in situ work. P. A. N. acknowledges the Cancer Research Fund of the Damon Runyon-Walter Winchell Foundation for supporting his study of planarian biology. This work was also supported by NIH NIGMS RO-1 GM57260 to A. S. A. and NIH F32-GM19539 to P. A. N.