The rigid crustacean exoskeleton, the cuticle, is composed of the polysaccharide chitin, structural proteins and mineral deposits. It is periodically replaced to enable growth and its construction is an energy-demanding process. Ecdysis, the shedding event of the old cuticle, is preceded by a preparatory phase, termed premolt, in which the present cuticle is partially degraded and a new one is formed underneath it. Procambarus clarkii (Girard 1852), an astacid crustacean, was used here to comprehensively examine the changing patterns of gene expression in the hypodermis underlying the cuticle of the carapace at seven time points along ~14 premolt days. Next generation sequencing was used to construct a multi-tissue P. clarkii transcript sequence assembly for general use in a variety of transcriptomic studies. A reference transcriptome was created here in order to perform digital transcript expression analysis, determining the gene expression profiles in each of the examined premolt stages. The analysis revealed a cascade of sequential expression events of molt-related genes involved in chitin degradation, synthesis and modification, as well as synthesis of collagen and four groups of cuticular structural genes. The new description of major transcriptional events during premolt and the determination of their timing provide temporal markers for future studies of molt progress and regulation. The peaks of the expression of the molt-related genes were preceded by expression peaks of cytoskeletal genes that are hypothesized to be essential for premolt progress through regulating protein synthesis and/or transport, probably by remodeling the cytoskeletal structure.
Similar to all arthropods, crustacean growth is a stepwise process involving the periodic shedding and subsequent reconstruction of a rigid external exoskeleton, the cuticle. The structural proteins and the enzymes responsible for the partial degradation of the old cuticle and the construction of the new one are synthesized in, or transported through the underlying hypodermis, which is composed mainly of epithelial cells supported by structural and storage connective tissues (Glazer et al., 2013; Johnson, 1980; Roer and Dillaman, 1984). The cuticle is composed of four layers ordered: epicuticle, exocuticle, endocuticle and membranous layer (Roer and Dillaman, 1984; Skinner, 1985; Travis, 1965). The epicuticle is the outer layer facing the environment. They are mainly composed of the polysaccharide chitin bound to structural proteins and include deposited calcium salts. In relation to the required functions at the specific body location and life-cycle stage, a variety of structural proteins, chitin structural characteristics and intensities of mineral deposition are hypothesized.
The crustacean molt cycle (Aiken and Waddy, 1987; Chang, 1995; Skinner, 1985) is divided into four stages, intermolt, premolt, ecdysis and postmolt. Intermolt is characterized by a fully developed, calcified cuticle with very low molt-related activity. Premolt begins with partial digestion of the cuticular chitin and proteins (Shechter et al., 2008a; Wheatly and Ayers, 1995), which is accompanied by the synthesis of new underlying exocuticle and epicuticle. At this stage, there is no major calcification in the new cuticle, which therefore remains flexible. Change of shape and also proliferation of the epithelial cells of the hypodermis occur during premolt, returning to intermolt appearance after ecdysis (Johnson, 1980). Ecdysis, the molting event, is driven by an active absorption of water to increase the body's volume, leading to rupture of the partially degraded old cuticle and stretching of the new soft one, enveloping the now larger individual (Chung et al., 1999). Postmolt is characterized by the formation of the endocuticle and membranous layer, as well as cuticular calcification and sclerotization (Andersen, 2010) accompanied by gradual replacement of the absorbed water by tissues (Aiken and Waddy, 1987; Chang, 1995; Skinner, 1985). In astacid crayfish, transient calcium structures named gastroliths are deposited during premolt in a cavity between a specialized hypodermis and the cuticular layer of the cardiac stomach, experimentally serving as molt stage markers. At postmolt they are dissolved and used as a mineral source for the cuticle (Nakatsuji et al., 2000; Shechter et al., 2007).
Premolt is triggered by a surge of ecdysteroid molting hormones that are synthesized by and secreted from the Y-organ (Lachaise et al., 1993; Tom et al., 2013). Ecdysteroid synthesis and secretion from the Y-organ is inhibited by the molt-inhibiting hormone and crustacean hyperglycemic hormone (CHH), neuropeptide hormones that are produced in the neuroendocrine X-organ and stored and secreted by the sinus gland (abbreviated together as XOSG) located within the crustacean eyestalk (Chung et al., 2010). Hence, removal of the eyestalk eliminates the inhibition, and the consequent elevated ecdysteroid level induces the individual to commence premolt. It has to be noted that the removal of the eyestalk eliminates a variety of peptide hormones and, although inducing premolt, the molt-independent
List of abbreviations
chitin binding domain
crustacean cuticular domain
conserved domains database
crustacean hyperglycemic hormone
low density lipoprotein
myosin heavy chain
myosin light chain
myosin light regulatory chain
molt mineralization index
National Center for Biotechnology Information
next generation sequencing
Rebers and Riddiford
Sequence Read Archive
Transcriptome Shotgun Assembly
X organ sinus gland
physiology of the individual may be affected as well. Just before ecdysis, the level of circulating ecdysteroids is sharply decreased (Chung, 2010; Nakatsuji et al., 2000; Shechter et al., 2007). The level of two additional peptide hormones is increased from this period across ecdysis and postmolt. The crustacean cardioactive peptide is increased just before molt (Phlippen et al., 2000), and the initial shell-hardening process is regulated by bursicon, which is produced by the thoracic ganglia complex, released by the pericardial organ (Chung et al., 2012), and assists the sclerotization process in the new cuticle. A series of molt-related neuropeptides, which also include the crustacean cardioactive peptide and bursicon, were revealed in insects at the terminal stage of the premolt, concurrent with the drop in circulating ecdysone just before and after ecdysis (Mesce and Fahrbach, 2002).
The astacid crustacean Procambarus clarkii (Girard 1852) is the subject species of this study, which takes advantage of the possibilities offered by next generation sequencing (NGS) and is aimed at comprehensively identifying and characterizing the major transcriptional events in the hypodermis during premolt that underly the structural and physiological changes, as well as their timing. The temporal premolt stage-related scheduling of the expression of these genes may serve in future studies as markers for monitoring the process and deciphering its regulation. In addition, correlations between the timing of expression of different groups of genes may promote the generation of working hypotheses regarding possible concerted simultaneous or sequential activities of genes.
Fig. 1 summarizes the premolt progress, presenting the relationship between the molt mineralization index (MMI), the circulating levels of ecdysteroids and the presence of a new cuticle under the existing one. It agrees with a similar relationship that was established in previous astacid studies (Nakatsuji et al., 2000; Shechter et al., 2007). Eighteen males were selected for expression analysis and were divided into seven premolt stages as follows: males at stages P1 and P2 exhibited no gastroliths, and were divided according to the time from eyestalk removal (1 day versus 4 days) and by the small difference in ecdysteroid level. The division of P3–P5 males was based on both MMI and ecdysteroid levels. P6 males had a new underlying cuticle, in addition to their higher MMI and ecdysteroid levels, distinguishing them from males at P5. P7 males differed from those at P6 by the ecdysteroid drop just before ecdysis. The relationship between P1–P7 and the widely accepted Drach molt index (Drach and Tchernigovtzeff, 1967) is presented in Fig. 1, aimed at enabling comparison to previous molt-related studies. The assignment of Drach's index to P1–P7 was accomplished in comparison to the findings of Durliat et al. (Durliat et al., 1988) by using the mutually evaluated ecdysteroid levels as reference molt-stage points.
RNA-seq analysis of hypodermal gene expression during premolt
The raw available NGS read data was deposited into the Sequence Read Archive (SRA) of the National Center for Biotechnology Information (NCBI) (accession numbers SRR870673, SRR1265966). The de novo comprehensive transcriptome assembly revealed 254,561 contigs with an average length of 628±969 bp. The assembly was annotated and deposited in the DNA Database of Japan (DDBJ)/European Molecular Biology Laboratory (EMBL)/GenBank Transcriptome Shotgun Assembly (TSA) database (accession number GBEV01000000). Back mapping of the 18 sample-specific 50 bp sequences to this assembly revealed 80% mapping. The reference assembly used for the subsequent RNA-seq analyses contained 17,097 contigs, to which 95% of the 18 sample-specific reads (76% of the initial reads) were mapped.
The RNA-seq mapping percentage of each of the samples was 73±3.2%, including 3±0.8% non-specific mappings, a relatively low percentage justifying the utilization of only the profiles of uniquely mapped reads for the analysis. The four males of stage P1 (Fig. 1) served as the control group for the RNA-seq, and a group of differentially expressed contigs were determined for each premolt stage using three simultaneously achieved thresholds: (1) relatively relaxed statistical significance at P<0.05, (2) fold change >|2|, and (3) a count >100 normalized reads (out of 10 million reads in a sample; see Materials and methods, below). The latter two thresholds were used with the aim of avoiding misinterpretation of natural variability as differential expression. The selected genes were responsible for 65±5.1% of the total normalized reads, and the percentage of total induction or attenuation of reads, as well as the number of induced or attenuated contigs, at each premolt stage in comparison to those at P1 is presented in Fig. 2A,B. According to this figure, the overall expression can be divided into two transcriptional phases, P1–P3 and P4–P7. The first phase is characterized by an increase of both the percentage of induced reads and contigs in P2. P3 is characterized by reduced gene induction, an interim stage prior to the start of P4–P7, which is characterized by a gradual increase in the percentage of induced reads, a reduction in the number of induced contigs and an increase in the number of attenuated contigs.
Premolt stage-related patterns of gene expression
Contig relative expression patterns throughout premolt were identified, and some of them were related to gene function or putative function. The contigs used for expression pattern determination were selected to have a significance at P<0.05, a fold change >|2| and counts >5000 normalized reads in at least one premolt stage; these were designated ‘prominently expressed contigs’. The 489 prominently expressed genes were responsible for 50±4.74% of the total normalized reads at stages P1–P7. An expression profile was constructed for each selected contig across the seven premolt stages in terms of its percentage from the total normalized reads. The profiles were qualitatively grouped into mutual expression patterns according to visual inspection. Out of the 489 contigs, 106 were eliminated from the grouping owing to difficulties in visually assigning them because they were characterized by variable patterns of expression. Most of them revealed high, constitutive expression levels. The relatively high count-cutoff and the elimination of contigs that were visually difficult to assign to an expression pattern were a trade-off to enable determination of a restricted and clearly defined set of expression patterns (labeled A–H), at the expense of skipping other genes, the expression patterns of which were too difficult for educated interpretation. Several annotated functionally-related groups of genes were highly biased towards specific expression patterns and were thoroughly analyzed. Supplementary material Table S1 provides specific data for all the contigs included in patterns A–H, as well as for contigs that shared annotations with them. The contig-related data of supplementary material Table S1 includes the designation of the contigs in the appropriate TSA databases, the sequences of its conceptual proteins, the locations of domains, as well as the RNA-seq normalized read numbers at all stages and the designation of the pattern of gene expression (A–H), when available. The summed expression patterns of the ‘prominently expressed contigs’ of each important functional group detailed below are presented in Figs 3, 4, 5 and 6 alongside the pattern-specific profiles of non-annotated and non-grouped genes designated PA–PH. Certain structural–functional aspects of three groups of genes – myosin heavy chain (MHC), actin and chitin deacetylase (CDA) – are analyzed below, raising more questions than answers, but considered important in the context of their function in the premolted hypodermis.
MHC and actin structure–function relationships
MHC is a major component of both muscular and cytoskeletal motor systems. In insects, using Drosophila melanogaster as a model, there are two different MHC genes, Zipper (the cytoskeletal MHC) and mhc (the muscular one), each producing several proteins as a result of alternative splicing (George et al., 1989). Hence, D. melanogaster MHC protein sequences (zipper accession numbers ADV37253-6, AAX52688, AAM70805, NP_523860, AAX52687, AFH08269; and mhc accession numbers AFH03722-5, AAF53566) were compared by BLASTP to the 50 translated P. clarkii MHCs, elucidating an overwhelming majority of mhc-like contigs but also two Zipper-like ones (supplementary material Table S1). The prominently expressed mhc-like contigs demonstrated much higher read numbers than the zipper ones.
The location of each specific actin, muscular or cytoskeletal, could not be distinguished solely from its protein sequence. The six D. melanogaster actin proteins have very similar sequences, which are also highly conserved at the nucleotide level within the open reading frame (Fyrberg et al., 1983).
CDA converts chitin into chitosan, digesting part or all chitin acetyl groups. Its catalytic activity domain is characterized by the carbohydrate esterase 4 CDA-like 1 domains [NCBI conserved domains database (CCD) accession number CD10974/5]. Two CDAs were annotated in this study, one of which was prominently expressed. However, a higher number of CDAs has been elucidated previously by us in two other astacids, Cherax quadricarinatus and Pontastacus leptodactylus, which contain four and six CDAs, respectively. Most of them were not published, and they are described here because they are considered to add valuable structural and functional information. Some of them include, in addition to the carbohydrate esterase 4 CDA-like 1 domain identified here, the chitin-binding peritrophin A domain (CBM; Pfam ID PF01607) and/or the low density lipoprotein (LDL) receptor domain class A (Pfam ID PF00057). The sequences of these 10 transcripts are found in TSA GADE01000000 (Glazer et al., 2013) and TSA GAFS01000000 (Tom et al., 2013), and are also characterized at the bottom of supplementary material Table S1 here. A division of the CDAs may be done on a putative functional basis, namely, determining the contigs that contain the essential aspartic acid residue in motif 1 and the two histidine residues in motif 2, as described by Dixit et al. (Dixit et al., 2008). Two CDAs, GAP65 and Pastle_hypoyo_c25720 (TSA GAFS01000000), contain all three domains but lack all three essential amino acids. Another two, CqCDA1 and Pastle_hypoyo_c2922 (TSA GAFS01000000) do not contain the two histidine residues. Also, the latter do not contain the LDL receptor class A domain. All other eight chitin deacetylases, including the one that is prominently expressed in P. clarkii, contain the essential amino acids and are assumed to be functional.
NGS and RNA-seq analysis coupled to functional annotation enabled the comprehensive identification of genes that are expressed in the hypodermis during premolt. It has to be kept in mind that RNA-seq provides only relative expression of contigs at a given time point, and absolute transcription activity should be evaluated external to RNA-seq results. Consequently, the transcription level in the premolt hypodermis is assumed to increase in comparison with the preceding intermolt, as judged by the well-documented increase in the size of epithelial cells and their proliferation during premolt and in view of the considerable premolt protein and chitin catabolic activity of the old cuticle, as well as synthetic activity of the new one, which is required to achieve a successful ecdysis. In Fig. 2A, it is clearly demonstrated that most of the change in the percentage counts is attributed to induction, and in view of the hypothesized increased transcriptional activity during premolt, the induction presumably originates mainly from additional transcriptional activity, not from changes in the relative abundances of transcripts.
Specific gene-related aspects
Structural proteins of the cuticle
The Rebers–Riddiford domain (R&R; Pfam ID PF00379) (Rebers and Riddiford, 1988) binds to chitin (Rebers and Willis, 2001), and proteins that contain the R&R domain were located in the cuticle of insect larvae and adults. In contrast, the chitin-binding peritrophin A domain (Pfam ID PF01607) is not confined to arthropods (Suetake et al., 2000) and is located in lectins and chitinases of organisms from other taxa. Temporary anchoring may be assumed as a putative function of this domain, permitting enzymatic modification or digestion of chitin through chitinases, chitin deacetylases and chitin synthases, rather than a structure-related chitin-binding domain. However, demonstration of a strong binding of this domain to chitin, released only when using sodium dodecyl sulfate (SDS)-containing buffer, also suggests a structure-related role (Du et al., 2006). Jasrapuria et al. (Jasrapuria et al., 2010) have conducted a comprehensive cloning and characterization study of CBM-containing cuticular genes of the red flour beetle Tribolium castaneum. Their study ended with 29 genes, the same order of magnitude of the 77 contigs that were found here. The T. castaneum proteins contained 1–14 repeats of domain PF01607, generally conforming to the eleven PF01607 repeated domains identified here. Two cuticular proteins deserve particular attention. Contig Pc_58934 (supplementary material Table S1) is a CBM domain-containing protein that follows expression pattern F (Fig. 6), peaking at P4, which is apparently not in agreement with other members of its family (Figs 4, 5). Similarly, contig Pc_2802 (supplementary material Table S1), an R&R-containing protein, peaking at P3 (Fig. 6), does not follow a pattern that is consistent with other members of its family. The prominent relative expressions of the two contigs – 0.26% and 0.5% for the CBM- and R&R domain-containing genes, respectively – indicate that they have a unique function in comparison to their family members. In the bacterium Serratia marcescens, it has been found that a CBM-containing protein is bound to chitin, probably changing its conformation to render it accessible to chitinolytic activity (Vaaje-Kolstad et al., 2005), evidence that correlates with the Pc_58934 peak at P4 and that may also correlate with the P3 peak of Pc_2802.
Crustacyanins belong to the broad calycin gene superfamily. Most calycins belong to the large lipocalin family (Åkerström et al., 2006). α-Crustacyanin, a multimeric protein complex located in the epicuticle, binds to the carotenoid pigment astaxanthin and provides the animal with camouflage colors; the structure of this complex is described in Wade et al. (Wade et al., 2009).
In arthropods, chitinases are secreted from the hypodermis and are involved in the partial degradation of the old cuticle during premolt. A variety of insect chitinases have been revealed, belonging to two glycosyl hydrolase families, 18 and 20 (Davies and Henrissat, 1995). Family-18 enzymes, characterized by the glycosyl hydrolases family-18 domain (Pfam ID PF00704) produce chitooligo-saccharides, which are further digested to monosaccharides by family-20 enzymes and may contain an N-terminal signal peptide as well as the CBM domain (Arakane and Muthukrishnan, 2010). Family 20 is characterized by the glycosyl hydrolases family-20 domains (Pfam IDs PF02838 and PF00728). To date, up to six chitinases have been found in crustaceans, related to both digestion, molt cycle and assumedly to defense against pathogens. All belong to glycosyl hydrolase family-18 (Huang et al., 2010; Proespraiwong et al., 2010). A much higher number, 32 chitinase contigs, were annotated in the present study (supplementary material Table S1). This number is in agreement with the range of the number of insect chitinases found in previously elucidated genomes, 17–24 genes (Arakane and Muthukrishnan, 2010). Sequences similar to that of the chitinase active site labeled PS01095 (PROSITE domain database) were identified here in 18 chitinases which contains the glycosyl hydrolases family-18 domain. All of these apparently preserve their chitinase activity, as indicated by the presence of the essential glutamic acid residue in the active site (Arakane and Muthukrishnan, 2010).
CDA activity, as indicated by its carbohydrate esterase 4 CDA-like 1 domain, converts chitin to chitosan. This activity was identified recently in an insect peritrophic membrane (Yang et al., 2010). However, the role of CDA in the gastrolith, and assumedly in the hypodermis, is more complicated. CqCDA1 (accession number GQ421322) of the astacid C. quadricarinatus has been associated with gastrolith formation (Yudkovski et al., 2010), and the effects of silencing of GAP65 manifested both in the organization of the chitinous matrix of the gastroliths and the manner in which the amorphous calcium carbonate spherules accumulated in the gastrolith (Shechter et al., 2008b). However, it was indicated that both CqCDA1 and GAP65 lack deacetylase activity due to the absence of essential amino acids.
The chitin-binding peritrophin A domain was identified in several CDAs, and binding of chitin to the enzyme is a reasonable prerequisite for efficient catalytic activity. The LDL receptor domain class A in mammals (Daly et al., 1995) is part of the LDL receptor. Here, its function is still unknown. Silencing of two CDAs in Tribolium castaneum modifies the characteristics or inhibits the construction of a new cuticle following molt (Arakane et al., 2009). Five to nine CDAs were identified from whole insect genomes (Dixit et al., 2008), which is generally in agreement with the number of CDAs presented here.
Chitin synthase (CHS) of arthropods is an integral membrane protein, responsible for synthesizing chitin from UDP-N-acetylglucosamine and for its trafficking to the extracellular domain (Merzendorfer, 2006). Two chitin synthases are known in insects, one of them is expressed in two isoforms, resulting from alternative splicing (Arakane et al., 2004; Hogenkamp et al., 2005). It has been indicated that one is involved in cuticle construction in larvae and that the other is involved in the construction of the larval and adult peritrophic membrane. The characteristic domains of both genes were elucidated here. Sixteen transmembrane domains have been identified previously in insect CHS, in agreement with the architecture of one of the present CHSs (Pc_1221, supplementary material Table S1).
There is an ambiguity between the cytoskeletal function of MHC genes and the sequence of their derived proteins. A similar discrepancy between the interpreted role of MHCs and their structure has been found (Tom et al., 2014) in the gills of the CHH-affected astacid P. leptodactylus, in which the expression of proteins that are related to a variety of actin system genes was dramatically attenuated one hour after application of the D-isomer of CHH. In both studies, most MHC sequences more resembled the insect muscle-type mhc than the non-muscle zipper, questioning the cytoskeletal origin of most expressed MHCs. By contrast, the lack of muscles in the gills and the careful avoidance of sampling muscles during the present study led us to a working assumption to be checked in future studies, that even the structurally ambiguous MHCs induced here are non-muscular members of the actin system. Nothing is known regarding the distinction between muscular and cytoskeletal MHCs in crustaceans, and it may be that the structural segregation among them was accomplished later in arthropod evolution.
Some of the proteins related to the actin motor system have been assigned, at present, a function related only to the muscular motor system and not to the cytoskeleton, whereas others were studied in both the cytoskeletal and the muscular contexts. However, the lack of cytoskeleton-related activity may come from a lack of research activity. Titin (designated also projectin, connectin or twitchin) is a long protein which binds to filamentous actin and provides elasticity to muscles, and at the present time it is assigned a function only in the muscular context (Ayme-Southgate et al., 2008). By contrast, the modulatory genes of the actin system were studied in both the cytoskeletal and the muscular contexts. Tropomyosin isoforms participate in both cytoskeletal and muscular processes as modulators of actin-myosin binding (Gunning et al., 2005). Paramyosin is an invertebrate-specific modulatory protein of actin binding to myosin, whereas tropomyosin, and troponin T and I modulate this binding activity in vertebrates (Gunning et al., 2005; Landsverk and Epstein, 2005). Dystonin is a cytoskeletal actin-binding protein (Brown et al., 1995). LIM domains mediate protein-protein interactions and are located in many proteins related to a variety of functions (Koch et al., 2012). Two types of LIM-containing proteins were identified here. ‘Four and a half LIM domain proteins’ are engaged in a variety of functions, among them cytoskeletal remodeling and co-transcription, and they bind to several proteins of the actin system (Johannessen et al., 2006). The PDZ and LIM domain protein Zasp has an affinity for α-actinin and is also involved in the connection of integrin to the cytoskeleton. Muscle-specific protein 20 and some other proteins contain the calponin homology (CH) domain (Pfam ID PF00307), which binds to cytoskeletal actin and also to signal transduction proteins. Myosin light chain kinase is a serine/threonine-specific protein kinase that phosphorylates the regulatory light chain of myosin II (Gallagher et al., 1997). Almost all of prominently expressed contigs of the above annotations were expressed according to pattern A (Fig. 4), apart from one myosin light regulatory chain and one dystonin contig, which were expressed in accordance with pattern B (Fig. 5).
To summarize this issue, the timing of the expression of the cytoskeletal proteins and the tissues of origin, which apparently lacks muscles, indicate a cytoskeleton origin for the actin-related genes, but sequence similarities, as well as lack of cytoskeleton-related information for some genes, points to a putative muscular origin. This discrepancy requires further investigation.
Expression pattern-related aspects
Premolt expression patterns without defined functions (patterns D, G and H)
Generally, a peak-shaped expression pattern, confined to a limited duration is assumed by us to be a real functional one. Hence, pattern D (Fig. 6) is not considered functional, but rather as a group of genes with a constant absolute transcription level. Its apparent decline, 26% at P1 to 1% at P7, is considered a computational consequence of the gradually increased induction of other contigs. Moreover, all relatively high P1 read levels that were revealed in other patterns are also suspected to be computational consequences for the same reason. The pattern D apparent peak at P3 is considered to be computational because the number of changed reads and changed contigs at P3 is the lowest in the premolt (Fig. 2), causing the relative number of the reads belonging to pattern D to be elevated.
Pattern G (Fig. 6) could be, at least partially, a computational consequence as its level in P1 is the highest of all other premolt stages. However, its peaks at P3 and P6 seem to be real inductions. Its prominent non-annotated contig (Pc_24651, supplementary material Table S1) is definitely worth further study.
Pattern H (Fig. 6) is a minor component of the expression. It peaks at P5 and looks like a specifically expressed group of genes, induced and attenuated during a confined duration.
It has to be noted that expression patterns A, B, C, E, F, which are discussed below in relation to gene function of specific gene groups, also elucidated the expression patterns of other annotated and non-annotated contigs, which it is not feasible to interpret at present; however, they point to a broader set of proteins that are affected in a similar manner, assumedly related to specific premolt functions. The expression patterns of these contigs were designated PA–PH in Figs 3, 4, 5 and 6.
Premolt expression patterns B and C – cuticular synthetic stages
Patterns B and C are peaked at P6 and P7, respectively, and are major synthetic stages of cuticular proteins comprising 20% and 10% of the expression in stages P6 and P7, respectively. P6 is correlated to the presence of the new underlying cuticle and to maximal levels of circulating ecdysteroids, and the expressed proteins include the majority of the R&R domain-containing proteins (6.1% of the expression in P6), the CBM proteins (1.54%) and the Crustacyanins (0.17%) interpreted to be the building proteins of the epi- and exocuticle. The new cuticle is heavily pigmented at P6, explaining the need for the Crustacyanins at this stage in the epicuticle.
The proteins peaking at P7 but that were also present at stage P6 included the crustacean cuticular (CC) domain-containing protein, particular R&R and CBM proteins, three chitinases and two CHS contigs. It is difficult is to explain the function of these three chitinases, but it is suggested here to participate in final pre-ecdysis cuticular degradation. CHS is required for the production of the chitin component of the new cuticle, but the percentage expression of the gene from the total transcription at P6–P7 is low, 0.06%. Considering the relatively low percentage expression at P7 in general and the lack of descending shoulder, it is hypothesized that the P7 peak is only the increasing shoulder of a peak occurring at postmolt, required for the formation of the endocuticle, the membranous layer and in order to accomplish the cuticular calcification. Consequently, it is assumed that the major part of chitin synthesis by CHS takes place at postmolt. The transcription level of the CC domain-containing protein, which has been previously linked to the calcification of the cuticle (Andersen, 1999), would be increased as part of the increased postmolt calcification, and the transcription level of the three chitinases may contribute to a still unknown chitin modification that is required for the construction of the endocuticle or the membranous layer. A future gene expression study similar to the present one starting at P7 across ecdysis and through postmolt is needed to confirm this hypothesis.
The heterogeneous expression pattern E – collagens and CDAs
Collagens and CDAs are expressed according to the heterogenous pattern E (Fig. 6). However, our interpretation of the expression pattern of each of them is different. Collagen is the main animal structural protein in connective tissues (Shoulders and Raines, 2009) and it is part of the hypodermal connective tissue of crustaceans. The seven prominent collagens demonstrate two relatively low peaks at P2 and P4, lying along a clear line of increasing expression from P1 to P7. This trend of increase is interpreted to be more important than the two peaks, which may be virtually formed owing to the relativity of the RNA-seq analysis. Collagens are located in the connective sub-epithelial tissue, and they are probably synthesized by connective tissue cells and not by the epithelium. They are hypothesized to be transiently needed during premolt to provide physical support to the hypodermal epithelium and the flexible and thin new cuticle. During intermolt, this support is provided by the hard and stable cuticle, which is in close proximity to the hypodermis. Hence, a follow-up hypothesis claims that collagen synthesis would be decreased simultaneously with the hardening of the postmolt cuticle. Interestingly, two collagens (Pc_113201 and Pc_113202, TSA GBEV01000000) also contain the R&R domain, indicating that they participate in the binding of the connective tissue to the chitin component of the cuticle.
Differing from collagen, the two pattern E peaks of CDA at P4 and P6 are prominent, and the trend of increasing expression is less pronounced. P2 is missing and attenuation of expression is observed at P7. Its relative expression at its peak is 0.25% of the total expression. The function of the prominently expressed CDA at stages P4 and P6 here in relation to its peaks is difficult to interpret, including the role of the LDL receptor domain class A (see also the Chitin deacetylase discussion above). No new cuticle is present at P4 in contrast to P6, in which new cuticle exists.
Expression pattern F – chitinase
Pattern F is characterized by four chitinase contigs that peaked at P4 and, reasonably, participate in the degradation of the old cuticle.
The cytoskeleton-related expression patterns A and B
An interesting group of differentially expressed proteins is the cytoskeletal proteins that belong to two types of cytoskeleton filaments, the actin filamentous system, and the microtubules constructed of tubulins and associated proteins. Most actin-associated contigs here conform to expression pattern A. They are expressed at relatively high levels throughout premolt. The count percentage of all the genes assigned to the actin system in each of the premolt stages is responsible for a large proportion of the transcription activity, specifically in the peaks of stages P2, P5 and P7. They reach 3.1%, 4.9%, 5.1%, 4.2%, 10.2%, 8.4% and 8% in stages P3, P4, P2, P5 and P7, respectively. Other cytoskeletal contigs peaked at P6, namely in pattern B, including only one, major actin contig, the tubulins and several minor members. The total percentage count of the tubulins peaked to 1.33% at P6.
The hypodermis alters its functional expression profile three times during premolt with P4, P6 and P7 revealing a different set of expressed transcripts. The peaks of cytoskeleton synthesis at P2, P5 and P7 (pattern A) and P6 (pattern B) prior to each of the functional changes in expression patterns led us to hypothesize that they are preparatory stages for the following synthesis of functional proteins. P2 is the assumed preparatory stage for the synthesis of CDA, and pattern F chitinases peaked at P4. P5 is the preparatory stage for the synthesis of different groups of proteins, all conform to pattern B, and also for the CDA P6 peak. The P6 tubulin peak and the P7 peak of pattern A is preparatory to pattern C protein expression, which starts at P7 and is hypothesized to reach a peak during postmolt. Consequently, the premolt crustacean hypodermis may serve as a model tissue to study the mechanisms of broad alterations in the profiles of synthesized proteins.
Changes in the expression of cytoskeletal proteins and their rearrangement and/or depolymerization were documented to affect the expression of other genes. It is reasonable that structural changes of the cytoskeleton are also accompanied by gene expression even if it has not been examined in a specific study; Blum and Wicha (Blum and Wicha, 1988) have shown a specific effect of the cytoskeletal actin system and microtubule gene expression on the expression of milk proteins in mammary gland cell cultures. Rossete and Karin (Rossete and Karin, 1995) have shown that microtubule depolymerization induces the expression of the transcription factor NF-κB. Németh et al. (Németh et al., 2004) have shown a similar effect on NF-κB expression upon actin filament disruption, which is mediated through IκB, an NF-κB inhibitor. Knöll and Beck (Knöll and Beck, 2011) review the induction of another transcription factor system that is related to the dynamic balance between filamentous and globular actin, which leads to the increased migration of globular actin to the nucleus and its effect on nuclear organization and on the transcription serum response factor. Gelsolin, one of the actin-binding proteins that shifts the balance towards globular actin, was expressed here according to pattern B, conforming to the expression peak of a major actin contig. Depolymerization of both actin and microtubules affects the expression of specific genes in specific cells of the retinal tissue (Oren et al., 1999). In grapevines, cytoskeleton rearrangement affects the expression of specific genes in relation to pathogen-defense mechanisms (Qiao et al., 2010). A variety of effectors of the expression of cytoskeletal genes are also well documented (e.g. Girard et al., 1996; Jones et al., 1999; Mayfield et al., 2010; Müller et al., 2007; Sanchez et al., 2012; Sántha et al., 2013). The changes in the expression of actin system members may be related here to increasing ecdysteroid levels, but owing to the complexity of the expression patterns, other factors are hypothesized to be involved as well.
Expression patterns – summary
It has to be kept in mind that the induction of premolt by eyestalk ablation, unlike natural induction, may interfere with other physiological processes, owing to the elimination of a variety of neuropeptide hormones, and may also affect the expression pattern in the premolted hypodermis in a molt-independent context. Fig. 7 summarizes the major transcriptional events during premolt. Genes belonging to each of these events may serve as indicator genes for future studies of the regulatory mechanisms during premolt. It has been previously shown that repeated ecdysone injections into the circulation initiated premolt, bringing it to ecdysis (Shechter et al., 2007). However, the profile of gene expression is complex and changes several times from P1 to P7, and this does not occur in a unidirectional manner. Hence, it is still unresolved how one hormone initiates this complex sequence of events. A crucial time point of the premolt is P7, in which the ecdysone level drops and in which other putative hormones are involved, as already shown in insects and crustaceans (Chung, 2010; Mesce and Fahrbach, 2002; Phlippen et al., 2000). The genes of which the expression peaked at P7 may, for future studies, serve as indicators of the hormonal effects toward the end of premolt and across postmolt.
The increase of both the percentage of induced reads and contigs in P2 (Fig. 2) is postulated to be related to cellular preparatory events, exemplified by the increase in cytoskeletal synthetic activity. The gradual increase in the percentage of induced reads, the reduction in the number of induced contigs and the increase in the number of attenuated contigs from P4 to P7 (Fig. 2) is interpreted to occur as a result of a gradual increase of the absolute expression level, most of it contributed by a few structural genes of the cuticle with relative attenuation of many other genes.
MATERIALS AND METHODS
Procambarus clarkii maintenance and experimental design
Several catches of P. clarkii males were brought from streams in Brancolo's canal (45°46′N, 13°30′E) in the Friuli Venezia Giulia region in northeastern Italy between June and August 2013. The external sex organs were fully mature in all the sampled males, judged according to Taketomi et al. (Taketomi et al., 1996). Males were kept in 120 l tanks provided with closed-circuit filtered and aerated tap water at ~20–22°C and were fed with fish pellets (Sera granular, Heinsberg, Germany) three times per week. A day after the arrival of each catch, the males were induced to enter premolt by ablating their eyestalks and were gradually sacrificed between 1 and 14 days. The hypodermis underlying the anterior part of the carapace was sampled, taking care not to sample muscles adhered to the cuticle at that region, and immediately homogenized in RNA extraction solution (TRI Reagent, Sigma, USA) and kept at −70°C for RNA extraction. Carapace length, gastrolith width and the presence of new underlying cuticle were recorded upon sacrifice followed by determination of the MMI according to Shechter et al. (Shechter et al., 2007). Briefly, MMI is the width of the gastrolith divided by carapace length, expressing the size of the gastrolith normalized to the size of the individual.
RNA was extracted later according to the TRI Reagent manufacturer's instructions (Sigma, USA) and further purified on an RNA column (E.Z.N.A.™ MicroElute RNA Clean-up Kit, Omega Bio-Tek, USA). The quantity was evaluated by spectrophotometry (NanoDrop 2000, Thermo Scientific, USA), and the quality was tested by capillary electrophoresis (Bioanalyzer 2100, Agilent, USA). For sequencing of 50 bp single-end reads using the Illumina Hiseq 2000 device, 4 μg of total RNA from 18 males was sent to the Istituto di Genomica Applicata (IGA, Udine, Italy). Thirteen samples were run on each lane of the sequencer, differently labeled by TrueSeq primers (Illumina, USA).
Levels of ecdysteroids in the hemolymph
Hemolymph samples were taken at the experimental termination, immediately diluted in three volumes of methanol, thoroughly mixed, kept on ice for around 2 h and then centrifuged for 15 minutes at 12,000 g at 4°C. The supernatants were retrieved, dried by centrifugation under vacuum and stored at −20°C until further analysis. The hemolymph samples were assayed to evaluate the circulating levels of ecdysteroids according to Chung (Chung, 2010).
Digital transcript expression analysis
Digital gene expression analysis (designated RNA-seq) is aimed at evaluating the relative amount of transcripts in a sampled RNA population. It requires a reference transcript sequence assembly. The processing and analysis of the obtained raw sequences for both the preparation of the reference assembly and the digital gene expression analysis was carried out using the CLC Genomics Workbench 6.51 software (CLC Bio, Aarhus, Denmark).
Preparation of a reference assembly
Short reads were assembled to longer conceptual transcripts, termed contigs, using the automatically set-up parameters of the CLC software, resulting in a comprehensive transcript assembly aimed at the best possible representation of the expressed transcriptome in the target tissues, in terms of the number and length of transcripts. The minimum allowed length of an assembled contig was set to 200 bp. The following reads were used to produce this assembly: (1) ~387 million paired 100 bp reads originated from a variety of P. clarkii tissues, including hypodermis and Y organ, aimed at constructing a comprehensive multi-tissue transcriptome which will be described and analyzed in detail elsewhere (C.M., unpublished observation); (2) ~86 million paired 100 bp reads originated from P. clarkii eyestalk (C.M., unpublished observation); and (3) ~62 million single-end 50 bp reads originated from the results of the present experiment after duplicate elimination. The resulting assembly was further elaborated using the quality control tools of the NCBI TSA database. These tools include identification of residual sequencing primers, long N rows artificially produced by the CLC software in poor mapping regions between paired reads and contaminating sequences, mainly mammalian, bacterial and viral ones. Residual primers were eliminated from the contigs, N rows were shortened and foreign contigs were omitted from the assembly. It has to be noted that the present assembly was constructed as a general tool for future studies of P. clarkii physiology and gene regulation in a variety of biological systems and scenarios. To provide a more efficient operational reference assembly adapted to the present needs, the initial comprehensive assembly was back-mapped to the sample-specific 50 bp sequences of the present experiment. Only contigs that were mapped to a total of more than 500 reads were included in the reference assembly, which contained 95% of the total mapped reads, creating a robust set of contigs that were not subject to minor random expression fluctuations or to zero counts, this set of reads was used thereafter as the reference assembly for the following RNA-seq procedure.
RNA-seq was performed as described by Wang et al. (Wang et al., 2009) and the CLC instructions, based on the mapping of a sample-specific population of short sequence reads to the contigs of a reference assembly and counting the mapped reads to each contig. The software was allowed to accept up to two mismatches per mapped read, and a read was accepted as mapped if aligned to a maximum of five contigs. Reads that were mapped to two to five contigs were randomly mapped to one of them and considered as non-specifically mapped. Non-specific reads did not participate in the following RNA-seq analysis but were used to evaluate the uniqueness of the mapping. The expression values were normalized to the approximate actual total reads per sample, 10 million reads. The Baggerly's test (Baggerly et al., 2003) was applied to identify statistically significant differential expression in comparison with the control group.
Characterization of the transcriptome
The comprehensive transcriptome assembly was initially characterized using the TRINOTATE tool of the TRINITY software (Grabherr et al., 2011). The annotation includes: homology search to known sequence data (NCBI BLASTX) (Altschul et al., 1997), identification of conceptual protein sequence, protein domain identification using the domain database Pfam as a reference database (Punta et al., 2012) and signal peptide prediction using the SignalP 4.1 database as a reference (Petersen et al., 2011). Manual contig-by-contig analysis was performed for contigs of interest using the alignment software MEGA5.1 (Tamura et al., 2011), assisted by the above mentioned databases and also other domain databases, such as the CCD of the NCBI (Marchler-Bauer et al., 2013), Prosite (Sigrist et al., 2010) and the transmembrane domain database (Käll et al., 2004).
Federica Cattonaro and the team of IGA (Udine, Italy) are thanked for performing the Illumina sequencing.
The study was supported by the Israel Science Foundation [project 102/09 to M.S. and A.S.]; the European Project LIFE10 RARITY [NAT/IT/000239 to P.G.G. and A.P.]; the Italian Progetti di Ricerca di Interesse Nazionale (PRIN) 2010-11 [Prot. 20109XZEPR_002] - Ministero dell'Istruzione, dell'Università e della Ricerca to P.G.G. and A.P.
The authors declare no competing financial interests.