ABSTRACT
For over a century, researchers have been comparing embryogenesis and regeneration hoping that lessons learned from embryonic development will unlock hidden regenerative potential. This problem has historically been a difficult one to investigate because the best regenerative model systems are poor embryonic models and vice versa. Recently, however, there has been renewed interest in this question, as emerging models have allowed researchers to investigate these processes in the same organism. This interest has been further fueled by the advent of high-throughput transcriptomic analyses that provide virtual mountains of data. Here, we present Nematostella vectensis Embryogenesis and Regeneration Transcriptomics (NvERTx), a platform for comparing gene expression during embryogenesis and regeneration. NvERTx consists of close to 50 transcriptomic data sets spanning embryogenesis and regeneration in Nematostella. These data were used to perform a robust de novo transcriptome assembly, with which users can search, conduct BLAST analyses, and plot the expression of multiple genes during these two developmental processes. The site is also home to the results of gene clustering analyses, to further mine the data and identify groups of co-expressed genes. The site can be accessed at http://nvertx.kahikai.org.
INTRODUCTION
A long-standing question in the field of regeneration is to what extent regenerative programs recapitulate development. Comparing gene expression during these two processes provides clues as to how genes activated during embryogenesis are re-deployed during regeneration. The majority of studies performing this comparison focus on the role of individual or small groups of genes (Binari et al., 2013; Carlson et al., 2001; Gardiner et al., 1995; Imokawa and Yoshizato, 1997; Kaloulis et al., 2004; Katz et al., 2015; Millimaki et al., 2010; Özpolat et al., 2012; Reitzel et al., 2007; Torok et al., 1998; Wang and Beck, 2014). Studies comparing transcriptomes of embryogenesis and limb regeneration in axolotls and zebrafish have been successful in identifying differentially expressed genes involved in these processes (Habermann et al., 2004; Mathew et al., 2009). A comparison of whole-body regeneration to embryogenesis has yet to be performed, and could help with further improving our understanding of how genes are used during embryogenesis and re-used during regeneration. One organism is especially amenable to this line of study: the sea anemone Nematostella vectensis (Fig. 1A) (Layden et al., 2016; Reitzel et al., 2007).
Nematostella has been used as a research model for embryonic development (Finnerty et al., 2004; Kusserow et al., 2005; Matus et al., 2006; Rentzsch et al., 2006; Wikramanayake et al., 2003). Nematostella reproduce sexually and, after fertilization, the zygote undergoes a series of cleavages to form a blastula. Gastrulation occurs at the animal pole and, shortly thereafter, the embryo enters a swimming planula stage during which the pharynx and internal structures, termed mesenteries, develop. After several days, this planula larva settles, develops tentacles, and enters a juvenile stage. Nematostella development research entered the age of genomics with the sequencing of its genome by Putnam and colleagues in 2007 (Putnam et al., 2007; Technau and Schwaiger, 2015). Since then, a large number of developmental genes have been identified in Nematostella, and many commonalities between Nematostella and bilatarian development have emerged (Amiel et al., 2017; Burton and Finnerty, 2009; Darling et al., 2005; Genikhovich et al., 2015; Layden and Martindale, 2014; Layden et al., 2012; Leclère et al., 2016; Matus et al., 2008; Meyer et al., 2011; Reitzel et al., 2007; Röttinger et al., 2012). With the advent of high-throughput transcriptomics, several studies have examined gene expression during embryogenesis at the whole-genome level (Helm et al., 2013; Fischer and Smith, 2013; Fischer et al., 2014; Tulin et al., 2013), firmly establishing Nematostella as an embryonic model.
More recently, Nematostella has shown to be a powerful model for regeneration. Upon bisection, Nematostella is capable of regenerating the missing body half after ∼6 days postamputation (Bossert et al., 2013). Following subpharyngeal amputation (head removal), regeneration occurs via a highly dynamic process: first, there is an initial wound healing phase of ∼6 h, then regeneration follows a stereotypic program in which the mesenteries fuse and, via subsequent cell proliferation, reform the missing pharynx and tentacles over the course of 6 days (Amiel et al., 2015). This process has been shown to be both cell proliferation dependent (Passamaneck and Martindale, 2012) and utilize dynamic tissue rearrangement, with large portions of unamputated tissue contributing to the reformed tissue (Amiel et al., 2015). The existence of adult stem cells and the role they might play in regeneration have yet to be uncovered. This process is known to use several developmental signaling pathways originally deployed during embryogenesis (DuBuc et al., 2014; Schaffer et al., 2016; Trevino et al., 2011). It remains unclear, however, if these pathways are deployed the same way, i.e. with similar or divergent regulatory logic. One way to address this question is to systematically compare gene expression profiles during embryonic development and regeneration, to identify groups of genes originally used during embryogenesis that are re-used during regeneration. To facilitate this line of study, we created N.vectensis Embryogenesis and Regeneration Transcriptomics (NvERTx) – a quantitative gene expression database for comparing embryogenesis and regeneration in the sea anemone N. vectensis. NvERTx comprises several data sets spanning embryogenesis (Helm et al., 2013; Fischer and Smith, 2013; Fischer et al., 2014; Tulin et al., 2013; this study) and regeneration (this study). We used pooled RNA sequencing (RNAseq) data from Nematostella embryogenesis and regeneration to generate a de novo transcriptome assembly. Using this assembly, we then quantified each of the RNAseq data sets and clustered the transcripts to discover groups of genes that share similar expression. This tool can be used to find transcript sequences, identify co-expressed genes, and directly compare expression profiles during embryogenesis and regeneration. All of these data can be found in a searchable database that is accessible at www.nvertx.kahikai.org.
RESULTS AND DISCUSSION
NvERTx database: de novo assembly, annotation and data quantification
NvERTx is a quantitative gene expression database for embryogenesis and regeneration, consisting of several RNAseq data sets. It includes previously published data sets spanning very early embryogenesis to polyp from Helm et al. (2013) [sampled 2, 7, 12, 24, 120 and 240 h postfertilization (hpf)] and Fischer and Smith (2013) and Fischer et al. (2014) (sampled 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 and 19 hpf). To complement these and sample time points during tentacle genesis and pharynx formation, we generated an additional data set sampled at 24, 48, 72, 96, 120, 144, 168 and 192 hpf. Together, these data sets cover the major hallmarks of Nematostella development, including blastula (12-24 hpf), gastrula (24-48 hpf), planula (48-120 hpf) and juvenile (120-244 hpf) stages. The regeneration RNAseq data were sampled from 6-week-old juveniles after subpharyngeal amputation at –1 (uncut), 0, 2, 4, 8, 12, 16, 20, 24, 36, 48, 60, 72, 96, 120 and 144 h postamputation (hpa) (Fig. 1B). We chose these time points because they span the most important events of regeneration, including wound healing (0-6 hpa), pharynx reformation (24-72 hpa) and tentacle reformation (72-120 hpa) (Fig. 1A,B) (Amiel et al., 2015), all stages for which we have embryonic data spanning the initial development of these structures. For each of these data sets, we obtained the raw sequencing reads and used these as input for our quantitative workflow.
Sampling strategy of data sets used in NvERTx. (A) Nematostella anatomy. Nematostella is a small sea anemone (∼5 cm), with a mouth (M), pharynx (P) tentacles (Ten) and a body column with internal structures called mesenteries (Mes), the posterior section of which is termed the physa (Phy). (B) Schematic of RNAseq samples included in the NvERTx database. Three data sets spanning embryogenesis are included: one including data from Fischer and Smith (2013) and Fischer et al. (2014), sampled hourly from 0 hpf to 19 hpf; one including data from Helm et al. (2013), sampled at 2, 7, 12, 24, 120 and 240 hpf; and data from this study, sampled daily from 24 hpf to 192 hpf (250 embryos/time point, biological duplicates). Regeneration was sampled from 6-week-old animals at –1 (uncut), 0, 2, 4, 8, 12, 16, 20, 24, 36, 48, 60, 72, 96, 120 and 144 hpa (300 juveniles/time point, biological triplicates).
Sampling strategy of data sets used in NvERTx. (A) Nematostella anatomy. Nematostella is a small sea anemone (∼5 cm), with a mouth (M), pharynx (P) tentacles (Ten) and a body column with internal structures called mesenteries (Mes), the posterior section of which is termed the physa (Phy). (B) Schematic of RNAseq samples included in the NvERTx database. Three data sets spanning embryogenesis are included: one including data from Fischer and Smith (2013) and Fischer et al. (2014), sampled hourly from 0 hpf to 19 hpf; one including data from Helm et al. (2013), sampled at 2, 7, 12, 24, 120 and 240 hpf; and data from this study, sampled daily from 24 hpf to 192 hpf (250 embryos/time point, biological duplicates). Regeneration was sampled from 6-week-old animals at –1 (uncut), 0, 2, 4, 8, 12, 16, 20, 24, 36, 48, 60, 72, 96, 120 and 144 hpa (300 juveniles/time point, biological triplicates).
To quantify the RNAseq data, we performed a de novo transcriptome assembly, which we term NvERTx. We assembled this transcriptome using the short-read assembler Trinity (Haas et al., 2013), with combined paired-end reads from our regeneration data set and additional embryonic data from Tulin et al. (2013), sampled at 0, 6, 12, 18 and 24 hpf, as input (see Materials and Methods for complete workflow). The resulting assembly includes 234,381 transcripts with an N50 of 1678 and an average length of 837.30 bp (see Table S1 for assembly statistics). Each transcript is identified with a unique NvERTx.4 number. To annotate the transcriptome, we first identified the 231,294 transcripts with an open reading frame (ORF) and extracted the coding sequences using OrfPredictor (Min et al., 2005). We then compared the resultant protein sequences with NCBI's nonredundant protein database (NR) and the UniProt database using the Basic Local Alignment Search Tool (BLAST)-like tool PLAST (Nguyen and Lavenier, 2009). From these analyses we identified 85,475 transcripts with a significant hit to NR (e-value <5e-5), and 69,335 transcripts with a significant hit to the UniProt databases (e-value <5e-5). Additionally, we compared each transcript with the current Nemve1 gene predictions (https://genome.jgi.doe.gov/Nemve1/Nemve1.home.html) using nucleotide BLAST (BLASTn), and identified 102,581 transcripts with a significant hit (e-value <5e-5) (Kent, 2002). Of these 102,581 Nemve1 hits, 19,565 represent unique Nemve1 ‘genes’. We found that 110,531 transcripts did not hit any of the three databases, a proportion typical of Trinity assemblies (Conesa et al., 2016), and could be noncoding sequences or other assembly artifacts (see Table S2 for full annotation statistics). For each of the 234,381 NvERTx transcripts, we provide all available annotations. We then used the transcriptome assembly to quantify each of the RNAseq data sets. We did this by aligning the reads using Bowtie2 (Langmead et al., 2009) and quantifying each transcript using RSEM and edgeR (Li and Dewey, 2011; Robinson et al., 2010). Transcripts with the same best Nemve1 hit were combined to obtain ‘gene-level’ quantification. To validate these data, we compared our RNAseq quantifications with results obtained by performing quantitative reverse transcription PCR (RT-qPCR) for several target genes using our in-house generated data sets, and observed a high level of concordance in their expression profiles (Figs S2 and S3; see Materials and Methods, ‘RNAseq quality assurance’ section). Finally, the three embryonic data sets were corrected for batch effects using the sva R package with time point as a categorical covariate (Leek et al., 2012) (see Materials and Methods for details). The quantified data sets are reported for each transcript in the database.
Retrieving expression plots, count tables and sequences
The NvERTx database can be accessed using multiple points of entry: by searching annotations (Fig. 2Ai), using BLAST (Fig. 2Aii) or exploring the co-expression clusters (Fig. 2Aiv) (see ‘Exploring gene expression clusters’ section). To demonstrate this, we use the Nematostella Brachyury protein (NvBra, NCBI GenBank ID AAO27886.2) as an example. To retrieve transcripts corresponding to Nvbra, we can use the search function to query the transcript annotations and enter the Nemve1 gene model ID from the current genome assembly (jgi|Nemve1|770), the NCBI GenPept accession number (gi|122058623), the NCBI GenBank accession number (AAO27886.2), or the gene name (‘Brachyury’) (Fig. 2Ai). Using any of these queries identifies several transcripts that correspond to Nvbra (Fig. 2B). Multiple matching transcripts reflect the different isoforms predicted by the transcriptome assembler. We can confirm which transcripts correspond to Nvbra by examining the annotations that are reported with the search results. It is normal for different isoforms to have slightly different annotations as each transcript was individually annotated. Clicking the NvERTx.4 numbers fills in the field on the left of the screen, enabling the user to directly compare their expression profiles by clicking ‘Plot!’ (Fig. 2Aiii).
NvERTx points of entry. (A) Screenshot of the NvERTx portal. (Ai) Users can search for genes using the gene name, Nemve1 accession number, or NCBI GenBank accession number. (Aii) The transcriptome can also be searched using BLASTn or tBLASTn. (Aiii) Multiple transcripts can be queried simultaneously. (Aiv) Users can also directly explore co-expression clusters from embryogenesis and regeneration to identify groups of co-expressed genes. (B) Screen shot of results from searching the annotations using the term ‘Brachyury’. (C) tBLASTn results using Mus musculus Brachyury as a query (GenBank AAI20808.1) identify several homologous transcripts. The top scoring isoform is reported first.
NvERTx points of entry. (A) Screenshot of the NvERTx portal. (Ai) Users can search for genes using the gene name, Nemve1 accession number, or NCBI GenBank accession number. (Aii) The transcriptome can also be searched using BLASTn or tBLASTn. (Aiii) Multiple transcripts can be queried simultaneously. (Aiv) Users can also directly explore co-expression clusters from embryogenesis and regeneration to identify groups of co-expressed genes. (B) Screen shot of results from searching the annotations using the term ‘Brachyury’. (C) tBLASTn results using Mus musculus Brachyury as a query (GenBank AAI20808.1) identify several homologous transcripts. The top scoring isoform is reported first.
Similarly, the BLAST tool can be used to retrieve NvERTx.4 transcripts using either a nucleotide (BLASTn) or protein (tBLASTn) query. The reported alignments can be then used to identify the correct transcripts. For example, using tBLASTn against the transcriptome with Mus musculus Brachyury (NCBI GenBank accession number AAI20808.1), we find several homologous transcripts, the first of which is Nvbra (Fig. 2C). Again, clicking the NvERTx.4 number fills in the field on the left of the screen, enabling the user to obtain the expression plots and annotations by clicking ‘Plot!’ (Fig. 2Aiii).
Once the NvERTx.4 IDs for Nvbra, NvERTx.4.100808 and NvERTx.4.100809, are selected we can query the database by clicking ‘Plot!’ on the left (Fig. 2Aiii). The first page that appears displays the transcripts’ expression during regeneration and embryogenesis (Fig. 3A). We can see that the expression of Nvbra exhibits two peaks during regeneration, beginning at 8 hpa and 60 hpa, while during embryogenesis, Nvbra is expressed early, rapidly peaks at 20 hpf, then decreases throughout development. Note that the expression profiles for the two transcripts are perfectly superimposed and appear as one. This is because transcripts corresponding to the same Nemve1 best hit are quantified equivalently as they are from the same ‘gene’. To distinguish transcript isoforms from separate genes we can compare the individual transcripts in the ‘alignment’ tab, where a MUSCLE sequence alignment is reported (Edgar, 2004) (Fig. 3F). In this case, we observe that NvERTx.4.100809 is a longer assembled isoform of Nvbra. The results tabset also includes the normalized count tables (Fig. 3B), transcript annotations (Fig. 3C), the sequences in FASTA format (Fig. 3D), and links to bibliographical resources including PubMed articles citing the protein and a PaperBlast query (Price and Arkin, 2017) (Fig. 3E). In the annotations tab, we can also see which co-expression cluster the transcript belongs to for embryogenesis and regeneration. Exploring these clusters is very useful for identifying co-expressed genes.
Example results for the transcripts NvERTx.4.100808 and NvERTx.4.100809. (A) Expression plots for regeneration (left) and embryogenesis (right). The two transcripts are isoforms of the same genes so their expression profile plots and counts are equivalent. (B) Count data from each of the data sets. (C) Transcript annotations. (D) Sequences in FASTA format (the second sequence is shown cropped owing to space limitations). (E) Bibliographical resources including PubMed links and PaperBlast queries. (F) MUSCLE alignment to compare similar transcripts. The alignment shown is cropped to display the homologous region.
Example results for the transcripts NvERTx.4.100808 and NvERTx.4.100809. (A) Expression plots for regeneration (left) and embryogenesis (right). The two transcripts are isoforms of the same genes so their expression profile plots and counts are equivalent. (B) Count data from each of the data sets. (C) Transcript annotations. (D) Sequences in FASTA format (the second sequence is shown cropped owing to space limitations). (E) Bibliographical resources including PubMed links and PaperBlast queries. (F) MUSCLE alignment to compare similar transcripts. The alignment shown is cropped to display the homologous region.
Exploring gene expression clusters
Co-expression analysis is particularly useful to identify genes that function in the same gene regulatory module. Co-expressed genes can also represent groups of genes that participate in a similar biological function. One method for identifying co-expressed genes is to cluster genes or transcripts by expression profile. For NvERTx, we performed fuzzy c-means clustering to regroup genes by expression profile, and provide those clusters in the ‘Co-expression clusters’ section of the site. These clusters can be used to identify transcripts that are co-expressed with a given gene of interest. Furthermore, comparing the membership of gene clusters during embryogenesis and regeneration can be used to identify groups of genes that function similarly during these processes.
The gene expression clusters can be browsed by either clicking a cluster in the ‘Co-expression clusters’ section (Fig. S1A) or by following a direct link from the annotation table of a transcript to its cluster (Fig. 3C). Using our previous example, Nvbra, from the annotations results, we can see that Nvbra participates in regeneration cluster two (R-2). When exploring R-2, we see all of the transcripts found in the cluster sorted by membership score. The score reflects how strongly a gene's expression matches the cluster core. By plotting several high-scoring transcripts, we can identify groups of co-expressed genes. For example, when we plot NvERTx.4.40781 (best NR hit: XP_015758878.1 forkhead box protein G1-like Acropora digitifera), NvERTx.4.57897 (best NR hit: AOP31964.1 dickkopf3-like 1 N. vectensis) and NvERTx.4.100808 (best NR hit: AAO27886.2 Brachyury protein N. vectensis), we see that they are indeed co-expressed with an initial expression peak between 4 hpa and 16 hpa, followed by a gradual rise from 36 hpa onward (Fig. S1Bii). By contrast, these genes are not co-expressed during embryogenesis and exhibit divergent expression patterns (Fig. S1Bi), raising the hypothesis that this particular grouping of genes is unique to regeneration. Using this method to find co-expressed genes is an effective way of identifying potential gene-regulatory modules and gene batteries. Importantly, assessing whether these genes are co-expressed during regeneration and embryogenesis can shed light on how these gene batteries are used or re-used during these two processes.
Differentially expressed genes
Comparing which genes are differentially expressed during embryogenesis and regeneration can provide important clues about how genes are initially deployed during embryonic development and if/how they are reused during regeneration. Systematically analyzing genes differentially expressed during regeneration has allowed for the discovery of ‘regeneration-specific’ genes in other models, including axolotl, zebrafish and newts (Bryant et al., 2017; Knapp et al., 2013; Looso et al., 2013; Mathew et al., 2009). To facilitate these lines of inquiry using NvERTx, we provide the results of intra-data-set pairwise differential gene-expression testing on the summarized Nemve1 genes. For each data set, we compare each time point to t0, which is defined as 0 hpa for the regeneration data, 7 hpf (the estimated beginning of zygotic transcription) for the Fischer and Smith (2013), Fischer et al. (2014) and Helm et al. (2013) data, and 24 hpf (the first time point sampled) for the in-house embryonic data. The results of this testing can be queried on the DE Genes page, and an interactive volcano plot displaying the negative log10 false-discovery rate (FDR) as a function of fold change is generated (Fig. 4Ai,Bi). Each transcript is plotted as a single point, and multiple isoforms for a single gene are superimposed. Users can use the plot tools to click or drag-select the transcripts that are then displayed in the table below (Fig. 4Aii,Bii). These transcripts can then be compared by ticking the boxes in the table and querying the database as described above.
Exploration of differentially expressed genes using the ‘DE Genes’ tool. Intra-data-set differential expression testing for summarized Nemve1 gene models. Each time point is compared with t0, defined as 0 hpa for the regeneration data, 7 hpf (the estimated beginning of zygotic transcription) for the data from Fischer and Smith (2013), Fischer et al. (2014) and Helm et al. (2013), and 24 hpf (the first time point sampled) for the in-house embryonic data. (Ai,Bi) When a user selects a comparison, a volcano plot displaying the –log10(FDR) as a function of fold change is generated. Significant transcripts [FC>log2(2) and FDR <0.05] are colored magenta. Using the plot tools, multiple transcripts can be selected (red dotted line box, Ai,Bi) and a table is generated showing the corresponding transcripts (Aii,Bii). A user can compare these transcripts by ticking the corresponding box which fills the form on the left of the page. (Aiii,Aiv,Biii,Biv) The expression profiles shown here are then found on the results page.
Exploration of differentially expressed genes using the ‘DE Genes’ tool. Intra-data-set differential expression testing for summarized Nemve1 gene models. Each time point is compared with t0, defined as 0 hpa for the regeneration data, 7 hpf (the estimated beginning of zygotic transcription) for the data from Fischer and Smith (2013), Fischer et al. (2014) and Helm et al. (2013), and 24 hpf (the first time point sampled) for the in-house embryonic data. (Ai,Bi) When a user selects a comparison, a volcano plot displaying the –log10(FDR) as a function of fold change is generated. Significant transcripts [FC>log2(2) and FDR <0.05] are colored magenta. Using the plot tools, multiple transcripts can be selected (red dotted line box, Ai,Bi) and a table is generated showing the corresponding transcripts (Aii,Bii). A user can compare these transcripts by ticking the corresponding box which fills the form on the left of the page. (Aiii,Aiv,Biii,Biv) The expression profiles shown here are then found on the results page.
For example, when we select the three transcripts with the most dramatic fold change during embryogenesis from 7 hpf to 24 hpf (Fig. 4Ai, red dotted line box), we identify several transcripts corresponding to three predicted proteins (Fig. 4Aii). Querying the database shows that indeed these genes are highly expressed at 24 hpf (Fig. 4Aiii). Conversely, these genes show little variation during regeneration (Fig. 4Aiv). Likewise, when we examine differential expression during regeneration at 24 hpa, three genes show a large fold change (Fig. 4Bi, red dotted line box). Selecting these genes shows two predicted proteins and a T-box transcription factor (Fig. 4Bii). Comparing these genes, we see that all three are indeed upregulated at 24 hpa (Fig. 4Biv). Of these three, only the two predicted proteins show significant variation during embryogenesis (Fig. 4Biii, NvERTx.4.229217, NvERTx.4.119508), while no embryonic data are found for NvERTx.4.207772, meaning that this gene is not expressed at a detectable level in any of the embryonic data sets and could represent a ‘regeneration-specific’ gene.
Conclusion and future directions
NvERTx provides a platform to efficiently compare gene expression during embryogenesis and regeneration in Nematostella. Additionally, the comprehensive transcriptome provides high-quality transcript models that can be used to identify gene sequences. Using co-expression clusters, one can explore groups of genes that share similar expression patterns during embryogenesis and regeneration. Finally, mining the differentially expressed genes enables the identification of embryogenesis or ‘regeneration-specific’ genes. All of these tools are aimed at inspiring and building hypotheses concerning embryogenesis and regeneration for Nematostella and non-Nematostella researchers alike. Users can use their own groups of co-expressed genes to test for conservation of regeneration gene batteries, or to explore gene expression clusters to identify genes that share expression, and examine these in their own models.
As this web application is intended to complement and expand upon existing resources, we provided transcript models that have been annotated using a variety of gene/protein databases (NR, trEMBL, Nemve1). Sequencing technologies are evolving to achieve longer reads, and assemblers will soon provide more robust transcriptomes; in the future, we plan to take advantage of these technologies to improve our transcript models. We also plan to grow the database as future data sets examining embryogenesis and regeneration emerge. Finally, we foresee merging this resource with an existing spatial gene expression database found at http://www.kahikai.org/index.php?content=genes (Ormestad et al., 2011). This will enable the identification of syn-expression groups, genes that are co-expressed both spatially and temporally (Niehrs and Pollet, 1999), and further facilitate studies comparing differential gene usage during embryogenesis and regeneration.
MATERIALS AND METHODS
Animal culture, spawning and amputation
Adult N. vectensis were cultured at 16°C in the dark in 1/3 strength artificial sea water (ASW) as previously described (Amiel et al., 2017). Spawning was induced by feeding the animals with oysters the day before and transferring the animals to a light table for 12 h. Regeneration experiments were performed using 6-week-old juveniles as previously described (Amiel et al., 2017).
Transcriptomic data sets
The sequences that served as input into our de novo assembly consisted of two data sets: one spanning the first 24 h of embryogenesis originally reported by Tulin et al. (2013), and another spanning the first 144 h of regeneration generated in-house (see ‘Library preparation and sequencing’ section). The embryonic data set was downloaded from the Woods Hole Open Access Server (http://darchive.mblwhoilibrary.org/handle/1912/5613, last accessed 1 June, 2017) and includes Illumina HiSeq 100 bp paired-end sequencing prepared from Nematostella embryos at 0, 6, 12, 18 and 24 hpf. The regeneration data set includes Illumina NextSeq 75 bp paired-end sequencing from regenerating Nematostella at –1 (uncut), 0, 2, 4, 8, 12, 16, 20, 24, 36, 48, 60, 72, 96, 120 and 144 hpa (see ‘Library preparation and sequencing’ section for details). The sequence data that were used for transcript quantification are composed of four data sets. Three of the data sets spanned embryogenesis: one originally reported by Fischer and Smith (2013) and Fischer et al. (2014) (http://darchive.mblwhoilibrary.org/handle/1912/5981, last accessed 1 June, 2017) (Illumina HiSeq 100 bp paired-end replicates sampled hourly from 0 hpf to 19 hpf), a second embryonic data set originally reported in Helm et al. (2013) [NCBI Sequence Read Archive (SRA) project, PRJNA189768] (Illumina HiSeq 50 bp single-end replicates sampled from 2, 7, 12, 24, 120 and 240 hpf), and a third embryonic data set generated in house, sampled at 24, 48, 72, 96, 120, 144, 168 and 192 hpf (Illumina MiSeq 75Bp single-end replicates). The regenerative data used for quantification are the same as those used for the transcriptome assembly (Illumina NextSeq 75 bp single-end triplicates). Raw reads for the in-house generated data sets can be found in the NCBI SRA BioProject, under accession numbers PRJNA418421 and PRJNA419631.
Library preparation and sequencing
Two novel RNAseq data sets, one spanning embryonic, larval and postmetamorphic development and the other regeneration, were generated for this study. For the embryonic data set, ∼250 embryos per time point were cultured in 1/3 strength ASW at 18°C. At each time point, 24, 48, 72, 96, 120, 144, 168 and 192 hpf, the embryos were transferred to 500 ml Tri Reagent and homogenized for 15 s with a pestle. The resulting lysate was snap frozen in liquid nitrogen and stored at −80°C. This was repeated to obtain duplicates for each time point. After all the samples were collected, the RNA lysate was extracted using two phenol-chloroform extractions and precipitated with isopropanol. For the full extraction protocol, see Layden et al. (2013). The resulting nucleic acids were treated with the TURBO DNA-free kit from Invitrogen (AM1907) for 10 min at 37°C. The resulting RNA was quantified with a Qubit spectrometer, and RNA integrity was checked on an Agilent Bioanalyzer 2100. Then, 100 ng of RNA was used as input for an Illumina TruSeq Stranded mRNA Library Prep for NeoPrep Kit, and the libraries were prepared using an Illumina NeoPrep system. The 75 bp single-end sequencing was carried out on the NextSeq500 sequencer of the Institute for Research on Cancer and Aging, IRCAN Genomics Core Facility, Nice, France.
For the regenerative data set, ∼350 6-week-old Nematostella juveniles per time point were amputated below the pharynx. At each time point, –1, 0, 2, 4, 8, 12, 16, 20, 24, 36, 48, 60, 72, 96, 120 and 144 hpa, the juveniles were transferred to 500 ml Tri Reagent and homogenized for 15 s with a pestle. The resulting lysate was snap frozen in liquid nitrogen and stored at −80°C. This was repeated for each of the three replicates. After all samples were collected, the RNA was extracted as described above for the embryonic samples. Samples were stored in GenTegra-RNA stabilization reagent (GTR5100-S) and shipped to the NextGen Sequencing Core at the University of Southern California, CA, USA, for library preparation and sequencing. The samples were prepared with a KAPA Stranded RNA Kit (KR0960). Two replicates were sequenced as 75-bp single-end sequencing, and one replicate as 75-bp paired-end sequencing, on an Illumina NextSeq500 sequencer.
RNAseq quality control
All reads from each data set were processed equivalently. Reads were first quality filtered to remove low-quality reads and adapter trimmed using timmomatic (Bolger et al., 2014) and cutadapt (Martin, 2011), respectively.
Trinity de novo assembly
For the de novo assembly, paired-end reads from regeneration (–1, 2, 4, 8, 12, 16, 20, 24, 36, 48, 60, 72, 96, 120 and 144 hpa) and embryonic (0, 6, 12, 18 and 24 hpf) (Tulin et al., 2013) data sets were filtered of ribosomal sequences by aligning to Nematostella mitochondrial and ribosomal sequences using Bowtie2 (Langmead et al., 2009) and retaining the unmapped reads. These surviving reads were inputted into Trinity (v2.4.0) for assembly (Haas et al., 2013). To annotate the assembly, the ORFs were found using OrfPredictor (Min et al., 2005) and the resulting peptide sequences were compared with the NCBI NR database using the BLAST-like tool PLAST (Nguyen and Lavenier, 2009), with an e-value cutoff of 5e-5. The transcriptome was also compared with the UniProt databases, Swiss-Prot and trEMBL, using translated BLAST (BLASTx) and protein BLAST (BLASTp), respectively, with an e-value cutoff of 5e-5. The annotations were then compiled using the script totalannotation.py from the ‘Simple Fool's Guide to Population Genomics via RNAseq’ (De Wit et al., 2012).
Quantification
To quantify the RNAseq data, single-end reads for each data set, regeneration and the three embryonic data sets [from Fischer and Smith (2013), Fischer et al. (2014), Helm et al. (2013) and this study], were aligned to the Trinity assembly using Bowtie2 (Langmead et al., 2009). Read counts were quantified using RSEM (Li and Dewey, 2011). To account for the many isoforms per gene reported by Trinity, transcripts were compared with the Nemve1 filtered gene models using BLASTn, and counts for transcripts with the same Nemve1 hit were combined. Transcripts with low-read counts, those that did not have fewer than five counts in at least 25% of the samples, were excluded. Each data set was then normalized separately using the R package edgeR and the counts per million (cpm) mapped reads were calculated (Robinson et al., 2010). Intra-data-set differential expression testing for each Nemve1 gene model was carried out by comparing each time point to t0 using edgeR [t0=7 hpf for the Helm et al. (2013), Fischer and Smith (2013) and Fischer et al. (2014) data sets; t0=24 hpf for the in-house embryonic data set; and t0=0 hpa for the regeneration data set]. A significantly differentially expressed gene is defined as having an absolute fold change (FC) >2 and a false discovery rate (FDR) <0.05. To correct for batch effects of the embryonic data sets after normalization, the R function ComBat from the SVA packages was used on log2(cpm+1) transformed data using time point as a categorical covariate (Leek et al., 2012).
RNAseq quality assurance
ΔCt is the difference of the crossing threshold at the reference time point (defined as 24 hpf for the in-house embryonic data and 0 hpa for the regeneration data) and time point of interest, and E is the efficiency of the primer pair (Pfaffl et al., 2002). Actin was used as a reference gene (Forward: 5′-GGACAGGTCATCACCATTGGCAAC-3′; Reverse: 5′-CGGATTCCATACCCAGAAAGGAGG-3′; efficiency, 2.11). All other primer sequences, efficiencies and resultant traces are listed in Figs S2 and S3. Overall, the RT-qPCR expression profiles correlate with those of the RNAseq data sets (Figs S2 and S3).
Fuzzy c-means clustering and gene ontology term enrichment
The expression profiles for each Nemve1 gene model were clustered using the R package mFuzz (Kumar and Futschik, 2007) on the batch-corrected combined embryonic data set and the regeneration data set separately. The cluster number was set to 9 for the regeneration data and 8 for the embryonic data sets as these numbers represented the point at which the centroid distance between clusters did not significantly decrease when new clusters were added (inflection point). For each cluster, gene ontology (GO) term enrichment was calculated using a Fisher's exact test and the R package topGO on the GO terms identified from comparing the transcriptome with the UniProt database (all identified GO terms were used as a background model). The resulting GO term list was reduced and plotted using a modified R script based on REVIGO (Supek et al., 2011).
NvERTx website
The website was constructed using the Django python framework (https://www.djangoproject.com/). Plots are generated using the Plotly javascript library (https://plot.ly/javascript/). The source code for the website can be found at https://github.com/IRCAN/NvER_plotter_django. Data sets from the database can be found at http://ircan.unice.fr/ER/ER_plotter/about.
Acknowledgements
We thank Valerie Carlin for animal care; Christian Baudoin from the IRCAN Genomics Core Facility, and Charles Nicolet from the University of Southern California Sequencing Core Facility, for preparation and sequencing of the RNAseq libraries; and the IRCAN Genomics Core Facility for providing access to the NextSeq500 sequencer.
Footnotes
Author contributions
Conceptualization: J.F.W., E.R.; Methodology: J.F.W., V.G., E.R.; Software: J.F.W., V.G.; Validation: J.F.W., E.R.; Formal analysis: J.F.W., V.G.; Investigation: J.F.W., A.R.A., H.J., K.N.; Resources: E.R.; Data curation: J.F.W., V.G.; Writing - original draft: J.F.W., E.R.; Writing - review & editing: J.F.W., A.R.A., E.R.; Visualization: J.F.W., V.G., E.R.; Supervision: E.R.; Project administration: J.F.W., E.R.; Funding acquisition: E.R.
Funding
This work was supported by ATIP-Avenir (Institut National de la Santé et de la Recherche Médicale and Centre National de la Recherche Scientifique) funded by the Plan Cancer (Institut National Du Cancer) [C13992AS], Seventh Framework Programme [631665], Association pour la Recherche sur le Cancer [PJA 2014120186 to E.R.; PDF20141202150 to J.F.W.], Fondation pour la Recherche Médicale [SPF20130526781 to A.R.A.], Ligue Contre le Cancer [to K.N.] and Ministère de l'Enseignement Supérieur et de la Recherche [to H.J.].
Data availability
In-house data sets generated for this study are available at the NCBI SRA BioProject, under accession numbers PRJNA418421 and PRJNA419631. The dataset originally reported by Helm et al. (2013) is available at the NCBI SRA BioProject, under accession number PRJNA189768. The dataset originally reported by Fischer and Smith (2013) is available in the Woods Hole Open Access Server (http://darchive.mblwhoilibrary.org/handle/1912/5981). The dataset originally reported by Tulin et al. (2013) is available in the Woods Hole Open Access Server (http://darchive.mblwhoilibrary.org/handle/1912/5613).
References
Competing interests
The authors declare no competing or financial interests.