Analysis of differential gene expression is crucial for the study of cell fate and behavior during embryonic development. However, automated methods for the sensitive detection and quantification of RNAs at cellular resolution in embryos are lacking. With the advent of single-molecule fluorescence in situ hybridization (smFISH), gene expression can be analyzed at single-molecule resolution. Yet the limited availability of protocols for smFISH in embryos and the lack of efficient image analysis pipelines have hampered quantification at the (sub)cellular level in complex samples such as tissues and embryos. Here, we present a protocol for smFISH on zebrafish embryo sections in combination with an image analysis pipeline for automated transcript detection and cell segmentation. We use this strategy to quantify gene expression differences between different cell types and identify differences in subcellular transcript localization between genes. The combination of our smFISH protocol and custom-made, freely available analysis pipeline will enable researchers to fully exploit the benefits of quantitative transcript analysis at cellular and subcellular resolution in tissues and embryos.
Analysis of gene expression patterns is an essential tool in many areas of biological research. In developmental biology, for instance, it provides valuable information on the role of differential gene expression in determining cell fates (Junker et al., 2014a; Satija et al., 2015; Thisse and Thisse, 2008; Tomancak et al., 2007). Spatial patterns of gene expression have historically been studied by RNA in situ hybridization, but this technique is generally not quantitative (Gross-Thebing et al., 2014; Thisse and Thisse, 2008; Tomancak et al., 2007). Relative levels of gene expression are often studied by RNA-sequencing approaches. When performed at the cellular level, however, this technique only detects the ∼10% most abundant transcripts and is thus rather insensitive (Grün et al., 2014; Junker et al., 2014a; Satija et al., 2015). Furthermore, neither technique provides subcellular resolution. The development of single-molecule fluorescence in situ hybridization (smFISH) has enabled the detection of individual transcripts both in single cells and tissues (Bahar Halpern et al., 2015; Battich et al., 2013; Boettiger and Levine, 2013; Itzkovitz et al., 2011, 2012; Little et al., 2013; Lyubimova et al., 2013; Mueller et al., 2013; Nair et al., 2013; Oka and Sato, 2015; Peterson et al., 2012; Raj et al., 2008). This technical advance has, for example, improved our understanding of the design principles of the developing mouse intestine (Itzkovitz et al., 2012) and the establishment of precise developmental gene expression patterns in Drosophila blastoderm embryos (Boettiger and Levine, 2013; Little et al., 2013). However, broad application of smFISH in complex samples has been hampered by the limited availability of protocols for embryos and by the lack of an automated image analysis pipeline that combines transcript detection with cell segmentation (Bahar Halpern et al., 2015; Itzkovitz et al., 2011; Lyubimova et al., 2013; Oka and Sato, 2015).
Thus, the potential of smFISH in fields such as developmental biology remains to be fully exploited.
Here, we present a protocol for smFISH on embryo sections in combination with an analysis pipeline for automated transcript detection and cell segmentation. We apply our approach to the quantification of RNA expression in single cells of developing zebrafish embryos. To illustrate the power of our method, we identified cell type-specific differences in gene expression and assigned transcripts to different subcellular compartments. The combination of our smFISH protocol and image analysis pipeline opens the door for automated, high-resolution transcript analysis in a variety of complex systems. This tool will be valuable in many areas of biological research, including development, stem cell biology and regeneration.
RESULTS AND DISCUSSION
Sensitive and specific detection and quantification of transcripts
To detect mRNA at single-molecule resolution, we developed a protocol for smFISH on 8 μm cryosections of zebrafish embryos. We imaged and analyzed stacks of 17 z-slices with 0.3 μm spacing, corresponding to a total thickness of ∼5 μm (Fig. 1A and Materials and Methods). To visualize single RNA molecules, we used 48 oligonucleotide probes 20 bases long, each coupled to one fluorophore (Stellaris, Biosearch Technologies) (Raj et al., 2008). Once hybridized to an RNA molecule, the probes generate diffraction-limited fluorescent spots that can readily be distinguished from background signal (Fig. 1).
To test our protocol, we performed smFISH for ntla (also known as ta - ZFIN) and eif4g2a on sections of zebrafish embryos at 50% epiboly [5.3 hours post fertilization (hpf)] (Fig. 1B-E, Fig. S1). ntla is involved in mesoderm specification and has been shown to be expressed in the presumptive mesoderm at the margin of the embryo (Harvey et al., 2010; Schier and Talbot, 2005) (Fig. S2A,B). By contrast, eif4g2a is a ubiquitously expressed housekeeping gene (Fig. S2C). To detect transcripts for both genes simultaneously, we labeled the two probe sets with different fluorophores (ntla-Q670, eif4g2a-CF610). We included DAPI staining to detect nuclei (Fig. 1D,E). Embryos were imaged in a tile scan on a wide-field microscope and the resulting images were stitched with the Grid/Collection stitching plugin in Fiji (Preibisch et al., 2009; Schindelin et al., 2012). In agreement with its known expression pattern, ntla expression was only detected at the margin of the embryo (Fig. 1B-D, Fig. S1). By contrast, eif4g2a was detected ubiquitously (Fig. 1B,C,E, Fig. S1A). Interestingly, and consistent with the localization of the upstream activators of ntla in the yolk syncytial layer [BMP and Nodal (Harvey and Smith, 2009; Harvey et al., 2010; Schier and Talbot, 2005)], smFISH revealed that there is a vegetal-animal gradient of ntla expression (Fig. 1B-D, Fig. S1). ntla was also detected at single-molecule resolution in notochord and tail bud at 19 hpf (Fig. 1F,G), in line with whole-mount in situ hybridization data (Schier and Talbot, 2005), illustrating the versatility of our protocol. Taken together, these results indicate that we can obtain specific, high-resolution information on gene expression for multiple genes simultaneously in zebrafish embryos at various stages of development.
Next, we developed a Fiji plugin (Transcript analysis) to quantify transcript numbers in an automated fashion. To detect transcripts, we filtered images, detected local maxima of intensity and used a threshold to separate true transcripts from background noise, similar to previous approaches (Lyubimova et al., 2013; Mueller et al., 2013; Raj et al., 2008). To determine the appropriate threshold for detection of ntla transcripts, we plotted the intensity distribution of all detected maxima (Fig. 1H). For each probe set, we manually set the threshold for transcript detection between the low intensity peak, reflecting background signal, and the high intensity peak, reflecting transcripts. The unimodal shape of the transcript peak confirms that the spots we identify were indeed single RNA molecules (Raj et al., 2008; Vargas et al., 2005). Comparison of the transcript detection output with the smFISH image suggested that the sensitivity of transcript detection with the image analysis pipeline is high (Fig. 1D,I).
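The detection logic described above (local intensity maxima followed by a manually chosen threshold) can be illustrated with a minimal sketch. This is not the code of the Transcript analysis plugin: it works on a small 2D list instead of a filtered 3D stack, the 8-pixel neighbourhood is an assumption, and the names local_maxima and detect_transcripts are hypothetical.

```python
def local_maxima(image):
    """Return (row, col, intensity) for pixels strictly brighter than all neighbours.

    Assumption: an 8-connected 2D neighbourhood; the plugin itself operates
    on filtered z-stacks.
    """
    rows, cols = len(image), len(image[0])
    maxima = []
    for r in range(rows):
        for c in range(cols):
            value = image[r][c]
            neighbours = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                maxima.append((r, c, value))
    return maxima


def detect_transcripts(image, threshold):
    """Keep only the local maxima whose intensity exceeds the chosen threshold."""
    return [(r, c, v) for r, c, v in local_maxima(image) if v > threshold]


# Toy image: two bright spots (90 and 80) and one dim background maximum (12).
toy_image = [
    [1, 2, 1, 1, 1],
    [2, 90, 2, 1, 12],
    [1, 2, 1, 1, 1],
    [1, 1, 1, 80, 1],
    [1, 1, 1, 1, 1],
]
spots = detect_transcripts(toy_image, threshold=50)
print(spots)
```

In this toy case the dim maximum plays the role of the low-intensity background peak in the histogram, and the threshold of 50 stands in for a value chosen between the two peaks.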
To quantify the sensitivity and specificity of our method, we first analyzed slc7a8a transcripts with two probe sets that were labeled with different fluorophores (Fig. 1J). Of the spots detected with probe set 1 (slc7a8a-Quasar670), 87% were also detected with probe set 2 (slc7a8a-CalFluor610). Conversely, 81% of spots detected with probe set 2 were also detected with probe set 1. This might even be an underestimation of the efficiency, because the use of two probe sets for one gene precludes the use of the 48 best probes. In comparison, previous studies reported detection efficiencies of 70-85% for smFISH (Oka and Sato, 2015; Raj et al., 2008). Next, to test the specificity of the method, we performed dual-color labeling of two different genes (eif4g2a and ntla). This resulted in an overlap of only 2% in cells where both genes are expressed (Fig. S3). Finally, transcript numbers obtained by smFISH correlated well (r=0.94) with RNA-sequencing data (Pauli et al., 2012), confirming the quantitative power of our smFISH approach (Fig. S4). Taken together, these results show that our method detects transcripts efficiently and specifically.
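The two-probe-set comparison amounts to asking, for each spot in one channel, whether a spot exists nearby in the other channel. The sketch below shows one way to compute such an overlap fraction. The distance criterion, the value of max_distance and the function name colocalized_fraction are assumptions for illustration; the matching rule actually used is not specified in this section.

```python
import math


def colocalized_fraction(spots_a, spots_b, max_distance):
    """Fraction of spots in A with at least one spot in B within max_distance.

    Spots are coordinate tuples in the same units as max_distance.
    The matching criterion is an assumption, not the published one.
    """
    if not spots_a:
        return 0.0
    matched = sum(
        1
        for a in spots_a
        if any(math.dist(a, b) <= max_distance for b in spots_b)
    )
    return matched / len(spots_a)


# Hypothetical spot coordinates (x, y, z) for the two probe sets.
probe_set_1 = [(0, 0, 0), (10, 10, 0), (20, 5, 1), (30, 30, 2)]
probe_set_2 = [(0.5, 0, 0), (10, 11, 0), (50, 50, 0)]

print(colocalized_fraction(probe_set_1, probe_set_2, max_distance=2.0))
print(colocalized_fraction(probe_set_2, probe_set_1, max_distance=2.0))
```

The measure is asymmetric, which is why the text reports one percentage per direction.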
In addition to individual transcripts, high-intensity foci corresponding to sites of active transcription (Bahar Halpern et al., 2015; Levesque and Raj, 2013) were sometimes observed in the nucleus (Fig. 1D, arrows). As expected, a maximum of two foci per nucleus was observed, one for each allele. We extended our analysis pipeline to include the automated detection of transcription foci based on their size and intensity (Materials and Methods and Fig. 1I). We compared detected foci with foci in smFISH images and found a detection sensitivity close to 90%, with a precision of more than 97% (Fig. S5). Only weak foci were not detected automatically. When 100% detection efficiency of foci is essential, an intronic probe can be used to mark transcription sites specifically. To quantify the number of transcripts in each focus, we divided the sum intensity of the transcription foci by the median sum intensity of the transcripts (Mueller et al., 2013). In conclusion, our smFISH protocol and analysis pipeline (Fig. S6) enable the detection of single RNA molecules and transcription foci in zebrafish embryo sections with high sensitivity and specificity.
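The estimate of transcripts per focus is a simple ratio: the summed intensity of the focus divided by the median summed intensity of single transcripts. A minimal sketch with made-up intensity values follows; the function name transcripts_per_focus is hypothetical.

```python
from statistics import median


def transcripts_per_focus(focus_sum_intensity, transcript_sum_intensities):
    """Estimate how many transcripts a focus contains.

    Divides the summed intensity of the focus by the median summed
    intensity of individual transcripts from the same image.
    """
    return focus_sum_intensity / median(transcript_sum_intensities)


# Made-up summed intensities of single transcripts and of one focus.
single_transcripts = [100, 110, 90, 105, 95]
print(transcripts_per_focus(750, single_transcripts))
```

Using the median rather than the mean keeps the reference intensity robust against a few unusually bright or dim spots.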
An automated membrane segmentation pipeline to assign transcripts to cells
In order to assign transcripts to cells and specific cellular compartments, cells and nuclei have to be segmented. So far, the use of smFISH for the quantitative analysis of gene expression in complex samples has been hampered by the lack of an efficient cell segmentation pipeline. Current analysis pipelines rely on manual segmentation of cells (Bahar Halpern et al., 2015; Itzkovitz et al., 2011; Lyubimova et al., 2013; Oka and Sato, 2015), which is not feasible for large amounts of data or samples as large as the zebrafish embryo. To overcome this problem, we developed an automated pipeline to segment cells in tissue sections (Fig. 2, Fig. S6).
To identify the cell membrane, we incorporated a phalloidin-staining step in our smFISH protocol (Fig. 2A). We used the middle slice of our z-scan acquisition for cell segmentation. This is a good approximation of the cell outline in thin sections. We trained a cascaded Random Forest (Breiman, 2001; Tu and Bai, 2010) to predict for each pixel the probability that it belongs to the membrane, and additionally the probability that it belongs to a membrane intersection point (vertex) based on the phalloidin staining (Fig. 2B). Given these probabilities, we can trace paths that are likely to run along the membrane between points that are most likely vertices. This results in a mask of cell membranes (Fig. 2C). Depending on the quality of the membrane staining, the membrane-tracing software can produce both over- and under-segmentation errors. These errors can easily be corrected manually by drawing missing lines and breaking excessive ones with our Fiji tool ‘Cell annotation’. In our samples, and with the settings we chose, automated segmentations exhibit on average 91% precision (100% would indicate no over-segmentation) and 70% recall (indicating the fraction of correct segmentations) (Fig. S7). Manual corrections take 5 min per image, compared with 20 min for a completely manual segmentation. Finally, the individual cells are identified (Fig. 2D). Our pipeline significantly reduces cell segmentation time compared with existing approaches that rely on manual segmentation (Bahar Halpern et al., 2015; Itzkovitz et al., 2011; Lyubimova et al., 2013; Oka and Sato, 2015). In the future, it might be possible to implement assisted manual correction, which would further reduce segmentation times. In addition, we segmented nuclei to be able to distinguish between cytoplasmic and nuclear transcripts (Fig. 2E,F). We used a watershed-based approach (Ollion et al., 2013) to segment nuclei in 2D on a maximum z-projection. 
Together, our smFISH method, cell segmentation and nuclear detection allow us to automatically assign transcripts and transcription foci to specific cells and nuclei (Fig. 2F).
Using the automated pipeline, we can calculate transcript densities per cell as number of transcripts per μm³. We used transcript density as a measure of gene expression because it has been shown to be a more reliable readout than transcript number (Padovan-Merhar et al., 2015), and because we do not image complete cells in our cryosections. A flowchart of the complete analysis pipeline including transcript detection can be found in Fig. S6.
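As a worked illustration of the density calculation, the sketch below converts a segmented cell area in pixels into a volume and divides the transcript count by it. The pixel size (0.1072 μm), z-spacing (0.3 μm) and number of slices (17) are taken from the Materials and Methods. Treating the imaged volume as cell area multiplied by 17 × 0.3 μm is an assumption, as is the function name transcript_density.

```python
PIXEL_SIZE_UM = 0.1072  # lateral pixel size reported in Materials and Methods
Z_SPACING_UM = 0.3      # spacing between optical slices
N_SLICES = 17           # optical slices used for analysis


def transcript_density(n_transcripts, cell_area_pixels,
                       pixel_size_um=PIXEL_SIZE_UM,
                       thickness_um=N_SLICES * Z_SPACING_UM):
    """Return transcripts per cubic micrometre for one segmented cell.

    Assumption: the imaged volume is the 2D cell area extruded through
    the analysed stack thickness (about 5 micrometres).
    """
    area_um2 = cell_area_pixels * pixel_size_um ** 2
    volume_um3 = area_um2 * thickness_um
    return n_transcripts / volume_um3


# Hypothetical cell: 47 transcripts in a segmented area of 10,000 pixels.
density = transcript_density(47, 10_000)
print(f"{density:.3f} transcripts per cubic micrometre")
```

Because only part of each cell lies within the section, a density of this kind is comparable between cells in a way that raw counts are not.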
Quantification of cell type-specific differences in gene expression
To validate our method, we quantified gene expression at dome stage (4.3 hpf) when the first two cell types, the extra-embryonic cells of the enveloping layer (EVL) and the embryonic cells (deep layer, DEL) (Kimmel et al., 1990), have been specified (Fig. 3A-C, Fig. S8). We analyzed the maternally loaded gene eif4g2a and two genes involved in early zebrafish development, sox19a and mex3b. No differences in gene expression were detected for these genes by regular in situ hybridization (Fig. S9). To quantify gene expression in EVL and DEL, we expanded our annotation tool to categorize cells. With this tool, any segmented cell can be assigned to a selected class by simply clicking on it (Fig. 3D). Here, we identified cells based on location, but markers can also be used. Antibody staining can easily be incorporated in the smFISH protocol (data not shown; Lyubimova et al., 2013; Raj et al., 2008). Quantification of transcript densities in EVL and DEL revealed that expression of sox19a was 4.6-fold higher in the EVL than in the DEL, whereas expression of mex3b was 5.1-fold lower in the EVL (Fig. 3E). By contrast, eif4g2a was expressed at similar levels in both cell types (Fig. 3E). Thus, our approach allows sensitive detection and quantification of differences in gene expression between cells in an embryo, making it a useful tool in a variety of applications, such as the analysis of transcript levels in relation to cell fate determination.
Quantification of subcellular transcript distribution
The localization of mRNAs plays an important role in organizing cellular function (Besse and Ephrussi, 2008; Jambor et al., 2015; Lécuyer et al., 2007). To determine whether our approach is able to identify differences in mRNA localization, we assigned transcripts of three genes to nuclei and cytoplasm and identified the level of transcriptional activity (in transcription foci) at sphere stage (4 hpf) (Fig. 4A-C, Fig. S9). The maternally loaded housekeeping gene eif4g2a was expressed at an average density of 8.1×10⁻² transcripts per μm³. Very few transcripts were found in foci or dispersed throughout the nucleus and most eif4g2a transcripts were localized to the cytoplasm (Fig. 4A,D, Fig. S10A). Thus, at sphere stage, most eif4g2a transcripts are available for translation. The zygotically expressed genes tbx16 (spadetail) and akap12b were expressed at average densities of 3.0×10⁻² and 4.2×10⁻² transcripts per μm³, respectively. In contrast to eif4g2a, a large proportion of these transcripts was located in transcription foci or scattered throughout the nucleus (Fig. 4B-D, Fig. S10B,C). Fewer than half of tbx16 and akap12b transcripts were located in the cytoplasm (Fig. 4B-D, Fig. S10B,C). Since nuclei were segmented in 2D, small nuclear sizes might reflect incomplete presence of the nucleus in the z-stack, resulting in the mis-assignment of transcripts. To avoid this potential problem, we analyzed only those cells with the top 25% largest nuclei, which are most likely to fill the entire z-stack (Fig. 4D, Fig. S11A). However, analyzing all cells resulted in very similar distributions (supplementary Materials and Methods and Fig. S11B). Taken together, these data show that our approach can quantify the distribution of transcripts between nuclei and cytoplasm.
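Assigning transcripts to nucleus or cytoplasm can be pictured as a lookup of each spot position in two masks: a cell label image and a 2D nuclear mask. The sketch below is an illustration under those assumptions, with toy masks and a hypothetical function name (compartment_counts); it is not the pipeline's own code and ignores transcription foci.

```python
from collections import defaultdict


def compartment_counts(spots, cell_labels, nucleus_mask):
    """Count nuclear and cytoplasmic spots per segmented cell.

    spots: (row, col) positions; cell_labels: 2D list with 0 for background
    and a positive integer per cell; nucleus_mask: 2D list of 0/1 values.
    """
    counts = defaultdict(lambda: {"nucleus": 0, "cytoplasm": 0})
    for r, c in spots:
        cell = cell_labels[r][c]
        if cell == 0:  # spot lies outside every segmented cell
            continue
        compartment = "nucleus" if nucleus_mask[r][c] else "cytoplasm"
        counts[cell][compartment] += 1
    return dict(counts)


# Toy masks: two cells side by side, each with a small nucleus.
cell_labels = [
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [1, 1, 1, 2, 2, 2],
    [0, 0, 0, 0, 0, 0],
]
nucleus_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
spots = [(0, 0), (1, 1), (2, 2), (1, 4), (1, 5), (0, 3), (3, 0)]
print(compartment_counts(spots, cell_labels, nucleus_mask))
```

Because the nuclear mask is two-dimensional, a spot above or below a nucleus that only partly fills the stack would still be counted as nuclear, which is the mis-assignment risk discussed above.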
mRNAs can be localized to more sites than nuclei and cytoplasm (Jambor et al., 2015; Lécuyer et al., 2007). Interestingly, and in contrast to the localization of akap12b at sphere stage, at the onset of gastrulation, most akap12b transcripts were localized in clusters at the plasma membrane (Fig. 4E). akap12b encodes a scaffold protein that regulates the transition from convergence to extension movements during gastrulation (Weiser et al., 2007). The zebrafish Akap12b protein has been shown to localize to plasma membranes when expressed in cultured human cells, but not much was known about the potential localization of akap12b mRNA (Weiser et al., 2007). Localization of akap12b mRNA to the membrane might facilitate its translation right at the site of action of the protein (Besse and Ephrussi, 2008; Lécuyer et al., 2007). Taken together, these results show that our approach can quantify asymmetries in the localization of transcripts, which is important for determining their function.
In conclusion, we have developed a method in zebrafish that enables the automated detection and quantification of transcripts at cellular and subcellular resolution in large samples. So far, studies in large and complex samples have used manual segmentation to assign transcripts to specific cells (Bahar Halpern et al., 2015; Itzkovitz et al., 2011; Lyubimova et al., 2013; Oka and Sato, 2015). This has limited the number of cells that could be analyzed, and as a consequence, the potential of smFISH has not been fully exploited. For example, to draw reliable conclusions about variability in gene expression between cells, data on large numbers of cells are required (Battich et al., 2013). Furthermore, gene expression has often been indicated as a function of an animal/organ axis (Hoyle and Ish-Horowicz, 2013; Junker et al., 2014b; Kim et al., 2013; Nair et al., 2013; Peterson et al., 2012). Although this kind of representation is informative, cellular resolution would provide more precise information. Recent examples of where this would be of value include sonic hedgehog signaling dynamics in the developing neural tube (Peterson et al., 2012) and the relationship between the expression level of a microRNA and its target (Kim et al., 2013). In summary, our method facilitates the automated detection and quantification of transcripts and their assignment to cells and subcellular structures. Our custom-made software is freely available in KNIME and Fiji and allows researchers working with complex tissues in diverse systems to start exploiting the benefits of high-resolution transcript analysis.
MATERIALS AND METHODS
Zebrafish were maintained and raised under standard conditions. Wild-type (TLAB) embryos were left to develop to the desired stage at 28°C. Staging was done according to Kimmel et al. (1995).
smFISH sample preparation
Embryos were fixed in 4% formaldehyde in PBT (PBS with 0.1% Tween) at 4°C overnight. The next day, embryos were dechorionated manually in PBT and incubated in several changes of fresh 30% sucrose in PBS over the course of several hours before being incubated in 30% sucrose in PBS/OCT (50/50, v/v) at 4°C for 5 days. Then, embryos were embedded in OCT and blocks were quickly frozen in precooled isopentane at −80°C. Cryosection blocks were wrapped in foil and stored at −80°C. Cryosections (8 μm) were attached to selected #1.5 22×22 mm coverslips that had been cleaned by sonicating once in 1:20 mucasol and twice in 100% ethanol and then coated with 1:10 poly-L-lysine (Sigma, P8920). Coverslips with sections were stored in sealed 6-well plates at −80°C.
smFISH was performed as described previously (Lyubimova et al., 2013) with some changes to obtain high-quality sections of fragile embryos and to reduce background signal. In brief, sections were postfixed in 4% paraformaldehyde in PBS for 15 min and rinsed twice with PBS. Sections were equilibrated in 70% ice-cold ethanol for 5 min and incubated in fresh 70% ice-cold ethanol at 4°C for 4-8 h for permeabilization. Samples were rehydrated in 2× SSC and subjected to a mild proteinase K digestion step at 1:2000 (10 mg/ml stock) for 10 min to increase accessibility of RNAs. After two 5 min washes in 2× SSC, samples were equilibrated for several minutes in 10% smFISH wash buffer (10% formamide, 2× SSC) before probe hybridization. Probes (Biosearch Technologies) were hybridized at a concentration of 75-250 nM in 10% hybridization buffer [10% dextran sulfate (w/v) (Sigma, D8906), 10% formamide (v/v), 1 mg/ml E. coli tRNA (Roche), 0.02% BSA, 2 mM vanadyl-ribonucleoside complex (NEB, S1402S), 2× SSC]. For this, smFISH wash buffer was carefully drained from the coverslips as much as possible before coverslips were placed section down on a 100 μl drop of hybridization buffer with probe on a Parafilm-coated cell culture dish. Hybridization was performed at 30°C for ∼16 h. Then, coverslips were carefully released from the Parafilm with 10% wash buffer. Samples were rinsed with 2 ml of 10% smFISH wash buffer and washed for 2×30 min with 1 ml 10% wash buffer at 30°C. DAPI (1:2500; 1 mg/ml stock) and Phalloidin (1:100; Life Technologies, A12379) were added to the second wash to stain the nucleus and membrane, respectively. After the second wash, samples were placed in GLOX buffer (10 mM Tris-HCl, pH 7.5, 0.4% glucose, 2× SSC) at 4°C until mounting. Samples were mounted in freshly prepared GLOX mounting medium [GLOX buffer with 1:50 each of 3.7 mg/ml glucose oxidase (Sigma, G2133), Catalase suspension (Sigma, C3515) and Trolox] and sealed with nail polish.
A total of 48 probes per mRNA, each 20 bases long, were designed using the Stellaris Probe Designer (https://www.biosearchtech.com/stellarisdesigner/). CAL Fluor Red 610 and Quasar 670 labeled probes were ordered from Biosearch Technologies. For slc7a8a, we designed 96 probes and ordered them with alternating fluorophores for dual-color detection. For probe sequences, see Table S1.
Samples were imaged in a tile scan of 19 z-sections on a Delta Vision epifluorescence microscope equipped with a 60×1.42 NA oil objective, a Photometrics Cool Snap CCD camera and the following emission filter sets: 435/48, DAPI; 525/36, Alexa Fluor 488; 632/60, CAL Fluor Red 610; 676/34, Quasar 670. Pixel size in the image plane is 0.1072×0.1072 μm. We acquired z-stacks with 0.3 μm spacing. After acquisition, image tiles were stitched with the ‘Grid/Collection stitching’ plugin in Fiji (Preibisch et al., 2009; Schindelin et al., 2012).
The first 17 optical z-slices (corresponding to ∼5 μm thickness) of our 8 μm sections were used for analysis. We empirically determined that this gives the best smFISH results. For other tissues and probe sets, the depth at which good imaging results can be obtained with an epifluorescence microscope might differ depending on the overall background levels (auto fluorescence) and non-specific probe binding. Therefore, when setting up the technique in another tissue, the thickness of sections and the imaging depth should be empirically determined.
First, background signal was removed from images using top-hat filtering. Next, images were smoothed with a Gaussian kernel to remove noise. Transcripts were detected as local maxima in this image and distinguished from the background noise with an intensity threshold, T_tx. In the histogram of local maxima intensity, T_tx was chosen between the one or two sharp peak(s) corresponding to the background and the lower peak of the transcripts at higher intensity. Transcripts were segmented using watershed segmentation initiating from the detected maxima. Transcription foci were detected among the regions defined in the transcript segmentation with the use of thresholds for maximum intensity and volume. For further details, see supplementary Materials and Methods.
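The background-removal step can be illustrated with a white top-hat filter, which subtracts a morphological opening (erosion followed by dilation) from the image so that only structures smaller than the structuring element remain. The sketch below is a plain 2D illustration. The square structuring element, its radius and the clipped border handling are assumptions, the function names are hypothetical, and the Gaussian smoothing step is omitted.

```python
def _window_filter(image, radius, reducer):
    """Apply min or max over a square window, clipped at the image border."""
    rows, cols = len(image), len(image[0])
    return [
        [
            reducer(
                image[rr][cc]
                for rr in range(max(r - radius, 0), min(r + radius + 1, rows))
                for cc in range(max(c - radius, 0), min(c + radius + 1, cols))
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def white_top_hat(image, radius):
    """Return image minus its morphological opening (erosion, then dilation)."""
    eroded = _window_filter(image, radius, min)
    opened = _window_filter(eroded, radius, max)
    return [
        [pixel - background for pixel, background in zip(image_row, opened_row)]
        for image_row, opened_row in zip(image, opened)
    ]


# Toy image: uniform background of 10 with a single-pixel spot of 60.
toy = [[10] * 5 for _ in range(5)]
toy[2][2] = 60
filtered = white_top_hat(toy, radius=1)
print(filtered[2][2])
```

In this toy case the uniform background is removed entirely and only the spot, which is smaller than the structuring element, survives the filter.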
Cell segmentation was based on Phalloidin staining. The middle slice of the z-scan acquisition was used for cell segmentation as an approximation of the cell outline in our thin sections. With a pixel-level classifier, the probability of being on a membrane, as well as a probability of being at the intersection of multiple membranes (i.e. a vertex) was assigned to each pixel. To this end, we trained a two-level cascaded Random Forest classifier from manually segmented training data. Based on the output of this classifier, we traced membranes as highly likely paths between vertices. The set of shortest paths whose length falls below a specific threshold constitutes our automated membrane segmentation. For more details, see supplementary Materials and Methods.
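Tracing a membrane as a highly likely path between two vertices can be framed as a shortest-path problem on the pixel grid. The sketch below uses Dijkstra's algorithm with a per-pixel cost of −log(p), where p is the predicted membrane probability. That cost function, the 4-connected grid and the name trace_membrane are assumptions made for illustration; the published implementation is described in the supplementary Materials and Methods.

```python
import heapq
import math


def trace_membrane(prob, start, goal):
    """Return (path, length) of the cheapest 4-connected path from start to goal.

    prob: 2D list of membrane probabilities in (0, 1].
    Cost of entering a pixel is -log(probability) (an assumed cost function).
    """
    rows, cols = len(prob), len(prob[0])
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            break
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                new_dist = dist - math.log(max(prob[rr][cc], 1e-6))
                if new_dist < best.get((rr, cc), math.inf):
                    best[(rr, cc)] = new_dist
                    previous[(rr, cc)] = node
                    heapq.heappush(queue, (new_dist, (rr, cc)))
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


# Toy probability map: a likely membrane along the middle row.
prob_map = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.9, 0.9, 0.9, 0.9, 0.9],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
path, length = trace_membrane(prob_map, start=(1, 0), goal=(1, 4))
print(path, round(length, 3))
```

A candidate path would then be kept only if its length falls below a chosen threshold, so that connections forced through low-probability pixels are rejected.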
Whole-mount in situ hybridization
Whole-mount in situ hybridization was performed as described previously (Thisse and Thisse, 2008). After staining, embryos were cleared in methanol and gradually transferred to 87% glycerol for imaging. Samples were imaged in 87% glycerol on a Leica M165C dissecting scope equipped with a Leica MC170 HD camera. Probes were made by PCR amplification of regions of target gene cDNA and cloning these into the pSC-A vector (StrataClone PCR cloning kit). The following primer pairs were used: eif4g2a FW: ACGCTTCTCTTTGGCCTCATCG, RV: CAGGCTGTGTTTGGTAATCCCTG; sox19a FW: GAATGACCCAGCTGAACGGTGG, RV: GCCATGGCGGATGGATACTGC; mex3b FW: CCCTGCGAGCAAAGACCAATAC, RV: CGTTCCCATGCAGGTCAAAACC. For ntla, a previously published probe was used (Bennett et al., 2007).
We thank Caren Norden, Máté Pálfy, Iain Patten and Pavel Tomancak for critically reading the manuscript, Jan Philipp Junker and Alexander van Oudenaarden for advice on smFISH, and the following MPI-CBG Services and Facilities for their support: Biomedical Services (Fish facility), Light Microscopy, and Scientific Computing.
L.C.S. and N.L.V. conceived the study. L.C.S. performed and analyzed all experiments. B.L. developed and documented the transcript detection, nuclear segmentation and tissue annotation pipelines. C.B., D.K. and E.W.M. developed and documented the membrane segmentation pipeline. F.J. implemented the membrane and vertex classification tools in KNIME. L.C.S. and N.L.V. wrote the manuscript with input from other authors.
This work was supported by Max Planck Institute of Molecular Cell Biology and Genetics core funding; the German Federal Ministry of Research and Education [031A099]; a Human Frontier Science Program Career Development Award [CDA-00060/2012-C to N.L.V.]; the Klaus Tschira Stiftung (to E.W.M.); a Boehringer Ingelheim Fonds PhD fellowship (to L.C.S.); and a Center for Systems Biology Dresden postdoctoral fellowship (to D.K.).
The authors declare no competing or financial interests.