Gene expression profiling of epidermal cell types in C. elegans using Targeted DamID

ABSTRACT The epidermis of Caenorhabditis elegans is an essential tissue for survival because it contributes to the formation of the cuticle barrier as well as facilitating developmental progression and animal growth. Most of the epidermis consists of the hyp7 hypodermal syncytium, the nuclei of which are largely generated by the seam cells, which exhibit stem cell-like behaviour during development. How seam cell progenitors differ transcriptionally from the differentiated hypodermis is poorly understood. Here, we introduce Targeted DamID (TaDa) in C. elegans as a method for identifying genes expressed within a tissue of interest without cell isolation. We show that TaDa signal enrichment profiles can be used to identify genes transcribed in the epidermis and use this method to resolve differences in gene expression between the seam cells and the hypodermis. Finally, we predict and functionally validate new transcription and chromatin factors acting in seam cell development. These findings provide insights into cell type-specific gene expression profiles likely associated with epidermal cell fate patterning.


Development • Supplementary information
Table S1. List of genes expressed in seam cells and hypodermis at L2 and L4 by TaDa and their overlap with sci-RNA-seq and PAT-seq.
Table S2. List of genes (including transcription and chromatin factors) expressed in seam cells at L2 and L4 by TaDa.
Development: doi:10.1242/dev.199452: Supplementary information

Fig. S1. Establishing seam cell- and hypodermis-specific promoters for RNApol TaDa. (A) Representative image of srf-3 smFISH at the L2 stage, during the asymmetric seam cell division, showing high levels of transcript in seam cells. Seam cell nuclei are marked by the SCMp:GFP reporter and srf-3 mRNAs appear as black spots. (B) Illustration of the srf-3 locus on chromosome IV. Pink blocks signify exon sequences and grey blocks indicate 3' UTRs. Two isoforms of srf-3 (isoform a and b) are shown. Shaded areas near the 5' end mark putative promoter or regulatory sequences tested for seam cell-specific expression: in grey, the 1093 bp putative promoter sequence of isoform a (srf-3ap), extending from the end of the 3' UTR of the upstream gene txt-19 to the srf-3 isoform a start codon; in peach, the 2246 bp sequence starting at the same position and extending to the start codon of isoform b (srf-3bp); and in teal, the 1081 bp first intron of isoform a (srf-3i1). (C) Representative fluorescence images of late L4 transgenic animals carrying single-copy transgenes of transcriptional reporters driving expression of GFP-H2B under the srf-3ap promoter (grey frame), the srf-3bp promoter (peach frame) and the srf-3i1::pes-10 promoter (teal frame). White arrowheads show expression in seam cell nuclei, green arrowheads in intestinal nuclei and red arrowheads in hypodermal nuclei. Yellow outlined areas indicate further expression in the germline. (D-E) Representative fluorescence images showing expression of mCherry-H2B under the promoter of dpy-7 from a single-copy transgene (D) and dpy-7syn1::mCherry-H2B from a multi-copy transgene (E) at the L3 asymmetric cell division stage. Seam cells are marked in cyan by membrane (arf-3:pes10::GFP-CAAX) and nuclear (SCMp::GFP) reporters. Note the expression of dpy-7p:mCherry-H2B in seam cell nuclei, indicated by white arrowheads, which is more prominent during divisions and is abolished in dpy-7syn1::mCherry-H2B transgenics. Scale bars are 100 µm in C and 10 µm in D and E.

Fig. S3. RPB-6 occupancy signatures are similar across cell types and different to controls. (A-B) Pearson correlation heatmaps based on genome-wide normalised aligned read count maps for the seam (A) and hypodermis (B). The correlation coefficient for each pairwise comparison is printed in each cell of the heatmap. For both promoters, the dam:rpb-6 samples show higher correlation than the control samples. (C) Scatterplots and Pearson correlations across replicates, cell types and developmental stages based on read-count normalised scores of GATC fragments participating in protein-coding genes. The correlation coefficient r values are shown in each corresponding cell. Note that these values are not identical to those reported in Fig. 2B, likely due to differences in signal distribution across GATC fragments in protein-coding genes.
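
As an illustration of the pairwise comparisons summarised in these heatmaps, the following is a minimal sketch of a Pearson correlation matrix over per-sample score vectors. The function names, sample labels and values are hypothetical and are not taken from the study; the sketch assumes each sample has already been reduced to a list of normalised scores over the same ordered set of genomic bins or GATC fragments.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(samples):
    """Map every ordered pair of sample names to its Pearson r."""
    names = list(samples)
    return {(a, b): pearson_r(samples[a], samples[b]) for a in names for b in names}


# Hypothetical toy scores for three samples over five bins.
toy_samples = {
    "fusion_rep1": [0.1, 0.8, 1.5, 0.3, 2.0],
    "fusion_rep2": [0.2, 0.7, 1.6, 0.2, 1.9],
    "control_rep1": [1.0, 0.9, 1.1, 1.0, 0.8],
}
matrix = correlation_matrix(toy_samples)
```

Each entry of `matrix` corresponds to one cell of a heatmap of this kind; the diagonal entries compare a sample with itself.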

Fig. S5. TaDa-identified expressed genes show higher expression in RNA-seq experiments and are biased for GATC availability. (A) Density plots of protein-coding genes, separated into groups of expressed or non-expressed based on TaDa (FDR<0.05), per cell type and developmental stage, plotted against: the number of GATCs within the gene, gene length, sci-RNA-seq expression values (in TPM: transcripts per million) (Cao et al., 2017) and stage-matched whole-animal RNA-seq expression values (in dcpm: depth of coverage per base per million reads) (Boeck et al., 2016). For all cell types and stages, TaDa expressed genes showed significantly increased sci-RNA-seq and whole-animal RNA-seq expression values compared to non-expressed genes, as well as higher numbers of GATCs and consequently longer gene lengths. (B) Density plots of protein-coding genes, separated into groups of expressed or non-expressed based on sci-RNA-seq (TPM>10) (Cao et al., 2017), per cell type, plotted against: the number of GATCs within the gene, gene length and TaDa RPB-6 occupancy values (log2(rpb-6:dam/NLS-GFP:dam)). In both cell types, expressed genes showed significantly longer gene length and, likely as a consequence, more GATCs than non-expressed genes, as well as higher RPB-6 occupancy values. In A and B, dashed lines indicate the distribution mean, and the statistical significance of the difference between distributions was determined with a Kolmogorov-Smirnov test, p<2.2×10⁻¹⁶. The X-axis is limited to the 99th percentile of values and, for sci-RNA-seq, the Y-axis is limited to 0.02 for visualisation.
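
To make the distribution comparison concrete, the following is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic D, the largest vertical gap between the empirical cumulative distributions of two groups. The function name and the toy gene lengths are hypothetical; the sketch computes only D and not the associated p-value, and it is not the software used in the study.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: max |ECDF_a(x) - ECDF_b(x)| over all observed x."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The ECDFs only change at observed values, so checking those is sufficient.
    for x in a + b:
        ecdf_a = bisect_right(a, x) / len(a)
        ecdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d


# Hypothetical toy gene lengths (bp) for the two groups.
expressed_lengths = [1200, 3400, 5600, 8000]
non_expressed_lengths = [400, 900, 1500, 2000]
d_lengths = ks_statistic(expressed_lengths, non_expressed_lengths)
```

A value of D near 0 means the two groups have nearly identical distributions, while a value near 1 means they barely overlap.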

Fig. S6. TaDa-identified expressed genes overlap significantly with other published transcriptomes. (A) Barplots of numbers of TaDa-identified expressed genes for each cell type and developmental stage that are in common with gene sets classified based on their tissue specificity (Serizay et al., 2020). (B-C) Barplots of the sizes and statistical significance, assessed by a Fisher's exact test, of all possible intersections between TaDa, sci-RNA-seq genes over a 10 TPM threshold (Cao et al., 2017) and PAT-seq-identified sets of expressed genes (Blazie et al., 2017) in the seam cells (B) and the hypodermis (C). All overlaps are highly significant. (D) Correlation scatterplots of expression levels for L2 genes common between seam (left) or hypodermis (right) TaDa sets and the sci-RNA-seq or the PAT-seq datasets for seam cells and hypodermis. For TaDa, the log2(dam:rpb-6/dam:NLS-GFP) scores are used as a measure of expression levels; for sci-RNA-seq, the values are transcripts per million reads (TPM); and for PAT-seq, fragments per kilobase of transcript per million reads (FPKM). All correlation analyses showed significant although weak correlation (indicated by the r values) across methods, with p<0.0001 for all.
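
The significance of an overlap between two gene sets, as in B-C, can be sketched with a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The function name, gene identifiers and universe size below are hypothetical, and the choice of a one-sided alternative is an assumption; the source does not state which software or alternative hypothesis was used.

```python
from math import comb


def overlap_p_value(set_a, set_b, universe_size):
    """One-sided Fisher's exact test for the overlap of two gene sets.

    Returns the observed overlap k and P(X >= k), where X is the overlap
    expected when len(set_b) genes are drawn at random from a universe of
    universe_size genes, len(set_a) of which belong to set_a.
    """
    size_a, size_b = len(set_a), len(set_b)
    if size_a > universe_size or size_b > universe_size:
        raise ValueError("gene sets cannot be larger than the universe")
    k = len(set_a & set_b)
    total = comb(universe_size, size_b)
    tail = sum(
        comb(size_a, i) * comb(universe_size - size_a, size_b - i)
        for i in range(k, min(size_a, size_b) + 1)
    )
    return k, tail / total


# Hypothetical toy example: two sets of three genes in a universe of six.
tada_genes = {"geneA", "geneB", "geneC"}
rnaseq_genes = {"geneB", "geneC", "geneD"}
k, p = overlap_p_value(tada_genes, rnaseq_genes, universe_size=6)
```

With real data the universe would typically be the set of genes that could have been detected by both methods, which is a choice that strongly affects the resulting p-value.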

Fig. S7. Intersections of TaDa-identified expressed gene sets show significant overlap across cell types and reveal seam cell- and hypodermis-specific genes. (A-B) Multiple intersections of all the acquired gene sets expressed in the seam and hypodermis at both stages. The Venn diagram (A) presents the number of genes that are shared across datasets or are unique to the cell type and/or developmental stage. The circular plot (B) reports the sizes of all pairwise and higher-order intersections between the sets (green indicates which sets are included in each comparison) and indicates that they are highly significant (in red) with a Fisher's exact test. (C-D) Barplots representing intersections between TaDa seam-only (C) or TaDa hypodermis-only (D) gene sets and cell type-specific genes from 7 non-related cell types from sci-RNA-seq (genes with a TPM>10 and 5-fold higher than the 2nd most expressing cell type) (Cao et al., 2017). The colour of the bar signifies statistical significance assessed by a Fisher's exact test, gridded bars indicate the expected overlap based on gene set and genome size, and the enrichment fold between observed and expected overlaps is printed on top of the bars. The most significant intersections are those between gene sets specific for the same cell type (p≤7.06×10⁻²).
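
The cell type-specificity rule and the enrichment fold described for C-D can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, gene identifiers, cell type labels and TPM values are hypothetical, the 5-fold comparison is treated as inclusive (at least 5-fold), and the expected overlap is taken as the product of the two set sizes divided by the genome size.

```python
def cell_type_specific_genes(tpm_by_gene, cell_type, min_tpm=10.0, min_fold=5.0):
    """Genes with TPM > min_tpm in cell_type and at least min_fold times
    the expression of the next most highly expressing cell type."""
    specific = set()
    for gene, tpm in tpm_by_gene.items():
        value = tpm[cell_type]
        others = [v for name, v in tpm.items() if name != cell_type]
        runner_up = max(others) if others else 0.0
        if value > min_tpm and value >= min_fold * runner_up:
            specific.add(gene)
    return specific


def enrichment_fold(set_a, set_b, genome_size):
    """Observed overlap, expected overlap under independence, and their ratio."""
    expected = len(set_a) * len(set_b) / genome_size
    if expected == 0:
        raise ValueError("expected overlap is zero; fold is undefined")
    observed = len(set_a & set_b)
    return observed, expected, observed / expected


# Hypothetical toy expression table (TPM per cell type).
toy_tpm = {
    "geneA": {"seam": 50.0, "hypodermis": 5.0, "neuron": 2.0},   # specific
    "geneB": {"seam": 50.0, "hypodermis": 20.0, "neuron": 1.0},  # under 5-fold
    "geneC": {"seam": 8.0, "hypodermis": 0.0, "neuron": 0.0},    # under 10 TPM
}
seam_specific = cell_type_specific_genes(toy_tpm, "seam")
observed, expected, fold = enrichment_fold(
    {"geneA", "geneB", "geneC"}, {"geneB", "geneC", "geneD"}, genome_size=6
)
```

A fold above 1 means the two sets share more genes than expected by chance given their sizes; a significance test is still needed to judge whether that excess is meaningful.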

Table S3. List of strains used in this study.

Table S4. Primers used in this study (general oligos and smFISH probes).

Table S6. List of RNAi clones used in this study.