Organ formation relies on the orchestration of pattern formation, proliferation and growth during development. How these processes are integrated at the individual cell level remains unclear. In the past decades, studies using Drosophila wing imaginal discs as a model system have provided valuable insights into pattern formation, growth control and regeneration. Here, we provide single cell transcriptomic landscapes of pattern formation, proliferation and growth of wing imaginal discs. We found that patterning information is robustly maintained in the single cell transcriptomic data and can provide reference matrices for computationally mapping single cells into discrete spatial domains. Assignment of wing disc single cells to spatial subregions facilitates examination of patterning refinement processes. We also clustered single cells into different proliferation and growth states and evaluated the correlation between cell proliferation/growth states and spatial patterning. Furthermore, single cell transcriptomic analyses allowed us to quantitatively examine disturbances of differentiation, proliferation and growth in a well-established tumor model. We provide a database to explore these datasets at http://drosophilayanlab-virtual-wingdisc.ust.hk:3838/v2/.
Drosophila wing imaginal discs have been an important developmental model for study of pattern formation and organ size control (Beira and Paro, 2016). The wing imaginal discs in Drosophila larvae are a pair of sac-like epithelial organs that later become adult wings. A wing disc is composed of a squamous layer of epithelial cells called the peripodial membrane and a columnar epithelial layer called the disc proper, with myoblast cells, tracheal cells and a few neurons attached to the disc proper (Beira and Paro, 2016). Over the 96 h larval period, wing disc cells rapidly proliferate from 30-50 cells to 30,000-50,000 cells (Hariharan, 2015; Milan et al., 1996). The wing disc is patterned along the anterior-posterior (AP), dorsal-ventral (DV) and proximal-distal (PD) axes, during which cells acquire positional information through the functions of morphogens such as Hedgehog (Hh) (Basler and Struhl, 1994; Strigini and Cohen, 1997; Tabata and Kornberg, 1994), Wingless (Wg, a Wnt family member) (Neumann and Cohen, 1997; Zecca et al., 1996) and Decapentaplegic (Dpp, a TGFβ family member) (Lecuit et al., 1996; Nellen et al., 1996; Tabata and Takei, 2004). Wing disc size is tightly controlled through biochemical signals, such as the Dpp pathway (Burke and Basler, 1996; Rogulja and Irvine, 2005; Wartlick et al., 2011) and the insulin/PTEN pathway (Böhni et al., 1999; Brogiolo et al., 2001; Goberdhan et al., 1999; Leevers et al., 1996; Montagne et al., 1999; Weinkove et al., 1999), as well as through structural and mechanical signals, which probably function through tuning Hippo signaling activity (Hariharan, 2015; Irvine and Shraiman, 2017).
Although the processes of pattern formation and growth control are generally thought as two independent processes, several lines of evidence suggest that wing disc pattern formation and size control processes have underlying links; for example, Dpp and wg signaling activity is required for both pattern formation and wing disc growth (Capdevila and Guerrero, 1994; Giraldez and Cohen, 2003; Harmansa et al., 2015; Lecuit et al., 1996; Martin-Castellanos and Edgar, 2002; Teleman and Cohen, 2000; Zecca et al., 1995). Another line of evidence comes from studies of wing disc tumor models. Wing discs mutant for a group of neoplastic tumor suppressor genes (nTSGs) fail to differentiate and, at the same time, lose growth control (Bilder, 2004; Gateff, 1978). The representative nTSG mutants, including lethal(2) giant larvae [l(2)gl], discs-large (dlg) and scribble (scrib), encode highly conserved scaffold proteins that form a basolaterally localized cell polarity complex in epithelial cells (Bilder et al., 2000; Bilder and Perrimon, 2000). The tissue architecture of nTSG mutant wing discs, which are composed of unpolarized and undifferentiated cells, is severely disrupted and the mutant wing discs are presumably devoid of patterns and differentiation (Bilder et al., 2000; Bilder and Perrimon, 2000; Gateff, 1978). Interestingly, it was shown that nTSG mutant clones grow preferably in the wing hinge region whereas the same mutant clones are eliminated in the pouch region (Tamori et al., 2016). A third line of evidence comes from the study of wing disc regeneration. It has been shown that different wing disc regions have dramatically different regenerative capacities (Martin et al., 2017; Verghese and Su, 2016). These lines of evidence suggest that the patterning process and cell differentiation states might affect cell competency for proliferation and growth.
Here, we provide a systematic analysis of single cell transcriptomic datasets for wild-type and scrib mutant wing imaginal discs. Wing imaginal discs, as a classical model for study of pattern formation, provide an ideal setting for mapping single cells to distinct spatial domains with high accuracy. We used gene co-expression correlation analysis to identify two gene sets related to DNA replication and protein translation in order to cluster cells into distinctive proliferation and growth states. We found that cells of different proliferation and growth states were distributed uniformly in wild-type imaginal discs. We also examined patterning, proliferation and growth in scrib mutant wing imaginal discs, a well-established wing imaginal disc tumor model. Surprisingly, although the homozygous scrib mutant wing imaginal discs were entirely composed of cells lacking apicobasal polarity and the overall organ architecture was severely disrupted, the mutant imaginal discs retained partial patterning with notum identity cells becoming more distinguishable over time. In scrib mutant imaginal discs, the distribution of cell proliferation and growth states was severely disrupted. Together, these data provide landscapes of pattern formation, proliferation and growth at single cell level in wing imaginal discs and demonstrate broad potential applications of single cell transcriptomic data beyond cell type identification.
Overview of 10× single-cell RNA-sequencing data from wing imaginal discs
We generated single-cell RNA-sequencing (scRNA-seq) data for dissociated wing imaginal disc cells using the 10× genomics platform from five samples (Table S1). Although the total cell number captured varied with different samples, the median number of genes detected per cell was consistent at around 3000 genes per cell and the median UMI (unique molecular identifier) counts per cell was stable at around 20,000 per cell (Table S1). For each sample, we plotted the distribution of UMI number versus gene number detected per cell (Fig. S1). In addition to removing low quality cells with few genes detected, we also removed cells with over 4000 genes detected per cell to avoid potential doublets or even cell clusters (Fig. S1).
We first visualized single cells at 96 h after egg laying (AEL) in wild-type imaginal discs using the UMAP (uniform manifold approximation and projection) dimension reduction method (Becht et al., 2018). We noticed that the myoblasts and tracheal cells were well clustered, expressing unambiguous markers such as myoblast markers twist (twi) and Mef2 (Gunage et al., 2014; Sudarsan et al., 2001) and tracheal cell marker breathless (btl) (Klambt et al., 1992; Roy et al., 2014) (Fig. 1A-C). The number of tracheal cells captured in our datasets was low (n=19) and therefore we did not perform any further analysis. We examined the myoblast cluster (n=689) and found that the myoblasts split into two clusters with differential Notch signaling activity signatures (Fig. S2A-C; Table S2), consistent with previous findings of heterogeneous myoblast populations in wing imaginal discs (Gunage et al., 2014).
The majority of single cells (n=4498), which are from the wing disc epithelia, form a continuum without clear-cut clustering boundaries to distinguish wing disc subregions (Fig. 1B). We inspected known landmark genes for wing disc subregions and noticed that cells expressing genes that mark the wing disc regional identity along the PD axis (e.g. tsh and nub) were well clustered (Fig. 1C), whereas cells expressing AP identity genes (e.g. ci and en) were less well clustered in the UMAP plot (Fig. S3). Similarly, dpp- and wg-expressing cells, which form signaling centers in wing discs, were also scattered as pockets of cells in the UMAP plot (Fig. S3).
Assignment of wing disc epithelial cells to wing disc subregions
Single cells from stage 6 Drosophila embryos have previously been successfully mapped to their spatial origins at single cell resolution (Karaiskos et al., 2017). A 96 h AEL wing imaginal disc contains 10,000-20,000 cells, in contrast with 3000 cells in a stage 6 embryo. Moreover, there are no digitalized in situ patterns of marker genes available as a spatial reference map for wing imaginal discs as there are for stage 6 embryos (Fowlkes et al., 2008), posing more challenges for spatial mapping at single cell resolution. However, decades of work using wing discs as a model for pattern formation has established a number of marker genes for distinguishing wing subregions (Bryant, 1975; Held, 2002). Using these marker genes as seeds, we performed gene expression correlation analysis to identify additional landmark genes with region-restrictive expression patterns (Fig. 1D). For additional landmark genes identified through gene expression correlation analysis, we confirmed their expression patterns in published in situ datasets (Butler et al., 2003; Ibrahim et al., 2013) and the Janelia FlyLight database (Jory et al., 2012). We then used the combinatorial expression of binarized 40 landmark genes as a spatial reference map (Table S3) to assign the wing disc epithelial cells (n=4498) with high confidence into three regions: the pouch/hinge region, the notum region and the peripodial membrane (PM) region (Fig. 1D-F). Specifically, we manually binarized the expression data for each landmark gene and adopted the DistMap strategy previously developed (Karaiskos et al., 2017) to compute the most likely region for each cell (Fig. 1E,F).
After single cells were assigned to specific regions, wing disc patterns become readily visible on the UMAP plots. For the pouch/hinge region (n=2906), the pattern was polarized along the PD axis, with marker genes including defective proventriculus (dve) (Kolzer et al., 2003; Nakagoshi et al., 2002), nab (Terriente Felix et al., 2007), rotund (rn) (St Pierre et al., 2002), nubbin (nub) (Ng et al., 1995), zinc finger homeodomain 2 (zfh2) (Terriente et al., 2008), Sox box protein 15 (Sox15) (Cremazy et al., 2001) and teashirt (tsh) expressed in partially overlapping domains (Wu and Cohen, 2002). In particular, pouch marker genes (including dve, nab, rn and nub) were expressed in concentric circular domains that extended partially to the hinge region at different lengths, whereas hinge marker genes (including zfh2, Sox15 and tsh) were expressed only in part of the hinge region (Ayala-Camargo et al., 2013; Azpiazu and Morata, 2000; Terriente et al., 2008; Wu and Cohen, 2002). The expression patterns for these PD axis marker genes were well maintained on the UMAP plot (2A,B). The nub+rn+dve-nab– region was enriched for wg+ inner ring cells (Fig. 2C). The Sox15+zfh2+ hinge region (n=1448) exhibited active JAK/STAT signaling signatures, consistent with prior studies (Fig. S4) (Ayala-Camargo et al., 2013). In the pouch region, cells from the Dpp and Wg signaling stripes, as well as distinct regions responding to the spatially graded Dpp and Wg signals, respectively, were also clearly visible on the UMAP plot (Fig. 2D,E).
For the notum region (n=1231), pnr+, wg+ and Iro-C+(ara+mirr+caup+) cells were well clustered, defining the medial and lateral domains of the notum region (Fig. 3A-F) (Calleja et al., 2000; Diez del Corral et al., 1999; Letizia et al., 2007; Sato and Saigo, 2000). Similarly, en+hh+, ci+ and eyg+ cells were clustered along the AP axis (Fig. 3G-K) (Aldaz et al., 2003). Note that the en+hh+ domain was small in the notum region, consistent with previous reports (Fig. 3G-K) (Aldaz et al., 2003; Zecca and Struhl, 2002).
For the PM region (n=303), Ubx+ cells were enriched in the en+ or hh+ posterior domain and Antp+ cells were enriched in the ci+ anterior domain, consistent with previous reports (Pallavi and Shashidhara, 2005; Tripura et al., 2011) (Fig. S5). Note that a significant percentage of PM region cells were en-hh-ci– (n=112), possibly as a result of dropout events, which are an inherent challenge in scRNA-seq studies because of the low amounts of mRNA captured and the stochastic occurrence of failure to detect an expressed gene.
Together, these data suggest that the wing disc patterning information is robustly maintained in single cell transcriptomic data. In particular, the PD patterning process is a dominant source for 96 h AEL wing disc single cell clustering. Assignment of wing disc cells into spatial subregions and secondary clustering within each subregion facilitates further analysis of patterning refinement processes.
Partial differentiation occurs in scrib mutant imaginal discs despite disruption of organ architecture
The scrib mutant imaginal discs grow into neoplastic tumors, which are thought to lack differentiation capacity (Bilder, 2004; Gateff, 1978). These mutant discs are composed of cells lacking apicobasal polarity and the overall organ architecture is largely disrupted in comparison with wild-type imaginal discs (Bilder and Perrimon, 2000). It has been difficult to examine the degree to which the patterning and differentiation processes occur in scrib mutant imaginal discs.
We generated single cells from scrib mutant imaginal discs at 4, 5, 8 and 14 days AEL because scrib mutants from these time points have distinctive global transcriptomic signatures, as shown by bulk transcriptome analysis in another study (Ji et al., 2019). These datasets provided us with an opportunity to examine systematically the patterning and differentiation processes in scrib mutants compared with wild-type imaginal discs. Single cells from scrib mutant wing discs and wild-type wing discs were segregated on the UMAP plot (Fig. 4A). The scrib mutant imaginal discs uniquely expressed genes such as pvf1, Ilp8 and upd1, consistent with previous findings (Fig. S6) (Bunker et al., 2015).
To examine the patterning and differentiation processes in scrib mutant imaginal discs, we examined the expression of landmark genes previously used for assigning wild-type single cells to wing disc subregions. These genes can be grouped into four clusters, based on gene expression correlation in wild-type imaginal disc single cells (Fig. 1D). In scrib mutant imaginal discs, the expression correlation within each gene cluster was severely disrupted (Fig. 4B). Interestingly, the clustering of landmark genes improved over time, indicative of a continuous PD patterning process as scrib mutant imaginal discs developed over time (Fig. 4B). Similar to wild-type imaginal discs, we used patterning landmark genes as a spatial reference map to assign scrib mutant cells to wing disc subregions (Fig. 4C). In comparison with wild-type wing disc cells (Fig. 1F), the assignment of scrib mutant cells was more uncertain as the Matthews correlation coefficient (MCC) scores of three different regions became less differentiating, which could be quantified by an overall increase in entropy (Fig. 4D). Note that as scrib mutant imaginal discs developed, entropy decreased, supporting the finding that PD patterns become more distinguishable over time. For example, the UMAP projection of scrib mutant cells no longer revealed distinct expression domains along the PD axis marked by Iro-C genes (ara, caup, mirr) and wg, but the co-expression pattern of ara, caup and mirr became evident in scrib mutant tumors at 14 days AEL (Fig. S7).
For each of the patterning landmark genes used to define wing subregions and expressed in scrib mutant imaginal discs at all stages, we also examined the percentage of cells expressing that specific marker at all stages (Fig. 4E). We ranked all the expressed genes (7986 in total) according to the change in the percentage of cells expressing that specific gene, comparing scrib mutant imaginal discs at 4 and 14 days AEL, and found that the notum identity marker genes were among the top 5% of genes that showed a significant percentage increase (Fig. 4E). We stained scrib mutant imaginal discs for the notum identity marker Tailup (Tup) (Del Signore et al., 2012) and found that the number of Tup+ cells indeed increased significantly from day 4 to day 14 AEL (Fig. 4F). Together, these data demonstrate that notum identity cells increase in scrib mutant imaginal discs over time.
Analysis of proliferation and growth states for wing disc single cells
The cell proliferation and growth states can be potentially deduced from single cell transcriptomic data (Buettner et al., 2015; Liu et al., 2017; McDavid et al., 2016). In Drosophila imaginal discs, Yorkie (Yki) activity is a key proliferation regulator (Huang et al., 2005). Several well-established Yki targets, including diap1, ex and CycE, show weak co-expression correlation among single cells (data not shown), consistent with published results (Zhang et al., 2017) and probably a result of cross-talks between Yki and other growth regulatory pathways. Moreover, a few marker genes are not sufficient to determine cell proliferation and growth states because of the intrinsic stochasticity in single cell transcriptomic data (Liu et al., 2017). To find gene expression patterns as proxies to indicate cell proliferation and growth states, we performed gene expression correlation analysis for all expressed genes in wing disc cells and identified two gene sets related to proliferation and growth that exhibited high co-expression correlation within the gene set itself (Fig. 5A). The first gene set was composed of 47 genes, including cdc25/string, CycE and other genes functionally important for DNA replication, which are probably co-regulated by E2F1and Polycomb group genes (Ji et al., 2012; Neufeld et al., 1998; Thacker et al., 2003) (Table S4). The second gene set was composed of 91 genes related to protein translation, including ribosomal genes, translation initiation factors and elongation factors (Table S4). We then clustered wing disc cells into four proliferation/growth states based on the combinatorial expression pattern of these two gene sets (Fig. 5B) and found that cells of different proliferation and growth states were distributed among all wing disc subregions without obvious bias (Fig. 5C,F). These results are consistent with previous studies showing that proliferation and growth are largely uniform in wing imaginal discs (Rogulja and Irvine, 2005; Schwank et al., 2011).
We adopted the same strategy to analyze single cells from scrib mutant imaginal discs. The co-expression correlation within the protein translation and DNA replication gene sets was still observed among scrib mutant cells (Fig. 5D). A moderate anti-correlation relationship between the DNA replication gene set and the protein translation gene set was observed in wild-type cells (Fig. 5A), but was no longer detectable in scrib mutant cells (Fig. 5B). We clustered scrib mutant cells into four classes corresponding to those in wild-type cells, based on the k-nearest neighbor algorithm (Fig. 5B,E), and found that the distribution of these four classes of cells changed significantly in scrib mutant cells (Fig. 5E). These results are consistent with previous studies demonstrating coordination of cell division and growth in wing imaginal discs (Neufeld et al., 1998) and a disruption of proliferation and growth coordination in cancerous cells. We then examined the distribution of the four proliferation/growth states for cell populations marked by specific patterning genes and did not detect any significant deviation from the overall distribution (Fig. 5F). These results suggest that proliferation and growth states are not biased in specific cell populations marked by single patterning genes in either wild-type or scrib mutant wing imaginal discs.
Here, we demonstrate that single cell transcriptomes from Drosophila wing imaginal disc cells are potential resources for systematically analyzing processes such as pattern formation, proliferation and growth. We have prepared an accompanying database for exploring these datasets (http://drosophilayanlab-virtual-wingdisc.ust.hk:3838/v2/).
These datasets provide comprehensive transcriptomic signatures for different cell populations in wing imaginal discs and enable the identification of additional marker genes for each cell population. For example, myoblasts are clustered into two populations with distinctive Notch signaling signatures, consistent with previous findings (Gunage et al., 2014). Additional markers distinguishing these two myoblast subpopulations can provide candidate markers and genes for future study. We provide two functions to explore gene expression patterns in the accompanying database. First, for any given gene detected in our datasets, its normalized expression pattern can be examined on UMAP plots of single cells from wild-type wing imaginal discs, wing imaginal disc subregions and scrib mutant imaginal discs of different stages. Second, we have built a virtual imaginal disc as a rough reference for exploring gene expression patterns. For this, we divided the imaginal discs into 56 regions based on commonly used regional markers such as dpp, wg, en, ci, ap, salm, bi, brk, Dll, Sox15, Zfh2, pnr, ara and eyg. Probably because of the coverage limitation of scRNA-seq on a 10× platform, even though wing discs in our datasets perform relatively well with a median of around 3000 genes detected per cell, we could not find genes highly correlated with certain markers such as ap and bi to correct for dropout events; therefore, the virtual imaginal disc gene expression pattern is less accurate than the UMAP plot function.
We found that gene co-expression analysis in single cells is effective at identifying functionally related gene sets or modules important for patterning. For any given gene detected in our datasets, its correlation coefficients with other genes in wild-type imaginal discs can be calculated and shown in a table ranked by either descending or ascending orders in the accompanying database. In our experience, a correlation coefficient above 0.4 is meaningful in our datasets. For example, the correlation coefficients of dpp/ptc and en/hh/ci/inv were around 0.4. We detected correlation coefficients of around 0.2 for dpp with a set of glycolytic genes including Eno, Pglym78, Tpi, Gapdh2, Ald, Gapdh1, Pgk and Pfk. Moreover, when we performed differential gene expression analysis for dpp+ cells versus dpp– cells in the pouch region, the glycolytic genes were identified as significantly differentially expressed genes. However, unlike dpp and ptc, the expression of glycolytic genes was widely distributed and we could not detect quantitative differences for glycolytic genes between dpp+ and dpp– regions using fluorescent in situ hybridization (data not shown). Although it is possible that the in situ hybridization method is not sensitive enough to detect subtle differences, correlation coefficients of around 0.2 between two genes should be approached with caution, even if they are ranked among the highest for all genes.
We identified two gene sets through gene co-expression analysis as proxies for defining proliferation and growth states in single cells. The first gene set includes cdc25/string, CycE, PCNA and other genes functionally important for DNA replication. The second gene set includes ribosomal genes, translation initiation factors and elongation factors. The DNA replication and protein translation gene sets were the only two proliferation/growth-related gene groups that we could identify as co-regulated and suitable for cell clustering and classification. However, the exact cell state that each class represents is unclear at the moment. The DNA replication gene set is probably co-regulated by E2F1and Polycomb group genes (Ji et al., 2012; Neufeld et al., 1998; Thacker et al., 2003). The high co-expression correlation of the 91 translation-related genes suggests an underlying co-regulatory mechanism that remains to be uncovered. Interestingly, in wing imaginal discs, the pS6 staining pattern and TORC activation is patchy and stochastic (Romero-Pozuelo et al., 2017). Likewise, the staining patterns of common proliferation markers such as PH3+ and BrdU+ are known to be sporadic and uniformly distributed. It seems that a balanced distribution of cells in different proliferation and growth states exist in wild-type imaginal discs. How this balance is achieved in wild-type imaginal discs and becomes biased in scrib mutant tumors remains to be explored.
MATERIALS AND METHODS
The fly strains used in this study were FRT82B (BL2035) and scrib1FRT82B/TM6B (a kind gift from the Doe Lab, University of Oregon).
Imaginal discs were dissected, fixed and stained according to standard protocols. The primary antibody used was mouse anti-tup (1:1000; 40.3A4, DSHB). The secondary antibody conjugated with Alexa Fluor 568 (ThermoFisher) was used at 1:500. Phalloidin conjugated with Alexa Fluor 647 (1:1000; ThermoFisher) and Hoechst 33342 (1:10,000; ThermoFisher) were used to stain F-actin and DNA, respectively. All images were acquired on a Leica TCS SP8 confocal microscope.
Wing imaginal disc staging, clone induction, dissection, single cell dissociation and sequencing
Fly embryos were collected on apple juice plates with yeast pastes and transferred into fly food vials within 3 h. Embryo numbers were controlled to about 50 to avoid crowding. Staged wild-type and scrib1 wing imaginal discs were dissected and transferred to Dulbecco's phosphate-buffered saline (DPBS; ThermoFisher). Wild-type larvae were dissected at 96 h AEL. The scrib1 mutant larvae were dissected at 96, 120, 192 and 336 h AEL. Wing imaginal discs were dissociated in 0.25% trypsin-EDTA solution at 37°C for 10 min. Cells were then washed in DPBS and passed through a 35 μm filter before library preparation. Construction of 10× single-cell libraries and sequencing on the Illumina Hiseq platform were performed by Novogene. NCBI GEO accession numbers are GSE133204 for 96 h AEL wild-type imaginal discs and GSE130566 for scrib mutant imaginal discs of different stages.
10× Genomics raw-data processing, quality filtering and standard analysis in the Seurat pipeline
The sequencing raw data files were processed by the Cell Ranger pipeline (v 2.2.0) with default settings. Read alignment and UMI counts were based on a BDGP6 genome reference fasta file and annotated by a BDGP6.91 gtf file developed by Ensembl. Cell-UMI count tables were loaded into Seurat projects. Cells expressing 1000-4000 genes and genes detected in more than 10 cells were used as filtering gates to select cells for downstream analysis. The expression data normalization was performed using the default LogNormalize method in Seurat (v2.3.4). Variable genes were selected by the Seurat FindVariableGenes function and used for subsequent analysis. PCA analysis, dimension reduction, cell clustering and cluster marker identification were performed with the standard Seurat pipeline (R script available at: https://github.com/drosophilayanlab1/disc_script/tree/master).
Wing disc subregion spatial mapping
Well-established wing disc region-specific genes, including nub, rn, dve, nab, Sox15, zfh2, pnr, eyg, ara, caup, Ubx and odd, were used as seed genes to identify other landmark genes that showed high correlation coefficients with seed genes. The expression levels of landmark genes were binarized into on/off states, with the binarization threshold for each gene as specified in Table S3. The thresholds for a few landmark genes were manually adjusted using expression regions of seed genes as references. For example, the gene fz was expressed in all the disc proper cells and showed a high correlation coefficient with Sox15 and, therefore, served as a good landmark gene using a high-cutoff threshold. A cell-bin score for each cell to three positional bins was calculated based on the MCC, a strategy adopted from the DistMap algorithm (Karaiskos et al., 2017). The assignment of an individual cell was determined by the highest score among the three scores. A small number of cells with identical cell-bin scores were designated as uncertain cells and excluded from subclustering analysis.
Calculation of gene expression correlation coefficients
The Pearson correlation coefficient between gene i and gene j was calculated from normalized gene expression levels (calculated using the default LogNormalize method in Seurat) of gene i and gene j in single cells using the cor function in R. Gene correlation coefficients larger than that for en and hh were retained for visualization in a correlation matrix constructed using the hclust function in R.
Calculation of Shannon entropy
The Shannon entropy of a cell is used to quantify the uncertainty regarding its spatial assignment and is calculated as follows:
For each single cell, denote Si as the MCC score of this single cell to be assigned into the i-th subregion. Let , then the entropy (Ent) of the MCC scores for this single cell is given by
Wing disc cell proliferation and growth state analysis
The DNA replication gene set (47 genes; Table S3) and the protein translation gene set (91 genes; Table S3) were identified as discrete gene clusters in the correlation matrix. These 138 genes were used as variable genes to cluster wild-type single cells into the four classes of distinctive proliferation/growth states shown in Fig. 5B. The scrib mutant cells were allocated to the four classes defined in wild-type cells through the k-nearest neighbor algorithm (k=11).
We thank Dr Chris Doe for fly stocks, Dr Ting Xie for providing the 10× scRNA-seq protocol and Dr Trudi Schupbach for helpful comments on the manuscript.
Conceptualization: Y. Yan; Methodology: Y. Yan, M.D., Y.W., L.Z., S.H., J.W., H.G., T.I.; Software: S.H., J.W.; Formal analysis: Y. Yan, M.D., H.G., T.I.; Investigation: M.D., Y.W., L.Z.; Resources: Y.W., L.Z.; Data curation: M.D., Y.W., L.Z., Y. Yang; Writing - original draft: Y. Yan, H.G.; Visualization: Y. Yan, M.D.; Supervision: Y. Yan; Project administration: Y. Yan; Funding acquisition: Y. Yan.
Y. Yan received funding from the Hong Kong Research Grants Council, University Grants Committee (GRF16103815, 16150016, 16104018, AoE/M-09/12) and from the Shenzhen Science and Technology Innovation Commission (JCYJ20170306161537148, JCYJ20180223181229868). J.W. received funding from the Hong Kong Research Grants Council, University Grants Committee (N_HKUST606/17, 26102719, C6002-17GF, C7065-18GF) and the Innovation and Technology Commission - Hong Kong (ITCPD/17-9, ITS/480/18FP).
The authors declare no competing or financial interests.