Mammalian organs consist of diverse, intermixed cell types that signal to each other via ligand-receptor interactions – an interactome – to ensure development, homeostasis and injury-repair. Dissecting such intercellular interactions is facilitated by rapidly growing single-cell RNA sequencing (scRNA-seq) data; however, existing computational methods are often not readily adaptable by bench scientists without advanced programming skills. Here, we describe a quantitative intuitive algorithm, coupled with an optimized experimental protocol, to construct and compare interactomes in control and Sendai virus-infected mouse lungs. A minimum of 90 cells per cell type compensates for the known gene dropout issue in scRNA-seq and achieves comparable sensitivity to bulk RNA sequencing. Cell lineage normalization after cell sorting allows cost-efficient representation of cell types of interest. A numeric representation of ligand-receptor interactions identifies, as outliers, known and potentially new interactions as well as changes upon viral infection. Our experimental and computational approaches can be generalized to other organs and human samples.
In multicellular mammalian organs, specialized cell types, such as beta cells in the pancreas and cardiomyocytes in the heart, perform organ-specific functions in coordination with generic cell types, such as the omnipresent endothelial and immune cells. Such coordination occurs at the microscopic level, such that many organs can be viewed as basic units repeated millions of times, as exemplified by alveoli in the lung, glomeruli in the kidney and hair follicles in the skin. Besides cooperating for physiological functions, these multi-cell type units are also integrated niches underlying homeostatic maintenance and injury repair (Chang-Panesso and Humphreys, 2017; Tata and Rajagopal, 2017; Varga and Greten, 2017). At the molecular level, cell–cell communication is chiefly mediated by secreted ligands and membrane receptors, and is made possible by continuous patrolling of immune cells and more permanent contacts via interdigitating cellular processes embedded in extracellular matrices. This can be readily appreciated in the lung, where each alveolus is mostly air bordered by several intermixed cells of the epithelial, endothelial, mesenchymal and immune cell lineages, such that most, if not all, cell types are within a signaling distance of each other (Cohen et al., 2018; Raredon et al., 2019; Yuan et al., 2018).
Delineating genome-wide ligand-receptor interactions between all pairwise combinations of constituent cell types in a given organ, hereinafter named interactomes, becomes feasible with the advancement of single-cell RNA sequencing (scRNA-seq) technology, using which individual cell types can be profiled upon computational, instead of physical, purification (Han et al., 2018; Tabula Muris Consortium et al., 2018). Several such single-cell interactomes have been constructed to characterize organs and cell culture systems at baseline and upon perturbation (Camp et al., 2017; Cohen et al., 2018; Kumar et al., 2018; Raredon et al., 2019; Skelly et al., 2018; Vento-Tormo et al., 2018; Vieira Braga et al., 2019). One method employed in several of these studies is to apply a threshold to categorize ligands and receptors as present or absent, and to count these binary outcomes as a measure of interaction strength between cell types, although the numerical expression values available from scRNA-seq can be more effective in extracting the biological information (Kumar et al., 2018; Vento-Tormo et al., 2018). Nevertheless, systematic comparison of scRNA-seq and bulk RNA sequencing (bulk RNA-seq) is often not incorporated into experimental design to allay the known gene dropout concern in scRNA-seq. In addition, current algorithms and outputs are not readily adopted by bench scientists without advanced computational skills.
In this methodological study using the mouse lung as a model system, we have evaluated the sensitivity and cellular resolution of scRNA-seq in comparison with bulk RNA-seq, accordingly optimized a cell sorting protocol to consistently capture all major lung cell types, and generated numeric interactomes in normal and diseased lungs where significant interactions are intuitively represented as outliers. Developed by bench scientists, our combined experimental and computational methods can be generalized to other biological systems with minimal adjustment.
scRNA-seq has cell type resolution with sensitivity comparable to that of bulk RNA-seq
One challenge to construct an interactome using scRNA-seq is the so-called gene dropout issue, where only a few thousand genes are detected in a given cell due to technical inefficiency, compared to the 20,000-30,000 genes expected and obtained by bulk RNA-seq of typical mammalian cells (Hicks et al., 2018; Kharchenko et al., 2014). However, it is important to recognize that bulk RNA-seq does not detect all genes in every cell of the sample, and the apparent high coverage is the result of summing over tens of thousands of cells. We thus hypothesized that, by combining cells of a cell type readily identifiable in scRNA-seq, we could achieve comparable sensitivity to bulk RNA-seq of the same purified cell type. To test this, we used as a standard our published bulk RNA-seq data of fluorescence-activated cell sorting (FACS)-purified alveolar type 1 (AT1) and alveolar type 2 (AT2) cells (Little et al., 2019), and evaluated scRNA-seq gene dropouts as a function of expression level (Fig. 1A; Fig. S1A). We found that the dropout rate, defined as the percentage of genes detected by bulk RNA-seq but not scRNA-seq, decreased with increasing expression levels, as measured by fragments per kilobase of transcript per million mapped reads (FPKM) from bulk RNA-seq. For example, only 4% of genes at 10 FPKM were missed in scRNA-seq of 345 AT1 cells, and this percentage was even lower (0.6%) for AT2 cells where 1252 cells were sequenced. Technical or biological noise likely dominated for low FPKM genes, as 61 and 54 genes were detected by scRNA-seq but not bulk RNA-seq for AT1 and AT2 cells, respectively. The consistency between scRNA-seq and bulk RNA-seq was further supported by the good agreement between the levels of gene expression measured by both methods (Fig. 1B; Table S1).
After establishing the comparable performance of scRNA-seq and bulk RNA-seq in measuring the same cell populations, we examined the added benefit of scRNA-seq in dissecting cell type heterogeneity without prior knowledge and availability of surface markers or genetic drivers that were necessary for physical purification in bulk RNA-seq. For this, we used lung endothelial cells (ECs) that were bulk purified with a pan-EC marker, ICAM2, but contained 4 subpopulations in scRNA-seq: Plvap ECs, Car4 ECs, Vwf ECs and lymphatic ECs (Vila Ellis et al., 2020) (Fig. S1B). As before, dropout rates were low except for lymphatic ECs, probably because only 15 cells were sequenced (Fig. 1C). However, the apparent higher sensitivity of bulk RNA-seq was likely driven by expression of those dropout genes in non-lymphatic ECs and not because bulk RNA-seq-detected mRNAs derived from lymphatic ECs. Nevertheless, cell type heterogeneity captured by scRNA-seq was demonstrated by comparing population-specific gene expression as a function of expression in bulk RNA-seq (Fig. 1D; Table S1). Again, the majority of the genes showed good agreement between scRNA-seq and bulk RNA-seq. As we hypothesized, cell type-specific genes over a range of expression levels – such as Cd24a, Car4 and Igfbp7 for Car4 ECs – showed the expected enrichment; the converse was also true for genes depleted in Car4 ECs, such as Plvap. Notably, Thy1, a known lymphatic EC marker (Jurisic et al., 2010), was negligible in bulk RNA-seq but readily detected and distinguishable in the lymphatic ECs of only 15 cells (Fig. 1D). Such cell type stratification was, by design, lost in bulk RNA-seq.
The varying performance of scRNA-seq seemingly as a function of cell number prompted a systematic evaluation: we computationally sampled the Plvap EC population (Fig. 1E) and found that the number of detected genes rapidly approached the technical limit, such that ∼90 cells were necessary to detect 10,000 genes (Fig. 1F). The same 90-cell cutoff held true when comparing cell types of different cell numbers (Fig. 1G) and therefore was used to guide the optimization of our scRNA-seq wet-lab protocol as described below.
Overall, our analyses showed that scRNA-seq performs as well as bulk RNA-seq in detecting and quantifying genes when computationally combining enough cells for a given cell type, and outperforms bulk RNA-seq in identifying cell type heterogeneity and associated marker genes.
An optimized normalized lung scRNA-seq protocol
The theoretical minimal number of cells, as determined above, could be difficult to obtain in practice, owing to highly skewed proportions of the dozens of cell types in a typical mammalian organ. For example, published lung scRNA-seq datasets had limited representation of endothelial and mesenchymal cells and significant variations in cell proportions across experiments (Fig. 2A). We reasoned that consistent and sufficient sampling of major lung cell types could be achieved by first purifying and then sequencing equal proportion of the 4 cell lineages – epithelial, endothelial, immune and mesenchymal lineages – which were generally considered non-interconvertible. As a benchmark, we determined via immunostaining the in vivo average proportions of the 4 listed cell lineages as 26%, 38%, 17% and 19% – a skewed and variable distribution that warranted consideration in experimental design (Fig. 2B). We then identified 3 cell surface markers that robustly distinguished the 4 lineages in FACS and, in comparison with our immunostaining results, introduced biases presumably due to varying efficiency in dissociating cells of different lineages (Fig. 2C). To reduce the cost of scRNA-seq, we remixed and sequenced equal numbers of cells from the purified 4 lineages after taking into account lineage-specific difference in cell viability (Fig. 2C).
This cell-lineage-level normalization was a cost-effective trade-off between non-selective whole-lung scRNA-seq and in-depth albeit narrow-focused cell type-specific scRNA-seq. Proportions of cell lineages and individual cell types within a lineage could be retrieved by analyzing FACS and scRNA-seq data, respectively (Fig. 2D). Our method routinely captured 18 lung cell types in a sufficient number to construct the interactome.
Numeric representation of ligand-receptor interaction
As ligand-receptor interaction was directional – consisting of ligand-expressing signaling cells and receptor-expressing receiving cells – we evaluated each cell type in our scRNA-seq for its potential as a ligand-expressing cell when paired with each of all cell types, including itself in the case of autocrine interaction (Fig. 3A; Table S2). For each of these directional cell type pairs, we used a scatterplot to visualize all 2356 ligand-receptor pairs, such that a data point off both axes indicated the presence of the corresponding ligand and receptor, as exemplified by the expected Vegfa-Kdr expression in the AT1 cell-Car4 EC pair (Vila Ellis et al., 2020; Yang et al., 2016) (Fig. 3A). In these scatterplots, user-defined horizontal and vertical thresholds could be used to tally all ligand-receptor pairs present in specific cell type pairs – an approach commonly employed in the literature but at the expense of available quantitative expression values (Camp et al., 2017; Cohen et al., 2018; Skelly et al., 2018). Furthermore, the number of ligand-receptor pairs was not necessarily a valid predictor of functional interactions, and a single threshold was unlikely to suit all ligands and receptors expressed at varied levels.
To overcome these limitations, we sought a numeric representation of ligand-receptor interaction, which we reasoned should positively correlate with the expression values of both the ligand and the receptor. Inspired by the principle of chemical equilibrium where the product – the ligand-receptor complex in this case, which should determine the signaling output – equals the product of the substrates divided by the equilibrium constant, we chose to represent the interaction strength by multiplying the average scRNA-seq expression values of the ligand and the receptor in the involved cell types (Fig. 3B). By assuming comparable protein translation and delivery among cell types, this numeric representation allowed us to leverage 324 cell type pairs for a given ligand-receptor pair and readily captured potential interactions as outliers. This was demonstrated using heatmaps for the known interactions between AT1 cells and Car4 ECs via Vegfa and Kdr, and between ECs and pericytes via Pdgfb and Pdgfrb (Fig. 3C). These visually outlying interactions were quantitatively defined as those outside 3 times the interquartile range (Tables S3 and S4).
Identify interactions altered upon viral infection
Next, we extended our numeric interactome analysis to compare lungs at baseline and upon perturbation. We chose a Sendai virus infection model (Goldblatt et al., 2020), instead of genetic models, because infection was expected to induce global changes involving multiple cell types and their associated cell interactions, necessitating a quantitative genomic analysis as our interactome method. At 2 weeks after infection, when the virus had largely been cleared and the lung was undergoing repair (Goldblatt et al., 2020; Holtzman et al., 2005), scRNA-seq detected infection-induced proliferation of ECs and AT2 cells, as well as aberrant appearance of Trp63-expressing basal-like cells (Fig. 4A,B), reminiscent of pods or lineage-negative epithelial progenitors observed upon severe H1N1 virus infection (Kumar et al., 2011; Vaughan et al., 2015; Zuo et al., 2015).
For each directional cell type pair, we compared control and infected lungs by plotting the corresponding numeric interactions of individual ligand-receptor pairs, such that data points above or below the diagonal line represented enhanced or diminished interactions, respectively, whereas those on the y- or x-axis represented de novo or lost interactions, respectively. One example was an infection-induced interaction between myeloid cells and T cells via Cxcl9 and Cxcr3 (Fig. 4C,D; Table S5). This was supported by CXCR3 immunostaining, showing its upregulation in clustered T cells possibly as a result of chemotaxis (Groom and Luster, 2011; Weng et al., 1998) (Fig. 4E).
To quantify infection-induced changes in interactions, we resorted to the aforementioned concept of outliers by pooling the differences in a given ligand-receptor interaction across 324 cell type pairs and used the same 3 times interquartile range cutoffs (Table S5). We found that 3.6% (27,611 out of 763,344) interactions were outliers, involving 448 and 433 unique ligands and receptors in 289 cell type pairs and averaging 85 outlying interactions per cell type pair (Table S5).
Integrate interactomes with signaling pathway analysis
To corroborate our interactomes and capitalize on the transcriptomic information available from scRNA-seq, we sought to analyze the activities of signaling pathways and integrate them with associated ligand-receptor pairs. We chose signaling pathways annotated by Gene Ontology (GO), instead of Ingenuity Pathway Analysis (IPA), because of its public availability and inclusiveness. Although these databases did not take into account specific biological contexts, limiting their use in general, we reasoned that by averaging over a sufficient number of bona fide, context-independent pathway members, a pathway signature might be still detectible, as assumed in gene set enrichment analysis (Subramanian et al., 2005).
We considered all genes in each GO pathway as a metagene and generated the corresponding GO pathway (i.e. metagene) score for each cell, which was then averaged over all cells of a given cell type (Fig. 5A; Table S6). Using the same outlier concept, albeit with a less stringent cutoff of 1.5 times the interquartile range because of the limited number of cell types available (18 cell types, instead of 324 cell type pairs), we compared GO pathway scores across 18 cell types, as well as their changes upon infection, and identified outlying pathways (Tables S7 and S8). We identified 4% (276 out of 6300) outlying pathways at baseline and 5% (321 out of 6300) outlying alterations upon infection. To integrate with our interactomes, for a given outlying ligand-receptor pair, we required the receptor to be a member of an outlying signaling pathway in the same cell type as the receiving one in the interactome (Fig. 5A). This integration led to 0.3% (2323 out of 763,344) outlying interactions at baseline and 0.7% (5516 out of 763,344) outlying alterations upon infection that were also supported by the corresponding pathway activation (Tables S9 and S10). One example was infection-enhanced Bdnf-Ntrk2 signaling from AT1 cells to ECs, largely driven by Ntrk2 upregulation (Fig. 5B). It was tempting to speculate that the supposedly angiogenic role of this signaling (Dalton et al., 2015; Kermani and Hempstead, 2007) contributed to the infection-induced EC proliferation (Fig. 4B).
Meaningful interpretation of rapidly growing single-cell data depends on biological insights of bench scientists, who are often discouraged by the complexity of existing computational tools. As bench scientists ourselves, we describe here optimized experimental and computational methods to construct single-cell interactomes in lungs at baseline and upon infection. Our approaches are intuitive and readily adaptable to study other organs and species by those equipped with a consumer-grade computer and an intermediate level of R programming skills, but without formal education in computer and statistical sciences. Our interactome improves upon existing ones (Camp et al., 2017; Cohen et al., 2018; Kumar et al., 2018; Raredon et al., 2019; Skelly et al., 2018; Vento-Tormo et al., 2018; Vieira Braga et al., 2019) in the following 3 aspects.
First, we systematically evaluate and then capitalize on our observations that, at the cell type level, scRNA-seq has comparable sensitivity to genetic driver-based bulk RNA-seq and yet is more robust in identifying and purifying a given cell type (Fig. 1). Our cell type-level analysis bypasses the gene dropout issue inherent to scRNA-seq and imputation methods that are still under development (Gong et al., 2018; Hicks et al., 2018; Huang et al., 2018; Kharchenko et al., 2014; van Dijk et al., 2018). Experimentally, our normalization method is guided by computational assessment of the minimal cell number needed and captures all major lung cell types without inhibitory cost or resorting to non-commercial platforms. A potential concern about cell type-level analysis is failure to capture cellular heterogeneity; this, however, can be alleviated by careful cell type identification. Furthermore, a cell type that is not sufficiently distinct or abundant may not be reliably analyzed even with individual-cell-level algorithms, as evidenced by the significant cell–cell variation of lymphatic ECs (Fig. 1C). While not applicable to the paired control and infected lungs in our study, batch effects should be considered in experimental design and could be computationally ameliorated (Butler et al., 2018; Haghverdi et al., 2018; Polanski et al., 2020) – a preprocessing step compatible with our interactome algorithm.
Second, we assign significance based on the concept of outliers, made possible by our numeric representation of ligand-receptor interactions and our normalization protocol to profile a large number of cell types. Alternatively, significance was computed by permutation to evaluate observed versus expected expression in individual cells (Vento-Tormo et al., 2018; Vieira Braga et al., 2019). Whereas the permutation method is a formal test of statistical significance that is applicable to even a small number of profiled cell types and can account for multiple hypothesis testing, our interquartile range method focuses on biological significance and, by using other cell types as internal controls to identify outliers, mimics in vivo competition among cell types to achieve specific signaling. Ligand and receptor expression values could also be thresholded to a binary outcome, which was counted or cross-correlated to represent the interaction strength between cells (Camp et al., 2017; Cohen et al., 2018; Skelly et al., 2018). However, counting interactions may miss functional interactions that are few in number; thresholding at an early step may introduce irremediable bias. Our analysis preserves numeric representations of interactions and pathway scores so that users can define and adjust cutoffs afterwards – 1.5 or 3 times the interquartile range in this case. Independent from our work, numeric interaction scores can also be used to correlate with other biological scores (Kumar et al., 2018).
Last, we supplement our interactomes with signaling pathway scores to allay the concern of inferring functionality solely based on ligand-receptor expression. Prior knowledge, including the ligand-receptor pair list and pathway components collected by GO or IPA, is necessary for bioinformatics but largely ignores specific biological contexts; the resulting analysis often errs on the side of false positives. Integrating interactomes with pathway scores, both of which are based on correlations and likely include false positives, allows for prioritizing candidate genes for experimental validation. The accuracy may be further improved by better knowledge of ligand-receptor biology, including elucidation of signal transduction components and transcriptional targets of individual ligand-receptor pairs.
MATERIALS AND METHODS
Sendai virus infection
Sendai virus (parainfluenza type 1) strain 52 (ATCC #VR-105, RRID:SCR_001672) was expanded in primary rhesus monkey kidney cells (Cell Pro Labs #103-175). Wild-type 8-week-old C57Bl/6 mice obtained from The Jackson Laboratory were anesthetized with isoflurane, suspended by the maxillary incisors on a dosing board with a 60° incline, and infected with a sub-lethal dose of 2.1×107 plaque-forming units (pfu) of virus in 40 µl phosphate buffered saline (PBS) through oropharyngeal aspiration. For immunostaining, both male and female mice were used. For scRNA-seq, a pair of male mice was used. Investigators were not blind to the group allocation. No power calculation was used to determine the sample size. All animal experiments were approved by the Institutional Animal Care and Use Committee at Texas A&M Health Science Center Institute of Biosciences and Technology and MD Anderson Cancer Center.
Section immunostaining and cell counting
For cell lineage counting by immunostaining, lungs were inflated and harvested as previously described (Yang et al., 2016). Briefly, mice were anesthetized using Avertin (T48402, Sigma-Aldrich) and the lungs perfused with PBS. The trachea was cannulated and the lungs were inflated to full capacity with 0.5% paraformaldehyde (PFA; P6148, Sigma-Aldrich) in PBS at 25 cm H2O pressure. The harvested lungs were immersion-fixed in 0.5% PFA at room temperature for 4-6 h and washed overnight in PBS at 4°C. Section immunostaining was performed following published protocols with minor modifications (Alanis et al., 2014; Chang et al., 2013). Fixed lungs were dissected and cryoprotected overnight at 4°C in 20% sucrose in PBS containing 10% optimal cutting temperature compound (OCT; 4583, Tissue-Tek). The lobes were then frozen in OCT-filled embedding molds on a dry ice and 100% ethanol slurry then kept at −80°C until sectioned. Sections were obtained at 10 μm thickness and then blocked in PBS with 0.3% Triton X-100 and 5% normal donkey serum (017-000-121, Jackson ImmunoResearch). Blocked sections were incubated in a humidified chamber at 4°C overnight with the following primary antibodies diluted in PBS containing 0.3% Triton X-100: anti-GFP (chicken, 1:5000, Abcam, AB13970), anti-ERG (rabbit, 1:5000, Abcam, AB92513), anti-CD45 (rat, 1:2000, eBioscience, 14-0451-81), anti-CD3E (Armenian hamster, 1:250, BioLegend, 100301) and anti-CXCR3 (rat, 1:250, R&D Systems, MAB1685). The sections were submerged in PBS for 30 min and incubated with 4′,6-diamidino-2-phenylindole (DAPI) and donkey secondary antibodies (Jackson ImmunoResearch) diluted in PBS with 0.3% Triton X-100 at room temperature for 1 h. The sections were washed again in PBS then mounted with Aqua-Poly/Mount (18606, Polysciences) and imaged on a Nikon A1plus confocal microscope or an Olympus FV1000 confocal microscope. For cell lineage counting, 3 images with airways located in the center were taken from 2 RosaSun1GFP/+; ShhCre/+ lungs, in which all epithelial cells were labeled with GFP (Little et al., 2019) to allow co-staining for endothelial and immune cells. Endothelial cells (ERG+ nuclei) and alveolar epithelial cells (GFP+ nuclei) excluding the airways were counted using Fiji's ‘Find Maxima’ function. Airway epithelial cells were counted by drawing a region of interest around airways and using Fiji to Find Maxima for DAPI. Immune cells (CD45+ nuclei) and mesenchymal cells (nuclei negative for the other markers) were counted manually.
Cell dissociation and labeling
Mice were anesthetized using Avertin injected intraperitoneally, the chest cavity was exposed and the lungs were perfused with PBS injected through the right ventricle. After clearance of circulating cells, the lungs were removed into PBS and finely minced using forceps. The lung tissue was digested in Leibovitz's L-15 medium (Gibco, 21083-027) with 2 mg/ml collagenase type I (Worthington, CLS-1, LS004197), 2 mg/ml elastase (Worthington, ESL, LS002294) and 0.5 mg/ml DNase I (Worthington, D, LS002007) for 30 min on a 37°C heat block. Digestion was stopped by addition of fetal bovine serum (FBS; Invitrogen, 10082-139) to a final concentration of 20%. The samples were moved to ice and all remaining steps were performed in a cold room. The tissue was triturated at 15 min into digestion and also after digestion was quenched. The resulting cell suspension was filtered through a 70 µm cell strainer (Falcon, 352350), centrifuged at 1500 g for 1 min and the pellet resuspended in red blood cell lysis buffer (15 mM NH4Cl, 12 mM NaHCO3, 0.1 mM EDTA, pH 8.0) for 3 min. The cells were pelleted again at 5000 rpm for 1 min and washed with Leibovitz's L-15 medium with 10% FBS, then filtered into a 5 ml tube with a cell strainer cap (Falcon, 352235). The cells were incubated with the following conjugated antibodies: CD45-PE/Cy7 (BioLegend, 103114), ICAM2-A647 (Life tech, A15452), E-cadherin-A488 (Invitrogen, 53-3249-82) at a concentration of 1:250 for 30 min. The cells were centrifuged at 1500 g for 1 min, washed in Leibovitz's L-15 medium with 10% FBS for 5 min and finally resuspended in Leibovitz's L-15 medium with 10% FBS and filtered through a strainer cap. The sample was incubated with SYTOX Blue (Invitrogen, S34857) and sorted on a BD FACSAria Fusion Cell Sorter. The cells were collected in a volume of 250 μl PBS with 10% FBS per collection tube.
FACS sorting and normalization
After exclusion of dead cells and doublets, cells were gated into the 4 cell lineage populations using a serial gating strategy. All CD45+ cells were collected as the immune population; CD45– cells exhibiting ICAM2+ signal were collected as endothelial cells; CD45–, ICAM2–, but E-cadherin+ cells were collected as epithelial cells; cells negative for all markers were collected as mesenchymal cells. To determine cell lineage population-specific reduction in viability after sorting, an equal number of cells from each lineage were mixed and, after adding fresh SYTOX Blue, re-sorted to determine the percentages of each lineage. These percentages were compared to the expected percentage, 25%, and lineage-specific viability correction factors were calculated and taken into account when combining the 4 lineages for scRNA-seq to achieve as close to equal proportions as possible.
scRNA-seq data analysis
The normalized whole-lung samples were processed through the Chromium Single Cell Gene Expression Solution Platform (10× Genomics) using the Chromium Single Cell 3′ Library and Gel Bead Kit (v2, rev D). Library sequencing was performed on an Illumina NextSeq500 using a 26X124 sequencing run format with 8 bp index (Read1). The single-cell reads were aligned against the mm10 mouse reference genome (provided by 10× Genomics), counted and aggregated using the Cell Ranger pipeline (version 3.0, 10× Genomics). Raw data for the P7 wild-type lung and control and 2 weeks post-Sendai virus infection lungs were deposited at NCBI Gene Expression Omnibus (GEO) under accession numbers GSE144769 and GSE144678. R codes for figures and supplementary tables are included in the Supplementary Data.
Cell type identification
Data processing was performed on a personal computer equipped with an Intel Core i5-6300U and 16.0 GB RAM using the R package Seurat (version 3.1.2) (Butler et al., 2018). Unless specified, default parameters were used. Single-cell count data used for all downstream calculations was normalized to the number of reads per cell using Seurat's normalization function (normalized UMI). Data were scaled before undergoing dimensionality reduction using the RunPCA Seurat function. Cells were visualized on a projection map using the RunUMAP function. Feature plots for known lung cell type markers were used to identify the cell types clustered using the Louvain algorithm implemented in the FindClusters function after cells were embedded in a graph using FindNeighbors. The resolution parameter was adjusted to appropriately cluster the cells based on the known cellular gene markers. The cell type distributions were calculated by dividing the number of cells assigned to each cell type by the total number of cells in the sample. The number of unique genes detected in each cell type was counted. Dot plots were used to visualize the marker genes identifying each cell type.
Bulk RNA-seq versus scRNA-seq comparison
To calculate the dropout rate of genes in the scRNA-seq data, normalized UMI averaged across all cells for a given cell type was compared to gene expression (FPKM) from the corresponding bulk RNA-seq data (AT1 and AT2 cells from GSE129583; endothelial cells from GSE124324). For genes within specified bins of bulk RNA-seq expression values, the fraction of genes with zero expression in scRNA-seq was calculated and plotted.
Analysis of the number of genes expressed per cell type
The number of genes expressed by each cell of one cell type was counted. The cells were ordered both by increasing and by decreasing the number of genes expressed. Then, as one cell was added after another, the number of unique genes detected in that cell not detected in the previous cells was tallied, resulting in the cumulative number of unique genes detected for that cell type.
A list of ligand-receptor pair genes from a published human database (Ramilowski et al., 2015) was matched to the gene names present in the Seurat object. Out of 708 ligand genes from the human database, 647 gene names matched in Seurat, and out of 691 receptor gene names, 646 matched. The normalized ligand and receptor gene count data were accessed from the Seurat object, averaged over all the cells of each cell type and plotted for each cell type pair. For each ligand-receptor pair for each cell type pair, we multiplied the average ligand and receptor gene expression. The heatmap visualizing these values was created using ggplot2 (version 3.2.1). To identify outlying interactions, we computed the interquartile range (IQR) across all cell type pairs for each ligand-receptor interaction and used a cutoff of 3 times IQR.
GO pathway score
The 350 GO pathway gene lists, publicly available (GO Consortium; containing the phrase ‘SIGNALING_PATHWAY’ in the ‘Biological Process’ category), were used to compute pathway scores for each cell type using the AddModuleScore function in Seurat. Outlier pathways were identified for each cell type by applying a 1.5 times IQR cutoff across the scores for all cells for each pathway. To identify possible pathway activation as an outcome of a signaling interaction, the gene list for each outlying pathway was used to match the receptor genes of outlying interactions for the same cell type as the signal-receiving cell.
We thank Odemaris Narvaez del Pilar for generating the scRNA-seq data of the P7 mouse lung.
Conceptualization: M.P.C., J.C.; Methodology: M.P.C., J.C.; Software: M.P.C.; Validation: M.P.C.; Formal analysis: M.P.C., B.J.H., J.C.; Investigation: J.C.; Resources: M.P.C., B.J.H., J.C.; Data curation: M.P.C.; Writing - original draft: M.P.C., J.C.; Writing - review & editing: M.P.C., J.C.; Visualization: M.P.C.; Supervision: J.C.; Project administration: J.C.; Funding acquisition: J.C.
This work was supported by the University of Texas MD Anderson Cancer Center Start-up and Retention Fund and the National Institutes of Health (NIH; R01HL130129 to J.C.). The University of Texas MD Anderson Cancer Center DNA Analysis Facility and Flow Cytometry and Cellular Imaging Core Facility are supported by a Cancer Center Support Grant (CA #16672) from the NIH.
The authors declare no competing or financial interests.