Synovial joint development begins with the formation of the interzone, a region of condensed mesenchymal cells at the site of the prospective joint. Recently, lineage-tracing strategies have revealed that Gdf5-lineage cells native to and from outside the interzone contribute to most, if not all, of the major joint components. However, there is limited knowledge of the specific transcriptional and signaling programs that regulate interzone formation and fate diversification of synovial joint constituents. To address this, we have performed single cell RNA-Seq analysis of 7329 synovial joint progenitor cells from the developing murine knee joint from E12.5 to E15.5. By using a combination of computational analytics, in situ hybridization and in vitro characterization of prospectively isolated populations, we have identified the transcriptional profiles of the major developmental paths for joint progenitors. Our freely available single cell transcriptional atlas will serve as a resource for the community to uncover transcriptional programs and cell interactions that regulate synovial joint development.
Synovial joints are complex anatomical structures comprising diverse tissues, including articular cartilage, synovium, fibrous capsule and ligaments (Decker et al., 2014). Each of these tissues is susceptible to a range of diseases and common injuries that, collectively, have a profound global morbidity (Asahara et al., 2017; den Hollander et al., 2019). A better understanding of the inter- and intracellular networks that govern how these structures emerge during development will inform efforts to generate pluripotent stem cell derivatives for cell replacement therapy and disease modeling (Wang et al., 2019b), and efforts to elicit regeneration in situ (Johnson et al., 2012). Moreover, an improved understanding of joint development will aid in identifying putative disease-causing genes (Kelly et al., 2020).
Over the past two decades, lineage tracing has revealed much regarding the cell populations contributing to murine synovial joint development. It begins with the formation of the interzone (IZ), a region of condensed mesenchymal cells at the site of the prospective joint. In the mouse hindlimb, the IZ is initiated from a Col2a1+ Sox9+ pool of cells recruited from the mesenchymal condensation of the emerging limb bud starting at E11.5 (Hyde et al., 2008; Soeda et al., 2010). It is generally believed that chondrocytes at the presumptive joint de-differentiate (i.e. undergo a chondrocyte-to-mesenchymal transition) and begin to exhibit the flattened and layered morphology that is indicative of the IZ. A history of expressing Gdf5, a TGFβ ligand crucial to joint formation (Storm and Kingsley, 1999), marks cells that initially form the IZ or that later immigrate into it, and that subsequently go on to contribute to all of the major joint constituents, including articular chondrocytes, ligament, meniscus and synovium (Chen et al., 2016; Shwartz et al., 2016).
To gain a more comprehensive understanding of these developmental programs, bulk microarray expression profiling and RNA-Seq have been applied to the developing limb (Taher et al., 2011), to whole joints, including the elbow and knee (Pazin et al., 2012), to the meniscus (Pazin et al., 2014), to the tendon (Liu et al., 2015b), to connective tissue (Orgeur et al., 2018), and to laser-capture micro-dissected regions of the interzone (Jenner et al., 2014). Although these investigations have yielded new insights into the genetic programs underpinning limb and joint morphogenesis, they provide limited resolution of the expression states for individual cell types due to the heterogenous nature of the samples profiled. With the advent of single cell profiling, it is now possible to detect transient populations of cells, to reconstruct developmental transcriptional programs and to identify new cell populations (Guo et al., 2010; Kumar et al., 2017). For example, Feng et al. revealed molecular signatures and lineage trajectories of an interzone-related Lgr5+ population in the murine E14.5 knee joint that contribute to the formation of cruciate ligaments, synovial membrane and articular chondrocytes (Feng et al., 2019).
Here, we have applied single cell RNA-sequencing on Gdf5-lineage cells of the murine hindlimb to determine the transcriptional programs of early synovial joint development. In contrast to the recent study of Feng et al. (2019), which focused on lineage divergence of a specific Lgr5+ interzone population, we sought to characterize formation of the entire IZ and to discover the extent to which heterogeneity in the nascent interzone is resolved into the distinct lineages that are apparent later at cavitation. Therefore, we sequenced Gdf5-lineage cells from the presumptive joint of the hindlimb from E12.5 (prior to overt IZ formation) through E15.5 (coinciding with cavitation). We combined computational analytics and in situ hybridization to identify the transcriptional signatures of joint progenitors and their elaboration of the interzone into the major synovial joint lineages. To aid the community in discovering additional transcriptional programs and in inferring cell interactions that contribute to synovial joint development, we have made this data freely and easily accessible with a web application at cahanlab.org/resources/joint_ontogeny.
Gdf5Cre+ cells in the hindlimb from E12.5 to E15.5 Gdf5CreR26EYFP mice are primarily located in the interzone, articular cartilage, ligament, menisci and synovium, as well as in other non-joint tissues
Gdf5-lineage cells contribute to several components of synovial joint, including articular cartilage, meniscus, ligaments, and synovium. To isolate joint progenitors, we crossed Gdf5 promoter-driven Cre mice with the R26 reporter mice in which loxP-flanked STOP sequence followed by the EYFP was inserted into the Gt(ROSA)26Sor locus, allowing us to identify Gdf5-lineage cells by YFP expression. We used fluorescent immunohistochemistry to determine the spatial and temporal pattern of YFP. At E12.5, YFP is mainly expressed in the presumptive joint area including part of the bone anlagen and the surrounding mesenchyme (Fig. 1). At E13.5, YFP+ cells are more centered in the interzone (IZ) and in the surrounding connective tissue; they are sparse in the anlagen of the femur and tibia. By E14.5, YFP+ staining is mainly present at the area of future articular cartilage (AC), synovium and surrounding soft tissue. YFP expression becomes obvious in menisci 1 day later. YFP+ cells are also seen in AC, epiphyseal cartilage and synovium at E15.5.
We observed ‘ectopic’ YFP expression in non-joint tissues such as the dermis and muscle, consistent with previous reports (Roelofs et al., 2017). However, because our scRNA-Seq analysis pipeline includes a ‘cell typing’ step (see below), we were able to identify these non-joint cells in silico and exclude them from our in-depth analyses that focus on the Gdf5-lineages of the joint. We refer to cells that passed our in silico filtering as Gdf5-lineage enriched (GLE) cells rather than YFP+ Gdf5-lineage cells because we cannot absolutely prove that YFP expression tracks with Gdf5-lineage in this system. Nonetheless, our staining, combined with prior reports examining Gdf5Cre cells in the limb, indicate that GLE cells are major cellular contributors to the knee joint. Therefore, determining their transcriptomes will yield insights into the genetic circuitry that accompanies IZ formation and the emergence of articular components such as ligament and tendon.
GLE cells form three distinct super clusters across two major developmental stages
To define the transcriptional states of joint cells and their progenitors during landmark developmental events, we isolated YFP+ cells from the hind limbs of male embryos from E12.5 (the time just before overt IZ formation) to E15.5 (coinciding with cavitation). To minimize contamination with Gdf5-lineage cells from the ankle and digits, we manually dissected the region of the limb containing the presumptive joint and excluded the paw (Fig. S1A). We then collected Gdf5 lineage cells by fluorescence-activated cell sorting (FACs) of YFP+ cells after enzymatically disassociating the presumptive knee joint region (Fig. S1B). We loaded approximately 6000 cells per timepoint for single cell RNA-Seq library preparation using the 10x Genomics platform (Fig. S1C), and we sequenced the transcriptome of ∼1000 to 5000 cells at a target depth of 100,000 reads per cell (Table 1).
Next, we performed quality control to exclude potential doublets and potential low-quality libraries. We defined doublets as the cells in the top 5% of total reads per capture, based on the estimated doublet rate of the 10x platform (Zheng et al., 2017). We labeled libraries as low-quality if they had fewer than 500 genes detected or if their total transcriptome consisted of greater than 5% of mitochondrially encoded genes. We then sought to identify the major transcriptional states in our data by clustering using the Leiden graph-based community detection algorithm (Traag et al., 2019). We found 12 clusters, many of which contained cells from multiple timepoints (Fig. S2A). To determine the cell type of each cluster, we used SingleCellNet to classify individual cells based on a well-annotated reference data set (Tan and Cahan, 2019), and we used differential gene expression to identify marker genes of cell types that are not included in current single cell reference data sets (e.g. neural crest cells and melanocytes). This approach identified six clusters made up of non-joint cell types, including myoblasts, immune and red blood cells, neural crest cells and melanocytes, and endothelial cells (Fig. S2B). After removing these non-joint cells, we re-clustered the data and performed differential gene expression analysis (Fig. S2C). All clusters had detectable levels of the osteochondral transcription factor (TF) Sox9, except one that had high levels of genes associated with dermis, including Twist2 and Irx1 (Fig. S2D). To localize the cells in this cluster, we performed in situ hybridization, confirming that they were dermal cells (Fig. S2E), and we excluded these cells from further analysis, resulting in a dataset of 7329 synovial joint GLE cells.
Next, we asked whether there were discernible transcriptional profiles that spanned timepoints. To address this question, we clustered all of the GLE cells and uncovered three ‘super-clusters’ (SCs), two of which contain a plurality of cells from more than a single timepoint (Fig. 2A,B). One of the clusters corresponds roughly to developmental time: SC1 is 98.2% E12.5 cells. The other two SCs are mixtures, with SC2 and SC3 predominately made up of cells from E13.5 to E15.5. We inferred the stage of cell cycle by scoring each cell based on its expression of phase-specific sets of genes. This analysis showed that, as expected, the proportion of cells in either the G2M or S stages gradually decreased over developmental time (Fig. 2C). To gain a better understanding of these SCs, we examined the expression of genes with well-established roles in limb and joint development. Prrx1 and Pitx1 are preferentially expressed in the early SC1 (Fig. 2D), consistent with their roles in specifying limb mesenchymal cells from lateral plate mesoderm (Bobick and Cobb, 2012; Marcil et al., 2003; Wang et al., 2018). Shox2, which regulates the onset of early chondrogenesis (Bobick and Cobb, 2012), has a similar expression pattern. As many cells in SC1 express Sox9 but few express Col2a1, it is likely that this supercluster comprises a mixture of progenitor cells of mesenchymal character and chondroprogenitors. SC2 is similar to SC1 in expression profile, but it also preferentially expresses Sox9, Gdf5, Col11a1 and Col2a1, suggesting that this SC is likely to contain a mixture of IZ cells and transient chondrocytes (Zhao et al., 1997). SC3 cells express the fibrous-related genes Col3a1, Col1a1, Lgals1 (Dasuri et al., 2004) and Dcn (Havis et al., 2014), indicating that SC3 largely consists of fibroblast-related cells.
Gene set enrichment analysis largely corroborated our supervised annotation of the superclusters (Fig. 2E). SC1 is enriched in limb and joint development-associated pathways, including embryonic limb morphogenesis, the Hippo signaling pathway (Bhattaram et al., 2010; Pan, 2010) and epithelial-to-mesenchymal transition. SC2 is enriched in extracellular matrix (ECM) organization, skeletal system development and cartilage development. SC3 involvement in fibrous differentiation is supported by the enrichment of collagen fibril organization and elastic fiber formation.
Taken together, this analysis has revealed three major transcriptional states of GLE cells in synovial joint development. It has also hinted at substantial heterogeneity within SCs. To more clearly define the cell types and states of GLE cells, we next analyzed each SC separately, as described in the following sections.
Two categories of early GLE cells: chondrogenic and mesenchymal
By applying Leiden clustering to only SC1, we identified two subclusters: SC1_A and SC1_B (Fig. 3A). SC1_B has high expression levels of genes associated with chondrogenesis (e.g. Sox9 and Col2a1) and the IZ (e.g. Nog and Gdf5) (Hartmann and Tabin, 2001; Ray et al., 2015; Storm and Kingsley, 1996). SC1_A exhibited high expression levels of genes associated with fibrous and mesenchymal cells, such as Col3a1 and Col1a2 (Niederreither et al., 1992), as well Osr1, which is mainly expressed in the outer mesenchyme where it promotes fibroblast differentiation and inhibits chondrogenesis (Stricker et al., 2012) (Fig. 3B). A substantial fraction of each subcluster was in S or G2M phase of cell cycle, indicating active proliferation. These results suggest that SC1 comprises chondroprogenitors and early chondrocytes of the limb anlagen, nascent IZ cells, as well as the non-chondrogenic mesenchymal cells situated outside the anlagen. We tested and confirmed this conjecture using in situ hybridization for genes indicative of each cluster (Fig. 3C,D).
To determine the lineage relationship between these clusters, we performed RNA Velocity analysis (La Manno et al., 2018). Our results predicted that there is very little transition between SC1_A and SC1_B (Fig. 3E). To test this prediction, we prospectively isolated E12.5 YFP+ cells using antibodies specific for SC1_A (CD9) or SC1_B (PDGFRA), and measured lineage-specific marker expression after culturing the cells in vitro for 7 days. Cells from the PDGFRA+ population exhibited a mesenchymal morphology, whereas cells from the CD9+/PDGFRA− population exhibited a chondrocyte-like morphology (Fig. 3F, left). Consistent with their respective shapes and appearances, the PDGFRA+ population yielded a substantially higher proportion cells positive for the tendon and ligament marker TNMD compared with the CD9+/PDGFRA− population, and a lower proportion of cells positive for the chondrogenesis regulator SOX9 as measured by immunofluorescence [Fig. 3F (right), G]. Although the CD9+ population yielded more THY1-positive cells, neither group had a substantial fraction of positive cells. The fact that both populations were not mutually exclusive for TNMD and SOX9 expression can be explained by incomplete lineage commitment, by the imperfect ability of PDGFRA to mark SC1_A and of CD9 to mark SC1_B, and by impurity in the FACS gating. With these caveats in mind, the data support a model where the in vitro differentiation propensity of SC1_A is towards a tenocyte/ligamentocyte fate, whereas the in vitro propensity of SC1_B is towards a chondrocyte fate.
Diverse trajectories of the Sox9 high compartment of E12.5 GLE cells
Although most SC1_B cells expressed Sox9, we noticed that they were heterogeneous in terms of IZ- and chondrocyte-related genes, suggesting that this cluster consisted of subpopulations or sub-states. To examine this further, we clustered SC1_B alone and identified five clusters: SC1_B1 to SC1_B5 (Fig. 4A). SC1_B2 was marked by high levels of Col2a1 and Matn1, indicating that it contained cells destined to become transient, epiphyseal chondrocytes (Hyde et al., 2007) (Fig. 4B). SC1_B5 was marked by high levels of cell cycle-related genes. The three other clusters expressed both chondroprogenitor transcription factors (e.g. Sox5, Sox6 and Sox9), as well as the IZ marker Gdf5. These clusters varied in the extent to which they expressed other IZ-related genes: SC1_B4 had high levels of Sfrp2, Vcan and Trps1, whereas SC1_B2 had the highest levels of Ebf1, Jun and Sox4, and SC1_B1 had the highest expression of Ptn, Irx3 and Meis2 (Fig. 4B) (Pazin et al., 2012; Choocheep et al., 2010; Kunath et al., 2002; Norris et al., 2007; Salva and Merrill, 2017).
To explore possible lineage relationships between these clusters, we performed RNA velocity analysis. Although there was some predicted flow from the proliferating cluster SC1_B5 to SC_B2 (PHC) and to SC1_B4 (IZ), there was otherwise very limited, consistent flow predicted between clusters, suggesting that these clusters represent distinct lineages. Next, we used pseudotime analysis to identify genes and pathways that are dynamically expressed and thus that potentially contribute to the nascent IZ and the PHC trajectories. We applied diffusion-based pseudotime on each subcluster separately, selecting a root or starting point for the trajectory based on the RNA velocity results (Fig. S3A,B) (Haghverdi et al., 2016). Then we identified genes that were significantly correlated with pseudotime along each trajectory, grouped these genes based on the period (beginning, middle or end) at which they had the highest mean expression (Street et al., 2018), and then performed gene-set enrichment.
The trajectory of PHC differentiation was characterized by initially high levels of expression of genes involved in proliferation and metabolism, as well as transcription factors (TFs) associated with early limb development, such as Prrx1, Prrx2 (ten Berge et al., 1998), Pitx1 (Nemec et al., 2017), Tbx18 (Haraguchi et al., 2015), Tcf4 (Cho and Dressler, 1998) and Ets2 (Ristevski et al., 2002) (Fig. 4D and Table S1). The detection of Ctnnbip1 (also known as ICAT), a repressor of Wnt signaling (Tago et al., 2000), at the onset of PHC pathway in cells that express both Tcf4 and β-catenin is consistent with the pre-eminent role of Wnt in regulating IZ formation (Hartmann and Tabin, 2001; Tufan and Tuan, 2001). In addition to these pathways and genes previously implicated in chondrogenesis, our analysis has identified Tsc22d1 and Zfp36l1 as potential novel regulators of this initiation phase.
Genes with diverse roles in chondrogenesis and chondrocyte differentiation were upregulated in the middle phase. The growth factor Mdk and the proteoglycan Vcan play roles in chondrogenesis via Wnt and TGFβ signaling, respectively (Choocheep et al., 2010; Haffner-Luntzer et al., 2014; Wang et al., 2014). P4ha2 encodes a component of a collagen synthesis enzyme, and mice null for this gene exhibit impaired ECM in chondrocytes (Aro et al., 2015). Gpc3, which is expressed in the regions of transient chondrocytes, controls limb patterning by interacting with BMP4, a stimulator of chondrogenesis (Miljkovic et al., 2008; Paine-Saunders et al., 2000). Aebp1 regulates collagen in ECM during chondrocytes differentiation (Blackburn et al., 2018). And Wwp2 directly interacts with Sox9 to enhance its transcriptional activity in chondrogenesis (Nakamura et al., 2011). Other genes upregulated in the middle phase do not have overt or previously described links to chondrogenesis and differentiation, but rather suggest a differentiation pause, as indicated by expression of Id1 and Id2, which interact with Hes1 to repress differentiation (Zhang et al., 2014), and by expression of several cell cycle regulators, including Cdk1, Ccnd2 and Cdkn1c.
The late stage of the PHC trajectory was marked by expression of many well-characterized players in pre-hypertrophic chondrocyte function and regulation (Fig. 4D and Table S1). Many genes encoding contributors to the ECM were upregulated in this phase: Acan, Col2a1, Col9a1, Col9a2, Col9a3, Epyc, Col7a1, Hapln1, Col11a1, Matn1 and Matn4 (Hyde et al., 2007; Lawrence et al., 2018; Mayo et al., 2009; Zhang et al., 2008). Several signaling pathways were implicated in this phase, including Fgf, Wnt, BMP/TGFβ and Hedgehog. Fgfr3, which induces terminal hypertrophic differentiation (Su et al., 2014), the Wnt antagonist Frzb (Enomoto-Iwamoto et al., 2002; Leimeister et al., 1998; Witte et al., 2009), Smad7 (Vargesson and Laufer, 2001), Cthrc1 (Kimura et al., 2008) and Dlk1 (Chen et al., 2011; Schmid et al., 2010; Taipaleenmäki et al., 2012; Wong et al., 1997), genes associated with BMP/TGFβ, and the Sonic hedgehog receptor Ptch1 were all upregulated in this phase. This analysis also revealed several TFs that have not, to our knowledge, been implicated in PHC, including Son and Peg3.
Next, we explored the early IZ trajectory of SC1_B4 cells (Fig. 4E and Table S1). They start with high expression of TFs such as Shox2, which regulates the onset of early chondrogenesis (Bobick and Cobb, 2012), the limb morphogenesis determinant Tshz1 (Coré et al., 2007) and the patterning homeobox TF Hoxa9 (Raines et al., 2015). Genes such as Cnn3 (Haag and Aigner, 2007), which is associated with the actin cytoskeleton, were also upregulated. The middle phase of early IZ formation was marked by a similarity to TFs of the early PHC trajectory (Pitx1, Prrx1, Tbx18 and Foxn3), in addition to the mesenchymal TF Twist2. In addition, Trps1 acts downstream of Gdf5 to promote chondrogenesis (Itoh et al., 2008). The late stage was characterized by a mixture of IZ- and chondrocyte-associated genes and pathways. For example, some chondrogenic-related genes and their regulators included Col2a1, Col9a3, Dlk1, Wwp2, Sulf1, Cd9 and Col11a1 (Wang et al., 2018; Chen et al., 2011; Seegmiller et al., 1971; Zhao et al., 2006), and Sox5, Sox6 and Sox9. On the other hand, IZ-associated genes such as Gdf5, Sfrp2, Osr2, Tgfbi, Tgfb2 and Pgk1 (Amarilio et al., 2007; Gao et al., 2011; Pazin et al., 2012), and Snai2 were also upregulated. Taken together, this profile suggests a complex state that is poised between chondrogenic and mesenchymal (or IZ).
Compared with SC1, many SC2 cells had high levels of IZ-related genes such as Cd44 and Sfrp2; other SC2 cells exhibited more established chondrocyte profiles. To resolve this population heterogeneity, we performed clustering on SC2 and identified two groups (Fig. 5A). SC2_A was enriched in chondrocyte-related genes Col2a1, Col9a1, Sox9 and Matn1, whereas SC1_B expressed IZ-related genes such as Gdf5, Sfrp2 and Htra1 (Fig. 5B). We confirmed the expression localization of some of these genes by in situ hybridization (Fig. 5C and Fig. S4A).
Noticing heterogeneity of gene expression within SC2_A and SC2_B, we clustered them at a finer level of resolution, resulting in six and five subclusters, respectively (Fig. 5D). As in SC1_B, the chondrocyte SC2_A cluster included a subcluster (SC2_A6) that expressed Ihh, Cd200, Mef2c, and Panx3, suggesting a pre-hypertrophic state (Fig. S4B) (Etich et al., 2015). SC2_A also included clusters indicative of a proliferative state (SC2_A4), higher collagen expression patterns (SC2_A2), developmental stage (SC2_A1 and SC2_A3) and specialized chondrocyte expression signature [e.g. coordinated expression of S100a6 and S100a10, Anxa1 and Anxa2, and Timp1 in SC2_A5 (Headland et al., 2015; Yammani, 2012)]. We did not follow up on these transcriptionally distinct transient chondrocyte clusters. Rather, we chose to focus more on SC2_B because it included subclusters expressing high levels of IZ-related genes. Subcluster SC2_B1 was distinguished by higher expression of fibroblast-related collagen Col1a1, Col1a2, Ptn and Col14a1, suggesting it represents cells of the meniscus (Fig. 5E) (Hyde et al., 2008; Mittapalli et al., 2009). Further supporting this hypothesis is the fact that several genes previously reported as being more highly expressed in E16 meniscus as compared with AC or CL (i.e. Dpt, Igfbp3, Rbp1 and Ogn) were preferentially expressed in SC2_B1 (Pazin et al., 2014) (Fig. S4C). SC2_B2 expressed high levels of IZ-linked Sfrp2 and Htra1 (Oka et al., 2004), as well as Mfap4 and Tgfbi. Because Sfrp2 is expressed primarily in intra-articular ligament at later stages of development (Pazin et al., 2012), we hypothesized that SC2_B2 may include IZ cells that contribute to the presumptive intra-articular ligament. We explored this hypothesis by performing multiplex in situ hybridization using probes for Htra1 and Mfap4. We found that cells positive for both Htra1 and Mfap4 were primarily located at the nascent intra-articular ligament at E15.5, supporting the notion that SC2_B2 is a ligament associated IZ cluster (Fig. 5F).
SC2_B4 had high levels of sheath of cartilage-related and tendon-related genes Gdf5, Tppp3 (Staverosky et al., 2009), Sox9 and Cd44 (Hartmann and Tabin, 2001). This cluster also exhibited high levels of Igfbp2, which has previously been reported to be specifically expressed along the superficial aspects of the articular cavity (Pazin et al., 2014) (Fig. S4D). Based on this, we speculated that this cluster represents cells of the thin sheath of fibrous tissue that lines the surface of articular chondrocytes and the meniscus. To determine the spatial positioning of these cells, we again used multiplex in situ hybridization with probes for Cd44 and Emp1, a gene that inhibits chondrocyte differentiation (Li et al., 2011) that was also preferentially expressed in SC2_B4 and SC2_B5. We found that Cd44 was specifically expressed at the emerging articular surface (Fig. 5G), whereas Emp1 was expressed on articular cartilage, menisci and the joint capsule. We identified a small number of cells that expressed both Cd44 and Emp1 at the outermost AC layers and menisci. Taken together, these results support the notion that SC2_B4 contains synovial cavity-lining cells and that SC2_B5 comprises meniscal cells.
To understand the lineage relationship between these clusters, we performed RNA velocity analysis on SC2 cells (Fig. 5H). Overall, the minimal flow between SC2_A and SC2_B is consistent with the fact that most SC2_A cells expressed Matn1 and thus are already undergoing pre-hypertrophic chondrocyte differentiation. However, we did observe two cases in which there was flow between SC2_A and SC2_B. Approximately ten SC2_A (SC2_A5) cells had velocity towards SC2_B (SC2_B5); however, these cells did not express Matn1, suggesting that they were undergoing a chondroblast-to-mesenchymal transition (Fig. S4E). On the other hand, a small number of SC2_B (SC2_B3, SC2_B4) cells had velocity towards a Matn1-negative region of SC2_A (SC2_A2, SC2_A4), suggesting that some synovial cavity IZ cells undergo chondrocyte specification at this stage. We also discovered a consistent velocity flow into the meniscus cluster SC2_B1 from SC2_B5. SC_B5 was enriched in chondrocytes enhancers Hmga2 and Hmga1b (Richter et al., 2011), and had elevated expression levels of Cd9 (Singh et al., 2015) and Emp1, indicating that these cells might contribute to the chondrocyte-like component of the meniscus.
Development of articular fibrous components
SC3 exhibited some similarities in expression profile with the IZ cells of SC2. For example, SC3 expressed high levels of Postn and Egfl6 (Fig. 6A), genes that were detected in both the IZ and flanking mesenchyme (Fig. 6B). However, SC3 specifically expressed the Wnt inhibitor Dkk2, which we detected in the flanking mesenchyme but not the IZ (Fig. 6B). SC3 specifically expressed other genes that have previously been localized to the mesenchyme adjacent to developing limb cartilage anlagen, including Hic1 (Pospichalova et al., 2011), Osr1 (Vallecillo-García et al., 2017) and Dcn (Olguin and Brandan, 2001) (Fig. 6C), suggesting that SC3 is composed of mesenchymal/fibroblast-related cells that flank the joint and developing bone. We identified other genes that distinguish SC3 from SC2, including Crabp1, Zfhx3, Ebf2, Celf2, Rspo3, Map1b and Selenop, as they were high in SC3, low or not detected in SC2, and variably expressed in SC1 (Fig. S5A).
As we did for SC1 and SC2, we performed a deeper analysis of SC3 by clustering it more finely into two subclusters (Fig. S5B). SC3_A mainly comprised E13.5 cells and had high levels of genes related to cell proliferation, whereas SC3_B was made up primarily of E14.5 and E15.5 cells, and had higher levels of fibroblast- and ECM-related genes such as Postn and Col3a1 (Fig. S5C). As SC3_A and SC3_B differed mainly by developmental stage, we subclustered each to search for more subtle differences in state or lineage, resulting in four subgroups in SC3_A (Fig. 6D) and four in SC_B. Here, we focused on the subclusters of SC3_B as they appeared to have more defined and distinct signatures. SC3_B1 likely represents remaining residual undifferentiated mesenchymal progenitors, as they maintain Osr1 expression and have upregulated Dlk1 and Meg3 (Fig. 6E) imprinted genes associated with a progenitor state (Miller and Cole, 2014; Sanli et al., 2018). Cells of SC3_B2 are likely to be ligament or tendon cells based on preferential expression of Tnmd and Scx (Soeda et al., 2010; Sugimoto et al., 2013; Subramanian and Schilling, 2015), as well as other tendon-associated genes, including Thbs4 (Havis et al., 2014; Subramanian and Schilling, 2014), Htra1 (Oka et al., 2004) and Meox2 (Havis et al., 2014). Our single cell data indicated that nascent cruciate ligament cells (SC2_B2) express Scx but not Tnmd, whereas the SC3_B2 cells express both genes (Fig. S5E). We confirmed the localization of SC3_B3 to the outer tendon using multiplex in situ hybridization (Fig. 6F). SC3_B3 is likely to include synovial fibroblasts and fibrocartilage cells of the enthesis (Zelzer et al., 2014) based on the preferential expression of Cthrc1, which is produced by synovial fibroblasts (Park et al., 2004; Shekhani et al., 2016) and chondrocyte-related genes Aspn, Mia and Dlx5 (Ferrari and Kosher, 2006). We found that Col8a2 and Dlx5, which were preferentially expressed in SC3_B3, were co-expressed in synovium, enthesis and perichondrium, indicating that SC3_B3 is made up of fibrochondrocytes within these sites (Fig. 6G). Finally, the small SC3_B4 cluster is made up of pericytes/vascular smooth muscle cells based on expression of Pdgfrb, Ets1, Rgs5 and Acta2.
The synovial joint initiates from a thin layer of mesenchymal cells marked by Gdf5 expression. Through lineage tracing of Gdf5, it has become apparent that Gdf5-expressing IZ cells give rise to multiple joint lineages. The extent to which the spatial distribution of Gdf5+ descendants changes is due to cell movement, de novo expression or cell death is not precisely known; however lineage tracing indicates that cell movement and de novo expression are the primary contributors (Ito and Kida, 2000; Shwartz et al., 2016). The transcriptional programs that drive IZ formation and elaboration has remained underexplored. In this study, we applied scRNA-Seq to Gdf5 lineage cells during embryonic stages of synovial joint development to define the continuum of expression states that govern the process from IZ formation to joint cavitation. Here, we have revealed the dynamic transcriptome changes and heterogeneity in GLE cells, we have inferred the lineage trajectories of some subpopulations, and we have identified novel regulators and contributors of these developmental decisions. Several insights have emerged from our dataset and analyses that have implications for the field.
First, our results have revealed that GLE cells at E12.5 already exhibit transcriptional heterogeneity, with one cluster tending towards a more mesenchymal state (SC1_A) and one cluster tending towards a more chondroprogenitor state (SC1_B). By prospectively isolating cells using markers that distinguished these clusters, we confirmed the in vitro lineage propensity of these cell populations. The degree of commitment of these cells in vivo remains to be determined. Second, we discovered further substructure within the SC1_B group of cells, with one cluster actively proliferating (SC1_B5), one cluster expressing markers of pre-hypertrophic differentiation (SC1_B2), one cluster expressing genes indicative of spatial patterning (i.e. SC1_B3 preferentially expressed more proximal Meis2 and proximal/anterior Irx3) and one cluster expressing higher levels of IZ-associated genes (SC1_B4: Sfrp2, Vcan and Trps1). Third, we used a combination of RNA velocity and pseudotime analysis to identify known and novel regulators of the early IZ and PHC trajectories. Fourth, we have defined the expression signatures of discrete lineages within the developing synovial joint (Fig. 7) and how they emerge over time (Fig. S6).
We note several important caveats to our results, which suggest areas for improvement and future investigation. First, we cannot exclude the possibility that certain populations of Gdf5-lineage cells were particularly vulnerable to the isolation procedure and thus underrepresented in our scRNAseq data. Another important consideration is the impact of tissue dissociation on expression state. Although we cannot exclude this latter concern, we did not detect an enrichment of immediate-early genes in any of our clusters, supporting the notion that these cells are not particularly prone to this response (van den Brink et al., 2017). Finally, it is possible that certain classes of genes are particularly sensitive to the relatively long tissue digestion necessary to achieve single cell dissociation, and that this influence may vary by cell population. We were not able to systematically assess this possibility but new technologies that enable spatial transcriptomics may illuminate the impact of this potential confounder (Ståhl et al., 2016).
In conclusion, we have made our data freely and easily accessible with a web application at www.cahanlab.org/resources/joint_ontogeny. We believe that this resource will aid the community in discovering additional transcriptional programs and in inferring cell interactions that underpin synovial joint development. Furthermore, we anticipate that these data can be used to yield improved protocols for the derivation of synovial lineages from pluripotent stem cells (Craft et al., 2015; Kawata et al., 2019; Oldershaw et al., 2010; Yamashita et al., 2015) by, for example, using it to identify candidate signaling pathways or by using the expression data as a reference against which to compare engineered cells.
MATERIALS AND METHODS
The Gdf5-Cre (Sperm Cryorecovery via IVF, FVB/NJ background) mouse strain was obtained from the Jackson laboratory. B6.129X1-Gt(ROSA)26Sortm1(EYFP)Cos/J (RosaEYFP) (C57BL/6J background) was gifted by the lab of Prof. Xu Cao (Johns Hopkins University, Baltimore, MD, USA). Gdf5-Cre::Rosa-EYFP mice were generated by crossing heterozygote Gdf5-Cre strain with homozygote RosaEYFP strain. The genotype of the mice was determined by PCR analyses of genomic DNA isolated from mouse tails using the following primers: Gdf5-directed Cre forward, 5′GCCTGCATTACCGGTCGATGCAACGA3′; and reverse, 5′GTGGCAGATGGCGCGGCAACACCATT3′ (protocol provided by Prof. David Kingsley, HHMI and Stanford University, CA, USA). Day 5 wild type refers to the C57/BL10 mouse. All the protocols were approved by the institutional review board of Johns Hopkins University.
Mice gender identification
We identified mouse gender by genotyping Sry Y gene using the primers: forward, 5′CTGGAAATCTACTGTGGTCTG3′; and reverse, 5′ACCAAGACCAGAGTTTCCAG3′.
Mice were kept in a light-reversed room (light turns on at 10 pm and turns off at 10 am). Timing was determined by putting one male mouse and two female mice in the same cage at 9 am and separating them at 4 pm on the same day. We counted midnight as E0.5. On E12.5, E13.5, E14.5 and E15.5, the pregnant mice were sacrificed using CO2 at 3 am. The cells were isolated as previously described (Gosset et al. 2008) with modification: the embryos (usually n=6-8) were rinsed three times in PBS on ice. Two presumptive knee joints were isolated by transfemoral and transtibial division in a single 3 cm dish (Fig. 1A) and incubated in digestion solution I (3 mg/ml collagenase D, DMEM high glucose culture medium, serum free) for 45 min at 37°C, then in digestion solution II (1 mg/ml collagenase D, DMEM high glucose culture medium, serum free) for 3 h (one embryo per dish) at 37°C. During the period of incubation, the mice gender was identified by genotyping and only male samples were chosen for further processing. The tissues with medium were gently pipetted to disperse cell aggregates and filtered through a 40 µm cell strainer, then centrifuged for 10 min at 400 g. The pellet was suspended with 0.4% BSA in PBS.
All cells were fractionated by fluorescence-activated cell sorting (FACS). A MoFlo XDP sorter (Beckman Coulter) was used to collect YFP+ cells and propidium iodide was used to exclude dead cells from any observations.
Single cell RNA sequencing
GemCode Single Cell platform (10X Genomics) was used to determine the transcriptomes of single cells (Zheng et al., 2017). Cells at 1000/µl were obtained after sorting and placed on ice. Each time point, one sample was selected and profiled based on the viability and amount. A total of 6000 cells were loaded each time, followed by GEM-RT reaction and cDNA amplification. Single cell libraries were constructed by attaching P7 and P5 primer sites and sample indices to the cDNA. Single cell RNA sequencing was performed on Illumina NextSeq 500 and HiSeq 2500 to a depth ranging from 347 to 489 million reads per sample.
Analysis and visualization of scRNA seq data
CellRanger (version 2.0.0) was used to perform the original processing of single cell sequencing reads, aligning them to the mm10 reference genome. We used the command line interface of Velocyto, version 1.7.3, to count reads and attribute them as spliced, unspliced or ambiguous (La Manno et al., 2018). The resulting loom files for each sample were then concatenated and subjected to quality control processing, normalization, estimation of cell cycle phase, clustering and differential gene expression analysis using Scanpy 1.4.3 (Wolf et al., 2018). Specifically, we excluded cells in which mitochondrial gene content exceeded 5% of the total reads or in cells with fewer than 500 unique genes detected. We then excluded genes that were detected in fewer than 10 cells, resulting in a dataset of 10,124 cells and 16,352 genes. Next, we performed an initial normalization on a per cell basis followed by log transformation and scaling. We scored the phases of cell cycle using cell cycle-associated genes as previously described (Satija et al., 2015). We then identified the genes that were most variably expressed across the whole data set, and within each timepoint, resulting in 3593 genes. We performed PCA and inspected the variance ratio plots to determine the ‘elbow’, or number of PCs that account for most of the total variation in the data. We generated a graph of cell neighbors using diffusion maps (Coifman et al., 2005), and then we performed Leiden clustering (Traag et al., 2019), which we visualized with a UMAP embedding (McInnes et al., 2018). We also analyzed this with SingleCellNet (Tan and Cahan, 2019), which had been trained using the Tabula Muris dataset (Tabula Muris Consortium et al., 2018). We removed cells in clusters that were classified by SingleCellNet as ‘blood’, ‘erythroblast’ and ‘endothelial’. We also removed cells in clusters that we identified as likely to be myoblasts based on high levels of Myod1 and other muscle-specific genes, melanocyte (based on Pmel expression) and neural crest (based on Sox10 expression). Then, we repeated the pre-processing and analysis pipeline on the remaining 8378 genes. Finally, we removed cells in a cluster that we determined by in situ hybridization to consist mainly of dermis cells, resulting in final dataset of 7329 cells and 16,352 genes. Super-clusters and all subclusters were identified by following the same pipeline as described above, except that the analysis was limited to the corresponding set of cells. For example, the superclusters were identified by first finding the genes that vary across both all cells, and within each time point. A neighborhood graph was then plotted using the principal components (the number of which was decided by examining the variation ratio plot), followed by Leiden clustering and UMAP embedding, and, for some subsets of data, diffusion map embedding. Differentially expressed genes were identified using the Scanpy rank_genes_groups function. Gene set enrichment analysis was performed using GSEAPY (github.com/zqfang/GSEApy), a Python interface to enrichR (Chen et al., 2013; Kuleshov et al., 2016). The analysis pipeline of Velocyto was applied to data subsets as mentioned in the main text. We used the Velcoyto results to manually assign roots for diffusion map pseudotime analysis. Genes correlated with pseudotime were identified as described previously (Street et al., 2018).
Histochemistry, immunohistochemistry and histomorphometry
The specimens were fixed in 10% buffered formalin for 24 h at room temperature, washed with distilled water and equilibrated in 20% sucrose in PBS at 4°C overnight, then mounted in OCT and frozen at −80°C. Coronal-oriented or sagittal-oriented sections (10 μm) were cut using a cryostat.
We performed trichrome staining according to the Trichrome Stain (Connective Tissue Stain) Kit protocol (Abcam). Immunostaining was performed using a standard protocol. Sections were incubated with primary antibodies to mouse GFP (1:200) (Ishiguro et al., 2020), TNMD (1:100) (Qu et al., 2018), SOX9 (1:500) (Yu et al., 2019) and THY1 (1:100) (Chen et al., 2018) in antibody diluent (Dako) at 4°C overnight followed with three 5 min washes in TBST. The slides were then incubated with secondary antibodies conjugated with fluorescence [Ab150077, 1:200 (Wang et al., 2019a); BA1032, 1:100 (Liu et al., 2015a)] at room temperature for 1h while avoiding light followed with three 5 min washes in TBST and nuclear stained with mounting medium containing DAPI. Images were captured by Nikon EcLipse Ti-S, DS-U3 and DS-Qi2. See Table S2. Six view fields under 100 X were randomly chosen for histomophometry. The percentage of positive cells was calculated (total number of positive cells/the total number of cells under one view field×100).
The data from the histomorphometry were expressed as mean±s.d., and statistical signiﬁcance was determined using an unpaired Student's t-test. The level of signiﬁcance was deﬁned as P<0.05. Data analyses were performed using SPSS 15.0 analysis software.
In situ hybridization
See Table S3 for the information of oligonucleotides used for templates for antisense RNA probes. The T7 and SP6 primer sequence were added to 5′ and 3′ ends, respectively. SP6 RNA polymerase was used for probe transcription. Probes were synthesized with digoxygenin-labeled UTP and hybridized at 68°C overnight. Results were visualized using alkaline phosphatase-conjugated anti-digoxygenin antibody and BCIP/NBT substrates.
RNAscope HiPlex probes were designed by Advanced Cell Diagnostics (ACD). Assays were performed according to protocol provided by ACD. In detail, fixed frozen tissue samples were sectioned, pretreated with protease and then incubated with Hybridize RNAscope HiPlex Amp and HiPlex Fluoro probes (green). The first-round images were captured following counterstaining. The fluorophores were cleaved prior to hybridization with second round probes (red).
FACS for prospective isolation
E12.5 embryonic hindlimb cells were isolated as described in the ‘Cell isolation’ section. After filtering through a 40 µm cell strainer, cells were suspended in autoMACS rinsing solution at 1 million per ml. After spin down, E12.5 cells were then stained with PDGFRA (1 µg per 10 million cells) and CD9 (1 µg per 5 million cells) in 100 µl autoMACS rinsing solution in the dark for 30 min followed by two washes with autoMACS rinsing solution. Cells were re-suspended in autoMACS rinsing solution. A negative control without staining was used to set up the gate. The following two E12.5 populations were collected at the same time: a YFP+PDGFRA+ population and a YFP+PDGFRA−CD9+ population.
We thank Wenyan Lu for technical support with in situ hybridization probe designs. 10× Genomics single cell processing and sequencing was conducted by Melissa Olson, Kakali Sarkar and David Mohr at the Genetic Resources Core Facility, Johns Hopkins Institute of Genetic Medicine. Flow cytometry and cell sorting was conducted by Hao Zhang at the Cell Sorting Core Facility, Bloomberg School of Public Health, Johns Hopkins University.
Conceptualization: Q.B., P.C.; Methodology: Q.B., Y.-H.C., J.P.W., E.Y.S., D.W.K., H.W., S.Y., S.B., P.C.; Software: Y.-H.C., E.Y.S., P.C.; Formal analysis: Q.B., P.C.; Investigation: Q.B.; Resources: Q.B., P.C.; Data curation: Q.B., P.C.; Writing - original draft: Q.B.; Writing - review & editing: J.P.W., P.C.; Supervision: P.C.; Project administration: P.C.; Funding acquisition: Q.B., P.C.
This work was supported by the National Institutes of Health (R35GM124725 to P.C. and R25GM109441 to J.P.W.) and by the Maryland Stem Cell Research Fund [2017-MSCRFF-3910 (Award ID: 90074850) to Q.B.]. This work was made possible by support from the Johns Hopkins University Medicine Discovery Fund to P.C. Deposited in PMC for release after 12 months.
Peer review history
The peer review history is available online at https://dev.biologists.org/lookup/doi/10.1242/dev.185777.reviewer-comments.pdf
The authors declare no competing or financial interests.