ABSTRACT
Inspired by Waddington's illustration of an epigenetic landscape, cell-fate transitions have been envisioned as bifurcating dynamical systems, wherein exogenous signaling dynamics couple to the enormously complex signaling and transcriptional machinery of a cell to elicit qualitative transitions in its collective state. Single-cell RNA sequencing (scRNA-seq), which measures the distributions of possible transcriptional states in large populations of differentiating cells, provides an alternate view, in which development is marked by the variations of a myriad of genes. Here, we present a mathematical formalism for rigorously evaluating, from a dynamical systems perspective, whether scRNA-seq trajectories display statistical signatures consistent with bifurcations and, as a case study, pinpoint regions of multistability along the neutrophil branch of hematopoeitic differentiation. Additionally, we leverage the geometric features of linear instability to identify the low-dimensional phase plane in gene expression space within which the multistability unfolds, highlighting novel genetic players that are crucial for neutrophil differentiation. Broadly, we show that a dynamical systems treatment of scRNA-seq data provides mechanistic insights into the high-dimensional processes of cellular differentiation, taking a step toward systematic construction of mathematical models for transcriptomic dynamics.
INTRODUCTION
During development and tissue regeneration, it is envisioned that cells progress through multiple transitions to ultimately adopt a distinguishable function. Although each transition en route to a terminal fate involves the coordination of myriads of molecules and complex gene regulatory networks interacting with external factors, there is a common view that they depend on significantly fewer control parameters. This view was notably explicated by Conrad Waddington in an illustration of an epigenetic space as a tilted, bifurcating landscape, where a vast number of nodes (genes) provide the scaffold for the smooth hills and valleys (cell state) down which a pebble (cell) can reliably roll until it finds a resting position (terminal fate) (Fig. 1A) (Waddington, 1957).
Cell type differentiation as a dynamical process. (A) Reimagination of Waddington's landscape of cell fate commitment in which cell fates are represented as valleys, commitment barriers as hills and gene activity as pegs underneath that control the heights of hills and valleys. (B) Schematic cell-population snapshots of maturation (top), in which one cell fate transitions to a different one, and a cell fate decision (bottom), in which a pluripotent cell differentiates to either of two lineages. Cell type images by A. Rad and M. Häggström. CC-BY-SA 3.0 license. (C) Gene expression trajectories for cells (dots) at varying levels of a differentiating stimuli for cases where the differentiation landscape does not bifurcate (top), undergoes a saddle-node bifurcation (middle) or undergoes a pitchfork bifurcation (bottom).
Cell type differentiation as a dynamical process. (A) Reimagination of Waddington's landscape of cell fate commitment in which cell fates are represented as valleys, commitment barriers as hills and gene activity as pegs underneath that control the heights of hills and valleys. (B) Schematic cell-population snapshots of maturation (top), in which one cell fate transitions to a different one, and a cell fate decision (bottom), in which a pluripotent cell differentiates to either of two lineages. Cell type images by A. Rad and M. Häggström. CC-BY-SA 3.0 license. (C) Gene expression trajectories for cells (dots) at varying levels of a differentiating stimuli for cases where the differentiation landscape does not bifurcate (top), undergoes a saddle-node bifurcation (middle) or undergoes a pitchfork bifurcation (bottom).
Many of the characteristics of Waddington's landscape have been codified into the language of dynamical systems, including that cell fates resemble valleys (attractors) in gene expression or transciptomic space (Huang et al., 2005; Corson and Siggia, 2012; Slack et al., 1991; Camacho-Aguilar et al., 2021), that a small amount of stable states can emerge from large interconnected Boolean networks (Kauffman, 1969), and that known genetic interactions can yield multiple cell fates (bistability) (Huang et al., 2007; Weston et al., 2018). Waddington's illustration has also motivated analysis of the wealth of data captured in single-cell RNA-sequencing (scRNA-seq), in which the transcriptome of individual cells are measured, often at multiple time-points, as they differentiate. For example, fitting a mathematical model of a pitchfork bifurcation to scRNA-seq data yields predictions for developmental perturbations (Marco et al., 2014), reducing the dimensionality of large transcriptomic matrices can enhance the resolution of bifurcations to precisely determine the genes enabling a cell fate decision (Setty et al., 2016; Tusi et al., 2018), and well characterized cell-lineage relationships can be used to extract predictive models of gene regulation (Furchtgott et al., 2017; Qiu et al., 2020; Wang et al., 2022). While these studies generally characterize cell fate decisions as bifurcations of an underlying developmental landscape, other studies formulate cell fate transitions as stochastic jumps between co-existing states of a multimodal cell-fate landscape that can occur even in the absence of bifurcations, to infer lineage relationships and state transition probabilities (Weinreb et al., 2018a; Zhou et al., 2021; Lange et al., 2022).
In the face of these contrasting views, it remains unclear when, during development, transcriptomes undergo bifurcations and whether they can be identified purely from statistical analyses of single-cell expression data alone. To address these unknowns, we note that as a bifurcation is a qualitative augmentation of the steady state solutions, or branches, of a dynamical system that occurs as a control parameter varies, detecting bifurcations from transcriptomic data requires that steady states and control parameters exist, and that their dynamics can be identified from the data. We hypothesized an association between cell fates and transcriptomic steady states in scRNA-seq data, as the dynamic molecular processes that lead to transcriptomic changes, such as signal transduction and transcription, generally occur in the order of seconds and minutes (Shamir et al., 2016), whereas cell fates change over the course of hours or days (Slack et al., 1991), yielding a significant separation between the time scales of molecular mechanisms and data collection. We also hypothesized that inferred developmental time (pseudotime) could be used as a high resolution readout of a biological control parameter to pinpoint developmental bifurcations, as it coincides well with intrinsic cellular dynamics, and may therefore correlate with known biological control parameters, such as morphogen concentration (Trapnell et al., 2014; Setty et al., 2016; Street et al., 2018).
Here, we use these hypotheses to lay out and demonstrate a statistical formalism for detecting and interrogating bifurcations in developmental fate transitions directly from transcriptomic pseudotime trajectories. Contrasting previous studies (Marco et al., 2014; Setty et al., 2016; Huang et al., 2007; Weston et al., 2018), we do not assume any specific mathematical form for the underlying genetic interactions, nor do we assume the shape, or even existence, of an underlying cell-fate landscape (Fig. 1B) (Chen et al., 2012; Mojtahedi et al., 2016) as it is not our goal to discern a specific model. Instead we rigorously query whether the necessary statistical signatures of bifurcations are present in a developmental timecourse. We build on and compare with similar styles of approach, which use correlation structure to detect signatures and molecular mechanisms of disease (Chen et al., 2012; Liu et al., 2012), analyze differentiation processes in temporal and pseudotemporal gene expression trajectories (Mojtahedi et al., 2016; Chen et al., 2018), and characterize reversibility in saddle-node bifurcations (Li et al., 2019). We show that our dynamical systems-driven approach enables us to distinguish between three different types of transcriptomic variation directly from systems-level data: a non-bifurcative cell fate change that is due to continuous changes in gene expression (Fig. 1C, top); a cell fate change that is due to a one-to-one state transition (Fig. 1C, middle), such as those that may occur during terminal-fate maturation (Ferrell, 2012); and a cell fate change that is due to a one-to-many state transition, e.g. those that occur when pluripotent cells decide between multiple cell lineages (Fig. 1C, bottom). We apply our framework to a class of in silico high-dimensional genetic networks to demonstrate its ability to recover the salient features of a bifurcating dynamical system, and examine the effects of high dimensionality and noise. We demonstrate the utility of our framework in the context of a recently published scRNA-seq exploration of hematopoiesis (Weinreb et al., 2020), and show that cell-fate bifurcations can be pinpointed and analyzed in scRNA-seq data, even without detailed knowledge of the dynamics and controls of the underlying system. Finally, we demonstrate that our framework allows us to identify a low-dimensional phase plane in which the dynamics unfolds, and can be used to distinguish new cellular clusters and extract genetic relationships that are pivotal to the bifurcative cell-fate change.
RESULTS












The correlation structure and expansion of correlation coefficients at a bifurcation (Eqn. 4) has been used previously in the theory of Dynamical Network Biomarkers, or DNB (Chen et al., 2012), to explore bifurcations in cases where it can be determined which state variables are mechanistically involved in the bifurcation, including single-cell data (Mojtahedi et al., 2016). In our analysis, we analytically (Appendix S1, section 2) and empirically compare with this method, but note that unlike DNB analysis, it is not necessary to delineate which genes drive the bifurcation, as Eqn. 2 is a global feature of the covariance. Additionally, in contrast to previous studies that focus on correlation coefficients, we explore how covariance eigenvectors (Eqn. 3) provide direct insight into the underlying mechanisms driving developmental bifurcations.
Thus, three specific changes to the transcriptomic covariance data, Eqns. 2-4, that can be determined from observations of state variables, can inform us of the salient features of the system, its bifurcations, even when we have no direct access to the generative model for the dynamics or to its corresponding underlying geometry. Notably, these features rely only on the transcriptomic data being sampled from the vicinity of a steady state, and do not rely on special circumstances, such as the Jacobian being symmetric, or on the noise being of a particular nature. We first use theoretical models of noisy, high-dimensional genetic networks to demonstrate how this approach can be leveraged to detect and assess bifurcations of an underlying dynamical system from observations of state-variables alone (for a procedural outline, see Materials and Methods section ‘Analysis pipeline’). Following this, we emphasize the power of this approach by directly applying it to scRNA-seq data for the neutrophil lineage in the hematopoietic system.
Covariance analysis recovers salient features of a high-dimensional in silico gene regulatory network

Analysis of a gene regulatory network around a saddle-node bifurcation. (A) Schematic of the GRN for 5 of the 102 genes. Undisplayed nodes have a unidirectional arrow stemming from either g1 or g2. (B) Distributions of g1 at steady state (see Materials and Methods section ‘Simulation methodology’) for three values of m1. (C-G) GRN observations as a function of the bifurcating variable m1 evaluated over a distribution of 100 cells (see Materials and Methods section ‘Simulation methodology’). (C) Average final expression for each gene. Expression of driver genes, g1,2 (bottom and top rows, respectively), are min-max normalized. Response genes are sorted by their corresponding driver [d(i)] and activation level (αi). (D) Black and gray indicate largest eigenvalue of covariance matrix shifted to have 0 min. Purple dashed line indicates DNB order parameter for gene expression matrix, where driver genes and tightly coupled responders |αi−0.5|<0.25 are in the DNB (Chen et al., 2012). (E) Distribution of normalized gene expression for each cell projected onto the bifurcating axis. (F) Red squares indicate the largest eigenvalue of the Jacobian matrix. Error bars are s.e.m. Blue circles indicate Euclidean distance from the corresponding Jacobian eigenvector () to the principle covariance eigenvector
. (G) Distribution of Pearson's correlation coefficients for all gene pairs. (H) Correlation coefficients between responding genes and their driver as a function of the coupling coefficient α (Eqn. 6) at three values of m1. Each column in E and G integrates to 1.
Analysis of a gene regulatory network around a saddle-node bifurcation. (A) Schematic of the GRN for 5 of the 102 genes. Undisplayed nodes have a unidirectional arrow stemming from either g1 or g2. (B) Distributions of g1 at steady state (see Materials and Methods section ‘Simulation methodology’) for three values of m1. (C-G) GRN observations as a function of the bifurcating variable m1 evaluated over a distribution of 100 cells (see Materials and Methods section ‘Simulation methodology’). (C) Average final expression for each gene. Expression of driver genes, g1,2 (bottom and top rows, respectively), are min-max normalized. Response genes are sorted by their corresponding driver [d(i)] and activation level (αi). (D) Black and gray indicate largest eigenvalue of covariance matrix shifted to have 0 min. Purple dashed line indicates DNB order parameter for gene expression matrix, where driver genes and tightly coupled responders |αi−0.5|<0.25 are in the DNB (Chen et al., 2012). (E) Distribution of normalized gene expression for each cell projected onto the bifurcating axis. (F) Red squares indicate the largest eigenvalue of the Jacobian matrix. Error bars are s.e.m. Blue circles indicate Euclidean distance from the corresponding Jacobian eigenvector () to the principle covariance eigenvector
. (G) Distribution of Pearson's correlation coefficients for all gene pairs. (H) Correlation coefficients between responding genes and their driver as a function of the coupling coefficient α (Eqn. 6) at three values of m1. Each column in E and G integrates to 1.
We simulated this model for a fixed number of genes (ng), statistical replicates or cells (nc), noise scale (s), duration (Nt) and timestep (δt) for different values of the control parameters (m1, kD) (see Materials and Methods section ‘Simulation methodology’). We define the [nc×ng] transcriptomic matrix G(m1, kD) once the system has reached steady state in the simulation. We observed that the steady-state distributions for individual genes (e.g. g1, shown in Fig. 2B) shift their mean as the control parameter, m1, is varied and exhibit bimodality at the bifurcation point, m1=m1c=3, as expected for saddle-node bifurcations.
Having verified that our model simulates a system that undergoes a high dimensional saddle-node bifurcation driven by a two-gene driver core, we used it to examine the effects of noise and a large number of responding genes on the theoretical predictions (Eqns 2-4). As predicted, we found that ω1(m1), the largest eigenvalue of the covariance of G(m1), is maximal at the critical value m1c (darker line in Fig. 2D), and the increase is significantly larger than can be obtained from a null distribution (lighter line in Fig. 2D) that lacks the correlations between genes of the model (see Appendix S1, section 4). This contrast between the data and the null can be understood by considering the bimodality of the transcriptomic distribution at the bifurcation. Far from the bifurcation, the transcriptomic distribution is unimodal, and all ωi values scale with the noise scale s, which is undirected and therefore unaffected by resampling, yielding (Fig. S3, left and right panels). However, at the saddle-node bifurcation, the transcriptomic distribution is bimodal, so ω1 scales with the distance between the two modes (Fig. S3, center top); marginal resampling of transcriptomes at the bifurcation yields new modes and the increased dimensionality of the bifurcation diminishes
, compared with ω1 (Fig. S3, center bottom). While Fig. S3 only demonstrates the bifurcation bimodality in g1,2, the full transcriptomic bimodality can be visualized by computing
, the normalized projection of the transcriptome of each cell along the principal covariance eigenvector. The distribution of this projection is densely centered around different fixed points to the right and left of m1c, but widens significantly at m1c as there is non-zero probability for both transcriptomic modes (Fig. 2E).
Although this toy model has an explicit bifurcation parameter, often the controls for specific developmental transitions are unknown in scRNA-seq data and the developmental time for each cell is inferred (pseudotime). To verify that similar spikes are expected even when the covariance is measured as a function of an indirect measure of a control parameter, such as pseudotime, we performed a pseudotime analysis on our toy model (Fig. S4A). In particular, we reduced the dimensionality of G using the SPRING method (Weinreb et al., 2018a), in which the (x, y) coordinates of each cell are determined by optimally placing each cell closest to its 4 nearest neighbors in the space of the top 10 gene-wise principal components (PC) of highly variable genes (Weinreb et al., 2018a), and computed their pseudotime using Slingshot (Street et al., 2018), in which pseudotime is approximated by the distance of the reduced-dimension data to a spline fit from one end of the data to the other (see Appendix S1, section 7.2 for details). We then binned the replicates (cells) by their pseudotime rank in 100 cell bins, and computed the principal covariance eigenvalue in each bin. We found that ω1 exhibited a statistically significant spike precisely where the distance between the average control parameter in the pseudotime bin was closest to the critical parameter (Fig. S4B). This result suggests that our analytical framework may be directly applicable to detecting similar bifurcations in pseudotime-sorted scRNA-seq data.
Because, in this example, we have an explicit generative model (), given by Eqns 5 and 6), we can validate that just as m1c resembles a bifurcation from analysis of the covariance matrix, it also resembles a bifurcation of the full noisy GRN, from analysis of the Jacobian. We show that the maximum negative eigenvalue (λd) of the Jacobian (
) for this network approaches 0 from below as m1→m1c (Fig. 2F). We also show that at m1c, the direction of maximal covariance, is given by the corresponding eigenvector of the Jacobian (
), as the Euclidean distance between
, the principal eigenvector of the covariance, and
approaches 0, as m1→m1c (Fig. 2F). Thus, although the finite system size (nc) prevents, or regularizes, ω1 from diverging, and
, ω1 is still at its largest and the eigenvectors are in closest correspondence at the bifurcation.
To empirically benchmark the principal covariance eigenvalue as a bifurcation indicator, we selected genes with strong connections (|αi−0.5|>0.25) and computed the DNB order parameter as a function of m1 (see Appendix S1, section 2) (Liu et al., 2012; Chen et al., 2012; Li et al., 2019; Chen et al., 2018). We found that the variation of ω1 matched the variation of the DNB order parameter (Fig. 2D), providing empirical support for using ω1 to identify bifurcations. Importantly, computing ω1 did not require preprocessing, while computing the DNB order parameter requires a preprocessing step to select the DNB genes (Chen et al., 2018). Additionally, ω1 is computationally more efficient, as it is obtained via the singular value decomposition of the gene expression matrix, whereas the DNB order parameter is obtained via the correlation matrix across tens of thousands of genes (Hastie et al., 2009).
Although the covariance eigen-decomposition provides insight into the timing and direction of a bifurcation, Eqn. 4 predicts that the (Pearson) correlation coefficients between genes may help determine which genetic relationships are most critical for the dynamics at the bifurcation. We found that, for low and high m1, when the network only has one fixed point, the distribution of correlation coefficients Rij is strongly centered around 0 (Fig. 2G). However, at the bifurcation, this distribution spreads out to ± 1, as predicted in Eqn. 4 (Fig. 2G). To determine whether the gene pairs that yielded large Rij corresponded with critical gene relationships in our network, we plotted Ri,d(i): the correlation between all responder genes and their drivers, sorted by their connection strength αi. We found that these correlation coefficients were much more strongly indicative of the responder-driver dependency (αi) at bifurcation (Fig. 2H, green) rather than far away from the bifurcation (Fig. 2H, red and blue). Again, the GRN model makes it explicit that, although the correspondence between geometry and dynamics is not universal, owing to the high-dimensionality of the system, in the vicinity of a bifurcation, a form of dimensionality reduction emerges that enables the gleaning of geometric characteristics directly from the dynamics. Thus, entries of a correlation matrix with high magnitude at a bifurcation may be reliable indicators of mechanistic gene-regulatory features.
We further used our GRN model to probe a pitchfork bifurcation induced by varying kD (Fig. S5A). Unlike the example of a saddle-node bifurcation, we observed that ω1 does not peak at the bifurcation parameter kDc=0.5, but rather begins to increase (Fig. S5B). This feature directly follows our interpretation that ω1 corresponds to the distance between the two modes of transcriptomic distribution. Whereas the bimodality of the saddle-node bifurcation results from the discontinuous transition between states, the bimodality of pitchfork bifurcation emerges continuously from its root and becomes more pronounced as the control parameter is increased. Therefore, the distance between the modes (ω1) increases with the control parameter. By clustering the cells according to their transcriptomic mode, or branches, we are able to recover the bifurcation signature predicted by Eqn. 2 (Fig. S5C), but we note that precise clustering requires prior knowledge (e.g. how many clusters there are).
As developmental decisions are often modeled as noise-induced state transitions between co-existing transcriptomic states (Weinreb et al., 2018b; Zhou et al., 2021), rather than bifurcations, we sought to determine whether our framework could distinguish these two possibilities. We used our model gene network (Fig. 2A) to explore the noise-induced transition possibility by varying the noise scale s of the network at fixed values of the bifurcation parameters (see Appendix S1, section 5 for details). We found that the principal covariance eigenvalue exhibited unique step-like dynamics as the stochasticity was varied (Fig. S6), which was unlike either the one-to-one (Fig. 2D) or one-to-many (Fig. S5B) bifurcation examples explored, demonstrating that analysis of the covariance dynamics in transcriptomics should enable distinguishing between noise-induced transitions and bifurcations.
In this example, we have demonstrated the applicability and power of the theoretical infrastructure outlined above to analyze a high-dimensional and noisy dynamical system undergoing a variety of bifurcations, by uncovering its crucial aspects, including its location, direction in gene space and influential genetic relationships. These calculations are also computationally simple; although covariance matrices can be cumbersome to compute for large numbers of genes and cells, reduced singular value decomposition can be used to determine quickly its largest eigenvalue and eigenvector, which is all our approach requires. Notably, our results apply only if the system is measured at steady state, otherwise there is no reason to anticipate clear divergences in the distribution of eigenvalues, transient bimodality or equivalence between the covariance and Jacobian principal directions (Fig. S7B-D).
Covariance analysis pinpoints a bifurcation in neutrophil development
Having verified that gene-gene covariance can be used to provide insight into transitions in a simulated genetic context, we applied our analysis framework to a recently published scRNA-seq data set of mouse hematopoietic stem cell (HSC) differentiation (Weinreb et al., 2020). In this experiment, HSCs were isolated in vitro, barcoded, plated in a media that supports multilineage differentiation (day 0) and subsequently sampled for single-cell sequencing using inDrops (Klein et al., 2015 ; Plasschaert et al., 2018) on days 2, 4 and 6. The resultant transcriptomic matrix (25,289 genes in 130,887 individual cells) was visualized in 2D using the SPRING method (Weinreb et al., 2018a) (Fig. 3A), using 4 nearest neighbors in the top 50 PC space for highly variable, non-cell cycle genes (Weinreb et al., 2018a). Each cell was then associated with one of 11 different cell types (annotations in Fig. 3A) based on its position in the SPRING plot and expression of cell type-specific marker genes (Weinreb et al., 2020). Cells that belonged to the developmental transition from multipotent progenitor (MPP) to neutrophil were identified by recategorizing cells as a cell-label distribution, and ranking cells by their similarity to fully committed neutrophils (see details in Appendix S1, section 7.1). The 61,310 cells identified as belonging to the neutrophil transition were sorted into a neutrophil pseudotime trajectory (Fig. 3A) by ranking cells according to their similarity with the earliest pluripotent cells (see Appendix S1, section 7.1). This data-specific pseudotime algorithm was validated via the clonal barcodes of the cell, by ensuring that the MPP cells in the trajectory included neutrophil clones, and via the sequencing time, by ensuring that cells collected earlier were ranked earlier in the trajectory. Thus, several features of this trajectory make it ideal for applying our analysis framework: it includes a large number of cells, enabling statistically reliable covariance measurements; it is robust to the systematic, temporal controls of sequencing -time and cellular barcodes; and it is part of hematopoiesis, a well-characterized developmental process that enables the comparison of our findings with past work. We found that this trajectory was extremely dynamic, as the expression of hundreds of highly expressed genes is temporally variable, with large groups of genes either monotonically increasing or decreasing (Fig. 3B).
Covariance analysis of temporal scRNA-seq data shows signatures of developmental bifurcations. (A) SPRING visualization for each cell (point) in an in vitro scRNA-seq experiment of mouse hematopoeitic stem cell differentiation (Weinreb et al., 2020). Cells in the pseudotime trajectory analyzed are colored accordingly (blue to yellow), whereas others are gray. SPRING coordinates, cluster labels, pseudotimes and a similar visualization were first reported by Weinreb et al. (2020). (B,C) Observations of neutrophil trajectory as a function of pseudotime calculated for each of 121 bins of pseudotemporally adjacent cells. All bins had 1000 cells except for the last one, which had 1310 cells, and had a 50% overlap with neighboring bins. (B) Average gene expression in pseudotemporal bins for highly expressed [max (〈expr〉)>1] and highly varying [coeff. of var.(〈expr〉)>0.5] genes. (C) Principal covariance eigenvalue (black) compared with a statistical null (gray, details in Appendix S1, section 4), shifted to have 0 min. Error bars of null are ±1s.d. DNB order parameter also shown, where the DNB comprises neutrophil marker genes (purple dashed line) or is the average of many random gene sets (purple solid line, see Fig. S9 for details). Green and yellow dashed lines indicate the developmental transition times τd and τm.
Covariance analysis of temporal scRNA-seq data shows signatures of developmental bifurcations. (A) SPRING visualization for each cell (point) in an in vitro scRNA-seq experiment of mouse hematopoeitic stem cell differentiation (Weinreb et al., 2020). Cells in the pseudotime trajectory analyzed are colored accordingly (blue to yellow), whereas others are gray. SPRING coordinates, cluster labels, pseudotimes and a similar visualization were first reported by Weinreb et al. (2020). (B,C) Observations of neutrophil trajectory as a function of pseudotime calculated for each of 121 bins of pseudotemporally adjacent cells. All bins had 1000 cells except for the last one, which had 1310 cells, and had a 50% overlap with neighboring bins. (B) Average gene expression in pseudotemporal bins for highly expressed [max (〈expr〉)>1] and highly varying [coeff. of var.(〈expr〉)>0.5] genes. (C) Principal covariance eigenvalue (black) compared with a statistical null (gray, details in Appendix S1, section 4), shifted to have 0 min. Error bars of null are ±1s.d. DNB order parameter also shown, where the DNB comprises neutrophil marker genes (purple dashed line) or is the average of many random gene sets (purple solid line, see Fig. S9 for details). Green and yellow dashed lines indicate the developmental transition times τd and τm.
To determine whether the transitions from HSC to neutrophil were due to bifurcations in transcriptomic space, we split the neutrophil trajectory into overlapping bins of 1000 cells (last bin had 1310) and applied our covariance analysis to the full, row (cell)-normalized transcriptomic matrix at each bin G(τ). We found that the largest eigenvalue of the covariance of the full gene expression matrix [ω1(τ), dark line in Fig. 3C] exhibited very little variation for τ<τd=85, but began to increase at τd and exhibited a significant spike at τm=109, which is indicative of a bifurcation. To determine whether ω1 changes were statistically significant, we computed the corresponding statistical null (, lighter line in Fig. 3C; details in Appendix S1, section 4) and found that the large peak at τm was easily distinguishable from the null. To benchmark our result against established bifurcation identification methodologies, we computed the DNB order parameter (Chen et al., 2012) across pseudotime, using known marker genes for neutrophils and neutrophil progenitors (Weinreb et al., 2020) as the DNB (see details in Appendix S1, section 2). We found that the DNB order parameter (Fig. 3C, purple dashed line) exhibited similar dynamics to ω1(τ), but we emphasize that, unlike the computation of ω1(τ), computing the DNB requires gene filtering, as random sets of genes did not exhibit bifurcation signatures (Fig. 3C, purple solid line; Fig. S9). We also verified that bin size did not generally impact the dynamics of ω1(τ) (Fig. S10).
We first focus our attention at the dynamics at τm, after which we will address those observed at τd. As this pattern of a statistically significant spike following near-constant ω1 echoed the observed behavior of a saddle-node bifurcation in our toy model (Fig. 2D), we speculated that at τm there was a one-to-one transcriptomic state transition. Additionally, to verify that the temporal trend in Fig. 2D was not limited to the diffusion-based pseudotime algorithm used by Weinreb et al. (2018a), we recalculated pseudotime using Slingshot (Street et al., 2018), and found the same rise and peak of ω1 (see Appendix S1, section 7.2 and Fig. S11A,B).
We focus now on our observations in proximity to τd. As the increase in ω1 at τd strongly resembled the pitchfork bifurcation of our toy model (Fig. S5B), as well as the proliferation of cell fates seen in high-resolution time-course scRNA-seq experiments (Nitzan and Brenner, 2021), we hypothesized that the increase of ω1 at τd was also due to transcriptomic state changes. Further evidence of a developmental transition is that, at τd, the distribution of expression of each gene across cells begins to significantly shift toward higher values (Fig. S10C). However, the precise nature of this developmental transition is unclear, because, in contrast to our toy model, is nearly indistinguishable from
during the increase.
To determine whether the transcriptomic state transitions we identified had biological significance, we compared our findings against the tree of cell fates for neutrophil development (Fig. 4A). We found (Fig. 4B) that τd, the moment ω1 begins to increase, corresponded well with the moment in pseudotime that cells switch between the endpoints of this tree: from not expressing any terminal-fate marker genes to primarily expressing neutrophil cell-fate markers (Weinreb et al., 2020). At a more granular developmental level, the pseudotimes highlighted by our covariance analysis align with specific transitions between intermediate neutrophil progenitor states (Fig. 4A). These transitions include: (1) one-to-many cell fate changes (i.e. decisions), such as the transition between a granulocyte monocyte progenitor (GMP, or myeloblast) and any of its four terminal fates (neutrophil, monocyte, eosonophil and basophil); and (2) one-to-one cell fate changes (i.e. maturation), such as the transition between promyelocyte and myelocyte (Weinreb et al., 2020; Borregaard, 2010; Ostuni et al., 2016). τd corresponds well with the pseudotime at which promeylocyte marker genes are maximal (Fig. 4C and Fig. S11C), suggesting a connection between τd and the one-to-many change from GMP to promyelocyte. This can perhaps be understood in light of other one-to-many state transitions, such as a pitchfork bifurcation (Fig. S5A), where ω1 increases steadily if different branches are left unclustered (Fig. S5B). Although the null and signal were significantly closer in the neutrophil trajectory than in the toy model, this discrepancy may be related to the difficulty in identifying one-to-many transcriptomic changes in non-ideal conditions, compared with one-to-one transitions (see Appendix S1, section 6 and Fig. S8 for details). Accordingly, the increase of ω1 at τd suggests that, although cells in the neutrophil trajectory are expected to include only the neutrophil lineage branch of GMP, they may in fact include other GMP lineages, such as eosonophils or basophils.
Detected bifurcations correspond to biologically characterized developmental transitions. (A) Schematic of neutrophil development, beginning from hematopoietic stem cells and ending at the neutrophil myelocyte – a committed neutrophil progenitor. Lines indicate naturally occurring progeny, other than the cell type itself. Subsequent neutrophil-committed fates (neutrophil metamyelocyte, band cells and neutrophils) are not shown. Cell type images by A. Rad and M. Häggström. CC-BY-SA 3.0 license. (B) Fraction of cells in each cell type, based on annotated clustering in Weinreb et al. (2020). (C) Average expression of promyelocyte (blue) and myelocyte (gold) marker genes (Weinreb et al., 2020). Error bars (s.e.m.) are smaller than symbols.
Detected bifurcations correspond to biologically characterized developmental transitions. (A) Schematic of neutrophil development, beginning from hematopoietic stem cells and ending at the neutrophil myelocyte – a committed neutrophil progenitor. Lines indicate naturally occurring progeny, other than the cell type itself. Subsequent neutrophil-committed fates (neutrophil metamyelocyte, band cells and neutrophils) are not shown. Cell type images by A. Rad and M. Häggström. CC-BY-SA 3.0 license. (B) Fraction of cells in each cell type, based on annotated clustering in Weinreb et al. (2020). (C) Average expression of promyelocyte (blue) and myelocyte (gold) marker genes (Weinreb et al., 2020). Error bars (s.e.m.) are smaller than symbols.
Conversely, τm corresponds well with the pseudotime when myelocyte marker genes are maximal and promyelocyte genes have reduced expression (Fig. 4C and Fig. S11C), suggesting that τm indicates the transition point between these two cell fates. Although the myelocyte marker genes begin increasing earlier than τm in the trajectory, this may be because of additional cellular processes that smooth out their dynamics over the course of a cell-fate transition. Alternatively, the marker gene dynamics may indicate that the transition at τd is connected to the transition at τm; e.g. the eigenvalue dynamics at τd may hint at an early bias toward the ultimate myelocyte transition, similar to other developmental biases that have recently been identified in hematopoiesis (Wang et al., 2022).
Thus, by using Eqn 2 to quantify the geometry of neutrophil development, we were able to recover the known GMP-neutrophil cell fate decision, qualify the trajectory as likely including other lineages and pinpoint a maturation step in neutrophil development. Importantly, this analysis also highlights the difficulties in using the principal covariance eigenvalue alone to characterize bifurcations, as one-to-many bifurcations in particular may be extremely sensitive to small errors or biases. To address these difficulties, and provide additional insights into possible dynamical transitions, we now leverage a key feature of the covariance analysis outlined above: that signatures of the underlying mechanisms driving a system through a bifurcation are evident in its principal covariance eigenvector.
Covariance eigenvectors provide interpretable low dimensional representations of neutrophil bifurcations
Perhaps the most surprising consequence of the Continuous Time Lyapunov equation, encapsulated in Eqn. 3, is that a high-dimensional bifurcation eigenvector, which is a characteristic of the underlying Jacobian of the system, is directly calculable from the transcriptomic-state data, as it equals, up to a sign, the principal covariance eigenvector . This result motivated us to probe
, the principal eigenvectors of the covariance matrix as a function of pseudotime, and in particular its structure in the vicinity of τd and τm, to glean further insight into the biological nature of our detected transition points.
We first sought to determine the uniqueness of , as it is extremely high dimensional, by measuring how it varies across pseudotime, compared with average gene expression. We found that the correlation of average gene expression across pseudotime exhibits an approximate two-block structure throughout the trajectory, in which expression is well correlated in τ∈[0, 80] and again in τ∈[90, 121t] (Fig. 5A), hinting at the existence of two gene expression states. Interestingly, the correlation of
was significantly more detailed, exhibiting as many as six distinct blocks, with higher positive correlation and lower negative correlation than seen in expression (Fig. 5B). Importantly, τd and τm align well with transitions between blocks, further bolstering their significance as markers of developmental transitions. Thus, the variation of
, even when it is high dimensional, may reveal significant detail in the structure of a developmental trajectory.
Analysis of high-dimensional bifurcation directions. (A) Pairwise correlation of average normalized gene expression in each pseudotime bin. (B) Pairwise correlation of the principal covariance eigenvector of each pseudotime bin. (C) Projection of normalized gene expression along principal eigenvectors at τ0, τd and τm. Each dot is a cell and its color indicates the pseudotime. Inset shows corresponding position of the cell in the SPRING plot (Fig. 4A). (D) Average weight, per gene, of the highest weighted categories in the KEGG database for Mus musculus, for the principal covariance eigenvectors at τ0,τd, and τm. (E) Distribution of gene expression projected onto , near τd (top), and distribution of gene expression projected onto
near τm (bottom). (F) As in C, but the color of points (cells) indicates their cluster in the GMM; insets are split by cluster.
Analysis of high-dimensional bifurcation directions. (A) Pairwise correlation of average normalized gene expression in each pseudotime bin. (B) Pairwise correlation of the principal covariance eigenvector of each pseudotime bin. (C) Projection of normalized gene expression along principal eigenvectors at τ0, τd and τm. Each dot is a cell and its color indicates the pseudotime. Inset shows corresponding position of the cell in the SPRING plot (Fig. 4A). (D) Average weight, per gene, of the highest weighted categories in the KEGG database for Mus musculus, for the principal covariance eigenvectors at τ0,τd, and τm. (E) Distribution of gene expression projected onto , near τd (top), and distribution of gene expression projected onto
near τm (bottom). (F) As in C, but the color of points (cells) indicates their cluster in the GMM; insets are split by cluster.
We next sought to examine whether contain axes of a simplified space along which to examine the trajectory. As the correlation between eigenvectors exhibited distinct blocks, two of which had bifurcative dynamics, we used
near the beginning of the trajectory and near τd and τm to revisualize the neutrophil trajectory. For τ0 and τd, we used an
toward the middle of its block (at
and
, respectively; see Appendix S1, section 8 for details) to ensure that the direction had stabilized; however, for τm we used the vector precisely at τm, as the direction of a one-to-one transition may only be observable at the moment of bifurcation (Eqn. 3). We found that for the majority of the trajectory, the variation was localized to
, but starting near τd, cells began to vary along both
and
(Fig. 5C). As the variation along
and
coincided, and they had a high correlation of 0.67 (Fig. 5C), while both being nearly completely orthogonal to
, we sought to determine their differences. We found that whereas the fraction of variation of gene expression along
begins to increase near τd and remains high even around τm (Fig. S12A), highlighting the importance of
for the transition at τm,
accounts for only a significant fraction of variation very close to τm (Fig. S12A), suggesting additional features that are distinct from
(Fig. S12A). Additionally, the projection of the vector tangent to gene expression onto these eigenvectors appears high for both bifurcation eigenvectors after τd, suggesting that they are simultaneously involved in the gene expression dynamics (Fig. S12B) and that, at least from τd onward, the trajectory appears multi-dimensional. We found hints of the distinctly
features in the gene expression projection (Fig. 5C), where towards the end of the trajectory, when cells have already deviated far from the
plane, some of the cells appear closer to that plane. Thus, the increased variance along
reflects the spike of ω1 at τm (Fig. 4C), and further illuminates it by showing that the direction of the bifurcation is toward the pluripotent state.



Last, we used to examine whether the increased ω1 at τd and τm is due to the emergence of a second distinct cellular population or, alternatively, to an increase in the transcriptomic variance. Concretely, if the increased ω1 is due to a second population of cells, then the projection of gene expression at τ along
should appear bimodal. We found that this projection widened near both τd and τm (Fig. 5E and Fig. S12C), reflecting their increased ω1, and exhibited bimodality at τm, suggesting the emergence of a second population of cells. This bimodality is also apparent from the gene expression distributions for many of the genes with highest weights in
, including Fth1, Psap and Ccl6 (Fig. S13). To further disentangle the two populations of cells, we fit a two-peak Gaussian Mixture Model (GMM) to G(τm) (see Appendix S1, section 9 for details) (Pedregosa et al., 2011). The GMM clearly distinguished the two modes in bimodal gene expression distributions, and showed that many other genes, such as S100a9 and Ngp, which appear initially to have unimodal distributions, also exhibit bimodal structure (Fig. S13).
To evaluate the dynamics of the two cellular populations, we used the GMM to predict the cluster label for all cells in the full trajectory. We found that, for τ<τd, all cells belonged to the same cluster (GMM-a), and began to separate into two clusters (GMM-a and GMM-b) near τd (Fig. S14). Additionally, the two clusters exhibit contrasting ω1 dynamics: GMM-a has increasing ω1, while GMM-b has a spike in ω1 at τm (Fig. S14). Importantly, the clustering split the cells along , such that cells that were late in pseudotime, but close to the
plane, are part of cluster GMM-a, indicating that they are functionally earlier in the neutrophil specification path (Fig. 5F).
Summarizing, we found that the geometry of the data in transcriptomic space, explicated by the principal covariance eigenvector, yielded significant insight into developmental specification dynamics throughout pseudotime, especially at bifurcations. Throughout pseudotime, correlations between eigenvectors were able to pinpoint pivotal trajectory moments, including, and in addition to, those indicated by the principal covariance eigenvalue. Although the eigenvalue analysis alone did not clearly distinguish between the two transitions, the eigenvector analysis highlighted unique mechanistic features at each transition, suggesting they represent distinct developmental events. Eigenvectors at the bifurcations were also helpful in visualizing the trajectory, and inferring the molecular and cellular processes that reshape the developmental landscape. Furthermore, by viewing the bifurcating data along its eigenvector, we were able to discern bimodality and to distinguish between multiple cell fates within the same lineage. Thus, eigenvector analysis appears to be a promising new direction for analysis of pseudotime trajectories.
DISCUSSION
A singular challenge in understanding cellular fate transitions using transcriptomics has been dimensionality: cell fates are a low-dimensional functional description, a valley in Waddington's landscape, whereas gene-expression profiles are points in a myriad-dimensional space – how can gene expression possibly show the geometry of development? In this study, we have leveraged the continuous-time Lyapunov equation to show that the dynamics of state-variable covariance, even at high dimensionality, are sufficient to assess a crucial aspect of developmental geometry: when and how linear stability is lost to yield a bifurcation. Our central and novel result, based on a restricted region of the transcriptomic trajectories present during the process of hematopoiesis, is that the requisite statistical signatures of a bifurcation are detectable and present during development, even through the complex, high-dimensional lens of sequencing. Although biases and imperfections in data may confound developmental bifurcations, pairing analysis of both the principal covariance eigenvalue and eigenvectors enable us to disentangle multiple transition points, and elucidate mechanistic features that are normally completely hidden in the absence of a candidate mathematical model. Thus, our results have important consequences for the theoretical understanding of developmental transitions, the specific biology of neutrophil development and the analysis of dynamic biological data.
Our finding that a transcriptomic trajectory can have distinct geometric signatures, including durations during which the principal covariance eigenvalue is constant or spikes, has considerable consequences for the theoretical understanding of developmental dynamics. That we saw any consistent behavior in the principal covariance eigenvalue lends significant support to our initial hypothesis that cell fate modifiers operate at a much slower rate than transcriptomic modifiers, because if these occurred on similar timescales, no such statistical signature would be evident, let alone those that align well with current understanding. Additionally, whereas previous statistical analyses of scRNA-seq data found that developmental trajectories appear as monotonic proliferations of cell fates (Nitzan and Brenner, 2021), our focus on a single developmental trajectory enables the distinction of multiple developmental epochs, including durations of development during which cell fates do not undergo qualitative changes, but proliferate (the GMP-to-promyelocyte transition) and change state (the promyelocyte-to-myelocyte transition). Finally, our evidence of bifurcations starkly contrasts with scRNA-seq visualizations that show gene expression varying smoothly along a developmental path, and underscores the importance of understanding both noise and non-linear dependencies when using transcriptomic profiles to classify the fate of a cell (Moris et al., 2016).
Our analysis of the data of Weinreb et al. (2020) also yielded intriguing implications regarding the specific geometry of neutrophil development in mice. In particular, some of the known cell fate changes in neutrophil development were not distinguishable in the covariance eigenvalue trajectory [e.g. from common myeloid progenitors (CMPs) to GMP], which indicates that these changes are less bifurcative than the GMP-to-promyelocyte or promyelocyte-to-myelocyte transitions. This could mean, for example, that even when CMPs differentiate to GMPs, the transition lacks commitment and is dependent on a sustained developmental signal, whereas once cells transition from GMP to promyelocyte, they are committed to becoming neutrophils regardless of an external signal. Alternatively, these non-bifurcative transitions may be driven by early fate biases (Wang et al., 2022), and further research may be necessary to robustly determine which progenitor cell fates are statistically stable. Additionally, in comparing the principal covariance eigenvectors throughout pseudotime (Fig. 5B), it became apparent that the direction along which that fate change happened was well aligned with the direction of the GMP-to-promyelocyte transition. This result may be a sign of distinct, soft directions in transcriptomic space along which cell fates are most likely to change. In addition, as the steady increase and spike in the neutrophil trajectory (Fig. 3C) did not resemble the covariance dynamics of noise-induced state transitions (Fig. S6B), our results suggest that the promyelocyte and myelocyte transitions likely occur due to a loss of stability between fixed points, rather than stochasticity alone.
Aside from these geometric implications, pinpointing bifurcations in pseudotime also enhances analysis of temporal biological data, as it enables the efficient identification of the genes and molecular mechanisms that drive a cell fate transition. At a bifurcation, the principal covariance eigenvectors can aid visualization and highlight critical mechanisms that distinguish clusters (Fig. 5). As the principal covariance eigenvector is equivalent to the Jacobian eigenvector at the bifurcation, and the Jacobian directly reflects gene dynamics, the eigenvector may also be useful for constraining an inferred global Jacobian (Nägele et al., 2014). Additionally, the correlation matrix at a bifurcation may aid in building regulatory network models when combined with previous protein-interaction data or new experimental perturbations (Sun et al., 2015). Furthermore, it may be possible to incorporate our covariance analysis into other indications of pseudotime rank, such as cellular barcodes and low-dimensional distance, to constrain developmental trajectories along bifurcative paths.
Although we focus here on scRNA-seq data, our approach is broadly applicable, and could, in principle, aid in illuminating other aspects of high-dimensional biological dynamics, such as the relationship between development and evolution, or the genomic structural modifications necessary for fate transitions (Buenrostro et al., 2013; Jia et al., 2018). Our analysis was only possible because scRNA-seq experiments can now measure the expression of tens of thousands of genes in hundreds of thousands of cells, enabling accurate covariance measurements. That we found bifurcative events in these data implies that there are low-dimensional, non-linear dynamical systems at play, and that sufficient biological sampling, coupled with physics-based analyses, can reveal the knobs to controllably tilt developmental landscapes.
MATERIALS AND METHODS
Continuous time Lyapunov equation for transcriptomic matrices





















Covariance at bifurcation











Bifurcation eigenvector equivalence





Simulation methodology
Analysis pipeline
Eqns 2-4 imply an analysis pipeline characterizing bifurcations in high-dimensional temporal data, which we use in this article:
(1) Obtain highly sampled temporal data. Caveat: for data types such as scRNA-seq, where frequent sampling is difficult, and samples may include realizations from many different times, time may be inferable, using, for example, pseudotime inference (see Appendix S1, section 7.1).
(2) Bin the data along the temporal axis.
(3) Compute the largest eigenvalue of the covariance matrix (ω1) in each bin (e.g. using an off-the-shelf PCA function).
(4) Evaluate whether a bifurcation occurs by comparing ω1 with a suitable null (see Appendix S1, section 4): spike indicates a one-to-one bifurcation; steady increase indicates a one-to-many bifurcation.
(5) If a bifurcation is detected (e.g. at τc), compute and examine the principal covariance eigenvector at τc, as it reflects mechanistic aspects of the underlying dynamical system.
Acknowledgements
We thank Richard Carthew, Yogesh Goyal and Karna Gowda for reviewing the manuscript and providing suggestions. We thank Northwestern Information Technology for access to the Quest High-Performance Computing Cluster.
Footnotes
Author contributions
Conceptualization: S.L.F., S.G., M.M.; Validation: S.L.F.; Formal analysis: S.L.F.; Investigation: S.L.F., B.X., S.G., M.M.; Writing - original draft: S.L.F., S.G., M.M.; Writing - review & editing: S.L.F., S.G., M.M.; Visualization: S.L.F.; Supervision: S.G., M.M.; Project administration: S.G., M.M.; Funding acquisition: S.G., M.M.
Funding
This work was supported by the National Science Foundation-Simons Center for Quantitative Biology at Northwestern University and the Simons Foundation (597491 to M.M.). S.G. acknowledges the University of Toronto’s Medicine by Design initiative, which receives funding from the Canada First Research Excellence Fund, and a Natural Sciences and Engineering Research Council of Canada Discovery Grant. M.M. is a Simons Foundation Investigator. Open Access funding provided by Northwestern University. Deposited in PMC for immediate release.
Data availability
Instructions and Python code for reproducing all figures in this manuscript are available at http://github.com/simfreed/sc_bifurc_figs.
Peer review history
The peer review history is available online at https://journals.biologists.com/dev/lookup/doi/10.1242/dev.201280.reviewer-comments.pdf.
References
Competing interests
The authors declare no competing or financial interests.