Neuronal replacement therapies rely on the in vitro differentiation of specific cell types from embryonic or induced pluripotent stem cells, or on the direct reprogramming of differentiated adult cells via the expression of transcription factors or signaling molecules. The factors used to induce differentiation or reprogramming are often identified by informed guesses based on differential gene expression or known roles for these factors during development. Moreover, differentiation protocols usually result in partly differentiated cells or the production of a mix of cell types. In this Hypothesis article, we suggest that, to overcome these inefficiencies and improve neuronal differentiation protocols, we need to take into account the developmental history of the desired cell types. Specifically, we present a strategy that uses single-cell sequencing techniques combined with machine learning as a principled method to select a sequence of programming factors that are important not only in adult neurons but also during differentiation.
The goal of neuronal replacement therapy is to provide differentiated neurons to the brain. These are either the product of in vitro directed differentiation of embryonic stem cells (ESCs) or induced pluripotent stem cells (iPSCs), or result from the reprogramming (transdifferentiation) of differentiated adult cells into specific neuronal cell types. Establishing these protocols relies on the expression of a few transcription factors or signaling molecules (termed programming or reprogramming factors), or on treatment with cytokines, growth factors or small molecules (Kidder, 2014; Pfisterer et al., 2016; Huch et al., 2017). The results are variable; not all cells acquire the desired fate and even those that do vary in the extent of their differentiation. This has direct negative consequences on any potential neuronal replacement therapy.
Understanding how transcription factors and signaling molecules are utilized in diverse cell types to orchestrate their specification and differentiation is fundamental to achieving robust and efficient protocols for regenerative medicine. Although the idea of defining in vitro differentiation protocols that follow similar steps to embryonic development is not new (Cohen and Melton, 2011), we suggest here that, in light of recent advances in developmental neurobiology, new approaches can be employed to enhance in vitro differentiation and transdifferentiation efficiency. We discuss the concept of ‘phenotypic convergence’ and the genetic and epigenetic mechanisms that explain it, and we highlight how it contributes to the pitfalls of current approaches. We argue that it would be beneficial to change the design of differentiation protocols, using single-cell RNA sequencing approaches and machine-learning algorithms, to move from informed guesses of programming and reprogramming factors to factors that are chosen algorithmically.
Current neuronal differentiation strategies
Neurons are characterized by a number of different features that support their function: they express specific adhesion molecules that ensure synaptic specificity; they acquire specific morphologies and target different regions of the body; they form chemical or electrical synapses; they use a range of different neurotransmitters to deliver a signal and express different neurotransmitter receptors to receive and propagate signals; and they secrete different signaling proteins. To acquire these features faithfully, neurons express transcription factors that regulate their structural, molecular and physiological characteristics and generate their impressive cell type diversity. These transcription factors control different features at different times during development and are thus expressed upon neuronal specification and/or during differentiation (Fig. 1). Indeed, the nature of these factors and the order in which they are expressed are fundamental for a neuron to acquire its features.
Current neuronal differentiation protocols rely on the synchronous or serial supply of a handful of transcription factors and signaling molecules to ESCs or iPSCs (Fig. 1). Alternatively, already differentiated cells (often non-neuronal) can be directly reprogrammed into a neuronal cell type by expressing one or a few reprogramming factors (Gascón et al., 2017). This conversion of one cell type to another via the misexpression of specific transcription factors was first performed in the 1980s (Davis et al., 1987). However, research in programming and reprogramming cells towards specific cell fates took off after the discovery of the Yamanaka factors, Oct4 (Pou5f1), Sox2, Klf4 and Myc, which were able to reprogram mouse or human fibroblasts into iPSCs. To discover these factors, Takahashi and Yamanaka (2006) screened different combinations of 24 selected factors known to be important for ESC fate and tested them for their efficiency in reprogramming fibroblasts into iPSCs. This required a tremendous amount of work, which was rewarded by the 2012 Nobel Prize in Physiology or Medicine. Since then, multiple protocols have been established to reprogram differentiated cells into iPSCs (Takahashi and Yamanaka, 2016), and to differentiate iPSCs or ESCs into specific cell types.
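The combinatorics make clear why such screens are laborious. As an illustration only (Takahashi and Yamanaka did not test every possible combination), the number of non-empty subsets of 24 candidate factors, and of four-factor subsets alone, is:

$$
2^{24} - 1 = 16\,777\,215, \qquad \binom{24}{4} = \frac{24 \cdot 23 \cdot 22 \cdot 21}{4!} = 10\,626.
$$

Exhaustive testing therefore becomes impractical very quickly as the candidate pool grows.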
The development of these protocols usually involves two steps: (1) identifying candidate genes that could have a decisive role in the development of the desired cell type; and (2) testing these candidates in different combinations for their efficiency in generating the desired cell type. Together, these protocols have allowed the generation of a variety of neuronal cell types using various starting points, including iPSCs (Hester et al., 2011; Ho et al., 2016; Nehme et al., 2018), ESCs (Hester et al., 2011), neuronal progenitors, fibroblasts (Pfisterer et al., 2016; Son et al., 2011; Wapinski et al., 2013; Xu et al., 2016; Xiao et al., 2018), astrocytes (Corti et al., 2012) or even T cells (Haag et al., 2018), and using a multitude of different neurogenic transcription factors [such as Neurog2, Ascl1, Lmx1a, Brn2 (Pou3f2) etc.]. Indeed, this approach has been applied successfully to induce the production of motor neurons, excitatory cortical neurons, GABAergic neurons, dopaminergic neurons and serotonergic neurons (Caiazzo et al., 2011; Hester et al., 2011; Ho et al., 2016; Xu et al., 2016).
Drawbacks and challenges in neuronal differentiation
Despite huge advances over the last 10 years in generating various neuronal cell types, most of the factors used for directed differentiation have been identified by informed guesses and extensive ‘trial and error’ approaches, and they often exhibit low (re)programming efficiency. Notably, three main drawbacks have emerged. First, the efficiency of these protocols, i.e. the percentage of cells that acquire the desired identity, varies from low (<10%; Son et al., 2011) to moderate (∼60%; Hester et al., 2011) and is only in very rare cases high enough for clinical use. Second, even if the efficiency of programming were 100%, the cells generated would not belong to one specific cell type but would rather represent a broad collection of related cell types that are often not completely differentiated. Recent single-cell analyses have uncovered the remarkable neuronal type diversity of the human brain. Therefore, genetic protocols that differentiate ESCs and iPSCs into cholinergic, GABAergic or glutamatergic neurons are bound to generate a mixture of the many cholinergic, GABAergic or glutamatergic subtypes rather than a single type. Finally, in vitro generated neurons often do not correspond accurately to any cell type found naturally within a primary tissue (La Manno et al., 2016). The value of these protocols for in vivo programming or reprogramming for neuronal replacement in the clinic therefore remains questionable. The only approaches close to clinical application are those that involve in vitro differentiation followed by purification and transplantation, such as that of fetal progenitor cells transformed into dopaminergic neurons to treat Parkinson's disease (Kefalopoulou et al., 2014; Barker et al., 2013) or of ESCs differentiated into retinal pigment epithelium to treat age-related macular degeneration (Schwartz et al., 2012; Schwartz et al., 2015).
Nonetheless, these protocols have proven to be useful for modeling diseases in vitro and for studying these cell types in culture where the successfully programmed cells can be selected based on marker gene expression.
Moving forward, these and other challenges need to be overcome in order to achieve the goal of neuronal replacement therapy. As the brain is composed of a huge number of different cell types, each of which exhibits a unique developmental history and function, it is necessary to know precisely how each neuronal cell type is specified during development in order to recapitulate its development in vitro. In addition, the way in which the efficiency of a given protocol is evaluated currently varies between studies, and this needs to be addressed. Programming efficiency has generally been measured by assessing the expression of generic neuronal markers, such as Tuj1 (Tubb3), MAP2, NeuN (Rbfox3), Syt1 and Syn1, as well as specific markers for serotonergic, dopaminergic or other neuronal types. However, how the levels of these markers translate into neuronal identity and function in vivo remains unclear. Moreover, the functionality of a differentiated neuron is often measured by its capacity to produce spike trains or form synapses but, again, this does not necessarily mean that it is capable of performing the exact function of its natural counterpart.
Phenotypic convergence and in vitro differentiation
It is striking that multiple very distinct protocols are capable of programming stem cells or converting differentiated cells into a given cell type. After the four Yamanaka factors were discovered, a number of papers identified other combinations of pluripotency factors that are capable of inducing reprogramming, including members of the Sox, Klf and Myc families, along with Nanog, Lin28, Glis1 and others (Yu et al., 2007; Nakagawa et al., 2008; Abdelalim et al., 2014; Bourillot and Savatier, 2010; Maekawa et al., 2011). The same is true for factors that promote the in vitro differentiation of ESCs and iPSCs towards hepatocytes, cardiomyocytes and neurons. In fact, a recent study (Tsunemoto et al., 2018) evaluated 598 pairs of transcription factors (from a pool of 59 transcription factors) for their capacity to reprogram fibroblasts into neurons; 76 of the tested pairs (i.e. 13%) were able to reprogram fibroblasts into neuronal cells that expressed the expected neuronal markers, exhibited neuronal morphology, produced spike trains, and formed synapses. This raises very important questions: does this mean that there are many ways to generate functional neurons? What types of neurons are produced and are they all equivalent? Are they all candidates for neuronal replacement therapies?
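Pairwise screens scale in the same way. Assuming the 598 tested pairs were drawn from the unordered pairs of the 59-factor pool:

$$
\binom{59}{2} = \frac{59 \cdot 58}{2} = 1711, \qquad \frac{598}{1711} \approx 0.35, \qquad \frac{76}{598} \approx 0.13, \qquad \binom{59}{3} = \frac{59 \cdot 58 \cdot 57}{3!} = 32\,509.
$$

The screen thus sampled roughly a third of all possible pairs, and extending the same pool to triplets would enlarge the search space exactly 19-fold (57/3).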
This finding also illustrates the notion that the generation of specific neuronal features in a given cell type can be achieved by different regulatory mechanisms and through different routes of differentiation. This phenomenon is called phenotypic convergence (Fig. 2): two cell types can achieve the same phenotype (e.g. morphological, physiological or molecular characteristics) following different developmental paths and using different regulators (Konstantinides et al., 2018). The phenomenon of phenotypic convergence is observed in several in vivo contexts. For example, cholinergic neurons in Caenorhabditis elegans can be generated in vivo in different ways, i.e. different transcription factors are employed in different neuronal types to drive the expression of the cholinergic gene battery. The same is true for C. elegans GABAergic (Gendrel et al., 2016) and glutamatergic gene batteries. Drosophila optic lobes also employ different transcription factors to generate neurons with the same neurotransmitters (Fig. 2), and to drive the expression of other broadly expressed genes (Konstantinides et al., 2018). In extreme cases in C. elegans, the exact same cell type can be generated through distinct developmental paths (Mizeracka et al., 2019 preprint). Phenotypic convergence is also likely to exist in vertebrate neurons, which would explain why distinct differentiation cocktails can be used to generate similar-looking neuronal types. But the question remains: are the neurons generated by different cocktails the same cell type? Furthermore, although they might exhibit shared expression of particular markers, how much do these cell types differ?
The genetic and epigenetic landscape of neurons with convergent characters
Although two cell types may converge on a number of specific characters (e.g. having the same neurotransmitter), it should be noted that these two cell types are by no means equivalent. They will have followed alternative routes during their differentiation and this will undoubtedly have had consequences on their regulatory landscapes. Although these consequences may not be immediately clear from the specific markers that are tested at the end of the differentiation protocol, it is essential to understand them if these cells are to be used as a therapeutic resource.
When two differentiation cocktails are used to direct iPSCs towards neuronal identity (Fig. 2), the transcription factors that are supplied in the two cocktails are different, but lead to the expression of generic neuronal markers that direct the cell to acquire neuronal features and activate the expression of the machinery needed to generate action potentials. However, the two cocktails also activate a number of non-overlapping genes that drive the differentiated neurons to differ from each other. For example, although two different cocktails may both give rise to cholinergic neurons, the neurons that are generated from each protocol may have differences in the cell adhesion molecules or neurotransmitter receptors that they express (Fig. 2). Therefore, depending on which cell type one wants to generate in vitro, different transcription factor cocktails must be used.
The different neurons generated by the 76 pairs of transcription factors described above (Tsunemoto et al., 2018) share similarities with regard to a number of neuronal characteristics. Interestingly, several different transcription factor combinations are able to lead to the same neurotransmitter phenotype, providing yet another example of phenotypic convergence. However, the overall transcriptomes of these cells differ significantly (Tsunemoto et al., 2018), as they include different gene regulatory modules, indicating that they correspond to different cell types. Furthermore, when comparing these neurons to endogenous neuronal populations, only limited similarities to any known specific neurons are found. This highlights both the extent of phenotypic convergence, as well as how difficult it is to use candidate transcription factor approaches to identify in vitro differentiation cocktails that can generate specific neuronal types precisely.
Aside from their effects on transcriptomes, transcription factor cocktails have pronounced effects on chromatin marks in differentiating neurons. For example, neuronal progenitors in different parts of the mouse central nervous system have different capacities for producing different neuronal types. This appears to be mediated by the differential chromatin accessibility of their genomes (Metzis et al., 2018), which is determined early on in development depending on the spatial location of their progenitors. The same is true for Drosophila ventral nerve cord neural stem cells, in which spatial genes that are expressed early establish neuroblast-specific chromatin landscapes for the later-acting temporal transcription factors (Sen et al., 2019). It comes as no surprise that different transcription factors that are used to generate neurons (e.g. Ascl1 versus Neurog2) have very distinct effects on the chromatin landscapes of the generated neurons, which in turn affects the neuronal subtypes produced in protocols that use one gene or the other. For instance, Brn2, Ebf2 and Onecut2 bind to different genomic sites that are highly dependent on the chromatin landscape set up by Ascl1 or by Neurog2 (Aydin et al., 2019), leading to the production of different motor neurons.
Therefore, although two cell types may appear similar, they are not necessarily functionally equivalent. The obvious question that arises is how similar the in vitro differentiated neuronal population should be to the natural one to be able to complement it functionally and achieve clinical relevance. Looking at non-neuronal systems, we can draw some interesting conclusions. Chondrocytes, for instance, represent a strong example of phenotypic convergence. Seemingly identical chondrocytes can have ectodermal (neural crest, e.g. nasal chondrocytes) or mesodermal (lateral plate mesoderm and paraxial mesoderm, e.g. knee cartilage) origins (Taïhi et al., 2019). Interestingly, transplantation of nasal chondrocytes can restore knee cartilage defects (Taïhi et al., 2019). Although chondrocytes represent a fairly simple cell type with only three different subtypes (Ji et al., 2019), the main role of which (independent of origin) is to secrete cartilage matrix, this gives us hope that even if cells are not completely identical, they may still replace each other after transplantation. Cardiomyocytes, by contrast, appear to be less able to replace each other functionally. During development, these cells are generated by both first and second heart field progenitors (Später et al., 2014). The pluripotent stem cell-derived cardiomyocytes that are typically used for transplantation are a mixture of ventricular, atrial and nodal cardiomyocytes (Kadota and Shiba, 2019), making it difficult to assess the contribution of each subtype after transplantation. However, it has been demonstrated that atrial cells cannot functionally replace ventricular cells when transplanted into the left ventricle, as they retain their unique atrial phenotype (e.g. shorter calcium transient duration) (Rubart et al., 2003).
It is thus not yet clear how faithfully in vitro differentiated cell types should recapitulate their natural counterparts to be able to substitute them, and this is something that needs to be studied in further detail.
Single-cell sequencing and machine learning as a means to predict core regulatory transcription factors
Providing terminal transcription factors to stem cells in order to program them largely ignores the developmental path of neurons, and we believe that this is the main reason behind the low efficiency of many programming protocols. However, the advent of single-cell sequencing techniques (Zheng et al., 2017), the development of trajectory inference (Trapnell et al., 2014; Cannoodt et al., 2016; Haghverdi et al., 2016; Setty et al., 2016; Nowakowski et al., 2017; Qiu et al., 2017; La Manno et al., 2018; Wolf et al., 2018; Saelens et al., 2019), and the use of machine learning algorithms to analyze large datasets now give us the opportunity to identify (re)programming factors in a more rigorous way and to improve differentiation protocols. Based on these advances, we suggest six steps (Fig. 3) to develop differentiation strategies that are rationally designed and can be used for neuronal replacement therapies: (1) perform single-cell sequencing at different developmental stages, from early embryogenesis all the way to a fully developed brain, using developing brains if available or organoids representing regions in which the specific neurons of interest are naturally produced; (2) use these data to build trajectories from neural stem cells to fully differentiated neuronal types of interest; (3) using machine learning, identify the transcription factors involved in the adult and developmental core regulatory complexes that are necessary for a given cell type to develop and that define this cell type; (4) use these transcription factors to develop new in vitro differentiation protocols; (5) evaluate the induced neurons using single-cell RNA sequencing and compare them with their natural counterparts; (6) depending on the results, use the data from the evaluation to adapt the protocol by improving the machine learning algorithms and repeating steps 4-5.
Step 1: Single-cell sequencing throughout development
A number of different single-cell sequencing platforms, such as 10x Genomics (Zheng et al., 2017) and Smart-Seq (Ramsköld et al., 2012), can be used to generate transcriptomic information from almost every neuronal type, even rare ones in the adult brain. The Human Cell Atlas (Regev et al., 2017) has already set out to generate such data for every adult tissue, and the Allen Institute for Brain Science (Tasic et al., 2018) and the BRAIN Initiative Cell Census Consortium (Ecker et al., 2017) have shown that this is feasible in a short time frame. However, many neuronal features, such as the acquisition of appropriate morphology, synaptic partner selection and synapse formation, are established early during development; thus, sequencing adult neuronal types is not sufficient to capture these events. Obtaining such data from different developmental stages requires a coordinated effort, equivalent to that of the Human Cell Atlas, with the goal of mapping all developmental lineage decisions. A main obstacle to gaining transcriptomic information during development is access to developing neurons in situ. Access to fetal brain tissue from second-trimester pregnancy terminations is possible (Nowakowski et al., 2018; Mayer et al., 2019) but is unpredictable as well as legally and ethically regulated. A potential solution involves the use of organoids as a surrogate for human brains. Human brain organoids are three-dimensional structures derived from ESCs or iPSCs. They self-organize and resemble a simplified human brain (Huch et al., 2017). Their main advantage is that they can be used to study human organ development and to recapitulate human diseases, while at the same time being amenable to genetic manipulation; they also hold promise for generating patient-specific models.
However, brain organoid technology (Hattori, 2014; Qian et al., 2016; Di Lullo and Kriegstein, 2017; Paşca, 2018; Pollen et al., 2019) is still in its infancy and faces a number of limitations, mainly because organoids cannot recapitulate the sheer complexity of the brain; they do not contain all of the different cell types, they vary from one organoid to the next, and they mature slowly (Di Lullo and Kriegstein, 2017). Moreover, it is not clear how well organoids recapitulate normal development (Pollen et al., 2019). Nonetheless, recent protocols to generate brain organoids are highly reproducible (Velasco et al., 2019) and can recapitulate the main aspects of cortical development (Pollen et al., 2019), allowing the production of radial glial cells and neurons of all six different layers in a temporal fashion (Kadoshima et al., 2013; Hattori, 2014; Paşca et al., 2015; Qian et al., 2016; Di Lullo and Kriegstein, 2017; Huch et al., 2017). Human brain organoids even form an outer subventricular zone, a region that is missing in mice (Di Lullo and Kriegstein, 2017). The protocols for generating brain organoids are rapidly improving and hopefully with time will provide access to neuronal cell types at different stages of development in order to assemble their developmental trajectories.
Step 2: Trajectory inference
Once single-cell sequencing has been performed, trajectory inference algorithms (Trapnell et al., 2014; Cannoodt et al., 2016; Haghverdi et al., 2016; Setty et al., 2016; Nowakowski et al., 2017; Qiu et al., 2017; La Manno et al., 2018; Saelens et al., 2019; Wolf et al., 2018) allow the ordering of cell states during differentiation processes based on transcriptomic data. It is thus possible to order single cells that belong to the same cell type according to their age or level of maturity. These algorithms can then, for example, define molecules that are differentially expressed during differentiation, cluster genes according to similar expression trends, identify new regulatory dynamics, and pinpoint key differential events. A number of trajectory inference algorithms have been published over the last 5 years (Cannoodt et al., 2016) that allow the discovery of transcription factors (or other key molecules) that are differentially expressed at various stages of neuronal differentiation. The various trajectory inference algorithms available differ in their performance, each of them offering distinct advantages in terms of accuracy and stability of prediction, as well as usability and scalability (Saelens et al., 2019). Although no single algorithm works well on every dataset, methods such as PAGA (Wolf et al., 2019) and Slingshot (Street et al., 2018) appear to be accurate, reproducible and scalable to large datasets. Using these algorithms, developmental trajectories of cell types of interest in a human brain can be obtained.
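As a minimal illustration of what trajectory inference does, the sketch below builds a k-nearest-neighbour graph over cells in a low-dimensional embedding and orders cells by their shortest-path distance from a chosen root cell (for example, a neural stem cell). This is a simple graph-based pseudotime, not PAGA or Slingshot; the toy coordinates, the choice of k and the root cell are assumptions made for illustration.

```python
import heapq

import numpy as np


def knn_graph(X: np.ndarray, k: int) -> list[list[tuple[int, float]]]:
    """Symmetric k-nearest-neighbour graph on the rows of X (cells x features)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between all cells (fine for a few thousand cells).
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    adj: list[dict[int, float]] = [{} for _ in range(n)]
    for i in range(n):
        # Position 0 of the sort order is the cell itself, so take positions 1..k.
        for j in np.argsort(d[i])[1 : k + 1]:
            j = int(j)
            adj[i][j] = adj[j][i] = float(d[i, j])  # add the edge in both directions
    return [list(nbrs.items()) for nbrs in adj]


def pseudotime(X: np.ndarray, root: int, k: int = 10) -> np.ndarray:
    """Shortest-path (Dijkstra) distance from a root cell along the kNN graph, scaled to [0, 1]."""
    graph = knn_graph(X, k)
    dist = np.full(X.shape[0], np.inf)
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale entry: a shorter path to u was already found
        for v, w in graph[u]:
            if d_u + w < dist[v]:
                dist[v] = d_u + w
                heapq.heappush(heap, (dist[v], v))
    # Cells unreachable from the root keep an infinite pseudotime.
    return dist / dist[np.isfinite(dist)].max()


# Toy data: 200 "cells" spaced along a curved path in a 2D embedding
# (a stand-in for PCA coordinates), with a little measurement noise.
rng = np.random.default_rng(0)
t_true = np.linspace(0.0, 1.0, 200)
X = np.column_stack([np.cos(3 * t_true), np.sin(3 * t_true)])
X = X + rng.normal(0.0, 0.01, X.shape)
pt = pseudotime(X, root=0)
print(round(float(np.corrcoef(pt, t_true)[0, 1]), 3))
```

Real pipelines would normalize counts, select variable genes and reduce dimensionality first, and would need to handle branching trajectories, which this single-path sketch does not attempt.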
Step 3: Identification of core regulatory factors that specify cell type identity
Each cell type is characterized by the expression of a number of transcription factors that can be used to distinguish it from other cell types. Together, these transcription factors generate the cell type-specific characteristics that are necessary for a given cell's functions and represent the core regulatory complex (CoRC) (Arendt et al., 2016) of each cell type. To generate a specific cell type accurately, it is necessary to identify its CoRC at each developmental stage and define how these transcription factors are expressed during and upon successful differentiation (Fig. 4).
But how is it possible to identify the CoRC of each differentiated cell type? First, the transcription factors should be expressed in developing or adult cells, as they are responsible for the generation of differences in gene expression between cell types, e.g. the gene module that is responsible for the generation and release of a particular neurotransmitter at the synapse; these features will be reflected in the transcriptome of the cells. Second, CoRC transcription factors have to be differentially expressed in some cell types compared with others. Such CoRC transcription factors can be identified from single-cell sequencing data using machine learning algorithms. These algorithms build mathematical models from training data that allow them to make predictions beyond these data. Different types of mathematical models exist, the simplest being linear regression; other types include decision trees, random forests and support vector machines. These models can be used to capture statistical relationships between genes, such as covariation. Covariation between transcription factors and terminal effector genes can be used to infer candidate regulatory interactions between them, which can then be tested genetically. Specifically, one can use the single-cell sequencing data that were generated in step 1 to identify transcription factors and neuronal effector genes that are expressed in different neuronal types. Machine learning algorithms can then identify transcription factors that co-vary with neuronal effector genes and whose expression could predict the identity of the neuronal type (Konstantinides et al., 2018). This would reduce the hundreds of transcription factors that are expressed in each neuronal type to a handful (10-15) that control the acquisition and implementation of neuronal type identity.
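A minimal sketch of this covariation-based filtering, assuming a cells × genes expression matrix split into transcription factors and effector genes, plus known cell-type labels. As a deliberately simple stand-in for random forests or similar models, it keeps transcription factors whose expression correlates with at least one effector gene and then ranks them by a one-way ANOVA F statistic across cell types. The thresholds, function names and toy data are all hypothetical.

```python
import numpy as np


def anova_f(x: np.ndarray, labels: np.ndarray) -> float:
    """One-way ANOVA F statistic: between-type over within-type variance of one gene."""
    groups = [x[labels == g] for g in np.unique(labels)]
    k, n = len(groups), len(x)
    grand = x.mean()
    ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return float((ss_between / (k - 1)) / (ss_within / (n - k)))


def rank_tfs(tf_expr, effector_expr, labels, tf_names, min_abs_r=0.5, top_k=15):
    """Keep TFs that co-vary with at least one effector gene, then rank them by how
    well they separate cell types. tf_expr: cells x TFs; effector_expr: cells x genes."""
    n_tf = tf_expr.shape[1]
    # np.corrcoef stacks the variables of both inputs; keep the TF-by-effector block.
    r = np.corrcoef(tf_expr.T, effector_expr.T)[:n_tf, n_tf:]
    keep = np.flatnonzero(np.abs(r).max(axis=1) >= min_abs_r)
    scored = [(tf_names[i], anova_f(tf_expr[:, i], labels)) for i in keep]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]


# Toy data (invented): 3 cell types x 60 cells, 5 TFs, 3 effector genes.
rng = np.random.default_rng(1)
labels = np.repeat(np.array(["type_A", "type_B", "type_C"]), 60)
onehot = np.stack([labels == t for t in ("type_A", "type_B", "type_C")], axis=1).astype(float)
# TFs 0, 1 and 4 are type-specific; TFs 2 and 3 are pure noise.
tf_expr = np.column_stack(
    [3 * onehot[:, 0], 3 * onehot[:, 1], np.zeros(180), np.zeros(180), 3 * onehot[:, 2]]
) + rng.normal(0.0, 0.5, (180, 5))
# Each effector gene tracks one type-specific TF.
effector_expr = tf_expr[:, [0, 1, 4]] + rng.normal(0.0, 0.5, (180, 3))
names = ["TF_1", "TF_2", "noise_1", "noise_2", "TF_3"]
print(rank_tfs(tf_expr, effector_expr, labels, names))
```

The cap of 15 retained factors echoes the 10-15 figure above but is arbitrary here. Correlation alone cannot distinguish direct regulation from co-expression, which is why the resulting candidates still need genetic testing.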
A number of such predictive algorithms, developed to accelerate high-throughput screens of potential differentiation factors, have used expression data from adult differentiated cells to identify cell type-specific differentiation cocktails (Cahan et al., 2014; Rackham et al., 2016; Duan et al., 2019) that perform better than those generated by trial and error. For instance, CellNet (Cahan et al., 2014; Morris et al., 2014) initially relied on 3419 published gene expression profiles of diverse cell types and tissues, such as ESCs, neurons, glia, muscle, fibroblasts, endothelial cells and hematopoietic stem cells, to identify gene regulatory networks that are expressed in specific cell types and are necessary for endowing cells with their correct identity. It was then used to compare the directed differentiation of stem cells to the direct conversion (transdifferentiation) of one cell type to another, and to improve protocols for the in vitro transdifferentiation of B cells into macrophages (Morris et al., 2014). Mogrify (Rackham et al., 2016) similarly uses gene expression data and gene regulatory information to predict the transcription factors required for different cell type conversions, allowing the generation of new protocols for the transdifferentiation of human fibroblasts into keratinocytes and of human keratinocytes into microvascular endothelial cells. Reprogram-Seq (Duan et al., 2019) incorporates single-cell mRNA sequencing and perturbation analysis to more accurately predict and evaluate transcription factor cocktails that can reprogram specific cell types, allowing, for instance, the identification of a new combination of transcription factors that can convert embryonic fibroblasts into epicardial cells.
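To make the logic of network-based predictors concrete, here is a toy score loosely inspired by the idea of combining a factor's own differential expression with that of its regulatory targets. It is not a reimplementation of Mogrify or CellNet: the scoring rule, the regulatory network and the expression values below are invented for illustration.

```python
import numpy as np


def log2_fold_change(target, source, pseudo=1.0):
    """Per-gene log2 fold change of mean expression, target cell type over source cell type."""
    return {g: float(np.log2((target[g] + pseudo) / (source[g] + pseudo))) for g in target}


def network_score(tf, lfc, network):
    """TF's own up-regulation plus the mean up-regulation of its direct targets."""

    def up(gene):
        # Only gains in the target cell type count as supporting evidence.
        return max(lfc.get(gene, 0.0), 0.0)

    targets = network.get(tf, [])
    downstream = float(np.mean([up(t) for t in targets])) if targets else 0.0
    return up(tf) + downstream


# Hypothetical mean expression values and a hypothetical TF -> targets network.
source = {"TF_1": 1, "TF_2": 10, "TF_3": 1, "gene_a": 0, "gene_b": 1, "gene_c": 20}
target = {"TF_1": 15, "TF_2": 10, "TF_3": 3, "gene_a": 31, "gene_b": 15, "gene_c": 1}
network = {"TF_1": ["gene_a", "gene_b"], "TF_2": ["gene_c"], "TF_3": ["gene_b"]}

lfc = log2_fold_change(target, source)
scores = {tf: network_score(tf, lfc, network) for tf in network}
print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

Here TF_1 ranks first because both it and its targets are strongly induced in the target cell type, while TF_2 scores zero because neither it nor its only target gains expression.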
As a second part of this step, using the developmental trajectories emanating from steps 1 and 2 together with machine learning algorithms, we can then identify the crucial transcription factors that are expressed during development to supplement the adult CoRC. The ability of the identified developmental and adult CoRC to define a cell type can then be tested by using these transcription factors to direct the programming of neural stem cells towards specific cell types in vitro. This will both test the effectiveness of the CoRC and provide a practical tool.
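One simple way to flag candidate developmental factors, assuming each cell already has a pseudotime value from step 2, is to look for transcription factors whose mean expression peaks at an intermediate stage and then declines in the most mature cells, so that a CoRC defined from adult cells alone would miss them. The binning scheme, threshold and toy expression profiles below are hypothetical choices.

```python
import numpy as np


def transient_tfs(pt, expr, names, n_bins=5, max_terminal_frac=0.3):
    """TFs whose mean expression peaks before the last pseudotime bin and then falls
    to at most max_terminal_frac of that peak in the most mature bin.
    pt: pseudotime per cell in [0, 1]; expr: cells x TFs. Returns (name, peak_bin)."""
    # Equal-width pseudotime bins; pt == 1.0 is folded into the last bin.
    bins = np.minimum((pt * n_bins).astype(int), n_bins - 1)
    means = np.array([expr[bins == b].mean(axis=0) for b in range(n_bins)])  # bins x TFs
    hits = []
    for j, name in enumerate(names):
        peak = int(means[:, j].argmax())
        if peak < n_bins - 1 and means[-1, j] <= max_terminal_frac * means[peak, j]:
            hits.append((name, peak))
    return hits


# Toy trajectory (invented): one transient TF, one terminal TF, one constant TF.
pt = np.linspace(0.0, 1.0, 500)
expr = np.column_stack([
    np.exp(-(((pt - 0.3) / 0.1) ** 2)),  # switched on transiently mid-trajectory
    pt,                                  # rises steadily, highest in mature cells
    np.ones_like(pt),                    # constant throughout
])
print(transient_tfs(pt, expr, ["early_TF", "terminal_TF", "constant_TF"]))
```

Only the transiently expressed factor is returned; the terminal factor would already be captured by an adult CoRC, and the constant factor carries no stage information.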
Steps 4-6: Develop new programming protocols and evaluate using single-cell RNA sequencing
Identifying the developmental and adult CoRC of a specific cell type will allow the development of neuronal differentiation protocols using these transcription factors as programming factors. Although it is not an easy endeavor to co-express numerous factors, a number of techniques, including traditional cDNA overexpression but also more elaborate CRISPR-based multiplexed genome engineering techniques (Campa et al., 2019) that allow for controlled expression or silencing of multiple genes in the same cell, could be used to supply these transcription factors in a synchronous or serial manner. Single-cell sequencing can then be used to evaluate the programming efficiency of any protocol and assess the extent of potential heterogeneities that may emerge during differentiation. It can also be applied at different stages of the in vitro differentiation process to construct trajectories of different cell types that can then be compared with the developmental trajectories that occur during embryonic development. More importantly, techniques such as ‘CellTagging’ (Biddy et al., 2018), which relies on the sequential delivery of heritable barcodes during in vitro differentiation, can be used for the simultaneous capture of lineage and cell identity. This can be particularly insightful in cases of lower programming efficiency and can provide an understanding of how undesired cell types arise during the differentiation process. This understanding, in turn, could be used to modify the differentiation protocol to tilt the balance towards the desired cell type, thus increasing the programming efficiency and decreasing nonspecific byproducts.
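A minimal sketch of how induced cells might be scored against their natural counterparts, assuming a reference atlas with cell-type labels and CellTag-style clone barcodes for each induced cell. Each induced cell is assigned to the reference type whose average profile it correlates with best, and programming efficiency is reported overall and per clone. This nearest-centroid scheme is a simple stand-in for dedicated label-transfer methods, and all names and values are hypothetical.

```python
import numpy as np


def nearest_reference_type(cells, ref_expr, ref_labels, min_r=0.5):
    """Label each induced cell with the reference type whose mean profile (centroid)
    it correlates with best; 'unassigned' if that correlation is below min_r."""
    types = np.unique(ref_labels)
    centroids = np.array([ref_expr[ref_labels == t].mean(axis=0) for t in types])
    labels = []
    for x in cells:
        r = np.array([np.corrcoef(x, c)[0, 1] for c in centroids])
        best = int(r.argmax())
        labels.append(str(types[best]) if r[best] >= min_r else "unassigned")
    return np.array(labels)


def efficiency_by_clone(labels, clones, target):
    """Fraction of induced cells assigned to the target type, overall and per lineage barcode."""
    overall = float(np.mean(labels == target))
    per_clone = {str(c): float(np.mean(labels[clones == c] == target)) for c in np.unique(clones)}
    return overall, per_clone


# Toy data (invented): 4 genes, 2 reference types, 6 induced cells from 2 barcoded clones.
ref_expr = np.array([[5, 5, 0, 0], [4, 6, 0, 1], [0, 0, 5, 5], [1, 0, 6, 4]], dtype=float)
ref_labels = np.array(["target_neuron", "target_neuron", "off_target", "off_target"])
induced = np.array(
    [[5, 4, 0, 1], [4, 5, 1, 0], [5, 5, 0, 0], [0, 1, 5, 4], [1, 0, 4, 5], [5, 4, 1, 0]],
    dtype=float,
)
clones = np.array(["clone_1"] * 3 + ["clone_2"] * 3)
labels = nearest_reference_type(induced, ref_expr, ref_labels)
print(labels, efficiency_by_clone(labels, clones, "target_neuron"))
```

In this toy case every cell of clone_1 reaches the target type but only a third of clone_2 does; divergent per-clone efficiencies of this kind are what lineage barcoding is meant to expose.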
Other considerations when designing differentiation strategies
Although we have focused here on directed differentiation (i.e. from ESCs or iPSCs), direct reprogramming can also be used to generate specific neuronal cell types (Gascón et al., 2017). This process bypasses the pluripotency step and instead converts one cell type into another through transdifferentiation. However, it is less efficient, because the imprint of the cell's initial differentiation, which is prominently written into its chromatin, is difficult to erase. As such, the cells generated via direct reprogramming are often incompletely converted (i.e. they are hybrid cells) and may remain developmentally immature. The efficiency of direct conversion therefore also depends on the starting cell type and its relationship to the target cell type. For instance, direct conversions are not efficient in crossing germ layer boundaries (Sieweke, 2015; An et al., 2018), although they are not impossible (Vierbuchen et al., 2010; Karow et al., 2012). By contrast, the conversion of astrocytes or other glial cells to neurons is efficient (Berninger et al., 2007; Masserdotti et al., 2016). This highlights that the genetic regulatory context in which the transcription factors are expressed can enhance or repress reprogramming efficiency (Aydin and Mazzoni, 2019); therefore, in an already differentiated cell, the chromatin marks that are specific to the cell of origin must be erased, and the specific transcription factors of the target cell must then be expressed. This dependency of transcription factors on the state of chromatin has been clearly shown in C. elegans, where RNAi against a chromatin-regulating factor (lin-53) and overexpression of a terminal selector (che-1) are together sufficient to allow the conversion of a germ cell into a specific neuronal type (Tursun et al., 2011; Kolundzic et al., 2018). The best option for direct conversion is thus to select a cell type of origin whose genomic landscape resembles that of the target cell type.
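As a rough sketch of how a cell type of origin might be chosen on this basis, assuming chromatin accessibility data summarized as a set of accessible peaks per cell type, candidate starting cell types can be ranked by the Jaccard overlap of their accessible regions with those of the target neuron. The peak identifiers and cell-type names are invented, and a real comparison would require peaks called against a shared reference set.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared accessible peaks divided by all peaks accessible in either cell type."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_source_types(
    target_peaks: set[str], candidates: dict[str, set[str]]
) -> list[tuple[str, float]]:
    """Order candidate cell types of origin by accessibility overlap with the target."""
    scores = [(name, jaccard(peaks, target_peaks)) for name, peaks in candidates.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy peak sets (invented identifiers, not real genomic regions).
target = {"peak_1", "peak_2", "peak_3", "peak_4"}
candidates = {
    "fibroblast_like": {"peak_1", "peak_6", "peak_7", "peak_8"},
    "glia_like": {"peak_1", "peak_2", "peak_3", "peak_5"},
}
print(rank_source_types(target, candidates))
```

The glia-like candidate shares three of five combined peaks with the target, whereas the fibroblast-like one shares one of seven, mirroring in miniature why glia-to-neuron conversion is expected to be easier.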
Beyond transcription factors and transcriptomes
It is clear that, although there are constantly expanding ways to overexpress transcription factors or downregulate genes of interest in a spatially and temporally controlled way, it is far less trivial to ‘write’ and ‘erase’ the chromatin marks that might affect the reprogramming process. However, single-cell chromatin accessibility assays (Buenrostro et al., 2015; Cusanovich et al., 2015, 2018; Cao et al., 2018; Pliner et al., 2018) can now be used to profile the chromatin landscape of different neuronal types and to select the best candidate chromatin modifiers. More broadly, single-cell epigenomic assays could provide information complementary to the transcriptome. For example, some transcription factors may act in different ways depending on chromatin accessibility (Velasco et al., 2017). Moreover, some transcription factors are expressed at low levels and may therefore not be reliably detected by single-cell sequencing, even though their effects on chromatin might be more pronounced. This is consistent with recent studies showing that single-cell transcriptomes are insufficient to distinguish progenitor cells that will generate distinct cell lineages (Weinreb et al., 2020). There are thus hidden layers of progenitor diversity (e.g. differences in chromatin landscapes and/or transcription factors expressed at low levels) that may affect the capacity of progenitors to generate different cell types and that might not be captured by single-cell transcriptomic approaches.
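To illustrate how chromatin data could reveal the activity of a lowly expressed transcription factor, the sketch below computes a simplified per-cell motif accessibility deviation, loosely in the spirit of tools such as chromVAR. For each cell, the number of fragments falling in peaks that carry the factor's motif is compared with the number expected if that cell shared the motif-peak fraction of all cells pooled. It assumes a cell-by-peak count matrix and a precomputed motif annotation of peaks; the function name is hypothetical, and the sketch omits the GC-matched background peaks and z-scoring that a real analysis would use.

```python
import numpy as np


def motif_accessibility_deviation(counts, motif_peaks) -> np.ndarray:
    """Per-cell relative deviation in accessibility at peaks carrying one motif.

    counts:      cells x peaks matrix of accessibility fragment counts.
    motif_peaks: boolean vector over peaks, True where the peak contains the motif.
    Returns (observed - expected) / expected per cell; cells with no fragments get 0.
    """
    counts = np.asarray(counts, dtype=float)
    motif_peaks = np.asarray(motif_peaks, dtype=bool)
    total = counts.sum()
    if total == 0:
        return np.zeros(counts.shape[0])
    # Fraction of all fragments (pooled over cells) that fall in motif-containing peaks.
    expected_frac = counts[:, motif_peaks].sum() / total
    observed = counts[:, motif_peaks].sum(axis=1)
    expected = counts.sum(axis=1) * expected_frac
    deviation = np.zeros_like(observed)
    # Divide only where an expectation exists, leaving 0 elsewhere.
    np.divide(observed - expected, expected, out=deviation, where=expected > 0)
    return deviation
```

A score well above zero means a cell has more open chromatin at the factor's binding sites than expected, which could flag factor activity even when its transcript is missed by single-cell RNA sequencing; with real data, such scores would need to be compared against background peak sets before drawing conclusions.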
Although transcriptional regulation has been studied extensively, post-transcriptional processes also affect cell fate, most notably through microRNAs, which act at different levels of reprogramming (Beh-Pajooh et al., 2018). For example, miR-145 regulates the expression of three of the four Yamanaka factors and is involved in a double-negative feedback loop that controls pluripotency in human ESCs (Xu et al., 2009). Similarly, a combination of microRNAs – miR-1, -133, -208 and -499a – is able to convert mouse cardiac fibroblasts into functional cardiomyocytes (Jayawardena et al., 2012, 2014). Several single-cell sequencing techniques can detect microRNAs alone (mime-Seq; Alberti et al., 2018) or together with mRNA (single-cell microRNA-mRNA co-sequencing; Wang et al., 2019). Using these techniques, one could identify miRNAs that may be incorporated into the CoRC of transcription factors to better define and generate a particular neuronal identity.
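One simple, hypothetical way to shortlist miRNAs from co-sequencing data is to ask whether a miRNA's abundance across cells is anticorrelated with the abundance of its predicted target mRNAs. The sketch below assumes matched per-cell miRNA and mRNA measurements and an externally supplied set of predicted targets for each miRNA (for example, from a target-prediction database). The function names are illustrative, and a negative correlation is only suggestive of repression, not proof of it.

```python
from typing import Dict, List, Tuple

import numpy as np


def mean_target_correlation(mirna, targets) -> float:
    """Mean Pearson correlation between one miRNA and each of its predicted targets.

    mirna:   expression of the miRNA across n cells, shape (n,).
    targets: expression of k predicted target mRNAs, shape (n, k).
    Returns NaN if no correlation can be computed (e.g. constant profiles).
    """
    mirna = np.asarray(mirna, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if mirna.std() == 0:
        return float("nan")  # correlation is undefined for a constant miRNA profile
    corrs = []
    for j in range(targets.shape[1]):
        target = targets[:, j]
        if target.std() == 0:
            continue  # skip constant targets for the same reason
        corrs.append(np.corrcoef(mirna, target)[0, 1])
    return float(np.mean(corrs)) if corrs else float("nan")


def rank_mirnas(
    mirnas: Dict[str, np.ndarray], targets: Dict[str, np.ndarray]
) -> List[Tuple[str, float]]:
    """Rank miRNAs from most negative (candidate repressor) to least negative score."""
    scores = [(name, mean_target_correlation(expr, targets[name])) for name, expr in mirnas.items()]
    # Drop miRNAs whose score could not be computed, then sort ascending.
    scores = [(name, s) for name, s in scores if not np.isnan(s)]
    return sorted(scores, key=lambda item: item[1])
```

Ranking by the most negative score is only one possible design; normalization, dropout and confounding by cell type would all need to be handled before interpreting such correlations biologically.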
Conclusions and perspectives
We have presented here a strategy that combines single-cell sequencing with machine learning to identify crucial programming factors, which could allow us to recapitulate developmental routes and, ultimately, to faithfully program neuronal cell types from stem cells in vitro. This strategy, however, only addresses the identification of the molecules that are necessary to program a cell efficiently. A number of other considerations must also be taken into account to define improved differentiation protocols. For example, how can one ensure that the necessary differentiation molecules are expressed with the appropriate timing and at physiological levels? How can one control the environment in which a cell grows (i.e. cell-cell signaling), which differs profoundly between a dish and the tissue of origin? Although these questions fall beyond the scope of this article, the constantly expanding toolbox of CRISPR-based methods can address many of them (Cheng et al., 2013; Maeder et al., 2013; Perez-Pinera et al., 2013; Qi et al., 2013; Dahlman et al., 2015; Kiani et al., 2015; Boettcher et al., 2018). For example, Cas12a has recently been used for constitutive, conditional, inducible and orthogonal gene editing, allowing dozens of genes to be independently upregulated or downregulated in a controlled manner (Campa et al., 2019).
Sydney Brenner once said ‘I will ask you to mark again that rather typical feature of the development of our subject; how so much progress depends on the interplay of techniques, discoveries and new ideas, probably in that order of decreasing importance’ (Brenner, 2002). As we have highlighted here, the development of exciting new techniques over the last few years clearly has the potential to lead to important discoveries and trigger new ideas. The interplay of these will hopefully lead to novel applications that can have a lasting impact and provide means for therapeutic interventions, such as regenerative medicine and neuronal replacement therapy.
We are very grateful to Esteban Mazzoni for invaluable discussions and comments during the preparation of the manuscript, as well as Tony Rossi and Gord Fishell for comments on the manuscript.
Work in the lab is supported by grants from the National Institutes of Health (R01 EY017916) and from the New York State Stem Cell Science (DOH01-C32604GG to C.D.). N.K. was supported by a postdoctoral Human Frontier Science Program fellowship (LT000122/2015-L) and is currently supported by the National Eye Institute (K99 EY029356-01). Deposited in PMC for release after 12 months.
The authors declare no competing or financial interests.