Recent publications describe the development of in vitro models of human development, for which applications in developmental toxicity testing can be envisaged. To date, these regulatory assessments have exclusively been performed in animal studies, the relevance of which to adverse reactions in humans may be questioned. Recently developed cell culture-based models of embryo-fetal development, however, do not yet exhibit sufficient levels of standardisation and reproducibility. Here, the advantages and shortcomings of both in vivo and in vitro developmental toxicity testing are addressed, as well as the possibility of integrated testing strategies as a viable option in the near future.
Although only 2-4% of congenital developmental defects in children are attributed to known chemical or physical exposure during pregnancy or even before (Schaefer et al., 2011), the consequences of such an exposure can be tragic. Detrimental effects can manifest pre- or postnatally as growth and developmental retardation, malformations, functional defects and even death (Nikolaidis, 2017). The thalidomide tragedy of the early 1960s highlighted the importance of assessing the effects of substance exposure on unborn children, which has since been mandatory before the marketing of drugs and some chemicals. As such adverse effects on human development are entirely preventable if the toxic potential of a substance is known, rigorous testing is warranted. Here, reproductive toxicity assessment covers the adverse effects of substances at all stages of the reproductive cycle, including maternal and paternal fertility, and the developing organism, whereas developmental toxicity assessment focuses only on the latter (UNECE, 2021).
The question that has not yet been answered satisfactorily is how to reliably test substances for their adverse effects on human reproduction and development. For over 50 years, whole-animal models have been used owing to a lack of alternatives (Collins, 2006). Currently, the combined fields of developmental and reproductive toxicity analysis require the highest number of animals in EU regulatory toxicity testing (EC, 2021). Whether these animal models optimally represent the human situation, however, is debatable.
Faithful and reliable in vitro models could, therefore, contribute to the shared aim of academia, industry and regulatory authorities to replace animal experimentation as soon as scientifically possible (EU, 2010; 2021). Indeed, the past few years have seen an unprecedented surge in publications describing stem cell-based in vitro models with an ever-increasing similarity to the natural embryo and its developing organs. These models promise to improve our understanding of the molecular mechanisms driving processes such as embryogenesis, implantation and organogenesis, but may also facilitate practical applications, especially toxicity assessments. This Spotlight discusses present strategies of regulatory developmental toxicity testing and first attempts to introduce in vitro assays, and addresses the feasibility and potential impact of using novel in vitro models of human development in toxicity testing approaches.
The status quo: reproductive and developmental toxicology in animals
Regulatory reproductive and developmental safety testing of chemicals and drugs is currently only conducted in vivo using animal models (Box 1; Fig. 1) and is most typically performed in rats and rabbits (Collins, 2006). The requirements for toxicity data in Europe are most stringent for substances with an intended direct application in humans or with intended biological activity, such as pharmaceuticals and pesticides. For other chemicals, the testing requirements increase progressively with production and marketing quantities or when reproductive effects are anticipated (Hareng et al., 2005).
Regulatory safety testing of industrial, agricultural and other chemicals is performed according to the Organisation for Economic Co-operation and Development (OECD) test guidelines. Developmental and reproductive toxicity investigations start with screening assays according to test guidelines (TG) 421/422, in which exposure begins before mating, and effects on the parental and first generation shortly after birth are investigated. Higher production volumes require a prenatal developmental toxicity study (TG 414) focusing only on the gestation period using 800 rats, and an extended one-generation reproductive toxicity study (TG 443) with at least 1400 rats. Dosing in the latter starts before mating and continues in the offspring until sexual maturity; effects on the parental generation and adult offspring are examined.
The safety evaluation of drugs is specified in guidelines published by the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). Depending on the target population, pharmaceuticals must undergo testing in all relevant stages of the reproductive cycle according to guideline ICH S5, generally comprising: (1) fertility and early embryonic development; (2) embryo-fetal development; and (3) pre- and postnatal development. Compounds are administered during the respective period and characterisation of effects on parental animals and offspring takes place at the end. These studies typically require 16-20 litters of rodents and rabbits per test condition, and their application is included in guidelines ICH M3 (nonclinical safety studies), S6 (biopharmaceuticals) and S9 (anticancer pharmaceuticals).
In vivo testing approaches aim to observe readouts, such as reduced litter size and frequency, as well as abnormalities in the offspring as a result of substance exposure. The major advantage of this method is the coverage of the whole reproductive cycle: detrimental effects on the maternal or paternal reproductive systems, or on embryonic, fetal or neonatal development, can be inferred from the readouts. The established test species have a robust reproductive capacity with short gestation, large litter sizes and well-understood physiology. Furthermore, longstanding and widespread experience with in vivo studies has led to the availability of extensive reference data (ICH, 2020). However, reproductive and developmental toxicity tested in routinely used animal models is not necessarily predictive for human biology. For example, the mean positive predictability for known human teratogens to cause teratogenicity across 12 animal species is only 60% (Bailey et al., 2005).
To increase the chances of detecting possible adverse effects on human development, most testing proposals require the use of a second animal species. However, there is also discordance in toxicological responses between test species. The rat, for example, can predict teratogenicity of only 61% of compounds with known teratogenicity in either rat, mouse or rabbit (Hurtt et al., 2003). Testing in a second species, therefore, also increases the likelihood of false-positive results: of several thousand developmental toxins identified in animal studies, only about 50 have similar effects in humans (Schardein et al., 1989). This is problematic because, for example, many drug candidates safe for human application may never advance to clinical testing. However, inter-species variability may also partly result from reproducibility issues between animal studies, rather than between animal species (Braakhuis et al., 2019).
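The trade-off described above can be made concrete with a short calculation. The following is a minimal sketch, assuming that a compound is flagged whenever either species gives a positive result and that the two species respond independently; both are simplifying assumptions, not a description of any test guideline. The function name `either_positive` and the per-species rates are hypothetical values chosen only for illustration.

```python
def either_positive(rate_a: float, rate_b: float) -> float:
    """Probability that at least one of two independent tests is positive."""
    return 1.0 - (1.0 - rate_a) * (1.0 - rate_b)


# Hypothetical per-species rates (illustrative only, not measured values).
sensitivity = 0.60          # chance that a true human teratogen is flagged in one species
false_positive_rate = 0.20  # chance that a human-safe compound is flagged in one species

# Apply the "positive in either species" rule to both rates.
combined_sensitivity = either_positive(sensitivity, sensitivity)
combined_false_positive_rate = either_positive(false_positive_rate, false_positive_rate)

print(f"sensitivity: {sensitivity:.2f} -> {combined_sensitivity:.2f}")
print(f"false-positive rate: {false_positive_rate:.2f} -> {combined_false_positive_rate:.2f}")
```

Under these assumptions, adding a second species raises the chance of detecting a true teratogen, but it raises the chance of flagging a safe compound by the same mechanism. Correlated responses between species would weaken both effects.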
By examining morphological effects on – or the lack of – offspring, current guidelines grant observational readouts rather than mechanistic insights. However, a variety of mechanisms can cause developmental toxicity, such as effects on signal transduction, or on the endocrine and nervous systems, or indirect effects due to maternal toxicity (Grindon et al., 2008). Without the identification of the toxicological targets, let alone the affected stage of the reproductive cycle, transferring study results to humans is associated with high uncertainty. Implantation, the process by which the blastocyst-stage embryo invades the maternal endometrium, is an example of a target for reproductive toxicity that is not adequately captured by current test guidelines, while exhibiting high inter-species variation (Bremer et al., 2007).
In summary, in vivo studies of reproductive and developmental toxicity suffer from significant inter-species differences and/or reproducibility issues. In addition, such approaches are expensive, time-consuming and ethically divisive (Schenk et al., 2010). To ensure that human developmental toxicants are correctly identified, testing strategies that are more human-relevant, mechanism-based and reliable are urgently needed.
Validated in vitro methods for developmental toxicity testing
Currently, the Database on Alternative Methods to Animal Experimentation (DB-ALM), which is provided by the European Union Reference Laboratory on Alternatives to Animal Testing (EURL ECVAM), contains three validated in vitro methods for developmental toxicology testing. These systems include the rat limb bud micromass (MM) test, rat post-implantation whole embryo culture (WEC) test and the mouse embryonic stem cell test (EST).
For the MM test, skeletal progenitor cells isolated from dissected limb buds of rat embryos are differentiated into chondrocytes. The inhibition of differentiation and decrease in cell viability induced by a test substance are examined (Flint and Orton, 1984). Thus, only toxicological effects on skeletal development can be evaluated. The WEC test also employs rat embryos. It observes normal early organogenesis (e.g. of the heart, ear, eye and limb buds) and interference in the development of these organs is indicative of embryotoxicity (New, 1978). As intact embryos are used, more comprehensive toxicological effects on the whole embryo and interactions between embryonic tissues can be modelled. Standard WEC procedure involves a cultivation period of 48 h (Piersma, 1993), making it the fastest assay discussed here (not counting the required animal husbandry).
The EST uses mouse embryonic stem cells (mESCs), which are derived from the pre-implantation inner cell mass of blastocyst-stage embryos (Doetschman et al., 1985). Upon aggregation, mESCs form embryoid bodies (EBs) and undergo spontaneous differentiation along the lineages of the three germ layers (endoderm, mesoderm and ectoderm) that form the future organism. Several differentiation protocols for the generation of distinct cell types, such as cardiomyocytes, are based on EB formation (Brickman and Serup, 2017). Indeed, the EST assesses the inhibition of differentiation into cardiomyocytes via EBs, as well as a relative comparison of cytotoxic effects on mESCs and fibroblasts (Spielmann et al., 1997). However, differentiation patterns within EBs are disorganised and heterogeneous, and, therefore, do not truly recapitulate natural development (Shankar et al., 2021).
Considerable efforts have been made to extend the potential of the EST for embryotoxicity evaluations: (1) by analysing tissue-specific gene expression, e.g. with a flow cytometry-assisted version based on the expression of marker proteins specific for developing heart (Buesen et al., 2009); (2) by measuring EB size, a method that has been validated recently with an accuracy of more than 80% (Lee et al., 2020); (3) by incorporating transcriptomics (reviewed by van Dartel and Piersma, 2011); or (4) by integrating additional tissue differentiation protocols, such as for osteoblast (Chen et al., 2015) and neuronal lineages (Baek et al., 2012), as these can be relevant to the manifestation of embryotoxicity. These and other new strategies are sought to render the EST more quantifiable and predictive. Further work is required, however, to standardise these approaches. Protocols for the derivation and analysis of human EBs have been published, but not validated (e.g. Flamier et al., 2018).
All three validated in vitro methods require less cost and time than in vivo studies. However, although partly complementing each other, they only represent narrow developmental windows during which they can predict toxicological effects; in the case of the EST and MM tests, this predictivity is further limited to individual tissues only. Fetal-maternal interactions cannot be addressed unless the maternal metabolism and the placental barrier are reproduced in the model. Moreover, none of these methods uses a human-relevant model; the EST – which is based on permanent cell lines – is the only one that does not require the sacrifice of pregnant animals.
Although they have a prediction accuracy of 70-80% (Genschow et al., 2002), the application of the three validated in vitro methods in regulatory developmental toxicology is still not widely accepted, because they are not suitable as stand-alone replacements of animal experiments. Nevertheless, the pharmaceutical industry does apply the EST in particular, because of its good performance and ease of use, and also the rat WEC test and zebrafish whole-embryo culture for compound screenings and guiding internal decisions (Paquette et al., 2008; Whitlow et al., 2007).
Although individual in vitro tests cannot model all the processes involved in development, they can be integrated into a testing strategy. For this, human development (or the whole reproductive cycle) can be divided into well-defined stages, each represented by specific assays that cover all relevant mechanisms of toxicity and thus provide a better assessment than individual assays. Rajagopal et al. have proposed 14 key stages of human reproductive biology and embryo-fetal development; Fig. 1 shows a simplified version of this timeline (Rajagopal et al., 2022).
The feasibility of such a battery approach for reproductive toxicity testing has already been investigated. For example, the European ReProTect project has applied ten test chemicals to 14 in vitro assays (including the WEC test and EST) covering different endpoints related to the endocrine system, fertility and embryonic development; most predictions were correct compared with whole-animal data (Schenk et al., 2010). Similarly, the European ChemScreen project investigated the developmental and reproductive toxicity of 12 compounds with reliable in vivo data in 31 assays (including the EST) and correctly predicted 11 (van der Burg et al., 2015). Unfortunately, none of the identified test strategies have been approved for regulatory reproductive toxicity testing to date.
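When reading such concordance figures, it helps to keep the small number of test compounds in mind. The sketch below is an illustration only, not part of the cited studies: it computes a Wilson score interval for the 11 of 12 correct predictions reported for ChemScreen. The helper name `wilson_interval` is hypothetical, and treating the 12 compounds as independent binomial trials is an assumption.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1.0 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return centre - half_width, centre + half_width


# 11 of 12 compounds correctly predicted (figures taken from the text above).
low, high = wilson_interval(11, 12)
print(f"concordance = {11 / 12:.2f}, approximate 95% interval = {low:.2f} to {high:.2f}")
```

With only 12 compounds the interval is wide, spanning roughly 0.65 to 0.99, which is one reason why larger sets of reference chemicals matter for validation.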
One recognised challenge for advancing in vitro toxicity testing is that the scientific basis informing the decision-making often lags behind state-of-the-art methods (Carusi et al., 2022). Indeed, the formal validation of the three in vitro tests for embryotoxicity was published 20 years ago (Genschow et al., 2002), whereas the tests themselves were developed in the 1970s-1990s. The progress in embryonic developmental modelling and mechanistic understanding of the past decade may now lead to more modern methods being considered as part of a testing strategy.
An opportunity for in vitro models of human development
Research on human embryos is legally restricted and their availability is limited (Matthews and Moralí, 2020); however, recent advances in stem cell-based in vitro models now offer the possibility to interrogate the mechanisms of human embryo-fetal development and to test for their disruption. For example, several groups have developed human blastocyst-like structures, so-called blastoids, by aggregating either naïve pluripotent stem cells (Kagawa et al., 2021; Yanagida et al., 2021; Yu et al., 2021), expanded potential cells (Fan et al., 2021; Sozen et al., 2021) or fibroblasts undergoing reprogramming to pluripotency (Liu et al., 2021). Although these approaches yielded constructs with similar morphology and cell lineage configuration to human blastocysts, it has since been suggested that only the approaches by Yanagida et al. and Kagawa et al. may represent faithful models of the natural blastocyst stage (Zhao et al., 2021 preprint). Owing to their inaccessibility in vivo, peri-implantation processes are particularly challenging to study experimentally, even in animals. Blastoids, which incorporate extra-embryonic cell types that are important for the realisation of implantation, could potentially be applied to test the effects of drugs or chemicals on these processes. By co-culturing their blastoids with human endometrial epithelial cells, Kagawa et al. have already provided a conceivable model of this: they reported successful attachment and cellular interaction. For a more detailed comparison of blastocyst-like structures, we refer the reader to Terhune et al. (2022).
Other stage-specific in vitro models can also represent crucial developmental processes that follow implantation. Gastruloids, for example, replicate multiple important aspects of gastrulation, the process that specifies the three germ layers endoderm, mesoderm and ectoderm (reviewed by van den Brink and van Oudenaarden, 2021). Under specific culture conditions, adherent human pluripotent stem cells differentiate and distinctly segregate into these three germ layers as two-dimensional gastruloids (Warmflash et al., 2014), while their aggregation in suspension forms three-dimensional (3D) gastruloids (Moris et al., 2020). These 3D constructs exhibit symmetry breaking, elongation along the anterior-posterior axis and an organisation of derivatives of the three germ layers that follows the human embryonic body plan. In the mouse context, 3D gastruloids with more advanced features, such as somite development (Veenvliet et al., 2020) and beating heart-like structures (Rossi et al., 2021), have been generated, although no such improved human gastruloid models have been reported yet.
Finally, human organoids could be used to investigate adverse effects on organogenesis, e.g. of heart and brain (reviewed by Fritsche et al., 2021). Organoids are 3D structures that resemble a specific organ in terms of the presence of multiple representative cell types, the spatial organisation of these cells, as well as the replication of some specific functions of the organ. As human organoids generated from pluripotent stem or progenitor cells self-organise according to the same basic developmental programmes as the organ itself, organoid cultures enable the in vitro recapitulation of aspects of in vivo organogenesis (Lancaster and Huch, 2019). Furthermore, in vitro differentiated cell types, also within organoids, are often described as displaying an immature or fetal phenotype (Holloway et al., 2019), which presents a disadvantage when modelling adult tissues, but seems optimal for studying effects of substance exposure on developing organs. Organoids usually lack vasculature and immune cells, as well as general tissue crosstalk present in vivo, which can be remedied by co-culture approaches, e.g. in microphysiological systems (Fritsche et al., 2021).
These three examples of in vitro models of development can be produced in high numbers with relative ease, making them compatible with toxicological screens and automated data collection, and consequently requiring less time and fewer resources than toxicity studies in animals, much like the EST. However, the novel models are more faithful to natural human development than EBs, in terms of morphology and cell lineage composition (Simunovic and Brivanlou, 2017). Therefore, they might be expected to have similar – if not better – predictive power. Blastoids have not yet been studied in the context of toxicology, whereas gastruloids have shown a good predictivity for known in vivo teratogenicity of a small selection of compounds (Mantziou et al., 2021), and multiple organoid models have already been demonstrated to predict adverse chemical reactions (reviewed by Matsui and Shinozawa, 2021).
As the discussed models represent different aspects of development (Fig. 1), it could be feasible to include them in a human-relevant testing strategy for developmental toxicity. Of note, the described models would not be able to represent the whole reproductive timeline. To cover all reproductive stages, additional in vitro assays investigating, for example, male and female fertility would be required. In order for all in vitro systems to be applied in the context of toxicity testing, the methods need to be validated. For this, high reproducibility in consecutive experiments and transferability between laboratories based on detailed protocols are crucial prerequisites and need to be demonstrated (Kugler et al., 2017). However, the self-organising nature of the discussed in vitro models creates a complexity that introduces experimental variation and limits reproducibility (Kagawa et al., 2021; Lancaster and Huch, 2019; van den Brink and van Oudenaarden, 2021). A solution to this might be found in the bioengineering of 3D micro-environments (Holloway et al., 2019), as well as in the automation of the processes of separating constructs with the desired phenotype and of analysing them – which would also reduce bias. Furthermore, a systematic refinement of culture conditions and a standardisation of protocols may help to increase experimental robustness.
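The coverage gap noted above, namely that the discussed models do not span the whole reproductive timeline, can be expressed as a simple check of a test battery against a list of stages. The sketch below is illustrative only: the stage list is a simplified set of stages named in this article, not the 14-stage timeline of Rajagopal et al. (2022), and the mapping of each model to a single stage is a hypothetical simplification.

```python
# Simplified, illustrative stage list in timeline order (an assumption;
# NOT the 14 key stages proposed by Rajagopal et al., 2022).
STAGES = [
    "male fertility",
    "female fertility",
    "implantation",
    "gastrulation",
    "organogenesis",
]

# Hypothetical mapping of each in vitro model to the process it is discussed for.
MODEL_COVERAGE = {
    "blastoid": {"implantation"},
    "gastruloid": {"gastrulation"},
    "organoid": {"organogenesis"},
}


def uncovered_stages(stages, coverage):
    """Return the stages, in timeline order, that no model in the battery covers."""
    covered = set().union(*coverage.values())
    return [stage for stage in stages if stage not in covered]


gaps = uncovered_stages(STAGES, MODEL_COVERAGE)
print("Stages still lacking an in vitro assay:", gaps)
```

In this simplified picture the two fertility stages remain uncovered, in line with the need for additional in vitro assays stated above.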
It is also important that a sufficient number of both positive and negative reference chemicals (i.e. with known toxic and non-toxic properties) are included in the validation process to assess specificity and sensitivity of the respective assay (Marikawa, 2022). In general, animal reference data are available, but may not apply to the human situation. Although human reference data are limited and less standardised, the relevance to humans is of particular importance due to the differences in toxicological responses between species discussed earlier. A careful curation of reference data may, therefore, be needed to allow a reliable assessment of the predictive capacity of any given method.
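As a minimal sketch of how sensitivity and specificity follow from such a reference set, the example below tallies assay calls against reference classifications. The function name `assay_performance` and the eight-chemical reference set are hypothetical, invented purely for illustration; they are not data from any validation study.

```python
from typing import Dict, Iterable, Tuple


def assay_performance(results: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Compute sensitivity, specificity and accuracy.

    Each item is (reference_is_toxic, assay_called_toxic).
    """
    tp = fn = tn = fp = 0
    for reference_toxic, assay_toxic in results:
        if reference_toxic and assay_toxic:
            tp += 1  # toxic reference chemical correctly flagged
        elif reference_toxic:
            fn += 1  # toxic reference chemical missed
        elif assay_toxic:
            fp += 1  # non-toxic reference chemical wrongly flagged
        else:
            tn += 1  # non-toxic reference chemical correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference chemical")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical reference set: five toxic and three non-toxic chemicals.
reference_set = [
    (True, True), (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True),
]
performance = assay_performance(reference_set)
print(performance)
```

The check for at least one positive and one negative chemical reflects the point made above: without both classes in the reference set, one of the two measures cannot be estimated at all.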
The development of in vitro models of human development is a desirable endeavour in itself, because this technology enables the elucidation of mechanisms underlying implantation, early embryonic development and organogenesis in the human context. A further objective is to implement such systems for new applications, such as advancing developmental toxicity testing to be reliable without compromising the protection of either humans or non-human animals. Here, it is time to reassess the status quo: developmental toxicology needs to become more predictive for the human organism based on identified pathways of toxicity.
To this end, toxicological risk assessment is increasingly interested in ‘adverse outcome pathways’ (AOPs), which describe the defined sequence of molecular and cellular events required to produce a toxic effect in an organism. In vitro assays reflecting specific events of this chain can be combined with available in vivo animal and epidemiological human data to develop Integrated Approaches to Testing and Assessment (IATA). Establishing an IATA for developmental toxicity or, more broadly, for reproductive toxicity is conceivable. Several AOPs pertaining to adverse reproductive outcomes have already been described (reviewed by Knapen et al., 2015). By allowing the identification of potential toxicological mechanisms that cannot be obtained from in vivo guideline studies alone, an IATA will enable more hypothesis-driven testing.
The paradigm shift away from animal testing is not only favourable from a scientific perspective, but would also lessen the ethical and economic burden of regulatory testing. Well-validated testing strategies involving in vitro models already have great potential to be used in drug development to screen out the most toxic compounds before any in vivo testing, thereby reducing the number of animals needed. Eventually, in vivo tests may be replaced by IATA strategies altogether.
Changes such as those outlined above require the combined contribution of different stakeholders. Researchers working with embryo models have demonstrated the feasibility of recapitulating human development in vitro. Now, protocols need to be improved and validated to support a potential application in a regulatory context. OECD and ICH provide guidance to method developers on key elements, such as appropriate reference items, in a ‘Good In vitro Method Practices’ document (OECD, 2018) and guideline ICH S5(R3) (ICH, 2020), respectively. Specifically, standardisation and reproducibility issues require addressing. On the other hand, validation processes are very time- and resource-consuming, and are not easily achievable by individual research groups. Funding agencies, therefore, need to focus the allocation of funds more heavily on the development and refinement of human-relevant methods. Regulatory authorities need to make the validation process worthwhile for involved parties by also improving the chances of an actual implementation of novel methods into a testing strategy. For this, they may have to rethink the one-to-one replacement relationship: instead of substitution, initial integration of alternatives to animal testing may be more productive, so that confidence in these methods can be built incrementally. Ultimately, as IATA strategies are devised, more research into the mechanisms of developmental toxicity is required, but so is an increased acceptance of human-relevant in vitro methods.
The authors thank Ailine Stolz and Michael Oelgeschläger for critical reading of the manuscript, and Matthias Simonis for support with preparing the figure.
Open access funding provided by Bundesinstitut für Risikobewertung. Deposited in PMC for immediate release.