The origins and the early evolution of multicellular animals required the exploitation of holozoan genomic regulatory elements and the acquisition of new regulatory tools. Comparative studies of metazoans and their relatives now allow reconstruction of the evolution of the metazoan regulatory genome, but the deep conservation of many genes has led to varied hypotheses about the morphology of early animals and the extent of developmental co-option. In this Review, I assess the emerging view that the early diversification of animals involved small organisms with diverse cell types, but largely lacking complex developmental patterning, which evolved independently in different bilaterian clades during the Cambrian Explosion.
The discovery of deep homologies (see Glossary, Box 1) across bilaterian animals, and of highly conserved, developmentally significant genes among cnidarians, sponges and the closest relatives of Metazoa, has provided a new understanding of the early history of animals (Fig. 1). In the 1990s, the discovery of extensive conservation of developmental genes between vertebrates and arthropods, such as the Hox genes, Pax6 and distalless, led to ongoing disputes regarding the morphology of the last common ancestor of these two clades (the protostome-deuterostome ancestor or PDA): a morphologically complex urbilaterian, based on assumptions that genetic homology implies conservation of developmental processes (e.g. Arendt, 2008; Carroll et al., 2001; De Robertis and Sasai, 1996; De Robertis, 2008; Knoll and Carroll, 1999; Panganiban et al., 1997), versus a morphologically simpler ancestral urbilaterian (Valentine et al., 1999; Davidson and Erwin, 2006; Erwin and Davidson, 2002; Genikhovich and Technau, 2017; Hejnol and Martindale, 2008; Tweedt and Erwin, 2015).
Basal. Toward the root of a phylogenetic tree.
Cis-regulatory element (CRE). Non-coding regulatory regions containing binding sites for transcription factors; they can be up- or downstream of target coding genes.
Co-option. The re-use of regulatory genes or entire regulatory subcircuits in a new developmental context.
Crown group. A clade containing the ancestor of all living species within the clade as well as any extinct taxa that originated after the ancestor.
Deep homology. The remarkably conserved gene expression patterns shared across bilaterians by many morphological structures traditionally not considered homologous, such as eyes or appendages.
Ediacaran–Cambrian radiation (ECR). The appearance in the fossil record and initial diversification of all major animal groups between about 570 and 520 million years ago; a stricter definition might focus on events in the early Cambrian Period, from about 539 to 520 million years ago.
Effector cassettes. Genes at the distal region of a GRN that control differentiation gene batteries responsible for cell-type specification; or suites of genes executing morphogenetic activities, such as cell motility or epithelial-mesenchymal interactions.
Gene regulatory network (GRN). Regulatory genes and the interactions between them, which control spatial and temporal expression patterns, and thus determine developmental cell fates.
Long-branch attraction (LBA). The incorrect inference that rapidly evolving lineages are closely related produced using some methods for reconstructing phylogenetic relationships.
Molecular clock/relaxed clock methods. A method for estimating divergence times between living taxa based on sequences with relatively regular rates of mutation and calibrated with fossil evidence. Relaxed molecular clock models improve estimates of divergence times by accounting for variation in mutation rates across lineages.
Node. Branchpoints within a phylogenetic tree (internal nodes) or tips at the end of the tree.
Synteny (micro-/macro-). Preservation of the order of genes on a chromosome from a common ancestor. Micro-synteny is limited to only a few genes while macro-synteny involves extensive conservation of gene order.
Over the past few decades, new experimental techniques and the study of genomes of a broader array of animals, particularly of pre-bilaterian metazoan clades, have allowed an increasing emphasis on elucidating the evolution of the metazoan regulatory genome and how the various components of the genome interact. Molecular clock estimates date the ancestral metazoan to about 750 million years ago (Ma) (dos Reis et al., 2015; Erwin et al., 2011; Lozano-Fernandez et al., 2017), but the appearance of different clades in the fossil record and other geological data place important constraints on the interpretation of genomic and developmental data. In this Review, I argue that genomic and developmental studies suggest that the most plausible scenario for regulatory evolution is that highly conserved genes were initially associated with cell-type specification and only later became co-opted (see Glossary, Box 1) for spatial patterning functions.
Networks of regulatory interactions control gene expression and are essential for the formation and organization of cell types and patterning during animal development (Levine and Tjian, 2003) (Fig. 2). Gene regulatory networks (GRNs) (see Glossary, Box 1) determine cell fates by controlling spatial expression of regulatory states, thus linking sequences to the development of body architectures. At the level of individual genes, cis-regulatory elements (CREs; see Glossary, Box 1) are non-coding regulatory regions that include promoters, which lie immediately upstream of the transcription start site(s), and enhancers, which contain multiple binding sites for transcription factors (TFs) and may be up- or downstream of the target coding genes they influence. The regulatory state of a cell is determined by the combination of TFs expressed within the cell, which in turn reflects the states of GRNs. In bilaterians, developmental genes may have multiple enhancer and promoter regions, each responsible for expression in a different context. Three major classes of promoters have been identified in animals: adult (type I), ubiquitous (type II) and developmentally regulated (type III) (Lenhard et al., 2012; Haberle and Lenhard, 2016). Within enhancers, TF activity is combinatorial, with multiple TFs binding to activate (or repress) a single gene. Tissue-specific enhancer activity involves both high- and low-affinity binding sites for TFs, where the activity of low-affinity binding sites is particularly dependent on the order, orientation and spacing of TF-binding sites (Farley et al., 2016). In most unicellular eukaryotes, regulatory sequences are short and adjacent to the genes they control; however, particularly in bilaterians, enhancers may lie up to one million base pairs from the target gene (Levine and Tjian, 2003). Thus, a key question is when the more complicated metazoan regulatory genome evolved.
Chromatin architecture provides an additional level of regulatory control. Beyond the repression of gene expression by histones, further architectural details of chromatin have only recently become clear through imaging and chromatin capture assays. Chromosomes in Bilateria [Node 5 (see Glossary, Box 1), Fig. 1] are spatially subdivided into topologically associating domains (TADs), within which regulatory interactions are more common than across TAD boundaries (TADs themselves may be hierarchically nested). Binding sites for insulator proteins, dominated by CCCTC-binding factor (CTCF) sequences, demarcate TAD boundaries (Furlong and Levine, 2018).
Here, I review the growing knowledge of the regulatory genome and discuss what it reveals about the early history of animals. Three clear conclusions emerge. First, the roots of metazoan gene regulation lie deep in holozoan lineages. Second, the last common metazoan ancestor (LCMA: Node 2, Fig. 1) was likely to have had more cell types and morphological complexity than previously understood. Third, early reconstructions of the protostome-deuterostome ancestor (PDA: Node 5, Fig. 1) proposed a morphologically complex animal, with a central nervous system, gut, eyes, segmentation and other features (e.g. Carroll et al., 2001). The view of the PDA developed here suggests less developmental or morphological complexity, with a variety of cell types and patterning systems (anterior-posterior, dorsal-ventral) but limited complex morphology. The developmental machinery for appendages, eyes, gut formation, segmentation and other features arose independently in the major bilaterian clades after the PDA, largely through extensive co-option of existing regulatory components.
Understanding regulatory evolution requires a solid phylogenetic framework. The introduction of sequence data and more rigorous methods of phylogenetic reconstruction in the 1990s revolutionized our understanding of the animal tree, resolving long-standing controversies such as the relationship between annelids and arthropods. This work recognized that bilaterian animals formed three major clades: lophotrochozoans and ecdysozoans (which together comprise the protostomes), and the deuterostomes (Fig. 1). However, some of the relationships within these major clades remain unresolved. The basic metazoan topology has remained relatively stable for some time, and is well-supported by studies using different data sets and methods of phylogenetic reconstruction (for recent summaries, see Dunn et al., 2014; King and Rokas, 2017; but also the cautions of Laumer et al., 2019). These studies show that some important morphological features, such as segmentation (Chipman, 2010), arose independently in different animal clades. Choanoflagellates have long been recognized as the closest living relatives of animals and, together with Filasterea and Ichthyosporea, form the Holozoa. But some relationships within animals remain controversial, including the phylogenetic position of ctenophores relative to sponges and of the Xenacoelomorpha, a collection of ‘worms’ lacking a coelom (body cavity).
Ctenophores are a clade of voracious predators of other zooplankton. Resolving the phylogenetic position of ctenophores with molecular data has long been hampered by apparent high rates of molecular substitution leading to long-branch attraction (see Glossary, Box 1), and some analyses have consequently ignored the group. Surprising claims that ctenophores were basal (see Glossary, Box 1) to sponges (Ryan et al., 2013; Moroz et al., 2014) have been challenged based on analytical issues, including long-branch attraction and inappropriate models of sequence evolution, with other results supporting sponges as the most basal metazoan clade (Feuda et al., 2017; Pisani et al., 2015; Simion et al., 2017). However, if ctenophores could be shown to be basal to sponges, that would have significant implications for understanding the evolution of nerve cells, muscles and the gut (Moroz, 2009). A basal position for ctenophores within metazoan phylogeny is inconsistent with the close structural similarities between choanoflagellates and the collar cells of sponges (Brunet and King, 2017). But single-cell transcriptomics of the sponge Amphimedon revealed relatively few microsyntenic blocks with choanocyte-specific expression (choanocytes being the most similar sponge cell type to choanoflagellates), supporting earlier observations about the strength of the similarity between choanoflagellates and sponges (Zimmermann et al., 2019). This controversy over the placement of ctenophores may be unresolvable with present approaches (Pett et al., 2019), and many recent publications depict a polytomy (multiple branches) at the base of Metazoa (Fig. 1).
Recent studies place Placozoa (Fig. 1) as a sister clade to Cnidaria (Laumer et al., 2018, 2019), in contrast with earlier studies that suggested that placozoans diverged after sponges and are sister to all other metazoans, including cnidarians. However, their placement is sensitive to the position of ctenophores (Laumer et al., 2019) and they are shown in a polytomy with cnidarians in Fig. 1. The Xenacoelomorpha are important for understanding the nature of the PDA, but have bounced between a position basal to the PDA (Node 4, Fig. 1) (Rouse et al., 2016) and a basal position within deuterostomes (Node 5, Fig. 1) (Philippe et al., 2019); this controversy remains unresolved. Other uncertainties persist about relationships between Panarthropoda (Node 7, Fig. 1) and Lophotrochozoa (Node 8, Fig. 1). It should be clear from this overview that our understanding of metazoan phylogeny is not fixed but changes as new data and improved analytical methods are introduced.
Fossil evidence for the early history of animals
When Charles Darwin wrote The Origin of Species he was troubled by the sudden appearance of animal fossils. Since the 1980s, extensive field studies, the discovery of new fossil clades, an increasingly resolved temporal framework and detailed phylogenetic studies have revealed the appearance and early diversification of metazoan clades in exquisite detail from the mid-Ediacaran (∼570 Ma; Fig. 3) to the early stages of the Cambrian Period (∼539 Ma). The fossil record preserves three different types of information about the early history of animals (Fig. 4). Body fossils generally receive the most attention; however, important information also comes from burrows and trackways (trace fossils), and from organic materials (such as lipid biomarkers).
Body fossils are commonly the hard parts of organisms, such as mollusk shells, arthropod carapaces or bones. However, the Ediacaran–Cambrian periods encompass unusual styles of fossil preservation, including fossils with soft parts, such as eyes, appendages and traces of the gut. Together, this diversity of fossils records the emergence of animals, bilaterians in particular, providing rich insights into the evolutionary and developmental dynamics of change. The crucial challenge for those interested in this interval is how reliably the appearance of these groups tracks the origins of these clades, rather than the generation of preservable bodies.
Tonian and Cryogenian periods
Rocks from the Tonian Period (ca. 1000-720 Ma; Fig. 1) preserve an array of organic-walled microfossils, including predatory eukaryotes, and others with multicellularity and different cell types (Xiao and Tang, 2018). The succeeding Cryogenian Period (ca. 720-635 Ma) included two long glacial episodes (Fig. 1) and the continuing diversification of single-celled eukaryotic lineages (Cohen and Riedman, 2018).
A variety of minute balls of cells have been recovered from the Weng'an biota in the Doushantuo Formation (∼609 Ma) in southern China and synchrotron imaging has revealed remarkable cellular and subcellular detail (Xiao et al., 2014; Zhou et al., 2017) (Fig. 4). But their phylogenetic affinities remain contentious. Initially described as animal embryos (Xiao and Knoll, 2000), these fossils have since been linked to various protists, sulfur-oxidizing bacteria, volvocine green algae and embryonic, larval or adult animals (reviewed by Xiao et al., 2014; Cunningham et al., 2017b). One form, Caveasphaera, exhibits patterns of cellular development analogous to gastrulation in animals, and has been plausibly assigned a holozoan, but not necessarily metazoan, affinity (Yin et al., 2019) (Node 1, Fig. 1). There is no unambiguous evidence for metazoans among these Weng'an fossils, but they do demonstrate patterns of cell adhesion similar to those of animals.
Ediacaran macrofossils (ca. 570-539 Ma) are the oldest macroscopic, multicellular organisms plausibly related to animals, and formed the earliest complex macroscopic ecosystems (Figs 3 and 4) (Droser et al., 2017). These soft-bodied fossils are often preserved in fine detail, yet most lack appendages and eyes, and evidence of a mouth or guts (Droser and Gehling, 2015; Erwin and Valentine, 2013). One of the great curiosities of this interval is the general absence of sponges (Sperling et al., 2010). Evidence of muscles is present in Haootia, which is strikingly similar to modern stalked jellyfish (Liu et al., 2014), and morphological features establish that at least some of these fossils represent metazoans, including Kimberella (555 Ma), likely a lophotrochozoan and possibly a mollusk (Ivantsov, 2012). Ediacaran organisms provide unique insights into the architectural possibilities of the available developmental tools, as well as the inherent limitations of the low-oxygen, environmentally perilous interval before the Cambrian explosion of animals (Droser et al., 2017). Until just a few years ago, the transition from the Ediacaran biotas to the morphologically complex and phylogenetically diverse Cambrian assemblages appeared abrupt, but recent field studies have revealed a more gradual transition, with the earliest skeletonized forms appearing in the late Ediacaran (Darroch et al., 2018; Wood et al., 2019).
Between 538 and 520 Ma, all major groups of durably skeletonized marine animals (Box 2) appeared in the fossil record, along with many soft-bodied forms found in extraordinary deposits such as the Chengjiang fauna in southern China (Fig. 3) and many now extinct clades whose phylogenetic affinities remain controversial. There are three crucial observations relating to the Ediacaran-Cambrian radiation (ECR) (see Glossary, Box 1). First, quantitative studies show that the greatest morphological range (which paleontologists describe as disparity) of most clades is close to their first appearance as fossils (Erwin and Valentine, 2013; Hughes et al., 2013). Second, although many of the new fossils discovered over the past few decades have clarified the phylogenetic relations of some clades (such as inclusion of many unusual forms within the Panarthropoda), they also increased disparity. Third, by 520 Ma, or about 18 million years into the Cambrian Period, the great burst of evolutionary novelty and innovation transitioned to more traditional dynamics of speciation and extinction with fewer morphological novelties (e.g. Paterson et al., 2019).
The Cambrian Explosion was long demarcated by the first appearance of skeletons (although the earliest signs of metazoan biomineralization are now found in the late Ediacaran Period). Biomineralization developed independently in many metazoan clades, with lineages generating siliceous, calcium carbonate or calcium phosphate skeletons interspersed with non-mineralizing lineages (Murdock and Donoghue, 2011). The fact that biomineralization appeared nearly simultaneously in these clades remains a great curiosity and has fostered persistent suggestions that much of the Cambrian Explosion was driven by predation (Bengtson, 2005; Stanley, 1973). One indication that the origin of biomineralization approximates the first appearance of the earliest fossils in each clade is the correlation between the type of skeleton (aragonitic or calcitic) and the coeval ocean chemistry (different chemistries favor deposition of aragonite or calcite) (Porter, 2010). Similarly, independent co-option appears to have played a crucial role in other aspects of the bilaterian body plan; Hall (2018) detailed the role of co-option in the origin of neural crest, for example. In the scenario developed here, gene regulatory networks (GRNs) responsible for specific cell types or patterning of cells were the foundation for hierarchically nested modular GRNs involved in regional patterning of the developing embryo. In my view, independent co-option of cell types or patterning systems for small clusters of cells has affected the following systems and regulatory components. Segmentation: hairy and engrailed; tripartite brain and central nervous system: otx, emx and six3/6; sensory systems, including eyes: Pax genes; appendages: distalless; and regionalized gut: GATA and brachyury. From this, we expect that morphogenetic patterning systems have been conserved within major clades (such as within Panarthropoda) but are likely to have arisen independently between major clades.
Returning to biomineralization, within mollusks, for example, it appears to have arisen independently in bivalves and gastropods (Jackson et al., 2010), while biomineralization in echinoids involves the co-option of an ancestral VEGF GRN involved in tubulogenesis (Morgulis et al., 2019).
The idea that body fossils preserve a fairly reliable record of the early history of large-bodied metazoans is supported by trace fossils. Traces of burrowing, moving or walking across or through sediment are often preserved in the fossil record. In many cases the trace makers lacked a durable skeleton and thus would be unlikely to be preserved as body fossils. Surficial trace fossils date to about 560 Ma (Mángano and Buatois, 2017), with the diversity and complexity of trace fossils increasing through the Ediacaran Period. A possible trackway of a bilaterian with paired appendages was found in the latest Ediacaran Period of south China (Chen et al., 2018). The full suite of metazoan trace fossils with active vertical burrowing and other behaviors does not appear across a broad range of marine environments until after 529 Ma (Buatois and Mángano, 2016; Mángano and Buatois, 2016). The behaviors reflected by these trace fossils provide a critical constraint on the timing of appearance of large-bodied bilaterians. Their absence earlier than 560 Ma strongly implies that, if animals existed, they must have been small (certainly <1 cm) and incapable of leaving preservable marks (Valentine and Erwin, 1987), which establishes a firm lower limit on benthic animals larger than this size.
Ancient biomolecules or biomarkers provide a third line of evidence about early animals (Briggs and Summons, 2014). Although lipids are modified after burial, many biomarkers can be traced back to their chemical precursors and thus the source organism. Two biomarkers, 24-isopropylcholestane (24-ipc) and 26-methylstigmastane (26-mes), have been recovered from marine rocks dating between 630 and 540 Ma, and are characteristic of demosponges (the most diverse clade of modern sponges) (Briggs and Summons, 2014; Love et al., 2009; Zumberge et al., 2018). The occurrence of 24-ipc and 26-mes in rocks deposited between the Sturtian and Marinoan glaciations, and in Ediacaran rocks, provides suggestive – but not conclusive – evidence of sponge-dominated ecosystems. Another putative sponge biomarker, cryostane (26-methylcholestane), is abundant in older rocks and may also be indicative of sponges (Brocks et al., 2016). Sponges effectively modify their environments and could have played a substantial role in sequestering carbon and generating the conditions necessary for metazoan diversification (Erwin and Tweedt, 2011). Finally, a putative metazoan biomarker (coprostane) has been reported from Dickinsonia, a characteristic Ediacaran fossil (Bobrovskiy et al., 2018; but see Summons and Erwin, 2018); diagnostic biomarkers for individual metazoan clades other than sponges are currently unknown.
In summary, equivocal fossil evidence for animals comes from Cryogenian biomarkers and the Doushantuo fossil embryos. The earliest Ediacaran macrofossils appeared after 570 Ma, with bilaterian metazoans appearing by 550 Ma. Larger-bodied and skeletonized animals appeared in the late Ediacaran Period and most metazoan clades appeared during the early Cambrian Period. The Chengjiang fauna indicates that almost all major metazoan clades, including vertebrates, appeared by 518 Ma.
Molecular clock evidence for the early history of animals
A very different picture of the early history of animals emerges from molecular clock evidence. Comparison of molecular sequence data calibrated against the fossil record provides an indirect method of assessing the origin of clades. Through the use of ‘relaxed clock methods’ (see Glossary, Box 1), with sufficient calibration points from fossils, roughly comparable scenarios for animal divergences, albeit with large error estimates, can be generated (dos Reis et al., 2015; Erwin et al., 2011; Lozano-Fernandez et al., 2017; Parfrey et al., 2011; Peterson et al., 2008; reviewed by Cunningham et al., 2017a; Sperling and Stockey, 2018).
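The calibration logic behind such estimates can be illustrated with a minimal sketch of a strict (single-rate) clock, which is a deliberate simplification of the relaxed clock methods used in the cited studies. The function names and all numerical values below are hypothetical and chosen only to make the arithmetic visible: a fossil-dated split fixes the substitution rate, and that rate is then used to date a deeper, undated divergence.

```python
# Strict molecular clock sketch (a simplification; the cited studies use
# relaxed clocks that let rates vary across lineages).
# All distances and ages below are hypothetical illustrative values.

def calibrate_rate(pairwise_distance, fossil_age_ma):
    """Substitutions per site per million years along one lineage.

    Two lineages accumulate changes independently after they split,
    so the pairwise distance is spread over 2 * age.
    """
    return pairwise_distance / (2.0 * fossil_age_ma)

def divergence_time(pairwise_distance, rate):
    """Estimated age (Ma) of a split, given a calibrated per-lineage rate."""
    return pairwise_distance / (2.0 * rate)

# Hypothetical calibration: two taxa whose split is dated by fossils to 520 Ma
# differ by 0.52 substitutions per site.
rate = calibrate_rate(0.52, 520.0)

# Hypothetical deeper comparison: 0.75 substitutions per site.
estimated_age = divergence_time(0.75, rate)
print(f"rate = {rate:.4f} subs/site/Myr, estimated divergence = {estimated_age:.0f} Ma")
```

Under a single rate the estimate scales linearly with sequence distance; relaxed clock models replace the single `rate` with lineage-specific rates, which is one reason the published estimates carry large uncertainties.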
The consensus of molecular clock studies is that the last common ancestor of living animals lies at about 750 Ma, with the divergence of Metazoa from choanoflagellates substantially earlier (∼900 Ma; Fig. 1). There is greater uncertainty over the timing of subsequent divergences, but the origin of the Eumetazoa was likely within the Cryogenian Period (∼640 Ma, with large uncertainties), with the protostome-deuterostome divergence in the Cryogenian or early Ediacaran periods. The origin of pairs of bilaterian clades, such as Mollusca-Annelida or the Panarthropoda, dates to the Ediacaran Period, but still tens of millions of years before the Ediacaran-Cambrian boundary. There is an incontestable gap between these divergences and the first appearances of the clades based on fossil evidence. Moreover, molecular clock estimates for the divergences of crown groups (see Glossary, Box 1) within durably skeletonized major clades, such as Arthropoda and Brachiopoda, are consistent with the earliest fossil appearances of these clades. In other words, where the fossil record is likely to provide a robust estimate for the origin of a clade, there is little discordance between molecular clock estimates and fossil dates (Erwin et al., 2011). Thus, scenarios that ignore the molecular clock evidence (e.g. Budd and Jensen, 2016; Cavalier-Smith, 2017) are implausible.
A long lag between the origin of a clade and the last common ancestor of all the living members (the crown group) raises the possibility that original members of the clade may have had different characteristics from those defining the crown group. Molecular clock analyses reveal that the origin of Metazoa and the divergence of basal clades were effectively decoupled from the later increases in body size, acquisition of characteristic body plans and (in some clades) skeletonization, which occurred across many lineages during the ‘Cambrian Explosion’ (Erwin et al., 2011; Erwin and Valentine, 2013; Sperling and Stockey, 2018). The remainder of this Review surveys recent work that provides insight into how the regulatory genome likely evolved between ca. 800-500 Ma, and discusses the inferences we can make about the morphology of organisms at critical nodes in animal history (Fig. 1). In particular, I argue that the combination of decoupling of the origins of metazoan clades from their first appearance in the fossil record, together with the discovery of the regulatory capacity of the holozoan clades (as well as sponges and cnidarians, and other comparative developmental studies), supports the hypothesis of extensive gene co-option leading to characteristic bilaterian architectures.
Evolution of the regulatory genome
Deeply conserved genes and expression patterns were identified across bilaterians during the 1990s, principally between vertebrates and arthropods (De Robertis and Sasai, 1996; De Robertis, 2008), with many of these deeply conserved genes (Hox genes, Pax6, etc.) being linked to apparently conserved patterning of developing embryos. As comparative molecular developmental studies expanded to include broader phylogenetic coverage across animals, and eventually to their unicellular relatives, additional conserved genes were identified (summarized by Tweedt and Erwin, 2015).
The pattern of acquisition of key elements of the regulatory genome is documented in Table 1, tied to the phylogenetic nodes labeled in Fig. 1. This illustrates the central arguments of this Review: that many of the core elements of the metazoan regulatory genome have an ancient origin among the Holozoa and that the high frequency of co-option of regulatory elements in multiple bilaterian clades can be misleading about ancestral functions of these genes, and thus about the nature of early animals before the ECR. The table covers both regulatory machinery, such as new types of promoters and distal enhancers, as well as changes in chromatin structure, in addition to the diversification of transcription factor families and other regulatory genes. Some caution is required in interpretation, as gene loss has been pervasive and similar regulatory elements have expanded independently in different clades. Thus, comparative studies are highly sensitive to taxon coverage and study of multiple exemplars within a clade. This section summarizes key evolutionary novelties at each node; the next section discusses the implications of these novelties for inferring the morphological features at the origin of metazoans (Node 2, Fig. 1; Table 1) and the PDA (Node 4, Fig. 1; Table 1).
Insights into regulatory innovations from Holozoa into Metazoa are based on the patterns of conservation, loss and rearrangement, and introduction of new gene families (Grau-Bové et al., 2017; Paps and Holland, 2018; Richter et al., 2018; Simakov and Kawashima, 2017). Richter and colleagues, benefiting from the sequencing of 19 additional choanoflagellate genomes, identified ∼1944 new gene families on the metazoan stem lineage (leading to Node 2). They also found many gene families previously thought to be metazoan-specific among choanoflagellates, including the TGFβ ligand and receptor, and the Delta/Notch system (Richter et al., 2018). A core of 39 gene families is conserved across each of the 21 animal genomes in their sample, most of which are involved in developmental processes. These and other studies cataloged changes in genome structure, including intron gain (Grau-Bové et al., 2017), widespread synteny (Zimmermann et al., 2019) and extensive shuffling of multi-domain proteins to generate new genes (Richter et al., 2018).
Choanoflagellates, filastereans and ichthyosporeans (Node 1, Fig. 1) are predominantly unicellular eukaryotes with complex life cycles. Although each contains multicellular representatives, they differ in their multicellularity: forming clonal, aggregative and coenocytic multicellular structures, respectively (de Mendoza et al., 2015; Sebé-Pedrós et al., 2017; Brunet and King, 2017). Comparative studies have shown that holozoans possessed the regulatory capacity for spatially and temporally distinct cell morphologies through TF interactions, and some of these cell types may have been multifunctional (Sebé-Pedrós et al., 2016). Much of this regulatory capacity was deployed for cell-type differentiation in metazoan development. A number of homeobox gene classes and other TF families differentiated (Sebé-Pedrós and de Mendoza, 2016; Sebé-Pedrós et al., 2016; Brauchle et al., 2018), as well as tyrosine kinases (King et al., 2003, 2008; Tong et al., 2017; Sebé-Pedrós et al., 2016), and the complete microRNA microprocessor (including Drosha, Pasha and Dicer) is present in ichthyosporeans, although lost in choanoflagellates and filastereans (Bråte et al., 2018). Brunet and colleagues recently described a sheet-like colonial choanoflagellate containing hundreds of cells that is capable of inverting in response to light via apical contractility mediated by an actomyosin ring (Brunet et al., 2019). A similar pattern of actomyosin-based contractility is associated with ichthyosporean cellularization (Dudin et al., 2019). Actomyosin-based contractility is essential for epithelia formation and gastrulation in animals, and in other developmental processes. Together, these studies demonstrate that pre-metazoan holozoans possessed the regulatory capacity for transient multicellularity and complex life cycles (Fig. 2).
Many of the genes and regulatory factors that arose among Holozoa were co-opted for new functions during the early evolution of Metazoa (Node 2, Table 1). At this node, genome size increased, new genes arose through shuffling of protein domains and there were important additions to the regulatory genome (see reviews by Sebé-Pedrós et al., 2017, 2018a; Richter and King, 2013). Novelties of note include the origin of distal enhancers (Schwaiger et al., 2014; Sebé-Pedrós et al., 2016; Gaiti et al., 2017a), and the introduction of adult and developmental promoters (Lenhard et al., 2012). Increasing the numbers of TF-binding sites per enhancer and the number of enhancers per gene increased the possible regulatory complexity, allowing genes to be expressed in different spatial and temporal domains during development. This increase in combinatorics complexity expanded the diversity of cell types, patterning systems and morphological outcomes (Levine and Tjian, 2003; Sebé-Pedrós et al., 2018a,b). What is striking, therefore, is that sponges, cnidarians and placozoans do not appear to have significantly capitalized on some of these new capabilities. This might seem contradictory, but in contrast to perceived wisdom, evolution does not always take immediate advantage of new opportunities. Although distal enhancers are present in sponge and cnidarian genomes, studies to date suggest that they are not significant regulatory components (Sebé-Pedrós et al., 2018a), likely because taking full advantage of the possibilities of distal enhancers requires changes in chromatin structure (see below). Thus, regulation in basal metazoans (with the exception of ctenophores; Sebé-Pedrós et al., 2018a) appears to largely involve TF combinatorics with proximal regulation, as in the holozoan clades.
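How quickly the space of possible regulatory inputs grows with binding sites and enhancers can be shown with a simple counting sketch. It assumes, purely for illustration, that each TF-binding site is either occupied or unoccupied and that sites and enhancers vary independently; real enhancers violate both assumptions, so the result is an upper bound on configurations rather than a measure of functional diversity. The function name and the example site counts are hypothetical.

```python
# Upper-bound count of TF-occupancy configurations for one gene.
# Assumptions (illustrative only): each binding site is a binary
# occupied/unoccupied switch, and all sites are independent.

def occupancy_states(sites_per_enhancer, enhancers_per_gene):
    """Number of distinct occupancy configurations across all enhancers."""
    per_enhancer = 2 ** sites_per_enhancer       # on/off for every site
    return per_enhancer ** enhancers_per_gene    # independent enhancers multiply

# Hypothetical proximal-only gene: one regulatory region with 3 sites.
proximal_only = occupancy_states(3, 1)

# Hypothetical gene with 4 enhancers of 6 sites each.
with_distal_enhancers = occupancy_states(6, 4)

print(proximal_only, with_distal_enhancers)
```

Because the count is exponential in the total number of sites, modest additions of enhancers expand the configuration space enormously, which is consistent with the point that the capacity can exist well before a lineage exploits it.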
In a temporal framework, these studies suggest that the first phase of metazoan evolution (ca. 750-650 Ma) involved the diversification of clades largely dominated by proximal regulatory control and the generation of a diverse array of cell types (Fig. 5). Some of these cell types may well have been multifunctional, combining in a single cell functions that are now found in different specialized cell types (Arendt, 2008; Arendt et al., 2016a). As the number of cell types increased and multifunctional cells diverged into more specialized cells, the spatial regulators in ancestral forms were co-opted for temporal regulation, but this still generated relatively flat regulatory hierarchies (Davidson and Erwin, 2006; Davidson, 2006; Erwin and Davidson, 2009; Arenas-Mena, 2017).
The critical transition in the evolution of the metazoan regulatory genome was the construction of more hierarchical, and more interconnected, gene regulatory networks. Davidson and I argued that this transition involved the intercalation of spatial and temporal regulators into simpler cell-type specification pathways (Davidson and Erwin, 2006; Erwin and Davidson, 2002). Many of the highly conserved homologous elements date to these first two phases of regulatory evolution, where they were involved in cell-type specification or in fairly simple patterning (Davidson and Erwin, 2006; Peter and Davidson, 2015). The next phase involved extensive co-option of circuits to progressively elaborate spatial and temporal regulatory hierarchies. This scenario has specific implications for the variety of architectures that would have been achievable during these phases, as discussed next. In particular, the terminal differentiation of cell types requires the generation of specific mechanisms to ‘lock-down’ the regulatory state. One means of terminal differentiation is through feedback in recursively wired GRNs, which were described as kernels by Davidson and Erwin (Davidson, 2006; Davidson and Erwin, 2006) and as character identity networks (ChINs) by Wagner (2014), but there are also other mechanisms.
Most regulatory novelties associated with Eumetazoa (Node 3, Table 1) involve continuing expansion of TF families and signaling pathways (Sebé-Pedrós and de Mendoza, 2016; Brauchle et al., 2018; Babonis and Martindale, 2017), including 11 out of the 12 canonical Wnt subfamilies (one additional family is found in vertebrates; protostomes appear to have lost several Wnt subfamilies) (Kusserow et al., 2005). These changes helped generate the large number of cell types evident from single-cell transcriptomics (Sebé-Pedrós et al., 2018b).
Further increases in the regulatory genome are associated with the PDA (Node 4, Table 1). Expansions of some TF families continued, most noticeably the homeobox Prospero (PROS) and zinc-finger families (Brauchle et al., 2018). Alternative splicing increased in frequency (Grau-Bové et al., 2017, 2018) and circular RNAs were added to the cohort of regulatory RNAs (Gaiti et al., 2017a). One of the more striking insights of recent years into the regulatory genome is the importance of the three-dimensional architecture of chromatin. As the size of the genome increased, nested sets of topologically associating domains (TADs), which are bounded by insulators that bind CTCF sequences (Rowley and Corces, 2018), evolved to structure chromatin. Regulatory interactions are more common within TADs than they are with more distant regions of a chromosome, and TADs appear to be confined to Bilateria (Heger et al., 2012; Acemel et al., 2017; Gaiti et al., 2017a). In jawed vertebrates, the HoxA and HoxD loci exhibit bipartite regulation, with distal regulatory sequences both upstream and downstream of the locus, whereas Amphioxus (an invertebrate chordate) has only a single TAD with regulatory contacts largely upstream of the Hox cluster (Acemel et al., 2016). The Six locus has a TAD boundary in the middle of the Six gene cluster, with the anterior CREs controlling genes expressed in neural development and the posterior CREs controlling genes expressed during endomesoderm development (Acemel et al., 2017). The origin of TADs has been accompanied by increased clustering of co-expressed developmentally related genes, allowing these syntenic blocks to be regulated as a unit (Heger et al., 2012). Vertebrates, for example, exhibit clustering of Hox genes (Darbellay et al., 2019); however, by the time of origin of hemichordates, four transcription factor genes (nkx2.1, nkx2.2, pax1/9 and foxA) had assembled into a microsyntenic group (the pharyngeal cluster) controlling development of the pharyngeal ‘gill’ slits (Simakov et al., 2015).
[Clustering of functionally related genes is not restricted to bilaterians, however, as clustering of genes involved in formation of the nematocyte, a cnidarian-specific cell type, occurs in jellyfish (Khalturin et al., 2019).] Such chromatin architecture allows expanded regulatory control over spatial and temporal gene expression patterns beyond that evident among pre-bilaterian animals.
Nature of ancestral body plans
The nature of ancestral body plans was first inferred from comparative embryology and anatomy. More recently, these debates have been informed by comparative genomic and transcriptomic data, but this requires distinguishing between conserved ancestral genes and functions, the origins of clade-specific new genes, including lineage-specific expansions of gene families, and the co-option and re-deployment of developmental genes into new functions. Available data suggest that all of these processes are involved to varying degrees, but new data are accumulating rapidly as new species and clades are studied, and new techniques, such as single-cell RNA-seq (Box 3), are applied. This section highlights new insights into two critical nodes: the last common metazoan ancestor (LCMA; Node 2; Fig. 1) and the PDA (Node 4; Fig. 1).
Single-cell RNA sequencing (scRNA-seq) of whole organisms is revolutionizing our understanding of the early phases of metazoan evolution by generating transcriptomes for different life stages. When combined with chromatin data and other information, scRNA-seq illuminates the promoters and transcription factor networks involved in cell-type specification, and confirms that cell types are established by specific and unique combinations of transcription factor expression (Sebé-Pedrós et al., 2018a,b). Sponges (Amphimedon) and cnidarians (Nematostella) have about eight broad cell classes (metaclasses), with ctenophores (Mnemiopsis) having at least 12 classes, and placozoans (Trichoplax) having about five classes (Sebé-Pedrós et al., 2018a,b). In each case, the number of cell types based on transcriptomics was greater than that recognized by ultrastructural studies, with cnidarians having a surprising diversity of neurons (Sebé-Pedrós et al., 2018b). Examination of the regulatory architecture confirmed that most regulatory elements are proximal to the promoter and coding region in sponges and placozoans; in contrast, ctenophores display distal regulatory elements, as well as a unique clade-specific architecture that is independently derived from the TADs and CTCF sequences in bilaterians (Sebé-Pedrós et al., 2018a). In cnidarians, cell-type specification is more complex, including distal elements, with evidence for broader tissue-specific TF expression patterns above the cell-type specification (Sebé-Pedrós et al., 2018b). These results further support the hypothesis that TF combinatorics is strongly correlated with differentiation of cell-type classes (Sebé-Pedrós et al., 2018a,b).
The LCMA: division-of-labor model
The traditional division-of-labor (DOL) scenario involves the gradual evolution of sponges from a colonial choanoflagellate and derives from Ernst Haeckel's recognition of the similarities between choanoflagellates and the collar cells of sponges. In this model, colonial choanoflagellates eventually formed a ball of cells (similar to a blastula) that invaginated, and progressively more specialized cell types diverged from the originally multifunctional cells that made up the last common metazoan ancestor (Arendt, 2008; Nielsen, 2008, 2013). Thus, early sponges are expected to have only a few more cell types than a choanoflagellate.
The LCMA: temporal-to-spatial transition of cell types
Beyond the regulatory novelties already described, ichthyosporeans, filastereans and choanoflagellates each have members with life cycles that generate different cell types (de Mendoza et al., 2015; Sebé-Pedrós et al., 2017; Brunet and King, 2017). This has renewed interest in a model where the regulatory tools for temporal variation in cell types in these holozoan clades formed the basis for spatial regulation of a multicellular early animal, possibly with the preservation of a complex life cycle (Arenas-Mena, 2017; Brunet and King, 2017; Mikhailov et al., 2009; Sebé-Pedrós et al., 2017; Sogabe et al., 2019). Some capacity for temporal differentiation of cell types was present in the last common holozoan ancestor (about 900 Ma or earlier). In choanoflagellates, spatial differentiation of cell morphologies may be present at the same life-cycle stage (Laundon et al., 2019). This suggests that the regulatory machinery for the generation of different cell types preceded metazoan multicellularity, which was accomplished via a transition from temporal to spatial cellular differentiation at the base of Metazoa.
Brunet and King (2017) emphasized that the DOL and ‘temporal-to-spatial cell conversion’ scenarios are not mutually exclusive, nor do existing data permit testing the relative support for each scenario. But the comparative genomic studies of holozoans have clearly established that the last common metazoan ancestor had greater regulatory capacity than envisioned by the original DOL scenario.
Evolution of metazoan life cycles
Understanding life cycle evolution is equally crucial to understanding subsequent evolutionary steps. Most metazoan clades have a life cycle that alternates between larval and adult phases. The larval stage (or stages) may float in the water column (pelagic) before settling on the sea floor to become a benthic (bottom-dwelling) adult. In contrast, holopelagic forms, such as most jellyfish, remain in the pelagic realm through the entire life cycle.
Ancestral reconstructions of the LCMA by comparative developmental biologists have tended to invoke a benthic adult of varying complexity (compare Carroll et al., 2001 and De Robertis, 2008 with Davidson and Erwin, 2006 and Hejnol and Martindale, 2008), with the secondary acquisition of a larval phase in different clades (Peterson, 2005). In contrast, a long-standing tradition among invertebrate anatomists has been recognition of larvae as a primary feature of metazoans, but with disagreement over whether ancestral forms were holopelagic larval-like forms (Nielsen, 2008, 2013) or had a biphasic life cycle with a pelagic larva and a benthic adult form (Davidson et al., 1995; Rieger, 1994). Recent studies have not clearly resolved this controversy. Extensive conservation of developmental genes (Richter and King, 2013) and extensive expression data across metazoan larvae indicate strong conservation of expression patterns (Marlow, 2018), supporting the view that a biphasic life cycle was present at the origin of Metazoa (Node 2) and the PDA (Node 4, Fig. 1). Single-cell RNA-seq studies have compared adult and larval cell types in a cnidarian (Nematostella) and a sponge (Amphimedon) with contrasting results: sponge larvae had cell-type programs largely independent of those of the adult, whereas in the cnidarian the adult and larva shared cell types (Box 3) (Sebé-Pedrós et al., 2018a,b). Transcriptomes of two different jellyfish suggest that the planula larva is the only conserved stage across the cnidarian classes, with anthozoan polyps, medusozoan polyps and a jellyfish stage being equally different from one another (Khalturin et al., 2019).
Developmental co-option of a simple body plan
Here, these comparative studies have been employed to sketch a plausible scenario of early metazoan evolution. This scenario builds upon the early origin of components of metazoan signaling pathways and many transcription factors, but rejects assumptions about morphological homology. In this view, the PDA was relatively simple, likely with tens of different cell types, many of them still multifunctional, in a biphasic life cycle. But deep homologies of developmental tools had limited morphological expression. For example, anterior-posterior patterning likely occurred via canonical Wnt signaling through β-catenin (Petersen and Reddien, 2009), distalless was likely involved in proximo-distal patterning and Pax genes were likely associated with sensory activities. The incredible morphological and behavioral diversity of bilaterians was enabled by the progressive and intercalated evolution of new spatial and temporal gene regulation, transforming relatively flat GRNs into more hierarchically structured GRNs, as co-option of existing regulatory circuits permitted more sophisticated regional patterning (Box 2) (Erwin and Davidson, 2002; Davidson and Erwin, 2006). Many examples of such regulatory transformations have been described, including decoupling of dorsal and ventral Hox expression patterns and the co-option of Hox genes for patterning diverse molluscan architectures (Huan et al., 2019). Likewise, similar patterns of gene expression and neuronal markers are found in heads across bilaterians [as disparate as those of arthropods and the coiled, ciliary feeding structures (lophophore) of brachiopods and phoronids], although the specific morphological structures evolved independently from much simpler antecedents (Luo et al., 2018).
Despite the similarities in dorsoventral patterning of the nerve cords of vertebrates, flies and an annelid (Platynereis), other bilaterians lack the canonical staggered expression patterns, indicating that neuronal dorsoventral patterning arose convergently, probably via co-option of a system for regional patterning (Arendt et al., 2016b; Martín-Durán et al., 2018). It is a reasonable hypothesis from the data described here that such expansions in morphological complexity and developmental regulation were aided by advances in chromatin control, exemplified by the origin of CTCF sequences and TADs.
This comparative approach reveals several general patterns in the evolution of the metazoan regulatory genome. First, much of the ‘metazoan developmental toolkit’ appeared almost a billion years ago with the origin and early evolution of Holozoa – particularly the combinatoric TF-TF interactions and proximal regulation – to allow a complex life cycle with multiple cell types. Second, comparative studies of other holozoan clades have shown that the extent of the regulatory genome of the last common metazoan ancestor – including distal enhancers, the number of cell types and morphological complexity – was far greater than appreciated even one decade ago, lending increasing credence to some variant of the temporal-to-spatial transition model. Third, I have argued here that the PDA was less complex than has been argued in the past, which necessarily implies that extensive co-option of regulatory modules must have occurred independently in bilaterian clades. Finally, the origin of Bilateria has been identified as a particularly critical node in the evolution of the regulatory genome. Distal enhancers became far more prominent, CTCF sequences and TADs provided a new level of transcriptional control, and GRN hierarchies expanded through intercalation, co-option and other processes. Integrating our knowledge of the evolution of developmental patterns and processes with insights from molecular clock estimates and from the fossil record reveals the extent of co-option of regulatory components into new functions, particularly across the bilaterians. Together, this information provides a much richer view of evolutionary dynamics during one of the most crucial episodes in the history of life.
Here, I have focused on the origins of particular regulatory novelties which have expanded the capacity of the regulatory genome. Beyond these, however, a number of trends and recurrent patterns appear to be similar across animal clades. At the level of genome structure, major lineages show distinct patterns of gene gain and loss (with the suite of regulatory genes in cnidarians more similar to those of deuterostomes than to protostomes). There have been increases in gene clustering, macro- and micro-synteny (see Glossary, Box 1) and intron density, which may have facilitated the extensive expansion of most families of metazoan transcription factors (Irimia et al., 2012; Zimmermann et al., 2019). The complexity of GRNs has increased during the past 600 Ma through an increase in promoters and transcription start sites and the increasing hierarchical structuring of GRNs as subcircuits have been co-opted for new functions (Sabarís et al., 2019).
Conclusions and future directions
New comparative studies of animals and extant holozoans will continue to expand our understanding of regulatory evolution in early animals. Of particular interest will be comparative studies of GRNs involved in cell-type differentiation in different clades, and in regulatory control of regional patterning. Single-cell transcriptomics and related studies have revolutionized the understanding of cell type evolution (Achim and Arendt, 2014; Arendt, 2008; Arendt et al., 2016a; Sebé-Pedrós et al., 2018a) and provide a foundation for detailed comparative studies of GRNs. Despite the advances discussed in this Review, many unresolved questions remain: how have spatial and temporal regulators been intercalated to construct more hierarchical GRNs, which can then be co-opted for new developmental functions? Did hierarchically structured GRNs arise before the PDA? Or during the early divergence of deuterostomes, for example, but before the origin of echinoderms and chordates? Comparative studies will also reveal whether some components of GRNs are more refractory to evolutionary change than others. Another issue for future study is whether the nature of regulatory changes has itself evolved over time. The account here provides tentative support for this suggestion, with the generation of regulatory novelties associated with holozoans and early in animal evolution (Erwin, 2015; Simakov and Kawashima, 2017), with later evolutionary events dominated by co-option and repatterning of GRNs.
I appreciate feedback from Phil Donoghue, one named and one anonymous reviewer, and from the faculty and students at the Marine Biological Laboratory course on Gene Regulatory Networks, particularly Isabelle Peter and Ellen Rothenberg, and inspiration from Eric Davidson.
Research on this topic was funded in part by the National Aeronautics and Space Administration through the National Astrobiology Institute (grant NNA13AA90A) to the Massachusetts Institute of Technology node.
The author declares no competing or financial interests.