ABSTRACT
All vertebrates possess anatomical features not seen in their closest living relatives, the protochordates (tunicates and amphioxus). Some of these features depend on developmental processes or cellular behaviours that are likewise unique to vertebrates. We are interested in the genetic changes that may have permitted the origin of these innovations. Gene duplication, followed by functional divergence of the new genes, may be one class of mutation that permits major evolutionary change. Here we examine the hypothesis that gene duplication events occurred close to the origin and early radiation of the vertebrates. Genome size comparisons are compatible with the occurrence of duplications close to vertebrate origins; more precise insight comes from cloning and phylogenetic analysis of gene families from amphioxus, tunicates and vertebrates. Comparisons of Hox gene clusters, other homeobox gene families, Wnt genes and insulin-related genes all indicate that there was a major phase of gene duplication close to vertebrate origins, after divergence from the amphioxus lineage; we suggest there was probably a second phase of duplication close to jawed vertebrate origins. From amphioxus and vertebrate homeobox gene expression patterns, we suggest that there are multiple routes by which new genes arising from gene duplication acquire new functions and permit the evolution of developmental innovations.
INTRODUCTION
The origin of vertebrates has been the subject of conjecture and debate for over a century. Discussion has centred on the affinities of the vertebrates, the nature of their ancestors and the anatomical changes that must have occurred during vertebrate evolution. (There is also disagreement concerning usage of the term ‘vertebrate’; here we include mammals, birds, reptiles, amphibians, true fish, lampreys and, unlike some authors, hagfish.) Many attempts have been made to derive vertebrates from either extant invertebrate taxa or hypothetical ancestral forms; each scenario proposes various morphological changes to the body plan, but only rarely have authors considered the underlying genetic causes or the plausibility of these changes in a developmental context. In this regard, comparative genome analysis, phylogenetic studies of developmental regulatory genes and comparative developmental biology of vertebrates and their extant relatives have much to offer, since they could reveal how genes and developmental processes have changed in evolution.
One influential hypothesis for vertebrate origins, which did take a developmental perspective, was proposed ten years ago by Gans and Northcutt (for a recent review see Gans, 1993). These authors proposed an evolutionary scenario in which the origin of a suite of novel vertebrate characters (including the sensory and cranial ganglia, three paired special sense organs, sensory capsules and cartilaginous gill arches) depended on the origin of neural crest cells and ectodermal placodes. These cell populations have important developmental roles in the cranial region of vertebrates, and the structures derived from them (and through their interactions with other cells) dominate the vertebrate head. In this sense, much or all of the vertebrate head was proposed to be an evolutionarily new structure (or neomorph): an innovation of the vertebrates. Other significant morphological changes are proposed to have occurred earlier or later in chordate evolution; for example, the origin of segmentation within early chordates, and the origin of vertebrae and jaws during early vertebrate radiation (Fig. 1).
How did each of these developmental changes occur in evolution? What kinds of genetic changes allowed the origin of the vertebrate developmental program? For example, did new genes permit the evolution of new cell behaviours (seen in neural crest cell migration and differentiation)? Answers to questions such as this may come from considering the genetic basis for evolutionary changes in development. Specifically, we must ask what sort of mutations were potentially, and actually, responsible for particular changes in developmental control during vertebrate origins.
One type of mutation that may have played an important role is gene duplication. The great potential of gene duplication in the evolution of increasing complexity was discussed by Susumu Ohno in his classic book (Ohno, 1970). He argued that tandem duplication of genes, and polyploidy, could create redundant genes that were then able to diverge, relatively unchecked by purifying selection, until co-opted for new functions. In Ohno’s words, “natural selection merely modified, while redundancy created”. Since Ohno’s insight, the hypothesis that gene duplications are a major force in the generation of organismal complexity has been put on a sound population genetic basis (Ohta, 1989).
With respect to the origin and radiation of the vertebrates, few data on gene duplications were available at the time of Ohno’s book. Even so, he was able to make some speculations, based on allozyme data and genome sizes within the chordates. Ohno suggested that at least one round of tetraploidization occurred in the lineage leading to amniotes (reptiles, birds and mammals), probably in our Devonian fish or amphibian ancestors, and that independent tetraploidization events occurred in other fish and amphibian lineages (see also Ohno, 1993). He also suggested that, much earlier in evolution, genome expansion (by either tetraploidy or tandem gene duplication) occurred in the common ancestor of amphioxus and vertebrates (after divergence from tunicates); he did not explicitly propose significant genome changes at the origin of vertebrates. More recently, Holland (1992) speculated that multiple gene duplications may have occurred at vertebrate origins; the new genes could then have been co-opted to new roles, facilitating the evolution of new developmental pathways.
These hypotheses make testable predictions concerning the diversification of gene families. For example, the number of related genes in a particular gene family can be estimated by application of the polymerase chain reaction (PCR) using degenerate primers; although this technique may not detect all related genes, it has the advantage of being applicable to multiple species (essential for the comparative approach needed). Furthermore, even if only a subset of genes within a gene family is cloned, from a few key species, molecular phylogenetic analyses can reveal relationships between genes and hence the pathways and timings of gene duplication. Linkage analysis by genomic walking or chromosome in situ hybridization is also now widely applicable, and can be used to distinguish tandem duplication from polyploidy. Of course, it should not be expected that all duplicated genes are retained in the genome after duplication; unused genes could be deleted or scrambled during evolution with little consequence. In addition, it seems possible that even genes that were once essential could be secondarily lost. Nonetheless, applying PCR, genomic library screening and molecular phylogenetic analysis to multiple gene families, in multiple chordate species, should allow the general patterns of genome evolution to be elucidated.
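Degenerate primers are mixtures of related oligonucleotides, which is what allows a single reaction to amplify several members of a gene family. As a small sketch of this idea, the Python function below counts how many distinct sequences a primer written in standard IUPAC ambiguity codes represents. The primer sequence is hypothetical, chosen only for illustration; it is not one of the primers used in the studies discussed here.

```python
from math import prod

# Number of distinct bases represented by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,  # two-base ambiguity codes
    "B": 3, "D": 3, "H": 3, "V": 3,                  # three-base ambiguity codes
    "N": 4,                                          # any base
}


def primer_degeneracy(primer: str) -> int:
    """Return how many distinct oligonucleotides a degenerate primer mixture contains."""
    return prod(IUPAC_DEGENERACY[base] for base in primer.upper())


# Hypothetical 17-mer, for illustration only: four R, one Y and one N.
example = "GARYTNGARAARGARTT"
print(example, primer_degeneracy(example))
```

Because degeneracy multiplies across positions, each added ambiguous position broadens the range of family members that can be primed but dilutes every individual sequence in the mixture; this trade-off is one reason why PCR surveys may miss some related genes.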
Here we examine the evidence for gene duplications during chordate evolution, comparing the conclusions drawn from genome size comparisons with the insights now possible from gene cloning. All protein-coding genes reported to date from amphioxus are reviewed in an evolutionary context, and two amphioxus Wnt genes are reported. Tunicate genes are compared where relevant; we also report the PCR cloning of a Hox gene from an appendicularian. We then consider alternative ways in which duplication of developmental control genes could contribute to the evolution of vertebrate development, and assess these alternatives in the light of in situ hybridization analyses of amphioxus homeobox gene expression.
CAN GENOME SIZE GIVE ANY EVOLUTIONARY INSIGHTS?
Since the cephalochordates (amphioxus) are generally thought to be the closest extant relatives of the vertebrates (Fig. 1), genome comparisons between amphioxus and vertebrates may yield clues to the genetic events that accompanied the evolution of developmental innovations at the origin of vertebrates. Atkin and Ohno (1967) reported the haploid genome of the amphioxus Branchiostoma lanceolatum to be approximately 0.6 pg, about 17% of the value for placental mammals. This is considerably larger than seen in many tunicates (for example, Ciona at 0.2 pg), but similar to the smallest vertebrate genomes (for example, puffer fish at 0.5 pg; see also Brenner et al., 1993). This led Ohno (1970) to suggest that genome enlargement by tandem gene duplications and/or polyploidy occurred in a common ancestor of amphioxus and vertebrates, but not significantly in the immediate vertebrate ancestors. Taking into account the genome sizes of mammals, birds and reptiles, he also suggested the occurrence of one or more additional rounds of tetraploidy in our Devonian fish or amphibian ancestors (Ohno, 1970, 1993).
These proposed timings of genome expansion do not correlate with vertebrate origins. Does this mean that gene duplications did not play an important role in the origin of the complex vertebrate body plan? Not necessarily, since genome size may be only a very approximate indicator of gene number: for example, repetitive DNA comprises from 20% to over 50% of metazoan genomes, and this fraction is prone to dramatic changes in evolution, probably without concomitant changes in gene number (Lewin, 1990). Furthermore, the distribution of genome sizes within the fishes, together with phylogenetic considerations, makes it very unlikely that the extremely compact puffer fish genome is representative of early vertebrates. Puffer fish, being members of the order Tetraodontiformes, occupy a very derived phylogenetic position within the ray-finned fish, and have genome sizes well below the modal value for fishes; this unusually small genome size must be secondarily derived, unless one is willing to accept very frequent, but independent, genomic expansion events in many divergent fish lineages (P. E. Ahlberg, unpublished analyses). If amphioxus is compared instead with the living members of the earliest vertebrate lineages, hagfish and lampreys, significantly larger genomes are indeed seen in the vertebrates (haploid values 1.4–2.8 pg). Furthermore, it has been shown that the brook lamprey genome (at 1.4 pg) is not complicated by very recent tetraploidy (Ward et al., 1981); hence, it may be valid to use it as an approximate guide to early vertebrate genome size. Of course, modern lampreys could have secondarily expanded or compacted genomes, in which case it would not be valid to infer early vertebrate genome size from them.
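The genome size comparison can be made explicit with a few lines of arithmetic. The sketch below uses only the haploid values quoted above and expresses each as a fold change relative to amphioxus and as a number of implied genome doublings (log2 of the fold change). Treating size differences as whole-genome doublings is a simplifying assumption for illustration; as noted, repetitive DNA can change genome size without any change in gene number.

```python
from math import log2

# Approximate haploid genome sizes (pg) quoted in the text.
GENOME_SIZE_PG = {
    "Ciona": 0.2,
    "amphioxus": 0.6,
    "puffer fish": 0.5,
    "brook lamprey": 1.4,
    "jawless vertebrate, upper value": 2.8,
}

# Amphioxus is about 17% of the placental mammal value, implying roughly 3.5 pg.
MAMMAL_ESTIMATE_PG = GENOME_SIZE_PG["amphioxus"] / 0.17


def fold_change(size_pg: float, reference_pg: float) -> float:
    """Genome size relative to a reference genome."""
    return size_pg / reference_pg


def implied_doublings(size_pg: float, reference_pg: float) -> float:
    """log2 of the fold change; negative when the genome is smaller than the reference."""
    return log2(size_pg / reference_pg)


reference = GENOME_SIZE_PG["amphioxus"]
for taxon, size in GENOME_SIZE_PG.items():
    print(f"{taxon:32s} {fold_change(size, reference):5.2f}x "
          f"{implied_doublings(size, reference):+6.2f} doublings")
print(f"placental mammal (estimate)      {MAMMAL_ESTIMATE_PG:.1f} pg")
```

On these figures the brook lamprey genome is about 2.3 times that of amphioxus (roughly 1.2 doublings) and the upper jawless vertebrate value about 4.7 times (roughly 2.2 doublings). This is compatible with, but cannot by itself demonstrate, one or two rounds of duplication after the amphioxus lineage diverged.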
This assumption would be testable if genome sizes could be measured in representatives of other (now extinct) jawless vertebrate lineages. Perhaps surprisingly, this may be feasible, since the outlines of cells are preserved in some fossils. Cell outlines give an estimate of cell volume, which in turn is an approximate indicator of genome size within vertebrates (if the same cell type is compared between species). The feasibility of this approach was demonstrated by Conway Morris and Harper (1988), who estimated genome size in extinct conodonts (thought to be an ancient lineage of jawless vertebrates; Sansom et al., 1992). These analyses need extending to other lineages; at present, however, the data from both living and fossil jawless vertebrates support the contention that significant genome enlargement occurred at vertebrate origins (after divergence of the amphioxus lineage).
EVOLUTIONARY INSIGHTS FROM AMPHIOXUS HOMEOBOX GENES
More accurate insight into the evolution of vertebrate genome organization will undoubtedly come from cloning and phylogenetic analysis of gene families in representatives of several protochordate and vertebrate lineages. Phylogenetic considerations make amphioxus a particularly important protochordate for gene family analysis, since its lineage diverged after the urochordates but before the diversification of vertebrates (Fig. 1). Of particular interest in these analyses will be multigene families implicated in the control and coordination of developmental processes (for example, homeobox and Wnt genes), since their molecular evolution may give insight into the origin of vertebrate developmental control. Relatively few protein-coding genes have been cloned from amphioxus, but these include genes related to vertebrate transcription factors, growth factors, signalling molecules, structural proteins and enzymes. In this section and the next, we look at every example published to date.
One group of homeobox genes for which comparative surveys have been undertaken in the chordates is the Msx gene family. Three members of this gene family were cloned from the mouse genome by PCR (Holland, 1991b); two of these are known to be expressed in cranial neural crest-derived mesenchymal tissue and in complementary patterns at many sites of tissue interaction during development (including during branchial arch development, palate development, tooth morphogenesis and development of the paired eyes; Davidson and Hill, 1991). Aspects of these expression patterns are certainly functional; for example, a point mutation in the homeobox of the human MSX2 gene is thought to be one cause of a skull abnormality, craniosynostosis (Jabs et al., 1993), whilst deletion of mouse Msx1 by gene targeting causes a range of cranial defects (Satokata and Maas, 1994). Many, although not all, of the expression sites of the Msx-1 and Msx-2 genes are vertebrate-specific features (Holland, 1992); hence it is intriguing to ask whether amphioxus has homologues of these genes.
To date, we have succeeded in isolating only a single member of the Msx homeobox gene family from the genome of Branchiostoma floridae, both by PCR (Holland et al., 1994) and by genomic library screening (A. Sharman and P. W. H. H., unpublished data). This parallels the results from an ascidian, but contrasts with the multiple Msx genes present in a teleost fish, Brachydanio rerio (Holland, 1991b). These results are consistent with the hypothesis that gene duplications in this gene family occurred in the vertebrate lineage after divergence of amphioxus; however, a wider survey of vertebrates must be completed before the timing of duplication can be ascertained.
Comparative data are sparser for three other homeobox gene families analyzed in amphioxus: the En, Cdx and XlHbox8-related genes. In the latter two cases, PCR has identified a single homologue to date in B. floridae (Holland et al., 1994); the Cdx genes, at least, form a multigene family in mammals (Gamer and Wright, 1994). The size of the XlHbox8 gene family is unknown in any species; within vertebrates, representatives have been cloned from Xenopus (Wright et al., 1989), mouse (Ohlsson et al., 1993) and rat (Miller et al., 1994). For both the Cdx and XlHbox8 gene families, additional species need to be analyzed (including jawless vertebrates) before the timing and extent of gene duplications can be ascertained.
For the largest homeobox gene family, the Hox genes, more extensive cloning and phylogenetic surveys have been undertaken. Mammals (and probably all higher vertebrates) have four similar clusters of Hox genes, homologous to the single Hox or HOM-C cluster of arthropods and nematodes (reviewed by Holland, 1992; Burglin and Ruvkun, 1993). Elucidating the number of Hox clusters in amphioxus and lower vertebrates is crucial to determining the time of Hox cluster duplication. In addition to cluster duplication, there is the question of tandem duplications within a cluster. The 38 mammalian Hox genes are divisible into 13 paralogous groups (containing genes related by the cluster duplication events); many of these groups are not present in arthropods and nematodes (Holland, 1992; Burglin and Ruvkun, 1993). Phylogenetic reconstructions suggest that the pre-duplication Hox cluster organization was not identical to any of the clusters of mouse or human (Kappen and Ruddle, 1993); hence, tandem duplications and/or gene losses must have occurred after cluster duplication. Amphioxus Hox genes could give clues to the timing of these events.
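A simple count shows why losses, or tandem duplications confined to particular clusters, must be invoked. Assuming, for illustration only, that each of the four mammalian clusters descended from an ancestral cluster with one member in every one of the 13 paralogous groups:

```latex
\[
N_{\text{positions}} = n_{\text{clusters}} \times n_{\text{groups}} = 4 \times 13 = 52,
\qquad
N_{\text{empty}} = N_{\text{positions}} - N_{\text{genes}} = 52 - 38 = 14 .
\]
```

Under that assumption 14 of the 52 possible positions are unoccupied; if the ancestral cluster in fact lacked some groups, part of this gap would instead reflect tandem duplications after cluster duplication.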
The first amphioxus Hox gene published was AmphiHox3 (Holland et al., 1992) from Branchiostoma floridae; the complete gene sequence showed that this gene is homologous to paralogous group 3 of mammalian genes. This assignment suggests that the tandem duplication event that yielded paralogous groups 2 and 3 (both related to the Drosophila pb gene) predated the divergence of amphioxus and vertebrates; it cannot be dated more accurately at present. The number of Hox clusters in the amphioxus genome has been estimated in two studies using PCR (Pendleton et al., 1993; Holland et al., 1994). Both studies utilized the same species (B. floridae) and identified multiple Hox genes. From analysis of the deduced translation products of short Hox clones, Pendleton et al. (1993) conclude that “the amphioxus data are in good agreement with a two cluster model”; however, from similar data Holland et al. (1994) conclude that there is “probably a single Hox cluster”. The difficulty in determining the number of clusters stems partly from the fact that PCR primers capable of amplifying a broad spectrum of Hox genes can yield only up to 82 nucleotides of unique sequence from each gene. This is often insufficient to assign a gene accurately to a paralogous group (Garcia-Fernandez and Holland, unpublished data). To overcome this problem, and resolve the discrepancy, we have isolated genomic clones of ten amphioxus Hox genes and mapped their genomic organisation. We find there is a single cluster of Hox genes in the amphioxus genome (Garcia-Fernandez and Holland, unpublished data).
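The difficulty of assigning short PCR clones can be put in numbers. An 82-nucleotide fragment encodes at most 27 complete codons, so each amino acid difference between two deduced fragments shifts their percentage identity by roughly 3.7 points. The minimal Python sketch below computes identity over pre-aligned, equal-length fragments; the two example sequences are arbitrary toy strings, not real Hox sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical residues between two pre-aligned, equal-length fragments."""
    if len(seq_a) != len(seq_b):
        raise ValueError("fragments must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


CODONS = 82 // 3                       # 27 complete codons in an 82-nucleotide fragment
STEP_PER_DIFFERENCE = 100.0 / CODONS   # about 3.7 percentage points per residue

# Arbitrary 27-residue toy strings (not real Hox sequences) differing at two positions.
reference = "ACDEFGHIKLMNPQRSTVWY" + "ACDEFGH"
query = reference[:-2] + "WW"
print(f"{percent_identity(reference, query):.1f}% identity; "
      f"each difference shifts identity by {STEP_PER_DIFFERENCE:.1f} points")
```

With so few positions, candidate paralogous groups that differ at only one or two residues within the fragment are separated by a handful of characters, which is why longer genomic sequence and linkage data were needed to settle the cluster number.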
It is interesting to compare our one-cluster model for amphioxus Hox genes (Holland et al., 1994) with PCR results obtained for a lamprey (Pendleton et al., 1993). Despite the difficulty in assigning PCR clones to paralogous groups, the 19 Hox genes identified in Petromyzon marinus are consistent with lampreys having at least two, and perhaps three or four, Hox clusters. This suggests that the initial Hox cluster duplication(s) occurred in the lineage leading to the first vertebrates, after the divergence of amphioxus.
EVOLUTIONARY INSIGHTS FROM OTHER AMPHIOXUS GENES
The first clues to gene family complexity in amphioxus, predating the homeobox results discussed above, came from Chan et al. (1990). These authors reported that Branchiostoma californiensis has a single insulin-like gene (ILP), homologous to three gene family members in mammalian genomes (insulin, IGF-1 and IGF-2); the deduced mature protein sequence shares equal identity with each of the three human proteins. The simplest explanation is that amphioxus retains a single member of this gene family, and that insulin gene duplications occurred on the vertebrate lineage, after divergence of amphioxus and vertebrates. One of the duplication events occurred very early on the vertebrate (or pre-vertebrate) lineage, since both hagfish and lampreys possess an insulin gene and at least one IGF gene (Nagamatsu et al., 1991). Remarkably, evidence for this very ancient duplication may still be present in the human genome: IGF-1 maps to chromosome 12, within a region of paralogy to chromosome 11 that contains the insulin and IGF-2 genes (Brissenden et al., 1984; Lundin, 1993). It should be possible to test whether this paralogy is genuinely the result of a very early duplication event (of a chromosome, chromosomal region or entire genome) by examining the genes linked to the ILP gene in amphioxus.
The Mn superoxide dismutase (Mn SOD) gene and an intermediate filament (IF) gene have also been cloned from amphioxus (Smith and Doolittle, 1992; Riemer et al., 1992). The former appears to be a single-copy gene in all animals studied, implying that gene duplications during vertebrate ancestry did not affect every gene (or that subsequent gene loss has returned some gene families to singletons). The intermediate filament genes could be an informative source of data on duplications, since in mammals they form five subfamilies (types I to V), each containing multiple genes. Exhaustive surveys have not been carried out in amphioxus; the one gene reported to date is clearly a type III gene (vimentin/desmin family), consistent with the idea that initial subdivision of the IF gene superfamily predated the divergence of amphioxus and vertebrates (Riemer et al., 1992).
It would be interesting to know the number of amphioxus genes in each IF gene subfamily, particularly since mammalian gene mapping studies reveal that some of the ‘within group’ duplications almost certainly coincided with Hox cluster duplications. For example, within both the type I (acidic cytokeratin) and the type II (basic cytokeratin) gene families, there are related genes very closely linked to the HOXB and HOXC gene clusters on human chromosomes 17 and 12, respectively (Bentley et al., 1993; Lundin, 1993).
There are several other cases where members of a gene family are chromosomally linked to more than one mammalian Hox cluster; in each case, their origin by chromosomal or genome duplication may have coincided with Hox cluster duplication. Possible examples of ‘co-duplicated’ gene families include (in addition to the cytokeratins): collagen genes, retinoic acid receptor genes, Evx homeobox genes, erythrocyte band 3-related genes, glucose transporter genes, actin genes, GLI/ciD zinc finger genes, myosin light chain genes, some Wnt genes (but see below) and the neuropeptide Y/pancreatic polypeptide genes (gene mapping data from Bentley et al., 1993; Lundin, 1993). Extrapolating from data on the timing of Hox cluster duplications (Pendleton et al., 1993; Holland et al., 1994; Garcia-Fernandez and Holland, unpublished data), we suggest that expansion of many of these gene families occurred close to vertebrate origins. We do not, however, discount the possibility that additional duplication events occurred in these gene families during the subsequent evolutionary radiation of the vertebrates.
The Wnt gene family is an interesting case from the perspective of gene duplications. These genes encode an extensive family of secreted proteins implicated in cell-cell signalling during vertebrate and invertebrate embryogenesis (Nusse and Varmus, 1992). Sidow (1992) investigated the diversification and molecular evolution of the Wnt gene family by phylogenetic analysis of 72 partial Wnt gene sequences isolated from a diversity of vertebrates, echinoderms and Drosophila. The results suggested that Wnt-1, -3, -5 and -7, and one or more ancestors of Wnt-2, -4, -6 and -10, were probably present in the genome of the last common ancestor of arthropods and vertebrates. Later duplications of Wnt-3, -5, -7, -8 and -10 (giving rise to, for example, Wnt-3a and -3b) occurred before the diversification of jawed vertebrates, perhaps after divergence of the hagfish lineage.
We used PCR to search for Wnt genes in amphioxus genomic DNA, since no protochordate genes were included in the original analysis. After cloning of the amplified band, sequence analysis of 12 recombinants revealed just two amphioxus Wnt genes (Fig. 2). The phylogenetic position of amphioxus and the history of gene duplications in the Wnt family (Sidow, 1992) imply that amphioxus should have additional Wnt genes, unless they were lost during its evolution. It will be particularly interesting to determine whether amphioxus has homologues of those genes that are duplicated in jawed vertebrates (Wnt-3, -5, -7, -8 and -10).
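How informative is a sample of 12 recombinants? Under the simplifying assumption that every Wnt gene in the genome is amplified and cloned with equal efficiency (which need not hold in practice), the expected number of distinct genes seen among n clones drawn from k genes is k[1 - (1 - 1/k)^n]. The sketch below evaluates this for a few hypothetical values of k; these gene numbers are illustrative, not estimates for amphioxus.

```python
def expected_distinct(k: int, n: int) -> float:
    """Expected number of distinct genes among n clones drawn from k equally represented genes."""
    return k * (1.0 - (1.0 - 1.0 / k) ** n)


CLONES_SEQUENCED = 12
for k in (2, 4, 6, 8):  # hypothetical numbers of Wnt genes amplified
    print(f"k = {k}: expect {expected_distinct(k, CLONES_SEQUENCED):.1f} distinct genes")
```

Under these assumptions, recovering only two genes from 12 clones would be surprising if several Wnt genes were amplified equally well, so unequal amplification by the degenerate primers is one plausible reason why additional genes predicted from the phylogeny were not recovered.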
EVOLUTIONARY INSIGHTS FROM TUNICATE GENES
In the examples discussed above, the assumption is made that gene family organization in amphioxus is primitive with respect to the vertebrate condition. To test this assumption, the comparison can be extended to an outgroup. The urochordates (tunicates) may be useful for this comparison, since they are thought to be the sister group to the clade comprising amphioxus plus vertebrates (Fig. 1). In addition, this comparison could help evaluate the hypothesis that, after divergence of the tunicates, substantial gene duplications occurred on the lineage leading to amphioxus plus vertebrates (Ohno, 1970).
The majority of tunicates belong to the Ascidiacea, a group of animals widely used in developmental studies. Consequently, many genes and gene families have been cloned from ascidians. For the present purposes, however, we are concerned only with members of those gene families also analyzed in amphioxus and several vertebrates. One example alluded to above was the Msx homeobox gene family; PCR analyses suggest that the ascidian Ciona intestinalis probably has a single member of this gene family (Holland, 1991b), as also found for amphioxus. Hence in this example, Msx gene duplications postdated the amphioxus-vertebrate divergence.
Perhaps surprisingly, at the time of writing, few homeobox genes from the Hox family have been reported from ascidians. Single Hox genes have been isolated from Phallusia mammillata (W. Gehring and Paul Baumgartner, personal communication) and, by PCR, from several other ascidians (Ruddle et al., this volume). In addition, in the ascidian Halocynthia roretzi, screening of genomic Southern blots and a cDNA library using Antp as a probe yielded only the divergent homeobox gene AHox1 (Saiga et al., 1991).
We decided to examine the complexity of the Hox gene family in a group of tunicates related to the ascidians, the appendicularians. These are small (1.5 mm) pelagic tunicates with an adult morphology similar to ascidian tadpole larvae (Fig. 3A); they do not metamorphose into a sessile stage. We reasoned that these animals may be a good outgroup for comparison with amphioxus and vertebrates, since recent studies of sperm morphology suggest that they derive from a more basal lineage within the tunicates than do the ascidians (Fig. 1), and they possibly have a less highly modified morphology and life cycle (Holland et al., 1988; Holland, 1991a).
Using degenerate primers (complementary to Hox genes from paralogous groups 1 to 10), we employed PCR to search for Hox genes in genomic DNA of Oikopleura dioica. After cloning of the amplified band, we determined the DNA sequence of 19 recombinant clones. All clones were identical (or differed by at most one nucleotide), and were presumed to derive from the same Oikopleura Hox gene (Fig. 3B). Failure to clone additional genes does not disprove their presence in the genome, but similar PCR conditions have yielded multiple Hox genes in many other metazoan species (Averof and Akam, 1993; Pendleton et al., 1993; Holland et al., 1994). It would be surprising if Oikopleura dioica, or other tunicates, had only a single Hox gene. An alternative possibility is that some aspect of genome organisation, codon usage or sequence divergence in tunicates causes inefficient cloning or PCR amplification. Further work is necessary to resolve these alternatives. Possession of a single Hox gene cannot be the primitive state within the chordates, since wider comparisons with arthropods, echinoderms and nematodes indicate that a cluster of at least five Hox genes predated the origin of the chordates (Burglin and Ruvkun, 1993). Hence, despite the phylogenetic position of tunicates, it may be difficult to address satisfactorily the question of exactly when genome duplications occurred during the very early phases of chordate radiation.
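The weight of the Oikopleura result can be sketched in the same way. If the genome contained k Hox genes that were amplified and cloned with equal efficiency, the chance that all 19 sequenced clones came from a single gene would be k(1/k)^19 = k^-18, which is below one in 250,000 even for k = 2. The second function relaxes the equal-efficiency assumption: with strongly biased amplification, a single-gene result becomes quite likely, which is why inefficient amplification remains a plausible alternative. Both the gene numbers and the amplification weights are hypothetical, and clones differing by one nucleotide are treated as deriving from one gene, as in the text.

```python
from fractions import Fraction
from typing import Sequence


def prob_single_gene(k: int, n: int) -> Fraction:
    """Chance that n clones all come from one of k equally amplified genes: k * (1/k)**n."""
    return k * Fraction(1, k) ** n


def prob_single_gene_weighted(weights: Sequence[float], n: int) -> float:
    """Same probability when genes are amplified with unequal relative efficiencies."""
    total = sum(weights)
    return sum((w / total) ** n for w in weights)


CLONES_SEQUENCED = 19
print(float(prob_single_gene(2, CLONES_SEQUENCED)))             # two equally amplified genes
print(prob_single_gene_weighted([0.9, 0.1], CLONES_SEQUENCED))  # hypothetical 9:1 PCR bias
```

With a hypothetical 9:1 amplification bias between two genes, roughly one survey in seven would still recover only the favoured gene, so the calculation cannot distinguish a genuinely small Hox complement from biased amplification.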
TIMING OF GENE DUPLICATIONS
Table 1 summarizes the data on timing of gene duplications directly inferred from cloning of amphioxus genes. These data, and the above discussions, suggest that different gene families followed different patterns of diversification during the evolutionary radiation of chordates. Some gene families apparently remained stable without duplication (Mn SOD gene), whilst some gene duplications occurred after vertebrate origins (e.g. some Wnt gene duplications and one IGF gene duplication). A common theme is expansion of gene families on the ancestral lineage of the vertebrates, after the lineage leading to amphioxus had diverged. Examples include duplication of the Hox cluster, Msx gene, Cdx gene and the ancestral insulin/IGF gene, plus probably duplication of several genes chromosomally linked to the Hox clusters.
These gene family analyses can be used to evaluate previous hypotheses concerning the evolution of vertebrate genomes. Ohno suggested that there were two principal phases of gene duplication on the lineage leading to higher vertebrates (Ohno, 1970, 1993; see also Lundin, 1993). He postulated that genome expansion occurred (a) before vertebrate origins (predating divergence of amphioxus and vertebrates), and (b) during vertebrate radiation (in Devonian fish or amphibia).
We believe that the available comparative data from amphioxus, tunicates and jawless vertebrates (see above) suggest either of two different scenarios. Either (1) there was one major phase of genome expansion, involving two rounds of extensive gene duplication (perhaps by complete or partial tetraploidization), close to the origin of the vertebrates; or (2) there were indeed two phases, but the first was close to the origin of the vertebrates and the second was close to the origin of the jawed vertebrates or gnathostomes (Fig. 4). At present we favour the second of these models, since it is compatible with data from the Hox, Wnt, insulin/IGF and En gene families.
We propose that the first phase of gene duplication, close to vertebrate origins, included the initial Hox cluster duplication and an insulin gene duplication. Evidence for the former comes from our demonstration of a single Hox gene cluster in amphioxus (Holland et al., 1994; Garcia-Fernandez and Holland, unpublished data), coupled with PCR data suggesting more than one cluster in lampreys (Pendleton et al., 1993). Together, these date the initial Hox cluster duplication to after divergence of the amphioxus lineage, but before the emergence of lampreys. Suggestive evidence for similar timing of the first insulin gene duplication comes from the presence of a single ILP gene in amphioxus (Chan et al., 1990), but two genes (insulin and IGF) in lampreys and hagfish (Nagamatsu et al., 1991).
We suggest the second phase of duplication was close to gnathostome origins, and probably included further Hox cluster duplication, a second IGF duplication, duplication of the En gene and expansion of the Wnt gene family. Pendleton et al. (1993) suggest lampreys have three Hox clusters; we believe their data could reflect a two-cluster model. Either way, a second Hox cluster duplication event is implied, after divergence of the lamprey lineage, possibly around the origin of jawed vertebrates. The second IGF duplication is implied by the existence of a single IGF gene in lampreys and hagfish (Nagamatsu et al., 1991), but two in the jawed vertebrates analyzed. Evidence that several Wnt gene duplications may also have occurred close to jawed vertebrate origins comes from PCR surveys of hagfish and gnathostomes (Sidow, 1992). The best placement of duplications affecting the Wnt-3, -5, -7 and -10 genes (in each case giving a and b paralogues) was after the divergence of the hagfish lineage, on the ancestral lineage of jawed vertebrates, although this placement is not statistically significant. Timing of an En homeobox gene duplication is inferred from the isolation of one En gene in lampreys, but two to three in teleosts, amphibia and mammals (Holland and Williams, 1990). The two En genes in hagfish could be the result of a separate duplication event, as suggested by Holland and Williams (1990); this may be related to recent, independent tetraploidy in this lineage (see Ward et al., 1981). The Cdx and Msx homeobox gene families also expanded on the ancestral lineage of the vertebrates, but there are fewer clues to timing (see above).
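The reasoning used throughout this section, dating a duplication to the interval between the last lineage with fewer copies and the first with more, can be written as a short routine. The sketch below walks a simplified ladder of lineages (amphioxus, lamprey, jawed vertebrates) using minimum copy numbers quoted in the text, skipping taxa for which no count is reported. It assumes no gene losses and treats lampreys as representative of jawless vertebrates, so it is a parsimony sketch rather than a phylogenetic analysis; the function and variable names are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

# Simplified ladder of lineages on the path to jawed vertebrates.
LADDER = ("amphioxus", "lamprey", "jawed vertebrates")

# Minimum copy numbers as quoted in the text (None = not reported here).
COPY_NUMBERS = {
    "Hox clusters": (1, 2, 4),        # lamprey: two-cluster reading of the PCR data
    "insulin/IGF genes": (1, 2, 3),   # ILP; insulin + IGF; insulin, IGF-1, IGF-2
    "En genes": (None, 1, 2),         # jawed vertebrates: two to three
    "Msx genes": (1, None, 3),        # jawed vertebrates: three in mouse
}


def place_duplications(counts: Sequence[Optional[int]],
                       taxa: Sequence[str] = LADDER) -> List[Tuple[str, int]]:
    """Return (interval, extra copies) for each increase in copy number along the ladder.

    Assumes no gene loss. Taxa with unknown counts are skipped, so an increase that
    spans an unsampled taxon is placed on the wider interval.
    """
    placements = []
    last_taxon, last_count = None, None
    for taxon, count in zip(taxa, counts):
        if count is None:
            continue
        if last_count is not None and count > last_count:
            placements.append((f"{last_taxon} -> {taxon}", count - last_count))
        last_taxon, last_count = taxon, count
    return placements


for family, counts in COPY_NUMBERS.items():
    print(family, place_duplications(counts))
```

The Msx entry illustrates why timing is uncertain for that family: without a count from a jawless vertebrate, the increase can only be placed somewhere between the amphioxus divergence and the jawed vertebrates.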
This two-phase model for gene family expansion during early vertebrate evolution is testable by analysis of other gene families. Resolution will require careful phylogenetic analysis of extant gene sequences, combined with gene family and genome analysis in protochordates, jawless vertebrates, teleost fish and other chordate taxa.
FROM DUPLICATION TO INNOVATION
The hypothesis that mutations in regulatory genes could underlie evolutionary change in embryonic development is now widely accepted (Ohno, 1970; Raff and Kaufman, 1983; Arthur, 1988; Holland, 1992). But different types of mutation could play different roles in developmental evolution. For example, it seems unlikely that minor alterations to the coding sequence or expression pattern of a regulatory gene could allow the origin of completely new morphological features or developmental processes, at least in the majority of cases. Duplication of regulatory genes (followed by divergence of one or both daughter genes) seems more likely to precede the origin of such developmental innovations. Gene duplication would not necessarily cause major developmental alteration; rather, it is considered permissive to subsequent phenotypic change. This hypothesis, therefore, does not propose the creation of radically altered ‘hopeful monsters’ in a single generation, but envisages new genes being made available to the gradual modifying effects of natural selection and genetic drift.
The hypothesis predicts a correlation between the origin of new regulatory genes and the emergence of new cell behaviours, body regions or structures (Holland, 1992); furthermore, significant changes to body organisation may correlate with significant genome changes simultaneously affecting several gene families. The scenario presented in the previous section for the timing of gene duplications during vertebrate evolution suggested there may have been two phases of gene duplication: one close to vertebrate origins and a second close to jawed vertebrate origins. Both stages of chordate evolution involved significant developmental changes; in addition, both phases of genome expansion involved duplication of developmental regulatory (and other) genes.
The origin of vertebrates involved the evolution of several innovations in developmental strategy, notably the involvement of neural crest and placodes in craniofacial morphogenesis, elaboration of the brain (origin of the midbrain?), and specialisation of the segmented mesoderm (Holland, 1992; Gans, 1993; Holland and Graham, 1994). In terms of anatomy, the differences between extant jawed vertebrates and jawless vertebrates are less dramatic than between vertebrates and protochordates, but important developmental transformations can be inferred. These include the origin of paired appendages and a remodelling of the anterior branchial arches to form the jaws and jaw support apparatus (these transformations need not have occurred simultaneously).
It seems at least plausible, therefore, that multiple new genes originating close to vertebrate origins, and close to jawed vertebrate origins, permitted the evolution of these developmental innovations. Without new sets of genes, developmental control may have been constrained from further elaboration; significant gene duplication may have allowed release from these genetic constraints, allowing rapid adaptive radiation of the first vertebrates and, later, the jawed vertebrates.
These hypotheses require new genes to be recruited to new roles after duplication. How could this occur? One possible route would be evolutionary modification of the coding sequence of a gene, such that it is optimized for a function different from that of the ancestral gene. This may be accompanied by changes in gene regulation to allow deployment at a new site or new time. An example that shows the feasibility of this pattern of evolution (although not relating to the early vertebrates) is the lysozyme gene family in mammals. Lysozyme gene duplication in the ruminant mammal lineage was followed by changes in protein sequence and expression, allowing lysozyme to be co-opted to a digestive role in the foregut of cows, sheep and relatives (Irwin et al., 1992). In contrast, a recent experimental analysis of the prd gene family in Drosophila demonstrates that adaptive divergence of protein-coding sequences does not always accompany functional divergence after duplication (Li and Noll, 1994). The related genes prd, gsb and gsbn apparently encode functionally equivalent proteins; their divergent roles may have evolved solely by changes in deployment. We suggest this latter route to functional diversification may have occurred frequently.
At present, there are limited clues to the mechanisms that allowed functional divergence within regulatory gene families during vertebrate evolution. From the inferred timing of gene duplication, and the expression patterns of the mammalian gene family members, a hypothesis can be proposed regarding functional evolution of the vertebrate Msx homeobox genes (Holland, 1991b, 1992). As described above, two of the three mammalian Msx genes resulting from duplication are predominantly expressed in vertebrate-specific tissues, including craniofacial neural crest derivatives, developing teeth and eyes. We suggest that these expression characteristics reflect co-option to new functions at vertebrate origins; the origin of Msx-1 and Msx-2 might even have permitted the evolution of new patterns of cell behaviour and differentiation and new developmental processes. This hypothesis implies that Msx-1 and Msx-2 acquired new control elements after gene duplication, leaving Msx-3 to persist with an ancestral function. Further insight will come from analysis of the expression pattern of the third mammalian paralogue, Msx-3, and comparison with the single amphioxus Msx gene. These analyses are in progress (S. Shimeld and P. Sharpe, personal communication; A. Sharman and P. W. H. H., unpublished data).
The evolution of Hox gene function following cluster duplication is discussed in detail by Gaunt (1991) and Holland (1992). The expression patterns of mouse (and other vertebrate) Hox genes suggest that Hox and Msx genes followed different courses of functional diversification. Each mammalian Hox gene is expressed within a precise, regionally restricted domain along the anteroposterior axis of the developing neural tube, and often also in a subset of tissues from somitic or lateral mesoderm and/or neural crest cells (for reviews, see Holland and Hogan, 1988; Shashikant et al., 1990; Gaunt, 1991). Hox genes related by cluster duplication have similar (but not always identical) expression patterns in the developing neural tube, but there are dramatic differences in which mesodermal and neural crest derivatives express these paralogues. Furthermore, gene targeting of Hox genes often causes more severe disruption in mesodermal and neural crest derivatives than in the neural tube, suggesting partial functional overlap in the latter. This suggests that most Hox genes retained ancestral roles in neural patterning after Hox cluster duplication, but that secondary expression sites and functions were added to these roles (perhaps by acquisition of additional cis-regulatory elements; Holland, 1992).
This evolutionary scenario, which was based primarily on data from mouse Hox genes, made testable predictions regarding the expression of amphioxus Hox genes. If the ancestral function of chordate Hox genes, prior to cluster duplication, was to encode positional information within the developing neural tube, then this should be the predominant (or only) expression site in amphioxus, which retains a single Hox cluster. Consistent with this prediction, the AmphiHox3 gene was found to be expressed predominantly in the developing amphioxus neural tube, where it respects a stable anterior boundary at the level of somite five. Expression is also seen in posterior mesoderm, but this does not respect a stable boundary and remains posteriorly localised through development (Holland et al., 1992, 1994; Fig. 5A).
Signals were not detected by whole-mount in situ hybridization on amphioxus larvae older than 5 days, perhaps due to technical problems relating to probe penetration (Holland et al., 1992). We therefore used radioactive in situ hybridization on histological sections to assess whether additional, secondary expression sites appear later in development. These experiments revealed that the predominant site of expression remains the dorsal nerve cord in adult and juvenile amphioxus; we find no consistent evidence for secondary expression sites (Fig. 5B).
Acquisition of completely new roles (as proposed for Msx genes) or the supplementation of ancestral roles with secondary roles (Hox genes) are just two of many possible routes for functional diversification of duplicated genes. It seems likely that, even if gene duplication events affected many (or all) gene families simultaneously in evolution, different gene families will have followed quite different routes of functional diversification. These patterns of evolution need to be examined in much more detail (including analysis of coding sequences, regulatory elements and function), and in many more gene families, if strong correlations are to be found between particular genetic and phenotypic changes in vertebrate evolution. Correlations should be tested by examination of multiple taxa, but cannot be considered proof of causality in evolution. Even so, by analysis of multiple gene families in many taxa, it should be possible to assess the hypothesis that gene duplications have played an important permissive role in the evolution of vertebrate development. Is it unrealistic to hope that insight will eventually be gained into the mutations that permitted the evolution of specific innovations, such as the origin of neural crest cells and placodes, or the transition from branchial arches to jaws?
ACKNOWLEDGEMENTS
We thank Walter Gehring, Frank Ruddle, Anna Sharman, Paul Sharpe and Seb Shimeld for communication of results prior to publication; Per Ahlberg for discussions; and Linda Holland, Nick Holland and the rest of Team Amphioxus for help with specimen collection. The authors’ research in this field was supported by the SERC (P. W. H. H., N. A. W.), a Royal Society Research Fellowship to P. W. H. H. and a Human Frontiers Fellowship to J. G. F.