The sporophyte generation of the brown alga Ectocarpus sp. exhibits an unusual pattern of development compared with the majority of brown algae. The first cell division is symmetrical and the apical-basal axis is established late in development. In the immediate upright (imm) mutant, the initial cell undergoes an asymmetric division to immediately establish the apical-basal axis. We provide evidence which suggests that this phenotype corresponds to the ancestral state of the sporophyte. The IMM gene encodes a protein of unknown function that contains a repeated motif also found in the EsV-1-7 gene of the Ectocarpus virus EsV-1. Brown algae possess large families of EsV-1-7 domain genes but these genes are rare in other stramenopiles, suggesting that the expansion of this family might have been linked with the emergence of multicellular complexity. EsV-1-7 domain genes have a patchy distribution across eukaryotic supergroups and occur in several viral genomes, suggesting possible horizontal transfer during eukaryote evolution.
Multicellular organisms with haploid-diploid life cycles are found in several major eukaryotic groups including the green lineage (Archaeplastida) and the red and brown macroalgae (Rhodophyta and Phaeophyceae, respectively). In these organisms, a single genome provides the genetic information to deploy two different developmental programmes during the course of the life cycle, leading to the construction of the sporophyte and gametophyte generations, respectively (Cock et al., 2014; Coelho et al., 2007). One consequence of this type of life cycle is that the emergence of developmental innovations for one generation of the life cycle can occur without it being necessary to evolve developmental regulatory modules de novo. This is because it is possible to adapt regulatory modules that have evolved to function during one generation of the life cycle to carry out related functions during the other generation. An important objective of the developmental biologists that study these organisms has been to understand the relative contributions of these two processes – developmental innovation and trans-generation co-option – to the evolution of multicellularity in these species (Dolan, 2009; Pires and Dolan, 2012; Shaw et al., 2011).
In the green lineage, embryophytes (which have haploid-diploid life cycles) are thought to have evolved from a green algal ancestor with a haploid life cycle by the addition of a sporophyte generation (Bower, 1890; Celakovsky, 1874; Dolan, 2009; Haig and Wilczek, 2006; Niklas and Kutschera, 2010; Qiu, 2008; Qiu et al., 2006). It has been proposed that the regulatory networks that controlled the development of early embryophyte sporophytes were recruited to a large extent from the gametophyte generation (Dolan, 2009; Niklas and Kutschera, 2010). Support for this viewpoint has come both from broad comparisons of gametophyte and sporophyte transcriptomes (Nishiyama et al., 2003; Szovenyi et al., 2011) and from demonstrations that homologues of key regulatory genes in embryophyte sporophytes play important roles in gametophyte function in bryophytes (Aoyama et al., 2012; Kubota et al., 2014; Menand et al., 2007; Nishiyama et al., 2003). There are, however, exceptions (Szovenyi et al., 2011). For example, members of the KNOX family of TALE homeodomain transcription factors are not expressed during the gametophyte generation in bryophytes and therefore appear to have evolved as sporophyte developmental regulators (Sano et al., 2005).
To more fully understand the relative contributions of developmental innovation and trans-generation co-option to the evolution of multicellularity in organisms with haploid-diploid life cycles, it would be of interest to investigate this phenomenon in several lineages that have independently evolved complex multicellularity. Not only would this allow the generality of inferences from studies of the green lineage to be assessed but it would also allow an evaluation of the importance of the ancestral state in subsequent evolutionary events. For example, the brown algae (Phaeophyceae) most probably evolved from an ancestor that alternated between simple, filamentous sporophyte and gametophyte generations (Kawai et al., 2003; Silberfeld et al., 2010). If this were the case, then the evolution of novel regulatory systems might have played a more important role in the emergence of novel developmental mechanisms than the co-option of regulators across generations in this phylogenetic group. Unfortunately, very little is known about developmental processes in the brown algae and, for example, no developmental regulatory genes have so far been characterised at the molecular level. However, the recent emergence of the filamentous alga Ectocarpus sp. as a model organism for this group (Cock et al., 2014; Cock et al., 2015; Coelho et al., 2012a) has created a context in which this type of question can be addressed.
Ectocarpus sp. has a haploid-diploid life cycle that involves alternation between two generations, which both consist of uniseriate filaments with a small number of different cell types and bearing simple reproductive structures (Cock et al., 2015). The morphological similarity of the two generations has allowed mutants affected both in switching between generations (Coelho et al., 2011) and in generation-related developmental processes (Peters et al., 2008) to be isolated. The immediate upright (imm) mutant is particularly interesting because it has major effects on the early development of the sporophyte generation but causes no visible phenotype during the gametophyte generation (Peters et al., 2008). In individuals that carry this mutation, the initial cell of the sporophyte generation undergoes an asymmetrical rather than a symmetrical cell division and produces an upright filament and a rhizoid rather than the prostrate filament typical of wild-type sporophytes (Peters et al., 2008). Individuals that carry this mutation therefore fail to implement the typical early sporophyte developmental programme and resemble gametophytes, but produce the sexual structures of the sporophyte generation at maturity. The absence of a phenotype during the gametophyte generation suggests that the developmental programme directed by the IMM gene might have been a sporophyte-specific innovation.
Here we describe the positional cloning of the IMM locus and show that this gene encodes a protein of unknown function that shares a novel, repeated motif with a viral protein. The IMM gene is part of a large, rapidly evolving gene family in Ectocarpus sp., and species with identifiable homologues exhibit an unusual distribution across the eukaryotic tree of life.
Positional cloning of the Ectocarpus IMM locus
Peters et al. (2008) showed that the imm mutation behaved as a recessive, Mendelian allele and was located on an autosome. To map this mutation, a backcrossed descendant of strain Ec137 carrying the imm mutation (Peters et al., 2008) was crossed with the outcrossing line Ec568 (Heesch et al., 2010) to generate a segregating family of 1699 haploid progeny. The IMM locus was then mapped genetically by scanning the genome for linked microsatellite markers and fine mapping the mutation. To scan for linked markers, a subset of the population (30 to 75 individuals) was genotyped for 97 microsatellite markers (Heesch et al., 2010) distributed at ∼30 cM intervals along the length of the entire genetic map. Additional markers were then generated for a region on chromosome 27 that exhibited co-segregation with the Imm+ phenotype and these were tested against the entire population to fine map the IMM locus. Overall, a total of 121 markers were genotyped (Table S1), allowing the IMM locus to be mapped to a region of 43.7 kb between coordinates 2,299,499 and 2,343,206 on chromosome 27 (Fig. 1A).
To identify the imm mutation within the 43.7 kb interval, this region was amplified as a series of PCR products and the pooled products sequenced on an Illumina HiSeq platform. As the Imm− phenotype is the result of a spontaneous mutation that was not originally present in the parent strain Ec17, we reconstructed reference sequences for the two parental haplotypes of this region by sequencing equivalent PCR products amplified from eight wild-type haploid siblings of the imm strain Ec137. Comparison of the sequence data for the eight siblings with that from the Ec137 strain allowed polymorphisms inherited from the diploid parent sporophyte to be distinguished from the causal mutation. Sanger resequencing was used to validate polymorphisms detected by the Illumina sequencing approach and to generate sequence data for several short regions that were not covered by the Illumina sequence data. This approach identified a 2 bp deletion within exon five of gene Ec-27_002610 as the causal mutation of the Imm− phenotype (Fig. 1A). No other mutations were detected within the mapped interval.
The IMM gene encodes a protein of unknown function that is related to a brown algal virus protein
The IMM gene (locus Ec-27_002610) is predicted to encode an 862 amino acid (91.8 kDa) protein that consists of a long N-terminal domain that shares no similarity with other Ectocarpus sp. genes or genes from other species, plus a C-terminal domain that includes five imperfect tandem repeats of a 38 amino acid cysteine-rich motif (C-X4-C-X16-C-X2-H-X12, Fig. 1B). The 38 amino acid cysteine-rich motif is very similar to a cysteine-rich repeated motif found in the EsV-1-7 protein of the Ectocarpus virus EsV-1 (Delaroque et al., 2001), which also contains five of these repeated motifs. Based on this similarity, hereafter we refer to the 38 amino acid cysteine-rich motif as an EsV-1-7 repeat.
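The motif pattern given above (C-X4-C-X16-C-X2-H-X12) can be expressed as a regular expression to scan a protein sequence for candidate repeats. The following is a minimal sketch only: the function name and the toy sequence are hypothetical, the sequence is not the IMM protein, and a strict pattern of this kind would miss imperfect repeats, so it should not be read as the search procedure used in this study.

```python
import re

# C-X4-C-X16-C-X2-H-X12: 1 + 4 + 1 + 16 + 1 + 2 + 1 + 12 = 38 residues per repeat.
ESV_1_7_REPEAT = re.compile(r"C.{4}C.{16}C.{2}H.{12}")


def find_repeats(protein):
    """Return (start, matched_text) for each non-overlapping strict match."""
    return [(m.start(), m.group()) for m in ESV_1_7_REPEAT.finditer(protein)]


# Hypothetical toy sequence (NOT the IMM protein): a short N-terminal stretch
# followed by two tandem copies of an idealised 38-residue repeat unit.
unit = "C" + "A" * 4 + "C" + "G" * 16 + "C" + "S" * 2 + "H" + "L" * 12
toy_protein = "MKT" + unit + unit

hits = find_repeats(toy_protein)
for start, text in hits:
    print(start, len(text), text)
```

Because real repeats are described as imperfect, a profile-based search would be a more tolerant choice than an exact pattern.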
The 2 bp deletion in the imm mutant causes a frameshift in the part of the gene that encodes the N-terminal domain. The mutation is predicted to lead to the production of a 418 amino acid protein with a truncated N-terminal domain and possessing no EsV-1-7 repeats (Fig. 1B).
Disruption of IMM function by RNA interference
Recent work has demonstrated that injection of double-stranded RNA into zygotes of the brown alga Fucus induces an RNA interference (RNAi) response, leading to knockdown of target gene expression (Farnham et al., 2013). RNAi therefore represents a potential approach to investigate gene function in brown algae, but a modification to the Fucus protocol was required because microinjection is not feasible for Ectocarpus due to the small size of its cells. We therefore developed an alternative approach in which synthetic siRNA molecules were introduced into naked gametes using a transfection reagent (see Materials and Methods for details).
Wild-type gametes that fail to fuse with a gamete of the opposite sex can develop parthenogenetically to give rise to partheno-sporophytes. These partheno-sporophytes go through the same developmental steps as diploid sporophytes derived from zygotes and are morphologically indistinguishable from the latter. In both cases the initial cell undergoes a symmetrical division that gives rise to two germ tubes, which grow to form a symmetrical, prostrate basal filament (Peters et al., 2008). The basal filament, which is composed of characteristic round and elongated cells, adheres strongly to the substratum. Following simultaneous introduction of three siRNA molecules targeting the IMM gene transcript, a small proportion [0.63±0.09% (mean±s.d.) in six replicates each of 400 individuals] of the parthenogenetic gametes adopted a pattern of early development that closely resembled the phenotype of the imm mutant (Peters et al., 2008). These gametes underwent an asymmetrical rather than a symmetrical initial cell division and the two germ tubes of the developing partheno-sporophyte gave rise to an upright filament and a rhizoid (Fig. 2). Individuals with this phenotype were not observed in parallel samples of gametes treated with an siRNA directed against a green fluorescent protein gene sequence as a control (six replicates of 400 individuals) and the difference between the test and control experiments was highly significant (Pearson's χ²=13.0, P=0.0003).
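The comparison between siRNA-treated and control gametes can be illustrated with a chi-square test on a 2×2 table. The sketch below rests on two assumptions that are not stated in the text: that 0.63% of 2400 treated individuals corresponds to about 15 individuals with the imm-like phenotype (the per-replicate counts are not given), and that a continuity correction was applied. Under these assumptions the statistic and P value land close to the reported values.

```python
import math


def yates_chi2_2x2(a, b, c, d):
    """Pearson chi-square with Yates' continuity correction for [[a, b], [c, d]].

    Returns (chi2, p) with p taken from the chi-square distribution with 1 df.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    cells = ((a, row1, col1), (b, row1, col2), (c, row2, col1), (d, row2, col2))
    chi2 = 0.0
    for observed, row_total, col_total in cells:
        expected = row_total * col_total / n
        chi2 += max(0.0, abs(observed - expected) - 0.5) ** 2 / expected
    # For 1 degree of freedom the survival function is erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# ASSUMED counts: about 15 of 2400 treated gametes with the imm-like phenotype,
# 0 of 2400 in the control (six replicates of 400 individuals each).
chi2, p = yates_chi2_2x2(15, 2385, 0, 2400)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```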
These observations indicated that RNAi-induced knockdown of IMM gene expression had the same developmental consequences as the imm mutation, at least in a small proportion of the treated individuals. Taken together with the mapping of the genetic mutation, this observation confirmed that the 2 bp deletion identified in exon five of gene Ec-27_002610 is the causal mutation of the Imm− phenotype.
Expression pattern of the IMM gene
qRT-PCR analysis indicated that the IMM gene transcript was approximately twice as abundant in diploid sporophytes and partheno-sporophytes as in gametophytes (Fig. 3). The relatively high abundance of the IMM transcript during the gametophyte generation was surprising because no visible phenotype was detected during this generation in the imm mutant (Peters et al., 2008). The transcript was less abundant in imm mutant partheno-sporophytes than in wild-type partheno-sporophytes (Fig. 3), suggesting that the mutation has a destabilising effect on the transcript.
Analysis of gene expression in the imm mutant sporophyte
In a previous study we analysed gene expression in the imm mutant using a microarray constructed with sequences from two subtraction libraries enriched in genes differentially expressed during either the sporophyte or the gametophyte generation (Peters et al., 2008). This analysis indicated that sporophyte-specific genes were downregulated and gametophyte-specific genes were upregulated in the imm mutant during the sporophyte generation. Based on this information, and the morphological resemblance of the imm sporophyte to the wild-type gametophyte, the Imm− phenotype was interpreted as representing partial homeotic switching from the sporophyte to the gametophyte developmental programme (Peters et al., 2008). Here we used multiple RNA-seq datasets to compare the imm transcriptome with a broader range of samples, including two microdissected partheno-sporophyte tissue samples corresponding to the apical upright filaments and the basal system, respectively. Principal component analysis (PCA) indicated that, overall, the transcriptome of the imm partheno-sporophyte was actually more similar to the transcriptomes of wild-type partheno-sporophyte samples, particularly samples that included upright filaments, than to wild-type gametophyte samples (Fig. 4).
We therefore reanalysed the expression patterns of sets of genes that had previously been identified as significantly upregulated or downregulated in the imm mutant partheno-sporophyte compared with the wild-type partheno-sporophyte [denoted as imm upregulated (IUP) and imm downregulated (IDW) genes by Peters et al. (2008)]. This analysis indicated that IUP and IDW genes tended to be upregulated and downregulated, respectively, in gametophyte samples but they also showed very similar patterns of expression in upright filaments isolated from the sporophyte generation (Fig. S1). Bearing in mind that the gametophyte generation consists almost entirely of upright filaments, these results suggested that the IUP and IDW genes, rather than being life cycle-regulated genes, might correspond to loci that are differentially regulated in upright filaments compared with basal tissues.
To further investigate this possibility, a genome-wide analysis was carried out using RNA-seq data and DESeq2 (Love et al., 2014) to identify additional genes that were differentially expressed in the imm partheno-sporophyte compared with the wild-type partheno-sporophyte. This analysis identified 1578 genes that were significantly differentially expressed between the two samples (1087 upregulated and 491 downregulated in imm; Table S2). Again, analysis of expression patterns across several different samples indicated that the majority of these genes did not exhibit life-cycle-generation-specific expression patterns but rather were upregulated or downregulated in upright filaments (Fig. S2).
Taken together, these observations suggest an alternative interpretation of the Imm− phenotype. Rather than representing a mutation that causes switching between life cycle generations, we propose that abrogation of the IMM gene leads to failure to correctly implement the early developmental programme of the sporophyte. In the absence of a functional IMM gene, the initial cell does not divide symmetrically and there is no deployment of a system of basal filaments before the establishment of the apical-basal axis (Fig. 2A). Rather, an asymmetrical division of the initial cell directly produces a basal rhizoid cell and an apical thallus cell. We suggest that the resemblance to the gametophyte, in terms of gene expression, is not due to switching to the gametophyte developmental programme but is instead due to the sporophyte adopting an alternative developmental programme that is more similar to that of the gametophyte (immediate production of an upright filament).
The 1578 genes that were identified as significantly differentially expressed between the imm partheno-sporophyte and the wild-type partheno-sporophyte were analysed for enriched gene ontology categories. One significant (FDR<5%) category, ‘G-protein signalling’, was found for the upregulated genes and two categories, ‘photosynthesis-related’ and ‘RNA polymerase II activity’, for the downregulated genes (Tables S3 and S4).
IMM is a member of a large gene family in Ectocarpus sp.
A search of the Ectocarpus sp. genome identified a large family of 91 genes that encode proteins with at least one EsV-1-7 domain, indicating that IMM is part of a large gene family in this species. The 91 predicted proteins contain between one and 19 copies of the cysteine-rich motif (six on average), with multiple motifs being organised as adjacent tandem repeats in almost all cases. Eighty-seven of the 91 predicted proteins (including IMM) did not consist solely of EsV-1-7 domains but contained at least one additional polypeptide region (of at least 25 contiguous amino acids and often considerably longer), although only two of these additional polypeptide regions contained known structural domains (a heavy metal-associated domain and an ABC transporter ATP-binding domain; Table S5). Clustering analysis indicated that in most cases these additional domains were unique. Only 19 proteins could be clustered based on similarities between these additional polypeptide regions (five clusters of two to nine proteins), and most of these clusters appeared to have arisen as a result of local gene duplications (eight linked duplicated gene pairs).
In addition to the loci described above, five putative pseudogenes were associated with the EsV-1-7 family (Table S5). Classification as pseudogenes was based either on the presence of a stop codon within the predicted coding region (one gene) or on the failure to detect any evidence of gene expression (TPM<1) across multiple tissue samples and life cycle stages (four genes). In addition, we noted that the 91 EsV-1-7 domain genes described above included three bi- or monoexonic loci with short open reading frames, which are often located near to or within the untranslated regions of neighbouring genes and might correspond to gene fragments.
Estimates of transcript abundances based on RNA-seq data indicated that the members of the Ectocarpus sp. EsV-1-7 domain gene family have diverse expression patterns (Fig. S3), suggesting that they carry out diverse functions at different stages of the life cycle and in different organs.
IMM orthologues and EsV-1-7 domain proteins in other species
Searches of the recently published Saccharina japonica genome (Ye et al., 2015) and multiple brown algal transcriptomes produced by the oneKP project (https://sites.google.com/a/ualberta.ca/onekp/) identified predicted proteins similar to IMM in a broad range of brown algal species, including members of the Ectocarpales, the Laminariales and the Fucales. Reciprocal BLAST searches indicated that a number of these proteins were IMM orthologues, and this conclusion was supported by the observation that best reciprocal BLAST matches to Ec-17_002150 (which is the member of the Ectocarpus sp. EsV-1-7 family that is most similar to IMM) formed a distinct cluster in a phylogenetic tree (Fig. S4).
Analysis of IMM orthologues from seven brown algal species (Ectocarpus sp., Scytosiphon lomentaria, S. japonica, Macrocystis pyrifera, Sargassum muticum, Sargassum hemiphyllum and Sargassum thunbergi) using the paired nested site models (M1a, M2a; M7, M8) implemented in PAML4 (CODEML; Yang, 2007) did not detect any evidence of positive selection acting on this protein. Pairwise dN/dS values ranged between 0.0229 and 0.2021.
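Paired nested site models of this kind are conventionally compared with a likelihood ratio test, in which twice the difference in log-likelihood is referred to a chi-square distribution with two degrees of freedom for both the M1a/M2a and M7/M8 pairs. The sketch below illustrates that calculation only. The function name is hypothetical and the log-likelihood values are invented for illustration; they are not results from this study.

```python
import math


def lrt_two_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by 2 free parameters.

    Returns (statistic, p). With 2 degrees of freedom the chi-square survival
    function has the closed form exp(-x / 2).
    """
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, math.exp(-statistic / 2.0)


# HYPOTHETICAL log-likelihoods (not from the paper): a null model (for example
# M1a) and a selection model (for example M2a) that fits only marginally better.
stat, p = lrt_two_df(lnl_null=-5230.4, lnl_alt=-5229.9)
print(f"2*delta lnL = {stat:.2f}, P = {p:.3f}")
```

A large P value, as in this toy case, would mean that the model allowing a class of sites with dN/dS > 1 does not fit significantly better than the null model.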
The large size of the EsV-1-7 domain family in Ectocarpus sp. and the presence of several pseudogenes and putative gene fragments suggested that this gene family might be undergoing a process of gene gain and gene loss over time. We searched for orthologues of the Ectocarpus sp. EsV-1-7 domain genes in the S. japonica genome (Ye et al., 2015) in order to investigate the evolutionary dynamics of the family. The last common ancestor of Ectocarpus sp. (order Ectocarpales) and S. japonica (order Laminariales) existed between 80 and 110 million years ago (Kawai et al., 2015; Silberfeld et al., 2010). An extensive search, which included de novo annotation of 19 new EsV-1-7 domain genes in S. japonica, identified orthologues for only 34 of the 91 Ectocarpus sp. genes (Table S6), and no orthologues were found for the five Ectocarpus sp. pseudogenes. Moreover, in general when an S. japonica orthologue was identified, it was considerably diverged from its Ectocarpus sp. counterpart. These results suggested that the EsV-1-7 gene families have diverged relatively rapidly since the divergence of the two species.
To evaluate whether the EsV-1-7 genes are evolving more rapidly than other genes in the genome, we compared the set of percentage identities between orthologous Ectocarpus sp. and S. japonica EsV-1-7 proteins with the percentage identities for a set of 9845 orthologous protein pairs obtained by reciprocal Blastp comparison of the predicted proteomes of the two species. Median identity for the orthologous pairs of EsV-1-7 proteins (31.1% identity) was significantly lower (Wilcoxon test P=9.2×10⁻⁷) than the genome-wide average (47.9%). Sex-biased genes have been shown to evolve rapidly in several, diverse species, including brown algae (Lipinska et al., 2015). The median identity for the EsV-1-7 proteins was also significantly lower (Wilcoxon test P=7.6×10⁻⁵) than that for 905 orthologous pairs of sex-biased proteins from Ectocarpus sp. and S. japonica (44.6%). Taken together, these analyses indicated that the EsV-1-7 family is evolving not only more rapidly than the average gene in the genome but also more rapidly than the set of sex-biased genes.
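The comparison of percentage identities between two sets of orthologous pairs can be illustrated with a rank-based test. Because the two sets are unpaired, the sketch below assumes a two-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and no tie correction of the variance; the exact variant used is not stated in the text. The identity values are hypothetical and serve only to show the mechanics.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (u, z, p), where u is the Mann-Whitney U statistic for sample x.
    Tied values receive midranks; the variance is not corrected for ties.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# HYPOTHETICAL percentage identities (not the paper's data).
family_identities = [25.0, 28.5, 31.1, 33.0, 36.2]
background_identities = [40.1, 44.6, 47.9, 51.3, 55.0, 60.2]
u, z, p = rank_sum_test(family_identities, background_identities)
print(f"U = {u}, z = {z:.2f}, P = {p:.4f}")
```

With samples as small as this toy example the normal approximation is crude; it becomes more appropriate for sets of the size compared here.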
A more detailed analysis of sequence conservation at the domain level indicated that the EsV-1-7 domains (i.e. the regions of the proteins containing the EsV-1-7 repeats) tended to be markedly more conserved than the other parts of the proteins (Fig. S5). The conservation of the EsV-1-7 domains at the sequence level suggests that these regions correspond to the main functional features of the proteins and it is possible that the regions outside these domains do not have specific functional roles in most cases. Note that the IMM orthologues from diverse brown algal species also exhibited this characteristic, with the EsV-1-7 repeat region being more strongly conserved across species than the other parts of the protein (e.g. Fig. S5).
A search of available stramenopile genomes identified single EsV-1-7 domain genes in each of the Nannochloropsis gaditana, Nannochloropsis oceanica and Pythium ultimum genomes, although the P. ultimum gene contained only one, poorly conserved repeat (Table S6). No EsV-1-7 domain genes were found in other stramenopile genomes, including those of the diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum. The small number of EsV-1-7 domain genes in stramenopiles other than the brown algae (zero or one gene per genome) suggests that the diversification of the brown algal EsV-1-7 domain genes occurred within the brown algal lineage.
Searches for homologues in other eukaryotic groups revealed a similar, patchy distribution of EsV-1-7 domain proteins across the eukaryotic tree (Fig. 5A, Table S6). The only other species in which EsV-1-7 domain proteins were detected were the cryptophyte Guillardia theta (37 genes), the green algae Coccomyxa subellipsoidea (four genes) and Monoraphidium neglectum (11 genes), and a single gene with one poorly conserved EsV-1-7 domain in the fungus Rhizophagus irregularis. The genome of the Ectocarpus virus EsV-1 contains a single EsV-1-7 domain gene and single homologues were also found in two distantly related viruses: Pithovirus sibericum (which infects Acanthamoeba) and Emiliania huxleyi virus PS401.
The sets of stramenopile, cryptophyte and chlorophyte EsV-1-7 domain genes each exhibited characteristic features in terms of the conserved residues within the repeated cysteine-rich motif (Fig. 5B), indicating that each group of genes has diversified independently.
In most brown algal species the first cell division is asymmetrical and defines not only the apical-basal axis but also establishes the separate apical and basal lineages that will give rise, respectively, to the thallus and to anchoring structures such as a rhizoid or a holdfast (Fritsch, 1959). The Ectocarpus sporophyte is unusual in that the first cell division is symmetrical and a system of basal filaments is deployed before the apical-basal axis is established. A functional IMM locus is necessary for this developmental programme to be implemented. When IMM is mutated a more canonical programme of early development is observed involving an asymmetrical initial cell division and the formation of a basal rhizoid and an apical upright filament. We originally proposed, based on phenotypic and gene expression analyses, that the Imm− phenotype represented partial switching from the sporophyte to the gametophyte developmental programme, but the more detailed analysis carried out in this study instead suggests that the Imm− phenotype corresponds to a modified version of the sporophyte developmental programme. We suggest, based on these results, that the developmental programme of the Ectocarpus sporophyte evolved through a modification of an ancestral developmental programme which more closely resembled that of the majority of brown algae, i.e. early establishment of the apical-basal axis and limited deployment of basal structures.
Recent work on the life cycles of Ectocarpus species under field conditions suggests that the sporophyte generation tends to persist for longer and to be the stage of the life cycle that allows the species to overwinter (Couceiro et al., 2015). The gametophyte generation, by contrast, tends to be ephemeral during the spring months and to have principally a reproductive role. Based on these observations, the developmental programme of the sporophyte could be interpreted as an adaptation to the role of this generation in the field. Delaying establishment of the apical-basal axis will tend to delay the transition to reproductive maturity because the reproductive structures develop on the apical upright filaments. Delayed reproduction may be advantageous, for overwintering for example. Moreover, the establishment of a network of strongly attached basal filaments composed of cells with thick cell walls could also contribute to survival during seasons that are not optimal for growth and reproduction.
Interestingly, no mutant phenotype was observed in imm gametophytes (Peters et al., 2008), suggesting that the processes regulated by IMM evolved specifically within the context of the sporophyte generation. The unusual early development of the sporophyte therefore represents an example of a major developmental innovation (modified timing of major axis formation, transition from asymmetrical to symmetrical initial cell division) that has evolved specifically for one generation of the life cycle. Surprisingly, although the IMM gene was expressed most strongly in the sporophyte, transcripts also accumulated, albeit at a lower level, during the gametophyte generation of the life cycle. Therefore, it is unclear at present how the effects of this gene are restricted to one generation.
The IMM protein sequence does not contain any regions that are similar to previously defined protein domains, but the C-terminal region contains several copies of a motif that we refer to as an EsV-1-7 repeat based on the presence of similar repeats in the EsV-1-7 gene of the Ectocarpus virus EsV-1. The large phenotypic changes and the extensive transcriptome reprogramming observed in the imm mutant indicate that IMM is likely to have a regulatory role, presumably mediated by this conserved domain, but further work will be required to determine how the protein functions at the molecular level. The four highly conserved cysteine and histidine residues in each EsV-1-7 repeat are reminiscent of several classes of zinc-finger motif, and one interesting possibility that would merit further investigation is that the motif represents a new class of zinc-finger motif.
The origin of the IMM gene itself is also of interest. All of the brown algae analysed possess multiple EsV-1-7 domain genes and detailed comparison of the EsV-1-7 domain gene families in Ectocarpus sp. and S. japonica indicated that the family has been evolving rapidly with considerable gene gain and gene loss during the diversification of the Ectocarpales and Laminariales. By contrast, EsV-1-7 domain genes are rare or absent from other stramenopile lineages (a single gene in eustigmatophytes and some oomycete species, none in diatoms). This observation suggests that the expansion of this family occurred after the brown algal lineage diverged from other major stramenopiles and was perhaps linked with important events within this lineage such as the emergence of complex multicellularity. The presence of IMM orthologues in diverse brown algae indicates that this gene evolved early during the diversification of this group but presumably IMM was only later co-opted to direct the unusual pattern of early sporophyte development observed in Ectocarpus sp. (i.e. during the diversification of the Ectocarpales).
Outside the stramenopiles, EsV-1-7 domain proteins were only found in two chlorophyte species and one cryptophyte species, with a possible single gene in the fungus Rhizophagus irregularis. This patchy distribution of EsV-1-7 domain genes across the eukaryotic tree is difficult to reconcile with vertical inheritance from a common ancestor. Given that (1) the short EsV-1-7 repeat is the only motif that is conserved across major lineages, (2) that this motif does show some sequence variation and (3) that the repeat motifs of each lineage have their own distinct, conserved characteristics (Fig. 5B), it is possible that the gene families of each major eukaryotic lineage evolved independently. However, this process would have had to involve remarkable, multiple convergences towards the use of several highly conserved residues in a specific configuration in each distinct lineage. In this context, it is interesting to note that EsV-1-7 domain genes were found in three diverse viral genomes and it is tempting to speculate that the highly unusual distribution of this gene family in extant eukaryotic lineages is the result, at least in part, of ancient horizontal transfers of EsV-1-7 domain genes due to cross-species viral infections.
MATERIALS AND METHODS
Ectocarpus strains and growth conditions
The Ectocarpus strains used to map the imm mutation are described in the next section. The imm mutant strain Ec419 (CCAP1310/321) was generated by crossing the original imm mutant strain [Ec137 (CCAP1310/319); Peters et al., 2008] with a wild-type sister Ec25 (CCAP1310/3) and selecting gametophyte descendants that produced Imm− partheno-sporophytes. The imm gametophyte generation was generated from these imm partheno-sporophytes by inducing the production of meiospores. The wild-type individuals used in this study were strain Ec32 (CCAP1310/4), a brother of the original imm mutant strain Ec137. Ectocarpus was cultivated as described previously (Coelho et al., 2012b).
Genetic mapping of the IMM locus
The imm mutation was originally detected in strain Ec137, a male descendant of a diploid sporophyte Ec17 (CCAP1310/193) isolated at San Juan de Marcona, Peru (Peters et al., 2008). Ec137 was crossed with a sister, Ec25, to generate the diploid sporophyte Ec372 (CCAP1310/320), which gave rise to a male gametophyte Ec420. To map the imm mutation, Ec420 (CCAP1310/322) was crossed with Ec568 (CCAP1310/334) and the resulting diploid sporophyte Ec700 gave rise to a population of 1699 gametophyte progeny, all derived from independent meiotic events (i.e. isolated from independently micro-dissected unilocular sporangia). Initially, 30 individuals from this segregating population were genotyped with 97 microsatellite markers distributed at ∼30 cM intervals along the length of the entire genetic map (Heesch et al., 2010). Additional microsatellite markers were developed based on the Ec32 genome sequence to fine map the mutant locus (Table S1).
Quantitative reverse transcriptase PCR (qRT-PCR) analysis of mRNA abundance
Total RNA was extracted as previously described (Coelho et al., 2011) from five or six biological replicates each of imm mutant partheno-sporophytes (Ec419), wild-type partheno-sporophytes (Ec32), wild-type heterozygous diploid sporophytes (Ec17) and wild-type gametophytes (Ec32). These samples were not the same as those used for the RNA-seq analysis. DNase treatment using the TURBO DNA-free Kit (Applied Biosystems) was carried out to eliminate any contaminating genomic DNA. RNA concentration and quality were determined by spectrophotometry and agarose gel electrophoresis. Between 0.2 and 2.0 μg of total RNA was reverse-transcribed to cDNA with the SuperScript First-Strand Synthesis System (Invitrogen).
IMM cDNA was amplified with primers (5′-3′) E657Q4F (GGGGTTTGGGTGGAAGAGGACC) and E657Q4R (CGGCGTGGAAGCTGCCTGGTAT) and ELONGATION FACTOR 1α (EF1α) cDNA was amplified as an internal reference with primers EF1adeL (CAAGTCCGTCGAGAAGAAGG) and EF1autrL (CCAGCAACACCACAATGTCT). qRT-PCR was carried out using the ABsolute QPCR SYBR Green ROX Mix (Thermo Scientific) in a Chromo4 thermocycler (Bio-Rad) and data were analysed with Opticon Monitor 3 software (Bio-Rad). Amplification specificity was checked using a dissociation curve. Amplification efficiency was tested using a genomic dilution series and was always between 90% and 110%. A standard curve was established for each gene using a range of dilutions of Ectocarpus sp. genomic DNA (between 80 and 199,600 copies) and gene expression level was normalised against the EF1α reference gene. Two technical replicates were carried out for the standard curves and three technical replicates for the samples.
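The standard-curve quantification and EF1α normalisation described above can be illustrated with a minimal Python sketch. It assumes that the thermocycler reports a threshold cycle (Ct) per well and that Ct is linear in log10(copy number); the function names and any numeric inputs are hypothetical and are not taken from the study. Efficiency is derived from the slope with the commonly used relation E = 10^(−1/slope) − 1, so the 90% to 110% range corresponds to E between 0.9 and 1.1.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def efficiency(slope):
    """Amplification efficiency as a fraction (1.0 means perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the template copy number."""
    return 10 ** ((ct - intercept) / slope)

def normalised_expression(target_ct, ref_ct, target_curve, ref_curve):
    """Target copy number divided by reference-gene copy number.

    Each curve is a (slope, intercept) pair from fit_standard_curve.
    """
    target = copies_from_ct(target_ct, *target_curve)
    reference = copies_from_ct(ref_ct, *ref_curve)
    return target / reference
```

In practice the Ct values passed to these functions would be means of the technical replicates, and a separate curve would be fitted for IMM and for EF1α.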
Small interfering RNAs (siRNAs) directed against the IMM gene transcript were designed using version 3.2 of E-RNAi (Horn and Boutros, 2010). The specificity of the designed siRNAs was determined by comparing the sequence (Blastn) with complete genome and transcriptome sequences. Candidates that matched, even partially, genomic regions or transcripts in addition to IMM were rejected. Three siRNAs with predicted high specificity corresponding to different positions along the IMM transcript were selected (Table S7). The control siRNA directed against jellyfish GFP (Caplen et al., 2001) was obtained from Eurogentec. siRNAs were introduced into Ectocarpus sp. strain Ec32 gametes using the transfection reagent HiPerFect (Qiagen). One microlitre each of 0.5 µg/µl solutions of HPLC purified siRNAs in 1× Universal siMAX siRNA Buffer (MWG Eurofins) was mixed with 12 µl HiPerFect transfection reagent in a final volume of 100 µl of natural seawater, vortexed to mix and incubated for 10 min at room temperature before being added dropwise to 100 µl freshly released gametes (Coelho et al., 2012b) in natural seawater in a Petri dish. After rotating gently to mix, the Petri dish was incubated overnight at 13°C. The following day, 10 ml Provasoli-enriched natural seawater (see below) was added and incubation continued at 13°C. RNAi-induced phenotypes were observed under a light microscope and the number of individuals that resembled the imm mutant was scored in lots of 400 individuals each for six experimental replicates. Control treatments were carried out in the same manner using an siRNA directed against a transcript that is not found in Ectocarpus sp. gametes (GFP).
RNA-seq analysis was carried out to compare the transcriptome of the imm mutant with wild-type individuals corresponding to either the sporophyte or gametophyte generations of the life cycle. Biological replicates (duplicates) were cultivated in 90 mm Petri dishes in natural seawater supplemented with 0.01% Provasoli enrichment (Starr and Zeikus, 1993) under a 12 h light:12 h dark cycle of white fluorescent light (10-30 µmol m−2 s−1 photon fluence rate). Total RNA was extracted from between 0.05 and 0.1 g of tissue as described previously (Coelho et al., 2011). Total RNA was quantified and cDNA, synthesised from an oligo(dT) primer for each replicate, was independently fragmented, prepared and sequenced by Fasteris (Plan-les-Ouates, Switzerland) using an Illumina HiSeq 2000 platform to generate 100 bp single-end reads. The quality of the sequence data was assessed using the FASTX toolkit (http://hannonlab.cshl.edu/fastx_toolkit) and the reads were trimmed and filtered using a quality threshold of 25 (base calling) and a minimal size of 60 bp. Only reads in which more than 75% of the nucleotides had a minimal quality threshold of 20 were retained. Filtered reads were mapped to the Ectocarpus sp. genome (Cock et al., 2010), available at ORCAE (Sterck et al., 2012), using TopHat2 with the Bowtie2 aligner (Kim et al., 2013), and the mapped sequencing data were processed with HTSeq (Anders et al., 2014 preprint) to count the numbers of sequencing reads mapped to exons. Expression values were represented as transcripts per million (TPM). Genes with TPM<1 in all samples were considered not to be expressed.
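The read trimming and filtering criteria given above can be expressed as a short Python sketch. This is an illustration of the stated thresholds rather than the FASTX toolkit itself. It assumes Phred+33 quality encoding and assumes that trimming removes low-quality bases from the 3′ end only; neither detail is specified in the text, and all function names are hypothetical.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string (Phred+33 encoding is assumed)."""
    return [ord(ch) - offset for ch in quality_string]

def trim_3prime(seq, qual, threshold=25):
    """Remove trailing bases whose quality is below the threshold (assumed 3' trimming)."""
    scores = phred_scores(qual)
    end = len(scores)
    while end > 0 and scores[end - 1] < threshold:
        end -= 1
    return seq[:end], qual[:end]

def passes_filter(qual, min_quality=20, min_fraction=0.75):
    """True if more than min_fraction of the bases reach min_quality."""
    scores = phred_scores(qual)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_quality)
    return good / len(scores) > min_fraction

def clean_read(seq, qual, trim_q=25, min_len=60, filt_q=20, filt_frac=0.75):
    """Apply trimming, the minimum-length check and the quality filter.

    Returns the trimmed (sequence, quality) pair, or None if the read is rejected.
    """
    seq, qual = trim_3prime(seq, qual, trim_q)
    if len(seq) < min_len:
        return None
    if not passes_filter(qual, filt_q, filt_frac):
        return None
    return seq, qual
```

For a 100 bp read, a 30-base low-quality tail leaves 70 bases and the read is kept, whereas a 50-base tail leaves only 50 bases and the read is discarded.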
Heat maps schematically representing gene transcript abundances were generated using the log2 transformed TPM values centred by gene in Cluster 3.0 (de Hoon et al., 2004) and Treeview 1.1.6 (Saldanha, 2004).
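The two numerical steps behind these heat maps, conversion of exon read counts to TPM and centring of log2 values by gene, can be sketched as follows. This is a minimal illustration with hypothetical function names. It uses the standard TPM definition (length-normalised counts rescaled to sum to one million per library) and is not the Cluster 3.0 implementation.

```python
def tpm(counts, lengths_bp):
    """Convert per-gene read counts for one library to transcripts per million.

    counts and lengths_bp are parallel lists (exonic length in base pairs).
    """
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

def centre_by_gene(log_values):
    """Subtract a gene's mean log2 value across samples from each sample."""
    mean = sum(log_values) / len(log_values)
    return [v - mean for v in log_values]
```

Centring by gene removes differences in absolute abundance between genes, so the heat map colours reflect relative changes across samples.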
PCA of transcript abundances was carried out using log2 transformed TPM expression values in R (prcomp) and visualised with the rgl package for R (R Development Core Team, 2009). Genes with TPM<1 were considered not expressed and their log2 values were set to 10^−5 to remove noise from the data. Only genes with divergent expression patterns were used for the whole-transcriptome PCA (4381 genes). These genes were chosen based on a calculation that estimated expression variance across all libraries (ΔTPM=max TPM−min TPM) and the third quartile (harbouring genes with ΔTPM>32) was chosen for the PCA calculation.
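The gene selection and transformation that precede the PCA can be sketched in Python as follows. This sketch interprets the selection as retaining genes whose ΔTPM exceeds the reported cutoff of 32, which is one reading of the quartile criterion, and interprets the noise-removal step as replacing the log2 value with a small constant when TPM<1. The function names and the example gene identifiers are hypothetical.

```python
import math

def delta_tpm(values):
    """Range of expression across libraries: max TPM minus min TPM."""
    return max(values) - min(values)

def select_variable_genes(expression, cutoff=32.0):
    """Keep genes whose delta TPM exceeds the cutoff.

    expression maps a gene identifier to its list of TPM values, one per library.
    """
    return {gene: vals for gene, vals in expression.items()
            if delta_tpm(vals) > cutoff}

def log2_with_floor(values, min_tpm=1.0, floor=1e-5):
    """log2-transform TPM values; values below min_tpm are replaced by the floor."""
    return [math.log2(v) if v >= min_tpm else floor for v in values]

# Hypothetical example: geneA varies strongly across libraries, geneB does not.
example = {"geneA": [0.5, 64.0, 8.0], "geneB": [10.0, 12.0, 11.0]}
selected = select_variable_genes(example)
transformed = {gene: log2_with_floor(vals) for gene, vals in selected.items()}
```

The transformed matrix would then be passed to a PCA routine such as prcomp in R.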
Identification and analysis of the Ectocarpus sp. EsV-1-7 domain gene family
Ectocarpus sp. EsV-1-7 domain proteins were identified by iteratively blasting IMM and other EsV-1-7 domain protein sequences against the predicted proteome (Blastp). Clusters of closely related genes within the Ectocarpus sp. EsV-1-7 domain family were identified by generating a similarity network based on comparisons of the non-repeat regions of the proteins with the EFI-EST similarity tool (http://efi.igb.illinois.edu/efi-est/index.php).
Searches for IMM orthologues and members of the EsV-1-7 family in other genomes
Saccharina japonica EsV-1-7 domain genes were identified either by blasting Ectocarpus sp. protein sequences against the predicted proteome (Blastp) or against the S. japonica genome sequence (tBlastn) (Ye et al., 2015). The coding regions of novel EsV-1-7 domain genes detected in the genome (19 genes) were assembled using GenomeView (Abeel et al., 2012) and the publicly available genome and RNA-seq sequence data (Ye et al., 2015). The same approach was used to improve the gene models for five of the EsV-1-7 domain genes previously reported by Ye et al. (2015), as indicated by adding 'mod' for modified to the protein identifier. Orthology between Ectocarpus sp. and S. japonica EsV-1-7 family proteins was determined using reciprocal BLAST analysis combined with manual comparisons of protein alignments.
To analyse the rate of evolution of the EsV-1-7 family of proteins, a set of 9845 orthologous pairs was first identified by comparing the complete predicted proteomes of Ectocarpus sp. and S. japonica and retaining reciprocal best Blastp matches. Global percentage identities based on alignments of the full protein sequences were then calculated for pairs of orthologous proteins using EMBOSS Needle (Li et al., 2015), and the set of percentage identities for the 34 orthologous pairs of EsV-1-7 proteins was compared with that of the complete set of 9845 orthologue pairs. To compare the EsV-1-7 proteins with known fast-evolving proteins, a similar analysis was then carried out with a set of 905 Ectocarpus sp. sex-biased genes that have one-to-one orthologues in S. japonica (Lipinska et al., 2015). The statistical significance of differences between datasets was evaluated with a Wilcoxon test and bootstrap resampling with replacement (10,000 replicates). EMBOSS Needle was also used to calculate shared identity between individual domains of orthologous EsV-1-7 family proteins from Ectocarpus sp. and S. japonica.
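Two steps of this analysis lend themselves to a short illustration: the retention of reciprocal best matches and the bootstrap comparison of percentage identities. The Python sketch below is hedged in two ways. The best-hit tables are assumed to have been parsed from Blastp output beforehand. The bootstrap design (median identity of samples drawn with replacement from the background set, each the same size as the EsV-1-7 set) is one plausible design, since the exact procedure is not described in the text. All names are hypothetical.

```python
import random
import statistics

def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a.

    Each argument maps a query protein to its single best-scoring subject.
    """
    return sorted((a, b) for a, b in best_a_to_b.items()
                  if best_b_to_a.get(b) == a)

def bootstrap_p_value(subset, background, replicates=10000, seed=0):
    """Fraction of resampled medians that are at or below the subset median.

    Samples of len(subset) identities are drawn with replacement from
    background; a small value suggests the subset is less conserved.
    """
    rng = random.Random(seed)
    observed = statistics.median(subset)
    n = len(subset)
    at_or_below = 0
    for _ in range(replicates):
        sample = [rng.choice(background) for _ in range(n)]
        if statistics.median(sample) <= observed:
            at_or_below += 1
    return at_or_below / replicates
```

Here `subset` would hold the percentage identities of the 34 EsV-1-7 orthologous pairs and `background` those of the complete set of orthologous pairs.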
Searches for proteins related to IMM from additional species were carried out against the NCBI, Uniprot and oneKP databases using both Blastp and HMMsearch (EMBL-EBI), the latter using both an alignment of Ectocarpus IMM homologues and an alignment of brown algal IMM orthologues. In addition, HMMsearch and tBlastn or Blastp searches were carried out against the following genomes and complete deduced proteomes: Thalassiosira pseudonana (diatom; Thaps3 assembled and unmapped scaffolds, http://genome.jgi-psf.org/Thaps3/Thaps3.download.ftp.html) (Armbrust et al., 2004), Phaeodactylum tricornutum (diatom; Phatr2 assembled and unmapped scaffolds, http://genome.jgi-psf.org/Phatr2/Phatr2.download.ftp.html) (Bowler et al., 2008), Aureococcus anophagefferens (Pelagophyceae; http://genome.jgi-psf.org/Auran1/Auran1.download.ftp.html) (Gobler et al., 2011), Emiliania huxleyi (haptophyte; http://genome.jgi.doe.gov/Emihu1/Emihu1.download.ftp.html) (Read et al., 2013), Chlorella variabilis NC64A (chlorophyte; http://genome.jgi.doe.gov/pages/dynamicOrganismDownload.jsf?organism=ChlNC64A_1) (Blanc et al., 2010), Monoraphidium neglectum SAG 48.87 (chlorophyte; NCBI RefSeq assembly GCF_000611645.1) (Bogen et al., 2013) and Bathycoccus prasinos (chlorophyte; http://bioinformatics.psb.ugent.be/orcae/overview/Bathy) (Moreau et al., 2012). Finally, iterative searches using EsV-1-7 domain proteins from phylogenetically distant species were carried out against the NCBI and Uniprot databases to identify distantly related members of this gene family.
Phylogenetic trees were constructed based on ClustalW-generated alignments using the maximum likelihood approach (PhyML) implemented in Seaview (Gouy et al., 2010).
Sample sizes were chosen to allow adequate downstream statistical analysis.
We thank Toshiki Uji for providing RNA-seq data.
J.M.C. and S.M.C. designed the research. N.M., A.F.P., F.L. and D.S. performed the positional cloning and characterisation of the IMM gene. D.S., S.M.C., M.S. and A.H. developed and applied the RNAi knockdown approach. A.L., M.-M.P. and J.M.C. analysed data. J.M.C. wrote the article.
This work was supported by the Centre National de la Recherche Scientifique; Agence Nationale de la Recherche (project Bi-cycle ANR-10-BLAN-1727 and project Idealg ANR-10-BTBR-04-01); Interreg Program France (Channel)-England (project Marinexus); Université Pierre et Marie Curie; and the European Research Council (grant agreement 638240). F.L. and M.-M.P. were supported by grants from the China Scholarship Council and the Conseil Régional de Bretagne (SAD Program), respectively.
RNA-seq data from this article are available at the NCBI Sequence Read Archive under accession number SRP037532 (see Table S8 for details).
The authors declare no competing or financial interests.