Naegleria gruberi is a single-celled eukaryote best known for its remarkable ability to form an entire microtubule cytoskeleton de novo during its metamorphosis from an amoeba into a flagellate, including basal bodies (equivalent to centrioles), flagella and a cytoplasmic microtubule array. Our publicly available full-genome transcriptional analysis, performed at 20-minute intervals throughout Naegleria differentiation, reveals vast transcriptional changes, including the differential expression of genes involved in metabolism, signaling and the stress response. Cluster analysis of the transcriptional profiles of predicted cytoskeletal genes reveals a set of 55 genes enriched in centriole components (induced early) and a set of 82 genes enriched in flagella proteins (induced late). The early set includes genes encoding nearly every known conserved centriole component, as well as eight previously uncharacterized, highly conserved genes. The human orthologs of at least five genes localize to the centrosomes of human cells, one of which (here named Friggin) localizes specifically to mother centrioles.
Naegleria gruberi grows as an amoeba without flagella, centrioles or even cytoplasmic microtubules; it relies on an actin-based cytoplasmic cytoskeleton for chemotaxis and motility, and its mitotic spindle is contained within an intact nuclear envelope (Fulton, 1970; Fulton, 1977; Fulton and Dingle, 1971). However, when exposed to stressors such as changes in temperature or nutrient availability, Naegleria rapidly differentiates into a flagellate, forming a complete cytoplasmic microtubule cytoskeleton from scratch (Fig. 1) (Fulton and Dingle, 1967). This differentiation occurs synchronously – approximately 90% of cells assemble basal bodies (structures equivalent to centrioles) within a 15-minute window, followed by flagella approximately 10 minutes later (Fig. 1) (Fulton and Dingle, 1971). Although Naegleria assembles basal bodies de novo, protein incorporation occurs in the same order as that occurring during assembly of human centrioles (Fritz-Laylin et al., 2010a). The evolutionary distance of Naegleria from animals means that genes shared between Naegleria and humans were probably present in the ancestor of all eukaryotes (Cavalier-Smith, 2002; Ciccarelli et al., 2006) (for a review, see Fritz-Laylin et al., 2010b). Thus, Naegleria differentiation affords a unique opportunity to study ancestral features of centriole and flagellum assembly.
Interphase animal cells contain numerous microtubules emanating from microtubule organizing centers (MTOCs) called centrosomes. Centrosomes contain centrioles that are primarily composed of nine microtubule triplets, and the surrounding amorphous pericentriolar material (PCM) that anchors cytoplasmic microtubules. Centrioles are called basal bodies when they are used to organize axonemes, the microtubule core of eukaryotic cilia and flagella. These whip-like structures propel single-celled organisms and move fluids within multicellular organisms. Metazoan cells also have nonmotile cilia that function as ‘cellular antennae’ by gathering information about the surrounding environment using their varied signaling receptors (Marshall and Nonaka, 2006).
Proteomic analyses indicate that centrosomes and basal bodies contain many of the same proteins, a large number of which are thought to be functional components of centrioles (Andersen et al., 2003; Keller et al., 2005; Kilburn et al., 2007). However, only a handful of these proteins have been characterized functionally (Strnad and Gönczy, 2008). This is in part due to the technical difficulties associated with studying centriole assembly in most organisms. First, new centrioles usually assemble in association with a mature centriole, hindering proteomic characterization of assembly intermediates. Second, centriole assembly is usually tied to the cell cycle, rendering it difficult to distinguish centriole-specific genes from other induced cell cycle genes. And finally, de novo assembly (where centrioles are built in the absence of preexisting ones) usually occurs in a single cell or embryo, making proteomic or microarray-based approaches unfeasible. Here, we used the synchronous de novo basal body assembly pathway of Naegleria to overcome these technical roadblocks and identify genes used specifically for building basal bodies and flagella.
Results and Discussion
Flagella and basal body gene transcripts are induced with different kinetics
We isolated total RNA at 20-minute intervals during Naegleria differentiation (at 0, 20, 40, 60 and 80 minutes; Fig. 1) from three biological replicates (supplementary material Fig. S1A). The relative abundance of transcripts from each time-point was quantified using custom full-genome Naegleria DNA microarrays. Approximately 24% of Naegleria genes are induced at least twofold, and an additional 39% are reduced by at least 50% during the amoeba-to-flagellate transition (4065 and 6484 genes, respectively; P<0.01, after correction for multiple testing). Differentially regulated genes include those involved in stress responses (including Hsp20 and Hsp90; data not shown) and core metabolism (including glycolysis, Krebs cycle and pyruvate–acetate metabolism; data not shown), as well as cytoskeletal components.
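The percentages and thresholds quoted above can be made concrete with a short sketch. It assumes that the denominator is the full array content described in Materials and Methods (15,777 gene models plus 963 intergenic ORFs; the text does not state the denominator explicitly) and that 'induced at least twofold' and 'reduced by at least 50%' correspond to log2 ratios of at least +1 and at most −1 relative to the 0-minute sample. The function names and the example profiles are hypothetical and are not part of the original analysis.

```python
# Array content as described in Materials and Methods.
GENE_MODELS = 15_777
INTERGENIC_ORFS = 963
TOTAL_FEATURES = GENE_MODELS + INTERGENIC_ORFS  # assumed denominator


def percent_of_array(n_genes):
    """Percentage of array features represented by n_genes."""
    return 100.0 * n_genes / TOTAL_FEATURES


def classify(profile, threshold=1.0):
    """Classify a log2 expression profile relative to its first (t=0) value.

    profile: log2 intensities at 0, 20, 40, 60 and 80 minutes.
    'induced'  : some later point is >= threshold above t=0 (at least twofold).
    'reduced'  : some later point is >= threshold below t=0 (at most 50%).
    If both hold, the larger absolute change decides (an assumption made here).
    """
    ratios = [x - profile[0] for x in profile[1:]]
    up, down = max(ratios), min(ratios)
    if up >= threshold and up >= -down:
        return "induced"
    if down <= -threshold:
        return "reduced"
    return "unchanged"


print(round(percent_of_array(4065)))  # 24
print(round(percent_of_array(6484)))  # 39
print(classify([8.0, 10.5, 9.0, 8.5, 8.2]))  # induced (hypothetical profile)
```

Under these assumptions the reported gene counts reproduce the quoted 24% and 39%.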
Only a fraction of the thousands of induced genes are likely to be microtubule related. To aid in our search for uncharacterized and evolutionarily conserved centriole proteins, we focused on genes found in Naegleria and other flagellates but missing in non-flagellated organisms [the flagellar motility (FM) gene set (Fritz-Laylin et al., 2010b)]. FM members include genes specific to basal bodies and flagella but exclude genes such as that encoding α-tubulin that are also used by organisms without flagella. To permit analysis of general microtubule proteins involved in basal body and flagella formation, we added Naegleria homologs of known microtubule cytoskeleton proteins (Fritz-Laylin et al., 2010b). Finally, we also added 63 genes conserved in organisms that undergo amoeboid movement and missing in organisms that do not undergo amoeboid locomotion [the amoeboid motility (AM) gene set] (Fritz-Laylin et al., 2010b), to serve as a specificity control.
Overall, 78% of the FMs and 60% of the AMs have at least twofold induction or repression, respectively (P<0.01, after correction for multiple testing), providing large-scale confirmation of previous evidence that Naegleria differentiation is controlled at the transcriptional level (Lai et al., 1988; Levy et al., 1998). We next investigated whether the timing of gene expression was linked to function.
Cluster analysis of the expression data for these 310 genes (the AM and FM gene sets, and Naegleria homologs of known microtubule genes) resulted in five major gene clusters (A–E; Fig. 2). Clusters A and C consist primarily of genes found in the FM gene set and have increased expression during differentiation. However, the genes in cluster A reach peak expression levels by 20 minutes and begin decreasing in expression by 40 minutes, whereas the expression of genes in cluster C peaks by 40 minutes and remains high through to 80 minutes. Manual inspection revealed that the cluster with earlier expression contains many known centriole genes (Table 1), whereas the later expression cluster contains flagella genes (supplementary material Table S1). The general induction of basal body genes before flagella genes agrees with the fact that Naegleria assembles its basal bodies before it assembles its flagella (t=55 and t=65 minutes, respectively) (Fig. 1) (Fritz-Laylin et al., 2010a; Fulton and Dingle, 1971).
Centriole-enriched gene cluster
The 55 genes found in the centriole gene cluster include Naegleria homologs of seven genes whose products are thought to be required for assembly of the centriole or basal body: ε-, δ- and η-tubulin, SAS-4 (CPAP), SAS-6, centrin (Cen2) and POC1 (for references, see Table 1). This set represents the majority of components shown to be required specifically for centriole assembly that are conserved outside animals (Carvalho-Santos et al., 2010; Hodges et al., 2010; Strnad and Gönczy, 2008). Other core centriole genes not found in the cluster either have not been identified in the Naegleria genome (PLK4) or were not included in the microarray (BLD10).
The centriole-enriched gene cluster also encodes homologs of microtubule nucleation factors [γ-tubulin, GCP3 and GCP6 (Raynaud-Messina and Merdes, 2007)], as well as proteins required for general microtubule functions, such as the microtubule-severing protein katanin p60, which is known to localize to centrosomes (Hartman et al., 1998). This gene set also includes several genes encoding centrosome-localized proteins of unknown function, and eight completely uncharacterized genes (Table 1).
Axonemal dyneins are large protein complexes containing light, intermediate and heavy chains that slide microtubules past each other to produce flagellar movement. Surprisingly, the centriole-enriched cluster includes nine axonemal dynein light and intermediate chain homologs, as well as a homolog of kintoun (PF13) that is required for assembly of dynein arm complexes (Omran et al., 2008). By contrast, Naegleria dynein heavy chain genes are expressed later along with other flagella-specific genes. Assembly of dynein light and intermediate chain complexes can be genetically uncoupled from assembly of the dynein heavy chain in Chlamydomonas (Omran et al., 2008). Although there are many possible reasons for the early expression of dynein light and intermediate chains, but not heavy chains, one possibility is that Naegleria pre-assembles complexes of flagellar dynein intermediate and light chains and incorporates the heavy chains afterwards, with both steps preceding flagellar outgrowth.
Flagella-enriched gene cluster
The flagella-enriched gene cluster contains 82 genes (supplementary material Table S1), including genes encoding proteins used for transporting proteins to the base of the growing flagellum (BBS components BBS1–BBS5 and BBS7–BBS9) and within the flagellum to its growing tip (FLA3, kinesin 2, IFT20, IFT52, IFT57, IFT80, IFT88, IFT122 and IFT140), as well as structural components of the flagellum itself [including PF20 and PF16, RSP4 and Rib72 (Pazour et al., 2005)]. This gene set also includes 23 FM genes with homologs found in the Chlamydomonas flagella proteome (Pazour et al., 2005) but which are otherwise uncharacterized. Together, these data suggest that these proteins are probably core components of eukaryotic flagella and therefore prime candidates for future functional analyses.
To validate the putative flagellar components, we conducted a proteomic analysis of Naegleria flagella. We purified the flagella of ~4×10⁸ flagellate cells using low-speed centrifugation followed by a sucrose step-gradient. The resulting sample contained flagella and no visible cell bodies and comprised largely two proteins of a size similar to that of α- and β-tubulin (supplementary material Fig. S1, panel B), as is typical for clean flagellar preparations (e.g. Kowit and Fulton, 1974). MudPIT mass spectrometry analysis of the sample identified 415 proteins (supplementary material Table S2).
Of the 82 genes in the flagella-enriched gene cluster, 23 were also identified in our proteomics analysis (supplementary material Table S1), indicating that they are likely to be structural components of the flagellum itself (in contrast to proteins that might be required for flagellar function but are located within the cell body). Included in this overlap are seven flagellar-associated proteins (FAPs), which were identified in the proteomic analysis of Chlamydomonas flagella (Pazour et al., 2005) but remain otherwise uncharacterized. These proteins are therefore likely to be ancestral structural flagella components. The Naegleria flagellar proteome also includes a number of previously undescribed proteins (supplementary material Table S2), some of which might represent uncharacterized flagellar proteins.
Verification of putative centriole genes
The centriole-enriched gene cluster includes eight genes that have not previously been localized or otherwise characterized, which we refer to as ‘putative conserved centriole components’ (pCCCs; Table 1). Because orthologs of all centrosome-localized pCCCs can be found in a wide diversity of eukaryotes (supplementary material Table S3), they were probably present in the eukaryotic ancestor. To determine whether the pCCCs are likely to be centriole components, we transiently expressed N- and C-terminally GFP-tagged human orthologs of each pCCC in U2OS and HeLa human cell lines and used antibodies recognizing γ-tubulin to highlight centrosomes. To the eight unknown gene products, we added one whose homolog localizes to the base of the cilia of Caenorhabditis elegans (B9D2) and one that has only very recently been characterized in human cells (MOT52, also known as FOR20 or BBC20) (Sedjai et al., 2010). Five of the ten tagged proteins showed either diffuse cytoplasmic GFP or bright foci likely to be inclusion bodies (data not shown). This nonspecific localization neither supports nor rules out a centriole function. However, the remaining five localized within or near centrosomes using both N- and C-terminal GFP tags (Fig. 3A) and are described below.
First, MOT52 is found only in organisms with motile flagella (Merchant et al., 2007), and its homolog (BBC20) was found in the Tetrahymena basal body proteome (Kilburn et al., 2007). Recently, the human ortholog (FOR20) was reported to localize to pericentriolar satellites and to be involved in ciliary assembly (Sedjai et al., 2010). FOR20 is predicted to have a FOP dimerization domain (Pfam domain PF09398), required for the centrosomal localization of the FOP protein (Mikolajka et al., 2006). In transient transfections of both U2OS (Fig. 3A) and HeLa cells (data not shown), GFP-tagged FOR20 localized to multiple foci near centrosomes. A similar localization was reported using an antibody against FOR20 (Sedjai et al., 2010), thus validating our GFP tagging approach.
Second, B9D2 contains a B9 domain. The C. elegans homolog, TZA-1, localizes to the transition zone at the base of the cilium (Williams et al., 2008). Although the B9 domain has no known function, it is found in several proteins known to be localized to the centriole and/or basal body, including MKS1 (Dawe et al., 2007). Human B9D2 localized to centrosomes of U2OS cells, along with scattered foci throughout the cytoplasm (Fig. 3A).
Third, POC11 shows good conservation in many eukaryotes but has no identifiable domains other than a coiled-coil region. Localization of the human homolog (CCDC77) resulted in bright punctate spots within centrosomes of both U2OS and HeLa cells (Fig. 3A and data not shown, respectively), suggesting that POC11 represents a new family of centrosome proteins.
Fourth, SSA3 was so named for its predicted function in both motile and nonmotile flagella [SSA stands for ‘sensory, structural and assembly’ (Merchant et al., 2007)]. SSA3 has a conserved central region containing an ‘ELMO/CED12’ domain (Pfam domain PF04727), found in proteins that facilitate cytoskeletal rearrangements (Gumienny et al., 2001). SSA3–GFP-expressing cells contained diffuse cytoplasmic GFP, as well as centrosomal GFP in a small percentage (4%) of transfected cells that also displayed relatively small γ-tubulin foci (Fig. 3A). As γ-tubulin foci vary in size during the mammalian cell cycle (with small foci at G1- and early S-phases), SSA3 might localize to centrosomes in a cell-cycle-dependent manner.
Fifth, Friggin was originally named MOT37 for its predicted function in motile flagella (Merchant et al., 2007) and contains leucine-rich repeats (LRRs), which typically mediate protein–protein interactions [Pfam clan CL0022 (Kobe and Deisenhofer, 1994)]. Unexpectedly, the 542-residue human ortholog of MOT37 localized to only one of two γ-tubulin foci in both U2OS (Fig. 3A) and HeLa cells (data not shown), suggesting that it might be a component specific to either mature or immature centrioles.
Centrioles develop over two cell cycles, acquiring the basic nine-triplet pinwheel structure during the first cell cycle and, during the second cycle, various appendages that allow them to function as basal bodies for axonemal assembly. Several gene products have been shown to be involved in the assembly of appendages (Chang et al., 2003; Gromley et al., 2003; Lange and Gull, 1995; Mogensen et al., 2000; Ou et al., 2002), only one of which, ε-tubulin (Chang et al., 2003), is conserved outside animals and likely to be ancestral to all extant eukaryotes.
To investigate whether MOT37 is a component of either mother or daughter centrioles, we expressed GFP–MOT37 and stained cells with an antibody recognizing the mother centriole component cenexin (Lange and Gull, 1995). GFP–MOT37 consistently colocalized with cenexin (Fig. 3B), indicating that MOT37 is a mother-centriole-specific protein that we predict is involved in the developmental transition from immature to mature centrioles. Because MOT37 probably represents a second ancestral mother centriole protein, we have named this eukaryotic protein family ‘Friggin’ after Frigg, the Norse goddess of motherhood.
Understanding how centrioles and flagella assemble and function requires a full inventory of components. Previous studies have used proteomic approaches to attempt to identify a complete parts list for centrioles (Andersen et al., 2003; Keller et al., 2005; Kilburn et al., 2007) or flagella (for a review, see Inglis et al., 2006). Theoretically, proteomic analyses can identify all stably localized proteins, including those that are species specific or required for unrelated biological functions. By contrast, our analysis has identified comprehensive sets of genes required specifically for centriole and flagellar function, independent of their localization. Of eight previously uncharacterized genes that we predict are involved in centriole assembly, at least three have human orthologs with a centrosome-related localization. The conservation of these proteins in both Naegleria and human, probably spanning over a billion years of eukaryotic evolution (Brinkmann and Philippe, 2007), indicates that these proteins are important for centriole function. Our study adds significantly to the list of conserved centriole components, including an additional protein (Friggin) that is apparently specific to mature centrioles.
Our analyses extend previous observations (Fulton et al., 1995; Levy et al., 1998) of two major programs of transcription during Naegleria differentiation: an early round of transcription of basal body genes and a later round of flagellar genes (Fig. 2), a timing that mirrors the assembly of basal bodies before flagella (Fig. 1). Although we have limited our analysis to centriole and flagella genes, this represents only one aspect of the Naegleria amoeba-to-flagellate transition. A cursory analysis indicates that genes from other core pathways, including basic metabolism and the stress response, are also regulated differentially. We have deposited our microarray data in the NCBI Gene Expression Omnibus (Edgar et al., 2002) under GEO Series accession number GSE21527 and encourage other scientists to take advantage of this rich data set.
Materials and Methods
Naegleria differentiation and RNA isolation
N. gruberi strain NEG grown on Klebsiella was differentiated three separate times using standard protocols (Fulton, 1970). Synchrony was estimated by counting the percentage of flagellates after fixing in Lugol's iodine (Fulton and Dingle, 1967), using a phase-contrast microscope with a ×40 objective (n>100 for each time-point). At each time-point, 10⁷ cells were harvested, and RNA was extracted using Trizol reagent (Invitrogen), purified using RNeasy (Qiagen), treated with Turbo DNase (Ambion) and repurified with RNeasy (Qiagen), according to the manufacturers' instructions. RNA purity was verified by gel electrophoresis (supplementary material Fig. S1) and spectrophotometry.
NimbleGen expression oligoarrays
The N. gruberi whole-genome expression oligoarray version 1.0 (NimbleGen Systems) comprises 182,813 probe sets corresponding to 15,777 gene models predicted on the N. gruberi genome sequence version 1.0 (Fritz-Laylin et al., 2010a), and an additional 963 open reading frames (ORFs) identified in intergenic regions. For each gene, 11 unique 60-mer oligonucleotide probes were designed by NimbleGen Systems. The Naegleria V1.0 oligoarray is fully described at the Gene Expression Omnibus (GEO) (Edgar et al., 2002) under accession number GSE21527.
Preparation of samples, hybridization and scanning were performed by NimbleGen Systems (Madison, WI), following their standard operating protocol. The raw data were subjected to robust multi-array analysis (RMA) (Irizarry et al., 2003), quantile normalization (Bolstad et al., 2003) and background correction, as implemented in the NimbleScan software package, version 2.4.27 (Roche NimbleGen). Reproducibility between biological replicates was inspected using MA [log-intensity ratios (M) versus log-intensity averages (A)] and scatter plots of log intensities (supplementary material Fig. S1), and P-values were calculated in a simple paired-data comparison model and were corrected for multiple testing using the BH (false discovery rate controlled) procedure, all within the R statistical package (http://www.r-project.org/).
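The BH correction named above scales each P-value by the number of tests divided by its rank and then enforces monotonicity. The analysis itself was run in R; the following is only a minimal Python sketch of the procedure, in which the function name `bh_adjust` and the example P-values are illustrative assumptions rather than part of the original analysis.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that each adjusted value is
    # never larger than the adjusted value of the next-ranked P-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
significant = [q < 0.05 for q in bh_adjust(raw)]
print(significant)
```

A gene would then be called differentially expressed only if its adjusted value falls below the chosen cut-off (P<0.01 in the Results) and it also passes the fold-change threshold.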
The log-transformed expression data for the 310 cytoskeleton-related genes were subjected to gene normalization followed by hierarchical clustering, with centered correlation and complete linkage in the Cluster program (Eisen et al., 1998).
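To illustrate the clustering settings named above, the following sketch implements gene normalization, a centered (Pearson) correlation distance and complete linkage in plain Python. It is not the Cluster program: it stops at a fixed number of clusters instead of building a full dendrogram, the gene names and profiles are hypothetical, and 'gene normalization' is taken here to mean scaling each profile to a sum of squares of 1, which is an assumption about that option.

```python
import math


def normalize(profile):
    """Scale a profile so that its sum of squares is 1 (must not be all zeros)."""
    norm = math.sqrt(sum(x * x for x in profile))
    return [x / norm for x in profile]


def centered_correlation(a, b):
    """Pearson (centered) correlation of two equal-length, non-constant profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [x - mean_b for x in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and complete linkage.

    profiles: dict mapping gene name -> list of log2 expression values.
    Returns a list of clusters, each a sorted list of gene names.
    """
    names = list(profiles)
    scaled = {g: normalize(profiles[g]) for g in names}
    dist = {
        (g, h): 1.0 - centered_correlation(scaled[g], scaled[h])
        for g in names
        for h in names
    }
    clusters = [[g] for g in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is
                # the largest distance between any pair of their members.
                d = max(dist[g, h] for g in clusters[i] for h in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical log2 profiles at 0, 20, 40, 60 and 80 minutes.
example = {
    "early1": [0.0, 3.0, 2.0, 1.0, 0.5],
    "early2": [0.0, 2.5, 1.5, 1.0, 0.2],
    "late1": [0.0, 0.5, 3.0, 3.0, 2.8],
    "late2": [0.0, 0.2, 2.5, 2.7, 2.9],
}
print(complete_linkage(example, 2))
# [['early1', 'early2'], ['late1', 'late2']]
```

Because correlation is insensitive to scale, the normalization step does not change the distances in this sketch; it is included only to mirror the order of operations described above.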
Proteomics of Naegleria flagella
Flagella were isolated using published methods (Kowit and Fulton, 1974) and mass spectrometry performed by the Vincent J. Coates Proteomics/Mass Spectrometry Laboratory at UC Berkeley, CA. A nano LC column was packed in a glass capillary, of 100 μm inner diameter, with an emitter tip. The column comprised 10 cm of Polaris C18 5-μm packing material (Varian), followed by 4 cm of Partisphere 5 SCX (Whatman). The column was loaded using a pressure bomb and washed extensively with buffer A [5% acetonitrile, 0.02% heptafluorobutyric acid (HFBA)], then directly coupled to an electrospray ionization source mounted on a Thermo-Fisher LTQ XL linear ion-trap mass spectrometer. An Agilent 1200 HPLC delivering a flow rate of 30 nl/min was used for chromatography. Peptides were eluted using a 14-step MudPIT procedure (Washburn et al., 2001), using buffer A, buffer B (80% acetonitrile, 0.02% HFBA), buffer C (250 mM ammonium acetate, 5% acetonitrile, 0.02% HFBA) and buffer D (250 mM ammonium acetate, 5% acetonitrile, 0.02% HFBA, 500 mM ammonium acetate). The programs SEQUEST and DTASELECT were used to identify peptides and proteins from the Naegleria genome (Eng et al., 1994; Tabb et al., 2002).
Localization of pCCCs
The following mammalian cDNAs from the human ORF collection in the form of Gateway entry vectors were purchased from Open Biosystems (Rual et al., 2004): B9D2 (CV025994), MOT52 (FOR20; EL735575), POC16 (EL737049), FM14 (EL735863), SSA3 (EL735155), MOT39 (CV023936) and TECT3 (EL736819), and these were verified by sequencing. The following cDNAs (and corresponding accession numbers) were obtained from Open Biosystems: POC11 (BC006444), MOT37 (BC016439) and LRRC6 (BC047286). Each ORF was amplified (primer sequences available upon request), transferred into a Gateway donor vector (pDONR221) and verified by sequencing. All ORFs were then transferred into the C-terminal (pCDNA-DEST47) and N-terminal (pCDNA-DEST53) GFP-tagged Gateway Vectors according to the manufacturers' protocols.
A total of 20,000 U2OS cells (ATCC catalog number HTB-96) was inoculated in 0.5 ml medium [DMEM (GIBCO catalog number 10569) supplemented with 10% FBS, 1% nonessential amino acids, and 1% sodium pyruvate] in 24-well plates containing coverslips. Cells were transfected the next day with Lipofectamine 2000 and, 14 hours later, fixed for three minutes with methanol and prepared for immunofluorescence using standard methods (http://mitchison.med.harvard.edu/protocols/gen1.html). The following antibodies were used: monoclonal antibody 20H5 against centrin (Sanders and Salisbury, 1994), at a 1:400 dilution; antibody CD1B4 against cenexin (Lange and Gull, 1995), at a 1:3 dilution; and a mouse monoclonal antibody against GFP (catalog number 11814460001, Roche), at a 1:500 dilution. Alexa-Fluor-conjugated secondary antibodies were sourced from Invitrogen (Carlsbad, CA) and used at a 1:500 dilution.
Fluorescence deconvolution microscopy
Images were collected with SoftWorX image acquisition software (Applied Precision, Issaquah, WA) on an Olympus IX70 wide-field inverted fluorescence microscope with an Olympus PlanApo ×100 (NA 1.35) oil-immersion objective and Photometrics CCD CH350 camera (Roper Scientific, Tucson, AZ). Image stacks were deconvolved with the SoftWorX deconvolution software (Applied Precision, Issaquah, WA) and flattened as maximum projections.
We thank Lori Kohlstaedt and the Vincent J. Coates Proteomics/Mass Spectrometry Laboratory at UC Berkeley, CA, for help with mass spectrometry experiments, Lani Keller, Juliette Azimzadeh, Hellen Dawe and Tim Stearns for helpful advice, Pierre Gönczy, Rebecca Heald, Scott Dawson, and Chandler Fulton for valuable comments on the manuscript and Keith Gull for the generous gift of the antibody against cenexin. This study was supported by a grant from the University of California Cancer Research Coordinating Committee (CRCC) to W.Z.C. and NIH grant A1054693 to W.Z.C. Article deposited in PMC for release after 12 months.