Epiplasmin C is the major protein component of the membrane skeleton in the ciliate Tetrahymena pyriformis. Cloning and analysis of the gene encoding epiplasmin C showed it to be a previously unrecognized protein. In particular, epiplasmin C lacks the canonical features of the epiplasmic proteins already known in ciliates and flagellates. Hydrophobic cluster analysis (HCA) showed that epiplasmin C consists of 25 repeated domains of about 40 residues each. These domains are related and can be grouped into two families, called type I and type II. The connections between type I and type II domains follow rules that can be seen in the sequence itself, which supports the proposed delineation of the domains. Using these repeated domains as queries, significant structural similarities were demonstrated with an extra six heptads that are shared by nuclear lamins and invertebrate cytoplasmic intermediate filament proteins, and that were deleted from the cytoplasmic intermediate filament protein lineage at the protostome-deuterostome branch point of the eukaryotic phylogenetic tree.

Unlike protozoa with an amoeboid type of cellular organization, ciliates and flagellates do not appear to have made extensive use of spectrins or homologous proteins in building their membrane-associated cytoskeleton. Available data, derived mainly from immunocytochemical studies, indicate that spectrin is present in highly specialized and restricted regions of the cell surface, such as the flagellar membrane (Schneider et al., 1988) or extrusome anchoring sites (Kwiatowska and Sobota, 1992). However, extensive membrane skeletal systems have been identified in ciliates and flagellates. An outstanding example is the epiplasm, a fibrogranular layer that is in contact with the cytoplasmic side of the innermost cortical membrane and plays an important role in maintaining the shape of the cell surface and of the whole cell (Peck, 1977; Dubreuil et al., 1988).

Only a few epiplasmic proteins have been characterized at the molecular level, but all of them, called articulins, share a common tripartite organization: a central core containing a repetitive 12-residue VPVP motif, flanked by more variable terminal domains (Marrs and Bouck, 1992; Huttenlauch et al., 1995; Huttenlauch et al., 1998a). Articulins appear to be epiplasm-specific proteins, i.e. they are not present in other types of membrane skeletal systems (Bouck and Ngô, 1996).

To our knowledge, the epiplasmic proteins of Tetrahymena have not yet been sequenced, although this ciliate is well established as an important system for new experimental applications, including functional genomics. In T. pyriformis, three major epiplasmic proteins, initially called ‘bands’ A, B and C, have been characterized biochemically and immunologically (Vaudaux, 1976; Williams et al., 1995). In this paper we report the molecular analysis of ‘band’ C, referred to here as epiplasmin C (EpiC). We show that EpiC lacks the canonical structural features of articulins. Hydrophobic cluster analysis (HCA) was used for a thorough analysis of the protein (see Callebaut et al., 1997, for a review of HCA). HCA uses a two-dimensional representation of the sequence in which neighboring hydrophobic amino acids form clusters. Because the shape of the clusters is sensitive to the secondary structure adopted by their constituent residues, HCA can efficiently detect similarities between protein sequences sharing less than 25% sequence identity (e.g. Calmels et al., 1998; Girault et al., 1998). It has also proved particularly powerful in detecting domain n-plication within sequences (Ayadi et al., 1998; Callebaut et al., 1997; Callebaut and Mornon, 1997). In the case of EpiC, HCA enabled us to delineate 25 repeated structural domains, which were used as queries to scan protein databases. These comparisons revealed structural similarities with a protein sub-domain shared by invertebrate intermediate filament (IF) proteins and by lamins, the nuclear envelope IF proteins of metazoan cells.

EpiC purification and microsequencing

Tetrahymena pyriformis cells (strain GL-C) were grown as previously described (Bouchard et al., 1998). Enriched cortex extracts were obtained using the Triton High Salt method (Williams et al., 1992) and then subjected to 7.5% preparative SDS-PAGE (Laemmli, 1970). After rapid staining, the band corresponding to EpiC was excised from the gels and transferred onto Immobilon-P membranes (Towbin et al., 1979). The amount of purified EpiC was estimated by comparison with standard molecular mass markers. The membrane-bound protein was microsequenced after direct cleavage with CNBr (Andy Brauer; PROSEQ Inc., Salem, USA).

RNA isolation and DNA libraries

T. pyriformis total RNA was prepared using a guanidinium method followed by centrifugation on a CsCl gradient. Poly(A) RNA was isolated magnetically according to the manufacturer’s instructions (Promega). Poly(A) RNA (3 μg) was used to construct a random-primed cDNA library in λgt11 using the Promega cDNA synthesis and λgt11 cloning kits. The T. pyriformis macronuclear DNA library cloned in λDASH was kindly provided by Dr C. Rodrigues-Pousada (Oeiras, Portugal).

PCR analysis and primers

Amplification templates were DNA purified from either the T. pyriformis λgt11 cDNA library or the λDASH gDNA library. For each library, phages were grown on 25 plates as described by Sambrook et al. (1989). Plates were then incubated for 1 hour with 5 ml SM buffer, and phage DNA was purified using the QIAGEN Phage MIDI PREP protocol. The sequence EEKLVHM was used to design the degenerate sense primer EPICN, GA(AG)GA(AG)AA(AG)(CT)TIGTICA(CT)ATG, and the corresponding antisense primer EPICR, CAT(AG)TGIACIA(AG)(CT)TT(CT)TC(CT)TC. The λgt11-specific primers were LAN (sense), GGTGGCGACGACTCCTGGAGCCCG, and LAR (antisense), TTGACACCAGACCAACTGGTAATG. PCR reaction mixtures contained 2 mM MgCl2, 1 μg template DNA, 100 pmol degenerate primer, 50 pmol phage-specific primers and 1 unit Eurobio Taq polymerase in the manufacturer’s buffer. PCRs were tested at different annealing temperatures (50°C to 65°C) with all possible primer-pair combinations. The cycling parameters were: 1 cycle (94°C); 30 cycles (20 seconds, 94°C; 30 seconds, 50°C, 55°C or 60°C; 2 minutes, 72°C); and a final polymerisation step (30 minutes, 72°C). Perfectly matching primers were deduced from the amplification product obtained with the degenerate oligonucleotides: EPICEFN (sense primer, encoding AISLRDK), 5′ GCTATCTCCCTCAGAGACAAGC 3′, and EPICEFR (antisense primer, complementary to a sequence encoding GELAISL), 5′ TGAGGGAGATAGCGAGTTCACC 3′. The T3 and T7 universal primers were used for λDASH.
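The degenerate primers above use bracket notation for ambiguous positions. As a minimal sketch (not the procedure actually used to design them), the Python below expands that notation, counts how many distinct sense oligonucleotides EPICN represents, and checks that EPICR is its reverse complement. The primer strings are copied from the text with the line-break hyphens removed; treating inosine (I) as a single symbol that "complements" to itself is a simplifying assumption, and all helper names are hypothetical.

```python
from itertools import product

# Complement table; inosine (I) is treated as pairing with itself (simplification).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "I": "I"}


def parse_degenerate(primer):
    """Split 'GA(AG)...' into one string of allowed bases per position."""
    positions, i = [], 0
    while i < len(primer):
        if primer[i] == "(":
            close = primer.index(")", i)
            positions.append(primer[i + 1:close])
            i = close + 1
        else:
            positions.append(primer[i])
            i += 1
    return positions


def expand(primer):
    """List every distinct oligonucleotide in the degenerate mixture."""
    return ["".join(bases) for bases in product(*parse_degenerate(primer))]


def reverse_complement(primer):
    """Reverse-complement a degenerate primer, keeping the bracket notation."""
    out = []
    for bases in reversed(parse_degenerate(primer)):
        comp = "".join(sorted(COMPLEMENT[b] for b in bases))
        out.append(comp if len(comp) == 1 else f"({comp})")
    return "".join(out)


EPICN = "GA(AG)GA(AG)AA(AG)(CT)TIGTICA(CT)ATG"   # sense, from EEKLVHM
EPICR = "CAT(AG)TGIACIA(AG)(CT)TT(CT)TC(CT)TC"   # antisense

print(len(expand(EPICN)))                  # 32
print(reverse_complement(EPICN) == EPICR)  # True
```

Five positions are twofold degenerate, so EPICN is a mixture of 2^5 = 32 sequences (inosine positions add no extra species), and the reverse-complement check confirms that the sense and antisense primers target the same EEKLVHM-coding stretch.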

Screening of the genomic library

Ten replica filters of the T. pyriformis genomic library were prehybridized at 42°C for 4 hours in 50% (w/v) formamide, 5× SSPE, 5× Denhardt’s solution, 0.1% SDS and 100 μg denatured salmon sperm DNA. Hybridization was performed for 20 hours at 42°C in the same buffer containing 50 ng of 32P-labeled probes (Prime-a-Gene, Promega). The probes were amplification products obtained from λDASH gDNA. Filters were washed twice at low stringency in 1× SSPE, 0.1% SDS at 42°C for 15 minutes and autoradiographed on Kodak X-Omat AR film.

DNA sequencing

The amplification products were subcloned into TA cloning vectors (Promega) and sequenced by the dideoxy chain termination method using a T7 polymerase kit (Pharmacia). DNA from λDASH clones containing the EpiC gene was prepared using the QIAGEN Phage MIDI PREP protocol. Sequencing was performed directly on isolated λDASH DNA by Genome Express (Grenoble, France). The nucleotide sequence of the T. pyriformis EpiC gene has been deposited in GenBank™ under accession number AF119380.

Hydrophobic cluster analysis

HCA proceeds as follows. The sequence is drawn on the surface of a cylinder as a helix whose pitch is that of an alpha helix. The cylinder is then cut open and the resulting picture is duplicated to preserve the local environment of each residue. Hydrophobic residues (F, I, L, M, V, W and Y) are encircled to form clusters with a defined connectivity: two hydrophobic residues separated by three or fewer non-hydrophobic residues belong to the same cluster; otherwise they do not. A proline breaks any cluster. Because residues are physically connected differently in beta strands and in alpha helices, the two types of secondary structure give rise to different cluster shapes: strands are often predicted from vertical clusters, whereas helices mainly correspond to slightly horizontal ones. It has been demonstrated (Woodcock et al., 1992) that clusters are statistically centered on secondary structure elements. Database searches for similarities were performed with BLAST (Altschul et al., 1990) and PSI-BLAST (Altschul et al., 1997) on the NCBI server. BLAST output files were analyzed, and HCA plots drawn, with the Visual BLAST program (Durand et al., 1997). Once a putative multiple alignment has been built by expert inspection, three scores are computed for each pair of sequences to quantify the relatedness of the domains: an identity score; a similarity score, using the matrix of Risler et al. (1988) or BLOSUM62 (Henikoff and Henikoff, 1992); and an HCA score, which counts a hit each time two hydrophobic residues occupy the same position in the alignment. To estimate the confidence of the alignment, one of the two sequences of the pair is randomized 1000 times and the same three scores are derived each time, yielding mean random scores and their standard deviations for identity, similarity and HCA.
The scores obtained for the real sequences are then centered on the random mean and divided by the standard deviation to obtain dimensionless Z scores. The higher the Z score, the more reliable the alignment. Comparable scores are computed on the whole alignment instead of on pairs, producing multiple scores and multiple Z scores (Callebaut et al., 1997).
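The cluster rule and the randomization test described above can be sketched in Python as follows. This is an illustrative simplification, not the HCA or Visual BLAST software: `hca_clusters` applies only the linear rule stated in the text (at most three non-hydrophobic residues between cluster members, proline as a breaker) instead of working on the duplicated helical net, and `z_score` implements the shuffle-and-standardize step for the identity and HCA scores only (a similarity score would additionally need the Risler or BLOSUM62 matrix). Keeping gap positions fixed while shuffling is an assumption, as are all function names.

```python
import random
import statistics

HYDROPHOBIC = set("FILMVWY")  # hydrophobic residues as listed in the text


def hca_clusters(seq, max_gap=3):
    """Group 0-based indices of hydrophobic residues into clusters.

    Linear simplification of the HCA rule: two hydrophobic residues separated
    by at most `max_gap` non-hydrophobic residues share a cluster, and a
    proline always breaks the current cluster.
    """
    clusters, current, gap = [], [], 0
    for i, aa in enumerate(seq.upper()):
        if aa == "P":                      # proline: close any open cluster
            if current:
                clusters.append(current)
            current, gap = [], 0
        elif aa in HYDROPHOBIC:
            if current and gap > max_gap:  # too far from the previous member
                clusters.append(current)
                current = []
            current.append(i)
            gap = 0
        else:
            gap += 1                       # count non-hydrophobic spacers
    if current:
        clusters.append(current)
    return clusters


def identity_score(a, b):
    """Aligned positions (gaps '-' ignored) holding the same residue."""
    return sum(x == y and x != "-" for x, y in zip(a, b))


def hca_score(a, b):
    """Aligned positions where both residues are hydrophobic."""
    return sum(x in HYDROPHOBIC and y in HYDROPHOBIC for x, y in zip(a, b))


def z_score(a, b, score, n_shuffles=1000, seed=0):
    """(real score - mean shuffled score) / SD of shuffled scores.

    Residues of `b` are shuffled n_shuffles times; gap positions stay fixed.
    """
    rng = random.Random(seed)
    slots = [i for i, c in enumerate(b) if c != "-"]
    residues = [b[i] for i in slots]
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        shuffled = list(b)
        for i, c in zip(slots, residues):
            shuffled[i] = c
        random_scores.append(score(a, "".join(shuffled)))
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        return 0.0  # degenerate case: shuffling never changes the score
    return (score(a, b) - mean) / sd


micro = "GELAISLRDKQALEEKLVHMTQQIEFLSQK"  # CNBr microsequence
print(hca_clusters(micro))                 # [[2, 4, 6], [12, 16, 17, 19, 23, 25, 26]]
print(round(z_score(micro, micro, identity_score), 1))
```

Applied to the CNBr microsequence, the linear rule yields two clusters, because five non-hydrophobic residues (RDKQA) separate the leucines at 0-based positions 6 and 12; on a real HCA plot the helical net can connect residues differently.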

CNBr cleavage and microsequencing of EpiC

One microgram of membrane-bound EpiC was subjected to limited CNBr cleavage, which yielded a single strong peptide signal in the picomole range, while all other signals were in the range of a few hundred femtomoles. The strong signal was read for 30 Edman degradation cycles. The resulting amino acid microsequence (GELAISLRDKQALEEKLVHMTQQIEFLSQK) is referred to as the ‘CNBr microsequence’ throughout this paper.

EpiC cloning and sequencing

To determine the entire coding sequence of EpiC, we performed PCRs using total DNA isolated from a T. pyriformis cDNA library as the amplification template. Primer pairs combined λgt11-specific primers (close to the multiple cloning site) with degenerate primers deduced from the CNBr microsequence. The degenerate primers were designed according to the Tetrahymena genetic code and, to optimize the PCR experiments, were synthesized in both orientations. Standard PCRs were then run at annealing temperatures ranging from 50°C to 65°C with all possible primer-pair combinations. Amplification products of about 200 bp were obtained with the EPICR/LAN and EPICR/LAR primer pairs at annealing temperatures of 55°C or 60°C (data not shown). These amplified DNAs were cloned and sequenced. The deduced amino acid sequences were consistent with the CNBr microsequence, which allowed us to design perfectly matching primers for a further round of PCR using a T. pyriformis λDASH gDNA library as the amplification template.

The primer pairs EPICEFN/T3 and EPICEFN/T7 gave the best results. Five amplification products ranging from 250 to 750 bp were obtained at a 60°C annealing temperature, and three of them were sequenced. Interestingly, each deduced amino acid sequence contained part of the CNBr microsequence. A possible explanation for these preliminary results was that the EpiC gene consisted of repeated sequences, so we decided to clone the entire coding sequence of EpiC to establish the exact order of the repeats. For this purpose, we screened the T. pyriformis macronuclear genomic λDASH library using the amplification products obtained earlier as radioactive probes. Ten thousand phages were screened and three positive clones were selected. In collaboration with the technical service of Genome Express, sequencing was performed directly on purified λDASH DNA by primer walking. In this way, the entire coding sequence of the EpiC gene was obtained (Fig. 1).

Fig. 1.

Nucleotide and deduced amino acid sequences of the T. pyriformis EpiC gene. Underlined amino acid sequences correspond to the conserved part of the repeated CNBr microsequence. Lowercase letters indicate the 5′ and 3′ non-coding regions.

At this stage it is relevant to note that the genetic code of many ciliates, including Tetrahymena, differs from the universal code: UAA and UAG are translated as glutamine and UGA is the sole termination codon (Martindale, 1989). Furthermore, the Tetrahymena genome, like that of other ciliates, is A+T rich, and non-coding sequences typically contain more than 70% A+T (Brunk and Sadler, 1990). The A+T percentage was therefore calculated for the sequences upstream and downstream of each AUG codon. The two adjacent AUG codons at positions 320 and 323 were candidate start codons. An open reading frame containing 15 putative UAA or UAG glutamine codons runs continuously from these AUG codons to the UGA termination codon at position 3819. The molecular mass predicted for the polypeptide encoded by this open reading frame is 135 kDa, as expected for the protein.
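The ciliate genetic code and the A+T scan described above can be illustrated with a short Python sketch. It builds the standard codon table, reassigns UAA and UAG (TAA and TAG in DNA) to glutamine so that TGA is the only stop, and computes A+T percentages in windows on either side of a chosen position. The window size and helper names are assumptions for illustration, and positions in this sketch are 0-based.

```python
BASES = "TCAG"
# Standard genetic code, codons in TCAG order (NCBI translation table 1).
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_CODE = dict(zip(CODONS, STANDARD_AA))

# Tetrahymena-type ciliate code: UAA and UAG read as Gln, UGA the only stop.
CILIATE_CODE = {**STANDARD_CODE, "TAA": "Q", "TAG": "Q"}


def translate(dna, code=CILIATE_CODE):
    """Translate from the first base until the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = code[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def reverse_complement(dna):
    return dna.upper()[::-1].translate(str.maketrans("ACGT", "TGCA"))


def at_percent(dna):
    """Percentage of A and T bases in a non-empty DNA string."""
    if not dna:
        raise ValueError("empty sequence")
    dna = dna.upper()
    return 100.0 * sum(base in "AT" for base in dna) / len(dna)


def at_around(dna, pos, window=100):
    """A+T % in the windows just upstream of and downstream from `pos`
    (for example, the first base of a candidate ATG)."""
    upstream = dna[max(0, pos - window):pos]
    downstream = dna[pos:pos + window]
    return at_percent(upstream), at_percent(downstream)


print(translate("GCTATCTCCCTCAGAGACAAGC"))                      # AISLRDK
print(translate(reverse_complement("TGAGGGAGATAGCGAGTTCACC")))  # GELAISL
print(translate("ATGTAACAGTGA"), translate("ATGTAACAGTGA", STANDARD_CODE))  # MQQ M
```

As a consistency check, translating the EPICEFN primer and the reverse complement of EPICEFR recovers AISLRDK and GELAISL, the peptides they were designed from, while the last line shows how an internal TAA would end translation prematurely under the standard code.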

Sequence analysis of the 5′ non-coding region reveals 73% A+T, which hampers identification of a canonical ‘TATA’ box. It is accepted that Tetrahymena genes may lack this signal and use other boxes, as reported for instance in yeast (Witt et al., 1993). No polyadenylation signal (AATTAA) could be identified, although such a motif has been found 45 bp or 100 bp downstream of the stop codon in several Tetrahymena genes (Barahona et al., 1988). Southern blot analysis indicates that the EpiC gene is present as a single copy per Tetrahymena haploid genome. Northern blot analysis reveals an RNA of approximately 5 kb (data not shown).

Sequence analysis and comparison

The amino acid sequence of EpiC is rich in glutamine and glutamic acid; the abundance of glutamic acid gives the protein a net acidic charge and a calculated pI of 4.9. Moreover, the protein has a high content of hydrophobic amino acids (32.7%), very similar to that observed in typical globular domains (Callebaut et al., 1997). Direct inspection of the overall sequence shows that the CNBr microsequence occurs several times in EpiC. Near the C terminus of the protein, a stretch of serine residues forms a poly-serine motif (Fig. 1).
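The composition figures quoted above are simple residue counts. As a hedged sketch, the function below computes the percentage of hydrophobic residues, assuming the same seven residues used for HCA (F, I, L, M, V, W, Y), and tallies acidic (D, E) and basic (K, R) residues. It does not compute a pI, which requires residue pKa values, and it is applied here only to the 30-residue CNBr microsequence because the full EpiC sequence is not reproduced in this text.

```python
HYDROPHOBIC = set("FILMVWY")  # assumed: the same hydrophobic set as used for HCA


def composition_summary(protein):
    """Return (% hydrophobic, number of D+E, number of K+R) for a protein."""
    protein = protein.upper()
    hydrophobic = 100.0 * sum(aa in HYDROPHOBIC for aa in protein) / len(protein)
    acidic = sum(aa in "DE" for aa in protein)
    basic = sum(aa in "KR" for aa in protein)
    return hydrophobic, acidic, basic


micro = "GELAISLRDKQALEEKLVHMTQQIEFLSQK"  # CNBr microsequence
hydro, acidic, basic = composition_summary(micro)
print(f"{hydro:.1f}% hydrophobic, {acidic} acidic, {basic} basic")
```

For the microsequence this prints 33.3% hydrophobic, 5 acidic and 4 basic residues, in line with the overall hydrophobic content and acidic character reported for the whole protein.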

The full sequence was used to search databases for EpiC homologs with the BLAST (Altschul et al., 1990) and PSI-BLAST (Altschul et al., 1997) algorithms. These searches were unsuccessful, so we turned to HCA, which overcomes the limitations of linear lexical analysis. It was fairly easy to identify 15 repeats of different lengths in the whole sequence. It then became apparent that each of these domains was in fact composed of two parts, presumably related because of the common shape of their clusters. Further splitting of these fragments led to 25 modules of about 40-50 amino acids each, which can be grouped into two families called type I and type II. Fig. 2 shows the succession of these types of domains along the sequence. Once delineated, these 25 domains were used as queries to scan non-redundant databases with PSI-BLAST. These scans revealed similarities with a 42-amino-acid sub-domain shared by lamins, the typical IF proteins that make up the nuclear lamina in metazoans, and by the cytoplasmic IF proteins characterized in protostome invertebrates such as Caenorhabditis elegans (see Discussion). When type I domains are used as queries against the NCBI non-redundant database, lamins B are retrieved about 80 times with 8 of these domains; with the parameter values we used, none were retrieved for domains 1, 3, 8 and 24. Under the same conditions, type II domains retrieve lamins B with lower scores, but they also retrieve lamins A and C. Sequence identity with lamins B exceeds 30% for seven type I domains over a reasonable length (more than 40 residues). It reaches 51% between lamin C and type II domain 23, but over a shorter stretch of 27 amino acids.

Fig. 2.

Schematic representation of the succession of small domains in EpiC, as revealed by HCA. Squares represent type I domains and circles represent type II domains.

HCA plots of each type of domain, together with lamin and invertebrate IF proteins, are shown in Fig. 3, and the corresponding multiple alignments in Fig. 4. Type I domains are slightly more homogeneous than type II domains; in particular, they have more highly conserved residues, as can easily be seen in Fig. 4. Eight repeats of the microsequence described above belong to successive type I domains between 10 and 21 (i.e. 10, 12, 13, 14, 16, 17, 19 and 21), providing an anchor for the alignment because the microsequence covers more than half the length of the domain. Type I domains are typically 50 amino acids long, while type II domains are slightly shorter, around 40. Broken lines in Fig. 3 separate domains that are more divergent than the rest of the family, as discussed below. These domains, 1, 2, 3 and 25, are located at both ends of the complete sequence.

Fig. 3.

HCA plots of the aligned domains of EpiC. (A) HCA plots of the type I domains. (B) HCA plots of the type II domains. The sequence is read vertically, and the predicted nature of the secondary structure elements is read horizontally from the shape of the clusters. Four residues are represented by special characters, shown at the bottom of Fig. 3A. ‘Lam-1 human’ is the SWISS-PROT code (Bairoch and Boeckmann, 1991) for human lamin B, and ‘cif’ is the C. elegans cytoplasmic IF protein (EMBL accession number X70834). Domains slightly divergent from the rest of the family are separated by horizontal dashed lines. All domains are numbered as in Fig. 2. Vertical lines delineate the motifs A, B, C, D and E. Topologically conserved hydrophobic clusters are colored, as are highly conserved non-hydrophobic residues. For clarity, the color code is the same as in Fig. 4.

Fig. 4.

Multiple alignment of all domains of EpiC. Type I domains are at the top. Human lamin B and the C. elegans cytoplasmic IF protein are aligned both with the type I domains and with the type II domains, below. All EpiC domains are numbered as in Fig. 2, and the starting and ending residues are indicated on each line. The five motifs A, B, C, D and E identified in Fig. 3 are shown separated by vertical lines. Highly conserved residues are in bold on a background of the same color as in Fig. 3. Below each position where they occur, the most frequent amino acid is given. 1 indicates any of the seven hydrophobic residues (V, I, L, M, F, Y, W) and Φ indicates an aromatic hydrophobic residue.

Fig. 3A shows the HCA plots of the 13 type I domains. Each domain can be divided into four motifs, called A, B, C and D, separated by vertical lines. A color code, the same as in Fig. 4, helps visualize similarities in the cluster shapes of the different domains. The A motif consists of a mosaic cluster, known generally to correspond to a β strand. Strictly conserved residues adjacent to this cluster, shaded in light green, are K/R, D and A. They are found in 9 domains as well as in lamins. Thr, a mimetic residue, has been included in the mosaic cluster of the A motif in lamins. Upstream of this mosaic cluster lies a long loop containing a short sequence, (V/L/I/T)KRS, which serves as an anchor for locating the starting point of the repeats with confidence. It occurs 11 times, in particular around position 10 of domain 1, i.e. near the N terminus. This is important for predicting that such a domain can fold by itself, as discussed later. Because it is flanked by basic residues, the serine in this short sequence may be a favorable site for phosphorylation by calcium-dependent protein kinase C (Kennelly and Krebs, 1991). Such sites occur in all type I domains except domain 8, and in one type II domain, domain 15. This argues for the presence of a fairly long loop on the N-terminal side of the A motif. The A motif is absent from all type II domains, and this constitutes the basic difference between the two types. The B motif, in blue in Fig. 3A, has a glutamic or aspartic acid that is topologically conserved in twelve of the thirteen type I domains; domain 8 has an asparagine instead. Lamins and invertebrate IF proteins show this D/E conservation, together with two residues shaded in blue, (D/R)K, which are present in 11 type I domains.
These residues are also missing in domain 8 and in the first two type I domains, 1 and 3, which are presumably more external in the 3D structure and thus more subject to sequence divergence. This is also why they are separated from the rest of the type I domains by a dashed line in Fig. 3A. The type I B motif ends with a triplet, generally QQI (in yellow), present in all EpiC domains except the first. This triplet is also present in lamins and in C. elegans cytoplasmic IF proteins. The type I C motif contains a conserved lysine or arginine, which is replaced by a serine only in lamins. In domain 6, a cysteine has been included in the hydrophobic cluster because Cys is a mimetic residue, like Thr and Ala, all of which adopt the hydrophobicity behavior of their neighboring residues (Callebaut et al., 1997). The type I D motif (pale blue in Fig. 3A) is missing in two cases, domain 3 and lamins, but is present in C. elegans cytoplasmic IF proteins. Finally, a strong anchor of four residues separates each domain from the next. This sequence is mainly QLGD and is found at the C-terminal end of domains 8, 12, 13, 14, 16, 17 and 19, as well as in lamins and C. elegans cytoplasmic IF proteins.

HCA alignments of type II domains are plotted in Fig. 3B. The B motif, in light blue, contains an aspartic or glutamic acid conserved at the same position relative to the cluster in ten of the twelve members of the family. In domains 7 and 25 it is replaced by a basic residue, arginine and lysine, respectively. The C motif (purple) has the same cluster shape as the type I C motif. The topologically conserved residues linked to this C motif are lysine or arginine in ten cases. Domains 2 and 25 again diverge slightly, probably because they are the most solvent-exposed domains in the folded protein; for this reason they have been separated from the rest of the family by a dashed line. In five cases the D motif (blue) contains the sequence YEXXI, which is also found in six type I D motifs. This motif is missing only in domain 2. The last motif, E, was used to determine whether a domain belongs to type II. Certain residues of the E motif are colored pink in Fig. 3B. They constitute strong anchors for domains 9, 11, 15, 18 and 20, and partial anchors for domains 2 and 4. This anchor is also present in domain 23 and was used to assign this particular domain to type II. Because domains 23 and 5 share 42% sequence identity with a Z score of 6.8, domain 5 has also been assigned to type II. Domains 23 and 5 have a long N-terminal extension, which is not related to the type I A motif.

As for type I domains, there is a signature marking the end of type II domains, colored cherry-red in Fig. 3B. It resembles the type I end signature, consisting of the sequence (Q/M/L/N)L(G/N)D in six cases. Because the two signatures are alike in the type I and type II families, the C-terminal ends were used to align type I domains with type II domains. Motifs B, C and D of type I are thus aligned with motifs B, C and D of type II. The A motif is specific to type I, while the E motif is specific to type II.

The linear multiple alignment of the 25 domains is shown in Fig. 4. Highly conserved positions are shaded with the same color code as in Fig. 3. Human lamin B and C. elegans cytoplasmic IF protein are aligned with the type I domains. Below them, the twelve type II domains are aligned, and the colors greatly help to visualize the conserved motifs shared by the two families. Of the central type I domains, domain 8 is the most divergent: it lacks some typical characteristics of motifs A and B and has an insertion between motifs B and C. Its sequence identity with the other sequences in the type I alignment ranges from 12% (domain 3) to 22% (lamin). For comparison, the identity of domain 1 ranges from 10% (lamin) to 34% (domain 14), and that of domain 3 from 12% (domain 8) to 32% (domains 6, 10, 12 and 14). The highest scores for lamins and for invertebrate IF proteins are 37% identity with domain 12 (Z score of 6) and 23% identity with domain 24 (Z score of 3.4), respectively. According to the rule of thumb established for coherent but divergent families, these sequence identities and their Z scores are sufficient to support the consistency of the family. The type II domains that differ most from the rest of the family are domains 2 and 25, which again are presumably the most external. Domain 2 has a maximum sequence identity of 31% with domains 11, 18 and 20 (Z score of 4), and domain 25 a maximum of 16% sequence identity with domain 5 (Z score of 1.9). Once the families of each type have been properly defined and aligned, the alignment between the two types can be quantified. For this purpose, we chose two representative members, one of each type: domain 14 for type I and domain 18 for type II. In the multiple alignment of Fig. 4, this pair of domains shows 26% sequence identity, 74% similarity (BLOSUM62 matrix) and a 69% HCA score over 31 residues.
The corresponding Z scores are 3.4, 3.7 and 5.3, respectively. Over the same alignment length, domains 16 and 9 share 42% identity, with a Z score of 6.4. These values support the proposed alignment and the relationship between the two types of domains.

We next considered how these repeated domains might fold. There is no tandem repeat, and no symmetry is apparent. Nevertheless, some rules seem to emerge at the junctions between domains. As shown in Table 1, there are ten junctions from type I to type II, nine from type II to type I, three between two type I domains and two between two type II domains. Table 1 reports the end sequences previously defined as end anchors in the multiple alignment (colored cherry-red in Figs 3 and 4), as well as the residues in the loop preceding the first conserved pattern of the second domain of each junction. Junctions between two type I domains are very consistent: they all have the same end marker, QLGD, and the same loop sequence, EN. Type I to type II junctions do not all present an end marker; half of them lack one, and (R/K/Q)D is mainly found in the loop sequence. Type II to type I junctions, in contrast, are characterized by a complete set of end markers and by short loop sequences, mainly E(N/S/G). Finally, the type II to type II junctions present both an end marker and a long, highly similar loop of 16 residues. This shows that domain connections follow rules that can be seen in the sequence itself, which supports the validity of the proposed domain splitting. Lamins and invertebrate cytoplasmic IF proteins exhibit similar markers (QLAD and RIKQ) and potential loop initiation signatures (ET and EN, respectively).
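The junction counts in Table 1 follow directly from the order of domain types. Fig. 2 is not reproduced in this text, so the Python sketch below reconstructs the succession from the type assignments stated in the Results: type I for domains 1, 3, 6, 8, 10, 12, 13, 14, 16, 17, 19, 21 and 24, and type II for the others. Domain 22 is never assigned explicitly and is placed in type II by elimination (13 type I and 12 type II domains); this, and the helper names, are assumptions.

```python
from collections import Counter

# Type I domains as named in the text; every other domain (1-25) is treated as
# type II. Domain 22 is never named explicitly and is type II by elimination.
TYPE_I = {1, 3, 6, 8, 10, 12, 13, 14, 16, 17, 19, 21, 24}
ORDER = ["I" if d in TYPE_I else "II" for d in range(1, 26)]


def junction_counts(order):
    """Count junction classes such as 'I-II' between successive domains."""
    return Counter(f"{a}-{b}" for a, b in zip(order, order[1:]))


def junctions_of(order, kind):
    """Return the (1-based) number of the first domain of each junction of a kind."""
    return [i + 1 for i, (a, b) in enumerate(zip(order, order[1:]))
            if f"{a}-{b}" == kind]


print(junction_counts(ORDER))
print(junctions_of(ORDER, "I-I"))   # [12, 13, 16]
```

The reconstructed order gives 10 I-II, 9 II-I, 3 I-I and 2 II-II junctions, matching the counts given above, and the three type I to type I junctions fall after domains 12, 13 and 16, all of which are among the domains listed with the QLGD end marker.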

Table 1.

Sequence analysis of the junctions between EpiC successive domains

In this study, we report on the molecular features of EpiC, the major protein component of the cell membrane-associated cytoskeleton (epiplasm) in the ciliate T. pyriformis. Analysis of the nucleotide sequence indicates an open reading frame encoding a deduced polypeptide of 1164 amino acids with a calculated molecular mass of 135.8 kDa and a pI of 4.9. These values are in close agreement with the electrophoretic coordinates of EpiC determined by one- and two-dimensional gel electrophoresis (Vaudaux, 1976; Williams et al., 1992; Bouchard et al., 1998, and unpublished data). We note, however, that the coding sequence begins with a doublet of AUG codons, which makes it difficult to identify the real initiation codon; direct N-terminal microsequencing of EpiC would be required to settle this point. We also note that an 18-amino-acid stretch of the CNBr microsequence is repeated eight times, which explains the strong, picomole-range peptide signal obtained after cleavage. CNBr cleavage generally occurs after methionine residues, although other cleavage sites may occur, especially when CNBr is used under acidic conditions, as in this study. Asp bonds are acid-labile on their C-terminal side, so Asp-Pro and, to a lesser extent, Asp-Gly cleavages may occur (A. Brauer, personal communication). The bend introduced into the peptide chain by proline residues probably facilitates the reaction by making the Asp-Pro bond more accessible to the cleavage reagent. We must therefore assume that, in the case of EpiC, CNBr cleaved preferentially between Asp (D) and Gly (G) at positions 9 and 10, respectively, of type I domains 10, 12-14, 16, 17, 19 and 21 (see below).
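The cleavage rules discussed above (CNBr after Met, plus acid-labile Asp-Pro and, to a lesser extent, Asp-Gly bonds) can be turned into a simple site predictor. The Python below is a hedged sketch: it treats all three bond types as equally cleavable, which ignores the differences in efficiency described in the text, and both the function names and the example peptide are hypothetical.

```python
def cleavage_sites(protein, acid_labile=("DP", "DG")):
    """Return 0-based indices i where the bond between protein[i] and
    protein[i + 1] is a predicted cleavage site: after any Met (canonical
    CNBr site) or within an acid-labile Asp-Pro / Asp-Gly bond."""
    protein = protein.upper()
    return [i for i in range(len(protein) - 1)
            if protein[i] == "M" or protein[i:i + 2] in acid_labile]


def fragments(protein, sites):
    """Split a protein at the bonds returned by cleavage_sites."""
    cuts = [0] + [i + 1 for i in sites] + [len(protein)]
    return [protein[start:end] for start, end in zip(cuts, cuts[1:])]


peptide = "KQDGELAISLRDKQALEEKLVHMTQ"   # hypothetical peptide for illustration
print(fragments(peptide, cleavage_sites(peptide)))
```

With this hypothetical peptide, the Asp-Gly bond produces a fragment beginning with GELAISL, the N-terminal sequence of the CNBr microsequence, and the Met site provides the second cut.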

Scanning protein databases with the full sequence indicates that EpiC is a previously unrecognized protein. Under these conditions, BLAST analysis retrieved almost exclusively proteins with predicted coiled-coil structure, probably because of the extent of α-helices in EpiC (see below). Furthermore, these hits covered fairly short stretches, typically a few tens of residues. It was therefore necessary to restrict queries to stretches of sequence relevant to the overall molecular organization of EpiC. HCA proved useful here by delineating 25 domains of 40 amino acids each. Based on sequence similarities, these domains were split into two families, called type I and type II. Within one type, similarities are fairly high, sometimes exceeding 80% identity. The two types of domains are also related, as evidenced by the common shape of the hydrophobic clusters in type I and II domains, which define three motifs called B, C and D (Fig. 3). Sequence identities between type I and II domains are below 25%. It is difficult to hypothesize about the 3D organization of a polypeptide containing numerous repeats. Type II domains are longer than type I domains at the C-terminal end, while they lack the upstream A motif, which contains a long loop with a putative phosphorylation site. The secondary structures predicted by HCA are essentially helices, although other secondary structures cannot be excluded, in particular in the A motif. The start of the domain series is well defined, because a conserved stretch of four residues, present at the beginning of most type I domains, occurs at position 10 in the full sequence.
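Statements such as "more than 80% identity within a type" and "below 25% between types" rest on percent identity between aligned domains. A minimal Python sketch follows, assuming the domains have already been aligned to equal length with "-" for gaps and counting only gap-free columns; this is one of several conventions, and the exact definition used for EpiC is not stated here. Domain names and sequences in the example are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percent identity between two aligned sequences of equal length.

    Only columns where both sequences carry a residue are counted; this is one
    of several common conventions.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


def pairwise_identities(domains):
    """Percent identity for every unordered pair in a {name: aligned_sequence} dict."""
    return {
        (name1, name2): percent_identity(seq1, seq2)
        for (name1, seq1), (name2, seq2) in combinations(domains.items(), 2)
    }


if __name__ == "__main__":
    # Hypothetical aligned 10-residue "domains", for illustration only.
    toy = {"d1": "ACDEFGHIKL", "d2": "ACDEFGHIKV", "d3": "WYRNQSTPMA"}
    for pair, pid in pairwise_identities(toy).items():
        print(pair, f"{pid:.1f}%")
```

Because identity depends on how gaps are treated, values from different tools or conventions are not directly comparable.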

As already mentioned, a few epiplasmic proteins have been characterized at the molecular level. Determination of the sequence of articulins, first from euglenids (Marrs and Bouck, 1992) and then from the ciliate Pseudomicrothorax dubius (Huttenlauch et al., 1995; Huttenlauch et al., 1998a), has shown evolutionary conservation of a central VPV-repeat core domain that is the hallmark of this family of membrane skeletal proteins. It has also confirmed the antigenic cross-reactivity between euglenid and P. dubius epiplasms previously demonstrated by western blot and immunoelectron microscopy (Vigues et al., 1987). EpiC is remarkable in that its structural features are quite distinct from those of articulins. This finding correlates with the fact that antibodies raised against articulins do not bind to Tetrahymena bands A, B and C, even when pan-specific anti-articulin antisera were tested for antigenic cross-reactivity (Huttenlauch et al., 1998b). Conceptually, this refutes the view that epiplasmic membrane skeletons are predominantly constituted of articulin-type proteins. This is not to say that articulins do not exist in Tetrahymena: one group has very recently reported articulin cross-reactivity on western blots of T. pyriformis cortical preparations (Huttenlauch et al., 1998b). However, the extent of antigenic cross-reactivity is limited and involves minor protein components whose assignment to the epiplasm is still uncertain. An analogous situation, involving major epiplasmic proteins distinct from articulins, probably also exists in the ciliate Paramecium (Coffe et al., 1996). EpiC bears no structural resemblance to Paramecium epiplasmic proteins. In T. pyriformis, EpiC is present as a 135 kDa protein. Comparative electrophoretic analysis indicates a widespread occurrence of this protein within the genus Tetrahymena, together with some variability in molecular mass depending on the strain or species analyzed (Williams, 1986).
Likewise, EpiC displays biochemical features reminiscent of bands A and B, with which it cross-reacts antigenically (Williams et al., 1987; Bouchard et al., 1998). It is therefore likely that structural homologs of EpiC will be found in a number of tetrahymenids. Although we have not yet made a systematic search for this protein, PCR amplifications followed by HCA of the deduced protein sequences indicate type I and type II domains in T. thermophila and T. rostrata (Bouchard, in preparation). Establishing a novel class of epiplasmic proteins will require further work, but we anticipate that this and further studies will create an experimental framework for understanding how n-plication of a structural domain unit may contribute to protein heterogeneity in the epiplasm of Tetrahymena.

A further fundamental question raised by this study is the significance of finding a structural domain shared by a protein family known to display variability in Tetrahymena and by a subset of IF-like proteins that are themselves known to undergo evolutionary variation. A common feature of IF proteins is a tripartite domain structure, with a central rod domain characterized by a heptad repeat of hydrophobic amino acids and more variable terminal domains (Steinert and Roop, 1988; Shoeman and Traub, 1995). Interestingly, the IF protein subdomain structurally related to the EpiC repeated domains is an extra six heptads which, in vertebrates, distinguishes lamins from cytoplasmic IF proteins (long coil 1b version versus short coil 1b version). Deletion of this subdomain from the central core of cytoplasmic IF proteins correlates with the protostome/deuterostome branching in the eukaryotic phylogenetic tree (Riemer et al., 1992). We also note that comparative analysis of the exon-intron organization of vertebrate lamin genes and of genes encoding cytoplasmic IF proteins argues for an evolution of IF proteins from a lamin-like ancestor (Dodemont et al., 1990; Döring and Stick, 1990; Stick, 1995). The present study therefore indicates a relationship between EpiC repeats, especially type I repeats, and an early evolutionary IF protein subdomain. Although we cannot yet draw more definite conclusions, some interesting comparisons can be made. First, both the epiplasm and the nuclear lamina are located underneath a membrane. The epiplasm is tightly associated with the cytoplasmic side of the innermost cortical membrane, i.e. the inner alveolar membrane in Tetrahymena and in ciliates in general. Lamins, with few exceptions, are universal protein components of the nuclear lamina in metazoan cells.
Second, evidence has been reported for a 140 kDa lamin B immunoanalog in bovine desmosomal plaques, suggesting that a family of lamin-like proteins may associate with both the nuclear and the cell membrane (Cartaud et al., 1990). Finally, conserved serine residues in lamins are phosphorylated by protein kinase C, and putative phosphorylation sites for this kinase are also present in EpiC type I repeats. As in the case of the lamina, hyperphosphorylation of EpiC by protein kinase C might have implications for the regulation of the structure of the epiplasm and its dynamics during the cortical morphogenesis that accompanies cell division in Tetrahymena.
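Putative protein kinase C sites, such as those mentioned for EpiC type I repeats, are usually located by scanning for a short consensus motif. The Python sketch below assumes the PROSITE PKC pattern [ST]-x-[RK] (entry PS00005). The text does not say which consensus was used (Kennelly and Krebs, 1991, review several), and such short patterns yield many false positives, so hits are only candidates. The function names and example peptide are hypothetical.

```python
import re

# Assumed consensus: PROSITE PS00005 (protein kinase C phosphorylation site), [ST]-x-[RK].
# A zero-width lookahead lets overlapping candidate sites all be reported.
PKC_CONSENSUS = re.compile(r"(?=[ST].[RK])")


def putative_pkc_sites(seq):
    """Return 1-based positions of Ser/Thr residues that match [ST]-x-[RK]."""
    return [match.start() + 1 for match in PKC_CONSENSUS.finditer(seq.upper())]


def sites_in_window(sites, start, end):
    """Keep sites falling within a 1-based, inclusive residue window (e.g. one domain)."""
    return [p for p in sites if start <= p <= end]


if __name__ == "__main__":
    toy = "MSAKTLRSQK"  # hypothetical peptide, not EpiC sequence
    print(putative_pkc_sites(toy))  # [2, 5, 8]
```

Restricting hits with `sites_in_window` to the residue ranges of individual domains would show whether candidate sites are concentrated in type I repeats.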

In conclusion, sequence analysis has shown EpiC to be a previously unrecognized protein composed of 25 structurally related domains of 40 amino acids each. We are presently attempting to identify EpiC homologs in the epiplasm of other tetrahymenids, as well as in other protozoans known to contain this type of membrane skeleton. As noted above, the significance of finding an EpiC type I/II-related domain in the central core of lamins is not clear. Part of the difficulty stems from the fact that the existence of archetypal IF proteins, i.e. filament-forming proteins displaying a central core with typical heptad repeats, has not yet been established in protists. Since metazoan IF proteins have probably evolved from a lamin-like ancestor, the question of the evolutionary origin of lamins remains open. Our results may help in addressing the question of lamin progenitors in protists. Finally, now that we have sequenced the gene coding for EpiC, it should be possible to use the tools of molecular genetics to analyze the contribution of this protein to cell surface architecture and to its developmental regulation in Tetrahymena.
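The heptad repeat that defines IF central rods (hydrophobic residues at positions a and d of each seven-residue unit) can be checked crudely by scoring every possible register. This Python sketch is a simple periodicity score, not a coiled-coil predictor such as COILS. The hydrophobic residue set, the function names and the idealized test peptide are assumptions for illustration. Six heptads span 42 residues.

```python
# Assumed set of residues counted as hydrophobic at core (a, d) positions.
HYDROPHOBIC = set("LIVMFA")

HEPTAD_LABELS = "abcdefg"


def heptad_score(seq, register):
    """Fraction of heptad a and d positions occupied by hydrophobic residues.

    Residue i (0-based) is assigned heptad position (i + register) % 7,
    where 0 = 'a' and 3 = 'd' (see HEPTAD_LABELS).
    """
    hits = total = 0
    for i, residue in enumerate(seq.upper()):
        if (i + register) % 7 in (0, 3):
            total += 1
            hits += residue in HYDROPHOBIC
    return hits / total if total else 0.0


def best_register(seq):
    """Return (register, score) for the best of the seven possible registers."""
    scores = [heptad_score(seq, r) for r in range(7)]
    best = max(range(7), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Idealized six-heptad peptide (42 residues), hypothetical.
    ideal = "LKELEEK" * 6
    register, score = best_register(ideal)
    print(len(ideal), register, score)  # 42 0 1.0
    # Heptad label of the first residue under the best register.
    print(HEPTAD_LABELS[register])  # a
```

A high score in one register over a window is suggestive of heptad periodicity but says nothing by itself about filament formation.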

The authors gratefully acknowledge Drs C. Rodrigues-Pousada and H. Soares for providing the T. pyriformis macronuclear λDASH library. We are also deeply indebted to Dr J.-C. Courvalin, Institut Jacques Monod (Paris), for helpful discussions and to D. Bayle for technical assistance. This work was supported by CNRS (UMR 6023 and UMRC 7590) and by a grant from the Ministère de l’Education Nationale, de la Recherche et de la Technologie (to P. B.).

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403-410.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res. 25, 3389-3402.
Ayadi, L., Callebaut, I., Saguez, C., Villa, T., Mornon, J.-P. and Banroques, J. (1998). Functional and structural characterization of the prp3 binding domain of the yeast prp4 splicing factor. J. Mol. Biol. 284, 673-687.
Bairoch, A. and Boeckmann, B. (1991). The SWISS-PROT protein sequence data bank. Nucl. Acids Res. 19, 2247-2249.
Barahona, I., Soares, H., Cyrne, L., Penque, D., Denoulet, P. and Rodrigues-Pousada, C. (1988). Sequence of one alpha- and two beta-tubulin genes of Tetrahymena pyriformis. Structural and functional relationships with other eukaryotic tubulin genes. J. Mol. Biol. 202, 365-382.
Bouchard, P., Viguès, B., Ruchaud, M.-H. and Ravet, V. (1998). The membrane skeleton of Tetrahymena contains immunoanalogs of GFAP, the intermediate filament protein expressed in astrocytes and cells of glial origin. Eur. J. Protistol. 34, 138-147.
Bouck, G. B. and Ngô, H. (1996). Cortical structure and function in Euglenoids with reference to Trypanosomes, Ciliates and Dinoflagellates. Int. Rev. Cytol. 169, 267-318.
Brunk, C. F. and Sadler, L. A. (1990). Characterization of the promoter region of Tetrahymena genes. Nucl. Acids Res. 18, 323-329.
Callebaut, I. and Mornon, J.-P. (1997). The human EBNA-2 coactivator p100: multidomain organization and relationship to the staphylococcal nuclease fold and to the tudor protein involved in Drosophila melanogaster development. Biochem. J. 321, 125-132.
Callebaut, I., Labesse, G., Durand, P., Poupon, A., Canard, L., Chomilier, J., Henrissat, B. and Mornon, J.-P. (1997). Deciphering protein sequence information through Hydrophobic Cluster Analysis (HCA): current status and perspectives. Cell. Mol. Life Sci. 53, 621-645.
Calmels, T. P. G., Callebaut, I., Lèger, I., Durand, P., Bril, A., Mornon, J.-P. and Souchet, M. (1998). Sequence and 3D structural relationships between mammalian Ras- and Rho-specific GTPase-activating proteins (GAPs): the cradle fold. FEBS Lett. 426, 205-211.
Cartaud, A., Ludosky, M. A., Courvalin, J.-C. and Cartaud, J. (1990). A protein antigenically related to nuclear lamin B mediates the association of intermediate filaments with desmosomes. J. Cell Biol. 111, 581-588.
Coffe, G., Le Caer, J.-P., Lima, O. and Adoutte, A. (1996). Purification, in vitro reassembly, and preliminary sequence analysis of epiplasmins, the major constituent of the membrane skeleton of Paramecium. Cell Motil. Cytoskel. 34, 137-151.
Dodemont, H., Riemer, D. and Weber, K. (1990). Structure of an invertebrate gene encoding cytoplasmic intermediate filament (IF) proteins: implications for the origin and the diversification of IF proteins. EMBO J. 9, 4083-4094.
Döring, V. and Stick, R. (1990). Gene structure of nuclear lamin LIII of Xenopus laevis; a model for evolution of IF proteins from a lamin-like ancestor. EMBO J. 9, 4073-4081.
Dubreuil, R. R., Rosiere, Y. K., Rosner, M. C. and Bouck, G. B. (1988). Properties and topography of the major integral plasma membrane protein of a unicellular organism. J. Cell Biol. 107, 191-200.
Durand, P., Canard, L. and Mornon, J.-P. (1997). Visual BLAST and Visual FASTA: graphic workbenches for interactive analysis of full BLAST and FASTA outputs under Microsoft Windows 95/NT. Comput. Appl. Biosci. 13, 401-413.
Girault, J. A., Labesse, G., Mornon, J.-P. and Callebaut, I. (1998). Janus Kinases and Focal Adhesion Kinases play in the 4.1 band: a superfamily of band 4.1 domains important for cell structure and signal transduction. Mol. Med. 4, 751-769.
Henikoff, S. and Henikoff, J. G. (1992). Amino acid substitution matrices from protein blocks. Proc. Nat. Acad. Sci. USA 89, 10915-10919.
Huttenlauch, I., Geisler, N., Plessmann, U., Peck, R. K., Weber, K. and Stick, R. (1995). Major epiplasmic proteins of ciliates are articulins: Cloning, recombinant expression, and structural characterization. J. Cell Biol. 130, 1401-1412.
Huttenlauch, I., Peck, R. K., Plessmann, U., Weber, K. and Stick, R. (1998a). Characterisation of two articulins, the major epiplasmic proteins comprising the membrane skeleton of the Ciliate Pseudomicrothorax. J. Cell Sci. 111, 1909-1919.
Huttenlauch, I., Peck, R. K. and Stick, R. (1998b). Articulins and epiplasmins: two distinct classes of cytoskeletal proteins of the membrane skeleton in Protists. J. Cell Sci. 111, 3367-3378.
Kennelly, P. J. and Krebs, E. G. (1991). Consensus sequences as substrate specificity determinants for protein kinases and protein phosphatases. J. Biol. Chem. 266, 15555-15558.
Kwiatkowska, K. and Sobota, A. (1992). 240 kDa immunoanalogue of vertebrate α-spectrin occurs in Paramecium cells. Cell Motil. Cytoskel. 23, 111-121.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of the bacteriophage T4. Nature 227, 680-685.
Marrs, J. A. and Bouck, G. B. (1992). The two major membrane skeletal proteins (articulins) of Euglena gracilis define a novel class of cytoskeletal proteins. J. Cell Biol. 118, 1465-1475.
Martindale, D. W. (1989). Codon usage in Tetrahymena and other ciliates. J. Protozool. 36, 29-34.
Peck, R. K. (1977). The ultrastructure of the somatic cortex of Pseudomicrothorax dubius: structure and function of the epiplasm in ciliated protozoan. J. Cell Sci. 27, 367-385.
Riemer, D., Dodemont, H. and Weber, K. (1992). Analysis of the cDNA and gene encoding a cytoplasmic intermediate filament (IF) protein from the cephalochordate Branchiostoma lanceolatum; implications for the evolution of the IF protein family. Eur. J. Cell Biol. 58, 128-135.
Risler, J.-L., Delorme, M., Delacroix, H. and Henaut, A. (1988). Amino acid substitutions in structurally related proteins. A pattern recognition approach. J. Mol. Biol. 204, 1019-1029.
Sambrook, J., Fritsch, E. F. and Maniatis, T. (1989). Molecular Cloning: A Laboratory Manual, 2nd edn. New York: Cold Spring Harbor Laboratory Press.
Schneider, A., Lutz, H. U., Marrug, R., Gehr, P. and Seebeck, T. (1988). Spectrin-like proteins in the paraflagellar rod structure of Trypanosoma brucei. J. Cell Sci. 90, 307-315.
Shoeman, R. L. and Traub, P. (1995). The proteins of intermediate filament systems. In The Cytoskeleton, Structure and Assembly (ed. J. E. Hesketh and I. F. Pryme), pp. 205-255. Greenwich-London: JAI Press Ltd.
Steinert, P. M. and Roop, D. R. (1988). Molecular and cellular biology of intermediate filaments. Annu. Rev. Biochem. 57, 593-625.
Stick, R. (1995). Nuclear lamins and the nucleoskeleton. In The Cytoskeleton, Structure and Assembly (ed. J. E. Hesketh and I. F. Pryme), pp. 257-296. Greenwich-London: JAI Press Ltd.
Towbin, H., Staehelin, T. and Gordon, J. (1979). Electrophoretic transfer of proteins from polyacrylamide gels to nitrocellulose sheets: procedure and some applications. Proc. Nat. Acad. Sci. USA 76, 4350-4354.
Vaudaux, P. (1976). Isolation and identification of specific cortical proteins in Tetrahymena pyriformis strain GL. J. Protozool. 23, 458-464.
Vigues, B., Bricheux, G., Metivier, C., Brugerolle, G. and Peck, R. K. (1987). Evidence for common epitopes among proteins of the membrane skeleton of a ciliate, a euglenoid and a dinoflagellate. Eur. J. Protistol. 23, 101-110.
Williams, N. (1986). Evolutionary change in cytoskeletal proteins and cell architecture in lower eukaryotes. Progr. Protistol. 1, 309-324.
Williams, N. E., Honts, J. E. and Jaeckel-Williams, R. F. (1987). Regional differentiation of the membrane skeleton in Tetrahymena. J. Cell Sci. 87, 457-463.
Williams, N. E., Honts, J. E. and Dress, V. M. (1992). Protein polymorphism and evolution in the genus Tetrahymena. J. Protozool. 39, 54-58.
Williams, N. E., Honts, J. E., Dress, V. M., Nelsen, E. M. and Frankel, J. (1995). Monoclonal antibodies reveal complex structure in the membrane skeleton of Tetrahymena. J. Euk. Microbiol. 42, 422-427.
Witt, I., Straub, N., Kaufer, N. F. and Gross, T. (1993). The CAGTCACA box in the fission yeast Schizosaccharomyces pombe functions like a TATA element and binds a novel factor. EMBO J. 12, 1201-1208.
Woodcock, S., Mornon, J.-P. and Henrissat, B. (1992). Detection of secondary structure elements in proteins by Hydrophobic Cluster Analysis. Prot. Eng. 5, 629-635.