ABSTRACT
Mitochondrial biogenesis relies on hundreds of proteins that are derived from genes encoded in the nucleus. According to the characteristic properties of N-terminal targeting peptides (TPs) and multi-step authentication by the protein translocase called the TOM complex, nascent polypeptides satisfying the requirements are imported into mitochondria. However, it is unknown whether eukaryotic cells with a single mitochondrion per cell have a similar complexity of presequence requirements for mitochondrial protein import compared to other eukaryotes with multiple mitochondria. Based on putative mitochondrial TP sequences in the unicellular red alga Cyanidioschyzon merolae, we designed synthetic TPs and showed that functional TPs must have at least one basic residue and a specific amino acid composition, although their physicochemical properties are not strictly determined. Combined with the simple composition of the TOM complex in C. merolae, our results suggest that a regional positive charge in TPs is verified solely by TOM22 for mitochondrial protein import in C. merolae. The simple authentication mechanism indicates that the monomitochondrial C. merolae does not need to increase the cryptographic complexity of the lock-and-key mechanism for mitochondrial protein import.
INTRODUCTION
Evolved from a free-living α-proteobacterial ancestor via an endosymbiotic event, the mitochondrion has its own genome and gene expression system (Gray et al., 1999). However, most genes encoding mitochondrial proteins are now encoded in nuclear genomic DNA and 99% of mitochondrial proteins are synthesized by cytosolic ribosomes (Chacinska et al., 2009; Sickmann et al., 2003). Due to the compartmentalization by membranes, nascent mitochondrial precursor proteins are secreted into mitochondria according to the embedded information in their peptide sequence (Pfanner et al., 2019; Schmidt et al., 2010). Except for mitochondrial membrane proteins, the signature for transportation into mitochondria can be found in the N-terminal peptide sequence of precursor proteins called the presequence or targeting peptide (TP) (Endo and Yamano, 2009). The precursor proteins that have the TP are carried into the mitochondrion through the mitochondrial protein translocator of the outer membrane (called the TOM complex) (Araiso et al., 2019; Shiota et al., 2011, 2015; Su et al., 2022). Then, precursor proteins are imported into the matrix by the function of the TIM23 complex (Sim et al., 2023; Yamamoto et al., 2002; Zhou et al., 2023). The mitochondrial protein import system is widely conserved through eukaryotes, suggesting that it evolved in their last common ancestor.
To distinguish the nascent proteins that should be secreted into the mitochondrion from others, the mitochondrial targeting sequence has several characteristic features. Mitochondrial TPs usually contain 20 to 60 amino acids and tend to fold into amphiphilic α-helices (Calvo et al., 2017; Carrie et al., 2015; Gavel et al., 1988; Huang et al., 2009). Their amino acid composition is generally enriched in alanines, leucines, lysines and arginines. In particular, it is considered that the presence of arginine residues and the resultant positive charge in the presequence determines mitochondrial targeting (Vögtle et al., 2009; von Heijne et al., 1989). The reason for the overall enrichment of basic amino acids is thought to be that the positive charge of the TP facilitates passage through the electrochemical gradient across the inner mitochondrial membrane generated by the mitochondrial electron transport chain (Garg and Gould, 2016).
Despite these features being well known in the mitochondrial targeting sequences throughout eukaryotes, the common motif has not been identified and amino acid sequences differ even in the same organism. Although the targeting sequence appears to be a cryptic sequence, significant biochemical features of the presequence indicate that the organelle destinations of each protein are controlled and follow uncharacterized rules in the eukaryotic cell. Given that each organism shows different trends in the mean length and amino acid composition of the presequences, there are countless functional sequences as the presequence and the complexity of the presequences might correlate with the complexity of cell structure such as multicellularity.
In this study, to understand the prerequisite of the TP for mitochondrial targeting, we experimentally assessed protein targeting function of various types of synthetic TPs (synTPs) using the unicellular alga Cyanidioschyzon merolae. The C. merolae cell contains only one mitochondrion, which is easily identified by its shape and intracellular localization by fluorescence microscopy (Kuroiwa, 1998; Matsuzaki et al., 2004; Nozaki et al., 2007) (Fig. 1A). In addition, established gene-targeting techniques using fluorescent reporters can be used to test whether the engineered TP has the ability to target a fluorescent protein (FP) or FP-fused protein to the mitochondrion of C. merolae (Fujiwara et al., 2015; Imamura et al., 2009; Ohnuma et al., 2008; Tanaka et al., 2021). Through a series of in vivo and in silico experiments, we showed that an N-terminal peptide with a specific amino acid composition and very few basic residues fulfils the requirement for mitochondrial protein targeting. Thus, a key with a simple structure can open the mitochondrial protein gate in the monomitochondrial C. merolae.
Genomic information of mitochondrial and chloroplast proteins in Cyanidioschyzon merolae. (A) Fluorescence images of a C. merolae cell. The ACTIN knockout cell was used for the imaging (Tanaka et al., 2021). The nucleus, mitochondrion, chloroplast and peroxisome were visualized by Cas9–Venus, mScarlet, chlorophyll autofluorescence and mCerulean3, respectively. Images are representative of more than three independent experiments. (B) The mitochondrial translocator of the outer mitochondrial membrane (TOM) complex and the inner mitochondrial membrane (TIM) complex. The TOM complex is composed of the β-barrel protein TOM40, α-helical membrane-integrated receptors TOM20, TOM22 and TOM70, and the regulators TOM5, TOM6 and TOM7. Proteins which are not identified in the C. merolae protein-coding genes are illustrated with dashed lines. OM, outer membrane; IMS, intermembrane space; IM, inner membrane. (C,D) Scatterplot comparisons of mitochondrial targeting peptide (mTP) and chloroplast targeting peptide (cTP) scores for all ORFs (4803 proteins) (C) and for well-characterized mitochondrial (113 proteins) and chloroplast proteins (97 proteins) (D). Prediction scores for all ORFs are shown in Table S1 and the list of mitochondrial and chloroplast proteins is given in Tables S2 and S3. (E,F) Histograms of the length of presequences for the mitochondrion (E) and the chloroplast (F). (G) Venn diagram showing the classification of mitochondrial presequences containing α-helices (magenta), β-sheets (blue), both α-helices and β-sheets (purple), and no α-helices or β-sheets. See also Figs S1 and S2.
RESULTS
Genome analysis of the presequences in C. merolae
The fact that only 4803 genes are encoded in the nuclear genome and 99.9% of the protein-coding genes are single-exon genes suggests a simple proteome in C. merolae (Matsuzaki et al., 2004; Nozaki et al., 2007). Furthermore, as only two components of the TOM complex, TOM40 and TOM22, are characterized in the genome, the mitochondrial protein targeting system in C. merolae is likely to be more functionally limited than that in other eukaryotic cells (Fig. 1B). To reveal the minimal requirements for a functional TP in mitochondrial protein targeting, we first computationally evaluated all amino acid sequences encoded in the nuclear genome by using a deep learning model-based presequence prediction tool, TargetP2.0 (Armenteros et al., 2019). The protein-encoding open reading frames (ORFs) were classified based on their mitochondrial TP (mTP) and chloroplast TP (cTP) scores: 336 putative mitochondrial proteins (289 proteins with an mTP score >0.5) and 349 putative chloroplast proteins (328 proteins with a cTP score >0.5) were identified (Fig. 1C; Table S1). To assess the predictability of the results, we confirmed the prediction scores for mitochondrial and chloroplast proteins that are well characterized in other organisms or experimentally confirmed to localize to the mitochondrion or the chloroplast in C. merolae (Mori et al., 2016; Moriyama et al., 2014a,b) (Tables S2 and S3). The recalls of protein targeting sequences for mitochondrial proteins (113 proteins) and chloroplast proteins (97 proteins) were 43.4% (49 proteins with an mTP score >0.5) and 41.2% (40 proteins with a cTP score >0.5), respectively (Fig. 1D). Given that the recall of presequence prediction for mitochondrial and chloroplast proteins in other organisms by TargetP2.0 was approximately 80–86% (Armenteros et al., 2019; Imai and Nakai, 2020), the lower prediction scores for C. merolae protein targeting sequences suggest that the mitochondrial targeting system in C. merolae not only shares basic similarities with those in other eukaryotes, but also has some differences.
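The recall values above follow directly from the reported counts. As a minimal sketch (the function name and the toy score lists are illustrative assumptions, not the pipeline used in this study), recall can be computed as the percentage of known organelle proteins whose prediction score exceeds the 0.5 threshold:

```python
def recall_at_threshold(scores, threshold=0.5):
    """Percentage of known-positive proteins whose score exceeds the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    hits = sum(1 for s in scores if s > threshold)
    return 100.0 * hits / len(scores)


# Toy score lists that only reproduce the reported counts
# (49 of 113 mitochondrial and 40 of 97 chloroplast proteins above 0.5);
# the individual score values are placeholders, not real TargetP2.0 output.
mito_scores = [0.9] * 49 + [0.1] * 64
chloro_scores = [0.9] * 40 + [0.1] * 57

mito_recall = recall_at_threshold(mito_scores)
chloro_recall = recall_at_threshold(chloro_scores)
print(f"mTP recall: {mito_recall:.1f}%  cTP recall: {chloro_recall:.1f}%")
```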
To understand the characteristics and principles of mitochondrial TPs in C. merolae, we next compared the length of the putative mitochondrial and chloroplast TPs. In the comparison, TP regions in mitochondrial and chloroplast proteins were presumed by protein sequence alignment analysis (see Materials and Methods) and we omitted proteins for which the length of the putative TPs was shorter than ten or longer than 150 amino acids from the analysis. The lengths of mitochondrial and chloroplast TPs were 59.3±23.4 and 78.4±17.3 (mean±s.d.), respectively (Fig. 1E,F). Additionally, our dataset for mitochondrial TPs showed that 75.2% of mitochondrial TPs (85/113 proteins) are α-helical polypeptides (Fig. 1G; Figs S1 and S2). Thus, similar to the results in other organisms, the mitochondrial TP is typically shorter than the chloroplast TP and the α-helical structure is the significant feature of the mitochondrial TP even in the C. merolae cell with the simplest organelle composition.
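The length filter and summary statistics described above can be sketched as follows. This is an illustrative example only: the input lengths are made-up values, and the use of the sample standard deviation is an assumption, as the text does not state which estimator was used.

```python
from statistics import mean, stdev


def summarize_tp_lengths(lengths, min_len=10, max_len=150):
    """Drop TPs shorter than min_len or longer than max_len, then summarize.

    Returns (number kept, mean length, sample standard deviation).
    """
    kept = [n for n in lengths if min_len <= n <= max_len]
    if len(kept) < 2:
        raise ValueError("need at least two lengths after filtering")
    return len(kept), mean(kept), stdev(kept)


# Hypothetical lengths: 8 and 200 fall outside the 10-150 window and are dropped.
n_kept, mean_len, sd_len = summarize_tp_lengths([8, 20, 40, 60, 200])
print(f"n={n_kept}, length={mean_len:.1f}±{sd_len:.1f}")
```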
In vivo analysis of mitochondrial targeting property by fluorescence reporter assay
To verify whether the single α-helical polypeptide could work as a TP in C. merolae, an α-helix region (1–33 amino acids) or the full length (1–468 amino acids) of aspartate aminotransferase (AAT, CMC148C) was fused with the yellow fluorescent protein mVenus (Nagai et al., 2002) and introduced into the cells as a representative example (Fig. 2A,B; Fig. S3). The resultant transformants expressing either AAT 1–33–mVenus or AAT full-length–mVenus emitted fluorescence signals of the mVenus reporter in the mitochondrion. The result suggests that the α-helical polypeptide is one of the minimal requisites as a functional presequence for mitochondrial protein targeting in C. merolae.
Synthetic mitochondrial presequence. (A) Scheme of the 1–33 amino acid sequence of aspartate aminotransferase (AAT, CMC148C). (B) Fluorescence images of AAT 1–33 and AAT full-length fused with mVenus. Fluorescence signals for mVenus are shown in green and chlorophyll autofluorescence is shown in red. See also Fig. S3. Images are representative of three independent experiments. (C) Comparisons of amino acid compositions in all ORFs and α-helices in mitochondrial presequences. The α-helices in mitochondrial presequences are identified by structural simulation using the AlphaFold program. See also Table S4 for sequences. (D) Distribution of amino acids in all ORFs (individual bars on the left) and α-helices in presequences (individual bars on the right). (E) An amino acid sequence logo of α-helices in presequences. Asterisks indicate the arginine residues that were adopted in the synthetic presequence. (F) Sequence of the synthetic presequence of 24 amino acids. (G) Structural prediction score for α-helix (left) and local distance difference test (LDDT) score (right) of the synthetic presequence. (H) A simulated structure of the synthetic presequence in lateral and front views.
Next, we analyzed the amino acid compositions of the α-helices of mitochondrial TPs. For the analysis, 31 single α-helical polypeptides shorter than 25 amino acids in TPs were evaluated (Table S4). We found that the α-helical polypeptides were composed of 17.3% basic residues, 3.6% acidic residues, 26.2% polar uncharged residues and 52.9% nonpolar residues (Fig. 2C, right). The average amino acid composition of all C. merolae ORFs is 11.2% basic residues, 14.0% acidic residues, 22.2% polar uncharged residues and 52.5% nonpolar residues (Fig. 2C, left); thus, although the decrease in the proportion of acidic residues was remarkable, the overall trend was not extremely skewed in the mitochondrial TPs. Clearer differences were found not in the chemical classes, but in the individual amino acid compositions. Whereas aspartic acid, glutamic acid, proline and isoleucine residues are scarce, arginine, serine, threonine, alanine, leucine and valine residues are abundant in the mitochondrial TPs (Fig. 2D). Of these abundant residues, it is known that alanine, arginine and leucine have high α-helical propensities (Pace and Scholtz, 1998). As the mean hydrophobic moment in the α-helical polypeptides was 0.288±0.13 μH (mean±s.d.) (Table S4), the TP has very weak amphiphilicity. Also, owing to the low proportion of acidic residues, the average charge of the α-helices was estimated to be 2.80±1.76 at pH 7.0 (Table S4). Thus, the mitochondrial TP has a specialized amino acid composition compared with that of other polypeptides and a tendency to form α-helical structure with weak amphiphilicity and weak positive charge in C. merolae.
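The composition classes and charge estimate discussed above can be illustrated with a short sketch. The residue grouping below (histidine counted as basic; cysteine and tyrosine as polar uncharged; glycine and proline as nonpolar) and the integer charge approximation (lysine and arginine as +1, aspartic and glutamic acid as −1, histidine and termini ignored) are assumptions made for illustration; the values in Table S4 may have been derived with a pKa-based method. The example peptide is hypothetical.

```python
BASIC = set("KRH")
ACIDIC = set("DE")
POLAR_UNCHARGED = set("STNQCY")
NONPOLAR = set("GAVLIMFWP")


def class_composition(seq):
    """Return the percentage of residues in each chemical class."""
    seq = seq.upper()
    counts = {"basic": 0, "acidic": 0, "polar_uncharged": 0, "nonpolar": 0}
    for aa in seq:
        if aa in BASIC:
            counts["basic"] += 1
        elif aa in ACIDIC:
            counts["acidic"] += 1
        elif aa in POLAR_UNCHARGED:
            counts["polar_uncharged"] += 1
        elif aa in NONPOLAR:
            counts["nonpolar"] += 1
        else:
            raise ValueError(f"unexpected residue: {aa}")
    return {name: 100.0 * n / len(seq) for name, n in counts.items()}


def approx_net_charge(seq):
    """Crude net charge near pH 7: (K + R) minus (D + E)."""
    seq = seq.upper()
    return seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")


peptide = "MARLSTLAAE"  # hypothetical 10-residue example, not from Table S4
composition = class_composition(peptide)
charge = approx_net_charge(peptide)
print(composition, charge)
```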
Mitochondrial targeting property of a designed presequence
To further investigate the basal prerequisites of the mitochondrial TP, we designed a synTP by linking the most frequent amino acid residue at each position in the α-helices (Fig. 2E). To make the charge similar to that of the endogenous TPs, two arginine residues were replaced with leucine and serine residues, respectively. As a result, the synTP contains three arginine residues in the 24 amino acids (Fig. 2F). Computational simulations of both the secondary and tertiary structure indicated that the synTP forms a single α-helical structure (Fig. 2G,H). By introducing synTP-fused mVenus into the cell, we detected the fluorescence signal in the mitochondrion and concluded that the designed TP has the mitochondrial targeting property (Fig. 3, left). As it is well known that multiple basic residues in the TP are required for protein targeting into the mitochondrion (Gavel et al., 1988), we investigated the minimum number of basic residues required for mitochondrial protein targeting in C. merolae. To this end, arginine residues in the synTP were progressively replaced by other residues. Interestingly, although computational simulations predicted that modified synTPs containing fewer than two arginine residues would lose the ability to translocate to the mitochondrion (Table S5), transformants expressing modified synTP fused to mVenus showed that even a single arginine residue in the TP fulfils the role for mitochondrial targeting (Fig. 3). In addition, we confirmed that lysine residues are exchangeable for arginine residues in the TP (Fig. 3, right).
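The design principle described above, taking the most frequent residue at each position and then substituting selected residues, can be sketched as follows. This is a minimal illustration under stated assumptions: the input helices are assumed to be already aligned and of equal length (how positions were aligned in this study is described in the Materials and Methods, not here), positions are 1-based, and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus_sequence(aligned):
    """Most frequent residue at each column of equal-length, pre-aligned peptides."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all sequences must have the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))


def substitute(seq, position, residue):
    """Replace the residue at a 1-based position."""
    if not 1 <= position <= len(seq):
        raise ValueError("position out of range")
    return seq[: position - 1] + residue + seq[position:]


# Hypothetical toy helices (no ties at any column).
toy_helices = ["MLRA", "MLSA", "MARA"]
consensus = consensus_sequence(toy_helices)
tuned = substitute(consensus, 3, "S")  # e.g. swap an arginine for serine
print(consensus, tuned)
```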
Effects of the number of basic residues in the synthetic presequence. The helical wheel diagrams for amino acid sequences of the synthetic presequences containing three, two, one or zero arginine residues and three lysine residues. Representative images of cells from three independent experiments are shown below each helical wheel.
Identification of physicochemical properties for a functional presequence
Generally, mitochondrial TPs contain an arginine residue near the protease cleavage site with a sequence motif of R-X-/-X or R-X-F/T/L-/-A/S-X (Heidorn-Czarna et al., 2022), indicating that not only the net charge but also the position of arginine residues would affect the mitochondrial targeting property. Based on this assumption, we investigated the effect of the position of the single arginine residue in the synTP1R by an arginine-scanning approach. A single arginine residue at the +2, +6, +8, +9, +12, +13, +15 or +21 position was examined (Fig. 4A). We found that the position of the single arginine residue in synTP1R is broadly permissible in the α-helix, but the synTP1R containing an arginine residue at the +2 position relative to the initial methionine residue lost the mitochondrial targeting property and the mVenus reporter localized in the cytosol, similar to synTP0R (Fig. 4B). Taken together, an arginine residue in the α-helix can impart function to the mitochondrial TP, and the arginine residue is tolerated across a broad region of the helical structure except for the flanking position.
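The arginine-scanning design above amounts to generating one variant per tested position from an arginine-free backbone. The sketch below is illustrative only: the 24-residue backbone is a hypothetical placeholder (the actual synTP sequences are given in Fig. 4A and Table S5), and positions are assumed to be 1-based with the initial methionine at position 1.

```python
def arginine_scan(backbone, positions, residue="R"):
    """Return {position: variant} with `residue` placed at each 1-based position."""
    variants = {}
    for pos in positions:
        if not 1 <= pos <= len(backbone):
            raise ValueError(f"position {pos} is outside the peptide")
        variants[pos] = backbone[: pos - 1] + residue + backbone[pos:]
    return variants


# Hypothetical arginine-free 24-residue backbone, for illustration only.
backbone = "MLSSALLTAASSLLSAVSASLATA"
scan_positions = [2, 6, 8, 9, 12, 13, 15, 21]  # positions tested in the text
variants = arginine_scan(backbone, scan_positions)
for pos, seq in variants.items():
    print(f"+{pos}\t{seq}")
```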
Arginine scanning analysis and physicochemical properties of the modified synthetic presequences. (A) Arginine scanning of the modified synthetic presequence. (B) Fluorescence images of each modified synTP1R–mVenus. Images are representative of three independent experiments. (C–E) Physicochemical properties of each synthetic presequence. See also Table S5.
By changing either the number of arginine residues or the position of the arginine residue, the physicochemical properties of synTP are drastically altered (Fig. 4C–E). As functional TPs tolerated a broad range of hydrophobicity and hydrophobic moment (0.388 to 0.747 H and 0.246 to 0.101 μH), these factors appear to be of limited importance for protein targeting to the mitochondrion. Furthermore, neither the hydrophobicity nor the hydrophobic moment of the mitochondrial TP in C. merolae has been characterized. More importantly, the C. merolae mitochondrial TP functioned normally even when the charge was negative. Although the results suggest that the net charge of the TP does not need to be positive for protein targeting to the mitochondrion, a modified TP without an arginine residue (synTP0R) lost the ability to target mitochondrial proteins (Fig. 3). It is therefore suggested that basic residues in the α-helix, with the exception of the flanking position, are essential for a functional TP, but their importance is not linked to the net charge of the TP.
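For readers unfamiliar with the hydrophobic moment, the following sketch shows one common way to compute it for an ideal α-helix, in which successive residues are rotated by 100° around the helix axis and the per-residue hydrophobicities are summed as vectors. The hydrophobicity scale is passed in as a parameter; the two-letter toy scale used here is a placeholder assumption, not the scale used to produce Fig. 4C–E or Table S5.

```python
import math


def mean_hydrophobicity(seq, scale):
    """Average per-residue hydrophobicity <H> under the given scale."""
    return sum(scale[aa] for aa in seq) / len(seq)


def hydrophobic_moment(seq, scale, angle_deg=100.0):
    """Mean helical hydrophobic moment, assuming angle_deg rotation per residue."""
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


# Toy scale (illustrative values only): L is hydrophobic, S is neutral.
toy_scale = {"L": 1.0, "S": 0.0}

# 18 identical residues span exactly five helical turns, so the vectors cancel.
uniform = "L" * 18
# Hydrophobic residues placed only on one face of the helix.
amphipathic = "".join(
    "L" if math.cos(math.radians(100 * i)) > 0 else "S" for i in range(18)
)

mu_uniform = hydrophobic_moment(uniform, toy_scale)
mu_amphipathic = hydrophobic_moment(amphipathic, toy_scale)
print(f"uniform: {mu_uniform:.3f}  amphipathic: {mu_amphipathic:.3f}")
```

A uniformly hydrophobic helix therefore has a high mean hydrophobicity but a hydrophobic moment near zero, whereas a one-sided arrangement of the same residue types gives a clearly non-zero moment.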
Verification of the mitochondrial targeting property of endogenous polypeptides sharing sequence similarity with the synTP
Our results showed that the specialized amino acid composition and the presence of a few basic residues in the α-helix are prerequisites for the functional TP in C. merolae; however, it is questionable whether these lax requirements are sufficient to distinguish mitochondrial protein precursors from other proteins in vivo. If the peptide sequence of the synTP is very specific in protein sequences and only endogenous mitochondrial TPs have sequence similarity with the synTP in C. merolae, the sequence would act as a key for the mitochondrial protein gate. In contrast, if the sequence has some similarity to other proteins, gene products with translation errors or genetic mutations that cause amino acid substitutions in each gene would easily lead to misdelivery of the protein to the mitochondrion. To address this issue, we evaluated the sequence similarity between the N-terminal peptides (1–24 amino acids) of all ORFs and the synTP using the BLOSUM30 matrix.
Consequently, not only mitochondrial proteins, but also many cytosolic and chloroplast proteins were highly scored (Table S6). For example, a hypothetical protein (CMD051C), a putative formin-like protein (FMNL, CMN049C) and a putative chloroplast ribosomal protein (r-protein) S1 (RPSA, CMM019C) were ranked first to third. The N-terminal peptides for FMNL and RPSA were also highly predicted to form an α-helical structure by local structure prediction, as was the case for synTP (Fig. 5A,B). FMNL, a cytosolic protein related to actin filament dynamics, and chloroplast RPSA, a 30S r-protein S1 in chloroplasts, are well studied, but it has not been reported that these proteins localize to mitochondria.
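The proteome-wide comparison described above can be sketched as an ungapped, position-by-position score between a reference peptide and the N-terminal residues of each ORF, followed by ranking. This is an illustration under stated assumptions: whether the original analysis allowed gaps is not specified here, the match/mismatch function below is a toy stand-in for the BLOSUM30 matrix, and the protein names and sequences are hypothetical.

```python
def ungapped_score(query, target, score_fn):
    """Sum of pairwise substitution scores over aligned positions (no gaps)."""
    return sum(score_fn(a, b) for a, b in zip(query, target))


def rank_by_similarity(reference, proteins, score_fn, n_term=24):
    """Score the first n_term residues of each protein against the reference."""
    ref = reference[:n_term]
    scored = [
        (name, ungapped_score(ref, seq[:n_term], score_fn))
        for name, seq in proteins.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_score(a, b):
    """Stand-in for a substitution matrix lookup: +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


# Hypothetical reference and proteome (short peptides, n_term=5 for brevity).
reference = "MLSRA"
proteome = {"protA": "MLSRAKKK", "protB": "MKKKKAAA", "protC": "MLSKAQQQ"}
ranking = rank_by_similarity(reference, proteome, toy_score, n_term=5)
print(ranking)
```

To use a real substitution matrix, `score_fn` would be replaced with a lookup into that matrix; the ranking logic is unchanged.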
Fluorescent reporter assay for putative presequences of FMNL and RPSA. (A) Sequences of the first 24 amino acids of formin-like protein (FMNL) and r-protein S1 (RPSA). (B) Helical wheels for FMNL 1–24 and RPSA 1–24. (C) Fluorescence images of the N-terminal 1–24 peptide of FMNL fused with mVenus. Mutated FMNLs (E2A, E18A and E2A/E18A) fused with mVenus are also shown. (D) The Bayesian tree of chloroplast, mitochondrial and bacterial RPSA proteins (see Fig. S4 for the unabbreviated tree). Numbers on the left and right near branches indicate posterior probabilities of Bayesian inference and bootstrap values of the maximum likelihood method, respectively. Branch lengths are proportional to the evolutionary distances indicated by the scale bar. The C. merolae RPSA (CMM019C) is shown as the black circle. (E) Fluorescence images of RPSA 1–24 or RPSA full-length fused with mVenus. Images are representative of three independent experiments.
Fluorescence reporter assay for the native or modified N-terminal polypeptides of FMNL
To determine whether these N-terminal peptides have mitochondrial targeting function, we performed the fluorescent reporter assay for the 1–24 amino acids of FMNL. The fluorescence signal for FMNL 1–24 fused to mVenus was detected in the cytosol, suggesting that FMNL 1–24 is similar to synTP but does not have a mitochondrial targeting property (Fig. 5C, left). As a notable difference between FMNL 1–24 and synTP is the total number of acidic residues, we hypothesized that a negative charge derived from these residues prevents translocation of FMNL to the mitochondrion. Therefore, two acidic residues (E2 and E18) were progressively replaced by alanine residues. Using this approach, we found that a single substitution of either glutamic acid residue did not affect the targeting property, but the double mutation (E2A and E18A) conferred the targeting property to the mitochondrion (Fig. 5C). This demonstrates that a small number of amino acid substitutions can alter the destination of a protein in vivo.
The evolutionary origin and the targeting property of the N-terminus of RPSA
As a second example, we investigated the targeting property of the N-terminus of RPSA. Prior to the in vivo assay, we performed phylogenetic analysis to verify whether C. merolae RPSA (CMM019C) shares sequence similarity and is evolutionarily related to other chloroplast and cyanobacterial RPSAs. The results showed that cyanobacterial, green algal/land plant and red algal RPSAs containing the C. merolae RPSA formed a phylogenetic group and were separated from other bacterial and eukaryotic RPSAs (Fig. 5D; Fig. S4). The evolutionary relationship between C. merolae RPSA and cyanobacterial RPSA suggests that the gene product of CMM019C would function as a 30S r-protein S1 in the chloroplast, not in the mitochondrion. Next, we tested the targeting property of the N-terminal peptide sequence of the chloroplast RPSA using the fluorescent reporter assay. As a result, we found that the N-terminal peptide has the property to translocate to the mitochondrion but not to the chloroplast (Fig. 5E, left). We also confirmed that the full-length RPSA fused to mVenus translocated to the chloroplast (Fig. 5E, right). The results suggest that the N-terminus of RPSA has the same targeting property to the mitochondrion as that of the synTP, but an additional peptide sequence would convert it to a chloroplast TP.
DISCUSSION
The single-step authentication of mitochondrial preproteins in C. merolae
Through a series of in vivo and in silico experimental evaluations using synTPs, we showed the minimal prerequisite for protein targeting to the mitochondrion in C. merolae. The results indicated that functional TPs need to have some basic residues, at least one, in an α-helix comprising the specific amino acid composition, but the physicochemical properties of the net charge, hydrophobicity and hydrophobic moment do not seem to be strictly determined in C. merolae. The reason for the small number of basic residues in the TP could be explained by the characteristics of the TOM complex in C. merolae. Previous pioneering studies in fungi and animals indicated that the mitochondrial outer membrane proteins TOM20 and TOM22 identify mitochondrial TPs by multi-step authentication and introduce mitochondrial precursors into the mitochondrial intermembrane space (Abe et al., 2000; Araiso et al., 2019; Hanif Sayyed and Mahalakshmi, 2022; Nargang et al., 1998; Pfanner et al., 2019; Saitoh et al., 2007; Shiota et al., 2011). During the process, precursor proteins in the cytosol are initially detected and captured by TOM20 on the mitochondrial outer membrane via the hydrophobic interaction between the hydrophobic groove on the TOM20 surface and the hydrophobic site of the amphipathic helix of the TP (Abe et al., 2000; Saitoh et al., 2007). Then, TOM22 interacts with the TP via the electrostatic interaction between the negatively charged site of TOM22 and the positively charged site of the helix in the TP (Nargang et al., 1998; Shiota et al., 2011; Su et al., 2022). After the verification by TOM20 and TOM22, precursors are delivered to the β-barrel protein transporter TOM40 and the TIM complex. Although a TOM22 homologue has been identified in C. merolae, there is no sequence encoding TOM20 in the genome (Fig. 1B). Given that hydrophobicity is not strictly required for the functional TP (Fig. 4C), combined with the absence of TOM20 in the genome, verification of the mitochondrial precursors in C. merolae would be performed by TOM22 alone as a single-step authentication.
Interestingly, the N-terminal domains of human TOM22 (142 amino acids) and C. merolae TOM22 (119 amino acids), which face toward the cytoplasm in vivo, share very low protein sequence identity (Fig. 6A). The estimated charge of the N-terminal domain of human TOM22 is strongly negative (−16) (Fig. 6B, top), suggesting that the N-terminal domain captures the mitochondrial precursors via their positively charged TPs as shown in a recent study (Su et al., 2022). In contrast, the estimated charge of the N-terminal domain of C. merolae TOM22 is not negative (+2.2), but we found that one putative α-helical domain has five acidic residues on one side (Fig. 6B, bottom). Thus, in C. merolae, mitochondrial precursor proteins would be verified via electrostatic interaction between basic residues on the TP and the acidic residues on the α-helix of TOM22 as the single-step authentication (Fig. 6C). The TOM22-mediated authentication mechanism for mitochondrial protein import would be one reason why the TP with very few basic residues can fulfil the requirement for protein targeting to the mitochondrion in C. merolae.
Schematic representation of mitochondrial protein import in C. merolae. (A) Comparison of the N-terminal domains of human TOM22 and C. merolae TOM22. Human TOM22 helix domains and C. merolae TOM22 helix domains are shown in boxes. Helical regions for C. merolae TOM22 were computationally predicted by NetSurfP2.0. Red boxes indicate negatively charged helices. (B) Protein architectures of TOM22 and TOM40. Human TOM22 and TOM40 are depicted using protein structural data (PDB: 7VC4). The structure for C. merolae TOM22 was computationally simulated. Detailed structures of the boxed areas are shown on the right. (C) Schematic models for mitochondrial protein import in animals and fungi (left) and in C. merolae (right). In animals and fungi, a mitochondrial presequence, which is an amphiphilic helix, is recognized by TOM20 and TOM22 as a multi-step authentication process. In contrast, a presequence α-helix containing a few basic residues would be recognized by the acidic part of the α-helix in TOM22 via electrostatic residue–residue interaction as a single-step authentication in C. merolae. During the process, basic residues on the presequence work as the key and the acidic part on TOM22 functions as the lock. After the authentication, a precursor protein is imported into TOM40 and drawn in by the function of the TIM complex. OM, outer membrane; IMS, intermembrane space; IM, inner membrane.
Diversification of the TOM complex in eukaryotes during evolution
The requirement for arginine residues in protein targeting to mitochondria is also conserved in green plants (Heidorn-Czarna et al., 2022; Lee et al., 2019). However, the protein components of the TOM complex in green plants differ significantly not only from those in fungi and animals, but also from those in red algae (Carrie et al., 2010). In contrast to red algae, no orthologue of TOM22 has been identified in the green plant lineages. Furthermore, although Arabidopsis thaliana TOM20 has been shown to be functionally equivalent to animal/fungal TOM20, there is no sequence similarity between A. thaliana TOM20 and animal/fungal TOM20 (Perry et al., 2006). Thus, the TOM complex in the green plant lineage is composed of many types of specific proteins, but the authentication mechanism for mitochondrial precursor proteins in green plants could be similar in complexity to those in fungi and animals through convergent evolution. Considering the diversification of the components and the structural complexity of TOM complexes in modern eukaryotes, the simple authentication mechanism in the red alga C. merolae is unique and would be a minimal or primitive system.
Rewriting of the mitochondrial targeting information with an additional polypeptide for targeting to the chloroplast
Our results suggest a putative mechanism for discriminating between mitochondrial and chloroplast precursor proteins in C. merolae, which has the simplest cell structure among photosynthetic eukaryotes. Given that the mean length of the chloroplast TPs is greater than that of the mitochondrial TPs (Fig. 1E,F) and given our experimental result for chloroplast RPSA (Fig. 5E), the presence of additional peptide sequences would determine whether a TP targets the mitochondrion or the chloroplast. Chloroplast TPs are also longer than mitochondrial TPs in A. thaliana. As the chloroplast arose after the birth of the mitochondrion via endosymbiosis, the protein targeting system for the chloroplast would have to use longer, more complex TPs to distinguish them from mitochondrial TPs.
MATERIALS AND METHODS
Cell culture
The C. merolae M4 strain, which was isolated from C. merolae 10D (NIES-3377) and has a mutation in the URA5.3 gene, was used in this study (Minoda et al., 2004). The C. merolae M4 cells were cultured in MA2 medium (Minoda et al., 2004) supplemented with 0.5 mg/ml uracil and 0.8 mg/ml 5-fluoroorotic acid monohydrate in a tissue culture flask 25 (TPP Techno Plastic Products, Switzerland) with shaking at 120 rpm under continuous white light at 40°C.
Putative presequence dataset
Protein sequences of well-studied mitochondrial and chloroplast proteins were chosen from 4803 C. merolae ORFs (Matsuzaki et al., 2004; Mori et al., 2016; Moriyama et al., 2014a,b; Nozaki et al., 2007). Using protein BLAST searches, non-conserved regions at the N-terminus of each protein sequence were identified by visual inspection, and these regions were treated as putative presequence regions in this study. Through this approach, we created a sequence dataset of 113 mitochondrial presequences (Table S2) and 97 chloroplast presequences (Table S3).
Data analysis
Prediction of presequences for the mitochondrion and the chloroplast was performed using the TargetP 2.0 program (Armenteros et al., 2019) (Table S1). Presequence net charge was calculated at pH 7 using the Protein Calculator v3.4 program (https://protcalc.sourceforge.net/). Hydrophobicity and hydrophobic moment of each putative α-helical polypeptide were calculated using the HELIQUEST program (Gautier et al., 2008). Amino acid frequencies were calculated using C. merolae ORF data (http://czon.jp/). Sequence logos were made using the WebLogo program (Crooks et al., 2004). Helical wheel maps were made using the NetWheels program (http://lbqp.unb.br/NetWheels/). Protein local structures were predicted using the NetSurfP 2.0 program (Klausen et al., 2019) and tertiary structures were simulated using the AlphaFold program (Jumper et al., 2021). Peptide sequence homologies between the synTP and the N-terminal 1–24 amino acids of the C. merolae ORFs were calculated using the BLOSUM30 matrix.
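For readers who wish to reproduce a net charge estimate of the kind described above, the following is a minimal sketch based on the Henderson–Hasselbalch equation. The pKa values, the function name `net_charge` and the option to ignore the termini are assumptions made for illustration; they are not taken from Protein Calculator v3.4, so the results may differ slightly from the values reported in this study.

```python
# Minimal sketch of a peptide net charge estimate at a given pH.
# ASSUMPTION: the pKa values below are a generic textbook-style set and
# are not necessarily those used by Protein Calculator v3.4.
PKA_BASIC = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_ACIDIC = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}
PKA_N_TERMINUS = 9.0  # assumed value for the free amino group
PKA_C_TERMINUS = 2.3  # assumed value for the free carboxyl group


def net_charge(sequence, ph=7.0, n_term=True, c_term=True):
    """Return the estimated net charge of a peptide at the given pH.

    n_term / c_term control whether a free terminus is counted. For a
    presequence that is still part of a precursor protein, the
    C-terminus is not free, so c_term=False may be more appropriate.
    """
    charge = 0.0
    if n_term:
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_N_TERMINUS))
    if c_term:
        charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERMINUS - ph))
    for residue in sequence.upper():
        if residue in PKA_BASIC:
            # Fraction of the basic side chain that is protonated (+1).
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_BASIC[residue]))
        elif residue in PKA_ACIDIC:
            # Fraction of the acidic side chain that is deprotonated (-1).
            charge -= 1.0 / (1.0 + 10 ** (PKA_ACIDIC[residue] - ph))
    return charge


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the function.
    example = "MARLSAAARK"
    print(round(net_charge(example, ph=7.0, c_term=False), 2))
```

In this sketch each ionizable group contributes a fractional charge, so residues such as histidine add only a small positive contribution at pH 7, whereas arginine and lysine each contribute close to +1.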
Transformation and the fluorescent reporter assay
The C. merolae cell lines expressing mVenus fused with N-terminal TPs were produced as follows. To generate the presequence-fused mVenus expression vector, a presequence of interest was cloned into the vector using the following methods. To produce the strains expressing AAT (aspartate aminotransferase, CMC148C) 1–33, 3-R, 2-R, 1-R (R13), 0-R, 3-K, FMNL 1–24, FMNL 1–24 (E2A), FMNL 1–24 (E18A), FMNL 1–24 (E2A/E18A) and RPSA (chloroplast r-protein S1, CMM019C) 1–24, the nucleotide sequences of the TPs were added to the 5′ end of mVenus by two-step PCR and cloned into the linearized pQE80L vector (Qiagen) containing the URA5.3 upstream sequence (from −2300 to −898 bp), 500 bp of the CpcC promoter, 200 bp of the TUBB downstream sequence and the URA5.3 selection marker. To produce the strains 1-R (R2), 1-R (R6), 1-R (R8), 1-R (R9), 1-R (R12), 1-R (R15) and 1-R (R21), synthetic DNA fragments were cloned into the linearized vector.
Using the resulting constructs as templates, DNA fragments for homologous recombination comprising ∼1400 bp of sequence upstream of URA5.3, the CpcC promoter, the nucleotide sequence of the TP, mVenus, the TUBB 3′ untranslated region and the URA5.3 selection marker were amplified by PCR. The PCR amplicons were introduced upstream of the chromosomal ura5.3 of the uracil auxotrophic mutant strain M4 by polyethylene glycol (PEG)-mediated transformation as described previously (Fujiwara et al., 2015; Imamura et al., 2009; Ohnuma et al., 2008). Positive clones, which are non-uracil-auxotrophic, on the MA2 plate were confirmed by Sanger sequencing and observed by fluorescence microscopy. See also Fig. S3.
All PCR amplifications were performed with Platinum SuperFi II DNA polymerase (Thermo Fisher Scientific). Purification and assembly of DNA fragments were performed using a Wizard SV Gel and PCR Clean-up System (Promega) and a NEBuilder HiFi DNA assembly cloning kit (New England Biolabs). DNA sequences are listed in Table S7.
Fluorescence microscopy
Fluorescence observations were performed on an Olympus IX83 inverted microscope with a 1.45 NA, 100× oil immersion objective. Illumination was provided by a fluorescent light source (U-HGLGPS; Olympus), excitation filters [490-500HQ (Olympus) for mVenus, FF01-405/10-25 (Semrock) for chloroplasts], custom dichroic mirrors [Di03-R514-t1-25×36 (Semrock) for mVenus, T455lp (Chroma) for chloroplasts] and emission filters [FF02-531/22-25 (Semrock) for mVenus, FF02-617/73-25 (Semrock) for chloroplasts]. Images were acquired with an ORCA Fusion BT sCMOS camera (Hamamatsu Photonics) controlled by MetaMorph software (Molecular Devices). The effective pixel size was 65.2 nm×65.2 nm.
Phylogenetic analysis of C. merolae RPSA
The amino acid sequences of chloroplast r-protein S1 (CMM019C) and mitochondrial r-protein S1 (CMI304C) of C. merolae were retrieved from the Cyanidioschyzon merolae Genome Project v3 (Matsuzaki et al., 2004; Nozaki et al., 2007) and used as queries for BLASTP (https://blast.ncbi.nlm.nih.gov/Blast.cgi). The BLASTP search was carried out against the non-redundant protein sequences (nr) of all organisms in the National Center for Biotechnology Information (NCBI) database (https://www.ncbi.nlm.nih.gov/). The top 50 hits for each query, ranked by E-value, were retrieved. We also added amino acid sequences annotated as chloroplast r-protein S1 from six Viridiplantae species from NCBI. Multiple sequence alignments were generated using the MAFFT v7.511 online service with the auto strategy (Katoh et al., 2018; Kuraku et al., 2013). Non-homologous regions were detected and cleaned using HMMcleaner (Di Franco et al., 2019). The multiple sequence alignment was then trimmed using trimAl with the automated trimming heuristic (‘-automated1’ option) (Capella-Gutiérrez et al., 2009). Redundant operational taxonomic units (OTUs) that became identical after the trimming procedures were manually excluded. Finally, 83 OTUs were used for analysis. Bayesian inference for the alignments was carried out using MrBayes v3.2.7a MPI version with the best-fitted model selected by ModelTest-NG v0.1.7 (Altekar et al., 2004; Darriba et al., 2020; Flouri et al., 2015). Convergence of the Markov chain Monte Carlo iterations was evaluated every 1,000,000 generations based on the average standard deviation of split frequencies, discarding the first 25% as burn-in, and the iterations were automatically stopped when the average standard deviation fell below 0.01, indicating convergence.
In addition, maximum likelihood analysis was performed on the alignment with bootstrap values based on 1000 replications using RAxML-NG v1.2.0 with the same model as for the Bayesian inference (Kozlov et al., 2019). The alignments used for the phylogenetic analysis are uploaded to TreeBASE (Vos et al., 2012).
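The convergence criterion used for the Bayesian inference above can be illustrated with a simplified sketch of the average standard deviation of split frequencies. The function names, the input format (one dictionary of split frequencies per independent run), the minimum frequency cut-off of 0.10 and the use of the sample standard deviation are assumptions made for illustration; this is not the MrBayes implementation, and the split frequencies are assumed to be computed after discarding the burn-in.

```python
from statistics import stdev


def average_sd_of_split_frequencies(runs, min_freq=0.10):
    """Simplified average standard deviation of split frequencies.

    runs: list of dicts, one per independent run, mapping a split
          label to its post-burn-in frequency in that run.
    min_freq: ASSUMED cut-off; splits below this frequency in every
          run are ignored.
    """
    if len(runs) < 2:
        raise ValueError("at least two independent runs are required")
    all_splits = set().union(*runs)
    deviations = []
    for split in all_splits:
        # A split absent from a run has frequency 0 in that run.
        freqs = [run.get(split, 0.0) for run in runs]
        if max(freqs) >= min_freq:
            deviations.append(stdev(freqs))
    return sum(deviations) / len(deviations) if deviations else 0.0


def has_converged(runs, threshold=0.01):
    """Apply the stopping rule: the average deviation is below threshold."""
    return average_sd_of_split_frequencies(runs) < threshold


if __name__ == "__main__":
    # Hypothetical split frequencies from two runs.
    run1 = {"AB|CDE": 0.98, "ABC|DE": 0.61, "AD|BCE": 0.03}
    run2 = {"AB|CDE": 0.97, "ABC|DE": 0.60, "AD|BCE": 0.02}
    print(has_converged([run1, run2]))
```

The idea is that independent runs sampling the same posterior distribution should recover each split at similar frequencies, so a small average deviation indicates that the runs agree.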
Acknowledgements
We thank the members of the Yoshida laboratory (The University of Tokyo) for their support and advice during this project.
Footnotes
Author contributions
Conceptualization: R.H., Y.M., Y.Y.; Methodology: R.H., Y.M., K.T., H.N., Y.Y.; Investigation: R.H., Y.M., K.T., Y.Y.; Writing - original draft: R.H., Y.M., Y.Y.; Writing - review & editing: R.H., Y.M., K.T., H.N., T.H., Y.Y.; Visualization: R.H., Y.Y.; Supervision: Y.Y.; Project administration: Y.Y.; Funding acquisition: Y.Y.
Funding
This work was supported by Precursory Research for Embryonic Science and Technology (PRESTO) from the Japan Science and Technology Agency (JPMJPR20EE to Y.Y.), the Human Frontier Science Program Career Development Award (CDA00049/2018-C to Y.Y.), the Japan Society for the Promotion of Science (KAKENHI; JP18K06325 and 22H02653 to Y.Y.) and the Institute for Fermentation, Osaka (L-2020-2-008 to Y.Y.). Open Access funding provided by University of Tokyo. Deposited in PMC for immediate release.
Data availability
The alignments used for the phylogenetic analysis are available in TreeBASE under the accession code S31160. The data supporting the findings of this work are available in the main figures and supplementary information.
Peer review history
The peer review history is available online at https://journals.biologists.com/jcs/lookup/doi/10.1242/jcs.262042.reviewer-comments.pdf
References
Competing interests
The authors declare no competing or financial interests.