Vitellogenin (Vg) is an egg-yolk precursor protein in most oviparous species. In honeybee (Apis mellifera), the protein (AmVg) also affects social behavior and life-span plasticity. Despite its manifold functions, the AmVg molecule remains poorly understood. The subject of our structure-oriented AmVg study is its polyserine tract — a little-investigated repetitive protein segment mostly found in insects. We previously reported that AmVg is tissue specifically cleaved in the vicinity of this tract. Here, we show that, despite its potential for an open, disordered structure, AmVg is unexpectedly resistant to trypsin/chymotrypsin digestion at the tract. Our findings suggest that multiple phosphorylation plays a role in this resilience. Sequence variation is highly pronounced at the polyserine region in insect Vgs. We demonstrate that sequence differences in this region can lead to structural variation, as NMR and circular dichroism (CD) evidence assign different conformational propensities to polyserine peptides from the honeybee and the jewel wasp Nasonia vitripennis; the former is extended and disordered and the latter more compact and helical. CD analysis of the polyserine region of bumblebee Bombus ignitus and wasp Pimpla nipponica supports a random coil structure in these species. The spectroscopic results strengthen our model of the AmVg polyserine tract as a flexible domain linker shielded by phosphorylation.
Vitellogenins (Vgs) are large (usually >200 kDa), ancient (Hayward et al., 2010) and typically female-specific proteins, mostly known for their involvement in lipid and ion transport during oogenesis. Vg is produced in the insect fatbody as well as in the liver of oviparous vertebrates (Ansari et al., 1971; Dolphin et al., 1971), and is much studied in many species because of its central function in reproduction. In Apis mellifera (honeybee), Vg (AmVg) is expressed not only in female reproductive individuals but also in sterile worker bees (a helper caste) and in males (Piulachs et al., 2003; Trenczek and Engels, 1986). In the workers, AmVg affects social behavior, oxidative stress resistance and life-span plasticity (Amdam et al., 2003; Nelson et al., 2007; Seehuus et al., 2006). These functions link the evolution of AmVg with the development of the bee as a social insect and benefit the colony (Amdam and Omholt, 2003; Nelson et al., 2007). Little is known about the biochemical and structural properties that mediate the effects of AmVg, which include hormone-like (Amdam et al., 2007; Amdam et al., 2004) and antioxidant functions (Seehuus et al., 2006). However, it appears that glycosylation, proteolytic cleavage and phosphorylation could play a role in mediating AmVg’s effects (Havukainen et al., 2011).
Vgs show considerable amino acid sequence similarity between species at an N-terminal and α-helical domain (Smolenaars et al., 2007; Tufail and Takeda, 2008). Between these two domains, however, there is a hypervariable linker. Insect Vgs appear to contain one to three polyserine tracts of various lengths embedded in this linker. Apocrita, the suborder of A. mellifera, may lack extensive polyserine tracts, with honeybee and bumblebee as notable exceptions (Tufail and Takeda, 2008). For instance, the Apocrita wasp Nasonia vitripennis Vg (NvVg) has only seven serine residues in the corresponding sequence (NvVg residues 345–367) compared with 14 in the honeybee (AmVg residues 349–381). The crystal structure of lamprey (Ichthyomyzon unicuspis) lipovitellin (Anderson et al., 1998; Raag et al., 1988) is widely used as a template for the Vg protein family (Havukainen et al., 2011; Kristoffersen et al., 2009; Li et al., 2010) and even for homologous human proteins (Mann et al., 1999). The vertebrate Vgs do not contain a polyserine region, so structural models for the insect polyserine tracts cannot be derived from the existing structural data. In general, very little is known about what functional role polyserine tracts might have.
The polyserine tract is an example of a low complexity polypeptide region – a semi-repetitive region (Dyson and Wright, 2005) predicted to be disordered (Dunker et al., 2002; Huntley and Golding, 2006). Polyserine tracts lag behind other disordered regions of proteins in biochemical characterization (for review, see Uversky et al., 2008). This is presumably because polyserine sequences have traditionally been dismissed as the non-functional result of transcription slippage (Huntley and Golding, 2004; Huntley and Golding, 2006), and because of difficulties in establishing suitable expression systems (Chu et al., 2011) as well as in the chemical synthesis of serine repeats (Barlos et al., 1998). Yet, there are many reports of polyserine regions being functionally important, e.g. in molecular recognition (Huntley and Golding, 2006), bacterial carbohydrate metabolism (Howard et al., 2004) and the pathogenesis of viruses (Bates and DeLuca, 1998).
The putative structural disorder of the polyserine tract might provide an attractive site for proteases, and the polyserine tracts do, indeed, flank a consensus cleavage site (RXXR) in many insect Vgs (Tufail and Takeda, 2008). However, Apocrita species are an exception, in that they lack the consensus sequence (Tufail and Takeda, 2008). Despite this, AmVg can be cleaved, tissue specifically, in the vicinity of the polyserine tract by a protease or group of proteases that have yet to be identified (Havukainen et al., 2011). All known Vgs are phosphorylated, and those of insects are likely phosphorylated at the polyserine region (Finn, 2007; Tufail and Takeda, 2008). Multiple phosphorylation of proteins can regulate many protein activities including cleavage (Cohen, 2000), but the role of Vg phosphorylation is unclear.
Our aim was to study the structure of the AmVg polyserine tract and its relation to proteolysis and phosphorylation. Initially, we narrowed down the AmVg cleavage site, and used limited proteolysis to see whether (a) the natural AmVg cleavage pattern can be triggered with trypsin/chymotrypsin and (b) the digestion pattern matches the expectations based on lamprey lipovitellin structure (Anderson et al., 1998; Raag et al., 1988). The latter expectation was met, but no indication of digestion in the vicinity of the polyserine tract was observed, except with a dephosphorylated AmVg sample. Therefore, we speculate that phosphorylation might shield the polyserine tract against unspecific cleavage. We noted that the polyserine tract contains a casein kinase II (CKII) recognition motif and experimentally verified that the polyserine tract can undergo multiple phosphorylation by the respective kinase. In order to explore the predicted propensity for disorder of the AmVg polyserine tract, we studied it using NMR and far-UV circular dichroism (CD). Because of the high level of sequence variation at the polyserine linker in Apocrita species, we performed representative comparisons: the polyserine regions of the wasps N. vitripennis and Pimpla nipponica, as well as that of the bumblebee Bombus ignitus, were investigated using CD. That of N. vitripennis was also studied in detail using NMR. The results reveal the AmVg polyserine tract as an extended, flexible coil, and the B. ignitus and P. nipponica peptides share this random coil structure. In contrast, the peptide of N. vitripennis was found to be different, having a more collapsed conformational ensemble with a helical tendency. We found that the number of serine residues was not the main determinant of the lack of structure; rather, the position of DN-repeats at the C-terminal side of the polyserine tract might play a role in stabilizing helical conformations.
MATERIALS AND METHODS
Sequence alignments and prediction of disorder
Vg sequences were obtained from UniProtKB (Jain et al., 2009) and NCBI Protein Database in December 2011, aligned using ClustalW2 (Thompson et al., 2002) and cropped at ultra-conserved sites in the N- and the C-terminus surrounding the polyserine tract (honeybee residues 321–419 for the bee-specific alignment, 267–466 for the cross-taxon alignment and 321–407 for the Hymenoptera alignment). The last alignment was manually adjusted to minimize gaps. A prediction of A. mellifera polyserine tract disorder (residues 321–420) and other selected Vg tracts was obtained using Disopred (Ward et al., 2004). An illustration highlighting the region where the polyserine tract is located in insects was produced with Pymol (DeLano, 2002) using the lamprey lipovitellin structure (Anderson et al., 1998; Raag et al., 1988).
A limited proteolysis analysis was performed in order to investigate whether the natural Vg cleavage, occurring in the vicinity of the serine tract, could be triggered. Vg was purified from honeybee abdomens by GenScript (New York, NY, USA) (see Havukainen et al., 2011) with size-exclusion and ion-exchange chromatography steps, and the purification was completed with a Superdex 75 column (Sigma-Aldrich, St Louis, MO, USA). Pure Vg was partially digested by incubating 6.4 μg of samples with 0.25, 2.5 and 25 ng of trypsin or chymotrypsin (Sigma-Aldrich) in a total volume of 10.5 μl for 30 min on ice. For digestion of dephosphorylated Vg, samples were incubated for 1 h at 37°C with 1 U (0.5 μl) calf intestinal phosphatase (New England BioLabs, Inc., Beverly, MA, USA) prior to digestion. Untreated Vg and Vg incubated with the phosphatase served as controls. The samples were run on a 4–20% SDS-PAGE gel (Bio-Rad, CA, USA) under reducing conditions.
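The protease dilution series above can be restated as substrate-to-enzyme mass ratios. The following is a minimal arithmetic sketch based only on the quantities given in the text (6.4 μg Vg, 0.25 to 25 ng protease, 10.5 μl); the function and variable names are illustrative and not part of the original protocol.

```python
# Substrate-to-enzyme mass ratios for the limited proteolysis series.
# All quantities are taken from the protocol text; names are illustrative.

VG_NG = 6400.0                    # 6.4 ug of purified Vg per reaction, in ng
PROTEASE_NG = [0.25, 2.5, 25.0]   # trypsin or chymotrypsin per reaction, in ng
VOLUME_UL = 10.5                  # total reaction volume in ul


def substrate_per_enzyme(protease_ng, substrate_ng=VG_NG):
    """Return how many mass units of substrate are present per unit of protease."""
    return substrate_ng / protease_ng


def protease_concentration(protease_ng, volume_ul=VOLUME_UL):
    """Return the protease concentration in ng per ul."""
    return protease_ng / volume_ul


for ng in PROTEASE_NG:
    ratio = substrate_per_enzyme(ng)
    conc = protease_concentration(ng)
    print(f"{ng:g} ng protease -> 1:{ratio:,.0f} (w/w), {conc:.3f} ng/ul")
```

Each tenfold step in protease amount lowers the substrate excess tenfold, which is what allows the most accessible sites to be distinguished from those cut only at higher protease load.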
LC-MS/MS of the most protease-resistant part of Vg
The most protease-resistant bands of the limited proteolysis assay were excised for LC-MS/MS (liquid chromatography tandem mass spectrometry) identification, performed essentially as described previously (Havukainen et al., 2011). The gel pieces were washed, treated with 10 mmol l–1 DTT (Amersham Biosciences, Piscataway, NJ, USA) for cysteine reduction and alkylated using iodoacetamide (Sigma-Aldrich), followed by full trypsin digestion. After digestion, the dried samples were dissolved in 11 μl 0.1% formic acid and 6 μl was used for injection. The samples were analyzed on a 4000 QTrap (Applied Biosystems/MDS SCIEX, Concord, ON, Canada).
N-terminal sequencing and MALDI-TOF peptide mass fingerprinting
Vg fragments of 150 and 40 kDa were separated on SDS-PAGE gels (Bio-Rad) and electroblotted onto a polyvinylidene difluoride membrane followed by Coomassie Brilliant Blue staining. N-terminal protein sequencing by Edman degradation was performed using a Procise 494A HT Sequencer (Perkin Elmer, Applied Biosystems Division, Foster City, CA, USA). For peptide mass fingerprint analysis, the 40 and 150 kDa Vg fragments were separated on a gel, excised and in-gel digested as described in the preceding paragraph. Peptides generated by enzymatic cleavage were analyzed by matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectrometry using an Ultraflex TOF/TOF instrument (Bruker-Daltonik GmbH, Bremen, Germany). ExPASy PeptideCutter (Wilkins et al., 1999) was used for in silico digestion of AmVg in order to find putative proteases that cut at the cleavage site, between residues 336 and 427.
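The idea behind an in silico digestion can be illustrated with a short sketch. It does not reproduce the PeptideCutter rule set; it assumes the simplest textbook specificities (trypsin cuts after K or R, chymotrypsin after F, Y or W, and neither cuts when the next residue is P). The input is the letter sequence of the synthetic AmVg(358–392) peptide listed under 'Peptide design and synthesis', with isotope labels and phosphorylation ignored; the function name is hypothetical.

```python
# Simplified in silico digestion (assumed textbook rules, not PeptideCutter's).

def cleavage_sites(seq, residues, first_residue=1):
    """Return residue numbers after which a cut is predicted.

    A cut is placed C-terminal to any residue in `residues`, unless the
    following residue is proline. No cut is reported after the last residue.
    """
    sites = []
    for i, aa in enumerate(seq[:-1]):
        if aa in residues and seq[i + 1] != "P":
            sites.append(first_residue + i)
    return sites


# Synthetic AmVg(358-392) peptide, plain one-letter code (design substitution kept).
PEPTIDE = "EKLKQDILNLRTDISTSSSSISSSEENDFWQPKPT"
FIRST = 358

trypsin_sites = cleavage_sites(PEPTIDE, "KR", FIRST)
chymotrypsin_sites = cleavage_sites(PEPTIDE, "FYW", FIRST)

print("trypsin cuts after residues:", trypsin_sites)
print("chymotrypsin cuts after residues:", chymotrypsin_sites)
```

Under these simplified rules the lysine followed by proline near the C-terminus is skipped, while sites flanking the serine run are still reported on both sides.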
Vitellogenin in vitro phosphorylation and LC-MS/MS
The Vg-derived peptide TDISSSSSSISSSEENDFWQPK (synthesized by the Biochemistry Department at Arizona State University) was incubated essentially following the recommendations of the CKII supplier (New England BioLabs). A 20 μl reaction contained 7 μl of a 0.4 mmol l–1 peptide solution, 4.5 μl ddH2O, 5 μl of a 0.1 mmol l–1 ATP solution, 2 μl of 10× kinase reaction buffer and 1 μl of a 500 U μl–1 kinase solution. The reaction was allowed to proceed for 1 h at 30°C and stopped by the addition of 50 μl ice-cold ethanol, extensive mixing and incubation on ice for 20 min. Subsequently, the kinase was removed by centrifugation (13,400 g at 4°C for 3 min) and the supernatant was concentrated in a speed vacuum. For LC-MS/MS measurements, the peptide sample was dissolved in 0.1% formic acid and injected into a LTQ linear ion-trap mass spectrometer (Thermo Scientific, Waltham, MA, USA). Spectra were recorded for the doubly charged peptide with zero to five phosphorylated residues (CID: 35%).
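As a rough guide to where the doubly charged ions of the zero- to five-fold phosphorylated peptide would be expected, the following sketch sums monoisotopic residue masses. The mass constants are standard reference values supplied here as an assumption (they are not given in the text), the peptide is assumed to have free termini and no other modifications, and the function names are illustrative.

```python
# Expected m/z of the CKII substrate peptide with 0-5 phosphate groups.
# Monoisotopic masses are standard reference values (an assumption of this
# sketch); only the residues occurring in the peptide are listed.

MONO = {
    "T": 101.04768, "D": 115.02694, "I": 113.08406, "S": 87.03203,
    "E": 129.04259, "N": 114.04293, "F": 147.06841, "W": 186.07931,
    "Q": 128.05858, "P": 97.05276, "K": 128.09496,
}
WATER = 18.01056    # added once for the free N- and C-termini
PROTON = 1.007276
HPO3 = 79.96633     # mass added per phosphorylated residue

PEPTIDE = "TDISSSSSSISSSEENDFWQPK"


def peptide_mass(seq, n_phospho=0):
    """Neutral monoisotopic mass of the peptide with n_phospho phosphates."""
    return sum(MONO[aa] for aa in seq) + WATER + n_phospho * HPO3


def mz(seq, n_phospho=0, charge=2):
    """m/z of the protonated peptide at the given charge state."""
    return (peptide_mass(seq, n_phospho) + charge * PROTON) / charge


for n in range(6):
    print(f"{n} phosphate(s): [M+2H]2+ = {mz(PEPTIDE, n):.3f}")
```

Successive phosphorylation states of a doubly charged ion are separated by half the mass of HPO3, so the six species appear as an evenly spaced ladder.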
Peptide design and synthesis
Two peptides each from the polyserine tract regions of AmVg and NvVg were synthesized by CPC Scientific (Sunnyvale, CA, USA) for NMR analysis. The synthesis was performed after isotope-labeling expression attempts in Escherichia coli were deemed unsuitable for the project because of the very low final yield. Synthesis of serine repeats is hampered by the steric conflicts caused by the chemical group used to protect the side-chains from chemical modification during synthesis. In order to make synthesis of serine repeats feasible and aid assignment, one serine residue was mutated into a residue of similar properties in each peptide. The peptides had the following sequences:
AmVg (residues 358–392): EKLKQDILNLRTDISTS(Sp)SS(15I)SSSEENDFWQPKPT
AmVg (residues 336–385): R(15V)SKT(15A)MNSNQI(15V)SDNS(15L)(15S)STEEK(15L)KQDI(15L)N(15L)RTDI(15S)S(15S)(Sp)S(15A)IS(15S)(15S)EEND.
NvVg (residues 351–385): EHKHSDESTSE(Sp)FES(15I)ADNNDDSYFQRKPKLTEAP
NvVg (residues 335–372): RPNK(15L)N(15L)QRRHDHKS(15G)EHKHSDE(15S)S(15S)E(Sp)FE(15A)I(15A)DNNDD.
Peptides of two more species and a chimera made of AmVg and NvVg were synthesized as described above for CD analysis:
Bombus ignitus (residues 361–395): EATKGQNYRSLSSDSSS(Sp)SSFSNSEEDHYWQPKPT
Pimpla nipponica (residues 356–397): YNLYRRNRINDNDDDSSASDSSK(Sp)AKSLESNEEQLYWQPKPT
Chimera peptide (the NvVg part is underlined): EKLKQDILNLRTDISSSE(Sp)FESIAEENDFWQPKPT.
For all sequences, mutated residues are bold and underlined, phosphorylation is denoted by (Sp), and 15N labeling is denoted by (15X), X being the labeled residue.
The NMR peptides were solubilized in 250 μl of 50 mmol l–1 phosphate buffer pH 6.7 containing 0.02% NaN3, resulting in the following concentrations after centrifugation at 20,800 g for 15 min: AmVg(358–392) 7.9 mg ml–1, AmVg(336–385) 1.8 mg ml–1, NvVg(351–385) 20 mg ml–1 and NvVg(335–372) 2.4 mg ml–1. Tetramethylsilane was used as a reference compound. The secondary structure prediction program used was PSIPRED (Jones, 1999).
The samples were diluted to 0.5 mg ml–1 with 50 mmol l–1 phosphate buffer pH 6.7. Far-UV spectra (190–260 nm) were recorded on a J-810 spectropolarimeter (JASCO, Tokyo, Japan).
An AVII 600 MHz spectrometer (Bruker, Billerica, MA, USA) equipped with a cryoprobe was used for accumulating the following two-dimensional spectra: HSQC (heteronuclear single quantum coherence), HSQC-NOESY (HSQC-Nuclear Overhauser Effect spectroscopy), TOCSY (total correlation spectroscopy) and NOESY. Water suppression was achieved using pulse-sculpting and gradient spin-echo in combination with various modifications of the Watergate sequence. Typical processing parameters were: number of points 4 k in F2 and 2 k in F1; 4 k in SI1 and SI2. Squared sine (QSINE) was used as the window function. Decoupled and un-decoupled spectra were recorded in order to identify the labeled residues. The spectra were assigned using Sparky (T. D. Goddard and D. G. Kneller, SPARKY 3, University of California) and 200 three-dimensional structures of AmVg(358–392) and NvVg(351–385) were calculated using Aria1.2 and CNS (Linge et al., 2002), of which NvVg(351–385) was calculated only based on the fully assigned peaks.
Accession numbers not mentioned elsewhere
Swiss-Prot or NCBI: A. mellifera Q868N5, B. ignitus B9VUV6, S. invicta Vg-1 Q7Z1M0, S. invicta Vg-2 Q2VQM6, S. invicta Vg-3 Q2VQM5, H. saltator E2BDX3, C. floridanus E2ANT2, Pteromalus puparum B2BD67, Encarsia formosa Q698K6, Athalia rosae BAA22791, N. vitripennis XP_001607388.
Sequence alignments and predicted disorder
The sequential serine residues of Vg appear strictly conserved within A. mellifera populations and in Apis cerana, based on our alignment of 22 AmVg sequences and two A. cerana sequences found at UniProtKB (supplementary material Fig. S1). A cross-taxon alignment (supplementary material Fig. S2) suggests that this extended linker region is mostly present in insects or perhaps in Protostome animals (includes insects and mollusks), as a Lophotrochozoan oyster (Crassostrea gigas) has sequential serine residues in this region. Vg of the early diverged animal lineages (Placozoan Trichoplax adhaerens and Cnidarian coral Galaxea fascicularis) and Chordata (jawless fish I. unicuspis; fish white sturgeon Acipenser transmontanus and rainbow trout Oncorhynchus mykiss; frog Xenopus laevis) lack any polyserine tract at the N-terminal location in Vg. In the alignment of 13 Hymenopteran species, in general, the bees’ Vgs (A. mellifera, B. ignitus and Bombus hypocrita) have more serine residues in the polyserine region than the solitary wasp (N. vitripennis, P. puparum, E. formosa and P. nipponica) Vgs do (Fig. 1A). Ants have multiple Vgs that vary in their serine content. All three Vgs of Solenopsis invicta are shown because S. invicta has one Vg with a very high serine proportion (Vg-2), but only one example of Vgs of Harpegnathos saltator and Camponotus floridanus is presented, because these two ants are not particularly rich in Vg serine content. The extremely serine-rich tract of A. rosae, a member of a distinct suborder (Symphyta), is shown separated from those of the other species. Hymenoptera Vgs are predicted to have a long stretch of disorder at the linker site independent of the serine content (Fig. 1B). Fig. 1C shows the polyserine linker location in between the N-terminal β-barrel-like domain and the main lipid-binding cavity, based on the crystal structure of lamprey lipovitellin (Anderson et al., 1998; Raag et al., 1988). 
The red linker in this case does not contain a polyserine tract, as this is absent in vertebrate Vgs such as lamprey lipovitellin. We conclude that the prevalence of the serine residues at this disordered domain-connector site and the sequence, in general, varies greatly in the Hymenoptera. We endeavored to include all the currently available Apocrita sequences in our alignment, but stress that these sequences are greatly outnumbered by the total Apocrita species.
Limited proteolysis and identification of fragments
The naturally occurring, tissue-dependent cleavage pattern of AmVg to 40 and 150 kDa fragments suggests that this is a specific occurrence, rather than a result of general degradation (Havukainen et al., 2011). The predicted disorder in the vicinity of the cleavage site should, in principle, make this segment prone to proteolysis by a range of proteases. To investigate whether the observed cleavage pattern can be mirrored in vitro, we proceeded to explore the effect of limited proteolysis on AmVg purified from bees as previously described (Havukainen et al., 2011). We tested proteolysis of AmVg by trypsin and chymotrypsin (Fig. 2A), both of which can cut at sites in the polyserine region (see the triangles in Fig. 2C). The polyserine tract, however, proved resistant to trypsin and chymotrypsin, as the enzymes preferentially cleaved at other locations: a C-terminal fragment, corresponding to about 100 C-terminal residues, was detached at low protease concentrations from both the full-length protein and the 150 kDa fragment, and several other bands of unspecific fragmentation emerged (Fig. 2A). Thus, neither of these proteases assists fragmentation to the 40 and 150 kDa units observed in honeybee fatbody (Havukainen et al., 2011). LC-MS/MS experiments on the most trypsin- and chymotrypsin-resistant Vg bands from Fig. 2A identified them as the lipid-binding cavity of Vg, with >490 residue loss at the N-terminus and <130 residue loss at the C-terminus (Fig. 2C). In order to determine the exact cleavage sites in the AmVg polypeptide, both the 150 and 40 kDa Vg fragments were subjected to N-terminal sequencing. Edman degradation of the 40 kDa fragment for 10 cycles gave the sequence DFQHNWQVGN, which shows that this fragment is at the N-terminal part of the polypeptide (the 16 amino acid signal sequence has been removed as predicted). The 150 kDa fragment gave no signals (two repeated experiments) during 10 cycles of Edman degradation, indicating that its N-terminus may be blocked.
In order to more accurately focus on the location of the cleavage site between the 40 and 150 kDa fragments, both fragments were subjected to MALDI-TOF peptide mass fingerprint analysis. The peptides from the 40 kDa Vg could be localized between AmVg residue 17 (D) (signal sequence cleavage site) and residue 335 (R). The peptide masses from the 150 kDa fragment could be localized from amino acid 427 (E) to amino acid 1732 (K). According to this experiment, the cleavage site between the 40 and 150 kDa fragments is localized in the polyserine tract between amino acids 335 and 427 (Fig. 2C; supplementary material Fig. S3). In an ExPASy PeptideCutter (Wilkins et al., 1999) search, caspase 1 was found as the only relevant, i.e. non-bacterial and non-gut, enzyme having a cleavage site at AmVg 336–427, at residue 370 and additionally at residue 1604.
Dephosphorylation of full-length AmVg and in vitro phosphorylation of polyserine region
The high resistance of the polyserine region to proteolysis could be explained by the target site being structured, interference from compact neighboring domains or post-translational modification. Both the N-terminal (40 kDa) and the C-terminal (150 kDa) fragment of AmVg are known to be phosphorylated (Havukainen et al., 2011). However, the exact location of the modifications is uncertain. We detected a CKII recognition motif inside the polyserine region. A short synthetic peptide from this region was readily phosphorylated on multiple serine residues by CKII in vitro (Fig. 3). We proceeded to test the role of phosphorylation in the protease resistance of the polyserine tract. AmVg was dephosphorylated and digested using trypsin/chymotrypsin as above. The cleavage pattern shows characteristics of the natural 40+150 kDa pattern (Fig. 2B): both the 150 kDa and the 40 kDa bands are stronger at higher protease concentrations for the dephosphorylated protein (Fig. 2A,B). The change in the cleavage pattern supports the hypothesis that phosphorylation is involved in conveying resistance to proteolysis.
NMR structural propensity of monophosphorylated polyserine region
To determine whether this region has a structure or structural propensity that can further help explain its behavior, with respect to both protease accessibility and the linker’s role in AmVg, we performed an NMR study (Figs 4, 5, 6, Table 1; supplementary material Figs S4 and S5 and Table S1). In order to limit the complexity of the synthesis and to provide a measure of solubility to the polypeptide, we opted for a monophosphorylated version of the tract (see also Materials and methods, ‘Peptide design and synthesis’). In this study we attempted to alleviate to some extent the problems related to exploring a polypeptide segment detached from the protein as a whole by (i) determining the structure for an AmVg peptide that is tractable with respect to solubility, synthesis and spectral overlap, and (ii) using the corresponding NvVg segments as a reference when interpreting structural data.
AmVg(358–392) and NvVg(351–385) produced TOCSY spectra of similar, good quality (Fig. 4A,C). Both peptides were assignable, despite some ambiguity in the polyserine interval in AmVg(358–392). This ambiguity was somewhat alleviated as a result of peptide design and phosphorylation (see Materials and methods, ‘Peptide design and synthesis’). The far-UV CD spectra of the peptides show considerable differences, suggesting a random coil for AmVg(358–392) and a more structured nature for NvVg(351–385) (Fig. 4B,D). Chemical shift indexing (CSI) (Wishart et al., 1992) indicated that the polypeptide backbone propensities for secondary structure were different for the two peptides (Fig. 5B,D). While NvVg(351–385) was dominated by CSI values indicative of α-helical conformations, AmVg(358–392) was almost devoid of any propensity with respect to secondary structure, despite in silico prediction of an α-helix (Fig. 5A). The lack of Nuclear Overhauser Effects (NOEs) in the spectra of AmVg(358–392) (Fig. 5A, Table 1) could not be ascribed to incomplete assignment or sample issues because in the N. vitripennis counterpart, which was of similar quality, NOEs indicative of structure were readily found.
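The logic of chemical shift indexing can be sketched in a few lines. This is a simplified illustration and not the analysis pipeline used here: it takes Hα secondary shifts (observed minus random coil, in ppm) as input, applies a ±0.1 ppm threshold, and reports uninterrupted runs of at least four −1 values as helical and at least three +1 values as extended. The input values below are hypothetical and the function names are illustrative.

```python
# Simplified chemical shift index (CSI) sketch for H-alpha secondary shifts.
# Thresholds and run lengths follow the commonly described CSI convention;
# the example data are hypothetical, not measured values from this study.

def csi_index(secondary_shifts, threshold=0.1):
    """Map each secondary shift (ppm) to -1, 0 or +1."""
    indices = []
    for delta in secondary_shifts:
        if delta > threshold:
            indices.append(1)
        elif delta < -threshold:
            indices.append(-1)
        else:
            indices.append(0)
    return indices


def runs(indices, value, min_len):
    """Return (start, end) positions of uninterrupted runs of `value`."""
    segments = []
    start = None
    for i, v in enumerate(list(indices) + [None]):  # sentinel closes a final run
        if v == value:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    return segments


# Hypothetical secondary shifts for a 10-residue stretch.
example = [0.02, -0.05, -0.15, -0.22, -0.30, -0.18, -0.12, 0.04, 0.12, -0.03]
index = csi_index(example)
print("CSI:", index)
print("helical runs:", runs(index, -1, 4))
print("extended runs:", runs(index, 1, 3))
```

A sequence with mostly zero indices and no qualifying runs would be read as lacking secondary structure propensity, whereas a sustained block of −1 values points to a helical tendency.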
Solving the structure of the two peptides resulted in an ensemble of outstretched coil structures in the case of AmVg(358–392) (Fig. 6A), whereas that for NvVg(351–385) contains a helix between residues 371 and 377 (Fig. 5A,D, Fig. 6C, Table 1). Fig. 6B,D shows the surface electrostatics for the serine regions, dominated by negative charge even in their monophosphorylated state. These structures are stored in the Protein Data Bank and Biological Magnetic Resonance Bank with the following ID numbers: AmVg(358–392), 2lic and 17888, respectively; NvVg(351–385), 2lid and 17889, respectively.
We then proceeded to solve the longer peptides, which partially overlap with the peptides AmVg(358–392) and NvVg(351–385). We did this to extend the results along the Vg polypeptide chain, and to assess the effect that different peptide endpoints have on the overall structure of the shorter peptides. The AmVg(336–385) peptide suffered from a lower solubility and purity than its N. vitripennis counterpart; still, much of the backbone was assigned, and CSI values of both peptides are presented in Fig. 5C,E. AmVg(336–385) retains its random coil values and lack of NOEs indicative of any fixed structure, a result also supported by the CD spectrum (supplementary material Fig. S4A,B); the data were not processed into a structure. NvVg(335–372), with its C-terminus located where the helix occurs in the shorter peptide, shows marked changes, with an attenuation of its NOE pattern (supplementary material Fig. S5 and Table S1). Moreover, the changes are visible in the spectrum as a collapse of the signals (supplementary material Fig. S4C), relative to Fig. 4C. The corresponding CD spectra also indicate a random coil (supplementary material Fig. S4D).
In order to test the relationship between the polyserine tract sequence and its structure, three more peptides were analyzed using CD. The CD spectra of the polyserine peptides of B. ignitus, which is close to honeybee in sequence, and of the wasp P. nipponica, which like N. vitripennis has fewer serine residues, both indicated a random coil structure (supplementary material Fig. S6). The AmVg peptide was modified to contain fewer serine residues by replacing its sequential serine residues with those of NvVg; this chimera peptide was also found to be a random coil (supplementary material Fig. S6), suggesting that there is no simple relationship between the number of serine residues and the structural propensities.
In this study we concentrated on a hypervariable domain connector, the polyserine tract, of AmVg. Our earlier work (Havukainen et al., 2011) found that AmVg was cleaved in a tissue-specific manner in the vicinity of this tract, and we proposed that this might have relevance for the effects AmVg has on bee behavior and life-span plasticity. Recently, it was shown that AmVg is highly enriched in replacement polymorphism (Kent et al., 2011). We note that this polymorphism does not apply to the polyserine tract, which as a connector structure could be an ideal site for replacements (Panchenko et al., 2005). Sequence comparison across taxa suggests that the domain connector may be elongated in insects or possibly more broadly in Protostome animals, but serine residues are not present in all of these elongated connector sequences in insects, for example in the louse Pediculus humanus. Thus, this vitellogenin region has been extensively modified in the course of evolution; it is absent from most non-insect species and it is highly variable within insects.
Although hypervariable regions may be non-functional in many cases, the conservation of the tract in Apis, and the differences seen between bees (Apoidea) and wasps (Chalcidoidea and Ichneumonoidea) and elsewhere in the Hymenoptera order are intriguing. For instance, are wasps such as N. vitripennis and P. nipponica exposed to evolutionary pressure to keep the serine content low? And why does a more distantly related hymenopteran sawfly, A. rosae, have many more serine residues, similar to mosquito (Diptera: Aedes aegypti) (Romans et al., 1995)? As honeybees have harnessed the ancient yolk-precursor Vg for the organization of a social hierarchy (e.g. Denison and Raymond-Delpech, 2008), other species might have evolved novel functions for their Vg(s) as well. The serine tracts of the sawfly and silkworm (Bombyx mori) have been speculated to be connected to silk formation as silk fibroin is rich in serine (Chen et al., 1997). Of the few functionally and structurally well-characterized polyserine proteins, spider silk protein may exemplify how transcriptional slippage has resulted in a tremendously successful and versatile adaptation (Ayoub et al., 2007; Hu et al., 2005).
The polyserine linkers are an example of protein regions that are predicted to be essentially disordered in silico (Ward et al., 2004). Proteases like trypsin and chymotrypsin prefer easy-access, i.e. flexible, disordered regions as their primary targets (Fontana et al., 2004; Stroh et al., 2005). Thus, we initially expected to see instant cleavage at the polyserine linker by trypsin/chymotrypsin. When this was not observed, we speculated that lack of the expected effect was due to phosphorylation at the polyserine linker (Benore-Parsons et al., 1989). Indeed, instead of strong unspecific cleavage, the digestion of dephosphorylated Vg produces a fragmentation pattern of 40+150 kDa. We further confirmed by mass spectrometry that an AmVg polyserine peptide can be phosphorylated multiple times in vitro by CKII. We used CKII because the polyserine tract harbors a CKII recognition motif (SXXE/D) and CKII has previously been shown to phosphorylate Rhodnius prolixus Vg (Silva-Neto et al., 2002). Thus, our results support prior speculations about the polyserine linker harboring phosphorylation sites (Chen et al., 1994; Tufail and Takeda, 2008).
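A scan for the SXXE/D motif mentioned above can be sketched with a regular expression. This is a minimal pattern search on the synthetic substrate peptide from 'Vitellogenin in vitro phosphorylation and LC-MS/MS'; it only reports serines with an acidic residue three positions downstream, and the function name is illustrative rather than part of any published tool.

```python
# Scan a sequence for the CKII recognition motif S-X-X-[E/D].
# A zero-width lookahead is used so that overlapping matches are all reported.
import re

CKII_MOTIF = re.compile(r"(?=S..[ED])")


def ckii_candidate_serines(seq):
    """Return 1-based positions of serines followed by E or D at position +3."""
    return [m.start() + 1 for m in CKII_MOTIF.finditer(seq)]


PEPTIDE = "TDISSSSSSISSSEENDFWQPK"
positions = ckii_candidate_serines(PEPTIDE)
print("serines matching SXXE/D:", positions)
print("matched windows:", [PEPTIDE[p - 1:p + 3] for p in positions])
```

A direct motif match is only a sequence-level indication; it does not by itself predict how many residues are modified in the in vitro reaction.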
Cleavage in the vicinity of the polyserine region occurs naturally in the bee fatbody tissue and during AmVg purification (Havukainen et al., 2011; Wheeler and Kawooya, 1990). In contrast to the Vg expressed by many other species, AmVg does not have the consensus cleavage site (Barr, 1991; Rouille et al., 1995). We propose that an alternative way of cleavage may have evolved; for instance, auto-cleavage or activity of an as-yet uncharacterized enzyme. So far, we have managed to narrow down the location of the cleavage site to residues 335–427. Further effort will be needed to pinpoint the exact site and to test protease candidates. A protease database search against the cleavage site (336–427) resulted in one candidate enzyme: caspase 1. Insect caspases have mostly been studied in Drosophila, and they have a similar Asp-X cleavage preference to the mammalian caspases (Cooper et al., 2009; Hawkins et al., 2000); however, this has not been verified in honeybees. In mammals, the phosphorylation of residues adjacent to the cleavage site is known to block caspase cleavage (Hoon Kim et al., 2003). Intriguingly, the existence of a phosphorylation-mediated regulation mechanism has been suggested for mosquito Vg (Don-Wheeler and Engelmann, 1997). Other examples of phosphorylation-regulated cleavage are presenilin-2 cleavage, which is inhibited by phosphorylation near the cleavage site (Walter et al., 1999), and cohesin cleavage, which is triggered by nearby phosphorylation (Alexandru et al., 2001).
To gain a better understanding of the overall domain organization of AmVg, we inspected the results of limited proteolysis, as this is a useful technique in assessing compactness and disorder (Sharp et al., 2006; Stroh et al., 2005). A highly protease-resistant region that contains a compact lipid-binding cavity at the corresponding region in the lamprey structure was identified. Moreover, the observation that there is an easily cleavable C-terminal piece consisting of ∼100 residues is consistent with the lamprey X-ray structure, which lacks electron density corresponding to this region (Anderson et al., 1998; Raag et al., 1988). Thus, the general domains and disordered regions of AmVg as probed by proteolysis seem to correspond to the known Vg structure of lamprey.
A vertebrate Vg structure cannot be used for inspection of the insect-specific polyserine tract. Our comparative study by NMR and CD spectroscopy of the two segments from homologous proteins, AmVg and NvVg, and of peptides of a bumblebee and another wasp (P. nipponica), all predicted to be disordered, presents an opportunity to comment on differences in the degree of disorder. Secondary structure prediction suggests some helical tendency at the polyserine tract for all species except the bumblebee. However, only in the case of the N. vitripennis linker was this prediction supported by experimental evidence. The extended conformations of the A. mellifera linker and the more collapsed ensemble of structures found for N. vitripennis suggest that the disorder can take on somewhat different forms. Indeed, computational approaches on repetitive stretches of peptides associated with disorder show that they differ significantly in their tendency to form coils with respect to kinetics and surrounding water molecules (Doruker and Bahar, 1997).
The observed differences in conformational propensity between NvVg and the other peptides likely arise from sequence differences. The number of serine residues per se is not the main determinant, as a ‘chimera’ combination of AmVg and NvVg peptides, containing the same number and sequence of serine residues as found in NvVg, turned out to be a random coil when assessed by CD. The NvVg helix begins in the middle of an Asp–Asn-rich sequence (DNNDD), subsequent to the main serine tract. In P. nipponica, there is a similar sequence (NDNDDD) prior to the serine residues. The locations of these residues within the linker might be important for the structure. For instance, when they appear before the polyserine tract, the sequence remains disordered (as in P. nipponica); when they appear after the polyserine, the sequence has helical elements (N. vitripennis).
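The positional comparison above can be illustrated with a minimal Python sketch that reports whether an Asp/Asn-rich stretch lies before or after the main serine run of a linker peptide. The function name, the run-length thresholds and the two toy sequences are assumptions made for illustration only; the toy sequences merely embed the DNNDD and NDNDDD motifs named in the text and are not the peptides used in this study.

```python
import re

def motif_position(peptide, min_serines=4, min_dn=4):
    """Report where the first Asp/Asn-rich run lies relative to the longest serine run.

    Thresholds are illustrative assumptions, not values from the study.
    """
    serine_runs = list(re.finditer(r"S{%d,}" % min_serines, peptide))
    dn_runs = list(re.finditer(r"[DN]{%d,}" % min_dn, peptide))
    if not serine_runs or not dn_runs:
        return "undetermined"
    # Treat the longest uninterrupted serine run as the "main" tract.
    main_tract = max(serine_runs, key=lambda m: m.end() - m.start())
    first_dn = dn_runs[0]
    # Runs of S and runs of D/N cannot overlap, so only two cases remain.
    return "before" if first_dn.end() <= main_tract.start() else "after"

# Hypothetical toy sequences (NOT the real linker peptides).
toy_after = "AKSSSSSSSKDNNDDLE"    # DNNDD follows the serine run
toy_before = "AKNDNDDDGSSSSSSSLE"  # NDNDDD precedes the serine run

for name, seq in [("toy_after", toy_after), ("toy_before", toy_before)]:
    print(name, motif_position(seq))
```

Applied to real linker sequences, such a scan only records the relative order of the two sequence features; whether that order actually determines helical propensity remains a hypothesis to be tested experimentally.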
The polyserine region has undergone extensive changes since the divergence of Chalcidoidea (N. vitripennis) wasps, Ichneumonoidea (P. nipponica) wasps and bees. We conclude this study by suggesting that the serine-rich domain linker of insect Vgs is a site of high plasticity, in terms of both amino acid sequence and polypeptide conformation. We suggest that residues can be added by transcriptional slippage and retained at such regions without much adverse effect. These plastic regions may subsequently evolve new functionality. Thus, we propose that although AmVg has retained the overall structure determined for lamprey Vg, it has developed a cleavage mechanism in its linker region that differs from what is currently known in non-apocritan species. It is likely that multiple phosphorylation by CKII is part of the regulation of this cleavage.
We thank Professor Aurora Martinez for outstanding support; this project took place in her Biorecognition group, at the University of Bergen. LC-MS/MS experiments were performed by the PROBE Proteomic Unit at the University of Bergen and at Arizona State University. We thank Dr Hilde Garberg for expert technical support and Dr Frode Berven for expert advice. The N-terminal sequencing and peptide mass fingerprinting were performed by the Protein Chemistry Research Group and Core Facility, Institute of Biotechnology, University of Helsinki, and we want to thank Nisse Kalkkinen for expert advice. We thank Atle Aaberg, Nils Åge Føystein and Annette Brenner for practical assistance at the NMR facility.
Ø.H. was supported by the Research Council of Norway [grant no. 185306] and Norwegian Cancer Society [grant no. 58240001]. G.V.A. was supported by the Research Council of Norway [grant nos 180504, 185306 and 191699], the National Institute on Aging [NIA P01 AG22500] and the PEW Charitable Trust. The PROBE work was partly supported by the National Program for Research in Functional Genomics (FUGE) funded by the Research Council of Norway. Deposited in PMC for release after 12 months.