ABSTRACT
The Krüppel-like transcription factor (KLF) BCL11B is characterized by a wide tissue distribution and crucial functions in key developmental and cellular processes, as well as in various pathologies including cancer and HIV infection. Although the basics of BCL11B activity and relevant interactions with other proteins have been uncovered, how this exclusively nuclear protein localizes to its compartment has remained unclear. Here, we demonstrate that, unlike other KLFs, BCL11B does not require the C-terminal DNA-binding domain to pass through the nuclear envelope but has an independent, previously unidentified nuclear localization signal (NLS), which is located distantly from the zinc finger domains and fulfills the essential criteria of an autonomous NLS. First, it can redirect a heterologous cytoplasmic protein to the nucleus. Second, its mutation causes aberrant localization of the protein of origin. Finally, we provide experimental and in silico evidence of its direct interaction with importin-α. The relative conservation of this motif allows a consensus sequence to be formulated, (K/R)K-X13–14-KR+K++ (‘+’ indicates amino acids with similar chemical properties), which can be found in all BCL11B orthologs among vertebrates and in the closely related protein BCL11A.
INTRODUCTION
Since its discovery, the BCL11B gene has demonstrated its relevance for embryonic development of a variety of organs and tissues. Multiple mouse knockout models, in which the gene was depleted systemically or in a tissue- or developmental stage-dependent manner, show improper structure and impaired function of the central nervous system, skin, mammary glands and numerous populations of the lymphoid compartment (Cai et al., 2017; Golonzhka et al., 2009a,b; Arlotta et al., 2005; Abboud et al., 2016; Hirose et al., 2015; Kyrylkova et al., 2012; Holmes et al., 2021). The relevance of the gene in humans was confirmed by identification of de novo germline mutations in the BCL11B locus (Lessel et al., 2018; Punwani et al., 2016; Qiao et al., 2019). These alterations, although only heterozygous, led to multi-organ abnormalities, including severe immunodeficiency arising from arrested T-cell development and accompanying mental retardation, craniofacial dysmorphism and defects in the development of skin, bones, teeth and neuronal tissue.
The vital role of BCL11B in normal function of various organs and tissues has been extended beyond the embryonic stage. Surprisingly, both acquired loss of function mutations and abnormally high expression of BCL11B have been identified even within one disease entity. Approximately 10–16% of T cell acute lymphoblastic leukemia (T-ALL) cases carry mutations altering the DNA-binding properties of BCL11B (Gutierrez et al., 2011). Along with similar findings in mouse γ-radiation-induced T cell lymphomas (Wakabayashi et al., 2003), recurrent genetic lesions de-activating BCL11B function strongly indicate a tumor suppressor activity for the gene. However, the majority of T-ALL and other malignancies originating from BCL11B-positive tissues, like head and neck squamous cell carcinomas (HNSCC), Ewing sarcomas or neuroblastomas (NBs) are characterized by elevated expression of the non-altered gene (Wiles et al., 2013; Orth et al., 2020; Ganguli-Indra et al., 2009; unpublished observation of P.G. and C.A.S.). Moreover, we and others showed that high BCL11B levels are critical for prevention of apoptosis and accumulation of DNA damage in malignant T cells (Grabarczyk et al., 2007, 2010; Kamimura et al., 2007; Karanam et al., 2010). Analogous observations have been made in HNSCC and NB (our unpublished data).
In terms of structure and mechanism of action, the BCL11B gene encodes a zinc finger transcription factor that may, depending on the context, positively or negatively regulate the expression of genes under its direct transcriptional control. In addition to inherent repressor activity (Avram et al., 2002), BCL11B mediates its function by interacting with various proteins and complexes that together are known to mediate formation of a transcriptionally inactive environment. The catalog of the BCL11B-interacting chromatin-inactivating factors grows continuously and includes both Zn2+- and NAD+-dependent histone deacetylases (HDACs and sirtuins), HP1α, methyltransferases (SUV39H1) and retinoblastoma-binding proteins (RBBP4 and RBBP7), among others (Cherrier et al., 2013, 2009; Cismasiu et al., 2005; Marban et al., 2007; Senawong et al., 2003; Topark-Ngarm et al., 2006; Kadoch et al., 2013). However, it has been shown that in T lineage-derived cells with activated signaling pathways, such as the mitogen-activated protein kinase (MAPK) or phosphoinositide-3 kinase (PI3K) pathways, BCL11B is temporarily converted into a transcription activator. The underlying mechanism involves a complex series of posttranslational modifications resulting in acquired affinity to p300 histone acetyltransferase and the downstream transcription machinery (Cismasiu et al., 2006; Dubuissez et al., 2016; Zhang et al., 2012). In our previous report, we described a novel atypical zinc finger domain (CCHC ZF) that serves as a dimerization interface for BCL11B. This domain and formation of the dimer were shown to be critically important for efficient repression of target genes and BCL11B-dependent cellular processes (Grabarczyk et al., 2018). Along with the published data mapping the interactions of BCL11B with chromatin modifiers, this extended the understanding of the mechanisms by which BCL11B executes its tasks on the molecular level.
However, how the BCL11B protein is guided to its physiological cellular compartment remained unexplained.
The most comprehensively characterized system for the transport of macromolecules into the nucleus is the classical nuclear import pathway (Lange et al., 2007). It facilitates transport of proteins exceeding the size of 60 kDa through the nuclear pore complexes, provided they encode a nuclear localization signal (NLS) (Paine et al., 1975). The transfer is mediated by soluble carrier receptors called importin-α (KPNA2) and importin-β (KPNB1), while the energy for that process is provided by Ran GTPase (Quimby and Dasso, 2003). In its typical variant, the cargo is recognized and bound in the cytoplasm by importin-α serving as an adaptor that assembles with importin-β to form a trimeric complex (Görlich et al., 1995; Lee et al., 2005). The direct interaction of importin-β with the nuclear pore initiates the nuclear translocation followed by dissociation of the cargo-receptor complex and recycling of the receptors to the cytoplasm (Kobe, 1999; Harreman et al., 2003; Matsuura and Stewart, 2004).
The consensus sequences of the common types of classical NLSs (cNLSs) have been defined and grouped into two categories: (1) the monopartite NLS, typically consisting of a single cluster of basic residues; and (2) the bipartite NLS (biNLS), built from two basic clusters separated by a 10–12-amino-acid linker (Dingwall and Laskey, 1991; Robbins et al., 1991). The first category can be further divided into two classes: class I, characterized by at least four consecutive basic amino acids (Kalderon et al., 1984), and class II, which requires just three positively charged residues and can be generally described as K(K/R)X(K/R) (Dang and Lee, 1988). Despite the identification of thousands of NLSs that match the consensus rules, there are many functional signals that do not match the canonical pattern, and many sequences that fulfill the consensus requirements but remain non-functional in terms of nuclear transport. Adding to the complexity, some ‘classical’ NLSs can be formed only in certain contexts from distant basic amino acids, as a result of protein folding (conformational NLSs) (Hatayama et al., 2008), upon dimerization or in a post-translational modification-dependent manner (Reich, 2013). Finally, given the possible existence of multiple NLSs within one protein (Masaki et al., 2020) or of alternative nuclear import pathways (Wagstaff and Jans, 2009), it becomes obvious that any predicted, hypothesized or even previously described NLS must be verified experimentally to prove its functional importance for a given protein.
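The two monopartite classes described above can be written as simple sequence patterns. The following Python sketch is an illustration only, not part of the study: the function name `monopartite_classes` is hypothetical, and the bipartite class is left out because the text above does not specify the composition of its two basic clusters. As stressed above, a pattern match indicates a candidate only and does not establish a functional NLS.

```python
import re

# Class I: at least four consecutive basic residues (K or R).
CLASS_I = re.compile(r"[KR]{4,}")
# Class II: K(K/R)X(K/R), where X is any residue.
CLASS_II = re.compile(r"K[KR].[KR]")


def monopartite_classes(seq):
    """Return the monopartite cNLS classes whose pattern occurs in seq.

    A match only flags a candidate; functionality must be tested experimentally.
    """
    seq = seq.upper()
    classes = []
    if CLASS_I.search(seq):
        classes.append("class I")
    if CLASS_II.search(seq):
        classes.append("class II")
    return classes


if __name__ == "__main__":
    # KRIKVE is the BCL11B motif analyzed in this report.
    print(monopartite_classes("KRIKVE"))
```

Because a run of four basic residues starting with lysine also satisfies K(K/R)X(K/R), a sequence can match both patterns at once.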
In this report, we established that, in contrast to other Krüppel-like transcription factors (KLFs), the C-terminal zinc finger cluster of BCL11B possesses only weak NLS activity that is dispensable and insufficient for nuclear localization of the full-length protein. Instead, we identified a short region matching the cNLS consensus and demonstrated that: (1) it was capable of transferring an otherwise cytoplasmic heterologous reporter (3×EGFP) to the nucleus; (2) its mutation resulted in aberrant localization of BCL11B in the cytoplasm; and (3) it was recognized by and interacted physically with the classical nuclear import pathway adaptor importin-α (KPNA2). Our in silico protein interaction modeling suggested that the upstream basic cluster (RK) within this region could be tethered by the minor binding site of importin-α, pointing to a bipartite nature of this newly identified NLS. However, its replacement with hydrophobic amino acids did not cause any noticeable redistribution of BCL11B within the cells, indicating its limited significance. Collectively, one can conclude that the localization of BCL11B is determined by a class II monopartite NLS that is positioned in the context of the classical bipartite NLS consensus.
RESULTS
The C-terminal Krüppel-like CCHH zinc fingers are not required for nuclear localization of BCL11B
Our previous study demonstrated the functional importance of the N-terminal part of the BCL11B protein and identified two relevant motifs: (1) a Friend of Gata-like (FOG) repressor domain [amino acids (aa) 1–45] and (2) an atypical CCHC zinc finger (aa 46–95) facilitating formation of BCL11B dimers. Cellular localization analysis with an enhanced green fluorescence protein (EGFP)-tagged N-terminal fragment of the protein (aa 1–141) revealed that, upon dimer formation, this EGFP-labeled fragment localized exclusively in the cytoplasm of transfected HEK293 cells. This observation excluded the existence of an NLS within the N-terminus (Grabarczyk et al., 2018) but left the question of how BCL11B is imported into the nucleus unanswered.
To investigate which of the remaining regions of the protein serves as an NLS, we generated a series of fusion genes consisting of EGFP and fragments of the BCL11B coding sequence covering the regions located downstream of the CCHC domain. To eliminate spontaneous nuclear accumulation of the reporter, we used three tandemly repeated coding sequences of EGFP (3×EGFP) and confirmed that its signal remained in the cytoplasm of the transfected cells (Fig. 1A). The full-length BCL11B coding sequence (transcription variant 2, NM_022898.3), which served as a positive control of nuclear localization, transferred the otherwise exclusively cytoplasmic reporter into the nucleus. Since nuclear import of multiple transcription factors, including KLFs, is facilitated by or requires engagement of their CCHH zinc fingers (Ito et al., 2009; Pandya and Townes, 2002; Quadrini and Bieker, 2002; Rodríguez et al., 2010; Shields and Yang, 1997; Shin et al., 2015), we initially focused on these structural motifs. As shown in Fig. 1A, the region BCL11B-146–434 encoding zinc fingers 1–3 (ZF1–3) was not capable of mediating nuclear import.
The NLS is located between two zinc finger clusters of BCL11B protein. (A) Schematic representation of the constructs encoding triple EGFP and various BCL11B fragments (upper panel) and their cellular localization after transfection of HEK293 cells (lower panels). (B) Identification of the functional nuclear localization signal of BCL11B. Putative NLS-inclusive region of BCL11B and sequences of the in silico predicted NLS and schematic representation of the NLS constructs (upper panel). Cellular distribution of the 3×EGFP-labeled BCL11B-derived potential NLSs (lower panels). Vertical rectangles on schematic are zinc finger domains (CCHC in light gray, CCHH in dark gray). EGFP, enhanced green fluorescence protein; LMNB1, lamin B1; DNA, DAPI staining. The figure shows images from one of five independently conducted experiments. Scale bars: 10 μm.
The C-terminal Krüppel-like (CCHH) ZFs (3×EGFP–BCL11B-718–823) delivered less definite results (Fig. 1A). Depending on the transfection efficiency and fusion protein levels, three different phenotypes could be observed: (1) cells that were weakly positive for EGFP and accumulated the chimeric reporter inside the nucleus; (2) cells where expression of the fusion was high, and the signal filled up the cytosolic compartments while the nuclei remained weakly positive for EGFP; and (3) cells with an average level of transgene, which resulted in a uniform distribution of the fluorescent tag. Collectively, we concluded that the C-terminal Krüppel-like ZFs 4–6 demonstrate weak NLS activity, which was insufficient and likely non-essential for the transfer of BCL11B into the nucleus. The latter assumption was confirmed by the observation that the BCL11B variant in which ZFs 4–6 were deleted (3×EGFP–BCL11B-1–716) remained localized inside the nuclei like the wild-type (wt) counterpart (Fig. 1A). Taken together, the presented data allowed us to limit the NLS search area to the region between aa 434 and aa 715.
A hybrid classical bipartite, or Myc-like, NLS is located between ZFs 3 and 4
Having restricted the putative NLS-encoding BCL11B fragment to aa 435–715, we analyzed the sequence with NLS mapping software that predicts NLSs specific to the classical importin-α–importin-β pathway (http://nls-mapper.iab.keio.ac.jp/cgi-bin/NLS_Mapper_form.cgi; Kosugi et al., 2008, 2009). Surprisingly, putative NLS sequences could be identified only at cut-off scores set to 3 (where 1 reflects cytoplasmic, 10 purely nuclear and 5 even distribution of the protein). The analysis yielded three candidates (Fig. 1B), which represented different classes of NLS: (1) biNLS-1, located between positions 524–562, which seemed to belong to the non-canonical structure-dependent NLS class; (2) a more classical bipartite NLS (biNLS-2), defined as a (K/R)(K/R)X10–20 motif followed by five amino acids of which at least three are lysine or arginine residues; and (3) a monopartite class II NLS overlapping with biNLS-2.
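The bipartite rule quoted above for biNLS-2 can be checked with a short scan. The Python sketch below is a simplified illustration of that rule only and does not reproduce the scoring performed by the NLS mapping software; the function name `find_bipartite` is hypothetical. The test peptide is assembled from the RKPAPLPSPGLNSAAKRIK sequence used for modeling in this report plus the final two residues of the KRIKVE motif.

```python
BASIC = set("KR")


def find_bipartite(seq, min_linker=10, max_linker=20, tail_len=5, min_basic=3):
    """List (start, linker_length, tail) windows matching the rule:
    two basic residues, a linker of 10-20 residues, then five residues
    of which at least three are K or R. Positions are 0-based.
    """
    hits = []
    for i in range(len(seq) - 1):
        # Upstream cluster: two consecutive basic residues.
        if seq[i] in BASIC and seq[i + 1] in BASIC:
            for n in range(min_linker, max_linker + 1):
                start = i + 2 + n
                tail = seq[start:start + tail_len]
                # Require a complete five-residue downstream window.
                if len(tail) == tail_len and sum(c in BASIC for c in tail) >= min_basic:
                    hits.append((i, n, tail))
    return hits


if __name__ == "__main__":
    print(find_bipartite("RKPAPLPSPGLNSAAKRIKVE"))
    # Upstream RK replaced by AA: the bipartite rule no longer matches.
    print(find_bipartite("AAPAPLPSPGLNSAAKRIKVE"))
```

In this sketch the variant with the upstream RK replaced by AA no longer satisfies the bipartite rule, although its KRIK stretch still matches the class II monopartite pattern K(K/R)X(K/R).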
Transfection of HEK293 cells with plasmid vectors encoding 3×EGFP fused with the identified sequences showed that the biNLS-1 peptide had no nuclear import activity (Fig. 1B). The biNLS-2, in turn, appeared functional and guided the fluorescent tag to the nucleus (Fig. 1B). However, the monopartite NLS overlapping with biNLS-2 showed equally high efficiency in transferring the 80 kDa reporter through the nuclear pores (Fig. 1B). Since mutation of the N-terminal basic motif (biNLS-2-RKtoAA) did not diminish the nuclear-localizing properties of biNLS-2 (Fig. 1B), we conclude that nuclear import of BCL11B is facilitated by a monopartite class II NLS that is positioned in the context of a bipartite NLS structure. Further shortening of the monopartite NLS demonstrated that the 5-amino-acid peptide KRIKV remained active as an NLS and that deletion of the C-terminal hydrophobic residue significantly, yet incompletely, abolished this activity (Fig. 1B).
Having identified the minimal NLS sequence, we examined the significance of each residue by performing two different mutational scannings (Fig. 2). Conversion of single residues into alanine clearly demonstrated that each of the three basic amino acids was crucial for nuclear localization. In contrast, the two hydrophobic residues, isoleucine and valine could be mutated without any noticeable consequence on reporter localization (Fig. 2A). Similar results were obtained when all residues of the KRIKVE motif were sequentially replaced with the polar amino acid threonine. Also, alteration of any of the positively charged amino acids meant that only minute amounts of 3×EGFP could be detected in the nucleus. Conversely, the fusion proteins in which the hydrophobic residues were converted into the polar amino acid threonine passed the nuclear envelope barrier and accumulated inside the nucleus (Fig. 2B).
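The two mutational scans described above amount to replacing one residue at a time with a fixed amino acid. A minimal Python sketch of how such a panel of single-substitution variants can be enumerated is given below; the function name `scan_mutants` and the labels, which number positions within the motif rather than within full-length BCL11B, are illustrative assumptions and not the construct names used in the study.

```python
def scan_mutants(motif, replacement):
    """Return (label, sequence) pairs for every single-residue substitution.

    Labels use positions within the motif (1-based), e.g. K1A.
    Positions that already carry the replacement residue are skipped.
    """
    mutants = []
    for i, residue in enumerate(motif):
        if residue == replacement:
            continue
        label = f"{residue}{i + 1}{replacement}"
        mutants.append((label, motif[:i] + replacement + motif[i + 1:]))
    return mutants


if __name__ == "__main__":
    for label, seq in scan_mutants("KRIKVE", "A"):  # alanine scan
        print(label, seq)
    for label, seq in scan_mutants("KRIKVE", "T"):  # threonine scan
        print(label, seq)
```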
The relevance of the basic and hydrophobic amino acids constituting the identified minimum NLS motif. (A) Cellular localization of the 3×EGFP fluorescent reporter fused to alanine-replacement KRIKVE mutants. (B) Cellular distribution of the 3×EGFP-labeled constructs with KRIKVE motifs carrying threonine replacement mutations. EGFP, enhanced green fluorescence protein; LMNB1, lamin B1; DNA, DAPI staining. The figure shows images from one of five independently conducted experiments. Scale bars: 10 μm.
A single KRIKVE sequence is sufficient to transport a BCL11B dimer to the nucleus
To verify whether the identified peptide represents an actual NLS of BCL11B, we next mutated the basic residues within the KRIKVE motif in the context of full-length protein (Fig. 3A). The simultaneous conversion of the two lysine and one arginine residues to either alanine or threonine completely disabled nuclear import of the 3×EGFP fusion proteins. Conversely, replacing the upstream RK motif with two alanine residues did not negatively influence nuclear import of this variant, suggesting that the C-terminal basic region represents an autonomous NLS sequence.
Validation of KRIKVE nuclear localization activity in the context of full-length BCL11B protein and its dimers. (A) Schematic drawing of vectors and localization of fusion proteins carrying targeted mutations within the minor and major basic clusters of the putative biNLS (RK-X13-KRIKVE). (B) The EGFP-tagged and KRIKVE-mutated variant of BCL11B co-transfected with NLS-proficient full-length or N-terminally truncated BCL11B. Vertical rectangles on schematic are zinc finger domains (CCHC in light gray, CCHH in dark gray). EGFP, enhanced green fluorescence protein; mRFP, monomeric red fluorescence protein; LMNB1, lamin B1; DNA, DAPI staining. The figure shows images from one of five independently conducted experiments. Scale bars: 10 μm.
We showed previously that, in addition to the interactions with multiple nuclear proteins, BCL11B underwent homodimerization, and that this was a prerequisite of normal protein function (Grabarczyk et al., 2018). In that study, we observed colocalization of wt BCL11B co-transfected with various N-terminal fragments containing the CCHC zinc finger domain, which remained in the cytoplasm when expressed alone. On the other hand, our data presented here demonstrate that the dimerization-incompetent but NLS-encoding deletion mutant (aa 146–823) of BCL11B localizes in the nucleus.
These results indicate that homodimerization of BCL11B CCHC ZF motifs precedes nuclear migration and that dimer formation is not required for nuclear import of BCL11B. To verify these conclusions, we investigated the location of a heterodimer consisting of the full-length wt and NLS-mutated BCL11B variants. As shown in Fig. 3B, 3×EGFP-labeled BCL11B NLS mutants, which were incapable of passing the nuclear envelope alone (Fig. 3A), colocalized with mRFP-tagged wt BCL11B in the nucleus. To confirm the engagement of the CCHC motif and exclude a putative ‘piggy-backing’ effect mediated by nuclear proteins encoding their own NLS sequences and interacting with, for example, the FOG repressor domain (FRD) of BCL11B, we next performed the co-transfection using an FRD- and CCHC-deleted mRFP–BCL11B fusion. In this experimental setting, the mRFP-tagged truncated BCL11B was detected in the nucleus, while the NLS-mutated EGFP-labeled BCL11B remained in the cytoplasm.
Collectively, our observations indicate the following: (1) the upstream RK motif has limited impact on nuclear trafficking of BCL11B; (2) the KRIKVE motif is sufficient for nuclear localization of monomeric and dimerized BCL11B; and (3) unlike what has been shown in the case of transcription factor STAT1 (Fagerlund et al., 2002), no additional KRIKVE-independent and non-classical conformational NLS is formed upon dimerization of BCL11B.
KRIKVE motif interacts with importin-α1 adaptor protein
One of the fundamental requirements that must be fulfilled by a newly identified NLS sequence is a direct interaction with the nuclear import receptor machinery. Demonstrating this in vivo is, however, technically challenging in non-yeast systems, in part because specific inhibitors of different nuclear import receptors have not yet been developed (Lange et al., 2007). To overcome these obstacles, we applied an optimized tripartite fluorescence complementation cellular assay based on a split-GFP approach (Cabantous et al., 2013). The principles of the assay are depicted in Fig. 4A. In short, split-GFP was designed for detection of protein–protein interactions by tagging the proteins of interest with non-fluorescent GFP fragments that reassemble to form a functional GFP when brought into spatial proximity. In contrast to the earlier bipartite split-GFP assays, the tripartite system relies on a GFP1-9 detector, characterized by improved folding, and two 20-aa-long micro-tags, GFP10 and GFP11, fused to the putatively interacting proteins. The tripartite format minimizes the spontaneous self-assembly-related background fluorescence typical for bipartite split-GFP assays, while the small size of the tags reduces their influence on the investigated proteins (Cabantous et al., 2013) (Fig. 4A). First, we modified the target HEK293 cell line to stably express the GFP1-9 sensor/reporter from the non-viral EF1α constitutive promoter. Next, we fused the auto-inhibition defective variant of KPNA2 (49-KRR-51 to 49-AAA-51, denoted KRRtoAAA; Harreman et al., 2003) with a GFP10 tag using a 30-aa-long flexible linker. Functionally, the KRRtoAAA variant of KPNA2 differs from its wt counterpart in that it cannot release its cargo upon entering the nucleus and therefore forms a stable complex with the transported proteins.
To confirm that the putative interaction with BCL11B or the KRIKVE motif occurs via the conventional NLS binding interface of importin, further mutations were introduced into the KPNA2 coding sequence to create a non-functional variant (2xmut KPNA2, KRRtoAAA, S149K-N188K-D192K-N228R). The complementing GFP11 tag was fused to either the KRIKVE motif or full-length BCL11B using a 25-aa linker. In both cases, the corresponding NLS-mutated controls were generated, and each construct was created as either N- or C-terminal fusion (Fig. 4A). The corresponding cDNA sequences can be found in Table S1.
Confirmation of the direct KRIKVE-importin interaction in live cells. (A) Schematic presentation of the cellular tripartite fluorescence complementation assay (left upper part) and vectors expressing KPNA2, KRIKVE, and BCL11B fusion variants. sfGFP1, superfolder GFP encoding an improved variant of nine GFP β-sheets; GFP10, 10th GFP β-sheet; GFP11, 11th GFP β-sheet; L25, L30, flexible peptide linkers; vertical rectangles; zinc finger domains (CCHC in light gray, CCHH in dark gray). (B) Interaction of KRIKVE motif and KPNA2. Flow cytometry dot plots representing one of three independent experiments revealing reconstituted GFP signal in sfGFP-LX-293T cells transfected with indicated vector combinations at 6:2 GFP10:GFP11 plasmid ratio (left), and the summarized results obtained from three experiments (right; display of individual and mean values). (C) Interaction of full-length BCL11B and KPNA2. Flow cytometry dot plots representing one of three independent experiments revealing reconstituted GFP signal in sfGFP-LX-293T cells transfected with indicated vector combinations at 5:6 GFP10:GFP11 plasmid ratio (left), and the summarized results obtained from three experiments (right; display of individual and mean values). All groups of transfection conditions were statistically compared using a Kruskal–Wallis test of non-parametric one-way ANOVA, which resulted in significant differences with P=0.0139 and P=0.0150 for the set of BCL11B full-length constructs and the KRIKVE-only constructs, respectively. 
A multiple comparison follow-up test for meaningful comparisons of mean ranks (positive transfection condition vs appropriate control; B/N1+K1 vs B/N3+K1, B/N2+K1 vs B/N4+K1, B/N1+K2 vs B/N3+K2, B/N2+K2 vs B/N4+K2) with application of multiple testing correction by controlling the false discovery rate (FDR) using the recommended two-stage step-up method of Benjamini, Krieger and Yekutieli led to significant discoveries (B,C) as indicated (** 0.001 < FDR q < 0.01; * 0.01 < FDR q < 0.05). Further control conditions are depicted for completeness.
Next, we cotransfected the GFP1-9-expressing HEK293 cells with plasmid vectors encoding the GFP10 and GFP11 fusions. The transfected cells were checked for green fluorescence by flow cytometry at 48 h after transfection. The reconstitution of functional GFP represents evidence of GFP10 and GFP11 proximity resulting from direct and permanent interaction of KRIKVE motif or BCL11B with KPNA2 followed by association with the GFP1-9 sensor. As shown in Fig. 4B, the N- and C-terminally GFP11-labeled isolated NLS motifs led to GFP reconstitution in a significant proportion of transfected GFP1-9-expressing HEK293 cells when co-transfected with GFP10–KPNA2 fusions. The strength of the signal, calculated as the percentage of GFP-positive (%GFP+) cells multiplied by the mean fluorescence intensity (MFI), was comparable for N- and C-terminally labeled KPNA2, provided the major NLS-binding interface of KPNA2 remained intact. Conversion of the positively charged amino acids into alanine reduced the GFP signal strength (%GFP+×MFI) to the levels observed in non-functional KPNA2 controls. Similar observations were made with full-length BCL11B (Fig. 4C). Although the GFP signal strengths were slightly weaker compared to the isolated KRIKVE motif, which likely reflects the different length of KRIKVE and BCL11B and their expression efficiency, the GFP reconstitution could be detected only when both the investigated NLS motif and the NLS-binding interface of KPNA2 remained intact. The position of the GFP10 and GFP11 tags had no major influence on the signal magnitude and, like for KRIKVE, the mutated controls generated only minimal background fluorescence, presumably caused by spontaneous GFP self-assembly occurring at low frequency also in the tripartite systems.
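The signal strength used above is the product of the percentage of GFP-positive cells and the mean fluorescence intensity. A minimal Python sketch of this calculation follows; the function name and the numerical values are hypothetical placeholders chosen for illustration and are not data from this study.

```python
def signal_strength(percent_gfp_positive, mfi):
    """Signal strength as defined in the text: %GFP+ multiplied by MFI."""
    if not 0 <= percent_gfp_positive <= 100:
        raise ValueError("percentage of GFP-positive cells must be within 0-100")
    return percent_gfp_positive * mfi


if __name__ == "__main__":
    # Hypothetical example values, not measurements from this study.
    intact = signal_strength(12.5, 2000.0)
    mutated_control = signal_strength(0.5, 400.0)
    print(intact, mutated_control)
```

Combining both readouts in one number means that a condition with few but bright GFP-positive cells and a condition with many dim ones are compared on the same scale.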
In silico modeling of the BCL11B-NLS–KPNA2 direct interaction
To gain a deeper understanding of the molecular interaction between BCL11B and KPNA2, we applied several computational methods to predict the binding mode. Since molecular docking is challenging for larger peptides owing to their large conformational flexibility, a multi-step procedure was chosen to reassemble the BCL11B NLS at the KPNA2-binding site based on known structures. A crystal structure of KPNA2 with a substrate bound to both the minor and major binding sites was used as the starting point (PDB ID: 3FEY). The binding motifs of substrates at both sites of KPNA2 are highly conserved, making them an optimal choice as anchoring points for loop modeling. The residues of these motifs were simply mutated to their BCL11B counterparts (minor, RR→RK; major, KRRK→KRIK). This drastically decreased the conformational space for the residues in between by constraining them to a fixed distance. The whole sequence (RKPAPLPSPGLNSAAKRIK) could then be rebuilt using the loop modeler application implemented in the Molecular Operating Environment (online at https://www.chemcomp.com/Products.htm, Chemical Computing Group) within a few hours. Two additional residues were added at both ends to obtain the full-length NLS sequence bound to KPNA2. The system was prepared and submitted to a molecular dynamics simulation for final refinement and sidechain optimization, resulting in a stable complex in solution. All basic residues maintained their positions during the whole simulation time.
Due to the length of the sequence, a short but stable alpha-helical structure directly preceding the major binding site was predicted from loop modeling (Fig. 5; Fig. S1). To verify the helix probability in this region, a protein folding simulation was performed using the replica exchange TIGER2hs algorithm (Geist et al., 2019). Compared to classical T-REMD simulations, fewer replicas are needed for explicit solvation without losing sampling efficiency (Geist et al., 2019). All helix residues had a probability of ∼25% to 40% (ensemble average), which is similar to that of other helical peptides such as Ac-(AAQAA)3-NH2 (Pang, 2016).
Predicted binding mode of the BCL11B NLS to importin α1 and specific interaction to the minor and major binding sites. Hydrogen bonds and salt bridges are represented as dashed yellow and purple lines, respectively. The amino acids of the NLS directly involved in importin α interaction are shown in cyan. A small helical region in front of the residues binding at the major site was predicted by loop modeling and could be confirmed by ab initio protein folding simulations. The helix probability is shown as bar chart, where green colors indicate the residues that actually form a helical secondary structure. A full resolution plot can be found in Fig. S1.
Besides the strong ionic bonds formed by the basic residues, various hydrogen bonds with NLS backbone atoms were observed (Fig. 5). Previous studies have demonstrated their importance for recognition and binding affinity to importin-α (Conti and Kuriyan, 2000; Pang and Zhou, 2014). Additionally, hydrophobic interactions were formed by L605 and L610. At the N-terminal region, another ionic bond was observed between E620 and an arginine sidechain, which could further stabilize the interaction with KPNA2.
Taken together, the in silico structure prediction approach provided a reliable model of the BCL11B-NLS–importin complex, which strongly supports the direct interaction of BCL11B and importin-α already assumed from the tripartite GFP complementation-based assay described above.
DISCUSSION
The classical nuclear import of proteins mediated by importin-α–importin-β complex is one of the most conserved cellular processes unique to eukaryotic cells. Owing to an abundance of closely located basic amino acids, the cNLSs recognized by importin-α (KPNA2) are relatively easy to predict and have been identified for over 50% of nuclear proteins. The remaining fraction does not encode any identifiable cNLS, although some of these cargos have been proven to directly bind importin-α. Besides the alternative importin-α-independent transport systems or importin-α-dependent piggybacking mechanisms, this unexpected lack of cNLS could be explained by the presence of as-yet-unidentified alternative basic amino acid-rich motifs that do not resemble the classical one and yet bind importin-α at its conventional NLS-binding regions (Tessier et al., 2020).
The family of Krüppel-like factors to which BCL11B was classified, owing to its conserved C-terminal DNA-binding domain, represents the final group of nuclear proteins adopting the importin pathway without bearing the canonical NLS sequence. But even inside this family members have revealed different demands concerning the presence and location of the contributing positively charged amino acids. The erythroid Krüppel-like factor (EKLF, also known as KLF1) requires the entire C-terminal domain, consisting of three CCHH zinc fingers, to migrate into the nucleus. Surprisingly, neither the zinc finger structure itself nor the DNA-binding properties facilitate the process, and the critical determinants for the nuclear localization are exclusively the conserved basic residues dispersed throughout the entire domain (Pandya and Townes, 2002). The gut-enriched Krüppel-like factor (GKLF, also known as KLF4) in turn was shown to encode an additional basic region (5′BR) located in front of the Krüppel-like zinc fingers, which was found to be essential for creating the fully functional NLS (Shields and Yang, 1997). Although the 5′BR is a common feature of many KLFs, its relevance for nuclear localization of KLFs varies. Despite the presence of the 5′BR, the KLF6 transcription factor localizes in the nucleus solely via its C-terminal zinc fingers and basic residues spread throughout this domain (Rodríguez et al., 2010).
The overlap of the DNA-binding motifs and NLS properties observed in the KLF family prompted us to begin our search for the BCL11B NLS within its zinc finger domains. The hypothetical NLS activity of these BCL11B regions could not be detected (ZF1–3) or was minimal (ZF4–6) when attached to an extra-nuclear reporter (3×EGFP), which excluded a crucial role for ZF domains in nuclear localization of the native protein. Although the weak but non-negligible NLS activity of ZF4–6 might represent further evidence for a close relationship between BCL11B and KLFs or their common origin, the fact that the ZF4–6-deleted variant showed an exclusively nuclear pattern ruled out a major importance of this region for nuclear transport of BCL11B. This conclusion prompted us to continue our search for an alternative NLS and led to identification of a cNLS-like sequence located between the two sets of zinc fingers. The identified motif received rather low predicted importin-α-interaction scores from multiple bioinformatics tools and possessed a number of unique features. Structurally, it could be categorized as a bipartite NLS consisting of arginine and lysine residues followed by a slightly longer stretch of positively charged amino acids (three out of five), the two separated by a spacer of up to 13 amino acids, a region that typically does not exceed ten residues. Interestingly, further shortening or mutations of the predicted motif demonstrated that the upstream basic region (RK) was not essential for nuclear import of the reporter-labeled deletion mutants or the full-length protein. The remaining KRIKV motif, which fulfills the consensus criteria of the type 2 monopartite NLS [K(K/R)X(K/R)], was not only sufficient to guide the otherwise cytoplasmic reporter to the nucleus but, in contrast to the upstream RK, also indispensable for nuclear localization of the full-length BCL11B.
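The type 2 monopartite consensus K(K/R)X(K/R) quoted above can be expressed as a simple sequence pattern. The sketch below is illustrative only: the function name is hypothetical, and a pattern match is a purely sequence-level check that does not predict import activity (for example, the TRIKVE mutant fails the pattern although it retained partial activity in our assays).

```python
import re

# Type 2 monopartite consensus K(K/R)X(K/R); X is treated as any residue.
TYPE2_MONOPARTITE = re.compile(r"K[KR].[KR]")


def has_type2_monopartite(sequence: str) -> bool:
    """Return True if the sequence contains a K(K/R)X(K/R) match."""
    return TYPE2_MONOPARTITE.search(sequence.upper()) is not None


# Motifs named in the text: wild type, two first-lysine mutants, c-Myc NLS.
for motif in ("KRIKVE", "ARIKVE", "TRIKVE", "KRVKLD"):
    print(motif, has_type2_monopartite(motif))
```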
We could confirm the crucial role of all basic residues, although the observed loss of NLS activity was partially dependent on the introduced mutation. While conversion of KRIKVE into ARIKVE almost completely eliminated nuclear transport of the fused reporter, the TRIKVE mutant remained partially active. An explanation of this difference was delivered by the in silico comparison of these two mutant NLSs bound to the conventional NLS-binding groove of KPNA2. As shown in Fig. S2, in contrast to alanine, threonine could compensate for part of lysine side chain located in the slightly hydrophobic region of the groove and therefore led to subtly less pronounced loss of NLS function. The hydrophobic residues within the identified NLS motif were mutation-tolerant and did not reduce the efficiency of nuclear import when mutated to either alanine or threonine. However, the removal of valine and glutamic acid from the C-terminus of the KRIKVE motif dramatically reduced its NLS activity. The C-terminal VE region like the entire identified NLS strongly resembles the c-Myc-NLS, which is KRVKLD (Dang and Lee, 1998). It was shown that the LD dipeptide could restore the NLS activity of an otherwise defective isolated second basic cluster of the bipartite nucleoplasmin NLS (Makkerh et al., 1996). This suggests a context-dependent contribution of these hydrophobic and acidic residues to importin binding, namely, that they are dispensable when the upstream basic region of a bipartite NLS remains intact, but they gain importance and stabilize the interactions at the major binding groove of KPNA2. Our in silico model supports this conclusion and the results presented here (Fig. S2) – the removal of VE from the C-terminus left an unfavorable negatively charged C-terminal carboxyl group which disabled KRIK–KPNA2 binding. According to our simulations, the adjacent glutamic acid (E620) stabilized the interaction further by forming a salt bridge with an arginine residue of importin-α.
The transient nature of the importin-α–NLS interaction and the lack of precise tools to inhibit or deplete this crucial component of nuclear transport make in vivo validation of a putative interaction a challenging task. To overcome this obstacle and prove the postulated interaction of KPNA2 with KRIKVE and BCL11B in a cellular context, we applied an assay that took advantage of the fluorescence complementation phenomenon. The chosen system represented an improved variant of a split-GFP protocol: instead of two relatively big complementary non-fluorescent fragments that tended to reconstitute an intact fluorescent protein spontaneously, it splits GFP into a bigger sensor covering 9 of 11 GFP β-strands (GFP1-9) and two small tags encoding the 10th and 11th β-strands. This solution not only diminished the background fluorescence caused by uncontrolled GFP reconstitution but also generated tags (GFP10 and GFP11) that, owing to their small size, should be neutral for the expression, structure and stability of the investigated interacting proteins (Koraïchi et al., 2018; Cabantous et al., 2013). Indeed, we observed only minimal background signal in GFP1-9-expressing cells co-transfected with GFP10- and GFP11-labeled peptides not expected to interact with each other. The vectors with intact NLS sequences combined with NLS-binding-proficient KPNA2 produced green fluorescence that was a couple of orders of magnitude stronger than in any of the binding-defective controls. However, we cannot ultimately exclude bias in the results of the fluorescence-complementation split-GFP-based assay induced by potential differences in the expression levels between wild-type and mutated split-GFP fusions. To rule out putative nonspecific effects on the assay results, we chose not to tag the sequences additionally with any traceable marker.
Although this strategy makes some of the fusions undetectable or indistinguishable at the protein level, the data generated by fluorescence microscopy or cytometry using the same cDNAs fused to a fluorescent reporter (GFP, ECFP or RFP) clearly excluded significantly varying expression levels of the wild-type and the corresponding mutant variants. The fact that a clear GFP-complementation signal could be observed only for the combinations of components capable of mediating specific cNLS–importin interactions, at each of the many tested GFP10:GFP11 ratios, additionally supports our conclusions and virtually eliminates unequal expression of the tested assay components as a cause of the negative GFP complementation results.
The low background fluorescence observed for NLS-mutated BCL11B in combination with the NLS-binding-proficient KPNA2 indicated that, unlike dimerization, which occurs in the cytoplasm, the interactions with other proteins known to possess their own NLSs mostly initiate inside the nucleus. Otherwise, some piggy-backing of the NLS-deficient BCL11B indirectly attracted to the importin should have become detectable by the split-GFP assays or microscopy.
Not surprisingly, the in silico model of the BCL11B bipartite NLS binding to importin-α (KPNA2) is in good agreement with other known importin-α–substrate complexes. It shows that the RK motif occupies the minor binding site of KPNA2 and the KRIKVE engages the major binding site of the adaptor, while the spacer region acquires a helical structure putatively stabilizing the complex. The formation of hydrogen bonds and hydrophobic interactions between the NLS spacer and KPNA2 might compensate for the reduced affinity of the RK-mutated variants. This suggests that nuclear import of BCL11B requires residues located upstream of the KRIKVE motif. However, the experimental results obtained with the isolated KRIKVE clearly show that the interactions outside of the motif, although stabilizing, are dispensable for its function.
Although not proven experimentally, the described nuclear localization mechanism very likely applies to the homologous transcription factor BCL11A. The corresponding region consists of a KPNA2 minor groove-binding KK motif and a 13-amino-acid spacer followed by KRIKLE sequence that engages the major binding interface of importin. The entire region preserves the number and distribution of the basic and hydrophobic residues critical for NLS function. From an evolutionary perspective, the identified NLS seems to be conserved among vertebrates and can be described as RK-X13-KR+K++, where (+) indicates substitutions with amino acids of similar properties and X represents residues forming the spacer (and is X14 in fish).
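The consensus can be approximated as a sequence pattern for scanning candidate proteins. The following is a sketch under stated assumptions: the minor-site dipeptide is written as (K/R)K so that both RK (BCL11B) and KK (BCL11A) match, and each '+' position is treated as any residue because "similar properties" is not defined precisely enough here to encode. The function name is hypothetical, and the test sequence is the BCL11B NLS segment given in the Results extended by the VE dipeptide of KRIKVE.

```python
import re
from typing import Optional

# (K/R)K, a 13- or 14-residue spacer, then KR+K++.
# Assumption: every '+' position is matched by any residue ('.').
BIPARTITE_CONSENSUS = re.compile(
    r"(?P<minor>[KR]K)(?P<spacer>.{13,14})(?P<major>KR.K..)"
)


def find_bipartite_nls(sequence: str) -> Optional[dict]:
    """Return the first consensus match in the sequence, or None."""
    match = BIPARTITE_CONSENSUS.search(sequence.upper())
    if match is None:
        return None
    return {
        "start": match.start(),
        "minor": match.group("minor"),
        "spacer_length": len(match.group("spacer")),
        "major": match.group("major"),
    }


bcl11b_nls = "RKPAPLPSPGLNSAAKRIKVE"
print(find_bipartite_nls(bcl11b_nls))
```

A match only indicates that the residue pattern is present; as the results with the isolated KRIKVE motif show, the upstream dipeptide can be absent without loss of import.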
Interestingly, the second lysine within the KRIKVE motif was previously proven to possess a regulatory function crucial for BCL11B transcriptional activity. As shown by two independent studies, this residue (K689 in BCL11B variant 1, and K618 in BCL11B variant 2 described in this study) undergoes reversible sumoylation induced by MAPK pathway activation and preceded by phosphorylation of BCL11B. The modified SUMO–BCL11B recruited p300 histone acetyltransferase and led to de-repression followed by transcriptional activation of BCL11B-regulated genes in response (Vogel et al., 2014; Zhang et al., 2012). However, the described post-translational modification of BCL11B seems to not be required for its nuclear transport. Conversion of K689 (K618) into arginine, which is not susceptible to sumoylation, did not prevent the normal nuclear localization of the EGFP-labeled protein (data not shown). The most tempting explanation is the spatial separation of the two processes in which the KRIKVE motif is involved. While the non-modified lysine is likely crucial or sufficient for importin binding in the cytoplasm, upon release of BCL11B from the importin-α–importin-β complex inside the nucleus, the same lysine residue becomes available for modifications and interactions with different nuclear proteins.
The wide tissue distribution and the involvement in crucial developmental and physiological processes predispose BCL11B to be a carrier of causative or contributing mutations in tumors derived from various organs. According to the Catalogue of Somatic Mutations in Cancer (https://cancer.sanger.ac.uk/cosmic), over 1200 genetic aberrations leading to altered amino acid composition have been identified within the BCL11B coding sequence. However, their distribution is not accidental, and the regions within or adjacent to the DNA-binding zinc finger motifs are affected with relatively higher frequencies. Although improper cellular localization can be easily imagined and was documented as a cause of protein malfunction (Cinti et al., 2000), the KRIKVE motif and the neighboring regions seem not to represent a mutation hotspot in BCL11B. Only two mutations within the KRIKVE sequence have been reported, R616G and K618N in lung and skin carcinomas, respectively, and according to the results presented here these variants likely localize aberrantly in the cytoplasm. The second one, besides its aberrant localization, likely could not participate in the regulatory sumoylation and repressor-to-activator switch. Five further aberrations affect the two alanine residues directly preceding KRIKVE and, interestingly, none of the seven was found in lymphoid malignancies, which appear to be overrepresented among all tumors carrying BCL11B mutations. These reports lead to the conclusion that, despite its crucial importance for normal BCL11B function, the newly identified NLS and its surrounding sequence are not frequently affected by loss-of-function mutations, which represent only 0.15% of all loss-of-function mutations identified within the BCL11B coding sequence.
It is possible the underrepresentation of the BCL11B-NLS mutations reflects the major importance of this motif for BCL11B function and supports the previously reported dependence of the transformed cells on BCL11B activity (Grabarczyk et al., 2007; Kamimura et al., 2007).
In conclusion, we show evidence for a previously unknown NLS sequence located at a distance from the DNA-binding and dimerization-supporting zinc fingers of the BCL11B protein. Although it has structural similarity to a bipartite motif, we furthermore demonstrate that, with regard to function, the NLS of BCL11B represents a monopartite signal consisting of the 6-amino-acid sequence KRIKVE.
MATERIALS AND METHODS
Plasmids
All BCL11B sequences and fragments used in this study correspond to the isoform 2 consisting of exons 1, 2 and 4 (accession number NM_022898.2).
For immunofluorescence microscopy experiments, DNA fragments encoding the full-length BCL11B and its deletion mutants (aa 1–823, aa 1–141, aa 1–434, aa 146–434, aa 718–823, aa 146–823 and aa 1–716) were PCR-amplified from human T cells cDNA and cloned in frame into pEGFP-C1 (Takara Bio Europe, Saint-Germain-en-Laye, France) or newly created p3xEGFP plasmid vectors. The PCR products were cloned into the vectors using the InFusion ligase-free cloning system (Takara Bio Europe). The shorter BCL11B fragments corresponding to the predicted putative nuclear localization sequences (aa 524–562, aa 598–640, aa 615–620, and their alanine or threonine mutants) were synthesized as double stranded DNA (dsDNA) oligonucleotides (Integrated DNA Technologies, Leuven, Belgium). The vectors encoding full-length BCL11B carrying point mutations in the putative NLS were prepared using Q5™ Site-Directed Mutagenesis Kit (New England Biolabs, Ipswich, MA, USA), according to the manufacturer's instructions. For BCL11B dimer localization assay, PCR-generated DNA fragments matching the whole isoform 2 of BCL11B or its N-terminally-truncated version (aa 146–823) were fused to monomeric red fluorescence protein (mRFP). The cDNAs encoding the GFP10- and GFP11-labeled components of the tripartite fluorescence complementation assay were synthesized and delivered in pcDNA3.1 plasmid vectors (BioCat GmbH, Heidelberg, Germany). The GFP1-9 reporter was re-cloned into a lentiviral vector pWPXL (Addgene #12257) under the control of the EF1α constitutive eukaryotic promoter using InFusion ligase free cloning system (Takara Bio Europe). All expression vectors generated for this study were prepared using standard bacterial culture techniques and HiPure plasmid preparation kits (Invitrogen by Thermo Fisher Scientific, Carlsbad, CA). The fidelity of the amplified or synthesized cloned sequences was verified by Sanger sequencing (LGC Genomics, Berlin, Germany).
Cell culture and transfection
The LX™ 293T cell line, a HEK293 derivative (Takara Bio Europe), was maintained in Dulbecco's modified Eagle's medium (Invitrogen) supplemented with 10% fetal calf serum (PanBiotech, Berlin, Germany), Glutamax (Invitrogen), and Mycozap (Lonza Scientific, Basel, Switzerland). Cells from early passages that tested negative for mycoplasma infection were used for all experiments described here. Non-confluent LX™ 293T cells growing on 12-well CellBIND® culture dishes (Corning, NY, USA) were transfected using a CalPhos Mammalian Transfection Kit (Takara Bio Europe) according to the manufacturer's instructions. For the BCL11B dimer localization assay, the co-transfected plasmids were combined at a 4:1 ratio of the NLS-proficient (mRFP-labeled) to the NLS-mutated (EGFP-labeled) variant to enhance the frequency of mutNLS-containing heterodimers.
To establish the GFP-complementation assay, GFP1-9 encoding cDNA fragment was transduced into the LX™ 293T cell line to ensure high and constitutive expression of the assay sensor. The GFP1-9-LX 293T cells were transfected with GFP10 and GFP11-encoding fusion cDNA at different GFP10:GFP11 plasmid ratios using the CalPhos Mammalian Transfection Kit (Takara Bio Europe) according to the manufacturer's instructions. After 24–48 h, transfected cells were detached from the culture vessel with StemPro Accutase cell dissociation reagent (Thermo Fisher Scientific, Waltham, MA, USA), and green fluorescence was monitored by flow cytometry (Navios, Beckman Coulter, Brea, CA, USA). Statistical analysis was conducted in GraphPad Prism 9.0.0 (GraphPad Software, San Diego, CA, USA). Details are described in the figure legend.
Immunofluorescence
For immunofluorescence analysis, transfected LX™ 293T cells were seeded on 4-well chamber Permanox® slides (Thermo Fisher Scientific) and allowed to attach for 24 h. Then, the cells were cautiously washed with warmed PBS followed by fixation in 4% PFA and subjected to colocalization staining using commercial vendor-validated antibodies (EGFP with LMNB1 or EGFP with mRFP).
After a blocking step (10% normal horse serum, 0.1% Triton X-100, PBS) the cells were incubated with first primary antibody rabbit anti-EGFP (50430-2-AP, 1:200 dilution; Proteintech Group, St. Leon-Rot, Germany) at 4°C overnight. On the next day, the first secondary antibody chicken anti-rabbit-IgG conjugated to Alexa Fluor™ 488 (A-21441, Thermo Fisher Scientific, Waltham, MA, USA) was applied. For the second sets of antibodies, the cells were blocked (10% normal goat serum and 0.1% Triton X-100 in PBS) and incubated with either mouse anti-LMNB1 (clone CL3929, 1:250 dilution; Sigma-Aldrich, Schnelldorf, Germany) or mouse anti-RFP (clone GT1610, 1:1000 dilution; Sigma-Aldrich) primary antibody at 4°C overnight. Next, incubation with second secondary antibody F(ab′)2-goat anti-mouse-IgG conjugated to Alexa Fluor™ 594 (A48288, Thermo Fisher Scientific) was performed. Then the slides were counterstained with DAPI (Bio-Rad Laboratories, Hercules, CA, USA), mounted and subjected to confocal imaging (Carl Zeiss LSM 700, Jena, Germany).
Computational methods
Structure preparation and loop modeling
If not specified otherwise, all calculations were performed using Molecular Operating Environment, Version 2019.01. (2019) (MOE; Chemical Computing Group ULC; available online at https://www.chemcomp.com/Products.htm). The crystal structure of KPNA2 (PDB: 3FEY) was downloaded from RCSB database (https://www.rcsb.org). Residues binding to the minor (RR) and major (KRRK) site were mutated to their corresponding residues in BCL11B (minor, RK; major, KRIK). All other substrate residues were removed. Finally, AMBER14 (ff14SB) force field parameters (Maier et al., 2015) were assigned to the complex prior to protonation with Protonate 3D and restraint minimization with fixed atoms farther than 0.7 nm from substrate. Both peptides at the minor and major site were used as anchor groups to perform de novo loop modeling. A total of 10,000 loop conformations were generated and the top ten poses visually inspected. For the final model, all loop and environment sidechains were rebuilt and minimized.
Molecular dynamics simulations
For further optimization and verification of the final binding mode from loop modeling, molecular dynamics simulations were performed using a multi-stage simulation protocol with NAMD 2.13 (CUDA+MPI version) (Phillips et al., 2005). Explicit TIP3P water molecules, counter ions, partial charges and AMBER16 force field parameters were assigned by tLeap from AmberTools 17 (https://ambermd.org/; Case et al., 2021, Amber 2021, University of California, San Francisco). Hydrogen mass repartitioning (Hopkins et al., 2015) was applied by ParmED to enable a time step of 4 fs. Langevin piston barostat and Langevin thermostat were applied for pressure and temperature control, respectively. First, the system was minimized for 50,000 steps, followed by 1 ns NVT and 1 ns NPT at 310 K. Afterwards, snapshots were collected every 1 ps for a total simulation time of 100 ns during the production run. VMD 1.9.3 was used to analyze the simulation trajectory (Humphrey et al., 1996).
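The step and snapshot counts implied by this protocol follow directly from the 4 fs time step and the 1 ps output interval. The sketch below only performs this bookkeeping; the function names are hypothetical and it is not part of the NAMD configuration used.

```python
def steps_for(duration_ns: float, timestep_fs: float = 4.0) -> int:
    """Number of integration steps needed to cover duration_ns."""
    femtoseconds = duration_ns * 1_000_000  # 1 ns = 1,000,000 fs
    return round(femtoseconds / timestep_fs)


def frames_for(duration_ns: float, interval_ps: float = 1.0) -> int:
    """Number of snapshots written when saving every interval_ps."""
    picoseconds = duration_ns * 1_000  # 1 ns = 1,000 ps
    return round(picoseconds / interval_ps)


# Values taken from the protocol: 1 ns NVT, 1 ns NPT, 100 ns production.
equilibration_steps = steps_for(1.0)     # per equilibration stage
production_steps = steps_for(100.0)
production_frames = frames_for(100.0)
print(equilibration_steps, production_steps, production_frames)
```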
TIGER2hs replica exchange molecular dynamics
All molecular dynamics and force field parameters used are described above. The initial structure was rebuilt from the sequence (PSPGLNSAAKRIKV) as an unordered coil and capped with an acetyl group at the N-terminus and an N-methylamide group at the C-terminus. For the solvation shell estimation, a short molecular dynamics simulation was performed (50,000 steps minimization, 400 ps NVT, 400 ps NPT and 10 ns NPT production run). The radial distribution function of water oxygen atoms around the solute was calculated using a Tcl script, which is provided along with the TIGER2hs simulation code for NAMD (Geist et al., 2019). An average of 166 water molecules within 0.41 nm of the solute was selected, which includes the complete first two solvation shells. A truncated system for energy evaluation was generated with ParmED. An OpenMM (Eastman et al., 2017) script was used for energy evaluation using the GBSA continuum model as the implicit-solvent portion of the hybrid solvent approach. For simulation, 24 replicas were distributed over a temperature range from 280 K to 600 K. Sampling and quenching times of each run were set to 20 ps and 10 ps, respectively. Almost 20,000 runs were performed, resulting in nearly 400 ns of pure sampling time and 160,000 conformations. The secondary structure content was then calculated using CPPTraj (AmberTools 17) (Case et al., 2017).
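The replica layout and sampling totals quoted above can be checked with a few lines of arithmetic. The sketch below is an illustration under an explicit assumption: the spacing of the 24 temperatures is not specified in this section, so a geometric ladder, a common choice in replica exchange simulations, is used here purely as an example. The function names are hypothetical.

```python
def geometric_ladder(t_min: float, t_max: float, n_replicas: int) -> list:
    """Temperatures spaced geometrically from t_min to t_max (inclusive).

    Assumption: geometric spacing; the spacing actually used is not stated.
    """
    if n_replicas < 2:
        raise ValueError("at least two replicas are required")
    ratio = t_max / t_min
    return [t_min * ratio ** (i / (n_replicas - 1)) for i in range(n_replicas)]


def total_sampling_ns(n_runs: int, sampling_ps: float) -> float:
    """Pure sampling time in ns accumulated over n_runs runs."""
    return n_runs * sampling_ps / 1_000


temperatures = geometric_ladder(280.0, 600.0, 24)
print([round(t, 1) for t in temperatures])
print(total_sampling_ns(20_000, 20.0))  # upper bound for "almost 20,000 runs"
```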
Acknowledgements
We thank Kathrin Assmus for her excellent technical assistance.
Footnotes
Author contributions
Conceptualization: P.G., M. Delin, L.S.; Formal analysis: C.A.S.; Investigation: P.G., M. Delin, D.R., L.S., H.F.; Data curation: M. Delin, D.R., L.S., H.F., M. Depke; Writing - original draft: P.G., L.S., M. Depke, A.L., B.M., C.A.S.; Writing - review & editing: M. Depke; Visualization: D.R.; Supervision: P.G., A.L., B.M., C.A.S.; Funding acquisition: L.S.
Funding
The work was supported by The North-German Supercomputing Alliance (HLRN) (Project ID mvc00011), and the Cooperation Program Interreg VA Mecklenburg-Pomerania/Brandenburg/Poland under the European Territorial Cooperation Objective of the European Regional Development Fund (ERDF): ‘Consolidation of cross-border cooperation through the exchange of knowledge and experience in the field of modern methods of experimental and clinical haematology and oncology’.
Peer review history
The peer review history is available online at https://journals.biologists.com/jcs/article-lookup/doi/10.1242/jcs.258655.
References
Competing interests
The authors declare no competing or financial interests.