Krüppel-associated box domain zinc finger proteins (KRAB-ZFPs) are the largest family of transcriptional regulators in higher vertebrates. Characterized by an N-terminal KRAB domain and a C-terminal array of DNA-binding zinc fingers, they participate, together with their co-factor KAP1 (also known as TRIM28), in repression of sequences derived from transposable elements (TEs). Until recently, KRAB-ZFP/KAP1-mediated repression of TEs was thought to lead to irreversible silencing, and the evolutionary selection of KRAB-ZFPs was considered to be just the host component of an arms race against TEs. However, recent advances indicate that KRAB-ZFPs and their TE targets also partner up to establish species-specific regulatory networks. Here, we provide an overview of the KRAB-ZFP gene family, highlighting how its evolutionary history is linked to that of TEs, and how KRAB-ZFPs influence multiple aspects of development and physiology.
Introduction
Biological events are regulated by complex transcriptional networks, with combinations of transcription factors interacting with cis-acting genomic sequences. Almost 70 years ago, Barbara McClintock proposed that some of these regulatory DNA sequences lay in mobile genetic elements (McClintock, 1950), and 20 years later Roy Britten and Eric Davidson outlined that the repetitive nature of these elements might explain how multiple changes in gene activity can so remarkably result from a single initiatory event (Britten and Davidson, 1969). Nevertheless, such transposable elements (TEs) kept being considered mostly as genetic threats in need of the strictest silencing, and were otherwise dismissed as purely selfish or junk DNA (Doolittle and Sapienza, 1980). However, the sequencing of the human genome at the dawn of the century changed this view, and it is increasingly recognized that some TEs are crucial components of transcriptional regulatory networks that play essential roles not only in the evolution but also the biology of most organisms (Garcia-Perez et al., 2016; Chuong et al., 2017; Thompson et al., 2016). Moreover, recent work indicates that a particular family of transcriptional regulators – Krüppel-associated box domain zinc finger proteins (KRAB-ZFPs) – controls TEs in higher vertebrates and, as such, exerts key influences on the biology of these organisms, including humans.
KRAB-ZFP genes first emerged more than 400 million years ago, and are now encoded in the hundreds by all modern tetrapods examined to date, with the notable exception of birds, in which they generally do not exceed ten (Emerson and Thomas, 2009; Liu et al., 2014; Imbeault et al., 2017; Kauzlaric et al., 2017). KRAB-ZFPs are characterized by an N-terminal Krüppel-associated box (KRAB) domain and a C-terminal array of C2H2 zinc fingers (ZNFs) (Urrutia, 2003). Despite their abundance, the functions of KRAB-ZFPs have long remained ill-defined, although accumulated data have implicated some of them in processes as diverse as imprinting, cell differentiation, metabolic control and sexual dimorphism (reviewed by Lupo et al., 2013). This picture changed when the KRAB-binding co-factor KAP1 was demonstrated to be essential for the early embryonic repression of TEs in both mouse and human, and when a few individual KRAB-ZFPs could be linked to this function as well (Wolf and Goff, 2007, 2009; Wolf et al., 2015b; Rowe et al., 2010, 2013; Jacobs et al., 2014). It was then suspected that the primary role of KRAB-ZFPs was to silence TEs, and that their evolutionary selection represented the host component of an arms race against these genetic invaders (Jacobs et al., 2014; Castro-Diaz et al., 2014; Thomas and Schneider, 2011). More recent data, however, suggest that KRAB-ZFPs fulfill a role that is far more elaborate and, in some cases at least, can contribute to the domestication of their TE targets for the benefit of the host (Ecco et al., 2016; Imbeault et al., 2017).
In this Primer, we sum up our current understanding of the KRAB-ZFP family. We first provide an introduction to TEs and how they function. We also outline the structure, targets and general functions of KRAB-ZFPs. We then focus on the biological impact of the KRAB-ZFP gene family, highlighting its evolution, its role in controlling TEs, and how the selection of both TEs and KRAB-ZFPs might represent a dynamic partnership that generates the species-specific transcriptional networks that influence most aspects of human biology.
Transposable elements and their impact on the genome
TEs can be classified according to their transposition mechanism, overall genetic structure and phylogenetics. Most TEs present in the human genome are retroelements, whether endogenous retroviruses [e.g. human endogenous retrovirus (HERV) or long terminal repeat (LTR) retrotransposons] or non-LTR-retrotransposons of the long interspersed nuclear element (LINE), short interspersed nuclear element (SINE) and SINE-VNTR-Alu (SVA) subgroups. All retroelements spread via a copy-and-paste mechanism leading to their amplification. Given the functional and phylogenetic relationships between transposons and viruses, the sum of TEs present in the genome of an organism can be referred to as its ‘endovirome’, although it should be noted that not all TEs are stricto sensu derived from viruses. Some 4.5 million sequences derived from TEs can be readily identified in the human genome, accounting for about 50% of its DNA content. However, because TEs become unrecognizable over time owing to mutational drift, it is likely that this represents an underestimate of their contribution to our genetic make-up (de Koning et al., 2011; Hubley et al., 2016). Notably, as carriers of transcription factor-binding sites, TEs can impact the host genome in many ways (see Box 1). TEs thus fuel genetic diversity, but they can also induce deleterious mutations responsible for disease. Fewer than one out of 10,000 human TEs is still capable of transposition (Hancks and Kazazian, 2016), but a far greater proportion can alter gene expression.
Owing to their mobile nature and genetic constitution, TEs can perturb their genomic environment. They often bear promoters, enhancers, suppressors, insulators, splice sites or transcriptional stop signals. Accordingly, they can disrupt genes (via alternative splicing, truncation or insertion of new exons) or modify their expression (via promoter, enhancer or repressor effects). Owing to their highly repetitive nature, TEs also underlie recombination events that can lead to deletions, duplications, rearrangements or translocations. Finally, they can alter genome architecture via insulator sequences or by nucleating short- and long-range chromatin interactions, or they can provide entirely novel open reading frames (reviewed by Friedli and Trono, 2015; Rebollo et al., 2012b; Warren et al., 2015).
Pathologies associated with new TE insertions or other types of deregulation include cancers, hemophilia, muscular dystrophy and other congenital or acquired human diseases (reviewed by Ayarpadikannan et al., 2015; Hancks and Kazazian, 2012, 2016; Mager and Stoye, 2015). Most TE-associated human disorders are related to non-LTR retrotransposons. For example, a known cause of breast cancer is the insertion of a primate-specific Alu SINE into the BRCA1 and BRCA2 genes (Miki et al., 1996; Puget et al., 1999). Cases of hemophilia A and B are also associated with insertional mutations of LINE-1 or Alu elements into genes that encode coagulation factors (Kazazian et al., 1988; Li et al., 2001). LTR retrotransposons have also been associated with some diseases, especially cancer. For instance, endogenous retrovirus (ERV) transcripts are upregulated in some tumors and there are reports of LTRs driving oncogene expression in human lymphomas (Lamprecht et al., 2010; Romanish et al., 2010; Babaian and Mager, 2016; Babaian et al., 2016). In mice, many LTR elements are transposition proficient, and ERVs related to mouse mammary tumor virus (MMTV) and mouse leukemia virus (MLV) can cause cancer via activation of proto-oncogenes (Rosenberg and Jolicoeur, 1997). Finally, the expression of ERV proteins can be detrimental to the host and might be associated with autoimmune diseases such as systemic lupus erythematosus in mice and multiple sclerosis in humans (Baudino et al., 2010; Antony et al., 2011).
However, it is on an evolutionary scale that the impact of TEs is best appreciated. TEs endow the genomes of their host species with binding sites for transcription factors, which can then contribute to species-restricted phenotypes (reviewed by Thompson et al., 2016). For instance, mammals generally produce amylase in the pancreas, yet primates can release this enzyme in saliva too, owing to the insertion upstream of the amylase coding sequence of a HERV-E LTR driving expression in the salivary glands (Samuelson et al., 1996; Ting et al., 1992). Many other cases of LTR promoter exaptation have been documented, generally resulting in new or altered tissue-specific gene expression (Cohen et al., 2009; Stavenhagen and Robins, 1988; Rebollo et al., 2012a). Examples of TE-based species-specific enhancers also exist, and in mammals include MER130 elements acting as neocortex-specific units (Notwell et al., 2015), a SINE integrant functioning as a distal enhancer of Fgf8 in the diencephalon (Nakanishi et al., 2012), RLTR13D5 ERVs co-opted as placenta-specific enhancers (Chuong et al., 2013), and the MER41-mediated dispersion of interferon-responsive elements in primates (Chuong et al., 2016). Retroelements can also contribute to embryonic stem cell (ESC) regulatory networks; many binding sites for pluripotency factors [such as Oct4 (Pou5f1) and Nanog] reside within primate- or human-specific ERVs in the human genome (Bourque et al., 2008). In addition, LTR elements are implicated in the regulation of specific genes in early embryogenesis (Bourque et al., 2008; Fort et al., 2014; Macfarlan et al., 2012; Peaston et al., 2004; Wang et al., 2014; Goke et al., 2015; Kunarso et al., 2010). TEs are also frequently bound by p53 (Trp53), with more than one-third of the genomic targets of this tumor suppressor overlapping with primate-specific ERVs (Wang et al., 2007), hence at locations not found for instance in the mouse genome. 
Finally, ERV-derived proteins can themselves be sources of genetic diversity, as illustrated in placental mammals in which formation of the syncytiotrophoblast, a placenta layer with extensive cellular fusion, is mediated by ERV envelope-derived syncytins (Mi et al., 2000; Dupressoir et al., 2009, 2011). Interestingly, across mammals, these proteins derive from the env gene of distinct groups of ERVs, indicating convergent evolution with multiple and independent events of ERV co-option (Lavialle et al., 2013). Together, these findings highlight the huge impact that TEs can have on the evolution and biology of complex organisms.
The domain structure of KRAB zinc finger proteins
KRAB-ZFPs are characterized by the presence of a KRAB domain and an array of C2H2 zinc fingers (Fig. 1). The KRAB domain encompasses approximately 75 amino acids and is often split into two modules: the A-box, which is primarily responsible for repressive activity, and the B-box, which is thought to potentiate KRAB-A effectiveness (Bellefroid et al., 1991; Mannini et al., 2006; Witzgall et al., 1994). The repressor activity of KRAB-ZFPs stems from the KRAB domain-mediated recruitment of KAP1 [KRAB-associated protein 1; also known as TRIM28 (tripartite motif protein 28), Tif1β or KRIP-1] (Friedman et al., 1996), a scaffold protein that recruits mediators of heterochromatin formation (Iyengar and Farnham, 2011). The C-terminal C2H2 ZNF arrays of KRAB-ZFPs are tandem repeats of the C-X(2-4)-C-X(12)-H-X(2-6)-H motif (where X is any amino acid) separated by seven-residue-long linkers (Iuchi, 2001). Human KRAB-ZFPs can harbor anywhere between two and more than 40 ZNFs, with the average number being 12 (Urrutia, 2003). Each zinc finger can theoretically interact with three nucleotides of the primary DNA strand (via amino acids at positions −1, 3 and 6 of the C2H2 helix), with some contacts being established with the secondary strand (via amino acid 2) (Emerson and Thomas, 2009; Elrod-Erickson et al., 1998). KRAB-ZFP genes display signs of strong positive selection at the positions encoding the DNA-contacting amino acid residues, consistent with the idea that their products interact with DNA targets that themselves are capable of rapid evolution, such as TEs or viruses (Emerson and Thomas, 2009). Furthermore, although the length of KRAB-ZFPs (given the many ZNFs present in each array) should allow for a very high degree of specificity in the recognition of long DNA targets, it has been noted that KRAB-ZFP binding motifs are usually shorter than predicted.
This suggests that different ZNFs in a KRAB-ZFP recognize different DNA motifs, as is the case for the C2H2 ZNF protein CTCF (Nakahashi et al., 2013), or that ZNFs not involved in contacting DNA could engage in other types of interactions, for instance with RNA or proteins (Najafabadi et al., 2015; Imbeault et al., 2017; Schmitges et al., 2016).
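The motif and the three-nucleotides-per-finger rule described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited studies: the regular expression is simply a literal transcription of the motif given in the text, the toy sequence is hypothetical (two fingers joined by a seven-residue linker, not a real KRAB-ZFP), and the function names are illustrative assumptions.

```python
import re

# C2H2 motif as given in the text: C-X(2-4)-C-X(12)-H-X(2-6)-H, X = any residue.
C2H2_MOTIF = re.compile(r"C.{2,4}C.{12}H.{2,6}H")

# Hypothetical toy sequence (not a real KRAB-ZFP): two fingers joined by a
# seven-residue linker.
TOY_ARRAY = "CPVESCDRRFSRSDELTRHIRIH" + "TGQKPFQ" + "CRICMRNFSRSDHLTTHIRTH"


def find_zinc_fingers(protein):
    """Return (start, matched_sequence) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in C2H2_MOTIF.finditer(protein)]


def max_footprint_bp(n_fingers, nt_per_finger=3):
    """Upper bound on footprint length if every finger contacted 3 nucleotides."""
    return n_fingers * nt_per_finger


fingers = find_zinc_fingers(TOY_ARRAY)
print(len(fingers), "fingers found at positions", [start for start, _ in fingers])
print(max_footprint_bp(12), "bp theoretical footprint for a 12-finger array")
```

Under these assumptions, an average array of 12 ZNFs would have a theoretical footprint of 36 bp; this is the upper bound against which observed binding motifs appear shorter than predicted.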
Some highly conserved KRAB-ZFPs contain additional elements in their N terminus, such as SCAN or DUF3669 domains. The vertebrate-specific SCAN domain can mediate oligomerization notably with other SCAN-containing proteins (Honer et al., 2001), whereas the function of the DUF3669 domain remains largely unknown, as indicated by its acronym (domain of unknown function).
Genomic targets of human KRAB zinc finger proteins
The genomic targets of a large fraction of human KRAB-ZFPs have been characterized in recent studies using chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) and tagged proteins overexpressed in 293T cells as bait (Najafabadi et al., 2015; Imbeault et al., 2017; Schmitges et al., 2016). This type of analysis does not allow one to conclude which genomic locus or loci are bound by a KRAB-ZFP in the physiological setting of a particular cell, as its recruitment stands to be differentially affected by the presence of other DNA-binding proteins, by the state of the chromatin and perhaps by levels of DNA methylation at potential target loci. However, these studies have revealed the type of genetic elements targeted by these KRAB-ZFPs, and in most cases have delineated a consensus binding sequence. These studies have also determined that a great majority of human KRAB-ZFPs associates with at least one subfamily of TEs, most of them retrotransposons. Some KRAB-ZFPs can bind to sequences in different TE families (e.g. HERVs and LINEs). Conversely, many TE subfamilies are recognized by several KRAB-ZFPs, which most often target clearly distinct regions of their integrants, as has been observed for ERVKs (endogenous retrovirus K) in mouse and for HERVs and LINE-1s (L1s) in humans (Imbeault et al., 2017; Ecco et al., 2016). Interestingly, in humans, it was observed that the age of the elements influences their pattern of KRAB-ZFP recruitment. For instance, the primate-specific LINE-1 (L1PA) L1PA4s, which are approximately 20 million years old (myo), are recognized by many KRAB-ZFPs. In contrast, most human-specific LINE-1s (L1Hs) are devoid of binding sites for factors recruited near the L1 promoter, such as ZNF93 (∼20 myo), ZNF649 (∼105 myo), ZNF765 (∼7 myo) and ZNF141 (∼43 myo). 
However, 3′ binders such as ZNF382, ZNF84 (both ∼105 myo) and ZNF429 (∼29 myo) bind to a significant fraction of all L1PA integrants, from the ∼40 myo L1PA16 to the youngest L1Hs (Imbeault et al., 2017). Thus, binding of KRAB-ZFPs to TEs is both combinatorial and evolutive.
It should be noted that about a third of tested human KRAB-ZFPs do not associate significantly with TEs and are instead found at other types of genomic targets such as promoters, simple repeats and poly-zinc finger protein genes. Many promoter-binding KRAB-ZFPs are ancient and evolutionarily conserved, and contain SCAN or DUF3669 domains; most do not recruit KAP1 and, as yet, are of unknown function (Imbeault et al., 2017; Schmitges et al., 2016). Some associate with wide arrays of promoters, for instance ZNF202, which binds in the vicinity of several thousand transcriptional start sites (TSSs). Others bind to promoters in a combinatorial fashion, such as the DUF3669-containing ZNF282 and ZNF398. Other KRAB-ZFPs, such as ZNF274 and ZNF75D, associate with the 3′ region of poly-zinc finger protein genes, where they recognize conserved and partly overlapping motifs within the proximal part of ZNF-encoding sequences (Imbeault et al., 2017; Frietze et al., 2010).
The biological functions of KRAB zinc finger proteins
KRAB-ZFPs influence a variety of biological events. Many of their roles involve KAP1, which binds to a sizeable fraction of human and murine KRAB-ZFPs (Schmitges et al., 2016). KAP1 acts as a scaffold for a silencing complex that comprises the histone methyltransferase SETDB1 (also known as ESET) (Schultz et al., 2002), the nucleosome remodeling and deacetylation (NuRD) complex (Schultz et al., 2001), heterochromatin protein 1 (HP1) (Nielsen et al., 1999; Sripathy et al., 2006) and DNA methyltransferases (Quenneville et al., 2012) (Fig. 2). Accordingly, many KRAB-ZFPs act as transcriptional repressors via the KAP1-nucleated induction of heterochromatin and, in early embryonic cells, via DNA methylation (Wolf and Goff, 2009; Quenneville et al., 2012; Rowe et al., 2013; Jacobs et al., 2014; Najafabadi et al., 2015; Ecco et al., 2016; Schmitges et al., 2016; Imbeault et al., 2017). However, not all KRAB-ZFPs bind KAP1, and the interactome of more ancient human family members, notably those endowed with SCAN or DUF3669 domains, reveals associations with other types of proteins, including transcriptional activators (Schmitges et al., 2016). Below, we provide an overview of the key biological functions that have been identified for KRAB-ZFPs during development.
Heterochromatin induction in early development and TE control
The best-characterized function of KRAB-ZFPs is the locus-specific induction of heterochromatin during early embryogenesis via the KRAB-mediated recruitment of KAP1, as first suggested by the discovery that KRAB could trigger promoter methylation if tethered to DNA during the first few days of mouse development (Wiznerowicz et al., 2007). At imprinting control regions, where a methylated hexanucleotide is recognized in mouse and human by ZFP57, this results in the trans-generational preservation of imprinting (Quenneville et al., 2011; Li et al., 2008; Strogantsev et al., 2015). At sequences derived from TEs, this allows for the taming of transcriptional influences that would otherwise hamper early development, from zygotic genome activation to the establishment and normal differentiation of pluripotent stem cells (Rowe et al., 2010; Matsui et al., 2010; Rowe et al., 2013; Turelli et al., 2014; Macfarlan et al., 2012). KRAB-ZFPs display exquisitely regulated patterns of expression during the first few days of embryogenesis, both in humans and mice, mirroring the tightly orchestrated transcription of TE-containing loci during this period (Corsinotti et al., 2013; Theunissen et al., 2016; Macfarlan et al., 2012; Fort et al., 2014; Gifford et al., 2013; Goke et al., 2015; Grow et al., 2015; Kunarso et al., 2010; Xue et al., 2013; Yan et al., 2013). The removal of KAP1 or its partner histone methyltransferase SETDB1 in murine or human ESCs activates the expression of multiple TEs (Matsui et al., 2010; Rowe et al., 2010; Turelli et al., 2014).
A number of KRAB-ZFPs have been implicated in controlling TE repression in ESCs (Wolf et al., 2015a). ZFP809, a murine-specific KRAB-ZFP, was demonstrated early on to silence exogenous MLV in embryonic carcinoma cells through recognition of the provirus primer binding site-coding sequence (Wolf and Goff, 2007, 2009). Curiously, depletion of ZFP809 in mice leads to de-repression of MLV-related ERVs in adult tissues but not in ESCs (Wolf et al., 2015b). Although functional data on the role of individual human KRAB-ZFPs during this period are still missing, it is noteworthy that HERVH (human endogenous retrovirus H) integrants, which appear to play an important role in human ESC pluripotency, are recognized by several KRAB-ZFPs, the levels of which change as these cells switch from a naïve to a primed pluripotent state (Theunissen et al., 2016). Other KRAB-ZFPs controlling TEs in ESCs are ZNF91 and ZNF93, which respectively repress SVAs and LINE-1 (Jacobs et al., 2014), and the murine paralogs ZFP932 and Gm15446, which regulate ERVKs (Ecco et al., 2016). It is now established that, by controlling TEs, the KRAB-ZFP/KAP1 complex ensures the transcriptional homeostasis and normal differentiation of ESCs. Upon KAP1 or KRAB-ZFP depletion in ESCs, repressive chromatin marks at TEs are replaced by active histone modifications typically found on enhancers, and nearby genes can become activated (Rowe et al., 2013; Jacobs et al., 2014; Turelli et al., 2014; Ecco et al., 2016).
Until recently, it was generally believed that most TEs are irreversibly silenced during these early stages of embryonic development, alleviating the need for subsequent sequence-specific control, including by the KRAB-ZFP/KAP1 system (Maksakova et al., 2008; Walsh et al., 1998). However, recent evidence suggests otherwise. First, deep transcriptome analyses indicate that some TE loci can be transcriptionally active in adult tissues, providing alternative promoters or fulfilling other regulatory functions (Faulkner et al., 2009; Belancio et al., 2010). Second, in mature T lymphocytes, a significant fraction of TEs bound by KAP1 in human ESCs still carries the co-repressor (Turelli et al., 2014). Third, KAP1 deletion in neuronal progenitors activates some endogenous retroelements (Fasching et al., 2015), and selected ERVs are similarly induced in murine B lymphocytes or mouse embryonic fibroblasts (MEFs) depleted for SETDB1 (Collins et al., 2015; Wolf et al., 2015b). Correspondingly, human KRAB-ZFPs display extensive and cell-specific patterns of expression in all adult tissues examined (Imbeault et al., 2017; Liu et al., 2014). Furthermore, the mouse-specific KRAB-ZFPs ZFP932 and Gm15446 are also involved in controlling their TE targets in somatic tissues, where they modulate the TE-mediated regulation of neighboring genes in vivo (Ecco et al., 2016). More broadly, by comparing KRAB-ZFP-binding sites with the ENCODE database, a significant overlap between the TE targets of a number of human KRAB-ZFPs and the binding regions of other transcription factors such as YY1, CEBPZ, GATA3, FOXA1 and STAT1 was observed (Imbeault et al., 2017). Finally, by examining the chromatin state of KRAB-ZFP-bound TEs in a subset of these tissues, it was noted that a significant fraction display cell-specific enrichment of activation marks instead of those associated with repressive heterochromatin. 
Moreover, in these cases, nearby genes were on average expressed at higher levels, consistent with KRAB-ZFP-controlled, TE-based enhancer effects on these genes (Imbeault et al., 2017). Considering the limited scope of this type of analysis, which can detect neither long-range effects nor trans-acting influences by TE-derived regulatory RNAs, and the fact that chromatin data were available only for a few cell types, it is likely that the KRAB-ZFP-mediated control of TEs in fact impacts the physiology of a range of developing and adult tissues.
KRAB-ZFPs in cell differentiation
As discussed above, many KRAB-ZFPs are expressed in ESCs and early progenitors (Corsinotti et al., 2013), where they engage together with KAP1 in repressing TEs. However, a number of KRAB-ZFPs can influence other aspects of development, although no evidence for interaction with TEs has been demonstrated so far in these cases. In the mouse, for example, ZFP689, ZFP13 and KAP1 play an important role in erythropoiesis by regulating an miRNA cascade that governs mitophagy in red cell precursors (Barde et al., 2013). In humans, there is evidence that ZNF589, ZNF268 and ZNF300 influence hematopoietic differentiation (Venturini et al., 2016; Zeng et al., 2012; Xu et al., 2010). KAP1 is also important for B- and T-cell development and homeostasis (Santoni de Sio, 2014). Studies in human and mouse showed that KAP1 depletion in these cells leads to differentiation and metabolic defects (Santoni de Sio et al., 2012a,b; Chikuma et al., 2012). The KRAB-ZFPs responsible for these phenotypes, however, have not yet been identified, but many are specifically expressed in these tissues (Liu et al., 2014; Imbeault et al., 2017). Other events influenced by KRAB-ZFPs include osteogenesis (Jheon et al., 2001), mammary gland development (Oliver et al., 2012) and the formation of extra-embryonic tissues (Shibata and Garcia-Garcia, 2011; Shibata et al., 2011).
KRAB-ZFPs and metabolism
A number of KRAB-ZFPs have been implicated in cellular and organismal metabolic pathways. For example, ZFP69 was reported to mediate liver fat accumulation and mild insulin resistance in mice (Chung et al., 2015). In human cells, ZNF224 is associated with glycolysis and oxidative metabolism (Iacobazzi et al., 2009; Lupo et al., 2011). The mechanisms of action of these KRAB-ZFPs are not all defined, but they most likely act via KAP1 as ZNF224, for instance, was shown to interact with the co-repressor (Medugno et al., 2005). Furthermore, KAP1 plays important roles in the liver: liver-specific KAP1 knockout leads to male-restricted hepatic carcinogenesis and perturbs the metabolism of hormones and antibiotics in the liver (Bojkowska et al., 2012). In mice, the KRAB-ZFPs RSL1 and RSL2 are involved in sexually dimorphic gene expression, also in the liver, repressing male-specific hepatic genes such as members of the cytochrome P450 (Cyp) families, which are important for the metabolism of xenobiotics (Krebs et al., 2003). These dimorphic cytochrome P450 genes are also upregulated in KAP1 knockout livers (Bojkowska et al., 2012), suggesting that RSL1 and RSL2 act via KAP1 in this context. Interestingly, it has been reported that the control of one RSL1 target, the gene encoding the sex-limited protein (SLP; also known as C4A), seems to occur via binding to an ancient endogenous retrovirus (Stavenhagen and Robins, 1988; Krebs et al., 2012).
Other examples of KRAB-ZFPs implicated in metabolism include ZNF255, an isoform of ZNF224, which interacts with a Wilms' tumor 1 (WT1) protein isoform that has affinity for RNA and has been implicated in transcript processing, suggesting a role for this KRAB-ZFP in RNA maturation and post-transcriptional control (Florio et al., 2010). Similarly, ZNF74 binds RNA and is tightly associated with the nuclear matrix, suggesting a role for this protein in RNA metabolism (Grondin et al., 1996).
The evolutionary path of KRAB-ZFPs
A survey of more than 200 vertebrate genomes reveals that KRAB-ZFP genes first appeared some 420 million years ago in a common ancestor of coelacanths, lungfish and tetrapods (Imbeault et al., 2017). The genomes of all analyzed modern species derived from this ancestor, except for birds, contain several hundreds of KRAB-ZFP genes. Interestingly, all 300 or so KRAB-ZFP genes found in coelacanths seem to be mono-exonic, whereas in all other species the KRAB and zinc finger domains are most often encoded by separate exons. This suggests that ancestral KRAB-ZFPs were mono-exonic, and that switching to a multi-exonic configuration perhaps facilitated the reshuffling of zinc finger arrays and the independent evolution of the KRAB domain, paving the way to its coupling in some proteins to SCAN or DUF3669 domains, and to the emergence of non-canonical KRAB units not functionally linked to KAP1 recruitment (Schmitges et al., 2016; Itokawa et al., 2009; Murphy et al., 2016).
To trace putative DNA-binding orthologs, the ZNF fingerprints of KRAB-ZFPs (i.e. the series of amino acid triplets within their ZNF arrays predicted to dictate their DNA-binding specificity) have been compared (Liu et al., 2014; Imbeault et al., 2017). These analyses delineated clusters that are specific for most taxonomic orders and also allowed for the identification of KRAB-ZFPs restricted to each species (Imbeault et al., 2017). Interestingly, no ZNF fingerprint ortholog of coelacanth KRAB-ZFPs is found in any other species, suggesting that the genomic targets of these proteins are unique to this organism and possibly untested close relatives, consistent with the existence of species-restricted TEs. Many species- and class-specific KRAB-ZFPs can similarly be detected in most analyzed genomes, indicating ongoing amplification and turnover of the family with regular addition of new members (Imbeault et al., 2017; Huntley et al., 2006; Thomas and Schneider, 2011). A recent examination of the mouse genome identified about twice as many KRAB-ZFP genes as had been previously annotated as either KRAB-ZFP genes or pseudogenes, notably by assigning an entity formerly considered as a large group of satellite repeats to this family (Kauzlaric et al., 2017). It also highlighted the cluster-based organization of these genes and their distribution throughout the genome, with signs of recombination, translocation, duplication and seeding of new sites by retrotransposition of KRAB-ZFP genes. Finally, it provided evidence that closely related paralogs have evolved through both the genetic drifting and shifting of sequences encoding for zinc finger arrays; that is, with adjacent KRAB-ZFPs differing by either point mutations at DNA-contacting residues of a few ZNFs or substitutions of entire blocks of these motifs (Kauzlaric et al., 2017).
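The notion of a ZNF fingerprint can be sketched in a few lines of Python. This is an illustrative assumption rather than the procedure used in the cited studies: it supposes the common numbering convention in which the first zinc-coordinating histidine sits at helix position +7, so that positions −1, 3 and 6 lie seven, four and one residues before it. The toy sequences, function names and position-wise identity score are all hypothetical.

```python
import re

# Capture the 12 residues between the second Cys and the first His of each finger.
FINGER = re.compile(r"C.{2,4}C(.{12})H.{2,6}H")

# Hypothetical toy arrays (not real KRAB-ZFPs). The "paralog" differs from the
# "ancestor" by a single substitution at a DNA-contacting residue of finger 2.
ANCESTOR = "CPVESCDRRFSRSDELTRHIRIH" + "TGQKPFQ" + "CRICMRNFSRSDHLTTHIRTH"
PARALOG = "CPVESCDRRFSRSDELTRHIRIH" + "TGQKPFQ" + "CRICMRNFSRSDNLTTHIRTH"


def fingerprint(protein):
    """Return one triplet (helix positions -1, 3, 6) per zinc finger."""
    triplets = []
    for match in FINGER.finditer(protein):
        x12 = match.group(1)
        # Assumed numbering: first His = +7 and there is no position 0,
        # so -1, 3 and 6 sit 7, 4 and 1 residues before that His.
        triplets.append(x12[-7] + x12[-4] + x12[-1])
    return triplets


def fingerprint_identity(a, b):
    """Fraction of fingers with identical triplets (equal-length arrays only)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


fp_ancestor = fingerprint(ANCESTOR)
fp_paralog = fingerprint(PARALOG)
print(fp_ancestor, fp_paralog, fingerprint_identity(fp_ancestor, fp_paralog))
```

In this toy case the two arrays share one of two triplets, mimicking a paralog that has drifted by a point mutation at a DNA-contacting residue; replacement of an entire block of fingers would instead change several consecutive triplets at once. A realistic comparison would also need to handle arrays of unequal length, which this sketch deliberately does not attempt.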
It has been noted that the invasion by new families of endogenous retroviruses coincided with the appearance of novel KRAB-ZFP duplicates in primates (Thomas and Schneider, 2011). The guinea pig, opossum and, to a lesser extent, mouse genomes display an unusually high number of species-specific paralogs. The mouse genome is known to harbor a significant fraction of retrotransposition-competent TEs, including ERVs and LINEs (Kazazian, 2004; DeBerardinis et al., 1998), supporting a model whereby new TE variants contribute to fix recently emerged KRAB-ZFP paralogs. Conversely, a few KRAB-ZFPs that are highly conserved in other mammals have been lost in primates, whereas some human KRAB-ZFP pseudogenes have functional orthologs in closely related species, indicating divergence in the selective pressures responsible for their maintenance (Imbeault et al., 2017). Emergence of a paralog acting as a functional substitute, rather than extinction of the corresponding TE targets, probably accounts for most of these occurrences.
It is remarkable that all examined bird genomes stand out for their very low content of KRAB-ZFP genes, no more than ten in most of them. Interestingly, avian genomes are significantly smaller than those of other amniotes (Organ et al., 2007; Wallis et al., 2004), and a much smaller fraction of the chicken and zebra finch genomes can be readily attributed to TEs, compared with most other tetrapods (15% versus 40-50% on average) (Chalopin et al., 2015). This suggests that TE burden and activity contribute to the maintenance of a pool of functional KRAB-ZFPs. Alternatively, it is tempting to hypothesize that birds, when they emerged from theropod dinosaurs, evolved another TE control system that exhibits many of the same functional properties as KRAB-ZFPs, rendering the latter dispensable.
TE/KRAB-ZFP co-evolution: both an arms race and domestication
A wealth of data indicates that KRAB-ZFPs and TEs have co-evolved. This has led to the proposal of the ‘arms race’ model, which states that competition between KRAB-ZFPs and TEs, with KRAB-ZFPs continuously trying to suppress invasion by rapidly mutating TEs, drives their selection. However, more recent data suggest that this model is too simplistic, and additionally point towards a ‘domestication’ model in which at least some KRAB-ZFPs help the host co-opt TEs for its benefit.
The arms race model
Several lines of evidence indicate that TEs have served as an important motor for the selection of KRAB-ZFP genes. During evolution, KRAB-ZFP genes underwent strong positive selection at positions encoding amino acids predicted to determine the DNA-binding specificity of their products (Emerson and Thomas, 2009; Liu et al., 2014). Furthermore, KRAB-ZFP paralogs exhibit not only significant differences in ZNF fingerprints, but also differential expression and splicing patterns across tissues, consistent with the acquisition of new functions following gene duplication events (Nowick et al., 2010; Kauzlaric et al., 2017). An analysis of data from the 1000 Genomes Project revealed that human KRAB-ZFP genes harboring non-synonymous single nucleotide polymorphisms in sequences encoding their predicted DNA-contacting residues are generally expressed at lower levels, are evolutionarily younger, and seem to be less evolutionarily constrained than those without such polymorphisms, suggesting that they are on their way to becoming pseudogenes (Kapopoulou et al., 2016).
Most importantly, both KRAB-ZFPs and TEs underwent parallel waves of expansion in the genomes of tetrapods (Thomas and Schneider, 2011). Moreover, in human ESCs, a dynamic regulation model of LINE elements by KRAB-ZFP/KAP1 can be documented, whereby the expression of newly emerged LINE-1 families is initially repressed by small-RNA-induced DNA methylation, before KAP1-mediated repression takes over through the selection of KRAB-ZFPs sequentially capable of recognizing these TEs, until these are ultimately deprived of any activity by mutations (Castro-Diaz et al., 2014).
Together, these findings have led to the ‘arms race’ model (Fig. 3A), which asserts that dynamic competition between TEs and KRAB-ZFPs drives their co-evolution, with TEs that are controlled by a KRAB-ZFP mutating away to escape repression while the pool of KRAB-ZFP genes evolves proteins with novel zinc finger arrays, which get fixed once they can recognize the renegade TE (Imbeault and Trono, 2014). This model is best exemplified by the primate-specific L1PA subfamily of LINE elements, as these TEs, devoid of an extracellular phase, display a linear evolutionary path, each new subfamily deriving from the one previously expanded in the genome of its host species and ancestors. Indeed, compelling evidence for the arms race model stems from the characterization of ZNF93 and its binding to L1PA elements – in particular the loss (via deletion) of the ZNF93 recognition site in newer L1PA subfamilies (Jacobs et al., 2014). Additional support comes from the recent identification (Imbeault et al., 2017) of TE targets in a large set of human KRAB-ZFPs, which reveals the sequential recruitment at the 5′ ends of primate-specific L1 elements of not only ZNF93 but also ZNF141, ZNF649 and ZNF765, with zinc finger mutations accumulating coincidentally with the appearance of new L1PA subfamilies, and loss of binding sites for all of these KRAB-ZFPs in the newest human-specific LINE-1. This study could also retrace specific mutation events in the binding motifs of KRAB-ZFPs that correlated with loss of binding in the youngest elements, generally subtler than the 129-bp deletion event that led to escape from ZNF93 (Imbeault et al., 2017).
With ERVs, the situation is more complicated, as these TEs are endogenized following waves of genomic invasion originating from external sources, with potential iterations precluding firm dating. Nonetheless, in mice, the KRAB-ZFP paralogs ZFP932 and Gm15446 regulate overlapping but distinct sets of ERVKs with both proteins binding to the 3′ end of members from the same families of retroelements, but with different preferences (Ecco et al., 2016). For instance, whereas ZFP932 and Gm15446 are similarly enriched at RLTR44-int, IAP-d-int and MMERVK10D3_I-int elements, Gm15446 is more frequently found at MMERVK10C-int, IAPEy-int and IAPEY3-int. Further analyses suggest that ZFP932 appeared first and that Gm15446 arose secondarily by duplication, with subsequent accumulation of mutations leading to a partial shift in target range (Kauzlaric et al., 2017).
The domestication model
More recently, evidence has emerged to suggest that a host-invader arms race cannot have been the sole motor of the evolutionary selection of KRAB-ZFP genes. First, the recognition of many TEs by several KRAB-ZFPs would constitute a major obstacle to mutational escape if these factors were all simultaneously engaged in their repression. Second, LINE-1 integrants controlled by KAP1 in human ESCs are between ∼7 and 25 million years of age, and have long lost all transposition potential (Castro-Diaz et al., 2014), as have all HERVs, including the tens of thousands of integrants still controlled by KAP1; therefore, the conservation of KRAB-ZFP-binding sites in these elements does not arise from the need to suppress their replication. Third, it appears that numerous TEs kept spreading or even started invading the human ancestral genome long after KRAB-ZFPs capable of recognizing their sequence had emerged. For instance, ZNF649, which like ZNF93 binds the L1PA promoter and exhibits a very similar expression pattern, dates back to the time of mammalian radiation, some 60 million years before either ZNF93 or any of its target L1PA subfamilies appeared. In addition, recent data suggest that enrichment for certain KRAB-ZFPs is positively selected on some TEs, as for ZNF382 and ZNF84 on L1Hs, most integrants of which are recognized by these proteins (Imbeault et al., 2017). It could be hypothesized that these TEs have evolved to bind KRAB-ZFPs in order to be able to spread in the germ line, where these proteins might not be produced, and be subsequently controlled by their action in differentiated tissues, which would minimize negative selection. However, we note that most of the corresponding KRAB-ZFPs do not recruit KAP1 (P.-Y. Helleboid and D.T., unpublished), and are therefore not predicted to act as repressors.
In addition, most KRAB-ZFPs display highly sophisticated patterns of expression, exhibiting tissue and lineage specificity and being influenced by the differentiation and activation states of the cell, indicating that their interactions with, and hence their influences on, their TE targets are highly regulated. Collectively, these findings strongly suggest a ‘domestication model’, in which KRAB-ZFPs, rather than just blocking the transposition potential of TEs, participate in their domestication (Fig. 3B). It is noteworthy that data revealing highly tissue-specific expression patterns, individualized sets of post-translational modifications and very distinct protein interactomes for many human KRAB-ZFPs (P.-Y. Helleboid and D.T., unpublished) indicate that this system is likely to be far more amenable to regulation than RNA-based TE control mechanisms, which are predominantly at work in the germ line and during early embryogenesis and mostly result in permanent silencing. As such, KRAB-ZFPs might be important instruments towards a full participation of TEs in shaping transcriptional regulatory networks as imagined by Britten and Davidson some fifty years ago (Britten and Davidson, 1969).
The evolutionarily ephemeral nature of many KRAB-ZFP genes should not be taken as an argument against their involvement in the domestication of TEs. Indeed, TEs sprinkle genomes with a constant flux of cis-acting sequences. Although many TE integrants are likely to be of neutral impact, destined to be progressively erased by mutational drift or to be eliminated by recombination, some can exert significant influences, which provide the host with new options for regulating biological events. At times, these new integrants could render more ancient TE-based cis-acting sequences less essential, opening the door to their evolutionary removal. Meanwhile, the genomes of higher vertebrates generate steady supplies of new KRAB-ZFP genes. Some are paralogs subtly differing from their immediate predecessors, which they can functionally replace, leading to their evolutionary loss. Others are more novel family members, which can either be positively selected if they fulfill a useful role, or disappear if they do not match a functionally relevant target.
A general picture integrating these various considerations thus emerges to describe the interactions between TEs and KRAB-ZFPs. When a new TE enters a host genome, whether from an exogenous source (for ERVs) or by mutation of an endogenous predecessor, it is initially silenced via ancestral RNA-based mechanisms, such as those mediated by Piwi-interacting RNAs (piRNAs). Over time, its integrants accumulate mutations that progressively hamper their transposition potential. Meanwhile, KRAB-ZFP paralogs with novel DNA-binding specificities are generated, some of which recognize these TEs and get fixed, because they contribute to preventing the further spread of these elements and/or because they partake in their co-option for the benefit of the host, for instance by allowing the transcriptional regulatory potential of these TEs to be developmentally regulated or tissue restricted. Based on the observed evolutionary dynamics of KAP1-mediated control of LINE-1 in human ESCs, it seems that, at least in recent times and for this class of retroelements, the matching of a newly appeared TE and an inhibitory KRAB-ZFP can take more than 7 million years, as KAP1 does not repress any human-specific LINEs (Castro-Diaz et al., 2014). Over time, KRAB-ZFP/KAP1-controlled TE integrants continue to undergo mutational drift, so that in some cases only their KRAB-ZFP-recruiting region remains to serve as a transcription regulatory platform, which could explain why we frequently find the oldest human KRAB-ZFPs at promoters, without identifiable TE signatures. KRAB-ZFPs themselves might evolve to become capable of recruiting activities distinct from KAP1-nucleated repression. Ultimately, all that might be left from the TE/repressor pair is a DNA target motif and its sequence-specific polypeptide ligand, with no recognizable trace of their source elements.
Concluding remarks
TEs and their KRAB-ZFP controllers confer a high degree of species specificity to many biological processes relevant to the development and physiology of their hosts, including humans. Indeed, a large fraction of the human endovirome is unique to our species and its close relatives, both with regard to its sequence and the genomic distribution of its individual components. Correspondingly, many human KRAB-ZFPs are relatively recent products of our evolution (Nowick et al., 2010; Liu et al., 2014; Imbeault et al., 2017). Therefore, when studying regulatory networks in the human system or in animal models, one should carefully distinguish general principles from species-specific layers of regulation.
Considering that species-restricted KRAB-ZFPs and TEs probably shape regulatory networks in all mammals, the high degree of similarity in the physiology of these organisms might seem surprising. However, although the dynamic partnership between TEs and their KRAB-ZFP ligands provides plenty of ground for divergence, functional evolution for most organ systems is limited by physiological and environmental constraints. For instance, even though early embryogenesis is regulated by different sets of TEs and KRAB-ZFPs in mice and humans, major deviations are difficult to introduce in this highly orchestrated process.
One organ that might partly escape such evolutionary canalization is the central nervous system as, at least in humans, a very wide range of cognitive and psychological phenotypes are compatible with normal life expectancy and efficient reproduction. It is thus interesting to note that remarkably elevated levels of TE activity have been recorded in the brain (Erwin et al., 2014), that a higher range of KRAB-ZFPs is expressed in the brain than in most other adult human tissues (Imbeault et al., 2017), and that KRAB-ZFPs disproportionately contribute to differences between the brain gene networks of chimpanzees and humans (Nowick et al., 2009). These observations suggest that the endovirome and its KRAB-ZFP controllers could have played an important role in the expansion of higher brain functions that were key to the emergence of modern humans. Future studies should test this hypothesis and further decipher the function and mechanisms of action of individual KRAB-ZFPs, and how, together with their TE or non-TE targets, these proteins so uniquely impact on the biology of their host species.
Acknowledgements
We thank all former and current members of the Trono Lab for discussions. We regret being unable to cite all work relevant to this primer owing to space constraints.
Funding
The authors were funded by the Swiss National Foundation (Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung) and the European Research Council (ERC 268721 and ERC 694658).
References
Competing interests
The authors declare no competing or financial interests.