Despite often being classified as selfish or junk DNA, transposable elements (TEs) are a group of abundant genetic sequences that have a significant impact on mammalian development and genome regulation. In recent years, our understanding of how pre-existing TEs affect genome architecture, gene regulatory networks and protein function during mammalian embryogenesis has dramatically expanded. In addition, the mobilization of active TEs in selected cell types has been shown to generate genetic variation during development and in fully differentiated tissues. Importantly, the ongoing domestication and evolution of TEs appear to provide a rich source of regulatory elements, functional modules and genetic variation that fuels the evolution of mammalian developmental processes. Here, we review the functional impact that TEs exert on mammalian developmental processes and discuss how the somatic activity of TEs can influence gene regulatory networks.
Genomes are the habitat in which genes reside and their complexity is an indication of the number of biological processes that are required during the life of an organism. Comparative genomic studies have revealed that the proportion of the genome occupied by genes decreases as biological complexity increases (Boeke and Devine, 1998). Intriguingly, the opposite is observed for transposable elements (TEs) (Boeke and Devine, 1998), which are pieces of DNA that can move within genomes. Copies of these elements are typically interspersed throughout the genomes of most organisms examined to date. Up to 70% of the human genome is derived from TEs (de Koning et al., 2011), while genes occupy less than 2% (Lander et al., 2001). These data imply that there has been significant activity of TEs during evolution but that most of the genetic changes caused by TEs are not detrimental. Rather, the high percentage of TE-derived sequences in mammalian genomes could indicate their inherent potential to create and diversify biological processes, as proposed 60 years ago by McClintock, Britten and Davidson (Britten and Davidson, 1969; McClintock, 1956). The proposal that TEs have a present-day function in host genomes to provide cis-regulatory elements that co-ordinate the expression of groups of genes (Britten and Davidson, 1969) is starting to be tested on a genome-wide scale with the advent of next-generation sequencing. Furthermore, recent findings showing that TEs mobilize much more frequently in development than previously anticipated suggest that these sequences may have additional present-day functions in host genomes (recently reviewed in Hancks and Kazazian, 2016; Muñoz-Lopez et al., 2016; Richardson et al., 2015).
In this Review, we discuss the functional impact that TEs exert on gene regulatory networks operating during mammalian embryogenesis and in somatic adult tissues. We also review some of the recent evidence outlining the myriad ways in which TEs can further increase functional variability in mammalian genomes, which may shed some light on why these elements have become so abundant in mammalian genomes.
Types of TEs in the mammalian genome
There are several classes of TEs (Fig. 1) that vary with regard to their structure, impact and regulation in mammalian genomes (reviewed in Hancks and Kazazian, 2016; Muñoz-Lopez et al., 2016; Richardson et al., 2015). These include DNA transposons as well as retrotransposons, which can be further subdivided into long terminal repeat (LTR) retrotransposons and non-LTR retrotransposons (Hancks and Kazazian, 2016; Muñoz-Lopez et al., 2016; Richardson et al., 2015). Approximately 3% of a typical mammalian genome consists of DNA transposons (Lander et al., 2001; Waterston et al., 2002); however, with the exception of some bat species, DNA transposons no longer mobilize in mammals (Mitra et al., 2013; Ray et al., 2007). In contrast, LTR retrotransposons and non-LTR retrotransposons comprise more than 40% of a typical mammalian genome and are still active in most mammalian species (Hancks and Kazazian, 2016; Lander et al., 2001; Richardson et al., 2015). LTR retrotransposons are similar to retroviruses in terms of their structure and mechanism of retrotransposition (Fig. 2A) and are hence often called endogenous retroviruses (ERVs; reviewed in Mager and Stoye, 2015). Full-length ERVs are flanked by LTRs that promote the transcription and maturation of ERV RNAs, and they also contain functional gag and pol genes, which encode structural proteins and enzymes involved in retrotransposition. However, ERVs often lack a functional env gene, which encodes the envelope protein that retroviruses typically use to exit cells (Lee and Bieniasz, 2007). Furthermore, recombination between LTRs occurs frequently, deleting the intervening internal ERV sequence and generating solo LTRs (Belshaw et al., 2007). The mobilization of active ERVs involves an RNA intermediate and a copy-and-paste mechanism that is similar to the initial steps of retroviral infection (Fig. 2A).
ERV mobilization in mice is responsible for nearly 10% of spontaneous mutations in this species; by contrast, ERVs generally no longer mobilize in humans (Mager and Stoye, 2015; Maksakova et al., 2006). However, recent reports have identified polymorphic HERV-K insertions in humans (Wildschutte et al., 2016; reviewed in Hohn et al., 2013), suggesting recent mobilization activity and the possibility that some HERV-K copies might retain the capability to mobilize in present-day humans. Hundreds of different types of ERVs are present in a typical mammalian genome, some of which are autonomous and encode the functional retroviral proteins required for their mobilization (Fig. 2A), while others are non-autonomous and rely on retroviral proteins encoded by active ERVs to mobilize (Mager and Stoye, 2015; Maksakova et al., 2006). Notably, two types of ERVs, IAP (intracisternal A-particle) elements and MusD/ETn, appear to be the most active ERVs in mice, and their activity can vary among inbred mouse strains (Maksakova et al., 2006).
Mammalian non-LTR retrotransposons are exemplified by long interspersed element class 1 (LINE-1 or L1) retrotransposons, which currently are the only active autonomous retrotransposons in the human genome (Hancks and Kazazian, 2016; Muñoz-Lopez et al., 2016; Richardson et al., 2015). LINE-1 retrotransposons make up 17% of the human genome and, although most are molecular fossils that have lost their ability to move, 80-100 copies of LINE-1 retain retrotransposition potential (Beck et al., 2010; Brouha et al., 2003). In mice, LINE-1 elements comprise a similar fraction of the genome as in humans (Waterston et al., 2002), but a few thousand LINE-1 elements may retain the capacity to retrotranspose (reviewed in Richardson et al., 2015). Mammalian genomes also contain numerous short interspersed element (SINE) non-LTR retrotransposons, exemplified by Alu and SVA (SINE-VNTR-Alu) in the human genome (Lander et al., 2001). SINEs are non-autonomous retrotransposons that use LINE-1 proteins in trans to mobilize (Dewannieux et al., 2003; Dewannieux and Heidmann, 2005; Hancks et al., 2011; Raiz et al., 2012). Non-LTR retrotransposons also move by a copy-and-paste mechanism (Fig. 2B), but one that is fundamentally different from that used by LTR retrotransposons (Richardson et al., 2015). Notably, active LINE-1 elements contain two open reading frames (ORFs), ORF1 and ORF2, whose protein products are strictly required for LINE-1 mobilization (Moran et al., 1996). While ORF1 encodes an RNA-binding protein with nucleic acid chaperone activity (Hohjoh and Singer, 1997; Martin and Bushman, 2001), ORF2 encodes a protein with endonuclease and reverse transcriptase activity (Feng et al., 1996; Mathias et al., 1991) (Figs 1 and 2).
The regulation of TE activity in the mammalian genome
The activity of TEs is tightly regulated in mammals to control the number of insertions accumulated in genomes (Hancks and Kazazian, 2016; Muñoz-Lopez et al., 2016; Richardson et al., 2015). Mechanisms that restrict TE expression and mobilization are likely to be particularly important in germ cells, as well as in the pluripotent cells in early embryos that act as germ cell precursors, as new TE insertions in these cells can potentially be transmitted to the next generation and increase TE copy number during evolution (Crichton et al., 2014). However, the differential activity of these restriction mechanisms in different cell types can influence the ability of TEs to impact gene regulatory networks. Given that these topics have been extensively reviewed recently (Goodier, 2016; Hancks and Kazazian, 2016; Heras et al., 2014; Muñoz-Lopez et al., 2016; Pizarro and Cristofari, 2016; Richardson et al., 2015), below we provide just an overview of some of the main mechanisms used to control TE activity.
Transcriptional repression of TEs
Transcriptional repression is a major mechanism of defence against retrotransposons. In mice, the transcriptional regulation of LINE-1 and ERVs is dynamic during development and different mechanisms contribute to the repression of these TEs in different cell types. Histone modifications and DNA methylation, for example, both play important roles, although the relative importance of each of these mechanisms may depend on both the TE and the cell type (Crichton et al., 2014; Gerdes et al., 2016; Rowe and Trono, 2011; Schlesinger and Goff, 2015). DNA methylation plays a role in repressing both mouse and human LINE-1 elements, and some mouse ERVs including IAP elements (Bourc'his and Bestor, 2004; Karimi et al., 2011; Walsh et al., 1998). Multiple histone modifications, including methylation at histones H3K4, H3K9, H2A/H4R3 and H3K27 as well as histone acetylation, have also been implicated in TE transcriptional repression (Brunmeir et al., 2010; Di Giacomo et al., 2014; Karimi et al., 2011; Kim et al., 2014; Leeb et al., 2010; Macfarlan et al., 2011; Matsui et al., 2010; Reichmann et al., 2012). One of the major histone modifications used to repress a large number of TEs in pluripotent mouse embryonic stem cells (ESCs) is H3K9me3 (Karimi et al., 2011; Matsui et al., 2010; Rowe et al., 2010). This modification is largely deposited at TE sequences by the histone methyltransferase SETDB1, which is recruited to TEs by Krüppel-associated box-containing zinc-finger proteins (KRAB-ZFPs) and their associated co-repressor KAP1 (Castro-Diaz et al., 2014; Ecco et al., 2016; Karimi et al., 2011; Matsui et al., 2010; Rowe et al., 2010; Wolf and Goff, 2009; Wolf et al., 2015b). KRAB-ZFPs provide the sequence specificity to target repression to TEs, and some KRAB-ZFPs that target specific types of TEs have been identified (Castro-Diaz et al., 2014; Ecco et al., 2016; Wolf and Goff, 2009; Wolf et al., 2015a). 
However, as a reflection of the parasite/host battleground, young and presumably active TEs escape KAP1-mediated silencing as KRAB-ZFPs have not yet evolved to target these sequences (Castro-Diaz et al., 2014; Jacobs et al., 2014). Thus, KAP1 does not control the expression of all mammalian TEs, and alternate mechanisms exist to control the expression of young and active TEs (Castro-Diaz et al., 2014; Jacobs et al., 2014). Intriguingly, tissue-specific expression of some KRAB-ZFPs may underlie tissue-specific host gene expression in somatic tissues through their effects on TEs (see below) (Ecco et al., 2016).
Co-transcriptional repression of TE expression
In addition to transcriptional repression, splicing has been shown to regulate LINE-1 mobilization in cultured cell lines and somatic tissues by generating non-functional LINE-1 transcripts (Belancio et al., 2006; Perepelitsa-Belancio and Deininger, 2003). Furthermore, Microprocessor, a complex that normally processes structured primary microRNA transcripts (pri-miRNAs) into precursor miRNAs (pre-miRNAs), can also process LINE-1 and SINE RNAs, thereby reducing their retrotransposition activity (Heras et al., 2014, 2013).
Post-transcriptional control of TEs
A number of viral restriction factors have also been shown to act post-transcriptionally to regulate the activity of retrotransposons (reviewed in Goodier, 2016; Pizarro and Cristofari, 2016). Some factors such as RNaseL (Zhang et al., 2014), SAMHD1 (Zhao et al., 2013), hnRNPL and nucleolin (Peddigari et al., 2013), MOV10 (Goodier et al., 2012), UPF1 (Taylor et al., 2013), PIN1 (Cook et al., 2015) and ZAP (Goodier et al., 2015) have been implicated in post-transcriptionally regulating TE RNAs or post-translationally modifying TE proteins. Others including SAMHD1 (Zhao et al., 2013) and PCNA (Taylor et al., 2013) have been implicated in modulating reverse transcription. Finally, APOBEC proteins (Richardson et al., 2014b; Schumann, 2007) and PCNA (Taylor et al., 2013) have been shown to interfere with later steps of the retrotransposition cycle. We speculate that the pattern of expression of TE-restriction mechanisms may impact human biology, as this will establish a level of TE activity in different cell types. Furthermore, the use of proteomics has resulted in a list of host factors that interact with LINE-1 and may regulate its retrotransposition (Goodier et al., 2013; Moldovan and Moran, 2015; Taylor et al., 2013). However, the role of most of these identified LINE-1 interactors remains to be determined and future studies will help to understand the dynamic interaction between TEs and cellular host factors.
TEs can impact developmental processes via various mechanisms
When expressed, TEs can affect developmental processes either via their gene products, which can influence the behaviour of host cells, or through new insertions that cause genetic changes in the host genome (Fig. 3). Not surprisingly, therefore, the dysregulated expression of TEs has been linked with defects in various developmental processes in mice, including aberrant proliferation of male germ cells (Galli et al., 2005), defects in oogenesis (Malki et al., 2014; Su et al., 2012), disruption of homologous chromosome synapsis during meiosis (reviewed in Crichton et al., 2014; Öllinger et al., 2010), activation of the unfolded protein response during B lymphocyte differentiation (Pasquarella et al., 2016) and inappropriate activation of innate immune responses (Herquel et al., 2013; Stetson et al., 2008). On the other hand, new TE insertions into genes can act as insertional mutagens in mammalian genomes and interfere with gene function (Fig. 3, reviewed in Hancks and Kazazian, 2016; Heras et al., 2014; Muñoz-Lopez et al., 2016; Pizarro and Cristofari, 2016; Richardson et al., 2015). Such insertions can, for example, introduce actively transcribing promoters into genes and cause transcriptional interference. They can also induce premature termination of transcription via the incorporation of TE-derived polyadenylation sites (Perepelitsa-Belancio and Deininger, 2003). In addition, inefficient transcriptional elongation through the AT-rich LINE-1 sequence can modulate gene expression levels (Han et al., 2004). TE insertions can also introduce TE-derived splice acceptor or donor sites that alter splicing, generating non-functional or nonsense transcripts (Belancio et al., 2006) (Fig. 3A), or can be incorporated into mRNAs and introduce frameshifts or premature termination codons (Fig. 3A). However, it should be noted that many of these mechanisms can also potentially confer new properties and functions to a host gene rather than simply inactivate it (Fig. 3B-D). 
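The last two outcomes, a frameshift and a premature termination codon, can be illustrated with a toy sketch. Everything here is invented for illustration: the eight-codon host sequence, the inserted fragments, the insertion position and the helper names `translate` and `insert_fragment` are hypothetical, and real TE insertions are hundreds to thousands of base pairs long. The sketch only shows why an insertion whose length is not a multiple of three changes every downstream codon, whereas an in-frame insertion carrying a stop codon truncates the protein.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_fragment(cds: str, fragment: str, position: int) -> str:
    """Return the sequence with `fragment` inserted before index `position`."""
    return cds[:position] + fragment + cds[position:]


# Hypothetical host coding sequence: ATG GCT GAA CTG AAA GGT TCC TAA
host_cds = "ATGGCTGAACTGAAAGGTTCCTAA"

# A 4-nt insertion (not a multiple of three) shifts the reading frame.
frameshifted = insert_fragment(host_cds, "CAGT", 9)

# A 6-nt in-frame insertion that begins with a stop codon (TGA).
truncated = insert_fragment(host_cds, "TGACCA", 9)

print(translate(host_cds))      # MAELKGS
print(translate(frameshifted))  # MAEQSERFL (altered after the insertion, original stop lost)
print(translate(truncated))     # MAE
```

In the frameshifted toy transcript the original stop codon is no longer in frame, so translation runs to the end of the sequence; in a real mRNA this would continue until the next in-frame stop codon.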
Below, we examine how the present-day functions of TEs can affect developmental genes and processes in mammals.
TEs as promoters that drive the transcription of host genes
TEs contain transcription factor binding sites that promote transcription by RNA polymerase II [in the case of DNA transposons, ERVs, LINE-1 elements, primate SVAs and even SINEs (Lai et al., 2009)] or RNA polymerase III (in the case of short SINEs such as human Alu, and murine B1 and B2). As such, if a TE integrates into or near host genes, its promoters can drive the expression of novel transcripts that encompass part of the coding region (Fig. 3B-D). The co-option of TE-derived sequences as gene promoters can allow a gene to be expressed in new cell types or contexts (Fig. 3B,D) and can generate truncated or extended protein products, potentially allowing host genes to acquire new functions (Fig. 3B-D). Indeed, LTR sequences frequently act as promoters for host genes (Cohen et al., 2009). One example of a TE-derived promoter generating a cell type-specific isoform of a host gene with novel properties is seen in the case of mouse Dicer1 (Flemr et al., 2013). Dicer1 encodes an RNA endonuclease that generates small regulatory RNAs during oogenesis and in other cell types. However, as a result of the intragenic insertion of an MT-C ERV retrotransposon, a truncated form of mouse Dicer1 is expressed specifically in oocytes. This truncated ERV-driven form of Dicer1, which is essential for oogenesis, lacks a potentially autoinhibitory helicase domain present in the N-terminal region of the full-length protein and is more enzymatically active than full-length Dicer1 (Flemr et al., 2013). Many genes in mouse oocytes and zygotes are similarly expressed from ERV insertions acting as promoters for nearby genes (Peaston et al., 2004), including Gata4 and Tead4, which encode key transcription factors that drive the specification of primitive endoderm and trophectoderm, respectively (Macfarlan et al., 2012). It will be of interest to investigate if there are additional examples in which co-option of ERV promoters modifies the function of host genes in cells. 
Interestingly, the expression of selected TE-derived transcripts can be used to track cell types during development and possibly also in adult tissues (see Box 1).
Box 1. The expression of individual TEs can be used to distinguish and/or isolate specific cell types during development. For example, the knowledge that MERVL ERVs are highly expressed in totipotent two-cell stage mouse zygotes has recently been used to isolate sub-populations of ESCs that exhibit totipotent rather than pluripotent developmental potential (Macfarlan et al., 2012). A similar strategy has been used to show that depletion of the chromatin assembly factor CAF1 promotes the generation of cells that resemble two-cell stage zygotes from mouse ESCs (Ishiuchi et al., 2015). Similarly, the transcription of selected primate-specific ERVs has been used to isolate populations of naïve-like pluripotent stem cells from human ESC cultures (Wang et al., 2014). It will be of interest to see whether this approach can facilitate the identification of other transient or low abundance cell populations in developing tissues.
ERVs are also able to drive host gene expression in differentiating somatic tissues. Transcription of Bglap3, which encodes an osteocalcin-related protein, originates from a nearby IAP element in mouse ESCs and in some differentiated somatic tissues (Ecco et al., 2016). The regulation of Bglap3 expression depends on the activity of a KRAB-ZFP that targets KAP1 and H3K9me3 repressive histone modifications to this IAP element, and conditional deletion of either the KRAB-ZFP or KAP1 post-natally in the liver results in transcriptional activation of Bglap3 in a tissue where it is normally repressed (Ecco et al., 2016). Thus, the developmental regulation of KRAB-ZFPs, and potentially other regulators of TE expression, can impact host gene expression via the regulation of ERVs not only in embryonic development but also in fully differentiated somatic tissues. By contrast, some ERVs appear to acquire epigenetic silencing marks early in development that maintain their repression even when KRAB-ZFPs or the KAP1 co-repressor is deleted later in differentiating tissues (Rowe et al., 2013, 2010; Wolf et al., 2015a).
Variable epigenetic silencing of ERV-derived alternative promoters in somatic tissues can also contribute to the regulation of host genes. This mechanism is exemplified by the mouse Agouti gene. Agouti encodes a signalling molecule that is expressed in hair follicles and inhibits the MC1R melanocortin receptor in melanocytes (Voisey and Van Daal, 2002). During the hair growth cycle, a burst of Agouti expression causes melanocytes to change the type of melanin they produce from black eumelanin to yellow phaeomelanin, resulting in a sub-apical yellow band on an otherwise black hair (Blewitt and Whitelaw, 2013). A naturally occurring mouse mutant exhibiting an insertion of an IAP element immediately 5′ and antisense to the first coding exon of Agouti produces mice with yellow coats. A cryptic antisense promoter in this IAP element drives constitutive expression of functional Agouti transcripts in these mutants (Michaud et al., 1994), but variable DNA methylation of this IAP element between individuals means that these genetically identical mice display a continuous spectrum of coat colour phenotypes from yellow to agouti. Offspring arising from yellow mothers, but not yellow fathers, are more likely to have yellow coats, suggesting that there can be some trans-generational inheritance of the epigenetic state of this IAP insertion (Morgan et al., 1999). Thus, mechanisms that regulate ERV expression are able to impact ERV promoter-driven host gene expression and the development of somatic tissues.
SINE and LINE-1 TEs can also act as alternative promoters to drive the expression of host genes (Fig. 3B,D). SINE elements typically contain an internal RNA polymerase III promoter that transcribes these elements, but some also carry an active RNA polymerase II promoter that drives transcription in an anti-sense orientation (Lai et al., 2009). Transcripts originating from the anti-sense RNA polymerase II promoter in these elements can drive the expression of nearby host genes (Ferrigno et al., 2001). Full-length LINE-1 elements contain conserved sense and anti-sense promoters (Speek, 2001; Swergold, 1990), and recent studies have shown that the anti-sense promoter in primate LINE-1 drives the expression of a trans-acting polypeptide, ORF0, that can stimulate LINE-1 mobilization in trans (Fig. 3D) (Denli et al., 2015). For some host genes, the majority of their transcripts in induced pluripotent stem cells (iPSCs) originate from a nearby LINE-1 antisense promoter and contain ORF0 peptide sequences that can be spliced onto the host-derived protein (Denli et al., 2015).
The exaptation of TEs as alternative promoters thus appears to be a relatively simple way to alter the pattern or level of expression of host genes during development (Fig. 3). There are also examples of convergent exaptation of TEs as promoters of specific developmental genes. In some mammals, the gene encoding the hormone prolactin is expressed in the pituitary gland but is also expressed from an alternative promoter in the maternal endometrial decidua during pregnancy, regulating immune cells, angiogenesis and invasion of the fetal placenta into maternal tissues (Jabbour and Critchley, 2001). In humans, the transcription of decidual prolactin (PRL) initiates from an upstream MER39 ERV. In contrast, the transcription of decidual Prl in mice originates from a MER77 ERV located further upstream, whereas elephants transcribe decidual prolactin from an elephant-specific LINE-1 insertion (Emera et al., 2012; Emera and Wagner, 2012). Thus, the independent evolution of TE-derived prolactin promoters at multiple points in mammalian evolution suggests that TEs provide a rich source of transcription factor binding sites that allows host genes to acquire expression and function in new cell types and tissues (Fig. 3).
TEs as tissue-specific enhancers of host genes
In addition to acting as promoters that drive the expression of alternative isoforms of host genes, transcription factor-binding sites within TEs can act as host gene enhancers in specific tissues or developmental contexts (Fig. 3B,D). Indeed, conserved non-exonic TEs in the human genome tend to cluster within 1 Mb of developmental genes and transcriptional regulators, suggesting that this may be a common mechanism for TEs to impact mammalian development (Lowe et al., 2007). Several examples of TEs acting as enhancers have been noted. In trophoblast stem cells, for example, the mouse-specific RLTR13 ERV recruits the trophoblast transcription factors EOMES, CDX2 and ELF5, and appears to act as a trophoblast enhancer for around 100 host genes (Chuong et al., 2013). ERV-derived enhancers have also been reported in developing primordial germ cells (Liu et al., 2014) and in ESCs (Kunarso et al., 2010). TEs, particularly ERVs, are present in 5-25% of the genomic regions bound by the pluripotency-associated transcription factors OCT4 or NANOG in human and mouse ESCs, and in mouse ESCs, ERV elements provide binding sites for the pluripotency-associated transcription factor SOX2 (Bourque et al., 2008; Kunarso et al., 2010). Additionally, ERVs are able to act as tissue-specific enhancers in the innate immune system (Fig. 3B,D), where they act to mediate part of the interferon response (see Box 2).
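Figures such as the 5-25% of OCT4- or NANOG-bound regions that contain TEs come from intersecting two sets of genomic intervals: regions bound by a transcription factor and annotated TE copies. The sketch below shows the basic calculation on invented coordinates. It is not the pipeline used in the cited studies; the function name `fraction_overlapping`, the half-open coordinate convention and all intervals are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (chromosome, start, end) with half-open coordinates; an assumed convention.
Interval = Tuple[str, int, int]


def fraction_overlapping(bound_regions: List[Interval], tes: List[Interval]) -> float:
    """Fraction of bound regions that overlap at least one TE annotation."""
    if not bound_regions:
        return 0.0

    # Group TE annotations by chromosome so each region is only compared
    # with TEs on its own chromosome.
    tes_by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in tes:
        tes_by_chrom[chrom].append((start, end))

    hits = 0
    for chrom, r_start, r_end in bound_regions:
        # Two half-open intervals overlap if each starts before the other ends.
        if any(r_start < t_end and t_start < r_end
               for t_start, t_end in tes_by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(bound_regions)


# Hypothetical transcription factor-bound regions and TE annotations.
toy_bound_regions = [
    ("chr1", 100, 200),
    ("chr1", 500, 600),
    ("chr2", 50, 150),
    ("chr2", 900, 1000),
]
toy_tes = [
    ("chr1", 150, 400),
    ("chr2", 150, 300),   # starts exactly where a bound region ends: no overlap
    ("chr2", 950, 1200),
]

toy_fraction = fraction_overlapping(toy_bound_regions, toy_tes)
print(toy_fraction)  # 0.5
```

A genome-scale analysis would use sorted intervals or an interval tree rather than this pairwise comparison, and would normally be compared against a background expectation before any overlap is interpreted as enrichment.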
Box 2. TEs can function as tissue-specific enhancers in the innate immune system, acting to mediate the response to interferons, which are pro-inflammatory signalling molecules that are secreted in response to infection. Specific TEs are enriched close to genes that are activated by interferon (interferon-stimulated genes, ISGs) in human cells (Chuong et al., 2016). One of the most enriched TEs, MER41B ERV, contains interferon-inducible binding sites for the STAT1 transcription factor, which mediates part of the interferon response (Chuong et al., 2016). A MER41B insertion upstream of AIM2, a human ISG, is required for AIM2 expression in response to interferon, and deletion of this element impairs the antiviral response of human cell lines (Chuong et al., 2016). Interestingly, there is no copy of MER41B upstream of Aim2 in mice, and Aim2 is constitutively expressed rather than interferon-inducible in this species. Notably, RLTR30B ERVs in mice also contain interferon-inducible STAT1 binding sites and are enriched near functionally annotated immunity genes in this species (Chuong et al., 2016). It is possible that the infectious ancestors of these ERVs possessed interferon-inducible STAT1-binding sites in order to exploit the host's innate immune system to promote their own transcription, and that this innovation has now been repeatedly and independently co-opted by mammals in order to drive evolution of the innate immune system (Chuong et al., 2016).
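A statement that a TE family is "enriched" near ISGs implies a comparison with what would be expected by chance. One generic way to frame such a comparison, sketched below, is a one-sided hypergeometric test on gene counts. This is not necessarily the statistic used by Chuong et al. (2016); the function name `hypergeom_enrichment_p` and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes: int, isg_genes: int,
                           genes_near_te: int, isgs_near_te: int) -> float:
    """One-sided P-value: probability of seeing at least `isgs_near_te` ISGs
    among `genes_near_te` genes drawn at random from `total_genes` genes,
    of which `isg_genes` are ISGs."""
    max_overlap = min(isg_genes, genes_near_te)
    favourable = sum(
        comb(isg_genes, i) * comb(total_genes - isg_genes, genes_near_te - i)
        for i in range(isgs_near_te, max_overlap + 1)
    )
    return favourable / comb(total_genes, genes_near_te)


# Invented counts: 10 genes in total, 4 of them ISGs; 3 genes lie near a copy
# of the TE family of interest, and 2 of those 3 are ISGs.
toy_p = hypergeom_enrichment_p(total_genes=10, isg_genes=4,
                               genes_near_te=3, isgs_near_te=2)
print(round(toy_p, 4))  # 0.3333
```

With these toy counts the overlap is unremarkable (P = 1/3). A real analysis would also need to account for gene length, local TE density and multiple testing across TE families.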
Domesticated TEs also play a role in the generation of antibody repertoires during adaptive immune system development (Teng and Schatz, 2015). The breaking and rejoining of DNA molecules that occurs during diversification of antibody genes by V(D)J recombination has some similarities to the cut-and-paste mobilization of DNA transposons (Melek et al., 1998; Teng and Schatz, 2015; van Gent et al., 1996). Indeed, it has been suggested that the RAG1 and RAG2 genes that are required for V(D)J recombination in developing lymphocytes were derived nearly 500 million years ago from a Transib superfamily DNA transposon (Kapitonov and Jurka, 2005). The domestication of TE-derived sequences in this context thus appears to play an important role in increasing genetic diversity beyond that encoded by the germline genome.
SINEs are also able to act as enhancers for host genes, particularly during brain development. The gene encoding the LIM homeobox transcription factor ISL1, which is required for motor neuron development, has a nearby conserved exapted LF-SINE TE insertion that can drive gene expression in neural tissues (Bejerano et al., 2006). Similarly, AmnSINE1 TE insertions are associated with genes involved in brain development, such as Fgf8, and act as neural-specific enhancers in transgenic mice (Sasaki et al., 2008).
DNA transposons, too, can act as enhancers to influence host gene expression and contribute to gene regulatory networks in development, even though they no longer mobilize in most mammals. The MER130 DNA transposon appears to act as a neocortical enhancer for a number of genes involved in neural development including Robo1 and Id4 (Notwell et al., 2015). Similarly, DNA transposons are strongly represented amongst the large number of TEs that contribute to gene regulatory networks in the mammalian endometrium during pregnancy. The MER121 and MER97C DNA transposons are enriched in regions bound by activated progesterone receptor in human endometrial cells and appear to contribute to the responses to progesterone in this tissue (Lynch et al., 2015).
TEs as regulators of chromosome organization
In addition to their more direct roles in regulating host gene expression, TEs can influence the organization of mammalian chromosomes. One of the major regulators of mammalian chromosome organization is the CCCTC-binding factor CTCF, which can act as an insulator to block the interaction between an enhancer and a promoter, as a barrier to prevent the spreading of chromatin domains and as an anchor that assembles chromatin into loops or domains within which regulatory elements can interact (reviewed in Merkenschlager and Nora, 2016). In combination with cohesin, CTCF plays an important role in allowing developmental enhancers to regulate gene expression, particularly in the context of long-range enhancers (Fig. 3D) (Merkenschlager and Odom, 2013). TEs are strongly enriched within regions of mammalian genomes that bind CTCF, and in mice, B2 SINEs are enriched in these regions and carry a CTCF-binding motif (Bourque et al., 2008; Schmidt et al., 2012; Sundaram et al., 2014). One example of a context in which a B2 SINE insertion influences developmental gene regulation through effects on chromatin domains occurs during the expression of growth hormone (GH) in the developing pituitary (Lunyak et al., 2007). During the early stages of pituitary development, when GH is not expressed, a repressive chromatin domain extends across the GH locus. However, this domain becomes restricted as pituitary development proceeds. A boundary element located upstream of GH marks the edge of the chromatin domain, and a B2 SINE TE in this region is both necessary and sufficient for the insulating activity that prevents repressive chromatin from extending into the GH domain during late pituitary development (Lunyak et al., 2007). 
The insulating activity of this SINE TE requires its RNA polymerase II and RNA polymerase III transcripts, suggesting that bidirectional transcription of this element causes a local change in chromatin structure that prevents repressive chromatin from spreading across it (Lunyak et al., 2007; Ponicsan et al., 2010). Notably, additional examples of human and rodent SINEs containing defined transcription factor-binding sites have been previously documented (Morales-Hernández et al., 2016; Román et al., 2011). Future research will help to define the full repertoire of effects that TEs can exert on mammalian chromosome organization.
The domestication of TE-derived proteins and processes in development
TEs can also have an impact on mammalian development through their proteins becoming domesticated, i.e. performing functions for the host organism. The human genome contains around 50 genes that are probably domesticated TEs (Lander et al., 2001). One developing tissue that appears to rely significantly on domesticated TEs is the placenta. Domestication of gag-pol regions of Sushi-ichi-derived ERVs has generated the paternally expressed imprinted genes Peg10 and Peg11, which have essential functions during mouse placental development (Sekita et al., 2008). Peg10 plays a role in the development of the trophectoderm-derived spongiotrophoblast and labyrinth layers of the placenta, the latter being the site in which components are exchanged between maternal and fetal circulatory systems (Ono et al., 2006). Peg11 functions in extraembryonic mesoderm-derived endothelial cells that line the fetal capillaries of the placenta (Sekita et al., 2008). Domestication of the env regions of ERVs has also generated genes with essential functions for placenta development. The fetal capillaries in the labyrinth are surrounded by syncytial trophoblast cells that play an important role in allowing the exchange of components of the maternal and fetal circulatory systems. This epithelial layer is formed via intercellular fusion between trophoblast cells in a process involving proteins known as syncytins (Dupressoir et al., 2009, 2011). These syncytins are co-opted fusogenic Env proteins derived from ERVs (Blond et al., 2000; Mi et al., 2000). Remarkably, mouse, human and rabbit have all independently co-opted env genes from different ERVs to act as syncytins, and independent capture of syncytins has occurred at least six times during mammalian evolution (Dupressoir et al., 2012).
Even marsupials, which have a relatively transient placenta that is in contact with the maternal endometrium for a short period of time, may have domesticated a fusogenic placentally expressed ERV Env protein (Cornelis et al., 2015).
DNA transposons can also undergo domestication, as exemplified by the RAG1 and RAG2 genes that have essential functions during development of the adaptive immune system (see Box 2). Finally, in what may be a form of domestication, recent data suggest that LINE-1 retrotransposons may have a present-day function during oogenesis in mice as a quality control mechanism that eliminates defective oocytes, suggesting a function for LINE-1 in regulating the ovarian oocyte pool (Malki et al., 2014). The ovarian oocyte pool gradually declines with age and the size of the oocyte pool at birth is thought to be a major determinant of reproductive success in older women. It will therefore be of interest to test whether LINE-1 plays a similar role in human oogenesis.
Diversification of the mammalian transcriptome by TEs
TEs often insert into introns or untranslated regions of genes, and this can sporadically result in the exonization of TEs (Fig. 3B). Indeed, exonized TEs can expand the mammalian transcriptome and proteome but can also be used to fine-tune gene regulation (Piriyapongsa et al., 2007a). The accumulation of TE-derived sequences in cellular mRNAs can lead to their differential regulation because TE control mechanisms target these TE portions or structures (Heras et al., 2013, 2014). Notably, several classes of TEs have been found inserted within mammalian RNAs (Piriyapongsa et al., 2007a,b; Zarnack et al., 2013) and it is likely that their presence impacts gene regulation and function by providing or interfering with regulatory elements in those RNAs (Fig. 3).
Another way that TEs diversify the mammalian transcriptome is by generating long non-coding RNAs (lncRNAs) from their promoters (Fig. 3D). Remarkably, almost two-thirds of all known human lncRNAs contain TE fragments in their sequences (Kapusta et al., 2013; Macia et al., 2011). LncRNAs are more abundant than known genes and are involved in multiple biological processes including gene regulation, maintenance of nuclear architecture and splicing (Macia et al., 2015; Mercer et al., 2009; Moran et al., 2012). The existence of functional lncRNAs was discovered in the early 1990s, with the characterization of Xist – a lncRNA involved in X chromosome inactivation (Brown et al., 1992; reviewed in Moran et al., 2012). This X chromosome-encoded lncRNA is thought to interact with LINE-1 elements to bring about X chromosome inactivation in females (Lyon, 2003). LINE-1 elements are over-represented on the X chromosome (Lander et al., 2001) and it has been suggested that evolutionarily older silent LINE-1 elements are involved in assembling a Xist-dependent repressive domain on the inactive X chromosome, while evolutionarily younger transcribed LINE-1 elements help spread this repressive domain into adjacent chromosomal regions (Chow et al., 2010). The over-representation of LINE-1 elements on the X chromosome may therefore reflect a present-day function for these sequences in X chromosome inactivation.
Intriguingly, despite a lack of sequence conservation, unexpected roles for multiple human and mouse TE-derived lncRNAs in regulating pluripotency have also recently been described (Fort et al., 2014; Guttman et al., 2011; Lu et al., 2014). Multiple mechanisms may be involved, and some of these TE-derived lncRNAs represent transcripts originating from ERV elements acting as enhancers or promoters in these cells (Fort et al., 2014). However, lncRNAs derived from HERV-H ERVs in human ESCs appear to act in trans via physical association with chromatin modifiers (Guttman et al., 2011; Lu et al., 2014; Wang et al., 2014). Future research is likely to reveal how these diverse species-specific TE-derived sequences can regulate conserved developmental processes.
TEs can also influence developmental processes in trans through the LINE-1-dependent generation of processed pseudogenes (Esnault et al., 2000). Although LINE-1-encoded proteins tend to bind to their encoding mRNA in cis (Esnault et al., 2000; Wei et al., 2001) (Fig. 2A), they occasionally bind to cellular host mRNAs in trans and catalyse their insertion into the genome as processed pseudogenes (Fig. 3C). Over the course of evolution, mammalian genomes have accumulated more processed pseudogenes than annotated genes (Zhang et al., 2004). Although most inserted processed pseudogenes lack a functional promoter upon insertion, a promoter can evolve, be captured by a new TE insertion, or be generated by recombination, resulting in a new functional gene (Ji et al., 2015) (Fig. 3C). Expression of a LINE-1-generated pseudogene derived from the FGF4 gene is strongly associated with a short-legged phenotype selected in dogs (Parker et al., 2009). Thus, the activity of TEs can mediate the duplication and subsequent diversification of key developmental regulators.
Finally, ongoing LINE-1 retrotransposition itself can generate new genes by a mechanism termed exon shuffling (Moran et al., 1999). Exon shuffling occurs when an active LINE-1 within a gene retrotransposes to a new genomic location and delivers nearby coding sequences to the new locus. Indeed, LINE-1-mediated exon shuffling has probably increased the repertoire of the human proteome but, as a result of frequent 5′ truncation during retrotransposition (Grimaldi et al., 1984; Kazazian et al., 1988), its overall contribution to the human genome remains elusive. Notably, known examples of genes generated by LINE-1 exon shuffling have been found in primate species, and include the generation of a new gene product that can restrict HIV infection in New World monkeys (Sayah et al., 2004).
LINE-1 activity in humans
LINE-1 elements, which are the only active autonomous retrotransposons in humans, are thought to mobilize during at least two different developmental contexts: the early embryo and the developing/adult brain (Hancks and Kazazian, 2016; Muñoz-Lopez et al., 2016; Richardson et al., 2015). By contrast, the expression and mobilization of TEs in other somatic cells in humans appears to be low.
Retrotransposition in the early embryo
Studies of transgenic mice carrying engineered human LINE-1 retrotransposition reporters suggest that LINE-1 mobilizes in pre-implantation embryos (Kano et al., 2009; Muotri et al., 2005). Some of this LINE-1 mobilization could potentially reflect activity in the trophectoderm, which gives rise to the extra-embryonic component of the placenta (Fig. 4A). Indeed, the placenta has more-limited TE restriction mechanisms than other hypomethylated cell types (Reichmann et al., 2013), although LINE-1 mobilization in these cell types has not been directly assessed. However, at least some of the LINE-1 mobilization in pre-implantation embryos occurs in the pluripotent epiblast, which gives rise to all embryonic tissues (Fig. 4A). Notably, such LINE-1 mobilization in pluripotent cells in pre-implantation embryos could result in the new insertion being a mosaic within somatic and germline cells in adults.
LINE-1 is also highly expressed, and new insertions of endogenous LINE-1 elements can accumulate, in human pluripotent cell lines that mimic some aspects of embryonic pluripotent cells (Garcia-Perez et al., 2007, 2010; Klawitter et al., 2016; Wissing et al., 2011, 2012). The LINE-1 transcripts expressed in pluripotent cell lines represent a restricted subset of the genomic LINE-1 repertoire, suggesting that the surrounding chromatin environment of a LINE-1 locus might contribute to local activation of these elements in this cell type (Karimi et al., 2011; Macia et al., 2011; Philippe et al., 2016). Notably, the analysis of a new LINE-1 insertion in humans is also consistent with mobilization of this element occurring in pluripotent cells early in embryogenesis (van den Hurk et al., 2007). However, the frequency and specific timing of when retrotransposition takes place in early embryos and during gametogenesis remain to be determined.
Retrotransposition in the brain
Both endogenous and engineered LINE-1 elements have also been shown to mobilize in the mammalian brain (Fig. 4A) (Muotri et al., 2005). Surprisingly, LINE-1 mRNAs are expressed in neuronal precursor cells (NPCs) in the mammalian brain and new LINE-1 insertions can accumulate in NPCs, at least in mouse models of human LINE-1 retrotransposition and in cultured human NPCs (Fig. 4A) (Coufal et al., 2009; Muotri et al., 2005). Notably, a study that analysed LINE-1 expression in fetal NPCs and in other somatic cells (skin) isolated from the same donor demonstrated that a subtle change in LINE-1 promoter DNA methylation levels in brain cells might explain why LINE-1 mRNAs are expressed selectively in NPCs when compared with other tissues, such as skin (Fig. 4A) (Coufal et al., 2009). However, the availability of transcription factors can also contribute to this phenomenon (Richardson et al., 2014a; Thomas et al., 2012).
More recently, the development of next-generation DNA sequencing and single-cell genomics-based studies has allowed researchers to demonstrate that the human brain is in fact made of a mosaic of genomes (Baillie et al., 2011; Erwin et al., 2016; Evrony et al., 2012, 2015; Upton et al., 2015), although there is an ongoing debate about the frequency of retrotransposition in this tissue (Evrony et al., 2016; Richardson et al., 2014a). Furthermore, while retrotransposition has been proposed to be ubiquitous in the hippocampus of the human brain (Upton et al., 2015), we know little about other brain regions or neuronal cell types that may support elevated levels of retrotransposition. Nonetheless, it is clear that the ongoing activity of LINE-1 elements in the brain could provide a mechanism to ensure that no two human brains – even those of identical twins – are genetically identical.
The functional impact of LINE-1 retrotransposition in the brain is less clear (Richardson et al., 2014a; Singer et al., 2010). Indeed, it is possible that LINE-1 expression and mobilization in the brain could be a consequence of LINE-1 elements having a present-day function as enhancers or promoters for host genes in this tissue. However, it is also possible that LINE-1 activity in mammalian brains might be on its way to domestication and that, like the domestication of DNA transposons in the immune system (see Box 2), the ability of TEs to increase genetic diversity beyond that encoded by the germline genome is providing some benefit to the development or function of this highly complex organ. Constructing a map of LINE-1 expression and retrotransposition in different human brain regions and cell types will help us to understand both the magnitude and impact of somatic retrotransposition on brain biology.
It should also be noted that LINE-1 retrotransposition in the brain is potentially mutagenic, although de novo somatic LINE-1 insertions that disrupt brain development or function in patients remain to be demonstrated. Recent studies suggest that the ongoing activity of LINE-1 elements in the healthy brain can result in the generation of genomic rearrangements that could delete genomic regions proximal to genes, although any functional significance remains to be determined (Erwin et al., 2016). Similarly, it remains to be elucidated whether dysregulated LINE-1 expression or retrotransposition could contribute to brain disorders, although data acquired in models of Rett syndrome, ataxia telangiectasia and schizophrenia suggest that retrotransposition is indeed associated with these syndromes (Bundo et al., 2014; Coufal et al., 2011; Muotri et al., 2010; reviewed in Richardson et al., 2014a).
The expression and mobilization of TEs in other somatic human cells
The description of LINE-1 activity in the human brain raises another important question: are LINE-1 elements expressed and mobilized in other somatic tissues? Although more research is needed, several studies suggest that the somatic activity of TEs might be restricted to the human brain. Notably, in a recent study it was reported that human tissues including the oesophagus, prostate, stomach and heart muscle express relatively low levels of LINE-1 mRNAs whereas expression in the adrenal gland, kidney, spleen and cervix was below the detection limit (Belancio et al., 2010). Whether part of the expressed LINE-1 mRNAs corresponds to active retrotransposition-competent mRNA is unknown. More recently, and exploiting the inherent capability of human ESCs to differentiate into somatic stem cells, the expression levels of human LINE-1 elements as well as putative engineered LINE-1 retrotransposition have been explored in a panel of human somatic stem/progenitor cells, including human NPCs, human mesenchymal stem cells (MSCs), haematopoietic stem cells (HSCs) and progenitor keratinocytes (Macia et al., 2016) (Fig. 4B). These data suggest that NPCs are the only analysed population of somatic stem/progenitor cells in which LINE-1 expression levels are significant (Macia et al., 2016). In addition, efficient retrotransposition was only detected in NPCs and mature neuronal cells (Macia et al., 2016) (Fig. 4B). These data suggest that retrotransposition in human somatic tissues is restricted to the brain (Baillie et al., 2011; Erwin et al., 2016; Evrony et al., 2012, 2015; Macia et al., 2016; Upton et al., 2015). However, further research is required to define whether additional LINE-1-dependent TEs are also able to mobilize in the human brain, to determine the extent of such mobilization and to fully understand the phenotypic consequences of TE activity in the human brain.
While our understanding of the precise role of TE expression and mobilization during human development remains limited, recent studies have provided key insights into the regulation of TEs. Indeed, future studies analysing the impact that the ongoing activity of TEs during embryogenesis and in the adult brain can exert in a healthy or diseased genetic background will shed light on the unknown functions of TEs in normal mammalian development and biology. Recent studies suggest that there are multiple ways in which a TE-derived sequence, at either the RNA or DNA level, can affect gene regulatory networks and gene function during mammalian development. Thus, these present-day functions of TEs may explain the abundance of TEs in mammalian genomes and provide a convenient mechanism for host genes to evolve new expression patterns and isoforms. In addition, it is clear that independent convergent exaptation of different TEs as developmentally controlled gene regulatory sequences has occurred in multiple mammalian species during evolution, and that TE domestication has been key for generating a functional placenta. In the future, additional genomics and genome-wide epigenetic maps from several cell types across multiple species, in combination with CRISPR/Cas9-driven functional genomics, will more clearly define the role of TEs in mammalian development.
Notably, the existence of active TEs in mammals implies that the mammalian body is a mosaic of gene regulatory networks. On one hand, the mobilization of TEs during early embryogenesis results in the generation of mosaic bodies with respect to their TE content. Whether this mosaicism has an impact on human homeostasis or predisposition to disease is currently unknown. Future studies exploiting single-cell genomics and animal models of de-regulated retrotransposition will clearly help to define the contribution of active TEs to mammalian biology and disease. On the other hand, LINE-1 mobilization in humans is not equally distributed in the body and seems to be restricted to the brain, suggesting that any functional consequences of LINE-1 mobilization are likely to impact this tissue. The somatic activity of LINE-1 generates genetic variation in the human brain and it is possible that this activity might be undergoing domestication in mammals. However, many questions remain unanswered: can LINE-1 elements move equally well in all neuronal and glial cell types present in a vertebrate brain? Do all vertebrate species contain active TEs in their brains? And of course, the most important question in current TE biology: what is the role of the somatic activity of TEs in the vertebrate brain?
T.J.W. is currently funded by the Network of European Funding for Neuroscience Research (PCIN-2014-115-ERA-NET NEURON II) and is a former European Commission Marie Curie IEF fellow (FP7-PEOPLE-2011-IEF-300354). Research in J.L.G.-P.'s lab is supported by the European Commission (CICE-FEDER-P09-CTS-4980, CICE-FEDER-P12-CTS-2256, Plan Nacional de I+D+I 2008-2011 and 2013-2016: FIS-FEDER-PI11/01489 and FIS-FEDER-PI14/02152, PCIN-2014-115-ERA-NET NEURON II), the European Research Council (ERC-Consolidator ERC-STG-2012-233764), by an International Early Career Scientist grant from the Howard Hughes Medical Institute (IECS-55007420) and by The Wellcome Trust University of Edinburgh Institutional Strategic Support Fund (ISFF2). Research in I.R.A.’s lab is supported by a Medical Research Council Human Genetics Unit intramural programme grant.
The authors declare no competing or financial interests.