Since the final decades of the last century, twin studies have made a remarkable contribution to the genetics of human complex traits and diseases. With the recent rapid development of high-throughput genetic and genomic technologies, twin modelling is expanding from the analysis of diseases to molecular phenotypes in functional genomics, especially in epigenetics, a thriving field of research that concerns the environmental regulation of gene expression through DNA methylation, histone modification, microRNA and long non-coding RNA expression, etc. The application of the twin method to molecular phenotypes offers new opportunities to study the genetic (nature) and environmental (nurture) contributions to epigenetic regulation of gene activity during developmental, ageing and disease processes. Besides the classical twin model, the case co-twin design using identical twins discordant for a trait or disease is becoming a popular and powerful design for epigenome-wide association studies, linking environmental exposure to differential epigenetic regulation and to disease status while controlling for individual genetic make-up. Novel uses of twin methods in epigenetic studies can be expected to help unravel the genetic and environmental basis of epigenomics in human complex diseases.
Human twins are produced when one pregnancy results in two offspring. A pair of identical (monozygotic, MZ) twins develops when one oocyte is fertilized by a single sperm and the embryo splits into two, resulting in two genetically identical offspring. In contrast, a pair of fraternal (dizygotic, DZ) twins is produced when two oocytes are released at a single ovulation and fertilized by two different sperm at the same time, resulting in the same type of genetic relationship as ordinary siblings. The sharing of intrauterine and rearing environments, together with the genetic similarity and dissimilarity in twins, makes them unique subjects for genetic studies. For example, by modelling phenotypic covariance in identical and fraternal twins and assuming equal sharing of the rearing environment, the classical twin design is able to decompose observed phenotype variation in twins into genetic (additive, dominant) and environmental (common and unique) components, enabling estimation of the genetic contribution to phenotype development or diseases (heritability) without knowledge of the individual genotypes. Although there can be critical flaws in the assumptions (Rijsdijk and Sham, 2002), the classical twin method has made a remarkable contribution to the literature on human genetics in the past century, especially in the last few decades.
A significant change in 21st century genetics will be the shift from structural genetic variations, with genes regarded as a static concept, to functional genomics, where the dynamic patterns of gene activity are analysed jointly from gene interaction to gene regulation and functional genomics analysis (Peltonen and McKusick, 2001). Epigenetics is the study of changes in the regulation of gene activity and expression that are not dependent on gene sequence. In a broad sense, the epigenetic control over gene activity involves multiple molecular mechanisms including binding of small molecules to specific sites in DNA or chromatin, non-coding RNAs (ncRNAs), microRNAs, etc., all of which serve as ‘volume controls’ that tune up or down a gene's expression. Of the different mechanisms, DNA methylation is the most robust form of epigenetic modification readily measurable using high-throughput techniques. Although current techniques allow epigenetic profiling at the genome scale, identifying the epigenetic patterns in disease conditions and under given environmental exposures imposes new challenges both in experimental design and in methodological issues.
Like any complex phenotype, the epigenetic control of gene expression activity can be influenced by both genetic and environmental factors. In fact, recent research has shown that the impact of the environment can act through the epigenome (Fraga et al., 2005; Wong et al., 2005; Poulsen et al., 2007; Szyf et al., 2008; Ling and Groop, 2009), an area of active research in complex disease studies, including cancer. With the rapid development in epigenomic analysis using next-generation sequencing or array-based technologies, the newly emerging field of epigenetic epidemiology is serving as a bridge linking gene activity with environmental conditions. In this regard, twins are useful study subjects that can help to assess the genetic and environmental influences on epigenetic regulation (Tan et al., 2013).
Twin methods for epigenetic studies
The classical twin design
In the classical twin design, the observed resemblance in MZ twins versus DZ twins allows the decomposition of the phenotypic variation into that due to additive genetic (A), common or shared environmental (C), and unique environmental or residual (E) influences (Fig. 1), with A resulting from the sum of allelic effects across multiple genes; C from environmental influences shared by twin members, such as prenatal condition, home environment and family socioeconomic status; and E from influences that are unshared or unique, such as personal lifestyle and experiences, stochastic biological effects, as well as measurement error. Given that MZ twins share all their genes, while DZ twins share on average half of their segregating genes, and assuming that environmental influences are shared to the same extent by MZ and DZ twins (equal environment assumption, Fig. 1), the classical twin method is able to estimate the different variance components using phenotype data on twins. The genetic and environmental sharing in twins means that any phenotypic correlation between MZ twins (rMZ) can be attributed to their shared genetic make-up and their shared environmental influences, i.e. rMZ = A + C, with A, C and E expressed as proportions of the total phenotypic variance. Likewise, any observed correlation between DZ twins is also due to their shared genetic (50% on average) as well as environmental influences, with rDZ = 0.5A + C. The variance of each twin is due to their genetic make-up, shared environmental influences and residual factors unique to each twin, i.e. A + C + E. Structural equation modelling (SEM) can be applied to estimate the A, C and E components, i.e. the ACE model, while adjusting for the effect of covariates (e.g. age), providing confidence intervals for parameter estimates and reporting how well the model fits the data. By calculating the Akaike information criterion (AIC) (Akaike, 1974), the performance of the full ACE model can be compared with that of its nested models, including the AE model (dropping the C component), the CE model (dropping the A component) and the E model (dropping the A and C components). This enables selection of the best fitting and most parsimonious model for a given set of data. Instead of the popular ACE model, one can also fit an ADE model, with D standing for the dominant genetic effect. The presence of dominance can be recognized when rDZ is much lower than half of rMZ. Likewise, nested models (AE, DE and E) can be fitted and their performances compared to find the best fitting model.
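As a rough illustration of the relations rMZ = A + C and rDZ = 0.5A + C given above, the two equations can be solved directly for the standardized components. The sketch below is a simple moment-based calculation, not the SEM maximum-likelihood fit described in the text; the function names and the input correlations are hypothetical and chosen only for illustration.

```python
def falconer_ace(r_mz, r_dz):
    """Solve rMZ = A + C and rDZ = 0.5*A + C, with A + C + E = 1."""
    a = 2.0 * (r_mz - r_dz)   # additive genetic share of variance
    c = 2.0 * r_dz - r_mz     # shared (common) environment share
    e = 1.0 - r_mz            # unique environment plus measurement error
    return {"A": a, "C": c, "E": e}


def dominance_suspected(r_mz, r_dz):
    """True when rDZ is below half of rMZ, the pattern that points to an ADE model."""
    return r_dz < 0.5 * r_mz


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2.0 * n_params - 2.0 * log_likelihood


# Illustrative twin correlations, not taken from any real data set.
estimates = falconer_ace(r_mz=0.6, r_dz=0.4)
print(estimates)
print(dominance_suspected(0.6, 0.2))
```

Such point estimates carry no confidence intervals and cannot adjust for covariates, which is why a full SEM fit would normally be preferred in practice; the `aic` helper only shows how the criterion used to compare nested models is computed from a fitted log-likelihood and the number of free parameters.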
By treating epigenetic measurement such as the DNA methylation level at a CpG site as a quantitative phenotype, the classical twin model can be applied to estimate the genetic and environmental components in the epigenetic control of gene activity during, for example, human development and ageing, and in disease aetiology. Note that, in the ACE model, the C component reflects the common environment in their early lives from pregnancy to at least leaving home. In fact, the ACE model can be compared with its nested AE model to examine whether the shared early-life environmental conditions contribute to epigenetic modification later in life. In this case, the ACE model nicely fits the need for linking early-life environment with epigenetic status afterwards. The twin method can also be applied to epigenetic data collected longitudinally to look for dynamic patterns of genetic and early-life environmental regulation on epigenetic control over gene function during developmental and ageing processes. Besides the univariate twin model, multivariate approaches can also be applied to multiple genomic loci, e.g. CpG sites, to analyse whether they are co-regulated genetically and/or environmentally to give a deeper insight into the interactive epigenome.
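To make the idea of treating a CpG methylation level as a quantitative phenotype concrete, the following sketch simulates standardized methylation values for MZ and DZ pairs at a single hypothetical CpG site and recovers the variance components from the twin correlations. Everything here is an assumption for illustration: the data are synthetic, the helper names are invented, and the moment-based estimates stand in for the SEM fit described in the text.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def twin_correlation(pairs):
    """Double-entry correlation: each pair is entered in both orders,
    so the arbitrary labelling of twin 1 and twin 2 does not matter."""
    x = [p[0] for p in pairs] + [p[1] for p in pairs]
    y = [p[1] for p in pairs] + [p[0] for p in pairs]
    return pearson(x, y)


def simulate_pairs(n_pairs, a, c, genetic_corr, rng):
    """Synthetic pairs with variance shares a (genetic), c (shared env.), 1-a-c (unique).
    genetic_corr is 1.0 for MZ pairs and 0.5 for DZ pairs."""
    e = 1.0 - a - c
    pairs = []
    for _ in range(n_pairs):
        shared_g = rng.gauss(0.0, 1.0)
        shared_c = rng.gauss(0.0, 1.0)
        values = []
        for _twin in range(2):
            g = (math.sqrt(genetic_corr) * shared_g
                 + math.sqrt(1.0 - genetic_corr) * rng.gauss(0.0, 1.0))
            values.append(math.sqrt(a) * g + math.sqrt(c) * shared_c
                          + math.sqrt(e) * rng.gauss(0.0, 1.0))
        pairs.append((values[0], values[1]))
    return pairs


def cpg_variance_components(mz_pairs, dz_pairs):
    """Moment-based A, C, E estimates for one CpG site."""
    r_mz = twin_correlation(mz_pairs)
    r_dz = twin_correlation(dz_pairs)
    return {"rMZ": r_mz, "rDZ": r_dz,
            "A": 2.0 * (r_mz - r_dz), "C": 2.0 * r_dz - r_mz, "E": 1.0 - r_mz}


rng = random.Random(1)  # fixed seed so the synthetic example is reproducible
mz = simulate_pairs(5000, a=0.5, c=0.2, genetic_corr=1.0, rng=rng)
dz = simulate_pairs(5000, a=0.5, c=0.2, genetic_corr=0.5, rng=rng)
estimates = cpg_variance_components(mz, dz)
print({k: round(v, 2) for k, v in estimates.items()})
```

In an epigenome-wide setting the same calculation would be repeated for every CpG site, and the resulting estimates would need to be interpreted with the sampling error of twin correlations in mind.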
The case co-twin design
The case co-twin design collects twin pairs discordant for a disease or trait and tries to link disease discordance with differential environmental exposure (Fig. 2). While the case co-twin design can be applied to either MZ or DZ twins, MZ pairs are preferable because they are genetically identical, so that discordance within pairs must be of environmental origin. By focusing on identical twins, the case co-twin design is especially useful in epigenetic studies, as one of the main tasks in these studies is to find environmental exposures that are associated with the observed epigenetic changes linked to disease status. Here, the healthy co-twins serve as an ideal control group. MZ twin pairs share the same genetic composition, and they may also share a common rearing environment during their childhood and adolescent years. As such, MZ pairs are perfectly matched on a multitude of known (genetics, age) and unknown potential confounding factors. This means that the case co-twin design is deemed to have higher power than the ordinary case-control design used in most of the current epigenome-wide association studies.
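Because each affected twin is matched to a healthy co-twin, the natural analysis unit in this design is the within-pair difference. A minimal sketch of a paired t statistic for one CpG site is given below; the methylation values are made up for illustration, the function name is hypothetical, and the paired t-test is only one of several paired methods that could be used here.

```python
import math


def paired_t(case_values, cotwin_values):
    """Paired t statistic on within-pair differences (case minus healthy co-twin).
    Returns the mean difference, the t statistic and its degrees of freedom."""
    diffs = [c - h for c, h in zip(case_values, cotwin_values)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)  # sample variance
    t_stat = mean_d / math.sqrt(var_d / n)
    return mean_d, t_stat, n - 1


# Made-up methylation levels at one CpG site for five discordant MZ pairs.
case = [0.62, 0.58, 0.71, 0.66, 0.60]
cotwin = [0.55, 0.54, 0.65, 0.61, 0.57]
mean_d, t_stat, df = paired_t(case, cotwin)
print(mean_d, t_stat, df)
```

In an epigenome-wide association study the statistic would be computed at every CpG site and the resulting p-values corrected for multiple testing; because genetic background is identical within each MZ pair, it cancels out of every difference.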
Quantitative trait loci (QTL) mapping using dizygotic twins
Current array-based high throughput genotyping techniques, such as Illumina and Affymetrix and, more recently, next-generation sequencing, are revolutionizing the way we design and conduct genetic epidemiological studies. High resolution genomic analysis enabled by the high density and informative SNP markers offers researchers efficient tools to map QTL linked to human complex phenotypes including molecular phenotypes such as gene expression activity and DNA methylation levels.
DZ twin pairs are in fact ordinary siblings. By estimating phenotype-dependent allele-sharing identical-by-descent (IBD) within a pair, the non-parametric linkage analysis can be performed using variance component analysis (Kruglyak and Lander, 1996). Similar to disease phenotypes, the measured molecular phenotypes (e.g. DNA methylation levels) can be regarded as a quantitative trait on which non-parametric linkage analysis can be conducted. We emphasize that linkage scans using DZ twins take advantage of the fact that DZ twins are matched for pre-natal and shared rearing environmental factors and perfectly matched for their ages. These factors could affect the measured epigenetic status and thus need to be adjusted in the linkage analysis. The twin-based QTL linkage mapping gains further advantage from the low probability of non-paternity. Non-paternity can result in a biased estimation of IBD probability, leading to incorrect linkage results in non-parametric linkage analysis (MacGregor et al., 2000). In fact, DZ twins have been used in linkage mapping of gene expression QTLs (De Moor et al., 2007; Livshits et al., 2007; Perola et al., 2007; O'Connor et al., 2008). Similarly, methods using sib-pairs (i.e. a pair of siblings) for association mapping such as S-TDT (Spielman and Ewens, 1998) and QTDT (Abecasis et al., 2000; Ewens et al., 2008) apply to molecular data on DZ twins as well.
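The logic of sib-pair linkage on a quantitative molecular trait can be sketched with a Haseman-Elston-style regression, a simpler method than the variance component approach cited above and swapped in here purely for illustration. The squared within-pair trait difference is regressed on the proportion of alleles shared IBD at a marker; a negative slope suggests linkage. The simulation assumes a Gaussian QTL effect and independent residuals, and all names and parameter values are hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def haseman_elston_slope(trait_pairs, ibd_share):
    """Regress squared within-pair trait differences on IBD sharing."""
    sq_diff = [(y1 - y2) ** 2 for y1, y2 in trait_pairs]
    return ols_slope(ibd_share, sq_diff)


def simulate_dz_pairs(n_pairs, qtl_var, rng):
    """Synthetic DZ pairs: a QTL explains qtl_var of the trait variance,
    and the QTL effects of the two twins correlate with their IBD share."""
    resid_sd = math.sqrt(1.0 - qtl_var)
    pairs, ibd = [], []
    for _ in range(n_pairs):
        # IBD share of 0, 1/2 or 1 with probabilities 1/4, 1/2, 1/4.
        pi = rng.choice([0.0, 0.5, 0.5, 1.0])
        shared = rng.gauss(0.0, 1.0)
        twins = []
        for _twin in range(2):
            q = math.sqrt(pi) * shared + math.sqrt(1.0 - pi) * rng.gauss(0.0, 1.0)
            twins.append(math.sqrt(qtl_var) * q + resid_sd * rng.gauss(0.0, 1.0))
        pairs.append((twins[0], twins[1]))
        ibd.append(pi)
    return pairs, ibd


rng = random.Random(7)  # fixed seed for a reproducible synthetic example
pairs, ibd = simulate_dz_pairs(20000, qtl_var=0.5, rng=rng)
slope = haseman_elston_slope(pairs, ibd)
print(round(slope, 2))
```

Under these assumptions the expected slope is minus twice the QTL variance, so a clearly negative estimate indicates that pairs sharing more alleles IBD at the marker have more similar trait values. Real data would also need the adjustment for shared environmental factors discussed above.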
In fact, both linkage (Schadt et al., 2003; Morley et al., 2004; Monks et al., 2004) and association (Cheung et al., 2005) mapping have been used to look for genomic regions that are under epigenetic control of transcriptional activities through cis (local) or trans (distant) acting regulation. Although gene expression can be correlated with epigenetic status, direct application of linkage and association analysis to epigenetic phenotypes (DNA methylation, histone acetylation, ncRNA expression) should produce more efficient and specific results as the measured mRNA levels can be affected by multiple epigenetic mechanisms mentioned above.
Twins for epigenetic study of human diseases
Changes in gene expression resulting from global epigenetic regulation triggered by genetic, environmental and stochastic effects accumulated over time (Fraga et al., 2005) can contribute to the development of individual susceptibility to complex diseases (Poulsen et al., 2007; Gilbert, 2009; Grönniger et al., 2010; Holliday, 2010; Bocklandt et al., 2011), adding molecular evidence that nature and nurture are inextricably linked. Analysing the complex interactive relationship between gene and environment in the development of complex phenotypes poses a new challenge.
As mentioned above, identical twins are useful study subjects in epigenetic studies of complex phenotypes because of their perfectly matched genetic background (Fig. 2) (Petronis, 2006; Poulsen et al., 2007). Table 1 lists the published articles on epigenetic studies using identical twins discordant for diseases or traits. As shown in Table 1, the scope of diseases covered in these applications is still very limited. The current pattern of disease in developed and developing countries is dominated by non-communicable diseases such as cardiovascular diseases, diabetes and cancer, for which environmental factors play an important role in disease development and pathology. Given the complex nature of common diseases, the use of twin designs in epigenetic studies of complex diseases should be highly encouraged. The powerful case co-twin design, in combination with recent high-throughput techniques for epigenomic analysis, should help to identify important environmental risk factors that are behind the observed epigenetic alterations responsible for disease or health end points.
Twins for epigenetic studies of human ageing and development
Ageing is an inevitable process for everyone, but we each undergo it in our own time and at our own pace. Although genetics may initially determine how a person ages, the environment may come to play a greater role over time during the ageing process.
Studies have shown that early-life environmental conditions are important modifiers of the ageing process (Szyf, 2012; Szyf, 2009; Murgatroyd and Spengler, 2011). As the genetic regulation of the ageing process is not directly controllable, studying the environmental factors that retard or accelerate it can be of more practical impact. As the first step towards this, it is necessary to explore the relative importance of gene and environment, a typical topic in twin modelling. In ageing studies, the classical twin model has been applied to various ageing-related phenotypes including physical performance (Frederiksen et al., 2002) and cognitive ability (Greenwood et al., 2011; Reynolds et al., 2005) in older subjects. Tan et al. (2005) applied the twin method to gene expression data from older Danish twins and estimated heritability for the expression levels of the most active genes. Subsequently, Kaminsky et al. (2009) studied global DNA methylation profiles in MZ and DZ twins and reported highly significant epigenetic differences, suggesting both genetic and environmental influences on epigenetic regulation of gene expression. As a matter of fact, the impact of early-life milieu (including intrauterine life) on the epigenome has been explored in both animal and human studies with results linked to the development of stress (Murgatroyd and Spengler, 2011), obesity (Lillycrop and Burdge, 2011), diabetes (Fradin and Bougnères, 2011) and late-onset mental illness (Szyf et al., 2007; McGowan and Szyf, 2010). In a multicellular organism, many of the effects of differential gene expression in determining the structural and functional differentiation of cells arise during development.
Epigenetic modifications triggered by early-life events represent a plausible mechanism by which constant exposure to a certain early-life environment could be integrated into the epigenome to programme adult hormonal and behavioural responses, facilitating adaptation to changing environmental conditions through modification or alteration in gene activity. In this regard, identical twins discordant for birth weight are ideal subjects for studying the impact of an adverse environment in very early life on the epigenetic regulation of health at adult ages, as the genetic part of epigenetic control is cancelled out, enabling a pure association between environmental exposure and epigenetic status. Using a small sample of 17 pairs of MZ twins discordant for birth weight, Souren et al. (2013) reported no significant intra-pair differential DNA methylation pattern in a genome-wide analysis. We are currently working on whole-genome DNA methylation data from a relatively large sample of 150 pairs of identical twins discordant for birth weight. The observed epigenetic patterns in the discordant twins will be linked to metabolic phenotypes to look for epigenetic modification induced by birth weight discordance and its association with changes in metabolic profiles. Our results will also serve as a replication of the findings of Souren et al. (2013).
The ageing process is accompanied by various altered phenotypes that might be linked to epigenetic change. In the literature, a growing number of reports have demonstrated that DNA methylation is an active process in post-mitotic cells and that the epigenome matures continuously throughout life, with the effects of ageing emerging gradually over time as a result of exposure to a variety of environmental factors. For instance, Tan et al. (2008) reported age-dependent patterns in gene expression in the Utah CEPH (Centre d'Etude du Polymorphisme Humain) families. Although the observed age-associated change in gene expression can be induced by both epigenetic regulation and genetic alterations accumulated from DNA damage, it can be postulated that a considerable portion of the changes can be explained by epigenetic mechanisms. In another example, Fraga et al. (2005) reported more epigenetic similarity in younger compared with older MZ twins, which suggested epigenetic regulation induced through accumulated stimulation by internal and external factors including environmental exposure. By treating gene expression or DNA methylation levels as molecular phenotypes, the classical twin design can be applied at different ages to explore the age-dependent patterns in the genetic and environmental contribution to epigenetic modification of gene activity, which can be linked to ageing-related phenotypes (e.g. physical and cognitive decline) and diseases.
Most complex traits (diseases, molecular phenotypes) are under the control of both genetic and environmental factors through the interface of epigenetics. Although current techniques already allow high-throughput epigenetic profiling at the genome scale, our understanding of genetic and environmental influences on epigenetic regulation remains limited. With proper study design and analytical approaches, including classical twin models, linkage analysis and the case co-twin design, twin studies can help to identify novel epigenetic marks and to link them with environmental exposures, including early-life experiences, so that the impact of these exposures on epigenetic control over gene activity can be measured. Twin samples can be expected to make valuable new contributions to unravelling and understanding the epigenetic basis of the development of human complex diseases and traits.
The authors thank Prof. Torben Kruse and Mads Thomassen at Odense University Hospital for useful discussions.
This work was supported jointly by the European Foundation for the Study of Diabetes (EFSD) 2013 Programme for Collaborative Research between China and Europe; the Medical and Natural Sciences Research Grant of the Novo Nordisk Foundation (project no. 7493); and the Integrated research on DEvelopmental determinants of Aging and Longevity (IDEAL), an EU FP7 project (no. 259679).
The authors declare no competing financial interests.