ABSTRACT
Animal ‘personality’, defined as repeatable inter-individual differences in behaviour, is a concept in biology that faces intense controversy. Critics argue that the field is riddled with terminological and methodological inconsistencies and lacks a sound theoretical framework. Nevertheless, experimental biologists are increasingly studying individual differences in physiology and relating these to differences in behaviour, which can lead to fascinating insights. We encourage this trend, and in this Commentary we highlight some of the benefits of estimating variation in (and covariation among) phenotypic traits at the inter- and intra-individual levels. We focus on behaviour while drawing parallels with physiological and performance-related traits. First, we outline some of the confusion surrounding the terminology used to describe repeatable inter-individual differences in behaviour. Second, we argue that acknowledging individual behavioural differences can help researchers avoid sampling and experimental bias, increase explanatory power and, ultimately, understand how selection acts on physiological traits. Third, we summarize the latest methods to collect, analyse and present data on individual trait variation. We note that, while measuring the repeatability of phenotypic traits is informative in its own right, it is only the first step towards understanding how natural selection and genetic architecture shape intra-specific variation in complex, labile traits. Thus, understanding how and why behavioural traits evolve requires linking repeatable inter-individual behavioural differences with core aspects of physiology (e.g. neurophysiology, endocrinology, energy metabolism) and evolutionary biology (e.g. selection gradients, heritability).
Introduction
Anyone working with live animals can hardly dispute the fact that striking behavioural differences exist among individuals. This variation can dramatically, and often predictably, influence how individuals interact with their environment, and can affect the outcome of biotic interactions such as predation, competition, parasitism, cooperation and mate choice (Wolf and Weissing, 2012). However, it has taken over a century for researchers working on non-human animals to appreciate the magnitude as well as the ecological and evolutionary importance of behavioural variation. Similarly, research on inter-individual phenotypic variation (see Glossary), including consistent individual differences in physiological traits, endocrine responses and performance measures, has been sporadic throughout the last century (e.g. Tryon, 1942; Huntingford, 1976; Bennett, 1980; Stevenson-Hinde et al., 1980; Arnold and Bennett, 1984; Clark and Ehlinger, 1987; Wilson et al., 1993, 1994). Bennett's (1987) suggestion that physiological ecologists move away from the ‘tyranny of the golden mean’, and instead focus on understanding the functional basis and eco-evolutionary implications of performance differences among individuals, instigated a shift in research focus that has since been echoed across various fields in biology (Gould, 1985; Boake, 1989; Hayes and Jenkins, 1997; Kolok, 1999; Chown, 2001; Bolnick et al., 2003; Lloyd-Smith et al., 2005; Crawford and Oleksiak, 2007). This emphasis on individual-level variation has perhaps been most strongly felt in the field of behavioural ecology. Indeed, the past 10–15 years have witnessed an explosion of studies on animal ‘personality’ (e.g. Gosling, 2001, 2008; Sih et al., 2004b; Dingemanse and Réale, 2005; Réale et al., 2007, 2010).
However, many experimental biologists continue to treat inter-individual variation as statistical noise surrounding what they consider to be the evolutionarily important signal: the population (or treatment) mean (Careau et al., 2008; Williams, 2008) (Fig. S1). This oversight is unfortunate given that variation provides the raw material for natural selection, which operates at the level of the individual.
Phenotypic variation occurs both at the intra- and inter-individual level, with implications for a species' ecology and evolution. On the one hand, the ability of individuals to alter their phenotype across contexts (intra-individual variation or plasticity) provides the flexibility to respond to environmental change on short timescales. On the other hand, the consistency of phenotypic differences among individuals across time or contexts influences the efficiency with which natural selection can generate an adaptive response across generations. Understanding both the proximate [e.g. how is genetic architecture (see Glossary) linked to phenotypic variation] and the ultimate (e.g. how does natural selection act on phenotypic variation) causes of intra-specific variation requires that we address questions at the individual level (Box 1). Experimental biologists working at the interface of behaviour, physiology, endocrinology and evolutionary biology are actively needed to help elucidate these relationships.
Adjusted repeatability
Repeatability calculated after controlling for confounding effects, either as fixed or random factors. Assumes that the repeatability is constant for all values of the confounding factor (unlike conditional repeatability; refer to section VI in Nakagawa and Schielzeth, 2010).
Agreement repeatability (or simply, repeatability)
The proportion of phenotypic variance due to differences among individuals, denoted as R. A measure of the agreement in absolute measurements (McGraw and Wong, 1996; Nakagawa and Schielzeth, 2010), as opposed to relative measurements. Equivalent to the intra-class correlation coefficient (ICC) in the statistical literature, which is the expected correlation among trait measurements within groups or individuals (i.e. classes) (Sokal and Rohlf, 1995).
Animal personality
Consistent or repeatable inter-individual differences in behaviour across time and contexts (Réale et al., 2007). Variation among individuals in the intercept of their behavioural reaction norm (Dingemanse et al., 2010). Inter-individual variation in behaviour attributable to the combined influences of genetic effects and environmental effects that permanently affect the phenotype of an individual (Dingemanse and Araya-Ajoy, 2015).
Behavioural syndrome
Suites of behaviours that co-vary across contexts or situations (Sih et al., 2004a).
Behavioural type
Within a behavioural syndrome (e.g. boldness, aggressiveness), individuals have a behavioural type (e.g. shy versus bold, more versus less aggressive) (Sih et al., 2004b).
Conditional repeatability
An estimate of repeatability at a given value of a fixed factor. Conditional repeatability changes across levels of a fixed factor, unlike adjusted repeatability (Nakagawa and Schielzeth, 2010; Biro and Stamps, 2015). Not discussed in this Commentary.
Coping style
Characterizes the behavioural and physiological responses of individuals to a stressful situation (Koolhaas et al., 1999). Often used as a synonym of animal personality. ‘Proactive’ animals typically exhibit strong responses to stressful stimuli, whereas ‘reactive’ animals respond more passively.
Covariate
A discrete or continuous variable included as a predictor (i.e. independent variable) in a model.
Covariation
Occurs when variation in one trait is correlated with variation in another.
Genetic architecture
The underlying genetic basis of a phenotypic trait.
Minimum metabolic rate (MRmin)
The minimum maintenance metabolism of a resting, non-digesting animal; typically referred to as basal metabolic rate (BMR) in homeotherms and standard metabolic rate (SMR) in poikilotherms (Chabot et al., 2016).
Phenotypic variance (or variability, or variation)
The statistical variance across the values of a phenotypic trait measured in a population.
Reaction norm
A function describing the relationship between the phenotype (e.g. behaviour, metabolism, locomotor performance) and the environment or time.
Selection gradient
A measure of a given trait's effect on relative fitness when the effects of all other measured traits are held constant.
Variable centring
Expressing the observation of a continuous predictor variable as a deviation from its mean value across all observations. Achieved by subtracting the mean from all observations.
Variance component
Variation (in the phenotype) associated with a random factor, such as individual or territory identity (Dingemanse and Dochtermann, 2013).
Here, we use hypoxia tolerance to provide concrete examples for each question, since it is a phenotypic trait commonly studied by comparative physiologists, and it can affect behaviour (e.g. Domenici et al., 2007, 2013; Pollock et al., 2007).
1. Do individuals differ in their average phenotype (e.g. are there inter-individual differences in a given trait such as tolerance to hypoxia)?
2. How much variance in a given phenotypic trait occurs within individuals (e.g. do individuals vary in their tolerance to hypoxia from day to day, or among subsequent observations)?
3. Do individuals exhibit differences in phenotypic plasticity (e.g. do individuals differ in their metabolic response to increasing hypoxia)?
4. Is there a relationship between an individual's mean phenotype and its phenotypic plasticity for that same phenotype (e.g. is there a correlation between an individual's average activity and how its activity is affected across different oxygen levels)?
5. Are two or more phenotypic traits correlated at the inter-individual level [e.g. do individuals with generally high tolerance to hypoxia have a lower minimum metabolic rate (MRmin; see Glossary)]?
6. Are two or more phenotypic traits correlated at the within-individual level (e.g. if, on a given day, an individual has a higher tolerance to hypoxia compared with its own average, does it also have a lower MRmin relative to its own average)?
7. Is phenotypic plasticity in two or more traits correlated at the inter-individual level (e.g. do individuals showing the greatest decrease in activity in response to decreasing oxygen levels also show the greatest decrease in metabolic rate in response to the same decrease in oxygen levels)?
8. Are there inter-individual differences in intra-individual variance (i.e. do individuals differ in their ‘predictability’; see Fig. 2H)?
Patterns of intra- and inter-individual variation described in questions 1–4 are illustrated theoretically in Fig. 1 and can be answered by means of univariate linear mixed-effects models (LMMs; see ‘Measuring repeatability’). Patterns of intra- and inter-individual covariation (see Glossary) described in questions 5–7 can be answered using multivariate LMMs (see ‘Measuring intra- and inter-individual correlations’). Question 8 can be answered using double-hierarchical generalized LMMs (Cleasby et al., 2015). Examples and further details regarding these and other questions can be found in Dingemanse and Dochtermann (2013) and Metcalfe et al. (2016).
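The distinction between correlations among individuals and correlations within individuals (questions 5 and 6) can be made concrete with a rough sketch. The Python code below is only an illustration: the data are invented, the function names are hypothetical, and the naive two-step partition (correlating individual means, then correlating deviations from those means) is a stand-in for, not an implementation of, the multivariate LMMs recommended in the text. Correlations of individual means in particular remain contaminated by within-individual variance when few repeats are available.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length lists (no zero-variance guard)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partition_correlation(records):
    """Return (among-individual r, within-individual r).

    records maps an individual ID to a list of (trait1, trait2) pairs,
    each pair measured on the same occasion.
    """
    means1, means2, dev1, dev2 = [], [], [], []
    for pairs in records.values():
        m1 = mean(p[0] for p in pairs)
        m2 = mean(p[1] for p in pairs)
        means1.append(m1)
        means2.append(m2)
        # Deviations of each observation from the individual's own mean
        dev1.extend(p[0] - m1 for p in pairs)
        dev2.extend(p[1] - m2 for p in pairs)
    return pearson(means1, means2), pearson(dev1, dev2)


# Invented toy data: (hypoxia tolerance, MRmin) on three days per individual
toy_records = {
    "A": [(10, 5.0), (12, 6.0), (11, 5.5)],
    "B": [(20, 3.0), (22, 4.0), (21, 3.5)],
    "C": [(30, 1.0), (32, 2.0), (31, 1.5)],
}
r_among, r_within = partition_correlation(toy_records)
print(f"among-individual r = {r_among:.2f}, within-individual r = {r_within:.2f}")
```

In this toy data set the two traits are negatively correlated among individuals yet positively correlated within individuals, a pattern that a single pooled correlation would obscure.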
We believe that experimental biologists should consider measuring individual-level behavioural variation because it can (1) help avoid important biases when studying whole-animal physiology and performance, and (2) have an underlying physiological basis that warrants investigation. Our objective in this Commentary is to highlight key areas of interest and direct the reader to the relevant literature. First, we outline some of the confusion surrounding the terminology in the field of animal personality. Second, we argue that explicitly acknowledging individual behavioural differences can help researchers avoid sampling and experimental bias, increase explanatory power and, ultimately, help us to understand how selection acts on physiological traits. Third, we present what we believe are the best practices for collecting, analysing and presenting data on individual differences, whether these are differences in behaviour, physiological or performance traits.
What is (not) animal personality?
Few concepts in the biological sciences currently face as much controversy and lack of consensus over their usefulness as animal personality. Everything from startle responses in sea anemones to social dominance in chimpanzees has been characterized under the personality moniker (Freeman and Gosling, 2010; Stamps et al., 2012). Although animal personality is now engrained as a mainstream concept in behavioural ecology, the field has its share of critics. Many researchers ignore or dismiss animal personality as a useful concept, perhaps because the field is mostly theory-driven but without a strong conceptual framework (David and Dall, 2016), suffers from a lack of empirical studies (DiRienzo and Montiglio, 2015) and is laden with terminological inconsistencies (Réale et al., 2007; Carter et al., 2013). For example, terms such as temperament, behavioural syndrome, behavioural type and coping style (see Glossary) are often used interchangeably with personality (Réale et al., 2010). For the non-specialist, navigating this breadth of literature and semantics, as well as determining when behaviours should not be considered as personality traits, can be daunting, and may contribute to the dismissive attitude of many experimental biologists regarding the importance of individual behavioural variation.
Despite these criticisms, proponents argue that personality research has greatly improved our understanding of how and when behavioural differences within populations might be adaptive (i.e. result from natural selection; Réale et al., 2007; Wolf and Weissing, 2012). Indeed, studying inter-individual variation in behaviour can be highly informative when addressing questions of interest to experimental biologists (Boxes 1 and 2). Here, we highlight key elements to consider when studying inter-individual differences in behaviour.
Studies at the individual level are extremely useful to identify the functional basis of, or relationships among, traits (Bennett, 1987). Whereas comparative studies traditionally examine trait (co)variation at the population or species level, the objective of an individual-level approach is to identify significant correlations between two or more traits at the individual level. This is a very effective first step towards identifying potential functional associations among complex traits, and should be followed up by experimentation on the ecological, mechanistic, genetic and/or adaptive nature of the relationships found.
Many ecologically and evolutionarily relevant traits are measured at the whole-organism level, allowing links to be made with lower-level processes (e.g. organ, tissue and cellular mechanisms) as well as processes occurring at higher levels of biological organisation (e.g. population or community dynamics). Therefore, investigating individual variation facilitates bridging key topics of interest in experimental biology, including biochemistry, physiology, morphology, endocrinology, neurobiology, organismal performance, behaviour, life-history strategies and population ecology (see Bennett, 1987). For example, a very active area of interdisciplinary research involving individual behavioural differences is on potential linkages with energy metabolism (Careau et al., 2008; Biro and Stamps, 2010; Careau and Garland, 2012; Mathot and Dingemanse, 2015). Several other studies have established conceptual links between personality and key topics of interest in ecology and evolutionary biology (see table below).
Topic | Example references |
---|---|
Social dominance | David et al., 2011 |
Mate choice | Schuett et al., 2010 |
Habitat or space use | Boon et al., 2008; Boyer et al., 2010 |
Dispersal | Cote et al., 2010 |
Social foraging | Kurvers et al., 2010 |
Social networks | Pike et al., 2008 |
Cognition | Morton et al., 2013; but see Griffin et al., 2015 |
Parasitism | Barber and Dingemanse, 2010 |
Life-history strategies | Biro and Stamps, 2008; Réale et al., 2010 |
Ontogeny | Stamps and Groothuis, 2010; Wilson and Krause, 2012 |
Performance | Careau and Garland, 2012; Videlier et al., 2014 |
Stress response | Cockrem, 2007; Øverli et al., 2007 |
Immune function | Zylberberg et al., 2014; Lopes, 2016 |
Oxidative stress | Costantini et al., 2008 |
Wildlife conservation | McDougall et al., 2006; Conrad et al., 2011; Killen et al., 2016 |
Considering the vast potential for exploring other exciting research avenues, some experimentalists will naturally be tempted to revisit classic research topics in experimental biology (e.g. motor control, blood flow, tolerance to hypoxia) from an individual perspective. However, in doing so, researchers should strive to synthesize existing theory and provide empirical support for the new conceptual advances they propose (see DiRienzo and Montiglio, 2015). In general, most of the new conceptual links listed in the table have a theoretical basis but all are in dire need of further empirical support.
Broad-sense versus narrow-sense personality
The field of animal personality seeks to characterize repeatable differences among individuals in one or more behaviours, and, ultimately, to understand the functional basis and/or the ecological and adaptive significance of such consistent variation. Broadly speaking, the term personality could include any observable behaviour found to be consistently different between individuals (broad-sense personality; Réale et al., 2010) and likely to have ecological consequences. In this case, there is no reason to use the term ‘personality’ as opposed to ‘repeatable behaviour’ (Careau and Garland, 2012). One clear disadvantage of the broad-sense personality approach is the contention over how consistent or repeatable inter-individual differences in behaviour must be over time for the term ‘personality’ to apply (Carter et al., 2013; Koski, 2014). For example, if 25% of the phenotypic variance in a given behaviour is attributable to inter-individual differences, then the most precise way to capture this information is to say that the repeatability (R) of this behaviour is R=0.25 (see ‘Measuring repeatability’). It would seem counter-productive to then engage in a debate about whether this behaviour in this population represents personality or not because it is repeatable at R=0.25.
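To make the R=0.25 example tangible, repeatability can be written as the among-individual variance divided by the sum of the among- and within-individual variances. The Python sketch below estimates these variance components with a classical one-way ANOVA, which is a simpler technique swapped in for the LMM-based approach described under ‘Measuring repeatability’. The function name and the data are hypothetical, and the toy values were chosen for easy arithmetic rather than realism.

```python
from statistics import mean


def anova_repeatability(measurements):
    """Estimate repeatability from a one-way ANOVA.

    measurements maps an individual ID to a list of repeated measures.
    Returns (R, among-individual variance, within-individual variance).
    """
    groups = list(measurements.values())
    k = len(groups)                          # number of individuals
    n_total = sum(len(g) for g in groups)    # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective number of measures per individual (handles unbalanced designs)
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    v_among = max((ms_among - ms_within) / n0, 0.0)  # truncate negatives at zero
    v_within = ms_within
    total = v_among + v_within
    r = v_among / total if total > 0 else float("nan")
    return r, v_among, v_within


# Invented toy data: distance moved in three open-field trials per individual
toy_measurements = {
    "A": [10, 12, 11],
    "B": [14, 15, 16],
    "C": [9, 8, 10],
    "D": [13, 12, 14],
}
R, V_among, V_within = anova_repeatability(toy_measurements)
print(f"R = {R:.2f}")  # 19/22, about 0.86, for these toy values
```

Reporting the estimate itself (together with its uncertainty, which this sketch does not compute) conveys more than a verdict on whether the behaviour qualifies as personality.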
The narrow-sense definition of personality is restricted to behaviours measured repeatedly using tests that are standardized across individuals (e.g. open-field test, mirror test, use of refuge under simulated predation risk; Table 1). In this case, measured personality traits typically include activity, exploration, boldness, aggressiveness and sociability (Réale et al., 2007, 2010). Which traits are assessed depends on the experimental conditions under which they are assayed (familiar, novel, safe and/or risky environments; Table 1). However, some authors note that measuring a narrow range of behaviours under such artificial conditions might fail to capture relevant variation shaped by natural selection, or simply have no bearing on fitness-enhancing behaviours in the natural world (Koski, 2014; Niemelä and Dingemanse, 2014). Thus, an important area of research in behavioural ecology is to validate whether behaviour measured under standardized (but artificial) conditions underlies ecological or fitness-relevant behaviour and life-history traits in nature, such as foraging, mating, parental care, agonistic interactions and dispersal (e.g. Herborn et al., 2010; Cole and Quinn, 2012; Dammhahn and Almeling, 2012; Niemelä et al., 2015).
Personality and plasticity
A common misconception among researchers is that the presence of personality implies the absence of phenotypic plasticity in behaviour. However, individuals can differ consistently both in their average behaviour (i.e. personality) and in their degree of behavioural plasticity (Fig. 1). Hence, phenotypic variation encompasses both among (inter)- and within (intra)-individual variation. Moreover, personality and plasticity are often correlated (e.g. passive mice adjust their levels of aggression according to social context but aggressive mice do not show context-dependent modulation of their behaviour; Natarajan et al., 2009; see Fig. S2C,D). Thus, the responsiveness of an individual to situational changes (i.e. if and how an individual adjusts its behaviour to the environment) is, itself, an individual trait. Recent evidence suggests that neural plasticity may underlie the physiological (endocrine) and molecular/genetic basis for plasticity versus rigidity in behavioural responses, especially those related to stress (reviewed in Sørensen et al., 2013).
To study behavioural consistency and plasticity, behavioural ecologists have embraced the concept of behavioural reaction norms (BRNs) (Dingemanse et al., 2010) (Fig. 1). The idea is similar to a genotype×environment interaction (G×E) in evolutionary quantitative genetics (Nussey et al., 2007), but instead applied at the individual×environment level (I×E). Once behavioural variation is described as a BRN with an intercept (mean-level individual behaviour, indicative of personality) and a slope (behavioural response to context, indicative of phenotypic plasticity), it becomes possible to examine how personality and phenotypic plasticity are (1) correlated, (2) under selection, and (3) proximally linked through shared underlying mechanisms (Dingemanse et al., 2010; Brommer, 2013; Dingemanse and Dochtermann, 2013). We detail the general reaction norm approach (see Glossary) and provide worked examples with R code in the section ‘Measuring repeatability’.
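The intercept-and-slope description of a BRN can be illustrated with a minimal Python sketch. It is not the authors' R code, and it does not fit the random-regression mixed model that a formal analysis would use: it simply fits a separate least-squares line to each individual after centring the environmental covariate on its overall mean (see ‘Variable centring’ in the Glossary), so that each intercept is the individual's expected behaviour in the average environment. The function name, the oxygen levels and the activity values are all invented for illustration.

```python
from statistics import mean


def fit_reaction_norms(data):
    """Fit one least-squares line per individual on a mean-centred covariate.

    data maps an individual ID to a list of (environment, behaviour) pairs.
    Returns a dict mapping each ID to (intercept, slope).
    """
    # Centre the covariate on the mean across all observations
    env_mean = mean(e for obs in data.values() for e, _ in obs)
    norms = {}
    for ind, obs in data.items():
        x = [e - env_mean for e, _ in obs]
        y = [b for _, b in obs]
        mx, my = mean(x), mean(y)
        sxx = sum((xi - mx) ** 2 for xi in x)
        slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        intercept = my - slope * mx   # behaviour at the average environment
        norms[ind] = (intercept, slope)
    return norms


# Invented toy data: (oxygen level in % air saturation, activity score)
toy_brn_data = {
    "a": [(20, 6), (60, 10), (100, 14)],   # active and highly plastic
    "b": [(20, 4), (60, 6), (100, 8)],     # intermediate
    "c": [(20, 2), (60, 2), (100, 2)],     # inactive and not plastic
}
reaction_norms = fit_reaction_norms(toy_brn_data)
for ind, (intercept, slope) in reaction_norms.items():
    print(ind, round(intercept, 3), round(slope, 3))
```

In these toy data, individuals with higher intercepts also have steeper slopes, which is the kind of personality–plasticity correlation described above. A mixed model estimates the same quantities jointly and accounts for the uncertainty in each individual's line, which this two-step shortcut ignores.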
Why should experimental biologists care about animal personality?
Studying the adaptive evolution of complex traits (e.g. behaviour, metabolic rate) requires the use of techniques and tools developed by quantitative geneticists (Falconer and Mackay, 1996; Lynch and Walsh, 1998). These tools were developed to measure individual variation (repeatability), transmission of traits from parents to offspring and the strength of natural selection. The advent of personality studies that use approaches from quantitative genetics to link repeatable individual differences in behaviour with core aspects of evolutionary biology [e.g. selection gradients (see Glossary), heritability] has led behavioural ecologists closer to understanding how behaviour evolves (Dingemanse and Réale, 2005). Help from experimental biologists is particularly needed to identify the mechanisms underlying context-dependent correlations between personality and other traits such as metabolic rate, life-history traits and immune function (Box 2). Additionally, experimentalists who familiarize themselves with approaches used in quantitative genetics and personality research will be better positioned to study the ecological and evolutionary consequences of individual variation in physiology, performance and other complex traits (e.g. hypoxia tolerance, stress response). In fact, early examples already demonstrate how physiological and performance traits can be studied from a quantitative genetics perspective (Garland, 1988, 1994; Dohm et al., 2001). In addition to helping researchers adopt new, interdisciplinary approaches to answer evolutionary and ecological questions of interest (Box 2), there are several practical reasons why experimental biologists should consider personality differences among individuals in their research, which we discuss in detail below.
To control for population sampling bias (including domestication)
Most statistical models assume that the study sample represents a random selection of individuals from the population; therefore, conclusions drawn from a biased study sample might not reflect the range of responses observed in nature or apply to the population as a whole. Sampling biases are likely when domesticated laboratory populations are used to address questions about behavioural or physiological trait variation. For example, laboratory conditions rapidly select for certain behavioural and physiological traits as a result of domestication (Lacy et al., 2013). This probably reduces trait variation and inter-individual differences compared with those of wild populations. While laboratory populations are extremely useful for answering mechanistic or process-oriented questions, an even more powerful approach is to seek cross-context validation by combining laboratory and field measurements when feasible (see Niemelä and Dingemanse, 2014).
Sampling biases might also occur in experiments that use wild-caught animals, because some individuals are inherently more ‘catchable’ or ‘trap-happy’ than others (Biro and Dingemanse, 2009; Carter et al., 2012). Thus, differences in the propensity of wild animals to enter traps or take bait select for certain behavioural types (e.g. Wilson et al., 1993; Garamszegi et al., 2009; Carter et al., 2012; Stuber et al., 2013; Diaz Pauli et al., 2015; Niemelä et al., 2015). This is probably also true of physiological and/or performance traits, although research on this question is lacking (but see Killen et al., 2015). Researchers can reduce these biases by selecting or combining capture methods that are likely to sample different behavioural strategies or physiological abilities (Biro and Dingemanse, 2009). Where possible, researchers should also observe the various responses of individuals to their capture techniques in order to identify those that are particularly susceptible to capture, and modify their techniques accordingly. Incorporating a quantitative measure of individual ‘catchability’ during this process may also allow researchers to evaluate the distribution of the different behaviours or strategies used by individuals, and see how such differences relate to performance or fitness measures obtained in subsequent experiments.
To estimate personality-related measurement bias in physiological measurements
Individuals often exhibit dramatically different behavioural responses to novel or stressful situations such as handling or introduction to an experimental apparatus. This can lead to differences in the time required to resume normal behaviour (Koolhaas et al., 1999; Carere and van Oers, 2004; Øverli et al., 2007) or the physiological response to a subsequent treatment. These differences can introduce personality-related bias into physiological measurements, as exemplified by the potential influence of personality differences on minimum metabolic rate (MRmin) estimates obtained using respirometry (Hayes et al., 1992; Careau et al., 2008). This can be particularly important when experimenters rely on the shape of the metabolic curve (metabolism–time) and the presence or absence of activity to determine MRmin. In this case, it is possible that MRmin measurements are inflated by stress for some, but not all, individuals. Such a bias can arise because ‘reactive’ individuals tend to respond to handling and/or novel situations with intense, non-motor behavioural responses, which can persist over long periods of time (Koolhaas et al., 1999). Therefore, while periods of inactivity might truly correspond to resting periods for ‘proactive’ behavioural types, this might not be the case for reactive individuals. Identifying whether inactivity is indicative of resting by reactive animals is difficult when acclimation periods are short because this reaction might correspond to an intense and prolonged stress response (Careau et al., 2008).
Although personality-related measurement bias seems likely, there is little empirical evidence on the extent of this problem. However, ignoring obvious differences in individuals' behavioural responses to experimental procedures will introduce large, unexplained inter-individual variation in physiological or performance parameters. In turn, this variation can modulate treatment effects in experimental studies and/or correlations obtained between multiple traits (Killen et al., 2013). For example, MacKenzie et al. (2009) found that ‘proactive’ and ‘reactive’ carp (Cyprinus carpio) exhibit distinct and sometimes opposite patterns of gene expression under control conditions and in response to an immune challenge. Similarly, Rupia et al. (2016) characterized the behavioural and physiological stress responses of olive flounder, Paralichthys olivaceus. They found that flounder with ‘bold’ behavioural types responded to acute stress by increasing their metabolic rates, whereas ‘shy’ types decreased their metabolic rates relative to routine. In both cases, failing to account for the dramatic variation in the direction of the stress response between types would mask the effects of the stressor if the population were examined at large. This area of research is in urgent need of further investigation, and experimental biologists are ideally positioned to provide crucial insights via precise, concurrent experimental measurements of multiple behavioural and physiological processes, such as activity, breathing rate, body temperature and metabolic rate.
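The masking effect described above is simple arithmetic, sketched here in Python with invented numbers (they are not taken from Rupia et al., 2016, or any other study, and the function name is hypothetical). When two behavioural types respond to a stressor in opposite directions, the pooled mean change can be close to zero even though every individual responds strongly.

```python
from statistics import mean


def stress_response_summary(responses):
    """Compare the pooled mean response with the mean response per type.

    responses is a list of (behavioural_type, change_in_metabolic_rate) pairs.
    Returns (pooled mean, dict of mean change per behavioural type).
    """
    pooled = mean(delta for _, delta in responses)
    by_type = {}
    for btype, delta in responses:
        by_type.setdefault(btype, []).append(delta)
    return pooled, {btype: mean(deltas) for btype, deltas in by_type.items()}


# Invented changes in metabolic rate (arbitrary units) after an acute stressor
toy_responses = [
    ("bold", 12), ("bold", 8), ("bold", 10),
    ("shy", -9), ("shy", -11), ("shy", -10),
]
pooled_change, change_by_type = stress_response_summary(toy_responses)
print("pooled:", pooled_change)
print("by type:", change_by_type)
```

With these toy values the pooled change is zero, whereas the two types change by ten units in opposite directions, so an analysis that ignored behavioural type would conclude that the stressor had no effect.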
To estimate personality-related measurement bias in performance measurements
Variation in motivation among individuals with different behavioural traits can result in underestimates of maximal performance, inflated error variance and biased repeatability (Losos et al., 2002; Adolph and Pickering, 2008; Careau and Garland, 2012; Jornod and Roche, 2015). To measure performance rather than behaviour, tests must therefore force animals to perform at their maximal level. Carefully designing such tests can involve: (1) using stimuli that increase motivation by targeting multiple senses (e.g. stimulate auditory and visual senses simultaneously to induce an escape response), (2) repeating performance trials on individuals to increase the accuracy of performance measurements, and (3) repeating series of trials at different times to quantify intra-individual variance using multilevel mixed-effects models (Cleasby et al., 2015) (Fig. S3).
Instead of focusing only on an individual's best performance, experimenters can gain much insight from retaining performance measurements across repeated trials. For example, some individuals may be consistently motivated to perform at their maximum level, whereas others may be more variable in their level of motivation, resulting in low and high intra-individual variance, respectively (Fig. S3). If these differences exist, then it becomes interesting to test whether intra-individual variation in performance and/or motivation correlates with behavioural and/or physiological traits (Careau and Garland, 2012). Retaining all repeated performance measures also allows experimenters to test whether individuals differ in how their performance changes across trials, which may be indicative of habituation, fatigue and/or training effects (Fig. S3).
How should we quantify animal personality?
Selecting the right test(s)
A major source of confusion in animal personality research is how personality traits are measured and defined (Carter et al., 2013). First, problems with how behaviours are measured include determining which personality traits are assessed by a given test. For instance, a single test might assess many traits (i.e. a one-to-many problem; an open field test can be used to measure both exploration and boldness) and different tests might assess the same personality trait (i.e. a many-to-one problem; exploration can be assessed using an open field or novel object test) (Carter et al., 2013) (Table 1). A second, related issue lies in the mislabelling of behaviours assessed in standard tests, resulting in the so-called ‘jingle-jangle fallacy’, where a single behavioural trait is given different labels in different studies (e.g. distance moved in an open field test is said to represent anxiety, boldness and exploration). Alternatively, different traits are labelled as the same (Gosling, 2001; Carter et al., 2013; Koski, 2014); for example, ‘boldness’ has been quantified as the response to a novel object, a novel environment and predation risk (Toms et al., 2010). This source of confusion is especially problematic when researchers attempt to relate personality traits to physiological responses, because the nature of this relationship will vary depending on how the labelled behavioural or physiological trait was measured. For example, a novel object test in an animal's holding enclosure might assess exploration/curiosity and boldness, whereas the same test in an unfamiliar environment might instead measure fear and/or anxiety (Table 1) (Misslin and Cigrang, 1986; Carter et al., 2013). As such, the magnitude or direction of a correlation between a behavioural and a physiological trait (e.g. distance travelled versus metabolic rate) might differ considerably depending on the context in which the behaviour was assessed. 
Ignoring this context dependence will inevitably lead to conflicting predictions about the performance or fitness implications of such traits.
In general, mislabelling behaviours is problematic, as it impedes our ability to compare behavioural responses across studies, reduces our effectiveness at interpreting the ecological relevance of a measured response and, ultimately, hampers our understanding of how selection acts on behavioural traits in the wild (Carter et al., 2013). To minimize these issues, experimental biologists can simply report what has been measured (e.g. report the distance moved by an animal in a 10 min open-field test) rather than assigning the behaviour a label such as ‘exploration’. Avoiding labels can help prevent the misinterpretation of behaviours assessed using the same test in different contexts. This is easier to achieve when measuring few behavioural traits, as would be expected in physiological studies, but difficult when researchers measure multiple behavioural traits to test for independent factors/axes of correlated personality traits such as a boldness–aggression axis, as is commonly done in behavioural ecology studies (see Carter et al., 2013 for suggested approaches in this case).
In addition to considering whether a chosen test really measures the behaviour it is intended to measure (i.e. construct validity; Carter et al., 2013; Koski, 2014), researchers must also consider whether the design of the test and the behaviours measured are ecologically relevant for the species of interest (i.e. ecological validity; Réale et al., 2007; Beckmann and Biro, 2013; Koski, 2014). This is especially true when trying to understand the life-history implications or fitness outcomes associated with a particular behaviour and when comparing studies. Researchers should target the traits that are most ecologically relevant for their study species (Dall and Griffith, 2014), which requires knowledge of the species' ecology and behavioural repertoire (Koski, 2014). Suggested ways to achieve this goal are to select traits a priori based on sound knowledge of the species' ecology (Koski, 2014), to trial assays on the study species (e.g. Balzarini et al., 2014) and to identify poor tests early on (Carter et al., 2013).
Measuring repeatability
Perhaps the most important feature of personality studies is that behaviour must be measured multiple times. Repeatedly measuring traits in as many individuals as possible allows the calculation of repeatability (R), a highly informative population-specific metric to quantify inter-individual phenotypic differences across time or contexts (Box 3). R provides a standardized estimate of individuality that can be compared across studies; it is also an inherent component of quantitative genetics theory because repeatability sets an upper limit to heritability (i.e. a trait cannot be more heritable than it is repeatable within a population; but see Dohm, 2002). In fact, a close read of the foundational papers on personality and behavioural syndromes (e.g. Sih et al., 2004b; Réale et al., 2007) reveals that the field's core lies in the application of approaches in quantitative genetics to study the evolution of behaviour (see Boake et al., 2002; Dingemanse and Araya-Ajoy, 2015). Most recently, several studies in the Journal of Experimental Biology have illustrated how these methods can also be used to estimate the repeatability of physiological and performance-related traits and gain important insight into their ecological and evolutionary implications (e.g. Laming et al., 2013; Darveau et al., 2014; Auer et al., 2016; Conradsen et al., 2016).
Whenever a trait is measured multiple times on the same set of individuals, the total phenotypic variance (VP) of the population sample can be partitioned into two variance components (see Glossary): inter-individual variance (Vind) and intra-individual, or residual, variance (Ve). Repeatability (R) is the proportion of the total phenotypic variance (VP) attributable to differences between individuals (Vind): R=Vind/(Vind+Ve), where VP=Vind+Ve. Since Ve corresponds to the non-repeatable fraction of VP (i.e. the sum of measurement error and phenotypic variance in response to micro-environmental effects or any unmeasured variable), R provides a standardized measure of the consistency of phenotypes across time or contexts (Nakagawa and Schielzeth, 2010). Two phenomena can lead to low R values: high intra-individual variation (Ve) and/or low inter-individual variation (Vind). Note that inter-individual differences are needed (i.e. Vind>0) for R to be non-zero (see Fig. 1A,B).
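As a minimal sketch of this variance partitioning, the Python snippet below applies the classical one-way ANOVA estimator to a balanced toy data set. The function name and the measurements are hypothetical, and the ANOVA estimator is used here only as a simple stand-in for the mixed-model approach described in the following paragraphs.

```python
from statistics import mean


def anova_repeatability(data):
    """Estimate (V_ind, V_e, R) from a balanced repeated-measures data set.

    data maps each individual ID to a list of k repeated measurements.
    Uses the one-way ANOVA estimator: V_e is the within-individual mean
    square and V_ind = (MS_among - MS_within) / k.
    """
    groups = list(data.values())
    n = len(groups)
    k = len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("this sketch assumes a balanced design")

    grand_mean = mean(v for g in groups for v in g)
    ind_means = [mean(g) for g in groups]

    ms_among = k * sum((m - grand_mean) ** 2 for m in ind_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, ind_means) for v in g
    ) / (n * (k - 1))

    v_e = ms_within
    # Negative estimates can arise by chance; truncate them at zero.
    v_ind = max((ms_among - ms_within) / k, 0.0)
    if v_ind + v_e == 0:
        raise ValueError("no phenotypic variance in the data")
    return v_ind, v_e, v_ind / (v_ind + v_e)


# Hypothetical measurements: three individuals, two measurements each.
toy = {"ind_a": [10, 12], "ind_b": [20, 22], "ind_c": [30, 32]}
v_ind, v_e, r = anova_repeatability(toy)
print(f"V_ind = {v_ind:.1f}, V_e = {v_e:.1f}, R = {r:.2f}")  # R = 99/101
```

In this toy example, individuals differ far more from one another than repeated measurements differ within an individual, so R is close to 1. Shrinking the differences among individuals, or inflating the scatter within them, lowers R.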
R estimates can be obtained using correlation, ANOVA and linear mixed-effects models (LMMs) (see Wolak et al., 2012 for details). LMMs are currently the method of choice because they (1) allow direct estimates of inter- and intra-individual variance, (2) do not require balanced or complete sampling, and (3) allow R to be calculated for traits with non-normal error distributions (Nakagawa and Schielzeth, 2010; Dingemanse and Dochtermann, 2013).
As with other sample statistics (e.g. the mean), R is an estimate of the true population repeatability and should be accompanied by an estimate of uncertainty (i.e. a standard error or confidence interval; Sokal and Rohlf, 1995). Nakagawa and Schielzeth (2010) propose different methods of calculating confidence intervals (CI) for R, including a Bayesian approach that uses Markov chain Monte Carlo (MCMC) algorithms and does not require randomization or bootstrapping procedures (these are needed with restricted maximum likelihood methods; REML). Guidelines for determining the sample size (number of individuals, n) and number of measurements (number of times a trait is measured on the same individual, k) to accurately estimate R are given by Wolak et al. (2012): the rule of thumb is to increase n when repeatability is high and increase k when repeatability is low.
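To illustrate what reporting uncertainty around R can look like, the sketch below simulates a data set with a known repeatability and computes a percentile bootstrap interval by resampling whole individuals. This is a deliberately simple, assumption-laden alternative to the MCMC and REML-based approaches cited above: the function names, the simulated values (n=60, k=5, true R=0.3) and the number of bootstrap replicates are all hypothetical choices made for illustration.

```python
import random


def anova_r(groups):
    """ANOVA-based repeatability for a balanced list of per-individual lists."""
    n, k = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (n * k)
    means = [sum(g) / k for g in groups]
    ms_among = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, means) for v in g
    ) / (n * (k - 1))
    v_ind = max((ms_among - ms_within) / k, 0.0)
    return v_ind / (v_ind + ms_within)


def simulate(n, k, true_r, rng):
    """Simulate n individuals x k measurements with total variance 1."""
    sd_ind, sd_e = true_r ** 0.5, (1 - true_r) ** 0.5
    groups = []
    for _ in range(n):
        individual_value = rng.gauss(0, sd_ind)
        groups.append([individual_value + rng.gauss(0, sd_e) for _ in range(k)])
    return groups


def bootstrap_ci(groups, rng, n_boot=500, level=0.95):
    """Percentile interval from resampling individuals with replacement."""
    n = len(groups)
    estimates = sorted(
        anova_r([groups[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]


rng = random.Random(1)  # fixed seed so the sketch is reproducible
groups = simulate(n=60, k=5, true_r=0.3, rng=rng)
r_hat = anova_r(groups)
ci_low, ci_high = bootstrap_ci(groups, rng)
print(f"R = {r_hat:.2f}, 95% bootstrap CI = {ci_low:.2f}-{ci_high:.2f}")
```

Re-running the simulation with different values of n and k is one informal way to see how the width of the interval responds to each, before consulting the formal guidelines in Wolak et al. (2012).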
Calculations to obtain R are detailed in several excellent reviews and methods articles (Nakagawa and Schielzeth, 2010; Martin et al., 2011; Wolak et al., 2012; Dingemanse and Dochtermann, 2013; Cleasby et al., 2015) but applying these methods can be challenging. Here, we provide a simple worked example for illustrative purposes, which researchers can easily modify and adapt to analyse their own data. We use data from a published study on metabolic traits of fishes (Norin et al., 2016) but the analyses are applicable to any aspect of the phenotype, including behavioural traits. The annotated script is available online (http://dx.doi.org/10.6084/m9.figshare.3464216) and it details the computation of R and its 95% credibility interval (a confidence interval in the Bayesian framework) using the R package MCMCglmm (Hadfield, 2010). For complementary R functions, refer to Nakagawa and Schielzeth (2010) and Wolak et al. (2012).
Norin et al. (2016) took repeated measurements of body mass, standard metabolic rate (SMR) and maximum metabolic rate (MMR) in 60 juvenile barramundi (Lates calcarifer), each sequentially exposed to five different environmental treatments consisting of different temperature, salinity and oxygen levels (900 measurements in total). Their objective was to examine whether individuals exhibit differences in their ability to cope with rapidly changing environmental conditions.
The first step to explore these data using an individual-level approach is to compute R using the raw phenotypic variance [i.e. without controlling for covariates (see Glossary) or fixed-effects]. This R estimate is termed ‘agreement repeatability’ (see Glossary; Nakagawa and Schielzeth, 2010) and is obtained with a linear mixed-effects model (LMM) comprising only a random factor (fish ID). The results reveal that body mass is highly repeatable across contexts (R=0.77, CI=0.68–0.83) but that SMR and MMR are only moderately repeatable (R=0.32, CI=0.22–0.46 and R=0.29, CI=0.17–0.42). Fig. 2A–F illustrates the degree of repeatability and variation in these traits using reaction norm plots and caterpillar plots. The reaction norm plots suggest that fish gained mass over the course of the study (Fig. 2A) and that SMR and MMR differ among experimental treatments (Fig. 2C,E).
As a second step, with mass as the response variable, we add time (trials 1–5 specified as a continuous variable) and experimental conditions (three separate factors: temperature, salinity and oxygen) as fixed-effect predictors in the model to control for the variance they explain. VP (Box 3) becomes the total phenotypic variance minus the variance accounted for by the fixed effects (time and experimental treatments). As a result, the variance components (see Glossary) in the model change and so does R: whether R increases or decreases depends on whether the fixed effects inflate variance at the intra- or inter-individual levels (see Nakagawa and Schielzeth, 2010 for details). Re-calculating R in this way reveals that body mass is highly repeatable across trials (R=0.96, CI=0.94–0.97). Accordingly, a plot of the model residuals (i.e. after controlling for time and treatment effects) indicates that the majority of the variance in mass occurs among rather than within individuals (Fig. 2G).
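The effect of accounting for a fixed effect can be sketched with a small, hypothetical data set in which every individual gains mass from one trial to the next. The snippet below compares the unadjusted estimate with one in which trial is treated as a fixed factor in a balanced two-way layout. It is an ANOVA-based simplification under stated assumptions (balanced data, trial as a factor rather than a continuous covariate), not the mixed model fitted to the barramundi data, and all names and values are invented for illustration.

```python
def variance_components(groups, adjust_for_trial=False):
    """Return (V_ind, V_e, R) for balanced data (one list of k trials per individual).

    With adjust_for_trial=True, trial is treated as a fixed factor, so mean
    differences among trials no longer count as within-individual variance.
    """
    n, k = len(groups), len(groups[0])
    grand = sum(sum(g) for g in groups) / (n * k)
    ind_means = [sum(g) / k for g in groups]
    trial_means = [sum(g[j] for g in groups) / n for j in range(k)]

    ms_ind = k * sum((m - grand) ** 2 for m in ind_means) / (n - 1)
    if adjust_for_trial:
        ss_res = sum(
            (g[j] - ind_means[i] - trial_means[j] + grand) ** 2
            for i, g in enumerate(groups)
            for j in range(k)
        )
        ms_res = ss_res / ((n - 1) * (k - 1))
    else:
        ss_res = sum(
            (v - ind_means[i]) ** 2 for i, g in enumerate(groups) for v in g
        )
        ms_res = ss_res / (n * (k - 1))

    v_ind = max((ms_ind - ms_res) / k, 0.0)
    return v_ind, ms_res, v_ind / (v_ind + ms_res)


# Hypothetical body masses (g) of three fish over three trials; all fish grow.
mass = [
    [10, 16, 20],
    [21, 25, 30],
    [30, 35, 41],
]
_, _, r_agreement = variance_components(mass)
_, _, r_adjusted = variance_components(mass, adjust_for_trial=True)
print(f"unadjusted R = {r_agreement:.2f}, trial-adjusted R = {r_adjusted:.2f}")
```

Because growth shifts every individual in the same direction, it adds to the within-individual variance when ignored. Removing it leaves the ranking of individuals nearly unchanged across trials, which is why the adjusted estimate is higher in this toy case.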
Next, we control for the effect of body mass on SMR and MMR (as well as time and treatment effects), to account for the fact that large differences in body mass among individuals inflate repeatability (see Box 3). This is achieved by including these variables as fixed effects in the models. The new adjusted R estimates are 0.39 for SMR (CI=0.23–0.51) and 0.24 for MMR (CI=0.10–0.35) (Fig. 2I–L), both of which fall within the range commonly reported for these traits (Auer et al., 2016). R estimates obtained in this way represent ‘adjusted repeatability’, as opposed to the ‘agreement repeatability’ obtained from LMMs without fixed effects (Nakagawa and Schielzeth, 2010) (refer to the Glossary for an explanation of different repeatability measurements and relevant references).
So far, we have analysed data from Norin et al. (2016) using a random-intercept LMM: individual identity was specified as a random factor in the model such that the y-intercept for each individual was allowed to vary relative to the population intercept. However, as we saw, fish grew substantially during the experiment (Fig. 2A), and we now wish to test whether individuals differ in their growth trajectories (see step four below). One advantage of LMMs is the ease with which the user can implement a reaction norm framework to simultaneously model variation in the individual phenotypic mean (e.g. personality or, in this case, an individual's mean mass) and individual responses across time or contexts (i.e. plasticity, or in this case, an individual's growth rate) (Nussey et al., 2007; Dingemanse et al., 2010; Westneat et al., 2011; Dingemanse and Dochtermann, 2013). Reaction norms are functions that describe the relationship between an individual's phenotype and the environment (or time) and can be estimated using a random slope LMM (also called a random regression model; Henderson, 1982). In the simplest case of a linear reaction norm, a random slope LMM characterizes each individual with an intercept and a slope. Since the slope of the reaction norm represents the change in phenotype through time or across an environmental gradient, it captures variation in phenotypic plasticity. Variance among intercepts in random slope LMMs is estimated at the point where the covariate is zero. Therefore, to examine differences in mean phenotype, it is important to rescale covariates to a mean of zero such that the intercept represents the average phenotype (this does not apply if there is curvature in the reaction norms) (Morrissey and Liefting, 2016). 
Mean centring of covariates is also important when estimating the slope–intercept relationship in reaction norms, which allows examining whether individuals with different intercepts have different slopes (see question 4 in Box 1). Typically, random slope LMMs include a covariance term to describe the slope–intercept covariance, which is highly sensitive to where the intercept is located relative to the range over which the data were gathered.
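The sensitivity of the intercept variance and the slope–intercept covariance to the position of the intercept can be made explicit with a short derivation for a linear reaction norm. The notation below is introduced here for illustration and is an assumption, not taken from the cited papers: a_i and b_i are individual deviations in intercept and slope, V_a and V_b are their variances, and c is the covariate value at which the intercept is evaluated (equivalently, the constant subtracted from the covariate when re-centring).

```latex
\begin{align}
y_{ij} &= (\beta_0 + a_i) + (\beta_1 + b_i)\,x_{ij} + e_{ij}
  && \text{(linear reaction norm)} \\
a_i^{(c)} &= a_i + c\,b_i
  && \text{(individual deviation in elevation at } x = c\text{)} \\
\mathrm{Var}\!\left(a^{(c)}\right) &= V_a + 2c\,\mathrm{Cov}(a,b) + c^{2}\,V_b \\
\mathrm{Cov}\!\left(a^{(c)},\, b\right) &= \mathrm{Cov}(a,b) + c\,V_b
\end{align}
```

Whenever individuals differ in slope (V_b>0), both the among-individual variance in elevation and its covariance with the slope change with c. Estimates taken at the edge of, or outside, the sampled range of the covariate can therefore differ in magnitude and even in sign from those taken at its mean.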
In a fourth step, therefore, we extend our LMM with mass as a response variable to a random-slopes model by including a ‘fish ID×trial’ interaction term as a random effect. Doing so allows individual slopes to vary relative to the population slope and reveals that individuals differ significantly in how they grew throughout the study (i.e. growth trajectories differ significantly among individuals in Fig. 2G). Additionally, the slope–intercept correlation is significant and positive (r=0.54), indicating that individuals that were heavier on average (i.e. had a greater mean mass across all five trials) grew faster throughout the experiment. Models for SMR and MMR can be extended to random-slopes LMMs in the same way.
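The logic of this step can be sketched with a two-stage shortcut: fit a separate least-squares line to each individual against the mean-centred trial number, then correlate the resulting intercepts and slopes. This is a hypothetical illustration, not a random-slopes LMM. Per-individual fits ignore sampling error in the estimated intercepts and slopes, so the mixed model remains the appropriate tool for inference. The fish identifiers and masses below are invented.

```python
def ols(x, y):
    """Ordinary least-squares (intercept, slope) of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    ss_a = sum((u - ma) ** 2 for u in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    return cov / (ss_a * ss_b) ** 0.5


trials = [1, 2, 3, 4, 5]
trial_mean = sum(trials) / len(trials)
centred_trials = [t - trial_mean for t in trials]  # mean-centred covariate

# Hypothetical body masses (g) for four fish across the five trials.
mass = {
    "fish_a": [10, 11, 12, 13, 14],
    "fish_b": [20, 22, 24, 26, 28],
    "fish_c": [30, 31.5, 33, 34.5, 36],
    "fish_d": [40, 43, 46, 49, 52],
}

fits = {fish: ols(centred_trials, y) for fish, y in mass.items()}
intercepts = [fit[0] for fit in fits.values()]  # mean mass of each fish
slopes = [fit[1] for fit in fits.values()]      # growth per trial
r_slope_intercept = pearson(intercepts, slopes)

for fish, (intercept, slope) in fits.items():
    print(f"{fish}: mean mass = {intercept:.1f}, growth per trial = {slope:.2f}")
print(f"slope-intercept correlation = {r_slope_intercept:.2f}")
```

Because the covariate is mean-centred, each intercept equals that fish's mean mass over the five trials, so a positive correlation here reads directly as heavier fish growing faster.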
Measuring intra- and inter-individual correlations
Another key advantage of mixed-effects models is the ease with which the user can transition from a univariate to a multivariate analysis. Whereas univariate models are used to calculate R, multivariate models allow examining relationships between traits at multiple levels of variation simultaneously. If two traits are functionally related, they should be significantly correlated at multiple levels of biological organisation (e.g. within and among individuals, among populations and among species). However, it is also possible that some fundamental trade-off (e.g. energy allocation to different biological functions) applies only at one level of biological organisation, yielding contrasting relationships at different levels (van Noordwijk and de Jong, 1986; Houle, 1991). Under such circumstances, correlations observed within individuals will differ from those observed among individuals (e.g. van de Pol and Wright, 2009). We provide additional detail and a worked example to compare intra- and inter-individual correlations online (http://dx.doi.org/10.6084/m9.figshare.3464216).
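A simple way to see how correlations can differ between levels is to split each trait into an individual mean and the deviations from that mean, then correlate the two parts separately. The sketch below does this for a hypothetical data set constructed so that the two levels disagree. It is a crude, mean-based decomposition under stated assumptions (balanced data, invented trait values and names), not the multivariate mixed model used in the online worked example, and correlations among individual means are contaminated by within-individual covariance when few measurements are available per individual.

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    ss_a = sum((u - ma) ** 2 for u in a)
    ss_b = sum((v - mb) ** 2 for v in b)
    return cov / (ss_a * ss_b) ** 0.5


# Hypothetical repeated measurements of two traits (x, y) on three individuals.
data = {
    "ind_a": ([1, 2, 3], [12, 11, 10]),
    "ind_b": ([4, 5, 6], [15, 14, 13]),
    "ind_c": ([7, 8, 9], [18, 17, 16]),
}

x_means, y_means = [], []  # among-individual level
x_devs, y_devs = [], []    # within-individual level
for x, y in data.values():
    mx, my = sum(x) / len(x), sum(y) / len(y)
    x_means.append(mx)
    y_means.append(my)
    x_devs.extend(xi - mx for xi in x)
    y_devs.extend(yi - my for yi in y)

r_among = pearson(x_means, y_means)
r_within = pearson(x_devs, y_devs)
print(f"among-individual r = {r_among:.2f}, within-individual r = {r_within:.2f}")
```

In this constructed example, individuals with high mean values of x also have high mean values of y, yet within each individual an increase in x coincides with a decrease in y. Pooling all observations into a single correlation would blur these two opposing patterns.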
Conclusion
Animal personality research is a controversial and polarizing topic among biologists, yet the concept has greatly advanced our understanding of the ecological and evolutionary importance of individual phenotypic variation in behaviour. Experimental biologists have much to gain from adopting a similar individual-level approach to (1) quantify and understand variation in organismal performance and physiology (Box 1, Fig. 1), (2) unravel possible mechanistic links between behaviour and physiology (Box 2), and (3) quantify the extent to which traits of interest are repeatable and/or plastic using analytical tools in personality and quantitative genetics research. In doing so, we suggest that experimentalists preferentially avoid the use of specific labels (e.g. boldness) when characterizing behaviour, seek to identify the physiological mechanisms underlying differences in behaviour, and strive to use validated methodologies for collecting, analysing and presenting behavioural data using repeated measurements.
In addition to providing interesting conceptual advances towards understanding how and why traits evolve, an increased focus on individual-level variation has major implications for predicting the effects of environmental changes on wild populations. Continuing the current emphasis on mean treatment effects of environmental stressors (e.g. hypoxia, low pH) while ignoring variation around the mean is a missed opportunity for understanding the traits responsible for conferring resistance or tolerance to a particular stressor (Browman, 2016; Killen et al., 2016). Examining variation in complex traits, including why some individuals are strongly affected by experimental treatments whereas others are not, is a critical first step towards truly understanding how individuals respond and adapt to environmental change, both in the short term and in the long term.
Acknowledgements
We thank Tommy Norin, Redouan Bshary, Alecia Carter and Mariana Velasque for informative discussions and feedback on the manuscript. Three reviewers provided helpful comments.
Footnotes
Author contributions
D.G.R., V.C. and S.A.B. wrote the paper. V.C. wrote the R script to analyse data from Norin et al. (2016).
Funding
D.G.R. and S.A.B. were supported by postdoctoral fellowships from the Fonds de Recherche du Québec - Nature et Technologies.
Data availability
Tommy Norin generously provided the data for the examples presented in the paper (available at http://onlinelibrary.wiley.com/doi/10.1111/1365-2435.12503/abstract). These data (in long format) and the associated R scripts are available on Figshare at: http://dx.doi.org/10.6084/m9.figshare.3464216.
References
Competing interests
The authors declare no competing or financial interests.