## ABSTRACT

Many constraints of organismal design at the cell and organ level, including muscle fiber types, musculoskeletal gearing and control-surface geometry, are believed to cause performance trade-offs at the whole-organism level. Contrary to this expectation, positive correlations between diverse athletic performances are frequently found in vertebrates. Recently, it has been proposed that trade-offs between athletic performances in humans are masked by variation in individual quality and that underlying trade-offs are revealed by adjusting the correlations to ‘control’ quality. We argue that quality is made up of both intrinsic components, due to the causal mapping between morpho-physiological traits and performance, and extrinsic components, due to variation in training intensity, diet and pathogens. Only the extrinsic component should be controlled. We also show that previous methods to estimate ‘quality-free’ correlations perform poorly. We show that Wright's factor analysis recovers the correct quality-free correlation matrix and use this method to estimate quality-free correlations among the 10 events of the decathlon using a dataset of male college athletes. We found positive correlations between all decathlon events, which supports an axis that segregates ‘good athletes’ from ‘bad athletes’. Estimates of quality-free correlations are mostly very small (<0.1), suggesting large, quality-free independence between events. Because quality must include both intrinsic and extrinsic components, the physiological significance of these adjusted correlations remains obscure. Regardless, the underlying architecture of the functional systems and the physiological explanation of both the un-adjusted and adjusted correlations remain to be discovered.

## INTRODUCTION

Vertebrate locomotor behaviors are powered by functional components with well-known trade-offs: musculoskeletal systems are geared with a specific ratio, so output displacement and force cannot be simultaneously optimized; muscle fibers develop phenotypes that increase either fatigue resistance or power at the cost of the other; and body shapes that augment acceleration have low mechanical efficiency. Physiologists typically expect these trade-offs at lower levels of organization (sub-cellular to organ-system) to scale up to whole-organism performance. Nevertheless, unexpected positive correlations between performance traits are frequently observed (Marras et al., 2013; Vanhooydonck et al., 2014).

Reidy et al. (2000), working with swimming cod, were the first to comment on these positive correlations and proposed that some cod are simply ‘good athletes’ and others ‘bad athletes’, which is a useful way to describe but not explain the pattern. Van Damme et al. (2002) and Wilson et al. (2014b) observed only positive correlations among human athletic performances and suggested that trade-offs at the whole-organism level are masked by individual quality, which Wilson et al. (2014b) explain as: ‘Because individuals vary in health, physical fitness, nutrition, development or genetics, which is the underlying basis of individual quality, some individuals perform better or worse across all types of motor tasks than others. This means that when researchers try to understand intra-individual functional trade-offs using inter-individual variation in performance, then trade-offs that do occur within individuals can be masked’. We try to clarify this meaning of individual quality with Fig. 1.

Both Van Damme et al. (2002) and Wilson et al. (2014b) argue that the intra-individual trade-offs can be recovered by statistically adjusting for quality. Both found that the expected negative correlations emerged only after this adjustment. Importantly, both Van Damme et al. (2002) and Wilson et al. (2014b) are cited in the evolutionary and human performance literature as evidence of performance trade-offs without acknowledging that the measured correlations were positive (MacArthur and North, 2005; Flueck, 2009; Ruiz et al., 2010; Eynon et al., 2013; Lailvaux and Husak, 2014; Wilson et al., 2014a; Servedio et al., 2014). We emphasize this because all four methods used to infer trade-offs in Van Damme et al. (2002) and Wilson et al. (2014b) are poor estimators of quality-free correlations. These methods are: (1) culling all but the top performers, which is guaranteed to produce a negative correlation, even if no underlying trade-off exists (Garland, 1994), (2) the correlation between the residuals of performance traits regressed on first principal component scores, which is guaranteed to produce strongly negatively biased correlations (Aitchison, 2003), (3) the interpretation of principal component (PC) loadings of opposite signs as indicating an underlying trade-off, which is not a valid interpretation of loadings, and (4) the partial correlation between two performances conditional on all other performances, which removes too much of the shared correlation (Mitteroecker and Bookstein, 2009). We find similar misuses of multivariate methods to be common in the performance literature and strongly encourage reading our detailed criticism of all four methods (Walker, 2015a).

Despite these methodological issues, individual quality is a compelling hypothesis to explain a common phenomenon in both human and non-human performance data. In order to explore the concept of quality in performance correlations, we compiled a dataset of decathlon performance data for US collegiate athletes, and used Sewell Wright's (1932) path-analytic factor analysis to estimate ‘quality-free’ correlations among these 10 events. In this paper, we use a model of functional trade-offs (Ghalambor et al., 2003) to (1) show how functional trade-offs at the cell, organ or system level contribute to performance trade-offs at the whole-animal level, (2) decompose quality into intrinsic and extrinsic components and show how the extrinsic component can mask the underlying architecture of the form–function mapping, (3) show why ‘bottom up’ approaches to predict performance trade-offs at the whole-animal level based on limited knowledge of trade-offs at the cell or organ level are likely to fail, and (4) show how to estimate a quality-free correlation matrix. Through analysis of the National Collegiate Athletic Association (NCAA) decathlon data, we then show that, compared with the results of Van Damme et al. (2002), the pattern of measured correlations is similar but our estimates of quality-free correlations differ in key respects. We also re-analyze the performance data from sub-elite male soccer players (Wilson et al., 2014b).

### A model of extrinsic and intrinsic components of performance correlations

The concept of intra-individual variation (Wilson et al., 2014b) is similar to a counterfactual conditional statement such as, ‘were the mechanical advantage 0.3 and not its real value 0.2, the force output would be 0.6 N and not its real value 0.4 N’. But there is nothing special about applying this concept to morpho-physiological traits as opposed to individual quality factors (‘were my training 750 h per year and not its real value 500 h, my marathon time would be 2 h:18 min and not its real value 2 h:28 min’). We suggest that the masking problem is not one of inter- versus intra-individual variation but of intrinsic versus extrinsic variation, where extrinsic variation results from differences in exposure to extrinsic factors such as training intensity or style, diet, recovery, stressful life events, pathogens, etc. All intrinsic factors should be left unadjusted.

We developed a model of how functional trade-offs, which arise at the sub-whole-animal level, combine with extrinsic quality factors to contribute to performance correlations that we measured at the whole-animal level. This model uses the graphical algebra of path models, which were specifically developed by Sewell Wright (1918, 1932, 1934) to model the underlying factors (‘causes’) generating correlations among measured traits. While elegant, path models are not necessary for any conclusion that we illustrate. The basic algebra of path models is available in some biostatistics textbooks (e.g. Sokal and Rohlf, 2012). Shipley (2002) is a more thorough introduction to path models in functional biology and ecology. Pearl (2009) formalizes many of the concepts of causal graphs. Our path models are models of simplified functional systems and used only to develop a general theory; we do not attempt to test detailed causal models of performance variation in humans. Our models ignore some complexities of real systems in order to focus on the fundamental principles. But ignoring the complexities of real systems does not make our simplified models irrelevant; on the contrary, the complexities make the goals of discovering underlying trade-offs that much more difficult. The scripts for generating simulated data using all of the path models introduced below are available elsewhere (Walker, 2015b).

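Consider a functional system in which a single morpho-physiological (M-P) trait *M* (say, percentage type I fibers in the biceps femoris muscle) has opposite causal effects (β) on the performance traits *P*_{1} and *P*_{2} (say 100 m and 1500 m speeds):

$$
\begin{aligned}
P_1 &= \beta_1 M + U_1\\
P_2 &= \beta_2 M + U_2
\end{aligned}
\tag{1}
$$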

where β_{1} and β_{2} have opposite signs. The *U* represent ‘noise’ or additional variance that is uncorrelated with all other effects. In all later path models, the *U* are implied but not written out.

We refer to this kind of path diagram as a form–function map. The pattern of causal arrows from M-P traits to performance traits is one component of the functional architecture of an organism. The single-headed arrow indicates a causal effect and the path coefficient represents the sign and magnitude of the effect. Here and elsewhere in this paper, all variables in a path model are standardized to unit variance, which makes the path coefficients standardized. This standardization gives the path diagram its most elegant feature: the expected correlation between any two traits can be quickly computed as the sum of the products of the coefficients along all paths connecting the two variables. The expected performance correlation between *P*_{1} and *P*_{2} is β_{1}β_{2}. The performances are correlated because they share the common cause (*M*). The *U* do not contribute to the correlation because neither is a common cause of *P*_{1} and *P*_{2}. We call the expected correlation due to the mapping of M-P traits to performance the ‘functional correlation’. In this simple model, but not in more complex models (see below), the functional correlation is the expected performance correlation. A negative functional correlation is a functional trade-off and occurs if β_{1} and β_{2} have opposite signs. A positive functional correlation is a functional facilitation (Ghalambor et al., 2003).
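The path rule can be checked numerically. The minimal Python sketch below (assuming NumPy; the function name `simulate_eqn1` and the coefficient values are hypothetical, not taken from the study) simulates the single-cause model with every variable standardized to unit variance and compares the sample correlation between *P*_{1} and *P*_{2} with the product β_{1}β_{2}.

```python
import numpy as np

def simulate_eqn1(beta1, beta2, n=100_000, seed=1):
    """Simulate P1 = beta1*M + U1 and P2 = beta2*M + U2 and return cor(P1, P2).

    All variables have unit variance: each noise term U is scaled so that
    beta**2 + Var(U) = 1 for its performance trait.
    """
    rng = np.random.default_rng(seed)
    m = rng.standard_normal(n)                          # the shared M-P trait
    u1 = rng.standard_normal(n) * np.sqrt(1 - beta1**2)
    u2 = rng.standard_normal(n) * np.sqrt(1 - beta2**2)
    p1 = beta1 * m + u1
    p2 = beta2 * m + u2
    return np.corrcoef(p1, p2)[0, 1]

beta1, beta2 = 0.6, -0.5            # hypothetical opposite-signed effects (a trade-off)
functional_r = beta1 * beta2        # path rule: product of coefficients along the path
simulated_r = simulate_eqn1(beta1, beta2)
print(f"path rule: {functional_r:.2f}, simulated: {simulated_r:.3f}")
```

With 100 000 simulated individuals the sampling error of the correlation is roughly 0.003, so the two numbers should agree to about two decimal places.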

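Now add a second common cause, *Q*, which represents sources of variation extrinsic to the organism (e.g. variation in training hours). Furthermore, let *Q* have positive effects (α) on both performances (e.g. the more one trains, the better one gets at both the 100 m and 1500 m events) that are independent of the effects of *M*:

$$
\begin{aligned}
P_1 &= \alpha_1 Q + \beta_1 M + U_1\\
P_2 &= \alpha_2 Q + \beta_2 M + U_2
\end{aligned}
\tag{2}
$$

with *Q* and *M* uncorrelated.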

While it appears that *Q* affects performance by some mechanism other than through M-P traits, our model is a mathematical simplification of, but precisely equivalent to, a full model with *Q* acting through the M-P traits (Walker, 2015b).

The functional correlation in Eqn 2 is β_{1}β_{2} but the expected performance correlation is α_{1}α_{2}+β_{1}β_{2}. The component α_{1}α_{2} is positive; adding *Q* to the model shifts the expected performance correlation in a positive direction. Regardless of the magnitude of the α, the effect of the functional trade-off is masked. If the α are small relative to the β, only the magnitude will be masked (i.e. a smaller, negative performance correlation than would occur if there were no variation in training). But if the α are large relative to the β, the sign will be masked too.
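The masking argument is simple arithmetic on the two paths connecting *P*_{1} and *P*_{2}. The sketch below (hypothetical function name and coefficients; it assumes *Q* and *M* are uncorrelated and all variables are standardized) holds a functional trade-off fixed at β_{1}β_{2}=−0.30 and increases the effects of *Q*.

```python
def expected_r_eqn2(alpha1, alpha2, beta1, beta2):
    """Expected cor(P1, P2) when independent causes Q and M act on both performances.

    Sum over the two connecting paths: via Q (alpha1*alpha2) and via M (beta1*beta2).
    Assumes standardized variables, so alpha_j**2 + beta_j**2 <= 1.
    """
    return alpha1 * alpha2 + beta1 * beta2

beta1, beta2 = 0.6, -0.5                  # hypothetical functional trade-off (-0.30)
for alpha in (0.0, 0.3, 0.7):             # hypothetical, equal effects of Q on P1 and P2
    r = expected_r_eqn2(alpha, alpha, beta1, beta2)
    print(f"alpha = {alpha:.1f}: expected r = {r:+.2f}")
# alpha = 0.0: expected r = -0.30   (trade-off fully visible)
# alpha = 0.3: expected r = -0.21   (magnitude masked)
# alpha = 0.7: expected r = +0.19   (sign masked)
```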

The functional correlation can be measured directly from data in which *Q* is constant or has very small variance. Ideally, these data would come from an experimental design in which the individuals were raised in a common environment. If *Q* contaminates a dataset, and the goal is to estimate a *Q*-free correlation, the path model in Eqn 2 suggests a simple and elegant solution:

$$
r_{\text{quality-free}} = r_{\text{raw}} - \alpha_1\alpha_2
\tag{3}
$$

where *r*_{raw} is the measured correlation. Below, we show that *r*_{quality-free} is equivalent to the functional correlation only in the special case of no correlation among the M-P traits.
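A simulation sketch of this correction follows. It assumes the α are known exactly; in real data they are not, which is the estimation problem taken up below. Coefficient values are hypothetical and the sketch uses NumPy.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
alpha1, alpha2 = 0.7, 0.7      # hypothetical effects of extrinsic Q (e.g. training hours)
beta1, beta2 = 0.6, -0.5       # hypothetical functional trade-off through M

q = rng.standard_normal(n)     # extrinsic quality factor
m = rng.standard_normal(n)     # M-P trait, independent of Q
# Noise variances chosen so that each performance has unit variance
p1 = alpha1 * q + beta1 * m + rng.standard_normal(n) * np.sqrt(1 - alpha1**2 - beta1**2)
p2 = alpha2 * q + beta2 * m + rng.standard_normal(n) * np.sqrt(1 - alpha2**2 - beta2**2)

r_raw = np.corrcoef(p1, p2)[0, 1]            # expected about +0.19: the trade-off is masked
r_quality_free = r_raw - alpha1 * alpha2     # subtract the Q path (alpha assumed known)
print(round(r_raw, 2), round(r_quality_free, 2))
```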

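Now replace *Q* with a second M-P trait that causally affects both performances in the same direction; shorter heels, for example, increase both sprinting and endurance performance in humans (Scholz et al., 2008; Lee and Piazza, 2009):

$$
\begin{aligned}
P_1 &= \beta_{11} M_1 + \beta_{21} M_2 + U_1\\
P_2 &= \beta_{12} M_1 + \beta_{22} M_2 + U_2
\end{aligned}
\tag{4}
$$

where β_{ij} is the effect of M-P trait *M*_{i} on performance *P*_{j}, and *M*_{1} and *M*_{2} are uncorrelated.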

Note that we have changed the coefficient symbols to reflect the fact that our second causal variable represents an underlying M-P trait and not an extrinsic factor. The expected performance correlation (β_{11}β_{12}+β_{21}β_{22}) is the sum of two functional correlations, each due to its own common cause. We call the sum of the functional correlations the ‘net’ functional correlation. A net functional trade-off is a negative net functional correlation. It is a pattern of form–function mapping where a set of M-P traits causally affects two performance traits in opposite directions.

In Eqn 2, we have a functional system in which the underlying form–function mapping is being masked by *Q*. We quite reasonably consider *Q* a nuisance factor and want to adjust the raw correlation to get rid of its effect. By contrast, in Eqn 4, we have a functional system with two underlying M-P traits. Trait *M*_{1} causes a trade-off (its path coefficients are of opposite sign). Trait *M*_{2} causes a facilitation. Holzman et al. (2011), among others, emphasized that the *M*_{2} facilitation mitigates the *M*_{1} trade-off but it makes equal sense to say the *M*_{1} trade-off attenuates the *M*_{2} facilitation. If the facilitation is larger, the net functional correlation is positive. A positive performance correlation faithfully represents the underlying functional architecture; that is, the pattern of how M-P traits map to performance. We most definitely do not want to adjust the correlation to control for *M*_{2}. Instead, we have high- and low-quality athletes because of the functional architecture. Quality is determined by the intrinsic properties of the causal mapping from M-P traits to performance.

The contrast between the path diagrams in Eqns 2 and 4 raises a question: which kinds of traits do we call *Q* (extrinsic quality) and treat as a nuisance variable to be statistically adjusted, and which do we consider *M* (intrinsic quality) and part of the functional architecture? For example, recent work on the genetic predictors of the individual response to training in humans suggests the presence of networks of muscle-plasticity genes that affect the ability of muscle to remodel. These networks are activated in the same way in both resistance and endurance training (Timmons, 2011; Phillips et al., 2013). If these plasticity networks have large magnitude effects on performance, we would expect some individuals to excel in both high-power and high-endurance events and some to perform poorly in both types of events, even if all athletes had precisely the same training. This variation in the response to training is an intrinsic component of phenotypic design, as opposed to the extrinsic variation in different training plans. Do we measure this response to training, score it as *Q* and adjust for its effects on the correlations, or do we score it as another M-P trait *M* and allow it to contribute to the net functional correlations? Other traits, especially physical traits (pain and stress signaling systems related to tolerance, including CNS feedback limiting muscle strain and, consequently, performance) that contribute to psychological factors like ‘mental toughness’ or ‘competitiveness’, may be even more ambiguous to classify. We believe all these intrinsic traits should be scored as *M*.

This concern is augmented if *Q* is a latent variable, as in Wilson et al. (2014b) and Van Damme et al. (2002). Latent variables are mathematical constructs (such as the first principal component) but are generally interpreted to be something meaningful, for example ‘general intelligence’, ‘general size’ or ‘individual quality’. With a latent *Q*, how do we differentiate the system in Eqn 2 from that in Eqn 4, both of which result in a first PC with all positive loadings? Even worse, the first PC can be a mixture of intrinsic and extrinsic contributions to quality. In Walker (2015b), we generated simulated data representing 10 performance traits affected by two M-P traits and a single, extrinsic quality trait. The M-P traits had a modular mapping to performance with one having moderate effects on performance traits 1–5 and the other having moderate effects on performance traits 6–10. The quality trait had a small effect on six of the 10 performance traits and zero effect on the other four performance traits. The loadings on PC1 were all the same sign, suggestive of a single, global factor even though none existed. While a PC1 with all positive loadings is an axis representing athletic quality, a pattern of all-positive loadings cannot justify interpreting the axis as representing extrinsic sources of variation that need to be adjusted away.
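The modular example can be sketched without sampling noise by building the expected correlation matrix directly. The effect sizes (0.6 for ‘moderate’, 0.2 for ‘small’) and the choice of which six traits *Q* affects are hypothetical; they are not the values used in Walker (2015b). Because every pairwise correlation is non-negative and all traits are linked through shared causes, PC1 is expected to have loadings of a single sign (a consequence of the Perron–Frobenius theorem for non-negative, irreducible matrices), even though no single cause affects all 10 traits.

```python
import numpy as np

n_traits = 10
b = 0.6                                  # hypothetical "moderate" M-P effect
a = 0.2                                  # hypothetical "small" quality effect
quality_targets = [0, 1, 2, 5, 6, 7]     # hypothetical: the six traits affected by Q

# Rows are causes (M1, M2, Q); columns are the 10 performance traits
F = np.zeros((3, n_traits))
F[0, :5] = b                             # M1 affects traits 1-5 only
F[1, 5:] = b                             # M2 affects traits 6-10 only
F[2, quality_targets] = a                # Q affects six traits, zero effect on four

R = F.T @ F                              # expected correlations due to shared causes
np.fill_diagonal(R, 1.0)                 # unit variances; the remainder is noise U

eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
pc1_loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
same_sign = bool(np.all(pc1_loadings > 0) or np.all(pc1_loadings < 0))
print(np.round(pc1_loadings, 2), same_sign)
```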

### Why performance correlations may be poorly predicted

Predicting a correlation between a pair of performance traits based on a qualitative relationship between one or a few underlying M-P traits and the performance traits is nearly ubiquitous in the performance literature (e.g. increased upper limb muscle mass should increase shot put but cost high jump performance). But a net functional correlation is the sum of all the expected component correlations (Ghalambor et al., 2003). For example, in the path diagram in Eqn 4, the net functional correlation is β_{11}β_{12}+β_{21}β_{22}. This raises a major concern with bottom-up approaches to predicting correlations between performance variables. Many of the performance traits that we care about, especially at the whole-animal level, have many tens, hundreds or even thousands of underlying common causal factors, many of which we are ignorant of, and for most of which we have little information on the magnitude of the effect on each performance, and thus the expected component correlation. Given our knowledge of only a fraction of the common causes, and quantitative estimates of effect size for only a fraction of these, our ability to predict a performance correlation would seem to be extremely limited. Vanhooydonck et al. (2014) make a similar argument but without the benefit of a graphical or mathematical model.

The problem is compounded if the M-P traits are themselves phenotypically correlated, as in this extension of Eqn 4:

$$
\begin{aligned}
P_1 &= \beta_{11} M_1 + \beta_{21} M_2 + U_1\\
P_2 &= \beta_{12} M_1 + \beta_{22} M_2 + U_2\\
\operatorname{Cor}(M_1, M_2) &= r_{12}
\end{aligned}
\tag{5}
$$

where *r*_{12} is the correlation between the M-P traits. The expected performance correlation is β_{11}β_{12}+β_{21}β_{22}+*r*_{12}β_{11}β_{22}+*r*_{12}β_{12}β_{21}. This correlation depends not only on the sign and magnitude of the net functional correlation but also on the sign and magnitude of the phenotypic correlation between the M-P traits. A phenotypic correlation can cause a performance correlation to be larger, smaller or even of opposite sign relative to the net functional correlation (Holzman et al., 2011; Walker, 2015b). Indeed, if *M*_{1} maps only to *P*_{1}, *M*_{2} maps only to *P*_{2} and *r*_{12} is negative, the expected performance correlation is negative despite the lack of any underlying M-P trait that has opposite effects on performance.
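The four-term expression is the (1, 2) element of **F**^{T}**Σ**_{M}**F**, where **F** holds the effects of the M-P traits (rows) on the performances (columns) and **Σ**_{M} is the correlation matrix of the M-P traits. The sketch below (hypothetical names and values; NumPy assumed) checks that the two forms agree and reproduces the case of trait-specific M-P traits with a negative *r*_{12}.

```python
import numpy as np

def expected_r_four_terms(b11, b12, b21, b22, r12):
    """Expected cor(P1, P2) with correlated M-P traits; b_ij = effect of M_i on P_j."""
    return b11 * b12 + b21 * b22 + r12 * b11 * b22 + r12 * b12 * b21

def expected_r_matrix(F, Sigma_M):
    """Same quantity from matrix algebra: Cov(P) due to the M-P traits is F^T Sigma_M F."""
    return (F.T @ Sigma_M @ F)[0, 1]

# Check that the two expressions agree for arbitrary (hypothetical) coefficients
b11, b12, b21, b22, r12 = 0.5, -0.3, 0.2, 0.4, 0.35
F = np.array([[b11, b12],
              [b21, b22]])           # rows: M1, M2; columns: P1, P2
Sigma_M = np.array([[1.0, r12],
                    [r12, 1.0]])
assert np.isclose(expected_r_four_terms(b11, b12, b21, b22, r12),
                  expected_r_matrix(F, Sigma_M))

# M1 -> P1 only, M2 -> P2 only, negatively correlated M-P traits:
# no trait has opposite effects, yet the expected performance correlation is negative.
print(expected_r_four_terms(0.7, 0.0, 0.0, 0.7, -0.4))   # about -0.196
```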

### A brief introduction to estimating a quality-free correlation matrix

In principle, we can compute the *m*×*m* matrix of net functional correlations among *m* performance variables using (Ghalambor et al., 2003):

$$
\mathbf{R}_{\text{functional}} = \mathbf{F}^{\mathsf{T}}\mathbf{F}
\tag{6}
$$

where **F** is the *p*×*m* matrix containing the path coefficients of the causal effects of *p* M-P traits on *m* performances. The off-diagonal elements of **R**_{functional} contain the expected trade-offs and facilitations among the performance variables at the whole-animal level given only the functional mapping. The problem with this bottom-up approach, as discussed above, is that we would need to know all causal effects to compute the net functional correlations with any accuracy.
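A minimal sketch of this matrix computation with three hypothetical M-P traits and two performances follows (NumPy assumed; the coefficients are invented for illustration). It also shows how a prediction based on a single, well-understood trade-off can get the sign of the net functional correlation wrong.

```python
import numpy as np

# Hypothetical F: rows are p = 3 M-P traits, columns are m = 2 performances
F = np.array([
    [0.5, -0.4],   # M1: a trade-off (opposite-signed effects), the "well-known" mechanism
    [0.4,  0.5],   # M2: a facilitation
    [0.3,  0.3],   # M3: another facilitation
])

R_functional = F.T @ F                   # m x m; off-diagonals are net functional correlations
net_r = R_functional[0, 1]               # -0.20 + 0.20 + 0.09 = +0.09

# A bottom-up prediction that knows only about M1 gets the sign wrong
prediction_from_M1 = F[0, 0] * F[0, 1]   # -0.20
print(round(net_r, 2), round(prediction_from_M1, 2))
```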

Alternatively, given some measure of *Q*, the quality-free correlations can be estimated as:

$$
\mathbf{R}_{\text{quality-free}} = \mathbf{R}_{\text{raw}} - \boldsymbol{\alpha}_{Q}\boldsymbol{\alpha}_{Q}^{\mathsf{T}}
\tag{7}
$$

where **α**_{Q} is the vector of path coefficients from some measure of *Q* to each performance variable. If **α**_{Q} is estimated as the loadings on PC1, then Eqn 7 is equivalent to the covariance matrix of the residuals of the regression of each performance trait on PC1. The matrix **α**_{Q}**α**_{Q}^{T} contains the component correlations due to *Q* only. As an estimate of the net functional correlation (Eqn 6), Eqn 7 assumes (1) *Q* contains only extrinsic variables, (2) *Q* contains all extrinsic variables, (3) **α**_{Q} is an unbiased estimate of the effects of *Q* on performance, and (4) there is no phenotypic correlation among the M-P traits. We introduced assumptions 1, 2 and 4 above. Assumption 3 is violated using standard measures of a latent *Q*, and we offer a solution here. Because assumption 4 is violated, we use the term ‘quality-free’ and not ‘net functional correlation’ to refer to the matrix **R**_{quality-free}.

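PC1 loadings, however, are biased estimates of **α**_{Q} (Mitteroecker and Bookstein, 2009). Given the matrix of expected correlations due only to the individual quality (or general size) effects estimated by PC1, **R**_{PC1} = $\hat{\boldsymbol{\alpha}}\hat{\boldsymbol{\alpha}}^{\mathsf{T}}$, where $\hat{\boldsymbol{\alpha}}$ is the vector of PC1 loadings, the bias-corrected quality (or size)-free correlations are:

$$
\mathbf{R}_{\text{quality-free}} = \mathbf{R}_{\text{raw}} - \frac{\bar{r}_{\text{raw}}}{\bar{r}_{\text{PC1}}}\,\mathbf{R}_{\text{PC1}}
\tag{8}
$$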

where $\bar{r}_{\text{raw}}$ and $\bar{r}_{\text{PC1}}$ are the mean off-diagonal elements in **R**_{raw} and **R**_{PC1}, respectively. We note that Wright (1932) used only the subset of ‘among-module’ correlations in the computation of $\bar{r}_{\text{raw}}$ and $\bar{r}_{\text{PC1}}$, where ‘among-module’ refers to a pair of morphometric traits occurring in different developmental modules defined *a priori*. Here, we relax the necessity of an *a priori* factor structure and use the means of all off-diagonal elements in **R**_{raw} and **R**_{PC1}. We refer to the uncorrected (Eqn 7) and bias-corrected (Eqn 8) residuals as Wright's uncorrected (WUC) and Wright's bias-corrected (WBC) correlations.
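The WUC and WBC computations can be sketched in a few lines of NumPy. This is our hedged reading of Eqns 7 and 8, not the authors' R scripts: it assumes that the PC1 loadings are the leading eigenvector of **R**_{raw} scaled by the square root of its eigenvalue, that **R**_{PC1} is the outer product of those loadings, and that the bias correction rescales **R**_{PC1} by $\bar{r}_{\text{raw}}/\bar{r}_{\text{PC1}}$, which forces the mean off-diagonal WBC correlation to zero. The function names are hypothetical.

```python
import numpy as np

def pc1_loadings(R):
    """PC1 loadings (eigenvector scaled by sqrt of eigenvalue) of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(R)     # eigenvalues in ascending order
    v = eigvecs[:, -1]
    if v.sum() < 0:                          # eigenvector sign is arbitrary
        v = -v
    return v * np.sqrt(eigvals[-1])

def mean_off_diagonal(A):
    """Mean of the off-diagonal elements of a square matrix."""
    return A[~np.eye(A.shape[0], dtype=bool)].mean()

def wright_quality_free(R_raw):
    """Return (WUC, WBC) matrices; only the off-diagonal elements are interpreted."""
    alpha_hat = pc1_loadings(R_raw)
    R_pc1 = np.outer(alpha_hat, alpha_hat)           # expected correlations due to PC1
    wuc = R_raw - R_pc1                              # uncorrected (Eqn 7 with PC1 loadings)
    scale = mean_off_diagonal(R_raw) / mean_off_diagonal(R_pc1)
    wbc = R_raw - scale * R_pc1                      # bias-corrected (assumed form of Eqn 8)
    return wuc, wbc

# Usage: an equicorrelated matrix has no structure beyond a general factor
R = np.full((3, 3), 0.5)
np.fill_diagonal(R, 1.0)
wuc, wbc = wright_quality_free(R)
```

For this equicorrelated matrix, WUC returns spurious negative off-diagonal correlations of 0.5 − 2/3 ≈ −0.17, whereas WBC returns zeros, which illustrates the negative bias that the correction is meant to remove.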

## MATERIALS AND METHODS

Decathlon data from male college athletes (NCAA), generally aged 18–24 years, were collected from the Track and Field Results and Reporting System (http://www.tfrrs.org/). Following the 2014 outdoor season, we collected personal-best values from each of the decathlon events for each athlete across all seasons recorded for that athlete. We allowed recorded personal bests for different events to occur in different seasons for the same athlete. While there are cogent arguments for analyzing sample mean and not maximum performance (Head et al., 2011), we used the maxima because these were accessible without mining data from individual meets. Data were collected for divisions I, II and III and combined into a single dataset of personal bests. To standardize performance direction such that larger values indicate better performance, times for the 100, 400 and 1500 m and the 110 m hurdles were converted to speed in m s^{−1}. All athletes with any missing event were removed from the dataset, leaving *N*=611 athletes with complete data.

We computed the partial correlations conditional on all other performance variables (Van Damme et al., 2002), regression residual correlations (Wilson et al., 2014b), and WUC and WBC residual correlations of the decathlon data. Standard statistical packages do not give the correct error of these correlations as estimates of quality-free correlations. For example, any statistics package will report error statistics (standard errors, *P*-values, confidence intervals) for the regression residual correlations, and these statistics are correct for these correlations as estimates of the correlation of regression residuals. But because a regression residual correlation is a biased estimate of the true quality-free (or PC1-free) correlation, these statistics do not give the correct error for the quality-free correlation.

We therefore used Monte Carlo simulation to estimate the error of each method as an estimator of quality-free correlations. Each simulated dataset was generated as

$$
\mathbf{P} = \mathbf{q}\boldsymbol{\alpha}^{\mathsf{T}} + \mathbf{M}\mathbf{B} + \mathbf{U}
$$

where **P** is the *n*×*p* matrix of centered and variance-standardized simulated performance traits, **q** is the *n*×1 matrix of random, normal variables that represents athletic quality for the *n* individuals, **α** is the *p*×1 matrix of variance-standardized causal effects (path coefficients) of quality on each performance trait, **M** is the *n*×*m* matrix of centered and variance-standardized random, normal variables that represent the M-P traits, **B** is the *m*×*p* matrix of variance-standardized causal effects of the *m* M-P traits on each performance trait, and **U** is an *n*×*p* ‘unexplained error’ matrix of random, normal error with column standard deviations scaled so that the columns of **P** have unit variance.

The quality effects **α** were the bias-corrected PC1 loadings,

$$
\boldsymbol{\alpha} = \hat{\boldsymbol{\alpha}}\sqrt{\bar{r}_{\text{raw}}/\bar{r}_{\text{PC1}}}
$$

where, again, $\hat{\boldsymbol{\alpha}}$ is the vector of loadings on PC1, $\bar{r}_{\text{raw}}$ is the mean off-diagonal correlation in the Pearson correlation matrix, and $\bar{r}_{\text{PC1}}$ is the mean off-diagonal correlation in the matrix of expected correlations due to PC1 (see Eqn 8). The β coefficients in **B** are random, exponential with rate (λ) equal to 1 and rescaled so that each element of the diagonal of $\boldsymbol{\alpha}\boldsymbol{\alpha}^{\mathsf{T}} + \mathbf{B}^{\mathsf{T}}\mathbf{B}$ is equal to *R*^{2}=0.95, the proportion of the total variance explained by the simulated quality factor and the M-P traits that causally affect performance. The number of M-P traits (*m*) was set to 19 because this number generated a distribution (mean and s.d.) of the correlations in the quality-free correlation matrix similar to that estimated by WBC correlations of the decathlon data (Fig. S1).

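We generated 2500 simulated datasets and computed for each the partial correlations, regression residual correlations, and WUC and WBC correlations. We then computed, for each simulated dataset, the error from the true quality-free correlations, $\mathbf{E} = \hat{\mathbf{R}} - \mathbf{R}_{\text{true}}$, where $\hat{\mathbf{R}}$ is the matrix estimated using any of the four methods and **R**_{true} is the matrix of true quality-free correlations.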
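The simulation and error computation for a single dataset can be sketched as follows (NumPy assumed; all function names are hypothetical). The values *m*=19, *R*^{2}=0.95 and exponential β with rate 1 follow the text. The following are assumptions of this sketch and not taken from the source: random signs on the β (the text does not say how signs were assigned), equal hypothetical quality effects of 0.5 instead of the decathlon estimates, *n*=611 individuals per dataset, and **B**^{T}**B** as the true quality-free correlation matrix (which holds if the M-P traits are independent with unit variance). The two WUC/WBC lines mirror our reading of Eqns 7 and 8 above.

```python
import numpy as np

def off_diag(A):
    """Off-diagonal elements of a square matrix, as a flat vector."""
    return A[~np.eye(A.shape[0], dtype=bool)]

def pc1(R):
    """PC1 eigenvector (oriented so its elements sum to >= 0) and PC1 loadings."""
    eigvals, eigvecs = np.linalg.eigh(R)              # ascending eigenvalues
    v = eigvecs[:, -1]
    v = v if v.sum() >= 0 else -v
    return v, v * np.sqrt(eigvals[-1])

def simulate_dataset(alpha, n, m=19, r2=0.95, rng=None):
    """One dataset from P = q alpha^T + M B + U; returns (P, true quality-free R)."""
    rng = np.random.default_rng() if rng is None else rng
    p = alpha.size
    # Exponential(rate = 1) magnitudes; the random signs are an ASSUMPTION of this sketch
    B = rng.exponential(scale=1.0, size=(m, p)) * rng.choice([-1.0, 1.0], size=(m, p))
    # Rescale each column so that alpha_j^2 + sum_i B_ij^2 = r2
    B *= np.sqrt((r2 - alpha**2) / (B**2).sum(axis=0))
    q = rng.standard_normal((n, 1))
    M = rng.standard_normal((n, m))
    U = rng.standard_normal((n, p)) * np.sqrt(1.0 - r2)
    P = q @ alpha[None, :] + M @ B + U
    return P, B.T @ B                                  # correlations due to M-P traits only

def four_estimates(P):
    """Partial, regression-residual, WUC and WBC estimates of quality-free correlations."""
    R = np.corrcoef(P, rowvar=False)
    v, loadings = pc1(R)
    R_pc1 = np.outer(loadings, loadings)
    K = np.linalg.inv(R)                               # precision matrix
    d = np.sqrt(np.diag(K))
    partial = -K / np.outer(d, d)                      # conditional on all other traits
    Z = (P - P.mean(axis=0)) / P.std(axis=0)
    s = Z @ v                                          # PC1 scores
    resid = Z - np.outer(s, (Z.T @ s) / (s @ s))       # residuals of regressions on PC1
    regression = np.corrcoef(resid, rowvar=False)
    wuc = R - R_pc1
    wbc = R - (off_diag(R).mean() / off_diag(R_pc1).mean()) * R_pc1
    return {"partial": partial, "regression": regression, "WUC": wuc, "WBC": wbc}

rng = np.random.default_rng(2015)
alpha = np.full(10, 0.5)        # hypothetical quality effects (not the Table 2 values)
P, R_true = simulate_dataset(alpha, n=611, rng=rng)    # n = 611 is an assumption
estimates = four_estimates(P)
errors = {name: off_diag(est - R_true) for name, est in estimates.items()}
for name, e in errors.items():
    print(f"{name:>10}: mean error {e.mean():+.3f}, s.d. {e.std():.3f}")
```

Repeating the last four statements 2500 times (each with fresh draws of **B**) and pooling the errors would give error distributions analogous to those summarized in Fig. 4A; this sketch is not claimed to reproduce the published values.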

The complete set of scripts used to process the data was written in R v3.1.2 (R Core Team, 2014) and is available on GitHub at https://github.com/middleprofessor/NCAA_decathlon.

## RESULTS

We computed raw and quality-adjusted correlations among the 10 decathlon events for 611 college athletes. To facilitate interpretation, the events are organized into blocks containing functionally similar events (throws, jumps, runs). All 45 of the Pearson product-moment correlations among the 10 events are positive and are generally of moderate to large magnitude (Fig. 2, Table 1) with a range of 0.2 to 0.84. The Pearson correlation heat map (Fig. 2) shows slightly higher correlations within than among the functional blocks (cells closer to the diagonal tend to be darker). The smallest correlations are between 1500 m speed and all other events except 400 m speed. The pole vault also has noticeably smaller correlations with other events, including other jumps.

Quality-adjusted correlations are shown in Fig. 3. In general, the partial correlations are small and positive with only one moderately negative estimate, that between 1500 m and 100 m speeds. This result is consistent with the results of Van Damme et al. (2002), although their table 1 reports only the sign if statistically significant or ‘NS’ if not. By contrast, the regression-residual correlations are almost all negative among functional blocks. Within functional blocks, the regression-residual correlations are positive within the throws block, mostly positive within the runs block, and mostly negative within the jumps block. The overall trends in the partial and regression residual correlations appear nearly opposite. More precisely, however, the two sets of correlations are roughly orthogonal: the large, positive estimates using partial correlation tend to be near zero using regression residual correlation, while the estimates near zero using partial correlation tend to be large and negative using regression residual correlation.

The WUC correlations show a pattern similar to the regression-residual correlations except that all values are closer to zero. This must be the case as the uncorrected residual correlations are equivalent to the covariance of the regression residuals and the variances of the residuals are all less than one.
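Both parts of this argument can be checked numerically: that **R**_{raw} − $\hat{\boldsymbol{\alpha}}\hat{\boldsymbol{\alpha}}^{\mathsf{T}}$ with PC1 loadings equals the covariance of the residuals from regressing each standardized trait on PC1, and that dividing by residual standard deviations below one inflates the magnitudes. The sketch uses hypothetical simulated data (NumPy assumed).

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 500, 5
# Hypothetical data: one shared factor (effect 0.7) plus independent noise
X = rng.standard_normal((n, 1)) @ np.full((1, p), 0.7) + rng.standard_normal((n, p))
Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized traits (ddof = 0)
R = Z.T @ Z / n                                # correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)
v = eigvecs[:, -1]
alpha_hat = v * np.sqrt(eigvals[-1])           # PC1 loadings
s = Z @ v / np.sqrt(eigvals[-1])               # PC1 scores scaled to unit variance

# Regressing each standardized trait on s gives slopes equal to the PC1 loadings
slopes = Z.T @ s / (s @ s)
resid = Z - np.outer(s, slopes)
resid_cov = resid.T @ resid / n
wuc = R - np.outer(alpha_hat, alpha_hat)       # R_raw minus alpha_hat alpha_hat^T

sd = np.sqrt(np.diag(resid_cov))               # residual SDs, each below 1
resid_corr = resid_cov / np.outer(sd, sd)      # regression residual correlations
print(np.allclose(resid_cov, wuc), np.round(sd, 2))
```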

The WBC correlations are slightly shifted in a positive direction relative to the WUC correlations. Again, this is expected because the algorithm corrects the negative bias in the uncorrected estimates. The consequence is that the negative correlations are slightly less negative while the positive correlations are slightly more positive. The correlations within the throws and the runs functional blocks are small to moderate and positive, while the correlations within the jumps functional block are very small and positive. One exception to this pattern is the small, negative correlation (*r*=−0.07) between 1500 m and 100 m speeds. There are small trade-offs between the throws and runs functional blocks. The largest trade-offs (*r* from −0.12 to −0.14) are between 400 m speed and the throws. Otherwise, among-block correlations are very small (in both positive and negative directions). The biased (PC1 loadings) and bias-corrected α used to compute WUC and WBC correlations are given in Table 2.

We used Monte Carlo simulation to estimate the error distribution under the assumption that the data are multivariate normally distributed and with a pattern of correlations similar to that of the NCAA data (Fig. 4A). There is a positive bias and large variance in the error for the partial correlations (mean±s.d., 0.10±0.22) and a negative bias and moderate variance for the regression residual error (−0.11±0.14). There is a small negative bias and small variance in the WUC errors (−0.05±0.05). There is effectively no bias and small variance in the WBC errors (<0.01±0.05). In Fig. 4B, we show the true correlations as a function of the WBC estimate. The 2.5%, 50% and 97.5% quantile regression lines are also shown. We used the lower and upper quantile regression functions to compute the 95% error intervals of the WBC estimate in Table 1. These error intervals are not the long-run probability intervals of the correlation coefficient but the 95% bounds of true correlation coefficients that ‘generate’ the observed coefficient (Fig. 4B).

The consequences of the effect sizes of the WBC quality-free correlations can be explored by computing what-if scenarios. For example, if we could intervene and shift M-P trait values in the direction that would cause 1500 m speeds to increase by two standard deviations, equivalent to running 42.7 s faster (i.e. from middle of the pack among all NCAA decathletes to top 2.5%), 100 m times would slow by only 0.064 s. In the 2014 division I championship, this intervention would drop the 100 m placing by an average of 1.5 places. This effect is not trivial but is quite small given the huge intervention in 1500 m time.

In addition to analyzing the NCAA decathlon data, we re-analyzed the soccer data of Wilson et al. (2014b). As shown in the original study, all five performances are positively correlated (Fig. 5A; Table S1). As with the NCAA data, the partial correlations and regression residual correlations are largely orthogonal, and result in radically different interpretations (Fig. 5B,C). The WBC estimates are small and of both signs (Fig. 5D). Notably, the WBC estimate of the correlation between 1500 m and 40 m sprint performance (0.11; Table S1) is in the direction opposite to the expectation of a trade-off between high-endurance and high-power performances. That said, the error associated with the WBC estimates is too large to have much confidence in their sign (Fig. S2).

## DISCUSSION

We found only positive correlations among the 10 events of the decathlon. As a consequence, the PC1 of these data describes an axis of athletic quality; athletes with high PC1 scores perform better at all events. These results are consistent with prior work on both humans (Van Damme et al., 2002; Wilson et al., 2014b) and non-human vertebrates (Reidy et al., 2000; Marras et al., 2013; Vanhooydonck et al., 2014). We used a slight modification of Wright-style factor analysis with bias-correction (Wright, 1932) to estimate quality-free correlations. Elsewhere (Walker, 2015a), we have shown mathematically and with simulation that all four methods used by Van Damme et al. (2002) and Wilson et al. (2014b) result in poor inferences of underlying functional trade-offs. This can also be shown empirically, as a simple comparison of the partial and regression residual correlations shows starkly different patterns (Figs 3, 5). The partial correlation and regression residual correlation were the major methods used by Van Damme et al. (2002) and Wilson et al. (2014b), respectively. Clearly, both cannot be correct. Indeed, both are incorrect; the partial correlations are shifted in a positive direction while the regression residual correlations are shifted in a negative direction (Fig. 4).

An important but unresolved question is: what is the effect of measurement error on our estimates of the quality-adjusted correlations? Correlations between variables measured with error are biased toward zero, and the magnitude of this bias is a function of the repeatability and the number of replicates per individual (Adolph and Hardin, 2007; Adolph and Pickering, 2008). Repeatability in our data ranges from 0.81 to 0.86 for all events except 1500 m speed (0.77). These values are relatively high, but they were computed from each athlete's best performance in each outdoor track season. Because our data are sample maxima (and not sample means), standard corrections for correlation attenuation would also likely contain some unknown error (Head et al., 2011). Even if we had estimated sample mean performances for each individual, a standard correction may not apply to the quality-free correlations that we attempt to interpret: these correlations lack the component due to a global (quality) factor, and we do not know how the bias is partitioned among the components. If the bias affects only the global (quality) component, then our quality-free correlations are not attenuated by measurement-error bias.
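For reference, a minimal sketch of the standard attenuation correction discussed above (Spearman's correction, with the Spearman–Brown formula for the reliability of a mean of k replicates). It treats repeatability as reliability, which is our assumption, and the observed correlation of 0.30 is hypothetical. As noted above, this correction assumes sample means and is unlikely to transfer cleanly to sample maxima or to quality-free correlations.

```python
import math

def spearman_brown(repeatability: float, k: int) -> float:
    """Reliability of the mean of k replicate measures (Spearman-Brown formula)."""
    return k * repeatability / (1 + (k - 1) * repeatability)

def disattenuate(r_obs: float, rel_x: float, rel_y: float) -> float:
    """Spearman's correction for attenuation: r_obs / sqrt(rel_x * rel_y)."""
    return r_obs / math.sqrt(rel_x * rel_y)

# Repeatabilities from the text: 0.81 (lowest of the non-1500 m events) and 0.77 (1500 m).
# The observed correlation is hypothetical.
r_obs = 0.30
corrected = []
for k in (1, 2, 4):
    rel_a = spearman_brown(0.81, k)
    rel_b = spearman_brown(0.77, k)
    corrected.append(disattenuate(r_obs, rel_a, rel_b))
    print(k, round(corrected[-1], 3))
```

More replicates raise the reliability of the mean, so the corrected value moves back toward the observed one.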

### The predictability of performance correlations

‘Physiological and biomechanical theory predicts that there should be trade-offs between certain pairs [of decathlon event performances] – for example, speed depends on the athlete having a high proportion of fast, fatigue-sensitive muscle fibres, whereas endurance relies on a higher proportion of slower fibres that are more resistant to fatigue’ (Van Damme et al., 2002). In the Introduction, we showed why we will often fail to predict performance correlations given our limited knowledge of the form–function map. But the general strategy of using the *P*-value of a performance correlation as a test of qualitative models of form–function mapping suffers from two additional flaws. First, the null hypothesis that a correlation equals zero can be rejected without statistics: all performance traits are jointly affected by multiple underlying causal factors and must covary. A *P*-value tells us nothing more than whether our sample size is adequate to reject the null. When the null is rejected, the magnitude of the correlation coefficient is rarely compared with a prediction from a quantitative model; in fact, the magnitude is typically ignored and frequently not even reported. Instead, the sign is compared with an expectation based on well-known trade-offs at the cell or organ level, such as muscle fiber type or gearing ratio. But predicting the sign of a correlation is hardly a severe test; the flip of a fair coin will predict the sign of a correlation between any two performances 50% of the time.
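A short illustration of the point that a *P*-value mainly reflects sample size. The correlation of 0.02 is hypothetical, and for simplicity the test statistic t = r√(n−2)/√(1−r²) is referred to a standard normal rather than a *t* distribution (a close approximation once n exceeds a few hundred). The same negligible correlation moves from clearly non-significant to overwhelmingly significant as n grows.

```python
import math

def p_value_for_r(r: float, n: int) -> float:
    """Two-sided P-value for H0: rho = 0 using t = r*sqrt(n-2)/sqrt(1-r^2).

    The statistic is referred to a standard normal (normal approximation to t).
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return math.erfc(abs(t) / math.sqrt(2))

r = 0.02  # a tiny, hypothetical correlation between two performances
p_values = {n: p_value_for_r(r, n) for n in (100, 1_000, 10_000, 100_000)}
for n, pv in p_values.items():
    print(f"n={n:>7}  P={pv:.2e}")
```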

Second, a rigorous model of the causal basis of performance correlations requires accurate and precise estimates of causal effects. Almost all estimates of the causal effects of M-P traits on performance are regression coefficients from observational data. Many biases, such as omitted confounders, necessarily infect observational designs (Shalizi, 2013), but the most commonly used biostatistics textbooks in comparative physiology fail to describe these biases. Consequently, few physiologists recognize how sensitive regression coefficients are to missing confounders, or that the standard errors reported by statistical packages are not the appropriate standard errors when regression coefficients are interpreted causally. The sensitivity of regression estimates to missing confounders under realistic causal models was not systematically explored until recently (Walker, 2014). Because the standard errors reported by statistical packages model only sampling error, and error variance due to missing confounders does not decrease with sample size, increasing sample size simply gives the investigator increasingly misleading *P*-values (Walker, 2014). Any rigorous estimate of causal effects will require some combination of computational modeling, direct phenotypic manipulation, and indirect manipulation via regulation of gene expression (Wang et al., 2004). It is hard to over-emphasize these two points because they run contrary to statistical training in much of biology.
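A minimal simulation of the omitted-confounder problem, under a hypothetical causal model (not the models of Walker, 2014): a confounder C affects both an M-P trait M and a performance P, and M has a true causal effect of 0.3 on P. Regressing P on M alone converges on about 0.52 rather than 0.3; the textbook standard error shrinks with n while the bias does not, so a naive test against the true causal value becomes ever more confidently wrong.

```python
import math
import random

def ols_slope_and_se(x, y):
    """Slope of y on x (with intercept) and its textbook standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(sse / (n - 2) / sxx)

def simulate(n, rng, beta=0.3, conf_on_m=0.6, conf_on_p=0.5):
    """Hypothetical model: C -> M, C -> P and M -> P (true causal effect beta).

    The returned data omit the confounder C, as in an observational study
    that never measured it.
    """
    m, p = [], []
    for _ in range(n):
        c = rng.gauss(0, 1)
        m_i = conf_on_m * c + rng.gauss(0, 1)
        p_i = beta * m_i + conf_on_p * c + rng.gauss(0, 1)
        m.append(m_i)
        p.append(p_i)
    return m, p

# Large-sample limit of the slope: beta + conf_on_p * conf_on_m / var(M) = 0.3 + 0.3 / 1.36
expected_biased_slope = 0.3 + 0.5 * 0.6 / (0.6**2 + 1)

rng = random.Random(1)
for n in (100, 1_000, 100_000):
    slope, se = ols_slope_and_se(*simulate(n, rng))
    z = (slope - 0.3) / se  # naive z-test against the true causal effect
    print(f"n={n:>7}  slope={slope:.3f}  SE={se:.4f}  z vs true beta={z:.1f}")
```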

### What facilitations or quality do not mask

*M*_{1}, which has opposite effects on performances *P*_{1} and *P*_{2}, one performance will increase at the cost of the other, even if there is a positive correlation between *P*_{1} and *P*_{2} arising from a large quality effect. To show this, we parameterize Eqn 2 with a small functional trade-off and a large functional facilitation (Walker, 2015b).

The expected performance correlation is 0.33. If we intervene and increase the mean value of *M*_{1} by 0.5, then we expect *P*_{2} to increase by 0.2 units but *P*_{1} to decrease by 0.2 units. The expected response of *P*_{j} is Δ*M*_{i}*β*_{ij}, where Δ*M*_{i} is the effect of the intervention on *M*_{i} and *β*_{ij} is the causal effect of *M*_{i} on *P*_{j}. This shows that the magnitude of the trade-off in the response is independent of the mapping of *Q* to performance. This result is general: the masking trait *Q* could be an M-P trait *M*_{2} (Eqn 2), and the result assumes only that the intervention is specific to *M*_{1}. The intervention could be experimental (including artificial selection) or arise from different reaction norms (strong in *M*_{1}, weak in *M*_{2}) to an environmental stimulus (including specific training in humans). Variation in the stimulus generating Δ*M*_{i} is an extrinsic factor, while variation in Δ*M*_{i} due to the same stimulus is an intrinsic factor.

### Intrinsic and extrinsic quality

Above, we presented the intrinsic and extrinsic quality models of variation in athletic quality, both of which will generate positive correlations among performances and a PC1 with all positive loadings. The essence of each model is captured in Eqns 2 and 4. In the extrinsic quality model, *Q* represents extrinsic features such as training history or health status. In the intrinsic quality model, the causal variables represent underlying features of neuromuscular and musculoskeletal design that have net agonistic rather than antagonistic effects on the performances. These models are the extremes of a continuum. Walker (2015b) developed a model in which PC1 had all positive loadings but there was no general factor generating the data; instead, PC1 was causally generated by both the extrinsic *Q* and two intrinsic M-P traits. With this kind of generating model, WBC correlations are the wrong answer to the question ‘what are the correlations adjusted for extrinsic quality?’, or, equivalently, the right answer (a quality-free correlation matrix) to a question that we do not care about: ‘what is the residual correlation after factoring out both intrinsic and extrinsic quality?’ Even more likely is an interaction between intrinsic and extrinsic factors. For example, there could be underlying M-P traits that increase both high-power and high-endurance performances, which, in turn, give the athlete the confidence to train harder, recover better, eat better and avoid behavior that might increase infection. Or, the intrinsically ‘good athletes’, whose underlying M-P traits make them better at both high-power and high-endurance performances, are recruited to schools and coaches with better training programs.
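A minimal sketch of a generating model with mixed causes. The three causes (an extrinsic *Q*, an intrinsic trade-off trait *M*1 and an intrinsic facilitating trait *M*2) and all of their effects on three performances are hypothetical; this is not the model of Walker (2015b). With independent, unit-variance causes, the performance covariance is the sum of the outer products of the effect vectors plus residual variance. All correlations come out positive and PC1 has all positive loadings, even though *M*1 imposes a trade-off between the first and third performances.

```python
import numpy as np

# Hypothetical effects of each cause on three performances.
effects = {
    "Q (extrinsic quality)": np.array([0.6, 0.6, 0.6]),
    "M1 (intrinsic, trade-off)": np.array([0.4, 0.2, -0.3]),
    "M2 (intrinsic, facilitation)": np.array([0.3, 0.5, 0.4]),
}
resid_var = 0.3  # independent, performance-specific variance

# Independent, unit-variance causes: Cov(P) = sum of outer products + residual variance.
cov = sum(np.outer(b, b) for b in effects.values()) + resid_var * np.eye(3)
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)

_, eigvecs = np.linalg.eigh(corr)     # eigenvalues in ascending order
pc1 = eigvecs[:, -1]
pc1 = pc1 * np.sign(pc1.sum())        # eigenvector sign is arbitrary; orient it positively
print(np.round(corr, 2))
print("PC1 loadings:", np.round(pc1, 2))
```

Nothing in the correlation matrix or in PC1 reveals which share of the positive covariation comes from *Q* and which from *M*2.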

A necessary consequence of the above argument is our inability to use either a matrix of positive correlations or its decomposition by PCA to distinguish between the intrinsic and extrinsic quality models. This is the challenge introduced earlier. Extrinsic quality we want to control; intrinsic quality we do not – it is what we want to discover! But if we cannot know what causally generates PC1, we cannot know what we are controlling by removing the effect of PC1. Ideally, we would experimentally control extrinsic quality by raising individuals in a common-garden design where variation in the level of training, diet, temperature, parasites and pathogens is the same across individuals. Such a design is really only possible with laboratory animals.
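One standard way to see why a correlation matrix cannot separate the two models is the rotational indeterminacy of factor loadings: rotating a loading matrix leaves the implied covariance matrix unchanged. In the hypothetical sketch below, model A has an extrinsic quality *Q* plus one intrinsic trade-off trait; rotating its loadings gives model B, in which two intrinsic traits each have positive (agonistic) effects on every performance and there is no extrinsic quality at all. Both models imply exactly the same covariance matrix, and therefore the same PCA.

```python
import numpy as np

# Model A (hypothetical): extrinsic quality Q plus an intrinsic trade-off trait M.
q = np.array([0.6, 0.6, 0.6])
m = np.array([0.3, 0.0, -0.3])
loadings_a = np.column_stack([q, m])

# Model B: rotate the loadings by -45 degrees. Each new column is a cause with
# positive effects on all three performances (two agonistic intrinsic traits).
theta = np.pi / 4
rotation = np.array([[np.cos(theta), np.sin(theta)],
                     [-np.sin(theta), np.cos(theta)]])  # orthogonal: R @ R.T = I
loadings_b = loadings_a @ rotation

resid = 0.4 * np.eye(3)
cov_a = loadings_a @ loadings_a.T + resid
cov_b = loadings_b @ loadings_b.T + resid
print(np.round(loadings_b, 3))
print("Same covariance matrix:", np.allclose(cov_a, cov_b))
```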

Limiting performance measures to elite athletes should greatly reduce the contribution of extrinsic quality components to PC1. If the extrinsic quality contributions to the variance on PC1 are indeed small relative to intrinsic quality contributions, then the positive correlations measured between the decathlon events should largely be faithful estimates of the functional organization of the causal mapping from M-P traits to performance. These positive correlations might reflect correlated, intrinsic variation in the ability of athletes to predict maximum, safe and sustainable strain and strain rates for an activity, in unconscious CNS control of maximum strain and strain rates, and in CNS coordination of the complex motor behaviors required for skilled movement (Tucker, 2009; Martens and Collier, 2011).

If, however, extrinsic quality contributions to PC1 are large, then we have some evidence of a few, small trade-offs in the decathlon data, including a small trade-off between high endurance (1500 m speed) and high power (sprint speed). Small effects are potentially important in both evolutionary and ecological dynamics, so we do not trivialize the small effect size (−0.07) of the quality-free correlation between 1500 m and 100 m speeds (indeed, we give an example of how it affects placing in the 100 m finals of the NCAA championships). Nevertheless, we do not believe the small, negative correlations in the quality-adjusted decathlon data are good evidence for an underlying trade-off generated by the architecture of the form–function map. The issue is not so much the magnitude of these correlations but their meaning. The quality-free correlations are adjusted for quality, but this quality likely contains both intrinsic and extrinsic components and a PCA cannot quantify their relative magnitudes. Consequently, we strongly caution against citing our results as evidence of net functional trade-offs free of extrinsic quality. The high frequency of positive correlations among both human and non-human performance traits that are putatively optimized by opposing musculoskeletal designs, such as muscle fiber type, musculoskeletal gearing ratios or body shape, is surprising, and individual quality, with its intrinsic and extrinsic components, is a compelling model to explain these correlations. The physiological explanation of quality, and of the positive correlations more generally, remains to be discovered.

## Acknowledgements

We very much appreciate the many constructive comments of three anonymous reviewers.

## Footnotes

**Author contributions**

S.P.C. collected the data and contributed to the interpretation of the results and revision of the manuscript. J.A.W. analyzed the data and wrote the manuscript.

**Funding**

This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

**Competing interests**

The authors declare no competing or financial interests.

## References