## SUMMARY

Biological scaling analyses employing the widely used bivariate allometric model are beset by at least four interacting problems: (1) choice of an appropriate best-fit line with due attention to the influence of outliers; (2) objective recognition of divergent subsets in the data (allometric grades); (3) potential restrictions on statistical independence resulting from phylogenetic inertia; and (4) the need for extreme caution in inferring causation from correlation. A new non-parametric line-fitting technique has been developed that eliminates requirements for normality of distribution, greatly reduces the influence of outliers and permits objective recognition of grade shifts in substantial datasets. This technique is applied in scaling analyses of mammalian gestation periods and of neonatal body mass in primates. These analyses feed into a re-examination, conducted with partial correlation analysis, of the maternal energy hypothesis relating to mammalian brain evolution, which suggests links between body size and brain size in neonates and adults, gestation period and basal metabolic rate. Much has been made of the potential problem of phylogenetic inertia as a confounding factor in scaling analyses. However, this problem may be less severe than suspected earlier because nested analyses of variance conducted on residual variation (rather than on raw values) reveal that there is considerable variance at low taxonomic levels. In fact, limited divergence in body size between closely related species is one of the prime examples of phylogenetic inertia. One common approach to eliminating perceived problems of phylogenetic inertia in allometric analyses has been calculation of 'independent contrast values'. It is demonstrated that the reasoning behind this approach is flawed in several ways. Calculation of contrast values for closely related species of similar body size is, in fact, highly questionable, particularly when there are major deviations from the best-fit line for the scaling relationship under scrutiny.

## Introduction

Analysis of scaling relationships between individual biological features and body size across species (interspecific allometry) has a long history and has made a particularly valuable contribution to studies of animal physiology (Schmidt-Nielsen, 1972, 1984). Scaling of basal metabolic rate (BMR) has been a central concern (Brody and Procter, 1932; Brody, 1945; Kleiber, 1932, 1947, 1961; Hemmingsen, 1950, 1960; McNab, 2002). Reference is often made to the influence of Huxley (1932), but his interest focussed on scaling within species (intraspecific allometry), notably with respect to growth. In fact, it seems that interspecific scaling analyses were initiated considerably earlier in studies of vertebrate brain size (e.g. Snell, 1891; Dubois, 1897a,b, 1913). There has also been considerable interest in the scaling of variables in reproductive biology, notably among mammals (e.g. Portmann, 1941, 1965), and these have been shown to connect up with brain development and hence with the scaling of brain size (Portmann, 1962; Sacher and Staffeldt, 1974; Sacher, 1982). This review focuses on two examples taken from mammalian reproductive biology and on exploration of potential connections with the development and completed size of the brain. It follows a 'frequentist approach', in which the probability of the data having occurred is estimated given a particular hypothesis. An alternative approach taken by some authors is Bayesian inference, which uses information available prior to the study to generate a probability distribution (Ellison, 2004).

Bivariate allometric analyses of the relationship between any chosen individual biological dimension (*Y*) and body size (*X*, usually body mass) have generally used the empirical scaling formula *Y*=*k*·*X*^{α}, in which *k* is the allometric coefficient and α the allometric exponent (Gould, 1966; Martin, 1989). It is standard practice to convert data to logarithmic form for analysis, as this linearizes the allometric formula (log *Y*=α·log *X*+log *k*), making it more amenable to statistical treatment and interpretation. The exponent α is directly indicated by the slope of the best-fit line and the coefficient *k* by the intercept. This widely used approach is mainly applied to identify positive or negative deviations of individual species from the overall scaling relationship (their residual values) and to detect grade shifts between groups of species (Martin, 1989). One widely used application of this approach has been to examine the size of the brain (or parts thereof) relative to body size in a given sample of species and to seek potential links with behavioural and/or ecological features (e.g. Jerison, 1963, 1973; Eisenberg and Wilson, 1978; Clutton-Brock and Harvey, 1980; Harvey and Bennett, 1983; Gittleman, 1986; Sawaguchi, 1992; Barton et al., 1995; Dunbar, 1995; Allman, 1999; Barton and Harvey, 2000). Although allometric analysis can also permit identification and subsequent interpretation of the scaling exponent (α), as has been common in physiological studies (Schmidt-Nielsen, 1972, 1984), this aspect has generally received far less attention. One striking exception has been determination and interpretation of the scaling exponent for the relationship between basal metabolic rate and body mass in mammals. Debate about whether the 'true' value of the interspecific scaling exponent is 0.67 or 0.75 has recently been fuelled by the observation that the empirically determined value may be biased upwards by inclusion of large-bodied herbivores with marked digestive fermentation (White and Seymour, 2003, 2005). There have been increasingly sophisticated attempts to provide a valid theoretical explanation for the commonly accepted scaling exponent value of 0.75 (e.g. West et al., 1997, 1999; Darveau et al., 2002; Bejan, 2001, 2005; Dawson, 2001, 2005), but these are bedevilled both by uncertainty about the actual scaling exponent for BMR and by the unresolved conflict between competing explanations. Furthermore, questions have been raised about the reliability of simple power laws in this context (Weibel, 2002).
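The log-linearization described above can be illustrated with a minimal sketch. It uses ordinary least squares purely for simplicity (the choice of line-fitting method is itself at issue in this review), and the data, the function name `fit_loglog` and the values 3.4 and 0.75 are hypothetical, chosen only for illustration.

```python
import math
import random

def fit_loglog(x_values, y_values):
    """Least-squares line through (log10 X, log10 Y); returns (alpha, k)."""
    lx = [math.log10(x) for x in x_values]
    ly = [math.log10(y) for y in y_values]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx            # slope of the log-log line = exponent
    log_k = my - alpha * mx      # intercept of the log-log line = log k
    return alpha, 10 ** log_k

# Hypothetical data: Y = 3.4 * X^0.75 with multiplicative scatter.
rng = random.Random(1)
masses = [10 ** rng.uniform(1, 6) for _ in range(200)]
values = [3.4 * m ** 0.75 * 10 ** rng.gauss(0, 0.05) for m in masses]

alpha, k = fit_loglog(masses, values)
print(f"alpha = {alpha:.3f}, k = {k:.2f}")
```

Note that the intercept of the fitted line is log *k*, so the coefficient itself is recovered by back-transformation.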

## Problems in allometric analysis

The seeming simplicity of the allometric equation and the apparent ease with which a line can be fitted to logarithmically converted data are deceptive. Complex statistical and logical problems inherent in such bivariate analysis have been progressively recognized (Fig. 1). To make matters worse, these problems interact in ways that hinder straightforward interpretation of allometric analyses. The most immediately obvious problem in allometric analysis, already debated extensively, is the choice of an appropriate best-fit line (e.g. see Harvey and Mace, 1982; Martin, 1989; Martin and Barbour, 1989; Harvey and Pagel, 1991; Riska, 1991). Most published allometric analyses have used least-squares regression (Model I regression) to determine a best-fit line. This is simple to calculate because it only takes into account unidirectional deviations of species from the best-fit line (those measured parallel to the *Y*-axis). However, this approach rests on the requirements that (a) the *X*-variable be measured without error, and (b) the *Y*-variable be clearly dependent upon the *X*-variable. With interspecific biological data, it is inherently unlikely that both of these criteria will be met, even if in some cases measurement error in body mass may be minor in comparison to that in the *Y*-variable (e.g. see Taper and Marquet, 1996). In an analysis of mammalian brain mass relative to body mass, for example, there is no reason why the measurement error for brain mass should be greater than that for body mass. Indeed, for various reasons it is more likely that the converse will be true. Furthermore, it is not evident that brain mass is unidirectionally dependent on body mass in any sense. Both brain mass and body mass depend on species-specific growth processes programmed in the genome, and some evidence in fact suggests that brain growth may serve as a pacemaker for bodily growth (Sacher and Staffeldt, 1974).
To escape the questionable twin assumptions underlying the use of least-squares regression in bivariate allometric analysis, various authors have instead used a Model II regression approach that allows for variation in both variables and does not require a distinction between dependent and independent variables. The major axis and the reduced major axis have both been used for this purpose. Nevertheless, it is still widely held that the least-squares regression is appropriate for any kind of prediction. Because the residual value for any species is the difference between the actual *Y*-value and that 'predicted' by the best-fit line, this implies that the least-squares regression may be the method of choice for one of the main concerns in allometric analyses. Hence, uncertainty about the correct line-fitting procedure continues. In cases where the data fit fairly closely to a single best-fit line, the choice of line-fitting technique is relatively unimportant. However, in cases where the data are widely scattered relative to the line (thus containing potentially interesting information about differential biological adaptation among species), the alternative line-fitting methods can yield very different conclusions.
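The point that the three parametric lines agree for tight data but diverge for scattered data can be checked with a short sketch. The slope formulas are the standard textbook ones for least squares, major axis and reduced major axis; the datasets and the function name `line_slopes` are hypothetical.

```python
import math
import random

def line_slopes(x, y):
    """Slopes of the least-squares, major-axis and reduced-major-axis lines."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ols = sxy / sxx
    # Major axis: direction of the first principal axis of the scatter.
    ma = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    # Reduced major axis: ratio of standard deviations, signed like sxy.
    rma = math.copysign(math.sqrt(syy / sxx), sxy)
    return {"least_squares": ols, "major_axis": ma, "reduced_major_axis": rma}

rng = random.Random(2)
x = [rng.uniform(0, 6) for _ in range(300)]            # hypothetical log body mass
tight = [0.75 * a + rng.gauss(0, 0.05) for a in x]     # little scatter
scattered = [0.75 * a + rng.gauss(0, 0.6) for a in x]  # marked scatter

slopes_tight = line_slopes(x, tight)
slopes_scattered = line_slopes(x, scattered)
spread_tight = max(slopes_tight.values()) - min(slopes_tight.values())
spread_scattered = max(slopes_scattered.values()) - min(slopes_scattered.values())
print(slopes_tight)
print(slopes_scattered)
```

In this simulated case the least-squares slope is the lowest of the three, and the gap between methods widens as the correlation weakens.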

Determination of a best-fit line using any of the commonly used techniques (least-squares regression, major axis or reduced major axis) is complicated by two additional factors. First, there is an underlying assumption that the *X*- and *Y*-variables are normally distributed and, indeed, that the standard model of the bivariate normal distribution is potentially applicable. However, whereas the assumption of bivariate normal distribution may be justifiable for certain intraspecific comparisons, it is rarely if ever appropriate for interspecific comparisons (Martin and Barbour, 1989). The three commonly used line-fitting techniques are all derived from the general structural relationship model and thus ideally require knowledge about the distribution of the data. In comparative analyses, estimation of error variances is problematic because scatter in the data reflects a combination of sampling error and biological variation, with no means of distinguishing between them (Riska, 1991). The second obstacle to determination of an appropriate best-fit line is that individual values deviating greatly from the line (outliers), particularly if located at the extremes of the line, can strongly influence the value obtained for the slope. Because of their mode of calculation, least-squares regression, major axis and reduced major axis are all very sensitive to outliers. To avoid the problematic standard requirements for normality of distribution and knowledge of error variances, and to achieve decreased sensitivity to outliers, a non-parametric line-fitting technique was developed (Isler et al., 2002). This is an iterative method in which the slope of the best-fit line is obtained as the angle of rotation required to minimize a measure of the degree of dependence (*D*) between marginal values of the *X*- and *Y*-variables.
*D* is obtained as the integral of the difference between the density of the common distribution of *X* and *Y* and the product of the marginal densities of *X* and *Y*. The pivotal point for rotation is provided by the median values of *X* and *Y*, and the data are subdivided into quantiles for assessment of dependence. This non-parametric 'rotation method' involves no assumptions about error distributions. Furthermore, it proved to be remarkably resistant to the influence of outliers in comparison to standard parametric techniques.
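The general idea of the rotation method can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation of Isler et al. (2002): the integral defining *D* is approximated here by a sum of absolute differences over a grid of equal-count quantile bins, candidate slopes are scanned on a fixed grid rather than iterated, and all names (`quantile_bins`, `dependence`, `rotation_scan`) and data are hypothetical.

```python
import math
import random
from statistics import median

def quantile_bins(values, q):
    """Assign each value to one of q equal-count bins according to its rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * q // len(values)
    return bins

def dependence(u, v, q=5):
    """Discrete stand-in for D: sum over cells of |joint - product of marginals|."""
    n = len(u)
    bu, bv = quantile_bins(u, q), quantile_bins(v, q)
    joint = [[0.0] * q for _ in range(q)]
    for i, j in zip(bu, bv):
        joint[i][j] += 1.0 / n
    row = [sum(r) for r in joint]
    col = [sum(joint[i][j] for i in range(q)) for j in range(q)]
    return sum(abs(joint[i][j] - row[i] * col[j])
               for i in range(q) for j in range(q))

def rotation_scan(x, y, slopes, q=5):
    """Rotate the data about the medians; return the slope minimizing D and all D-values."""
    mx, my = median(x), median(y)
    d_values = []
    for s in slopes:
        theta = math.atan(s)
        c, sn = math.cos(theta), math.sin(theta)
        u = [(a - mx) * c + (b - my) * sn for a, b in zip(x, y)]   # along the line
        v = [-(a - mx) * sn + (b - my) * c for a, b in zip(x, y)]  # across the line
        d_values.append(dependence(u, v, q))
    best = min(range(len(slopes)), key=d_values.__getitem__)
    return slopes[best], d_values

rng = random.Random(0)
x = [rng.uniform(0, 6) for _ in range(1000)]      # hypothetical log body mass
y = [0.75 * a + rng.gauss(0, 0.1) for a in x]     # hypothetical log Y, true slope 0.75
slopes = [0.30 + 0.01 * i for i in range(91)]     # candidate slopes 0.30 to 1.20
best_slope, d_values = rotation_scan(x, y, slopes)
print(f"slope with minimal D: {best_slope:.2f}")
```

Because only ranks within the rotated coordinates enter the calculation, a handful of extreme points can shift a few cell counts but cannot dominate the result in the way they can dominate a sum of squares.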

Marked dispersion of points around the best-fit line becomes even more of a problem when the data are heterogeneous, falling into distinct subgroups (grades) that arguably require determination of separate best-fit lines. Determination of a single best-fit line can yield a very misleading result in cases where grade shifts remain unrecognized. Hitherto, however, there has been no readily available technique permitting objective recognition of grade shifts and appropriate analysis of data subsets. Analysis of separate grades has been limited to cases where the investigator suspected for biological reasons of some kind that subdivision of the dataset would be appropriate. As a matter of course, it should be asked whether a single best-fit line is appropriate for any given dataset, and it is advisable to test the outcome of fitting separate lines to subsets that may be suspected to exist on taxonomic, functional or other grounds. In fact, an additional benefit of the new non-parametric 'rotation line' method (Isler et al., 2002) is that it provides an objective basis for recognition of grade shifts in datasets in cases where these are very marked. One step in applying this method is visual inspection of minima in the *D*-values indicating the degree of dependence between marginal values of *X* and *Y*. If there is a reasonably clear linear relationship between the *X*- and *Y*-values and no marked subdivision of the data into grades, a single minimum is found for the *D*-values. However, if the dataset is subdivided into distinctly separate grades, this is indicated by one or more additional local minima for the *D*-values. Unfortunately, if the sample size is small or the distributions of subsets in a given dataset overlap extensively, the existence of grades will not be detectable in this way.
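The text describes inspecting the curve of *D*-values by eye. As a rough programmatic counterpart, the sketch below lists local minima in a curve of *D*-values. The function name `local_minima` and the six *D*-values are invented purely to show the shape of such a curve; they are not results from any dataset, and a real curve would usually need smoothing before a rule this simple could be trusted.

```python
def local_minima(slopes, d_values):
    """Return (slope, D) pairs where D is lower than at both neighbouring slopes."""
    found = []
    for i in range(1, len(d_values) - 1):
        if d_values[i] < d_values[i - 1] and d_values[i] < d_values[i + 1]:
            found.append((slopes[i], d_values[i]))
    # Deepest minimum first, so the global minimum heads the list.
    return sorted(found, key=lambda pair: pair[1])

# Invented curve: one global minimum plus one shallower local minimum.
slopes = [0.10, 0.15, 0.20, 0.26, 0.30, 0.35]
d_values = [0.30, 0.22, 0.27, 0.18, 0.25, 0.33]

minima = local_minima(slopes, d_values)
print(minima)  # [(0.26, 0.18), (0.15, 0.22)]
```

More than one entry in the returned list would be the cue to ask whether the dataset contains separate grades.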

*X* and *Y* values for any pair of taxa (the 'contrast values') represent independent evolutionary change, whereas the raw values may themselves be subject to phylogenetic inertia. Presenting this in simplistic terms, for any pair of taxa conforming perfectly to an allometric relationship described by the standard formula, the following subtraction applies:
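Written out from the linearized formula, with subscripts 1 and 2 denoting the two taxa (a reconstruction from the standard formula, not a reproduction of the original display), the intercept term log *k* cancels on subtraction:

$$
\begin{aligned}
\log Y_1 &= \alpha \log X_1 + \log k \\
\log Y_2 &= \alpha \log X_2 + \log k \\
\log Y_1 - \log Y_2 &= \alpha\,(\log X_1 - \log X_2)
\end{aligned}
$$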

Accordingly, a best-fit line determined for contrast values (e.g. log *Y*_{1} - log *Y*_{2} *versus* log *X*_{1} - log *X*_{2}) should have a slope directly reflecting the value of the scaling exponent (α) and should pass through the origin. Because limiting the calculation of contrast values to pairs of extant species would drastically reduce the sample size, the method is extended down through the tree by averaging values above each node and then calculating contrast values between adjacent nodes as well. In a perfectly dichotomous phylogeny, this means that a dataset containing raw mean values for *N* extant species will yield a total of *N*-1 contrast values, thus barely reducing the original sample size. One important weakness of the method is that the assumption of a Brownian motion model of evolutionary change, originally proposed by Felsenstein (1985), may not always be adequate. Other more realistic modes of evolution have more recently been introduced (Hansen, 1997; Freckleton et al., 2002). Furthermore, the method also has the major drawback that calculation of contrast values requires the availability of a reliable phylogenetic tree for the taxa under comparison, ideally including information on the ages of individual nodes. However, well-resolved phylogenies are becoming increasingly available for various groups of mammals and other animals. Many recent allometric analyses have used the standard programme CAIC (comparative analysis by independent contrasts) developed and distributed by Purvis and Rambaut (1995). For primates, such analyses have been facilitated by the availability of a tailor-made consensus phylogenetic tree (Purvis, 1995). In principle, all of the standard methods used for allometric analysis of the raw data can also be used for analysis of contrast values.
However, for technical reasons relating to the obligate selection of an independent variable for calculation of contrast values, the best-fit line prescribed for their analysis is a least-squares regression forced through the origin (see also Garland et al., 1992).
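The node-averaging procedure and the regression through the origin can be sketched on a toy tree. This is a deliberately simplified illustration and not the CAIC algorithm: branch lengths are ignored, node values are plain unweighted averages, and contrasts are not standardized. The tree, the species labels and the data are hypothetical.

```python
def contrasts(tree, data):
    """Return (node_value, contrasts) for a strictly bifurcating nested-tuple tree.

    tree: a species name (leaf) or a 2-tuple of subtrees.
    data: dict mapping species name -> (log_x, log_y).
    """
    if isinstance(tree, str):
        return data[tree], []
    (lx, ly), left_c = contrasts(tree[0], data)
    (rx, ry), right_c = contrasts(tree[1], data)
    node_value = ((lx + rx) / 2, (ly + ry) / 2)   # unweighted average (assumption)
    return node_value, left_c + right_c + [(lx - rx, ly - ry)]

def slope_through_origin(pairs):
    """Least-squares slope of dy on dx with the intercept fixed at zero."""
    return sum(dx * dy for dx, dy in pairs) / sum(dx * dx for dx, _ in pairs)

# Hypothetical species lying exactly on log Y = 0.75 * log X + 0.5.
log_mass = {"A": 1.0, "B": 1.4, "C": 2.2, "D": 3.1, "E": 3.9}
data = {sp: (lx, 0.75 * lx + 0.5) for sp, lx in log_mass.items()}
tree = ((("A", "B"), "C"), ("D", "E"))

root_value, pairs = contrasts(tree, data)
slope = slope_through_origin(pairs)
print(len(pairs), round(slope, 3))   # five species give four contrasts
```

With data that conform perfectly to the allometric line, every contrast falls on a line through the origin with slope α, as the subtraction argument predicts. With real data the contrasts scatter around that line.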

It should be noted that insistence on the need for action to offset effects of phylogenetic inertia essentially concerns the issue of reliability of tests for statistical significance. For example, if it is claimed that fruit-eating primates typically have larger brains than leaf-eating primates (e.g. Allman, 1999), it is necessary to exclude the possibility that any probability value attached to this claim is biased by the influence of recent common ancestry of fruit-eaters and leaf-eaters, respectively. Smith (1994) has suggested an alternative approach to eliminating the effects of phylogenetic constraint in any such comparison by reducing the number of degrees of freedom for calculation of a probability value. In his examples, approximate halving of the degrees of freedom was found to be appropriate. However, this alternative approach leaves open the question of whether, for any given comparison, phylogenetic inertia might have biased the slope of the allometric line, and it begs the question of the reliability of residual values determined for individual species.

All three problems of allometric analysis discussed thus far (choice of an appropriate best-fit line, recognition and appropriate treatment of grade shifts, and coping with potential bias arising from phylogenetic inertia) relate to determination of the allometric relationship and calculation of reliable residual values on that basis. A fourth problem concerns interpretation of these results derived from allometric analysis and arises from dangers inherent in progressing from correlations to causation. It is all too easy to jump to conclusions about underlying causal factors on the basis of just a few allometric analyses, in the extreme case relying on just one bivariate plot. It is essential to recognize that biological variables are typically linked in complex networks and that it is an oversimplification to single out a pair of variables for analysis (e.g. adult brain and body mass). This is why the distinction between 'dependent' and 'independent' variables in biological systems is fraught with difficulty. It is important to be especially careful in making such a distinction because it may influence propagation of errors when allometric relationships are used to generate derived ones (e.g. see Taper and Marquet, 1996). It is therefore essential to conduct numerous allometric analyses and to focus on identifying testable hypotheses in order to move cautiously towards a causal interpretation. One technique that can be used in tackling complex networks of biological variables is partial correlation, which permits determination of the correlation remaining between any two variables after the influence of one or more other variables has been excluded. Although direct interpretation of a partial correlation as a causal relationship should be avoided just as carefully as for a simple correlation (Sokal and Rohlf, 1981), this approach is certainly a valuable tool in attempting to progress from correlation to functional interpretation.
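A minimal sketch of a first-order partial correlation (one variable held constant) is given below, using the standard formula based on the three pairwise correlations. Holding several variables constant at once requires either recursive application of the same formula or the inverse of the full correlation matrix. The simulated variables are hypothetical: both depend on a shared third variable and have no direct link to each other.

```python
import math
import random

def pearson(a, b):
    """Pearson product-moment correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(x, y, z):
    """Correlation between x and y after removing the influence of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

rng = random.Random(3)
z = [rng.uniform(0, 6) for _ in range(500)]        # shared driver (e.g. log body mass)
x = [0.75 * v + rng.gauss(0, 0.2) for v in z]      # depends on z only
y = [0.25 * v + rng.gauss(0, 0.2) for v in z]      # depends on z only

raw = pearson(x, y)
partial = partial_corr(x, y, z)
print(f"raw r = {raw:.2f}, partial r = {partial:.2f}")
```

In this simulated case the strong raw correlation largely vanishes once the shared variable is taken into account, which is the kind of distinction a single bivariate plot cannot reveal.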

These four main problems in allometric analysis are further exacerbated by potential interactions between them. For instance, the choice of an appropriate line-fitting method in allometric analysis can become a secondary issue if grade shifts are present, because determination of a single best-fit line of any kind for a dataset that is clearly subdivided can yield misleading results ('grade confusion'). Now it might be thought that, in addition to eliminating potential bias resulting from phylogenetic inertia, calculation of independent contrasts could eliminate effects of grade shifts, such that no special consideration of this problem is required. Indeed, this has been claimed as a benefit of the CAIC programme (Purvis and Rambaut, 1995). In principle, it might be expected that a grade shift in one group of species might arise in the common ancestor of that group and thus affect only one contrast value calculated for that ancestral node. This might in fact happen when a single grade shift is present right at the base of an evolutionary tree. In other cases, the practice of averaging values between adjacent nodes to permit calculation of contrast values within the phylogenetic tree will actually lead to diffusion of the effects of grade shifts through lower nodes. Thus, especially if there are multiple grade shifts within a given tree and if they are located well above the initial ancestral node, a complex pattern of deviating contrast values will result. Of course, if misleading results emerge from allometric analysis because of failure to recognize and deal effectively with grade shifts in a dataset, any functional interpretation developed on that basis must also be flawed.

## Examples from mammalian reproduction

Practical application of the interconnected principles of allometric analysis discussed above can be illustrated with two examples from mammalian reproductive biology that link up with inference of a possible connection between reproductive variables and the evolution of brain size. The two examples involve (1) gestation periods in mammals, and (2) neonatal body mass in primates. Before presenting the allometric analysis of these two variables, it should be noted that none of the variables involved (adult body mass, gestation period, neonatal body mass) conforms to a normal distribution (Fig. 2: Shapiro-Wilk test; for dataset (1) with *N*=429, *W*=0.91 and 0.96 for log gestation time and log adult body mass, respectively; for dataset (2) with *N*=109, *W*=0.93 and 0.95 for log neonatal body mass and log adult body mass, respectively; *P*<0.01 in all cases), thus confirming that a non-parametric approach is preferable.

Scaling of mammalian gestation periods provides a particularly striking example of the fundamental need to recognize grade distinctions. If no attention is paid to possible grade distinctions, the resulting conclusion is that the slope for scaling of mammalian gestation periods to body size is quite steep (Fig. 3). Any single best-fit line that is determined yields a scaling exponent value close to α=0.25. There have been suggestions that a scaling exponent of this value is typical for individual components of mammalian life histories (e.g. developmental periods and lifespan; West et al., 1997), and it has even been suspected that there might be a connection with the scaling exponent value of α=0.75 for basal metabolic rate in that the two exponents combined could result in a 'metabolic lifetime' with a scaling exponent of α=1. However, all of this ignores the long-established distinction between mammal species that give birth to multiple litters of poorly developed altricial neonates and those that give birth to (typically) single, well-developed precocial neonates (Portmann, 1938, 1939, 1965). It should be patently obvious that, other things being equal, development of an altricial neonate should require less time than development of a precocial neonate. Consequently, it is to be expected that there should be a distinct grade shift in a plot of gestation periods against adult body mass, with values for species with precocial neonates generally exceeding those for species with altricial neonates at any given adult body mass. This prediction is, indeed, borne out if scaling of gestation period in placental mammals is analyzed for altricial and precocial neonates as two separate grades (Martin and MacLarnon, 1985). The slope of the scaling relationship for each individual grade (altricial or precocial) is clearly less steep than for the overall dataset (Fig. 4). In fact, the scaling exponent value for each grade is almost halved, to α≈0.15.
Hence, there is in fact no empirical support for the proposition that mammalian gestation periods resemble other life-history components in scaling with an exponent value close to α=0.25.
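How an unrecognized grade shift can inflate a pooled slope is easy to reproduce with simulated data. In the sketch below, two hypothetical grades share a within-grade slope of 0.15 but differ in intercept and in body-mass range. The intercepts, ranges and scatter are illustrative assumptions rather than estimates from the mammalian dataset, and least squares is used only for brevity.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(7)

def simulate_grade(n, lo, hi, intercept):
    """One grade: log Y = 0.15 * log X + intercept + noise, log X uniform on [lo, hi]."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    ys = [0.15 * x + intercept + rng.gauss(0, 0.05) for x in xs]
    return xs, ys

alt_x, alt_y = simulate_grade(300, 1.0, 4.0, 1.2)   # lower grade, smaller-bodied
pre_x, pre_y = simulate_grade(300, 2.5, 6.0, 1.6)   # higher grade, larger-bodied

slope_alt = ols_slope(alt_x, alt_y)
slope_pre = ols_slope(pre_x, pre_y)
slope_pooled = ols_slope(alt_x + pre_x, alt_y + pre_y)
print(f"within grades: {slope_alt:.3f}, {slope_pre:.3f}; pooled: {slope_pooled:.3f}")
```

The pooled slope is steeper than either within-grade slope because the higher grade is concentrated at larger body sizes, so the single line is tilted by the offset between grades.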

Scaling of gestation periods in placental mammals provides a good test case for exploring the grade-detecting capacity of the non-parametric 'rotation' line-fitting method (Isler et al., 2002). In fact, it had already been shown that iterative fitting of lines of different slopes to plots of gestation period against adult body mass yielded a bimodal distribution at a value of α≈0.15 (Martin, 1989). Clearly, such a result should not emerge with a homogeneous dataset conforming even approximately to a bivariate normal distribution. Following application of the rotation method to data on gestation periods for 429 placental mammal species, visual inspection of a plot of *D*-values reveals that, in addition to the global minimum value corresponding to α=0.26, there is a local minimum value corresponding to α≈0.15 (Fig. 5A). When altricial species (*N*=227) and precocial species (*N*=202) are analysed separately, in each case the plot of *D*-values exhibits a single global minimum (Fig. 5B,C). The minimum for altricial mammals corresponds to an α value of 0.176, while the minimum for precocial mammals corresponds to an α value of 0.133. The mean of these two values is α=0.155, corresponding to the local minimum value seen in Fig. 5A.

In fact, the problem posed by grade shifts in the scaling of gestation periods in placental mammals is even more complicated than the simple division between altricial and precocial species would suggest. If individual taxonomic groups within each category are examined, it can be seen that there are less pronounced grade shifts between them. Among altricial mammals, lipotyphlan insectivores and carnivores generally have relatively longer gestation periods than lagomorphs and myomorph rodents (Fig. 6A), while among precocial mammals primates tend to have relatively longer gestation periods than artiodactyls, and the latter in turn tend to have relatively longer gestation periods than hystricomorph rodents (Fig. 6B). However, there is considerable overlap between taxonomic groups within each category of neonate type, such that the curves for *D*-values do not show any local minima for either category (Fig. 5B,C). Hence, detection of subtle grade shifts still depends on careful examination of the data to seek differences between biologically meaningful groups (separate taxa or functional groupings). In the case of gestation periods, it is important to note that the ancestral condition for placental mammals was probably production of litters of altricial neonates (Portmann, 1938, 1939; Martin, 1990). Accordingly, in addition to the minor divergences now observable among altricial mammals (Fig. 5A), there were probably several major grade shifts associated with the evolution of precocial offspring. Assuming that no reversals occurred, the molecular phylogeny for placental mammals generated by Murphy et al. (2001) would require shifts from the altricial to the precocial condition in 10 separate lineages (primates; hyraxes + elephants + sirenians; artiodactyls + cetaceans; perissodactyls; hystricomorph rodents; elephant shrews; bats; pangolins; anteaters; xenarthrans).
Despite the existence of minor grade distinctions within categories, it is obvious from histograms of residual values for gestation period, calculated with an exponent value of α=0.15, that there is a fundamental dichotomy between altricial and precocial species (Fig. 7).

The second example of the need to recognize distinct grades in reproductive biology is provided by the scaling of neonatal body mass in primates. It has been known for some time that in strepsirrhine primates (lemurs and lorises) neonates are markedly smaller relative to adult body mass than in haplorhine primates (tarsiers, monkeys, apes and humans) (Leutenegger, 1973; Martin, 1990). If a single best-fit line is determined for the scaling of neonatal body mass in primates (Fig. 8), the commonly used parametric techniques all yield an exponent value (α) close to 0.90 (least-squares regression: 0.874; major axis: 0.906; reduced major axis: 0.909). However, visual inspection of a plot of *D*-values derived from application of the rotation method clearly shows that, in addition to the global minimum value corresponding to α=0.916, there is a local minimum value corresponding to α=0.624 (Fig. 9A). When strepsirrhine primates (*N*=28) and haplorhine primates (*N*=81) are analysed separately, in each case the plot of *D*-values exhibits a single global minimum (Fig. 9B,C). The minimum for strepsirrhines corresponds to an α value of 0.688, while the minimum for haplorhines corresponds to an α value of 0.862. The average of these two values is α=0.775, which is higher than the local minimum value seen in Fig. 9A. Given this discrepancy and the fact that the α values determined for strepsirrhines and haplorhines differ from one another, it seems likely that there are further subtle grade distinctions within the dataset, in addition to the primary distinction between strepsirrhines and haplorhines (Fig. 10).

## Progressing from correlation to causation

Considerable care is needed in attempting to proceed from the empirical results of allometric scaling analyses to inference of underlying causal connections. An illustrative example is provided by analyses of potential links between brain size and basal metabolism in placental mammals. At the simplest level, several authors noted from analyses of large datasets that the value of the scaling exponent for brain mass is closely similar to that found for basal metabolic rate, with α≈0.75 in both cases (Bauchot, 1978; Eisenberg, 1981; Armstrong, 1982; Hofman, 1982; Martin, 1990). This led to a number of proposals for some kind of connection between brain size and basal metabolic rate. However, it is at once apparent that any existing link must be indirect because there is far less residual variation relative to the best-fit line with BMR than with brain size. Overall, BMR varies only by a factor of four relative to body size, whereas brain size shows much greater variation relative to body size, showing a 25-fold range of variation. Hence, there is considerable variation in relative brain size that cannot be explained by variation in relative BMR. This and other considerations led the first author to propose the maternal energy hypothesis (Martin, 1981, 1983, 1996, 1998). In this hypothesis, it is proposed that the mother's metabolic turnover constrains energy availability for brain development in the embryo/foetus during intrauterine development and in the offspring during postnatal life up to the time of weaning. Accordingly, differences between species in gestation period and lactation period could generate variability in completed brain size that is not directly attributable to BMR. One of the initial indicators of a potential link between maternal physiology and brain development is the finding that neonatal brain mass is more tightly correlated with gestation period than is neonatal body mass in placental mammals (Sacher and Staffeldt, 1974).

Because of the typical existence of complex biological networks, it is unwise to rely on individual correlations between variables. For this reason,it is very useful to employ the technique of partial correlation, as this permits examination of the correlation between any two variables after the influence of other variables has been taken into account. An example of this approach is provided by partial correlations from a four-way analysis involving adult body mass, adult brain mass, basal metabolic rate and gestation period for a sample of 51 placental mammal species. [N.B. A similar analysis for 53 mammal species was reported by Martin(1996, 1998). Data for two species were subsequently found to be questionable, and a repeat analysis with a reduced sample of 51 species yielded somewhat higher correlations.] It can be seen from Fig. 11 that, as expected, there is a strong partial correlation between BMR and body mass. However, it is also seen that adult brain mass shows substantial partial correlations with adult body mass, BMR and gestation period, indicating that all three variables are connected in some way to brain mass. As the maternal energy hypothesis seemingly provides the only potential explanation for a connection between brain size and gestation period as well as BMR, this can be viewed as supporting evidence. Perhaps the most interesting finding, however,concerns the relationship between BMR and gestation period. If the correlation of either variable with body mass is considered in isolation, a strong positive value is obtained. However, when partial correlations are considered,it emerges that the remaining correlation between BMR and gestation is negative (Fig. 11). This suggests that an increase in brain mass may be associated with an increase in either BMR or gestation period relative to body mass, but that these two variables do not increase in tandem. 
Hence, there is an apparent trade-off between BMR and gestation period in the development of relatively large brains.
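The logic of partial correlation can be illustrated with a minimal sketch. The values below are invented toy numbers on a log scale, not the 51-species dataset, and the variable names (`body`, `bmr`, `gestation`, `brain`) and helper functions are hypothetical. The toy data are deliberately constructed so that BMR and gestation period both track body mass closely, while their deviations from body mass run in opposite directions.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(data, i, j, controls):
    """Correlation between data[i] and data[j] with the linear influence of
    every variable named in `controls` removed (standard recursive formula)."""
    if not controls:
        return pearson(data[i], data[j])
    k, rest = controls[0], controls[1:]
    r_ij = partial_corr(data, i, j, rest)
    r_ik = partial_corr(data, i, k, rest)
    r_jk = partial_corr(data, j, k, rest)
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Toy log-scale values (invented for illustration only).
body = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
e = [0.3, -0.3, 0.0, 0.0, -0.3, 0.3]   # deviation uncorrelated with body
g = [0.2, 0.2, -0.4, -0.4, 0.2, 0.2]   # second deviation, uncorrelated with body and e
data = {
    "body": body,
    "bmr": [b + d for b, d in zip(body, e)],        # relatively high BMR ...
    "gestation": [b - d for b, d in zip(body, e)],  # ... where gestation is relatively short
    "brain": [b + d for b, d in zip(body, g)],
}

raw = pearson(data["bmr"], data["gestation"])
partial = partial_corr(data, "bmr", "gestation", ["body", "brain"])
print(f"raw r = {raw:.3f}, partial r = {partial:.3f}")
```

In this constructed case the raw correlation between BMR and gestation period is strongly positive, whereas the partial correlation after controlling for body mass and brain mass is strongly negative. Real data would of course give less extreme values.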

Another way to approach the problem is through the use of path analysis, for which an underlying model of causal relationships between the variables is explicitly stated. However, there is the drawback that, when applied to comparative analysis, this technique again raises the issue of having to distinguish between 'dependent' and 'independent' variables. A preliminary path analysis was conducted, although the available dataset (51 species) is somewhat limited for this kind of approach. Adult brain mass was considered to be the dependent variable and the three other variables (adult body mass, BMR, gestation period) were treated as inter-correlated predictor (causal) variables. Using such a model, a high coefficient of determination was achieved (*r*^{2}=0.957; Sokal and Rohlf, 1981). Interestingly, the correlation between BMR and brain mass appeared mainly as the result of a strong direct correlation between the two variables (0.59, for a total observed correlation of 0.97), indirect 'effects' (a term used here in a non-causal or predictive sense) through gestation period and body mass being relatively minor (0.38). The converse was true for the correlation between body mass and brain mass or for the correlation between gestation period and brain mass. In both cases, direct effects turned out to be minor (0.26, for a total observed correlation of 0.97, and 0.17, for a total correlation of 0.75, respectively), the major indirect effect being through BMR in each case (0.58 and 0.39, respectively). This very preliminary analysis confirmed that the relationship between body mass and brain mass is mainly indirect (here through its correlation with BMR), but also suggested that the same applies to gestation period.
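The split of an observed correlation into a direct component and indirect components can be sketched as follows. This is a minimal illustration of the standard decomposition for a single dependent variable with inter-correlated predictors, not a reproduction of the analysis reported here. The correlation values are hypothetical, and the function names `solve` and `path_decomposition` are assumptions introduced for the example. The direct effect is the standardized partial regression coefficient; the indirect effect for predictor *i* is the sum of the other coefficients weighted by their correlations with *i*.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def path_decomposition(r_xx, r_xy):
    """Return [(direct, indirect, total), ...] per predictor and r-squared.

    r_xx: correlation matrix among the predictors
    r_xy: correlations of each predictor with the dependent variable
    """
    beta = solve(r_xx, r_xy)  # standardized (path) coefficients
    rows = []
    for i, direct in enumerate(beta):
        indirect = sum(r_xx[i][j] * beta[j] for j in range(len(beta)) if j != i)
        rows.append((direct, indirect, direct + indirect))
    r_squared = sum(b * r for b, r in zip(beta, r_xy))
    return rows, r_squared


# Hypothetical correlations, NOT the values from the 51-species sample.
names = ["body mass", "BMR", "gestation"]
r_xx = [[1.0, 0.9, 0.6],
        [0.9, 1.0, 0.5],
        [0.6, 0.5, 1.0]]
r_xy = [0.87, 0.87, 0.63]  # correlation of each predictor with brain mass

rows, r_squared = path_decomposition(r_xx, r_xy)
for name, (direct, indirect, total) in zip(names, rows):
    print(f"{name}: direct={direct:.2f} indirect={indirect:.2f} total={total:.2f}")
print(f"r^2 = {r_squared:.3f}")
```

By construction, the direct and indirect components for each predictor add up to its observed correlation with the dependent variable, which is the property exploited in the comparison of direct and indirect 'effects' above.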

It is important to note that these findings for partial correlations and for all of the other analyses discussed thus far were obtained using raw variables. No attempt has yet been made to correct for possible effects of differential phylogenetic relatedness among the species examined in any sample. It is therefore necessary to turn to the issue of potential conflict between such phylogenetic relatedness and the requirement for statistical independence of data points.

## The problem of phylogenetic inertia

It is undoubtedly true that failure to consider phylogenetic relationships might lead to misleading results because of the potential problem of phylogenetic inertia. At the very simplest level, over-representation of a group of closely related species in a sample could swamp data from other taxonomic groups. An apt example is that of the hominoid primates (apes), in which relatively species-rich lesser apes (gibbons) outnumber the great apes (chimpanzees, gorillas and orang-utans). One recent taxonomic revision (Groves, 2001) recognizes 14 gibbon species allocated to one genus, as opposed to six great ape species allocated to three genera. As a result, any analysis of scaling relationships among apes at the species level could be biased by over-representation of gibbons. Although this problem could be partially offset by restricting analysis to the generic level, this would reduce the sample size by 80% (from 20 species to four genera). A similar bias can also result in cases where data are more easily available for certain taxa than for others. For instance, in scaling analyses involving Old World monkeys, data are often far more readily available for macaques (*Macaca* species) than for any of the remaining 21 genera, including the highly speciose genus *Cercopithecus*. Phylogenetic inertia can also exert more subtle influences with respect to the origins of specific adaptations. For instance, in examining possible links between diet and relative brain size, it is conceivable that a small number of adaptive shifts from frugivory to folivory might account for any observed pattern. In the oft-quoted example in which fruit-eating primates are found to have relatively larger brains than leaf-eating primates, it is important to be aware of the possibility that leaf-eating species may be descended from a small number of ancestral nodes and that any correlation with brain size that is detected may be weakly supported.
Hence, it is certainly important to examine any dataset for potential sources of bias arising from imbalanced phylogenetic representation.

In the seminal paper by Felsenstein (1985) on potential bias of comparative analyses arising from phylogenetic relatedness, one example specifically cited was a study of brain size scaling in mammals conducted by the first author (Martin, 1981). It was argued that the data points for individual species might not meet the criterion of statistical independence because of their differential degrees of relatedness within the phylogenetic tree. It is, of course, conceivable that this might bias the results, although Felsenstein did not actually demonstrate that it did. It is a moot point whether or not the degree of change accompanying divergence between sister species is sufficient to dilute the effects of phylogenetic inertia to the point where conflict with the requirement of statistical independence is minor and perhaps negligible. Before pursuing this point, it should be noted that any statistical problems associated with differential degrees of relatedness applying to comparisons *within* species should be massive relative to those associated with interspecific comparisons, as differential divergence within a single gene pool over much shorter periods of time must surely entail strong relatedness effects. Curiously, however, this problem has been relatively neglected in comparison to the extensive recent literature on potential effects of phylogenetic inertia in comparisons between species.

Felsenstein's (1985) view that phylogenetic inertia could be a problem in interspecific comparisons was driven in part by the results of nested analysis of variance (ANOVA) conducted with certain variables. Similar results subsequently reported by Harvey and Pagel (1991) bolstered a belief in the necessity for measures to exclude the effects of phylogenetic inertia. Harvey and Pagel reported ANOVA results for several variables in placental mammals (adult body mass, neonatal body mass, gestation period, age at weaning, maximum reproductive lifespan, annual fecundity and annual biomass production), consistently indicating that there is relatively little variance at the level of species and genera. Only 8-20% of the variance was found between species and genera, whereas 80-92% was found at higher taxonomic levels (between families and orders). On the face of it, these figures do seem to suggest that there is relatively little evolutionary divergence between closely related species or even genera and that phylogenetic inertia may hence be a major problem. Initially, this provided a rationale for several studies in which allometric analysis was conducted at the level of the subfamily or above, and subsequently it was invoked as a justification for special techniques such as calculation of 'independent contrasts'. However, it should be emphasized that the nested ANOVAs reported by Felsenstein (1985) and by Harvey and Pagel (1991) were all conducted on the raw data. This approach is questionable because many biological parameters are highly correlated with body mass and because body mass itself provides a prime example of phylogenetic inertia, typically differing far less between closely related species than between distantly related species. For example, there is a relatively limited range of body mass values within each order of placental mammals (Fig. 12A), and this pattern is replicated between families within any given order, such as primates (Fig. 12B).
Because most features are correlated with body mass, it follows that the distribution of variance in raw values of individual biological variables (e.g. gestation period or brain mass) will generally exhibit a pattern very similar to that observed with body mass. Yet the real question of interest in scaling analyses is whether brain mass, for instance, is tightly constrained or relatively free to vary at any given body size. In other words, it is the pattern of variation of residual values rather than that of the raw values that needs examination. When nested ANOVA was conducted on adult body mass, gestation period, brain mass and basal metabolic rate for a large sample of placental mammals, the result obtained with the raw values was similar to that reported by Felsenstein (1985) and by Harvey and Pagel (1991). Only 5-16% of the variance in raw values was found between species and genera, whereas 84-95% was found between families and orders. By contrast, when nested ANOVA was conducted on the residual values for gestation period, brain mass and basal metabolic rate, more variance was detected at low taxonomic levels and there were more pronounced differences between variables. With brain mass, 34.6% of the variance in the residuals was found at the generic and specific levels, and for basal metabolic rate that figure was even higher, at 45.1% (Fig. 13). Hence, with these two variables, analysis conducted at the subfamilial level would have led to exclusion of one third to almost half of the residual variance. With gestation period, however, the picture is very different. Only 7.4% of the variance in the residuals was found at the generic and specific levels, only slightly greater than the value of 5.4% found with the raw data. Thus, it would seem that gestation period, in contrast to brain mass and basal metabolic rate, is indeed subject to considerable phylogenetic inertia.
Such inertia had been explicitly proposed by Martin and MacLarnon (1985), who noted that gestation periods generally vary little between species within a genus.
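The contrast between partitioning raw values and partitioning residuals can be illustrated with a simplified sketch. It is not the nested ANOVA used for Fig. 13: it partitions sums of squares across nested taxonomic levels rather than estimating variance components, it takes residuals from an ordinary least-squares line (an assumption made for brevity), and the eight 'species' are invented toy values. The names `ols_residuals` and `ss_partition` are hypothetical.

```python
from collections import defaultdict


def ols_residuals(x, y):
    """Residuals of y about the ordinary least-squares line on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def ss_partition(values, keys):
    """Fraction of the total sum of squares at each nested level.

    keys[i] is a tuple such as (order, genus) for observation i. The result
    lists one fraction per key level, then the fraction among observations
    (species) within the lowest grouping.
    """
    n = len(values)
    grand = sum(values) / n
    total = sum((v - grand) ** 2 for v in values)
    previous = [grand] * n
    fractions = []
    for level in range(1, len(keys[0]) + 1):
        groups = defaultdict(list)
        for v, k in zip(values, keys):
            groups[k[:level]].append(v)
        means = {g: sum(vs) / len(vs) for g, vs in groups.items()}
        current = [means[k[:level]] for k in keys]
        fractions.append(sum((c - p) ** 2 for c, p in zip(current, previous)) / total)
        previous = current
    fractions.append(sum((v - p) ** 2 for v, p in zip(values, previous)) / total)
    return fractions


# Invented toy data: (order, genus, log body mass, deviation from scaling line).
species = [
    ("A", "g1", 1.0, 0.2), ("A", "g1", 1.1, -0.2),
    ("A", "g2", 1.4, 0.2), ("A", "g2", 1.5, -0.2),
    ("B", "g3", 4.0, 0.2), ("B", "g3", 4.1, -0.2),
    ("B", "g4", 4.4, 0.2), ("B", "g4", 4.5, -0.2),
]
keys = [(o, g) for o, g, _, _ in species]
log_x = [x for _, _, x, _ in species]
log_y = [0.75 * x + d for _, _, x, d in species]  # assumed exponent of 0.75

raw = ss_partition(log_y, keys)
resid = ss_partition(ols_residuals(log_x, log_y), keys)
print("raw      (order, genus, species):", [round(f, 3) for f in raw])
print("residual (order, genus, species):", [round(f, 3) for f in resid])
```

In this toy case almost all of the variation in the raw values lies between orders, simply because body mass differs between orders, whereas almost all of the variation in the residuals lies between species within genera. The real data are far less extreme, but the sketch shows why the two partitions need not agree.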

At this point, it is necessary to reflect on the meaning of the catch-all term 'phylogenetic inertia', which can encompass several different phenomena (Fig. 14). Global inertia affecting both *X* and *Y* values in closely related species, essentially resulting in repeat values, is perhaps the simplest form imaginable, with similar genotypes constraining organisms to fit a similar bodily pattern in all respects. However, it is also possible that inertia will primarily affect only one of the two variables. One possibility would be inertia mainly restricted to *Y* values, such that differences in body size between closely related species are not accompanied by adjustments in the *Y* variable (scaling inertia). An example of this might be provided by mammalian gestation periods, although species within a genus also tend to be relatively similar in overall body size. The alternative possibility is inertia mainly restricted to *X* values (body size inertia). An example of this could be provided by relatively wide variation in metabolic rate between closely related species without any marked divergence in body size. Finally, it is possible to envisage a constrained allometric relationship between the *X* and *Y* variables as a form of inertia, assuming that close adherence to a given scaling principle might be an inherited property of closely related species (allometric inertia). From this perspective, marked departures from the scaling principle (i.e. relatively large residual values) can be seen as an escape from the general allometric constraint. Given all of these different possibilities for phylogenetic inertia, it is difficult to see how a single analytical procedure (e.g. the CAIC programme) would effectively deal with them all at once. As is seen in Fig. 14, the different kinds of inertia will exert very different effects on any 'independent contrast' values that are calculated.
It is particularly important to note that inertia primarily affecting only one variable (scaling inertia or body size inertia) will generate contrast values that deviate from the best-fit line reflecting the allometric relationship. Ironically, only allometric inertia in which both variables are quite tightly constrained to a particular scaling relationship will yield values conforming closely to the best-fit line.

The problems involved in attempting to correct for the effects of phylogenetic inertia using 'independent contrasts' can be illustrated with the practical examples of basal metabolic rate (BMR) and brain size in mammals. Allometric analyses of these two variables contributed to the maternal energy hypothesis for brain size scaling in mammals, the starting point for which was the observation that the value of the scaling exponent (α) is close to 0.75 in both cases. However, whereas the exponent value of 0.75 persisted for BMR when scaling was examined using contrast values, the exponent value for scaling of contrast values for the brain declined to 0.69 (Harvey and Pagel, 1991). There are two possible explanations for this result. The first is that the exponent value of 0.75 obtained for brain size scaling by analysis of the raw data was an artefact arising from effects of phylogenetic inertia, and that application of the contrast method removed taxonomic bias to yield the correct result. The second possibility is that application of the contrast method in fact generated some distortion in the original relationship reflected by the raw data. This latter possibility is suggested by a number of findings. For instance, when analysis of brain size scaling was conducted with overall mean values calculated for individual mammalian orders, a scaling exponent value close to 0.75 was determined (Harvey and Bennett, 1983). Given that the various orders of placental mammals diverged between 60 and 90 million years ago, it might be expected that phylogenetic inertia would exert relatively little effect on the result obtained for scaling of ordinal mean values. However, it could be argued that differential degrees of phylogenetic relatedness between species within each order might have had some influence on the scaling pattern observed.
An alternative approach that circumvents this problem is to take just one species at random from each order of placental mammals and examine the scaling of brain size. When this is done repeatedly (analyses conducted by the first author), the average scaling exponent value found is close to 0.75. There is therefore a mismatch between the results obtained with raw values at the ordinal level and those obtained after the calculation of independent contrast values.
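The repeated one-species-per-order procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the dataset is synthetic (generated with a true exponent of 0.75 and arbitrary noise), the exponent is estimated by ordinary least squares on logged values rather than by the line-fitting method used in this paper, and the names `toy_dataset` and `mean_exponent` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (both already log-transformed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def toy_dataset(seed=0):
    """Synthetic data: 10 'orders' of 5 species, log Y = 0.75 log X - 1 + noise."""
    rng = random.Random(seed)
    data = {}
    for i in range(10):
        centre = 1.0 + 0.5 * i  # orders differ in typical log body mass
        members = []
        for _ in range(5):
            log_x = centre + rng.uniform(-0.3, 0.3)  # similar sizes within an order
            log_y = 0.75 * log_x - 1.0 + rng.gauss(0.0, 0.1)
            members.append((log_x, log_y))
        data[f"order_{i}"] = members
    return data


def mean_exponent(species_by_order, n_draws=200, seed=1):
    """Average slope over repeated draws of one random species per order."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_draws):
        picks = [rng.choice(members) for members in species_by_order.values()]
        slopes.append(ols_slope([p[0] for p in picks], [p[1] for p in picks]))
    return sum(slopes) / len(slopes)


estimate = mean_exponent(toy_dataset())
print(f"mean exponent over repeated draws: {estimate:.3f}")
```

Because each draw contains only one species per order, no pair of data points in a given fit shares a within-order history, which is the feature that makes the procedure a useful check on results obtained from ordinal means.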

*k*) and any kind of measurement error (ϵ). The revised equations read as follows:
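A form consistent with the terms used in the following paragraph is given here as a reconstruction under that assumption, not as the authors' exact notation. Subscripts 1 and 2 denote the two species being compared, α is the scaling exponent, *k* the scaling coefficient and ϵ the error term:

$$
\begin{aligned}
\log Y_{1} &= \alpha \log X_{1} + \log k_{1} + \epsilon_{1} \\
\log Y_{2} &= \alpha \log X_{2} + \log k_{2} + \epsilon_{2} \\
(\log Y_{1} - \log Y_{2}) &= \alpha\,(\log X_{1} - \log X_{2}) + (\log k_{1} - \log k_{2}) + (\epsilon_{1} - \epsilon_{2})
\end{aligned}
$$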

The implication of this is that any scaling analysis conducted with contrast values, e.g. (log *Y*_{1} - log *Y*_{2}) *versus* (log *X*_{1} - log *X*_{2}), will be complicated by any difference in value for the scaling coefficient (*k*_{1} *versus* *k*_{2}) and by any difference in the error terms (ϵ). Hence, greater variation in the range of the residual values (i.e. a greater range in *k* values) will predictably exert an influence on the scaling relationship determined with contrast values. Furthermore, comprehensive samples will have an unexpected and undesirable effect. Because closely related species tend to have similar body sizes (Fig. 12), in a dataset including marked residual variation any body size contrast values (log *X*_{1} - log *X*_{2}) that are calculated will tend to be relatively small in comparison to the other terms in the equation, (log *k*_{1} - log *k*_{2}) and (ϵ_{1} - ϵ_{2}). The noise generated by these additional terms may well overwhelm the original signal in the raw data. This effect can be demonstrated with a simple example. If, as explained above, one species is selected at random from each order of placental mammals, analysis of the raw data yields an exponent value close to 0.75. If independent contrasts are calculated for such data, the exponent value remains close to 0.75, and if the regression line is forced through the origin the result remains essentially the same as with the raw data (Fig. 15A). However, if the dataset is increased to include two species instead of one from each mammalian order, thus increasing the effect of closely related species with relatively similar body sizes, very different results are obtained. Whereas the raw data still yield an exponent value close to 0.75, the value yielded for the contrasts by a regression line forced through the origin is markedly lower and close to 0.69 (Fig. 15B).
Thus, merely by doubling the sample of species taken at random from each order, it is possible to replicate the result reported by Harvey and Pagel (1991) for the scaling of brain size to body size using contrast values. As such calculation of independent contrasts is typically undertaken with no preceding attempt to identify and separate grades in the dataset, the potentially distorting effect of grade shifts on contrast calculations is effectively ignored. Given the evidence summarized here, it may be concluded that the appropriate exponent value for the scaling of brain size to body size in placental mammals is, in fact, close to 0.75 and that, in this case at least, calculation of contrast values can lead to misleading results. The take-home message is that any analysis that fails to give as much attention to choice of an appropriate line-fitting technique and to grade shifts as to the potential effects of phylogenetic inertia is unlikely to yield reliable results.
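The sensitivity of a single contrast to a difference in the scaling coefficient can be made concrete with a small numerical sketch. It assumes an exact power law *Y* = *kX*^α for each species with no measurement error, and all body sizes and *k* values are invented for illustration. The function names are hypothetical. It does not attempt to reproduce the analyses of Fig. 15.

```python
from math import log10


def contrast(x1, y1, x2, y2):
    """Log-scale differences (delta log X, delta log Y) for a pair of species."""
    return log10(x1) - log10(x2), log10(y1) - log10(y2)


def apparent_slope(alpha, x1, k1, x2, k2):
    """Slope implied by one pair contrast when Y = k * X**alpha holds exactly
    for each species (no measurement error is simulated)."""
    y1 = k1 * x1 ** alpha
    y2 = k2 * x2 ** alpha
    dx, dy = contrast(x1, y1, x2, y2)
    return dy / dx


ALPHA = 0.75  # exponent assumed for the illustration

# Same coefficient in both species: the contrast recovers alpha.
same_grade = apparent_slope(ALPHA, 1000.0, 0.05, 1200.0, 0.05)

# Different coefficients, distantly related pair with very different body sizes.
distant_pair = apparent_slope(ALPHA, 100.0, 0.05, 100000.0, 0.08)

# Same difference in coefficients, closely related pair of similar body size.
close_pair = apparent_slope(ALPHA, 1000.0, 0.05, 1200.0, 0.08)

print(f"same k:       {same_grade:.3f}")
print(f"distant pair: {distant_pair:.3f}")
print(f"close pair:   {close_pair:.3f}")
```

The same difference in *k* shifts the implied slope only modestly when the two body sizes differ by three orders of magnitude, but shifts it severalfold when the body sizes are similar, because the (log *k*_{1} - log *k*_{2}) term is then large relative to (log *X*_{1} - log *X*_{2}).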

## Acknowledgements

Thanks are due to Ann MacLarnon for invaluable collaboration in initial data collection and analysis within the framework of a project funded by the Medical Research Council (UK) in 1982-1985, and for much helpful discussion and advice over the intervening years. We are also grateful to Karin Isler and Andrew Barbour for their crucial work in developing the non-parametric line-fitting method ('rotation line') that figures prominently in this paper. Edna Davion also deserves heartfelt thanks for providing much-needed research assistance and logistic support during the preparation of this paper.

## References

**Allman, J.**(

**Armstrong, E.**(

**Barton, R. A. and Harvey, P. H.**(

**Barton, R. A., Purvis, A. and Harvey, P. H.**(

**Bauchot, R.**(

**Bejan, A.**(

**Bejan, A.**(

**Brody, S.**(

**Brody, S. and Procter, T. C.**(

**Clutton-Brock, T. H. and Harvey, P. H.**(

**Darveau, C. A., Suarez, R. K., Andrews, R. D. and Hochachka, P. W.**(

**Dawson, T. H.**(

**Dawson, T. H.**(

**Dubois, E.**(

**Dubois, E.**(

**Dubois, E.**(

**Dunbar, R. I. M.**(

**Ellison, A. M.**(

**Eisenberg, J. F.**(

**Eisenberg, J. F. and Wilson, D. E.**(

**Felsenstein, J.**(

**Freckleton, R. P., Harvey, P. H. and Pagel, M.**(

**Garland, T., Harvey, P. H. and Ives, A. R.**(

**Gittleman, J. L.**(

**Gould, S. J.**(

**Groves, C. P.**(

**Hansen, T. F.**(

**Harvey, P. H. and Bennett, P. M.**(

**Harvey, P. H. and Mace, G. M.**(

**Harvey, P. H. and Pagel, M. D.**(

**Hemmingsen, A. M.**(

**Hemmingsen, A. M.**(

**Hofman, M. A.**(

**Huxley, J. S.**(

**Isler, K., Barbour, A. D. and Martin, R. D.**(

**Jerison, H. J.**(

**Jerison, H. J.**(

**Kleiber, M.**(

**Kleiber, M.**(

**Kleiber, M.**(

**Leutenegger, W.**(

**Martin, R. D.**(

**Martin, R. D.**(

**Martin, R. D.**(

**Martin, R. D.**(

**Martin, R. D.**(

**Martin, R. D.**(

**Martin, R. D. and Barbour, A. D.**(

**Martin, R. D. and MacLarnon, A. M.**(

**McNab, B. K.**(

**Murphy, W. J., Eizirik, E., O'Brien, S. J., Madsen, O., Scally, M., Douady, C. J., Teeling, E., Ryder, A. O., Stanhope, M. J., de Jong, W. W. et al.**(

**Portmann, A.**(

**Portmann, A.**(

**Portmann, A.**(

**Portmann, A.**(

**Portmann, A.**(

**Purvis, A.**(

**Purvis, A. and Rambaut, A.**(

**Riska, B.**(

**Ross, C. A.**(

**Sacher, G. A.**(

**Sacher, G. A. and Staffeldt, E.**(

**Sawaguchi, T.**(

**Schmidt-Nielsen, K.**(

**Schmidt-Nielsen, K.**(

**Smith, R. J.**(

**Smith, R. J. and Leigh, S. R.**(

**Snell, O.**(

**Sokal, R. R. and Rohlf, F. J.**(

**Taper, M. L. and Marquet, P. A.**(

**Weibel, E. R.**(

**West, G. B., Brown, J. H. and Enquist, B. J.**(

**West, G. B., Brown, J. H. and Enquist, B. J.**(

**White, C. R. and Seymour, R. S.**(

**White, C. R. and Seymour, R. S.**(