## SUMMARY

Traditionally, physiologists have estimated the ability of organisms to withstand lower partial pressures of oxygen by estimating the partial pressure at which oxygen consumption begins to decrease (known as the critical *P*_{O2} or *P*_{c}). For almost 30 years, the principal way in which *P*_{c} has been estimated has been *via* piecewise ‘broken stick’ regression (BSR). BSR was a useful approach when more sophisticated analyses were less available, but BSR makes a number of unsupported assumptions about the underlying form of the relationship between the rate of oxygen consumption and oxygen availability. The BSR approach also distils a range of values into a single point with no estimate of error. In accordance with more general calls to fit functions to continuous data, we propose the use of nonlinear regression (NLR) to fit various curvilinear functions to oxygen consumption data in order to estimate *P*_{c}. Importantly, our approach is back-compatible so that estimates using traditional methods in earlier studies can be compared with data estimates from our technique. When we compared the performance of our approach relative to the traditional BSR approach for real world and simulated data, we found that under realistic circumstances, NLR was more accurate and provided more powerful hypothesis tests. We recommend that future studies make use of NLR to estimate *P*_{c}, and also suggest that this approach might be more appropriate for a range of physiological studies that use BSR currently.

## INTRODUCTION

Understanding the physiological tolerances of an organism to its environment has long been a focus of ecophysiology. Oxygen is a fundamental requirement of most organisms, and its availability may be limiting across a range of habitats (Pörtner, 2010; Verberk et al., 2011; Ferguson et al., 2013). For over 30 years, measures of oxygen tolerance have accumulated in a wide variety of taxa with variation evident at a range of scales (Greenlee and Harrison, 2004a; Mueller and Seymour, 2011; Lease et al., 2012; Ferguson et al., 2013). Surprisingly, no clear consensus over how to estimate oxygen tolerance exists, and for the most part, modern statistical approaches have not been brought to bear on this problem.

The most common estimate of oxygen tolerance is the critical partial pressure of oxygen for aerobic metabolism (*P*_{c}), which represents the lowest level of oxygen at which aerobic metabolism is independent of the ambient partial pressure of oxygen (*P*_{O2}) (Hochachka and Somero, 2002). At levels of *P*_{O2} below *P*_{c}, metabolism cannot be supported by aerobic processes entirely, and metabolic rate decreases, and/or anaerobic processes that are relatively inefficient and produce potentially toxic end-products become increasingly important (Hochachka and Somero, 2002). The original method for estimating *P*_{c} is the ‘broken stick’ regression (BSR) approach (Yeager and Ultsch, 1989) – an approach that remains the most common today. The broken stick approach to estimating *P*_{c} has been applied in a range of contexts, and been used to demonstrate, for example, that the *P*_{c} of a species is generally matched to the minimum oxygen level encountered in the environment in which it lives (Childress and Seibel, 1998; Nilsson, 2007; Ferguson et al., 2013), and that mobile species show behavioural avoidance of oxygen levels below their *P*_{c} (Burleson et al., 2001). Nonetheless, not all species show clear break points in the relationship between the rate of oxygen consumption () and *P*_{O2}, which complicates efforts to assess the regulatory ability these species (Mueller and Seymour, 2011).

The traditional BSR approach (piecewise linear regression) makes a number of unsupported assumptions about the underlying relationship between oxygen availability and respiration rate. First, it assumes that the functional response of an organism to decreasing *P*_{O2} is biphasic, that is, it consists of two linear elements with a clear break between these two phases. Above *P*_{c}, is assumed to be characterised by a linear function that is completely flat, while below *P*_{c}, decreases linearly with *P*_{O2} with an abrupt transition between these two functions (Chiu et al., 2006). Of course, in reality, rates of respiration are likely to be a continuous function between these two phases, and furthermore, concentration-dependent reaction kinetics make a linear relationship between and *P*_{O2} highly unlikely. As such, the BSR approach does not reflect the underlying structure of the data, violating the basic assumptions of the regression approach (Quinn and Keough, 2002). Our discussion here should not be taken as a criticism of the original progenitors of these approaches: when they were developed, those analyses represented the best approach available with the statistical and computational tools of the time. Today, however, more sophisticated approaches are available that better reflect the underlying processes that generated the data.

Since the development of the BSR technique, there have been a number of other ways in which authors have attempted to estimate *P*_{c}. These approaches have largely been subjective and lack repeatability (e.g. Portner et al., 1991; Greenlee and Harrison, 2004a; Greenlee and Harrison, 2004b; Lease et al., 2012). Ideally, any technique for estimating *P*_{c} should both represent the mechanistic process by which the data were generated and be repeatable. More generally, there are compelling reasons for describing continuous traits with functional relationships, rather than taking isolated point measures (Stinchcombe and Kirkpatrick, 2012). For example, Mueller and Seymour (Mueller and Seymour, 2011) fit a nonlinear function to estimate the ability of organisms to regulate oxygen consumption beyond simple oxy-conformity across a range of oxygen values: they propose a ‘regulation index’, which estimates a relative measure of oxyregulatory ability using either linear, quadratic (*y*=*a*+*bx*+*cx*^{2}) or one-phase association [*y*=*y*_{0}+(*y*_{max} −*y*_{0})(1–*e*^{−}* ^{kx}*)] fits. Here, we apply a similar logic for estimating

*P*

_{c}that meets the above criteria – a nonlinear regression approach coupled with simple differential calculus. We then evaluate the performance of this new method relative to the most widely used approach to date, BSR, across real-world data and explore a broader parameter space using simulations to compare the sensitivity and reliability of these two approaches. Finally, we examine the statistical power of the two techniques to distinguish differences in oxygen tolerance among groups.

## MATERIALS AND METHODS

### Fits to published data

After examining the form of many published relationships between and *P*_{O2}, we settled on six candidate functions that approximated the general relationship well and were reasonably tractable analytically (Table 1). While other forms could also fit the data and we encourage investigators to explore alternatives (e.g. a power function with an intercept), we chose these forms as, on first inspection, they appeared to fit real-world data reasonably well and consistently. We used standard nonlinear regression (NLR) to fit these functions to published data. It is beyond the scope of this paper to describe the general theory and approach of NLR, but we recommend Quinn and Keough [see p. 150 (Quinn and Keough, 2002)] for a general introduction and Ritz and Streibig (Ritz and Streibig, 2008) for an excellent primer on how to implement this analysis in the freely available statistical software R (R Development Core Team, 2012). The NLR analysis provides estimates of between two and four parameters (or even more if a selected function has more): with these values the relationship between and *P*_{O2} can be visualised. For ease of use and comparison among studies, however, a single metric that best represents *P*_{c} is desirable. Such a metric must be repeatable and objective. We based our metric on the underlying principles of what *P*_{c} seeks to describe: the point at which is no longer strongly affected by *P*_{O2}. In other words, when the slope of the function begins to flatten out and approach zero. As the absolute values of a slope will vary according to the maximum oxygen consumption rate of that particular organism, we first standardised our data by the maximum observed for any set of values (). By standardising, we can choose one slope value and compare where that value occurs among individuals, species and studies.

The slope across function is, of course, given by the first derivative of the original function and the derivative for each function is shown in Table 1. After rearranging we can solve for *P*_{c}, at any slope value (here, denoted *m*), and these are shown in Table 1.

We explored a range of values of *m* and found that a slope of *m*=0.065 best approximates *P*_{c} such that the solved values for *P*_{c} are shown in Table 2.

We coupled this formula to our NLR estimates of *a, b, c* and *d* as appropriate to estimate the value of *P*_{c} for a range of published data from the literature (see Table 2). NLR relationships with additive error structures were fitted using the ‘nls’ function in R v2.15.0 (R Development Core Team, 2012), and NLR relationships with multiplicative error structures were fitted using the ‘gnls’ function. The NLR and BSR fits to each data set were then compared on the basis of Akaike's information criterion (AIC) as a measure of model fit; the fit with the lowest AIC was considered the best of the candidate set of models, given the data (Burnham and Anderson, 2008). We then also compared the performance of the NLR approach with the BSR method across a range of parameter values to evaluate how each approach coped with differences in variation and sampling resolution with a simulation approach.

### Simulation methods

We explored the influence of a suite of characteristics of the relationship between and *P*_{O2} on NLR- and BSR-derived estimates of *P*_{c} using Monte Carlo simulations. We began by generating a function relating to *P*_{O2} that was explicitly biphasic, incorporating two linear portions (one that increased with *P*_{O2} until *P*_{O2}=*P*_{c}, and a second that was independent of *P*_{O2} at *P*_{O2} levels greater than *P*_{c}). *P*_{c} was set at 5 kPa and above *P*_{c} was set at 1. We then repeatedly simulated the process of sampling these data, and estimating the value of *P*_{c} by fitting both the BSR and NLR. We elected to use a biphasic function because it reflects the underlying assumption of the BSR approach, and because fitting such data should represent the greatest challenge to NLR. Put simply, if NLR outperforms BSR, even when the function is biphasic, then there can be little justification for preferring BSR over NLR, and as such our approach was highly conservative. We sampled the relationship between and *P*_{O2} at a range of resolutions (0.125, 0.25, 0.5 and 1 kPa). To each , we then added a normal deviate with a mean of zero and a coefficient of variation (CV=s.d. divided by ) of 0.025, 0.05, 0.1, 0.15 or 0.20. These values for the CV of were selected to span the range of observed values of CV in real data (*N*=10, mean=0.09, range: 0.04–0.21; see references in Table 1). One thousand such data sets were generated for each combination of sampling resolution and CV, and *P*_{c} was estimated using both NLR and BSR techniques. NLR estimates of *P*_{c} were derived as described above by fitting a Weibull function to each data set (excluding that of *Cryptobranchus* as the model would not converge) using the ‘nls’ function in R v2.15.0 (R Development Core Team, 2012) to obtain a non-linear least squares fit using a Gauss–Newton algorithm. We used the Weibull function as this function best fit the most published relationships in the literature (see Results). BSR estimates of *P*_{c} were obtained by simultaneously fitting two linear regressions constrained to meet at a specified *P*_{O2}, i.e. *P*_{c}. The slope of the linear regression below the specified *P*_{c} was a free parameter, whereas the slope above the specified *P*_{O2} was set at zero. Beginning at the third-lowest *P*_{O2} in the data set, a series of specified values of *P*_{O2} were trialled, with each successive *P*_{O2} being 0.01 kPa greater than the last until the third-highest *P*_{O2} in the data set was reached. The value of the *P*_{O2} break point that minimised the sum of squared deviations from the biphasic function was considered to represent *P*_{c}.

### Testing for differences among groups

We compared the ability of our best-fitting NLR and BSR to identify differences among groups that differ in *P*_{c} with Monte Carlo simulations. For two groups of six simulated data sets, each with *P*_{O2} values sampled at 1 kPa resolution, we generated a relationship between and *P*_{O2} that included a break point (*P*_{c}) at either 6.5 kPa (group 1) or 8.5 kPa (group 2). As a conservative measure, these values of *P*_{c} were deliberately chosen to be different from the *P*_{c} at which NLR performs best. As above, for each data set increased linearly with *P*_{O2} to equal 1 when *P*_{O2}=*P*_{c}. For values of *P*_{O2} above *P*_{c}, was independent of *P*_{O2} and equal to 1. To each value of we then added a normal deviate with a mean of zero and CV of 0.10. We then estimated *P*_{c} for each of the data sets using BSR or NLR with *m*=0.065, and tested for differences among groups using *t*-tests. We also tested for differences among groups using the best-fitting NLR regression by pooling data sets for each group, and testing for the significance of a fixed grouping factor using likelihood ratio tests. This simulation procedure was repeated 1000 times, and statistical power was estimated by calculating the proportion of tests that reported a significant difference among groups with α set at 0.05.

## RESULTS

### Fits to published data

Both techniques provide a good fit to real-world data (Fig. 1), and the predicted *P*_{c} based on NLR was strongly correlated with the *P*_{c} estimates based on BSR (*R*^{2}=0.82, *N*=13, *P*<0.001), though the relationship was not precisely one to one (regression equation: *P*_{c(NLR)}=0.646 × *P*_{c(BSR)} + 1.12). Neither consistently over- or underestimated the other; rather, the relationship between the two was idiosyncratic (Fig. 2). In eight out of the 13 cases, the NLR approach provided a better fit to the data than did the BSR approach (Table 2). For the remaining five cases where the BSR approach had a lower AIC, the differences in AIC between the BSR and NLR approaches were equivocal (ΔAIC around 2) in two cases. For nine out of 13 cases, the Weibull function (either with or without an intercept) provided a better fit to the data than any other nonlinear function. For the four remaining cases, each of the other four nonlinear functions provided the best fit for one case, suggesting that all should be considered candidate functions in future studies.

### Simulations

The best-fitting NLR was far more robust to variation in the data than was the BSR (Fig. 3). While both performed well when variation was very low and sampling resolution very high, the broken stick technique was far more likely to dramatically misestimate *P*_{c} when variance was high. The NLR also coped with lower sampling resolution relative to the BSR: at sampling resolutions greater than 0.5 kPa, the error rate of the BSR increased dramatically, with this approach being far more likely to misestimate *P*_{c}. For the BSR, the effects of sampling resolution and error interacted such that the most error-prone estimates came from simulated data with high levels of variation and low sampling resolutions (Fig. 3). In contrast, neither resolution nor variation had strong effects on the error rate of the NLR technique.

### Testing for differences among groups

Power to detect differences among groups was lowest for *t*-tests comparing BSR-derived estimates of *P*_{c} (0.63), and was higher for *t*-tests comparing best NLR-derived estimates of *P*_{c} (0.74); power was also relatively higher for comparisons of pooled data made using likelihood ratio tests (0.83).

## DISCUSSION

For both real-world and simulated data, an NLR approach to estimating tolerance to decreasing availability to oxygen was superior to a BSR approach in almost every regard. The NLR approach was more robust to variation in the data and reductions in sampling resolution, as well as offering a more powerful means of detecting differences between groups. Our simulations suggest that BSR has a tendency to overestimate *P*_{c} when data are variable, sampling resolutions are low and the underlying ‘true’ *P*_{c} is low. It is possible therefore that some species may be more tolerant to low oxygen conditions than is currently appreciated. NLR performed less well in simulations when the underlying *P*_{c} was close to the highest *P*_{c} measured in the study, and in this instance, BSR was a more accurate indicator of the *P*_{c}. The poor performance of NLR in this parameter space was due to the relationship between and *P*_{O2} being linear in our simulations (recall that we used an artificially biphasic function) and as such, it was inevitable that the nonlinear function fit the data poorly. We therefore favour the use of NLR over BSR despite this shortcoming because in nature, very high values of *P*_{c} are very rare, and genuinely linear relationships between and *P*o_{2} are similarly rare. Nevertheless, we suggest that visual inspections of the data be conducted before *P*_{c} is estimated, and on the rare occasions when the relationship between and *P*_{O2} appears to be linear, linear regression be used instead of NLR. In other words, as for any statistical model-fitting exercise, the underlying form of the relationship should dictate the form of the model that is fit (Quinn and Keough, 2002).

The use of NLR to estimate tolerance to low oxygen carries the additional advantage of allowing formal testing for differences among groups within the same statistical framework. Traditionally, comparisons of *P*_{c} among species, individuals or groups have involved estimating *P*_{c} within groups using the BSR approach, using these estimates of *P*_{c} as individual data points and then using a separate statistical test to detect differences (e.g. Ferguson et al., 2013). There are a number of disadvantages to this two-stage approach. First, estimating the *P*_{c} as a summary statistic for the entire function results in significant loss of information – essentially, a large amount of data is collected to estimate a single point, and then the bulk of the data are ignored in the hypothesis test (Stinchcombe and Kirkpatrick, 2012). Second, using the *P*_{c} summary statistic ignores error and variance associated with this estimate, and the subsuming of error will result in hypothesis tests that are more susceptible to Type I errors (Hadfield et al., 2010). In contrast, formal hypothesis tests of differences across entire functions among groups allow the use of the full complement of data that were collected as well as incorporating the appropriate error into the hypothesis test of interest (Stinchcombe and Kirkpatrick, 2012). Furthermore, the NLR method could be extended to include random effects, whereby error associated with different experimental equipment or different measurement days could be partitioned, increasing the sensitivity of this analytical approach and providing much greater scope to incorporate additional factors.

As well as the statistical advantages described above, the use of NLR affords practical advantages for comparisons among studies and meta-analyses. For example, studies from different research groups may not sample at identical values of *P*_{O2} and may sample with different resolutions. Our simulations show that differences in sampling resolution will result in systematic differences in the estimate of *P*_{c}, and furthermore, the estimate of *P*_{c} will be a product of the *P*_{O2} at which was measured. In contrast, an NLR approach is robust to these differences in sampling regime, allowing more comparable estimates of *P*_{c} among studies, as well as estimates of across a greater parameter space. Furthermore, no single function fit the data best such that comparisons among studies could be problematic. The retention of an estimate of *P*_{c} in the context of our approach provides a means of comparing equivalent slopes among studies that have fit very different functions.

We chose a point (where d*V*/d*P*=*m*=0.065) on our function that best approximated the *P*_{c} estimated by the BSR approach. We chose this point so that future studies can provide estimates of oxygen tolerance that are comparable with traditional measures of *P*_{c}. In practice, we recommend that future studies that utilise the NLR approach present estimates of not only *P*_{c}, but also estimates of function parameters (i.e. estimates of *a, b, c* and *d*, as appropriate), as well as their associated error, as these values will allow comparisons across the entire range of oxygen partial pressures. Standardised estimates of parameters in selection in evolutionary biology have facilitated important insights from meta-analyses (Kingsolver et al., 2001), and we hope similarly valuable insights might become available *via* the widespread use of the standardised approach we recommend here.

The use of approaches such as BSR to estimate biological ‘thresholds’ in nonlinear responses more generally is increasingly being scrutinized. The estimation of ecological thresholds using BSR techniques has been called into question (Toms and Lesperance, 2003), as has the distillation of continuous functions into single points in evolutionary studies (Stinchcombe and Kirkpatrick, 2012). Generally, the failure to use nonlinear functions to describe nonlinear traits results in less powerful, less accurate estimates (Griswold et al., 2008). BSR techniques are also used to estimate changes in developmental rates across temperatures (Stanwell-Smith and Peck, 1998), as well as cardinal temperatures (estimates of maximum, minimum and optimal temperature), and we suspect that NLR techniques may be more appropriate in these settings as well. While BSR techniques were once a pragmatic recognition of limited computing power and statistical programming, today we have unprecedented access to powerful statistical estimation techniques with freely available statistical software. We join others (Stinchcombe and Kirkpatrick, 2012) in calling for the increased uptake of ‘function-valued’ approaches to estimating continuous traits such as the relationship between and *P*_{O2}.

## Acknowledgements

We thank Hayley Cameron, Amy Hooper, Marcelo Lagos Orostica and Matthew Thompson for help in preparing the manuscript. Two anonymous reviewers provided extremely thoughtful and helpful comments that improved the manuscript.

## FOOTNOTES

**FUNDING**

D.J.M., M.B. and C.R.W. are all supported by grants from the Australian Research Council.

## REFERENCES

_{O2}in scarabaeid and tenebrionid beetles

_{O2}

**COMPETING INTERESTS**

No competing interests declared.