ABSTRACT
In studies of animal orientation, data are often represented as directions that can be analyzed using circular statistical methods. Although several circular statistical tests exist to detect the presence of a mean direction, likelihood-based approaches may offer advantages in hypothesis testing – especially when data are multimodal. Unfortunately, likelihood-based inference in animal orientation remains rare. Here, we discuss some of the assumptions and limitations of common circular tests and report a new R package called CircMLE to implement the maximum likelihood analysis of circular data. We illustrate the use of this package on both simulated datasets and an empirical dataset of orientation in Chinook salmon (Oncorhynchus tshawytscha). Our software provides a convenient interface that facilitates the use of model-based approaches in animal orientation studies.
INTRODUCTION
There are numerous instances where biological information can be projected onto the unit circle rather than onto a linear scale. Circular biological data often take the form of directions or orientations (e.g. compass bearings), but can include any data that are periodic in nature. Generally, circular data are described statistically using parameters intuitively and semantically identical to those of linear data (e.g. mean, variance, correlation), but their mathematical derivation can be quite different (reviewed in Lee, 2010). Numerous statistical tests have been developed to determine the presence of a preferred direction within a group or differences in preferred directions between groups (see Batschelet, 1981). However, as suggested by Ruxton (2017), biologists are often unfamiliar with, or lack specific training in, circular statistics, and the statistical treatment of these data is therefore frequently inadequate.
The majority of studies of circular animal orientation utilize frequentist-based statistical tests. These tests are based upon the calculation of a test statistic and the probability of observing that statistic (P-value) under the null hypothesis. It is well known that the use of P-values and frequentist-based testing procedures often suffers from flaws in experimental design, logic and/or interpretation of results (Steel et al., 2013). The most widely criticized mistakes include pseudoreplication (Hurlbert, 1984), failure to correct for multiple testing (Noble, 2009; Pike, 2011; Verhoeven et al., 2005) and low statistical power (Jennions and Møller, 2003; Nakagawa, 2004). Similarly, the improper statistical treatment of circular datasets may contribute to the lack of robust and reproducible results often encountered in animal orientation studies (e.g. Busse and Trocińska, 1999; Kirschvink et al., 2010). As an alternative to the frequentist-based tests, both maximum likelihood (e.g. Schnute and Groot, 1992) and Bayesian (e.g. Ożarowska et al., 2013) methods have been developed as model-based approaches to describe orientation. These methods rely on the probability of data given a particular model, or likelihood, and are preferred for quantifying strength of evidence in favor of a hypothesis (Burnham and Anderson, 2002; Burnham et al., 2011). These likelihood-based approaches, however, are rarely applied in animal orientation studies.
In this article, we focus on the circular statistical treatment of the most fundamental question in animal orientation studies – the presence or absence of one or more preferred directions. Although we limit our discourse to animal orientation, the principles discussed are applicable to other examinations of circular data (e.g. studies of circadian rhythm). Furthermore, we only evaluate the statistical treatment of a single sample (within a group), and relegate two-sample (between groups) comparisons to future investigations. We begin by briefly describing the most frequently used statistical tests and discussing their assumptions and limitations. Next, we highlight the importance of model-selection procedures as an alternative or complement to the frequentist approach and enable such approaches by providing a convenient set of functions for use in R (package CircMLE; https://cran.r-project.org). We conclude with a set of recommendations in the context of improving the statistical analysis of animal orientation data.
Testing for a preferred direction
Rayleigh's test
Despite the availability of numerous statistical tests for departure from uniformity in circular data, Rayleigh's test is by far the most common (Batschelet, 1981). With the Rayleigh test, the null hypothesis is uniformity (i.e. no preferred direction), whereas the alternative is unimodality (i.e. a single preferred direction) (Mardia, 1972). At the core of the Rayleigh test is the von Mises distribution (Mardia, 1972) – the circular equivalent of the normal distribution. This distribution is described by a mean direction φ and concentration parameter κ, which are analogous to the mean μ and reciprocal of the variance 1/σ² of the normal distribution, respectively. The Rayleigh test compares the likelihood of the data assuming a uniform von Mises distribution (κ=0) with the likelihood assuming a non-uniform von Mises distribution (κ>0).
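As a minimal sketch, the Rayleigh test can be run in R with the ‘circular’ package; the bearings below are hypothetical, drawn from a von Mises distribution purely for illustration.

```r
library(circular)

# hypothetical sample: 30 bearings drawn from a von Mises distribution
# with mean direction 0 rad and concentration kappa = 2
set.seed(1)  # arbitrary seed for reproducibility
x <- rvonmises(n = 30, mu = circular(0), kappa = 2)

# Rayleigh test: null = uniformity (kappa = 0), alternative = unimodality (kappa > 0)
rayleigh.test(x)
```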
V-test
Although the Rayleigh test can be performed for an unknown preferred direction, a variant of the test, the V-test, is more powerful at detecting departures from uniformity when a preferred direction is specified a priori (Durand and Greenwood, 1958). The interpretation of the V-test is equivalent to that of the Rayleigh test, and rejecting the null hypothesis does not indicate that the population is oriented in the predicted direction. In other words, when using the V-test, users should be aware that the null hypothesis is again uniformity and that the mean direction may differ from the predicted direction (Aneshansley and Larkin, 1981; Ruxton, 2017).
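In the ‘circular’ package, supplying the mu argument to rayleigh.test() performs the modified Rayleigh (V-) test; a sketch using the hypothetical data above, with an assumed predicted direction of 0 rad:

```r
# V-test: same hypothetical data, but with the predicted direction (0 rad) given a priori
rayleigh.test(x, mu = circular(0))
```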
Multimodality and the Rao spacing test
Although in many biological scenarios a single preferred direction is expected, multiple preferred directions are also plausible but violate the assumptions inherent in the Rayleigh and V-tests. In these cases, researchers must use alternative statistical approaches to analyze multimodal distributions that are less well described than their unimodal counterparts. In animal orientation studies, the only widely accepted case of multimodality is when two principal directions are opposite (180 deg apart), often termed ‘axial’ (Jammalamadaka and SenGupta, 2001). When data are axial, the angles can be doubled to produce a unimodal distribution that can be analyzed using a Rayleigh test (Batschelet, 1981). Although common in the animal orientation literature, doubling the angles has been questioned (Arnold and SenGupta, 2006). Because procedures for accurately identifying axial data remain poorly characterized (although see Arnold and SenGupta, 2006), studies often make this determination based upon either visual inspection (e.g. Takebe et al., 2012) or whether doubling increases the mean vector length (e.g. Pinzon-Rodriguez and Muheim, 2017). This qualitative assessment, if incorrect, may result in misleading statistical results when doubling of the angles is performed.
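Given the caveats above, the doubling procedure itself is simple; a hedged illustration on a hypothetical axial sample (angles are doubled modulo 2π before applying the Rayleigh test):

```r
library(circular)

# hypothetical axial sample: two opposite modes at 0 and pi radians
set.seed(2)  # arbitrary seed
ax <- rmixedvonmises(n = 40, mu1 = circular(0), mu2 = circular(pi),
                     kappa1 = 3, kappa2 = 3, prop = 0.5)

# double the angles (modulo 2*pi) to collapse the two opposite modes into one, then test
doubled <- circular((2 * as.numeric(ax)) %% (2 * pi))
rayleigh.test(doubled)
```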
When data are multimodal, the non-parametric Rao spacing test can be used (Rao, 1972). Aside from including multimodal distributions in the alternative hypothesis, the Rao spacing test is oftentimes more powerful than both the Rayleigh and V-tests – especially for small sample sizes (i.e. n≤10; Bergin, 1991; Rao, 1972). Because of these advantages, Bergin (1991) described the Rao spacing test as ‘indispensable’ for orientation datasets influenced by a multitude of factors, such as nest orientation in birds. However, when the null hypothesis of uniformity is rejected, the Rao spacing test provides no information regarding whether the data are unimodal or multimodal. Additional tests for multimodal circular data have been described (e.g. Pycke, 2010), but are either rarely used in the literature or less well characterized.
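The Rao spacing test is also implemented in the ‘circular’ package; a minimal call on the hypothetical axial sample from the previous sketch:

```r
# Rao spacing test: rejects uniformity for unimodal or multimodal departures,
# but gives no indication of which
rao.spacing.test(ax)
```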
Model-selection approaches
The preceding section described statistical tests based on the central paradigm of accepting or rejecting a null hypothesis based upon an arbitrary significance threshold (often a P-value <0.05). Although ubiquitous in the life sciences, a P-value does not represent a formal quantitative measure of strength of evidence (Royall, 1997). Alternatively, information-theoretic (likelihood-based) approaches, such as those derived from the Kullback–Leibler divergence, inherently quantify the strength of evidence for a hypothesis. Rather than testing a single hypothesis, these methods facilitate the simultaneous testing of null and multiple alternative hypotheses (reviewed by Burnham et al., 2011). The Akaike Information Criterion (AIC; Akaike, 1973) is perhaps the best known and most common of these methods. The AIC is the sum of the deviance (−2 times the maximized log-likelihood, a measure of model fit) and twice the number of parameters k. The AIC, in effect, penalizes models with more parameters to avoid over-fitting. The user calculates the AIC for each model and selects the one with the smallest AIC, or least amount of information loss, as the ‘best’ model. In general, models with a difference in AIC relative to the best model (ΔAIC) <2 are well supported, models with ΔAIC between 2 and 7 should rarely be dismissed, and models with ΔAIC>9 have little support (Burnham and Anderson, 2004; Burnham et al., 2011). Early recommendations often excluded models with ΔAIC>2, but any arbitrary cutoff such as this should be interpreted cautiously and in the context of other parameters such as the likelihood, model probabilities and/or evidence ratios (Burnham and Anderson, 2002; Burnham et al., 2011). Other common methods include a corrected version of AIC (AICc; Hurvich and Tsai, 1989) and the Bayesian Information Criterion (BIC; Schwarz, 1978). The AICc is appropriate for small sample sizes (i.e. when k is a large fraction of n; Hurvich and Tsai, 1989), whereas the BIC is appropriate only when the true model is included in the candidate set; otherwise, the BIC is a poor choice for empirical data (Burnham and Anderson, 2004).
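For reference, the standard definitions of these criteria, with the maximized likelihood denoted by the hat on L, k the number of estimated parameters and n the sample size, are:

$$\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{AIC_c} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n.$$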
When performing a model-based approach, the limiting factor is generating a different mathematical model to describe each possible hypothesis. In circular data analysis, both maximum-likelihood and Bayesian models have been proposed (Fernández-Durán and Gregorio-Domínguez, 2016; Fu et al., 2008; Ożarowska et al., 2013; Ravindran and Ghosh, 2011; Schnute and Groot, 1992). Of these, Schnute and Groot (1992) provide perhaps the most thorough mathematical treatment of circular animal orientation models. Schnute and Groot (1992) described 10 models of orientation based upon the bimodal, or mixed, von Mises distribution (Table 1; see Materials and methods for a complete description). In general, these models fall into three categories: (i) a uniform model (M1) representing random orientation, (ii) unimodal models (M2A, M2B, M2C) with a single preferred direction, and (iii) bimodal models (M3A, M3B, M4A, M4B, M5A, M5B) with two preferred directions. The bimodal models can further be split into axial (M3A, M3B, M4A, M4B) and non-axial (M5A, M5B) types. Each model is described by up to five parameters: a mean direction (φ1) and concentration parameter (κ1) for the first mode, a mean direction (φ2) and concentration parameter (κ2) for the second mode, and the proportional size of the first distribution (λ; the second distribution is thus fixed at size 1−λ). Rather than subjectively selecting models to test, Schnute and Groot (1992) demonstrated that a likelihood-based approach using these models is a practical method to simultaneously examine multiple orientation hypotheses. Despite the mathematical descriptions of these models, their implementation in studies of animal orientation remains rare.
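As a point of reference, a standard textbook parameterization of the mixed (bimodal) von Mises density underlying these models, written in terms of the five parameters above, is shown below; this is the general form, and the exact notation of the authors' Eqn 2 may differ:

$$f(\theta) = \lambda\,\frac{e^{\kappa_1\cos(\theta-\phi_1)}}{2\pi I_0(\kappa_1)} + (1-\lambda)\,\frac{e^{\kappa_2\cos(\theta-\phi_2)}}{2\pi I_0(\kappa_2)},$$

where $I_0$ denotes the modified Bessel function of the first kind and order zero.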
Ten models of animal orientation as described by Schnute and Groot (1992)

MATERIALS AND METHODS
Orientation models and the likelihood function
CircMLE: a new R package
It is possible that model-based approaches to investigate circular orientation are rarely applied because they lack convenient execution in conventional statistical software. To encourage their use, we implemented the models of Schnute and Groot (1992) into a new package, CircMLE v0.2, for use in the statistical software R, available on CRAN (https://cran.r-project.org/). CircMLE calculates the maximum likelihood of the 10 models for a given set of data and compares them using AIC, AICc or BIC with a single command. CircMLE requires the ‘circular’ package (Agostinelli and Lund, 2013), also available on CRAN, in order to run. The input data are a vector of angular measurements, preferably in radians and modulo 2π, and of class ‘circular’ (see the ‘circular’ package description). Although the function ‘check_data’ will attempt to coerce a vector into the appropriate format, it is recommended to set these attributes beforehand.
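A minimal sketch of preparing input data; the bearings here are hypothetical and are given in degrees purely for illustration:

```r
library(circular)

# hypothetical compass bearings in degrees
bearings_deg <- c(10, 35, 170, 190, 200, 355)

# convert to radians, wrap to [0, 2*pi) and assign class 'circular'
bearings <- circular(bearings_deg * pi / 180,
                     type = "angles", units = "radians",
                     modulo = "2pi", rotation = "counter")
```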
In CircMLE, there are 10 separate functions corresponding to the models described by Schnute and Groot (1992). These functions separately report the likelihood, estimated parameters and optimization diagnostics for the respective model. The functions are named to match those of Schnute and Groot (1992) (i.e. M1, M2A, M2B, M2C, M3A, M3B, M4A, M4B, M5A, M5B). Specific information and a description for each function are available in the package manual (available on CRAN). Alternatively, all models can be run simultaneously and compared using the function ‘circ_mle’. The ‘criterion’ parameter (default is AIC) can be set to compare models with AIC (Akaike, 1973), AICc (Hurvich and Tsai, 1989) or BIC (Schwarz, 1978). The models are ranked according to the criterion, and the differences relative to the best model, or Δ values (e.g. ΔAIC), are reported in the results table. The relative likelihoods, model weights and evidence ratios are also provided based on the criterion chosen (Wagenmakers and Farrell, 2004). The user can select from various likelihood-optimization procedures (default is the ‘BFGS’ method; Nocedal and Wright, 1999) and run multiple chains (default is five) from randomized starting points to reduce the chance of the search becoming trapped at a local maximum. In some cases, the optimization will not converge, and it may be useful to increase the number of steps, or iterations, with ‘niter’ (default=5000). If the user is only interested in comparing two models, then their likelihoods can be compared using a likelihood ratio test with the function ‘lr_test’. This function requires the observed data, a null model and an alternative model. The alternative model must have more parameters than the null model, and the null model must be nested within the alternative, to satisfy the testing assumptions.
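A hedged usage sketch of these functions, applied to the hypothetical ‘bearings’ vector from above; the argument order for lr_test and the structure of the printed output are assumptions based on the description here, so consult the package manual for details:

```r
library(CircMLE)

# fit and rank all 10 models; 'criterion' and 'niter' as described above
fit <- circ_mle(bearings, criterion = "AIC", niter = 5000)
fit  # ranked model table with delta values, model weights and evidence ratios

# a single model can also be fitted on its own, e.g. the unimodal model M2A
M2A(bearings)

# likelihood ratio test of the uniform null (M1) against the unimodal alternative (M2A)
lr_test(bearings, "M1", "M2A")
```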
Overall, the default parameters are currently set to provide a robust maximum-likelihood search applicable to most animal orientation datasets. For comparison purposes, the Rayleigh test statistic and P-value are also reported, and a native plotting function, ‘plot_circMLE’, is available to visualize the observed and fitted data. The use of CircMLE is illustrated in both the ‘Simulated examples’ and ‘Empirical example’ sections below.
Simulated examples
We determined the statistical power of the model selection procedure in CircMLE to correctly identify both uniform (M1) and unimodal (M2A) models from simulated data. We simulated a total of 1000 datasets for each combination of sample size n=20, 50, 100, 200 or 500 and concentration parameter κ=0, 0.1, 0.25, 0.5, 1, 2, 5 or 10 using the ‘rvonmises’ function in the R package ‘circular’ v0.4-7 (Agostinelli and Lund, 2013). We ran the ‘circ_mle’ function in CircMLE v0.2.0 using default parameters and counted the proportion of datasets that returned the correct model (ΔAIC<2).
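A condensed sketch of this power simulation for a single (n, κ) combination is shown below; the replicate count is reduced for speed, the scoring is simplified to whether the top-ranked model matches the true model, and the element name ‘bestmodel’ used to extract that model is an assumption (see the package manual):

```r
library(circular)
library(CircMLE)

set.seed(123)                     # arbitrary seed for illustration
n <- 50; kappa <- 1; reps <- 20   # the full study used 1000 replicates per combination

hits <- replicate(reps, {
  x <- rvonmises(n, mu = circular(0), kappa = kappa)
  # does the top-ranked model match the true (unimodal) model M2A?
  circ_mle(x)$bestmodel == "M2A"
})
mean(hits)  # approximate power for this (n, kappa) combination
```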
Next, we simulated 100 angles from the mixed bimodal von Mises (Eqn 2) with φ1=0 deg, φ2=90 deg, concentration parameters κ1=κ2=5, and λ=0.5 using the ‘rmixedvonmises’ command in the package ‘circular’. This simulated dataset corresponded with model M5A, a special case of the bimodal model M5B with equal concentration parameters. We compared models with the maximum likelihood procedure implemented in CircMLE v0.2.0 using the default parameters. We included all 10 models in the example calculations for the purpose of demonstration, but encourage users to restrict the set of models a priori to those most plausible or feasible (see recommendations in Dochtermann and Jenkins, 2011) because the models of Schnute and Groot (1992) include models that are special cases of other models.
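The bimodal simulation described above can be reproduced approximately as follows; the seed is arbitrary, so parameter estimates will vary slightly from run to run:

```r
library(circular)
library(CircMLE)

set.seed(4)  # arbitrary seed
# 100 angles from a mixed von Mises: means 0 and 90 deg, kappa1 = kappa2 = 5, lambda = 0.5
sim <- rmixedvonmises(n = 100, mu1 = circular(0), mu2 = circular(pi / 2),
                      kappa1 = 5, kappa2 = 5, prop = 0.5)

# compare all 10 models using the default settings
circ_mle(sim)
```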
Empirical example
We provide an example application of our model-selection procedure using the orientation data from Putman et al. (2014). Briefly, the authors examined juvenile Chinook salmon (Oncorhynchus tshawytscha) orientation in three different magnetic fields: (1) a field from the northern periphery of their range, (2) a field from the southern periphery, and (3) the ambient magnetic field. The authors reported, based on Rayleigh test results, that when exposed to a magnetic field from either the northern or southern periphery, the salmon oriented in a direction similar to the direction of their marine feeding grounds. We re-examined their data using CircMLE, again with the default parameters, and again included all models for the purpose of demonstration. For brevity, we focused attention on models with ΔAIC<2, although in no way do we advocate the use of arbitrary ΔAIC cutoff thresholds.
RESULTS AND DISCUSSION
Simulated examples
We simulated a total of 40 datasets, each replicated 1000 times, to determine the power of our method to correctly identify a uniform or unimodal model of orientation (Fig. S1). We used an 80% threshold as a minimum cutoff for sufficient statistical power as recommended (Cohen, 1988). For uniform models (κ=0), power was consistently above 80%, and increased with sample size. For unimodal models with a small sample size (n=20), a κ greater than ∼1 was necessary to confidently return the correct model. For larger sample sizes (e.g. ≥100), a κ>0.4 provides sufficient statistical power. Using Eqn 5, this corresponds with a 95% angular confidence interval of 355 deg. This indicates that weakly oriented datasets can be detected using the approach in CircMLE. Overall, the sample sizes chosen here are typical of those encountered in animal orientation studies and serve as a guide for researchers designing experiments.
In the simulated bimodal example, the default model selection procedure (AIC) selected model M5B as the best (ΔAIC=0; Table S1 and Fig. S2). However, model M5A also provided an excellent fit (ΔAIC=0.23). The remaining eight models poorly described the simulated data (ΔAIC≥11.6). The model weights, or probabilities, for M5B and M5A were 53% and 47%, respectively. Model M5B predicted parameters φ1=90 deg, φ2=360 deg (0 deg), κ1=4.6, κ2=4.9 and λ=0.48. These parameters were nearly identical to those used to simulate the dataset, demonstrating relatively high accuracy of the likelihood search.
Empirical example
The results from both a traditional Rayleigh test and CircMLE are shown in Table 2. For the northern group, an axial model (M3B; ΔAIC=0) was the best fit. This model suggested two equally sized distributions with mean directions of 18 deg and 198 deg and concentration parameters of 0.59 and 1.30, respectively (Fig. 1A). Other models, including both unimodal (M2A, M2B) and bimodal (M4A, M4B, M5A) models, were also supported by the data. In the southern group, the unimodal model (M2A) was the best fit, with a mean direction of 17 deg and a concentration of 0.33 (Fig. 1B). A second unimodal model (M2B) also appeared as a plausible choice. In this model (M2B), half of the individuals are oriented in the same direction (17 deg) as in the ‘best’ model, whereas the other half are uniformly oriented, making it inherently similar to the best-fit unimodal model. In the ambient group, the best model was a uniform distribution (M1), yet an axial model (M3A) was also supported (Fig. 1C). In both the southern and ambient groups, the best model was in agreement with the conclusions presented by the authors. In the northern group, however, the best model (M3B) was a type of axial model with half the individuals oriented close to the mean direction and the other half 180 deg away. A similar axial model (M3A) was reasonably supported in the ambient group as well. Thus, the model-based results not only supported the authors' conclusions but also suggested that, in each group, bimodal orientation in Chinook salmon cannot be excluded. In all cases, the highest model probability was only 39%, which indicates a considerable amount of model uncertainty (see ‘Technical Issue #6’ in Burnham et al., 2011).
Observed and modeled orientation in Chinook salmon. Mean directions (solid gray arrows) and circular histograms (gray bars) represent observed orientation of salmon in 20 deg bins exposed to (A) northern, (B) southern or (C) ambient magnetic fields (Putman et al., 2014). The density (dashed line) and mean direction(s) (dashed arrows) of the best model are provided. In B, the arrows overlap. In C, the best model is a uniform distribution (M1), which has no mean direction; therefore, the second-best model is shown.
Conclusions
Proper statistical treatment of biological data is critical for drawing appropriate conclusions about whether or not an effect exists. With CircMLE, we make a model-based approach to the analysis of circular biological data, especially in animal orientation, conveniently available for use in R. In the future, research is needed to characterize effect sizes relative to biological significance in circular statistics (Martínez-Abraín, 2008) and to extend orientation models to between-group comparisons. Nevertheless, the use of model-based approaches will improve our understanding of how animals produce oriented responses.
Acknowledgements
We thank the Duke Shared Cluster Resource for providing computational resources, N. Putman for providing the example dataset, and L. Schweikert and E. Caves for comments on previous drafts.
Footnotes
Author contributions
Conceptualization: R.R.F., S.J.; Methodology: R.R.F., S.J.; Software: R.R.F.; Validation: R.R.F.; Formal analysis: R.R.F.; Investigation: R.R.F.; Resources: R.R.F.; Data curation: R.R.F.; Writing - original draft: R.R.F.; Writing - review & editing: R.R.F., S.J.; Visualization: R.R.F.; Supervision: S.J.; Project administration: S.J.; Funding acquisition: S.J.
Funding
This work was supported by the Air Force Office of Scientific Research [FA9550-14-1-0208 to S.J.].
References
Competing interests
The authors declare no competing or financial interests.