## SUMMARY

Experimental scientists often employ a Randomized Complete Block Design (RCBD) to study the effect of treatments on different subjects. Under 'complete randomization', the order of the apparatus setups within each block, including all replications of each treatment across all subjects, is completely randomized. However, in many experimental settings complete randomization is impractical because of the cost of re-adjusting the device to administer a new treatment. One typically resorts to a type of 'restricted randomization', in which multiple subjects are tested under each treatment before the apparatus is re-adjusted; the order of the treatments, as well as the assignment of subjects to each block, is random. If the data obtained under any type of restricted randomization are treated as if they had been collected under an RCBD with complete randomization within each block, the risk of false positives (Type I errors) may increase. This is of concern in animal orientation studies and in other areas, such as chemical ecology, where it is impractical to reset the experimental device for each subject tested. The goal of the research presented in this article is twofold: (1) to demonstrate the consequences of testing the significance of treatment effects under restricted randomization with an *F*-statistic whose denominator is the error mean square; and (2) to describe an alternative method of analysis, based on a split-plot analysis of variance, that yields better power under restricted randomization. The statistical analyses of simulated experiments and of data involving virgin male *Periplaneta americana* substantiate the benefits of the alternative approach under restricted randomization. The methodology and analysis employed for the simulated experiment are equally applicable to any organism or artificial agent tested under a restricted randomization protocol.

## Introduction

In designed experiments, one measures responses on multiple experimental subjects with the goal of analyzing the effect of changes in controlled experimental conditions, or treatments, on the responses of different subjects. In laboratory experiments, one controls extraneous conditions to ensure that each experimental run is conducted under similar conditions, so that one can be reasonably confident that consistent differences in response are caused by the treatments. However, it is not always possible to ensure that all extraneous conditions are properly controlled. To reduce the effects of extraneous conditions, two techniques are commonly employed: randomization and blocking.

Blocking refers to the division of experimental runs into smaller sub-groups, or blocks. Each treatment is applied randomly to a number of subjects within each block. This design, known as a Randomized Complete Block Design (RCBD), is commonly employed in biological experiments, where, for example, experimental runs on a given day may be treated as a block (Sokal and Rohlf, 1981). The randomization protocol reduces any bias in favor of particular treatments, while the blocking enables extraneous variation to be absorbed into block effects. Consequently, one obtains better estimates of treatment effects and more powerful tests for treatment differences (Cochran and Cox, 1957).

In the RCBD, the application of treatments to subjects within a block must be completely randomized. If a treatment is applied to five subjects within a block, then the subjects are chosen randomly within the block. However, in many experiments, practical constraints make this ideal unrealistic. The motivation for this research arises from insect pheromone-tracking studies, where a pheromone plume is generated at one end of a wind tunnel. An insect begins at the other end of the tunnel and is challenged to track the plume to the pheromone source. The goal is to detect differences in the response of an insect to different types of plumes, or treatments (Willis and Baker, 1984; Linn et al., 1988a,b; Mafra-Neto and Cardé, 1998; Zanen and Cardé, 1999; Cardé and Knols, 2000; Dekker et al., 2001; Willis and Avondet, 2005). The effect of various formulations of synthetic pheromone on different species of walking and flying insects has been studied by various researchers (Linn et al., 1988a,b; Willis and Arbas, 1991; Linn et al., 1996; Cardé et al., 1998; Willis and Avondet, 2005).

Whenever the complete randomization protocol is violated, we refer to the corresponding design framework as 'restricted randomization'. For instance, in chemo-orientation studies, changing the treatment typically involves dismantling and reconfiguring the experimental device, which is often not practical after each experimental run. A more practical approach often taken by experimental scientists is to challenge multiple subjects, within each block, with a single treatment before changing the device to administer the next treatment. Only the order in which the treatments are applied is randomized. By considering subjects in groups, the experiments can be conducted in a relatively short time span. This protocol, illustrated in Fig. 1, is a type of restricted randomization. We demonstrate the effects of restricted randomization on the analysis and scientific conclusions using a computer-simulated experiment and data involving virgin male *Periplaneta americana*.

Analysis of variance (ANOVA) is the fundamental tool for analyzing data from designed experiments (Cochran and Cox, 1957; Searle, 1971). Chapter 8 of Sokal and Rohlf (1981) emphasizes that ANOVA, while an effective tool for any modern biologist, may create artificial constructs in the mind of a scientist that could lead to misleading conclusions. In subsequent sections of this article, we demonstrate this in the context of restricted randomization.

In many of the animal chemo-orientation studies published over the past two decades (see, for instance, Table 1), an RCBD or a related design was employed to analyze experimental data obtained under restricted or other modified randomization protocols. One such modified experimental design was employed by Linn et al. (1988a), in which some treatment effects were confounded with block effects; this may be one reason why they obtained non-significant treatment effects. In a flight tunnel experiment, Linn et al. (1988b) challenged 5–10 males of each of *Trichoplusia ni* and *Pseudoplusia includens* with the treatments at one of two dosages. From their description of the experimental design, there appear to have been multiple levels of blocking and a type of restricted randomization. However, they analysed the data using a one-way ANOVA, ignoring both the block effects and the restriction in randomization.

Mafra-Neto and Cardé (1998) utilized an RCBD to test the effect of treatments; however, it is not clear from their 'Materials and methods' section whether their experimental design indeed satisfied the complete randomization protocol. From the 'Materials and methods' section of Linn et al. (1996), it appears that they did not follow a complete randomization protocol; however, they analysed their experimental data using an ANOVA (it is not clear whether one-way or two-way). The effect of light levels and plume structure on the orientation maneuvers of male *Lymantria dispar* (gypsy moths) flying along pheromone plumes has been studied by Cardé and Knols (2000). They used the flight tracks of 20 males per treatment (a total of six treatments), tested in a randomized complete block design. The goal of such studies (Willis and Baker, 1984; Baker, 1990; Cardé and Knols, 2000) is to determine the effects of odor plume structure on the orientation maneuvers of different species of walking and flying insects. Justus et al. (2002) employed an ANOVA; however, they do not state the details of the experimental design or of the statistical analysis employed, such as whether it was a one-way ANOVA or an RCBD. Vickers (2002) considered male *Heliothis subflexa*, which were flown in a wind tunnel to a variety of combinations of synthetic pheromone components admixed on a filter paper disk. The experimental design violated the complete randomization protocol, since groups of 3–5 males were flown under each treatment on any given day; however, the data were analysed using an RCBD ANOVA.

The flight behavior of mosquitoes in host-odor plumes and the effects of the fine-scale structure of such plumes have been studied by Dekker et al. (2001). They considered seven treatments, each with eight replicates, and the treatments were randomized within each test day. Zanen and Cardé (1999) applied an RCBD to test the treatments on a given day on male *L. dispar*; however, they did not employ a two-way ANOVA to analyse the resulting experimental data. Consequently, their analysis does not match the design. Furthermore, it is not clear whether their design satisfied a complete randomization protocol.

All the above analyses raise an important question: how does violating the fundamental assumption of complete randomization affect the interpretation of experimental results or scientific conclusions? It is often very difficult (1) to assess the consequences of subtle modifications of the design on the resulting analysis and scientific conclusions and/or (2) to identify an alternative method of analysis corresponding to the randomization protocol at hand simply by referring to the statistical literature (Cochran and Cox, 1957; Searle, 1971; Sokal and Rohlf, 1981).

The goal of this article is to provide insight into the statistical analysis issues embedded within designed experiments when practical constraints restrict the randomization of treatments. The statistical analyses of simulated experiments and of data involving virgin male *P. americana* demonstrate the consequences of overlooking the restricted randomization for the scientific questions being addressed, as well as for the analysis and interpretation of the results. Our simulated experimental data, presented in the 'Results' section, demonstrate that the RCBD analysis incorrectly finds a highly significant treatment effect that was not present in the model, while failing to find the real effect that was present. In contrast, the risk of a false positive (Type I error) indication of treatment significance is substantially reduced under the alternative analysis. In essence, by employing an RCBD analysis when its underlying assumption is not satisfied, we are more likely to declare that an effect exists when it does not. This has implications for the understanding of experimental results as well as for the scientific conclusions. The methodology and analysis employed for the simulated experiment are equally applicable to any organism or artificial agent tested under a restricted randomization framework.

Applying an appropriate model to account for the changes in the design is relevant for two reasons. Violation of the assumptions of a particular design could result in (1) underestimation of the experimental error variance and (2) false positives (Type I errors). These, in turn, may lead to incorrect results or invalidate the analysis employed by a researcher. Therefore, it is important to choose an appropriate model and error structure when analysing designed experiments.

## Materials and methods

The effect of restricted randomization on the analysis of experimental data is best illustrated through a simulated experiment. In general, if there is a restriction on randomization at a given level of experimentation, there will be a 'split' in the design, leading to a split-plot design nested within an RCBD structure. There are two types of experimental units: the larger units, groups of subjects (for example, insects), are called whole-plots, and the smaller ones, individual subjects, are called sub-plots. A split-plot design creates nesting within the design structure, since the whole-plots are nested within blocks and the sub-plots are in turn nested within the whole-plots. The design structure for the whole-plot experimental units is essentially an RCBD. A split-plot design has two advantages over a simpler ANOVA: (1) if a treatment can be applied to a whole-plot at once, rather than separately to each sub-plot, costs may be reduced; and (2) because sub-plots are usually more uniform, comparisons among sub-plot conditions may be estimated more precisely.

Several examples of split-plot designs can be found in the biological literature. Linn and Roelofs (1983) considered a total of 100 treatments in a 5×4×5 factorial design such that two of the three factors were varied between days and one factor was varied within days. This design has a split-plot structure with days serving as whole-plots. The experiments of Linn et al. (1988a) consider two species, *Grapholita molesta* and *Pectinophora gossypiella*. They challenged 5–10 males with each treatment per day, with a total of 70 males for each treatment–temperature combination. Both the treatments and temperatures were randomized over the experimental period. This experimental design has a split-plot structure with unbalanced data. In both of these examples, the analysis was based on ANOVA and regression techniques, rather than a split-plot analysis.

### Model for a hypothetical split-plot design

Every ANOVA is associated with a linear model specifying the effects being considered. The linear model for a split-plot ANOVA includes hierarchies of terms modeling both the block effects and treatment effects. The key concept in constructing models for split-plot designs is recognizing the different sizes of experimental units and consequently identifying the corresponding design structures and treatment structures.

Consider an experiment in which several treatments are administered to different subjects and the experiment is conducted over several blocks. Suppose that a biological experiment involves subjects of different ages, challenged with various treatments, such as pheromones, in different blocks (for example, days). The age factor may be included in the model to assess how the behavioral performance of an animal changes as it develops. The split-plot design originated in agricultural field trials, where, for example, one factor may be fertilizer and another irrigation level.

Let $Y_{ijk}$ denote a response measured on a subject of the $k$th ($k=1,\ldots,n$) age when challenged with the $j$th ($j=1,\ldots,t$) treatment in the $i$th ($i=1,\ldots,b$) block. The simplest possible model can be written as

$$Y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \delta_k + \epsilon_{ijk}, \tag{1}$$

where $\mu$ is the overall mean, $\alpha_i$ is the block effect, $\beta_j$ is the effect of the $j$th treatment applied at the whole-plot level, $\gamma_{ij}$ is the whole-plot effect, and $\delta_k$ is the age effect. The term $\epsilon_{ijk}$ measures random error.

The scientific interest is in the treatment effect $\beta_j$ and the age effect $\delta_k$. The basic statistical problem is to detect the significance of these effects and to estimate the size of any effects that are present. The effects $\beta_j$ and $\delta_k$ are treated as fixed effects, or parameters, in the model. The block effect $\alpha_i$ and the whole-plot effect $\gamma_{ij}$ are of no inherent interest; however, they can cause considerable variation from block to block and from whole-plot to whole-plot. Therefore, these effects are treated as random. A typical assumption is that they follow $N(0,\sigma_\alpha^2)$ and $N(0,\sigma_\gamma^2)$ distributions, respectively, where $\sigma_\alpha^2$ and $\sigma_\gamma^2$ are unknown variance components corresponding to the block and whole-plot effects. The random errors $\epsilon_{ijk}$ are assumed to follow a $N(0,\sigma_\epsilon^2)$ distribution.

### Simulated experiment: generation of data

We demonstrate how to specify a model for a split-plot design and how to construct appropriate *F*-test statistics using simulated experimental data. The design is generated as follows:

1. Sixteen runs, numbered 1 to 16, were performed in each of five blocks.
2. Each block was divided into four whole-plots of four runs, and the four treatments (for example, pheromone plumes) were randomly assigned to the four whole-plots.
3. Four subjects, one of each age, were assigned in random order to the runs within each whole-plot.

The important feature of this design is that two factors, treatment and age, are being varied across experimental runs. The treatment is varied randomly at the whole-plot level, while the age is varied randomly at the level of individual runs or sub-plots.
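The difference between the two randomization protocols can be made concrete with a short sketch. The Python below is a hypothetical illustration (the function names are ours, not part of the original analysis, which used S): it builds the run order of one block under complete randomization and under the restricted, whole-plot randomization described above, and counts how often the apparatus would have to be reconfigured for a new treatment.

```python
import random


def complete_order(t, n, rng):
    """Complete randomization: all t*n treatment-age combinations in random order."""
    order = [(j, k) for j in range(t) for k in range(n)]
    rng.shuffle(order)
    return order


def restricted_order(t, n, rng):
    """Restricted randomization: treatments assigned to whole-plots in random
    order, with the n ages shuffled among the runs within each whole-plot."""
    order = []
    for j in rng.sample(range(t), t):
        order.extend((j, k) for k in rng.sample(range(n), n))
    return order


def device_resets(order):
    """Number of times the treatment changes between consecutive runs."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev[0] != cur[0])


if __name__ == "__main__":
    rng = random.Random(2024)
    for block in range(1, 6):  # five blocks of 4 treatments x 4 ages = 16 runs
        full = complete_order(4, 4, rng)
        split = restricted_order(4, 4, rng)
        print(f"block {block}: resets complete={device_resets(full)}, "
              f"restricted={device_resets(split)}")
```

Under the restricted protocol the count is always $t-1=3$ per block, whereas a complete randomization of 16 runs typically needs many more treatment changes; this is the practical cost that motivates the restriction.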

Let $i$, $j$ and $k$ denote the block, treatment and age, respectively. The data were generated from the model

$$Y_{ijk} = \mu + \alpha_i + \delta_k + \lambda_i r_{ijk} + \epsilon_{ijk}, \tag{2}$$

where (1) the overall mean is $\mu=100$; (2) the block effect $\alpha_i$ has a normal distribution with mean zero and $\sigma_\alpha^2=25$; (3) the age effect $\delta_k$ takes the values $1,\ldots,4$ corresponding to subjects of four different age groups (for example, insects 10, 20, 30 and 40 days old, respectively); (4) the component $\lambda_i r_{ijk}$ represents drift of experimental conditions over time, with $r_{ijk}$ taking the run number $1,\ldots,16$ of the $(j,k)$ treatment combination in the $i$th block; (5) the coefficient $\lambda_i$ follows a $N(0,1)$ distribution; and (6) $\epsilon_{ijk}$ has a normal distribution with mean zero and $\sigma_\epsilon^2=1$. That is, we assumed that different random effects contribute differently to the level of variation in the final measurement of $Y_{ijk}$. The statistical model used to generate the data includes an age effect, a random block effect and a random drift effect within the block. However, the model does not include any treatment effect $\beta_j$ and, therefore, the responses are independent of the treatment employed.
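The data-generating process of model (2) can be sketched in a few lines. This is a minimal Python illustration, not the S code used in the study; it assumes the restricted randomization described above (treatments in random whole-plot order, ages shuffled within each whole-plot) and takes the run number within the block as $r_{ijk}$. The function name and seed handling are our own.

```python
import random


def simulate_model2(b=5, t=4, n=4, seed=None):
    """Simulate Y = mu + alpha_i + delta_k + lambda_i * r_ijk + eps_ijk.

    Returns (block, run, treatment, age, y) tuples with 0-based block,
    treatment and age indices. There is deliberately no treatment term.
    """
    rng = random.Random(seed)
    mu = 100.0
    rows = []
    for i in range(b):
        alpha_i = rng.gauss(0.0, 5.0)   # block effect, variance 25
        lambda_i = rng.gauss(0.0, 1.0)  # drift slope for this block, N(0, 1)
        run = 0
        for j in rng.sample(range(t), t):      # random whole-plot order of treatments
            for k in rng.sample(range(n), n):  # random age order within the whole-plot
                run += 1                       # r_ijk = 1, ..., t*n
                delta_k = k + 1                # age effect takes the values 1..n
                eps = rng.gauss(0.0, 1.0)      # error, variance 1
                y = mu + alpha_i + delta_k + lambda_i * run + eps
                rows.append((i, run, j, k, y))
    return rows
```

Because the drift $\lambda_i r_{ijk}$ is tied to run position, treatments that happen to occupy early or late whole-plots in a block are shifted together, so treatment means differ within blocks even though $\beta_j$ is absent; this is the kind of whole-plot variation that the $\gamma_{ij}$ term in model (1) is meant to absorb.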

It is important to emphasize that the model used to generate the simulated data is different from the split-plot model (1). In particular, the experimental drift in the simulation model does not exactly match the assumptions of the split-plot model. This effect was introduced intentionally, since in practice one does not know the precise form of any uncontrolled variation, and it is important that the statistical analysis be robust to misspecification of this term. The data were generated using the statistical language S (Venables and Ripley, 2002) and model (2) was fitted using the lm() function in S. The model fitting and data analysis can also be performed using R, a freely available open-source statistical language available from http://www.r-project.org.

### *P. americana* experiment

The *P. americana* experiment is an example of a split-plot design characterized by multiple levels of blocking. The experiment involved 3–18-week-old virgin males of *P. americana*, which were challenged, 2 h into their scotophase (12 h:12 h L:D cycle), to track wind-borne plumes of the female sex pheromone (–)-periplanone-B in a laboratory wind tunnel. The animals were video recorded as they tracked the plumes, and each videotaped walking path was digitized using a computerized motion analysis system.

### Plume structure

For this experiment, four different plume structures were constructed by varying the size, shape and orientation of the pheromone source (Fig. 2). The point source plume was constructed using a 0.7 cm diameter circular filter paper disk (Whatman No. 1, Eastbourne, East Sussex, UK) held perpendicular to the airflow with an insect pin (Fig. 2A). A ribbon plume with a chemical source 0.05 cm wide was constructed by rotating the 0.7 cm filter paper disk by 90°, so that the plane of the disk was parallel to the airflow in the wind tunnel, resulting in a very narrow plume (Fig. 2B). The third plume was created by increasing the surface area of the source by ca. 25 times while proportionally increasing the dosage of pheromone solution applied to the source; this wide plume source was 14.3 cm wide × 0.7 cm tall (Fig. 2C). The cylinder plume structure was generated by placing a Plexiglas® cylinder (81.28 cm tall × 7.62 cm diameter) 5 cm upwind of the 0.7 cm diameter circular filter paper disk held perpendicular to the airflow (Fig. 2D). The reader is referred to Willis and Avondet (2005) for further details on the materials employed in this experiment.
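As a quick arithmetic check on the 'ca. 25 times' figure, the sketch below compares the area of one face of the 0.7 cm diameter disk with the 14.3 cm × 0.7 cm face of the wide source. Treating the relevant surface as a single face of each source is our assumption.

```python
import math

disk_diameter_cm = 0.7
wide_width_cm, wide_height_cm = 14.3, 0.7

disk_area = math.pi * (disk_diameter_cm / 2) ** 2  # one face of the circular disk
wide_area = wide_width_cm * wide_height_cm         # face of the wide source
area_ratio = wide_area / disk_area
print(f"disk {disk_area:.3f} cm^2, wide source {wide_area:.2f} cm^2, ratio {area_ratio:.1f}")
```

The ratio comes out at about 26, consistent with the approximate factor of 25 given above.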

The aim of this experiment was to test the hypothesis that male cockroaches steer their walking while tracking female pheromone using a chemotactic strategy characterized by counter turning (turning-back) when they experience a sharp pheromone-clean air edge (Willis and Avondet, 2005).

### Measurements

Response variables measured from the digitized insect movement tracks included: track angle (degrees), track width (cm), ground speed (cm s$^{-1}$), body axis (degrees), net velocity (cm s$^{-1}$), inter-turn duration (s), the number of times each animal stopped, and the duration of each stop (s). For the purposes of the analysis, Willis and Avondet (2005) considered (1) a turn to be the location at which the head reached a local maximum or minimum with respect to the lateral frame of reference of the wind tunnel, and (2) an animal to be stopped if there was no movement between two sequential positions of the head point. The measurements from one animal were considered as one trial, and each response variable is the average of the measurements over the entire walking track of a single animal.
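The turn and stop definitions above translate directly into code. The sketch below is hypothetical: it assumes the digitized head positions are available as a list of (x, y) coordinates sampled at a constant interval `dt` (in seconds), with the lateral (crosswind) coordinate supplied separately as a list; neither the coordinate convention nor the sampling rate comes from the original study, and the function names are ours.

```python
def turn_indices(lateral):
    """Indices at which the lateral coordinate reaches a local maximum or minimum.

    Frames with no lateral change are skipped, so a pause at the apex of a
    zig-zag still yields a single turn (at the last frame of the pause).
    """
    turns, prev_sign = [], 0
    for i in range(1, len(lateral)):
        step = lateral[i] - lateral[i - 1]
        if step == 0:
            continue
        sign = 1 if step > 0 else -1
        if prev_sign and sign != prev_sign:
            turns.append(i - 1)  # frame i - 1 is the extremum
        prev_sign = sign
    return turns


def inter_turn_durations(turns, dt):
    """Time (s) between successive turns."""
    return [(b - a) * dt for a, b in zip(turns, turns[1:])]


def stop_durations(positions, dt):
    """Durations (s) of stops, i.e. runs of identical successive head positions."""
    durations, run = [], 0
    for prev, cur in zip(positions, positions[1:]):
        if cur == prev:
            run += 1
        elif run:
            durations.append(run * dt)
            run = 0
    if run:
        durations.append(run * dt)
    return durations
```

The number of stops for a track is then `len(stop_durations(...))`, and averaging these lists over a track gives track-level responses of the kind analysed here.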

The animals are expected to show peak responsiveness during a specific period in each scotophase, so only a limited number of experimental runs can be performed each day. The experiment was therefore carried out over 5 days. The design can be summarized as follows:

1. The experiment was run over 5 days.
2. Each day was divided into four whole-plots, and each of the four pheromone treatments was randomly assigned to one of the whole-plots.
3. Within each whole-plot, five animals were tracked; each animal (sub-plot) represents the smallest experimental unit.

This gives a total of 5×4×5=100 experimental runs. Three animals did not respond to the experimental conditions, so 24 observations were completed for each treatment except the second, which yielded 25. The analysis therefore includes a total of 97 observations.

*Split-plot model for the* P. americana *experiment*

The key feature of the *P. americana* design is that the treatments (the pheromone plumes) were varied at the level of whole-plots, and not at the level of individual experimental runs. In the simulated experimental data described above, a second factor, age, was varied at the level of the individual experimental unit; in the *P. americana* experiment, this second factor is absent.

Since the treatments were applied to groups of animals within each apparatus setup, the treatments must be associated with the whole-plot part of the design. Therefore, in order to make an appropriate inference regarding treatments, the *F*-statistic denominator must include the random variation between the whole-plots.

In the context of the *P. americana* experiment, the response $Y_{ijk}$ in model (1) represents the ground speed (cm s$^{-1}$) averaged over the time points of a track; $\alpha_i$ and $\gamma_{ij}$ are the day and whole-plot effects, respectively; $\beta_j$ is the effect of the $j$th pheromone applied at the whole-plot level; and $\delta_k$ is the effect of a treatment applied at the subject level. However, since no factor was varied at the subject level in this experiment, the $\delta_k$ term is absent from the model.

*Appropriate* F*-test statistics*

Under the RCBD analysis, one constructs the *F*-ratio using the treatment MS ($\mathrm{MS}_{\mathrm{trt}}$) and the error MS as

$$F = \frac{\mathrm{MS}_{\mathrm{trt}}}{\mathrm{MSE}}, \tag{3}$$

where the MSE has an expected value of $\sigma_\epsilon^2$, the variation between subjects within groups. The above *F*-statistic is appropriate when the only source of random variation in the estimated treatment effects is the random error. This is the case for the age effect, the $\delta_k$ term in model (1). Since each age occurs once in each whole-plot, any block and whole-plot effects influence all ages equally. Consequently, the presence of these effects does not inflate the MS for age.

The *F*-statistic in Equation (3) is no longer valid when one is testing for the treatment effects applied at the whole-plot level, the $\beta_j$ term in model (1). In the *P. americana* experiment, there may be additional whole-plot variation due to differences in the responsiveness of animals during the scotophase, or due to random variation in resetting the experimental device. The MSE estimates only the subject-to-subject variation and ignores these other potential sources of random variation. Therefore, the *F*-statistic in Equation (3) for the treatment effects is biased upwards, leading to false indications of significant treatment effects (Type I errors).
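The variances of the relevant comparisons make this argument explicit. Assuming balanced data ($n$ subjects per whole-plot) and mutually independent random terms in model (1), averaging the model gives the following standard results for a difference between two whole-plot treatment means and a difference between two age means:

$$
\operatorname{Var}\left(\bar Y_{\cdot j \cdot} - \bar Y_{\cdot j' \cdot}\right)
= \frac{2}{b}\left(\sigma_\gamma^2 + \frac{\sigma_\epsilon^2}{n}\right)
= \frac{2\left(n\sigma_\gamma^2 + \sigma_\epsilon^2\right)}{bn},
\qquad
\operatorname{Var}\left(\bar Y_{\cdot\cdot k} - \bar Y_{\cdot\cdot k'}\right)
= \frac{2\sigma_\epsilon^2}{bt}.
$$

The block effects $\alpha_i$ cancel from both differences, and the whole-plot effects $\gamma_{ij}$ cancel from the age comparison because every age appears in every whole-plot. Only the comparison of whole-plot treatments retains $\sigma_\gamma^2$, so a denominator whose expectation is $\sigma_\epsilon^2$ alone understates its variability.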

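Under the split-plot model, this additional whole-plot variation is represented by the $\gamma_{ij}$ term. The appropriate denominator for the *F*-ratio is therefore the MS attributed to $\gamma_{ij}$, i.e. $\mathrm{MS}_{\mathrm{intr}}$. Consequently, the *F*-ratio becomes

$$F = \frac{\mathrm{MS}_{\mathrm{trt}}}{\mathrm{MS}_{\mathrm{intr}}}. \tag{4}$$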
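A minimal sketch of the balanced-case computations behind the two treatment *F*-ratios (MSE denominator versus $\mathrm{MS}_{\mathrm{intr}}$ denominator) follows, written in plain Python rather than with a statistics package, for data stored as `y[i][j][k]` with block $i$, whole-plot treatment $j$ and sub-plot factor (age) $k$. It assumes complete, balanced data and no treatment-by-age interaction, as in model (1); the RCBD denominator shown is the one from an analysis with block, treatment and age main effects only, in which whole-plot variation is pooled into the error term. The function names are ours, and the unbalanced *P. americana* analysis, which requires a mixed-model fit, is not reproduced here.

```python
from statistics import fmean


def split_plot_anova(y):
    """Sums of squares and d.f. for a balanced split-plot layout y[i][j][k]."""
    b, t, n = len(y), len(y[0]), len(y[0][0])
    cells = [(i, j, k) for i in range(b) for j in range(t) for k in range(n)]
    grand = fmean(y[i][j][k] for i, j, k in cells)
    blk = [fmean(v for row in y[i] for v in row) for i in range(b)]
    trt = [fmean(y[i][j][k] for i in range(b) for k in range(n)) for j in range(t)]
    age = [fmean(y[i][j][k] for i in range(b) for j in range(t)) for k in range(n)]
    wp = [[fmean(y[i][j]) for j in range(t)] for i in range(b)]  # whole-plot means

    ss = {
        "block": t * n * sum((m - grand) ** 2 for m in blk),
        "trt": b * n * sum((m - grand) ** 2 for m in trt),
        # whole-plot error: block-by-treatment interaction of whole-plot means
        "intr": n * sum((wp[i][j] - blk[i] - trt[j] + grand) ** 2
                        for i in range(b) for j in range(t)),
        "age": b * t * sum((m - grand) ** 2 for m in age),
    }
    total = sum((y[i][j][k] - grand) ** 2 for i, j, k in cells)
    ss["res"] = total - sum(ss.values())  # sub-plot (within whole-plot) error
    df = {"block": b - 1, "trt": t - 1, "intr": (b - 1) * (t - 1),
          "age": n - 1, "res": (b * t - 1) * (n - 1)}
    return ss, df


def f_ratios(ss, df):
    """Treatment F under the RCBD and split-plot analyses, and the age F."""
    ms = {key: ss[key] / df[key] for key in ss}
    # RCBD with main effects only: whole-plot and sub-plot variation pooled
    mse_rcbd = (ss["intr"] + ss["res"]) / (df["intr"] + df["res"])
    return {
        "trt_rcbd": ms["trt"] / mse_rcbd,     # MSE denominator
        "trt_split": ms["trt"] / ms["intr"],  # whole-plot error denominator
        "age": ms["age"] / ms["res"],         # sub-plot factor vs sub-plot error
    }
```

Applied to data simulated from model (2), these functions can be used to explore the qualitative contrast between Tables 2 and 3, although any particular random draw will give *P*-values different from those reported.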

## Results

We demonstrate the consequences of the randomization protocol on the analysis of experiments and scientific conclusions using the simulated and real data.

### Comparison of the RCBD and split-plot analyses: simulation experiment

In practical terms, the main findings from our simulation experiment are summarized in Tables 2 and 3. Table 2 presents the results of an RCBD ANOVA, which assumes complete randomization as illustrated in Fig. 3. The analysis shows a highly significant block effect (*P*<0.00001). However, scientific interest usually lies in the treatment effects, and the analysis in Table 2 incorrectly finds a highly significant treatment effect (*P*=$2.533\times10^{-5}$), while failing to detect the real age effect (*P*=0.2919). The restricted randomization in this design has led to both false positive (Type I error) and false negative (Type II error) results.

The restricted randomization means that the treatments are applied in whole-plots of four runs, and, for the purpose of analyzing treatment effects, the correct analysis treats these four runs as a single experimental unit. This leads to the split-plot ANOVA. The results of applying the split-plot analysis to the simulated experimental data are shown in Table 3. This analysis correctly identifies that there is no significant treatment effect (*P*=0.1889) and that the age effect is highly significant (*P*=$6.126\times10^{-5}$). Therefore, the risk of a false positive indication of treatment significance is substantially reduced under the split-plot analysis. In essence, by employing an RCBD analysis when its underlying assumption is not satisfied, we are more likely to reject a true null hypothesis. This matters for the interpretation of experimental results, because the absence of a treatment effect can itself be a relevant finding.

*Comparison of the RCBD and split-plot analyses: the* P. americana *experiment*

We present the analysis of the data from the *P. americana* experiment using RCBD and split-plot models to demonstrate when false positives can occur, how likely they are, and their consequences for the biological questions. This illustrates how violations of the underlying assumption of the RCBD lead to underestimation of the error variability and inflation of the statistical significance of the treatment effects.

The response variable, ground speed (cm s$^{-1}$), is shown in Fig. 4 sorted first by day and then by pheromone, and in Fig. 5 sorted first by pheromone and then by day. It is clear that pheromone D (cylinder source) behaves differently. It also appears that treatments behave differently on different days. For example, a comparison of pheromone A (point source) *vs* pheromone B (ribbon source) shows that animals responded more rapidly to the point source on days 2 and 4 and more rapidly to the ribbon source on days 1 and 5.

Analysis of the data was conducted using SAS PROC MIXED (Littell et al., 1996). Table 4 presents a two-way ANOVA for the response variable ground speed (cm s$^{-1}$). While the usual analysis of an RCBD includes only main effects for treatments and blocks, in the present experiment each pheromone was replicated several times on each day, so we are able to include an interaction term between treatments and blocks. The table shows highly significant day and pheromone effects (*P*<0.0001) and a significant pheromone-by-day interaction effect (*P*<0.0307). These results are consistent with those obtained by Willis and Avondet (2005).

How should the pheromone-by-day interaction effect be interpreted? What are its consequences for questions of biological interest, such as the responsiveness of animals to different pheromone plumes? The pheromone-by-day interaction means that the relative response of the animals to the different plumes varied from day to day. However, the biological interest lies in the overall response to the individual pheromones, and there is no inherent interest in the individual days. It is of little use to state that the animals respond more rapidly to the ribbon source on day 1 and to the point source on day 2, as the data here indicate.

The resolution to this paradox is to model the interaction effect, the $\gamma_{ij}$ term in model (1), as a random effect introduced by the restricted randomization of the design. However, these random effects also influence the estimates of the treatment means. Therefore, the SS attributed to the pheromone effects in Table 4 is inflated by these interactions, and comparing the pheromone MS ($\mathrm{MS}_{\mathrm{trt}}$) with the MSE is inappropriate.

Table 5 presents the ANOVA using a split-plot model for the *P. americana* experiment under restricted randomization, with the block effects treated as random. The point estimates of the variance components and the *F*-statistic for the treatment effect in the table were generated using SAS PROC MIXED (Littell et al., 1996). The *F*-statistic for the pheromone effect is now formed by comparing the MS for pheromone ($\mathrm{MS}_{\mathrm{trt}}=424.3216$) with the MS for the pheromone-by-day interaction ($\mathrm{MS}_{\mathrm{intr}}=52.2499$).

The analysis in Table 5 is further complicated by the three missing observations, which cause imbalance in the design (i.e. $n_{ij} \neq n$ for some $i=1,\ldots,b$ and $j=1,\ldots,t$, where $n_{ij}$ is the number of observations in the $i$th block for the $j$th treatment). Consequently, a Satterthwaite approximation (p. 24 in Milliken and Johnson, 1984) must be used for the denominator degrees of freedom (d.f.) of the *F*-statistic, $\mathrm{df}_2$; the numerator d.f., $\mathrm{df}_1$, is simply $t-1$. The *F*-statistic is much smaller under the split-plot analysis (8.18 instead of 16.63); however, the treatment effect is still significant (*P*=0.003). The 95% confidence intervals for the variance components were generated by the lme() function in S-Plus (Pinheiro and Bates, 2000). The results presented in Table 5 confirm that both $\sigma_\alpha^2$ and $\sigma_\gamma^2$, representing the day and whole-plot effects, respectively, are significant.
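The classical form of the Satterthwaite approximation assigns approximate d.f. to a linear combination of mean squares, $\nu \approx \left(\sum_i c_i \mathrm{MS}_i\right)^2 / \sum_i \left[(c_i \mathrm{MS}_i)^2/\nu_i\right]$. The sketch below implements only this textbook formula (the function name is ours); SAS PROC MIXED applies a more general, variance-component-based version (for example via its DDFM=SATTERTHWAITE option), so this function should not be expected to reproduce the d.f. in Table 5.

```python
def satterthwaite_df(coefs, mean_squares, dfs):
    """Approximate d.f. of the linear combination sum(c_i * MS_i).

    Classical Satterthwaite formula:
        nu = (sum c_i MS_i)^2 / sum((c_i MS_i)^2 / nu_i)
    """
    combo = sum(c * ms for c, ms in zip(coefs, mean_squares))
    denom = sum((c * ms) ** 2 / nu for c, ms, nu in zip(coefs, mean_squares, dfs))
    return combo ** 2 / denom
```

The coefficients $c_i$ must come from the expected-mean-square algebra of the particular unbalanced design; any values used to try the function out are placeholders.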

### Pairwise comparisons

To further understand the treatment differences, we performed pairwise comparisons of treatment means. The analyses under the RCBD and split-plot models in Table 6 correspond to the results in Tables 4 and 5, respectively. In comparison with the RCBD, the split-plot analysis returns larger standard errors and, in turn, smaller *t*-statistics and larger *P*-values. In fact, if we choose a significance level of 0.01, the pairwise difference A *vs* D (point source *vs* cylinder) is significant under the RCBD but not under the split-plot analysis. Moreover, the *P*-values corresponding to the pairwise differences B *vs* D (ribbon *vs* cylinder) and C *vs* D (wide *vs* cylinder) are closer to 0.01 than to 0.0001. The RCBD analysis has substantially overstated the statistical significance; see Curran-Everett and Benos (2004) for a discussion of why a significance level of 0.01 is appropriate in certain situations.
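In the balanced case, the standard error of a difference between two whole-plot treatment means is $\sqrt{2\,\mathrm{MS}/(bn)}$, where MS is the denominator mean square of the analysis. The sketch below (hypothetical function name; balanced data assumed, unlike the *P. americana* data; placeholder numbers, not values from Tables 4 to 6) shows how replacing the MSE with $\mathrm{MS}_{\mathrm{intr}}$ enlarges the standard error and shrinks the *t*-statistic by the factor $\sqrt{\mathrm{MS}_{\mathrm{intr}}/\mathrm{MSE}}$.

```python
import math


def pairwise_t(mean_diff, ms_denominator, b, n):
    """t-statistic and standard error for the difference of two whole-plot
    treatment means, each averaged over b blocks x n subjects (balanced)."""
    se = math.sqrt(2.0 * ms_denominator / (b * n))
    return mean_diff / se, se


# Placeholder values chosen for illustration only
diff, mse, ms_intr, b, n = 4.0, 25.0, 50.0, 5, 5
t_rcbd, se_rcbd = pairwise_t(diff, mse, b, n)
t_split, se_split = pairwise_t(diff, ms_intr, b, n)
print(f"RCBD: se={se_rcbd:.3f}, t={t_rcbd:.2f}; "
      f"split-plot: se={se_split:.3f}, t={t_split:.2f}")
```

The *P*-values then follow from the *t* distribution with the d.f. of the chosen denominator, which is why the split-plot comparisons in Table 6 are less significant.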

The *F*-statistic for treatment effects will be inflated, on average, when blocks such as days are treated as fixed and the RCBD analysis is applied as if the experiment had been carried out under the complete randomization protocol. In this scenario, the *F*-statistic for treatment effects uses the MSE as its denominator, which has an expected value of $\sigma_{\epsilon}^{2}$. However, under the split-plot analysis with block effects treated as random, the denominator is $\mathrm{MS}_{\mathrm{intr}}$, which has an expected value of $(n\sigma_{\gamma}^{2} + \sigma_{\epsilon}^{2})$; the day variance component $\sigma_{\alpha}^{2}$ does not appear, because day effects cancel in the treatment-by-day contrasts. In the balanced case (i.e. $n_{ij} = n$ for all $i = 1, \ldots, b$ and $j = 1, \ldots, t$), an approximate indication of the inflation in the *F*-statistic is provided by the ratio:

$$\frac{E(\mathrm{MS}_{\mathrm{intr}})}{E(\mathrm{MSE})} = \frac{n\sigma_{\gamma}^{2} + \sigma_{\epsilon}^{2}}{\sigma_{\epsilon}^{2}} = 1 + \frac{n\sigma_{\gamma}^{2}}{\sigma_{\epsilon}^{2}},$$

where, in practice, $\sigma_{\gamma}^{2}$ and $\sigma_{\epsilon}^{2}$ are replaced by their estimates $\hat{\sigma}_{\gamma}^{2}$ and $\hat{\sigma}_{\epsilon}^{2}$, respectively. Recall that $\gamma_{ij}$ is confounded with the setup variation of the experimental device, as well as with any potential treatment-by-block interaction under the restricted randomization protocol; this combined variation is given by $\sigma_{\gamma}^{2}$. The *F*-statistic given in Equation (3) under an RCBD does not include $\sigma_{\gamma}^{2}$ in its denominator and therefore ignores the setup variation entirely. In other words, this *F*-statistic ignores the 'loss in efficiency' incurred when one cannot randomize between individual subjects because of practical difficulties in conducting an experiment.

The simulated experimental studies presented earlier demonstrate the generality of these conclusions and are consistent with our analysis of the real data above. Recall that the simulation results show a change from highly significant to non-significant treatment effects.
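The mechanism can also be illustrated with a small simulation of our own, which is not the authors' simulation study. It is a sketch under stated assumptions: a balanced design with $b$ days, $t$ treatments and $n$ subjects per day-by-treatment group; one apparatus setup (whole plot) per group; no true treatment effect; and independent normal day, whole-plot and subject effects with hypothetical variances. Each simulated data set is analyzed twice, once with the RCBD test (treatment MS over the within-group MSE) and once with the split-plot test (treatment MS over $\mathrm{MS}_{\mathrm{intr}}$), and the proportion of false rejections at the 5% level is recorded for each. All function names and parameter values are hypothetical.

```python
import math
import random


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def f_upper_tail(f, d1, d2):
    """P(F > f) for F ~ F(d1, d2), via I_x(d2/2, d1/2) with x = d2/(d2 + d1*f)."""
    if f <= 0.0:
        return 1.0
    a, b, x = d2 / 2.0, d1 / 2.0, d2 / (d2 + d1 * f)
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def simulate(b=4, t=4, n=5, var_day=1.0, var_wp=1.0, var_err=1.0,
             reps=2000, alpha=0.05, seed=1):
    """Return (RCBD rejection rate, split-plot rejection rate,
    mean MS_intr / mean MSE) under y_ijk = day_i + wp_ij + e_ijk (no treatment effect)."""
    rng = random.Random(seed)
    sd_day, sd_wp, sd_err = (math.sqrt(v) for v in (var_day, var_wp, var_err))
    df_trt, df_intr, df_err = t - 1, (b - 1) * (t - 1), b * t * (n - 1)
    rej_rcbd = rej_split = 0
    tot_intr = tot_mse = 0.0
    for _ in range(reps):
        cell = [[0.0] * t for _ in range(b)]
        sse = 0.0
        for i in range(b):
            day = rng.gauss(0.0, sd_day)
            for j in range(t):
                wp = rng.gauss(0.0, sd_wp)  # shared setup effect for the group
                ys = [day + wp + rng.gauss(0.0, sd_err) for _ in range(n)]
                cell[i][j] = sum(ys) / n
                sse += sum((y - cell[i][j]) ** 2 for y in ys)
        grand = sum(map(sum, cell)) / (b * t)
        day_mean = [sum(row) / t for row in cell]
        trt_mean = [sum(cell[i][j] for i in range(b)) / b for j in range(t)]
        ms_trt = b * n * sum((m - grand) ** 2 for m in trt_mean) / df_trt
        ms_intr = n * sum((cell[i][j] - day_mean[i] - trt_mean[j] + grand) ** 2
                          for i in range(b) for j in range(t)) / df_intr
        mse = sse / df_err
        tot_intr += ms_intr
        tot_mse += mse
        # RCBD test: treatment MS over the within-group MSE.
        rej_rcbd += f_upper_tail(ms_trt / mse, df_trt, df_err) < alpha
        # Split-plot test: treatment MS over the treatment-by-day MS.
        rej_split += f_upper_tail(ms_trt / ms_intr, df_trt, df_intr) < alpha
    return rej_rcbd / reps, rej_split / reps, tot_intr / tot_mse


if __name__ == "__main__":
    rcbd, split, ratio = simulate()
    print(f"false-positive rate, RCBD test:       {rcbd:.3f}")
    print(f"false-positive rate, split-plot test: {split:.3f}")
    print(f"mean MS_intr / mean MSE:              {ratio:.2f}")
```

With a non-zero whole-plot variance, the RCBD test rejects the true null hypothesis far more often than the nominal 5%, whereas the split-plot test stays close to 5%; setting `var_wp=0.0` makes the two tests behave similarly. The third returned value gives an empirical view of how much the MSE understates the error term appropriate for the treatment test.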

## Discussion

A randomized complete block design is one of the designs most widely used by experimental scientists to study the effects of treatments on subjects. Often, treatments are replicated within each block to obtain separate estimates of the error variance and of any potential treatment-by-block interaction. Experimental constraints can prevent complete randomization within each block. Previous studies assessing the statistical significance of treatment effects on behavioral responses have used either a one-way ANOVA or an RCBD. In this article, we have demonstrated that a split-plot model can be applied to analyze data collected under a restricted randomization protocol. Furthermore, we have demonstrated that overlooking the effect of restricted randomization on inferences from RCBD analyses can lead to various spurious interaction effects as well as potentially serious Type I or Type II errors. In particular, if the restricted randomization is ignored and an RCBD analysis is performed, then there is a risk of overstating the significance of treatment effects. In contrast, the split-plot analysis provides a powerful alternative for analyzing data collected under the restricted randomization protocol. The proposed methods are illustrated using real data from chemo-orientation studies; however, they extend directly to other studies where it is impractical to completely randomize the treatments given to individual experimental subjects. The techniques presented in this article can be implemented using widely available statistical software.

Our findings clearly substantiate the consequences of ignoring the restricted randomization and have the following implications. (1) Under the restricted randomization (Fig. 1), there are two sets of experimental units: (i) subjects nested within groups, which in turn serve as blocks for the subjects, and (ii) groups nested within blocks. The appropriate analysis is a split-plot ANOVA that treats the groups of subjects as the whole-plots and the individual subjects as the sub-plots. (2) The usual expected mean squares assume complete randomization, so one must be cautious in calculating the expected mean squares under the restricted randomization. In particular, the treatment-by-block interaction cannot be tested with the *F*-statistic $F = \mathrm{MS}_{\mathrm{intr}}/\mathrm{MSE}$; such a test is possible only if the experimental setups are completely randomized between animals.

An important conclusion of our work is the need to describe completely both the design employed and the statistical analysis performed on any experimental data. As we have shown, small changes to the design protocol can have a major effect on the validity of a statistical analysis. There are many different types of ANOVA, and applying an inappropriate analysis to a dataset can lead to incorrect conclusions. The main function of the 'Materials and methods' section of a scientific article is to provide sufficient detail that a knowledgeable reader with access to the original data can verify and reproduce the reported results (Curran-Everett and Benos, 2004). Our applications, as well as our citation of the literature, have been limited by the incomplete description of the experimental designs and statistical methods in many of the articles we reviewed, which makes it difficult, if not impossible, to replicate the experiments or the statistical analyses. Statistical methods and analysis are inherent to many allied fields and underpin the scientific discovery process. As stated eloquently by Curran-Everett and Benos (2004), misunderstanding and misuse of statistical techniques, as well as misinterpretation of analyses, jeopardize the scientific discovery process and the accumulation of scientific knowledge. We hope that this article will help to improve the caliber of statistical information, as well as the reporting and presentation of statistical techniques, in allied scientific publications.

## Acknowledgements

R.S.P.'s research was supported in part by the National Science Foundation (NSF) DMS 02-39053 and Office of Naval Research (ONR) grants N00014-02-1-0316 and N00014-04-1-0481. C.L.'s research was supported in part by the NSF grant DMS 03-06202 and ONR grant N00014-04-1-0481. The authors thank the Editor and the reviewers for constructive comments and suggestions that led to significant improvements of the manuscript. We especially thank Mark Willis for providing the *P. americana* experimental data, for important discussions with regard to the relevant biological literature and for several comments on the manuscript. The authors are grateful to Joseph Koonce and Christopher Cullis for stimulating discussions and for their comments on the earlier version of the manuscript.

## References

**Baker, T. C.** (…)

**Baker, T. C., Willis, M. A. and Phelan, P. L.** (…)

**Cardé, R. T. and Knols, B. G. J.** (…)

**Cardé, R. T., Staten, R. T. and Mafra-Neto, A.** (…)

**Cochran, W. G. and Cox, G. M.** (…) New York: Wiley.

**Curran-Everett, D. and Benos, D. J.** (2004) …

**Dekker, T., Takken, W. and Cardé, R. T.** (…) *Anopheles gambiae s. s.* and *Aedes aegypti* in a dual-choice olfactometer.

**Justus, K. A., Schofield, S. W., Murlis, J. and Cardé, R. T.** (…) *Cadra cautella* males in rapidly pulsed pheromone plumes.

**Linn, C. E., Campbell, M. G., Poole, K. R., Wu, W.-Q. and Roelofs, W. L.** (…) *Trichoplusia ni*: relationship to the modulatory action of octopamine.

**Linn, C. E., Campbell, M. G. and Roelofs, W. L.** (…)

**Linn, C. E., Hammond, A., Du, J. and Roelofs, W. L.** (…) *Trichoplusia ni* and *Pseudoplusia includens*.

**Linn, C. E. and Roelofs, W. L.** (…)

**Littell, R. C., Milliken, G. A., Stroup, W. W. and Wolfinger, R. D.** (1996) …

**Mafra-Neto, A. and Cardé, R. T.** (…)

**Milliken, G. A. and Johnson, D. E.** (1984) …

**Pinheiro, J. C. and Bates, D. M.** (2000) …

**Searle, S. R.** (…)

**Searle, S. R., Casella, G. and McCulloch, C.** (…)

**Sokal, R. R. and Rohlf, F. J.** (…) Second edition. New York: W. H. Freeman and Company.

**Venables, W. N. and Ripley, B. D.** (…)

**Vickers, N. J.** (…) *Heliothis subflexa* under wind tunnel conditions.

**Willis, M. A. and Arbas, E. A.** (…) *Manduca sexta* L.

**Willis, M. A. and Avondet, J. L.** (…) *Periplaneta americana* L., and the effects of odor plumes of different structure.

**Willis, M. A. and Baker, T. C.** (…) *Grapholita molesta*.

**Willis, M. A. and Baker, T. C.** (…) *Grapholita molesta* males during pheromone-mediated upwind movement.

**Willis, M. A. and Cardé, R. T.** (…) *Lymantria dispar* L.: upwind flight in pheromone plume in different wind velocities.

**Zanen, P. O. and Cardé, R. T.** (…)