## SUMMARY

Goldfish swimming was analysed quantitatively to determine whether it exhibits distinctive individual spatio-temporal patterns. Because fish locomotion is inherently variable, this hypothesis was tested using five nonlinear measures, complemented by mean velocity. A library of 75 trajectories, each of 5 min duration, was constructed from five fish swimming in a constant and relatively homogeneous environment. Three nonlinear measures, the `characteristic fractal dimension' and the `Richardson dimension', both quantifying the degree to which a trajectory departs from a straight line, and `relative dispersion', characterizing the variance as a function of the duration, have coefficients of variation less than 7%, in contrast to mean velocity (30%). A discriminant analysis, or classification system, based on all six measures revealed that trajectories are indeed highly individualistic, with the probability that the trajectories of any two different fish were drawn from the same distribution being less than 1%. That is, the combination of these measures allows a given trajectory to be assigned to its source with a high degree of confidence. The Richardson dimension and the `Hurst exponent', which quantifies persistence, were the most effective measures.

## Introduction

Individual differences are often large enough to make it difficult to quantify a behaviour and to distinguish its underlying components (Gotceitas and Colgan, 1988; Mather and Anderson, 1993; Wilson et al., 1993). However, individual variation may constitute an important aspect of behavioural selection (Clark and Ehlinger, 1987; Gotceitas and Colgan, 1988; Colgan et al., 1991). For example, it might ensure the fitness of a population when resources are limited (Magurran, 1986a; Gotceitas and Colgan, 1988). Nevertheless, analyses of behavioural performance often focus on general phenomena of entire populations, and idiosyncratic aspects are noted only secondarily. Moreover, studies on individuality often concern higher order behaviours, such as, in the case of teleost fish, foraging, fear avoidance, aggression, predator inspection, mating strategies, parental care and sociability (Gervai and Csányi, 1986; Magurran, 1986a,b; Clark and Ehlinger, 1987; Huntingford and Giles, 1987; Gotceitas and Colgan, 1988; Francis, 1990; Murphy and Pitcher, 1991; Colgan et al., 1991; Wilson et al., 1993; Budaev, 1997; Coleman and Wilson, 1998; Budaev and Zhuikov, 1998; Budaev et al., 1999a,b). While these behaviours all involve motor activity, they are often quantified on the basis of socio-biological descriptors, such as inspection rates or proximity to other fish. Less attention has been paid to the possibility of individual differences in the underlying basic patterns of motor activity.

Swimming is composed of highly organized spatial and temporal patterns, even in a relatively homogeneous environment (Kleerekoper et al., 1974; Steele, 1983). Some of these patterns are complex and cannot be characterized with the tools of classical kinematics, as they may exhibit nonlinear properties such as persistence (the tendency to repeat a given sequence), redundancy (the relationship between the uncertainty of a signal and its length) and scale invariance (the tendency of a signal to have the same structure when observed on different temporal or spatial scales) (Faure et al., 2003). Indeed, nonlinear measures have been used to characterize locomotion and behavioural repertoires in various species, including invertebrates (Dicke and Burrough, 1988; Cole, 1995), fish (Coughlin et al., 1992; Alados and Weber, 1999; Brewer et al., 2001), birds (Viswanathan et al., 1996; Ferriere et al., 1999) and mammals (Paulus et al., 1990; Marghitu et al., 1996; Alados et al., 1996; Alados and Huffman, 2000).

The present study was designed (1) to apply five nonlinear measures and one linear measure as descriptors of goldfish swimming trajectories in order to quantify this locomotor behaviour and (2) to develop a discriminant analysis that would allow us to ask whether a given trajectory could be assigned to an individual within the experimental pool. We found that, despite the apparent variability of trajectories, this protocol could reliably achieve such a classification.

## Materials and methods

### Animals

Mature goldfish (*Carassius auratus* L.) were purchased from a commercial hatchery (Hunting Creek Fisheries, Inc., Thurmont, MD, USA). Upon arrival in the laboratory, the animals were adapted to laboratory conditions for at least one week. Five female fish of similar body length (9–12 cm) were chosen at random. They were maintained together in a rectangular glass aquarium (92 cm×41 cm×31 cm; 75 litre) in deionised water conditioned with NovAqua (0.13 ml l^{–1}; Novalek Inc., Hayward, CA, USA), Instant Ocean (16 mg l^{–1}; Aquarium Systems, Mentor, OH, USA), Aquarium Salt (0.125 g l^{–1}; Jungle Labs, Cibolo, TX, USA), Copper Safe (0.32 ml l^{–1}; Mardel Laboratories, Inc., Harbor City, CA, USA) and pH Stabilizer 7.2 (0.125 g l^{–1}; Jungle Labs). Water quality was monitored regularly and was the same for holding and experimental tanks (temperature 22±1°C; pH 7±0.2; dissolved oxygen saturated, 8 p.p.m.). Fish were fed on a regular 48-h schedule. A 12 h:12 h light:dark cycle was supplied by room lights (360 lux at the water surface). All video recordings were made during the light period.

### Swimming environment

A cylindrical Plexiglas tank (20 litre, 50 cm diameter) was used for the experiments. The water column was comparatively shallow (10 cm deep) to prevent fish from swimming out of the camera's focal plane and to minimise errors due to changing swimming depth. To reduce mechanosensory and visual cues, the tank was mounted on an anti-vibration table and its wall and lid were translucent white. Its bottom was clear to allow video recording from below.

Translucent white plastic sheets were mounted on the inside frame of the table with a small hole in the bottom sheet for the camera lens. Illumination was from above with a circular fluorescent bulb (approximately 350 lux at the water surface) and from below with four floodlights (approximately 250 lux at the bottom of the tank). New conditioned water was used for each recording session.

### Data acquisition and experimental design

Approximately 30 min before each recording session, fish were transferred to a translucent white container (20 cm×15 cm×10 cm) filled with aerated, conditioned water, to be marked for automated motion tracking. Two markers were applied with instant adhesive (Quick Tite; Loctite Corp., Avon, OH, USA) along the ventral midline of the fish to specify its position on the video image. Each marker was made of double-sided black tape (1 cm×1 cm) with a dot (approximately 4 mm diameter) of white nail polish painted in the centre. For this purpose, the fish was removed from the water, the ventral midline was exposed and the skin was gently dried. The markers were applied between the paired pelvic and pectoral fins and onto the lower jaw in less than 1.5 min, after which the fish recovered in fresh aerated water for at least 10 min. This procedure had no obvious impact on behaviour and, in most cases, the markers remained in place for several days.

To analyse locomotion (see also Faure et al., 2003), ventral-view recordings of the fish were obtained from below at 30 Hz using a digital camcorder (Canon Optura; Canon USA, Jamesburg, NJ, USA). Each recording session started 30 s after the fish was introduced into the experimental tank and lasted 15 min. Video capturing software (Adobe Premiere; Adobe Systems Inc., San José, CA, USA) was used to subdivide each recording session into three 5-min trajectories. Five such recording sessions, each on a different day, were obtained from each of the five fish, yielding a library of 75 trajectories (5 fish × 5 sessions × 3 trajectories).

### Data analysis

Commercial motion analysis software (WinAnalyze; Mikromak GmbH, Erlangen, Germany) provided frame-to-frame data on the *X* and *Y* position of the markers (Fig. 1A). Only the central marker data were used for the calculations reported in this paper. The *X* and *Y* position data as functions of time were used as primary data for the multivariate analysis described below. The five nonlinear measures chosen for this study were computed with software of our own construction; the mean velocity was computed as the 5-min average of the instantaneous velocity, [(*dx*/*dt*)^{2}+(*dy*/*dt*)^{2}]^{0.5}.
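As a minimal sketch of the mean-velocity calculation, the Python snippet below estimates the derivatives by frame-to-frame differences at the 30 Hz frame rate and averages the resulting instantaneous speeds. The function names are hypothetical, positions are assumed to be already calibrated to millimetres, and the original software may have used a different differentiation scheme.

```python
import numpy as np

def instantaneous_speed(x, y, fs=30.0):
    """Frame-to-frame speed [(dx/dt)^2 + (dy/dt)^2]^0.5.

    x, y : marker positions (assumed calibrated to mm), one value per frame.
    fs   : frame rate in Hz (30 Hz in this study).
    Returns one speed value per inter-frame interval (len(x) - 1 values).
    """
    dt = 1.0 / fs
    vx = np.diff(np.asarray(x, dtype=float)) / dt
    vy = np.diff(np.asarray(y, dtype=float)) / dt
    return np.hypot(vx, vy)

def mean_velocity(x, y, fs=30.0):
    """Average of the instantaneous speed over a trajectory (5 min = 9000 frames)."""
    return float(instantaneous_speed(x, y, fs).mean())

# Hypothetical example: 3 mm per frame along X and 4 mm per frame along Y
frames = np.arange(9000)
print(mean_velocity(3.0 * frames, 4.0 * frames))  # about 150 mm/s (5 mm per frame x 30 frames/s)
```

Differencing consecutive frames matches the frame-to-frame data provided by the tracking software; a smoothing differentiator would reduce the influence of digitisation noise on the instantaneous values.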

A pragmatic criterion for choosing each nonlinear measure was that it should provide a quantitative value that could be assigned to a trajectory, allowing statistical comparisons between groups of data. A brief description of these nonlinear measures is presented here; a more detailed mathematical description, including additional references to the primary literature, is given in Rapp et al. (2002).

#### 1. The characteristic fractal dimension (CFD)

The CFD measures the degree to which a trajectory departs from a straight-line path (Katz and George, 1985; Katz, 1988). It relates the total distance travelled from point to point (or frame to frame) to the maximum separation of any two points in the series. Because that separation is bounded by the tank diameter in the present experiments, the CFD is governed largely by the ratio of the distance travelled to the diameter of the experimental tank. It is therefore sensitive to the duration of the observation period and to the speed of motion (see Rapp et al., 2002). It has a minimum value of 1 but no upper limit. Since, in the present application, the fish is swimming in a cylindrical tank, a circular motion of constant velocity would be equivalent to a straight line. As the trajectory deviates from circular motion, the CFD increases. This measure has been used to analyse a variety of complex geometrical patterns (Rinaldo et al., 1993; Rodriguez-Iturbe and Rinaldo, 1997).
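The CFD can be sketched with Katz's (1988) normalized formula, CFD = log_{10}(*n*)/[log_{10}(*n*)+log_{10}(*d*/*L*)], where *L* is the total path length, *n* the number of steps and *d* the planar extent of the path. Two assumptions should be flagged: the function name is hypothetical, and *d* is taken here, as in Katz (1988), as the largest distance from the first point rather than the largest separation of any two points; the variant in Rapp et al. (2002) may differ in such details. As a rough consistency check, a 9000-frame trajectory at the 49 mm s^{–1} mean speed reported in the Results (*L*≈14 700 mm), assuming it spans the full 500 mm tank diameter, gives log_{10}(9000)/[log_{10}(9000)+log_{10}(500/14 700)]≈1.59, close to the value of 1.609 reported for that trajectory.

```python
import numpy as np

def characteristic_fractal_dimension(xy):
    """Katz (1988) fractal dimension of a planar trajectory.

    xy : array of shape (n_frames, 2) with X, Y positions.
    Assumption: d is the largest distance from the first point (Katz's choice).
    The value is undefined if d <= L / n (extent smaller than one mean step).
    """
    xy = np.asarray(xy, dtype=float)
    steps = np.hypot(*np.diff(xy, axis=0).T)    # frame-to-frame distances
    L = steps.sum()                              # total distance travelled
    n = steps.size                               # number of steps
    d = np.hypot(*(xy - xy[0]).T).max()          # planar extent of the path
    return float(np.log10(n) / (np.log10(n) + np.log10(d / L)))

# A straight line gives exactly 1; a zigzag gives a value slightly above 1
line = np.column_stack([np.arange(1000.0), np.zeros(1000)])
zigzag = np.column_stack([np.arange(1000.0), np.arange(1000) % 2])
print(characteristic_fractal_dimension(line), characteristic_fractal_dimension(zigzag))
```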

#### 2. The Richardson dimension (D_{R})

The D_{R} is also an estimate of the degree to which a trajectory departs from a straight line (Richardson, 1960; Mandelbrot, 1983). In contrast with the CFD, D_{R} also quantifies how the estimated length of a curve changes with the precision of the measurement. It is an example of the generic class of dimension measures that have been applied to the classical problem of fractal geometry, namely `How long is the coast line of Britain?' (Mandelbrot, 1967). Stated operationally, for a fixed step length one counts the number of steps required to walk around the coast (or, as in our application, along the fish's trajectory). The length of the stride, i.e. the distance covered with each step, is then reduced and the number of steps required using this new step length is determined. The process is repeated and the log of the number of steps required is plotted as a function of the log of the step length; D_{R} is the negative of the slope of this curve. D_{R} is thus a measure of scale invariance. As for the CFD, a value of 1 is obtained from a straight line. The value of 2 is the maximum possible D_{R}; it represents a theoretical limit reached when a trajectory covers the entire two-dimensional surface. Given the differences between the factors influencing the CFD and D_{R}, the two measures can diverge.
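The divider (compass) procedure described above can be sketched as follows. It is a simplified illustration under stated assumptions: each stride ends at the first recorded frame at least one step length away (no interpolation between frames), the step lengths are chosen by the user and must be small enough to yield at least one step, and D_{R} is taken as minus the slope of a least-squares line through log(number of steps) *versus* log(step length). The function name and step lengths are hypothetical and need not match the implementation of Rapp et al. (2002).

```python
import numpy as np

def richardson_dimension(xy, step_lengths):
    """Divider (compass) estimate of the Richardson dimension of a planar trajectory.

    xy           : array of shape (n_frames, 2) with X, Y positions.
    step_lengths : increasing divider lengths, in the same units as xy.
    Each stride ends at the first later frame at least one step length away.
    """
    xy = np.asarray(xy, dtype=float)
    counts = []
    for s in step_lengths:
        i, n_steps = 0, 0
        while True:
            dist = np.hypot(*(xy[i + 1:] - xy[i]).T)   # distances to all later frames
            ahead = np.nonzero(dist >= s)[0]
            if ahead.size == 0:                        # cannot complete another stride
                break
            i += 1 + ahead[0]
            n_steps += 1
        counts.append(n_steps)
    # N(s) scales as s^(-D_R), so D_R is minus the slope of log N against log s
    slope, _ = np.polyfit(np.log(step_lengths), np.log(counts), 1)
    return float(-slope)

line = np.column_stack([np.arange(1000.0), np.zeros(1000)])
print(richardson_dimension(line, [2, 4, 8, 16, 32]))  # close to 1 for a straight line
```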

Fractal measures comparable to the CFD and D_{R} have been used to describe behavioural sequences, such as swimming and foraging in clownfish (Coughlin et al., 1992), trails in mites (Dicke and Burrough, 1988), reproductive behaviour in fathead minnows (Alados and Weber, 1999), social behaviour in chimpanzees (Alados and Huffman, 2000) and head lifting during feeding behaviour in ibex (Alados et al., 1996).

#### 3. The Lempel–Ziv complexity (LZC)

The LZC is a sequence-sensitive measure that characterizes the structure of a time-varying signal represented as a series of symbols (Lempel and Ziv, 1976; Ziv and Lempel, 1978). The distance between the fish's positions at consecutive points in time is computed, generating a time series of incremental distances travelled. This series is simplified by partitioning it into a binary symbol sequence about the median increment size. For example, a typical sequence might be `aabaabbab', where `a' symbolises values less than the median and `b' symbolises those greater. The LZC is then calculated for the resulting symbol sequence. It reflects the number of distinct sub-strings (e.g. `aab') encountered as the sequence is parsed and the rate at which they occur. This measure therefore gives information about the redundancy (or lack thereof) of a trajectory, for example about the irregularity of its velocity. Kurths et al. (1995) used this method to analyse heart rate variability in an investigation of predictors of sudden cardiac death, while Gu et al. (1994) and Xu et al. (1998) used symbolic dynamics to find differences between the electroencephalograms (EEGs) of healthy controls and psychotic patients. Subsequently, the complexity of multichannel EEGs of healthy controls was shown to be sensitive to changes in behaviour (Watanabe et al., 2002; this reference includes a review of the associated literature). The value of LZC increases approximately linearly with the number of measurements in the time series and attains a maximum with random numbers (Rapp et al., 2001a). For data sets of the length used in this study (9000 points), a maximum of approximately 700 would be expected.
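A minimal sketch of the LZC calculation follows. It assumes the Kaspar and Schuster (1987) implementation of the Lempel and Ziv (1976) phrase count, which may differ in detail from the software used here; increments are coded as 1 (`b') when above the median and 0 (`a') otherwise, and the function names are hypothetical. For an equiprobable random binary sequence of length *n*, the count approaches *n*/log_{2}*n*, about 685 for *n*=9000, in line with the approximate maximum of 700 quoted above.

```python
import numpy as np

def lz76_complexity(bits):
    """Number of distinct phrases in a Lempel-Ziv (1976) parsing (Kaspar-Schuster algorithm)."""
    s = list(bits)
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1        # i: start of candidate match, k: match length, l: start of new phrase
    c, k_max = 1, 1          # c: phrase count
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1
            if l + k > n:    # reached the end while still copying an earlier phrase
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:       # no earlier match: a new phrase ends here
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def lzc_of_trajectory(xy):
    """Binarize frame-to-frame distances about their median, then count LZ phrases."""
    steps = np.hypot(*np.diff(np.asarray(xy, dtype=float), axis=0).T)
    bits = (steps > np.median(steps)).astype(int)   # 1 = 'b' (above median), 0 = 'a'
    return lz76_complexity(bits.tolist())

print(lz76_complexity("0001101001000101"))  # 6 phrases: 0|001|10|100|1000|101
```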

#### 4. The Hurst exponent (HE)

The HE measures persistence, the tendency of large displacements to be followed by large displacements (e.g. an increase is followed by an increase) and of small displacements to be followed by small displacements, and anti-persistence, the tendency of large displacements to be followed by small displacements (e.g. an increase is followed by a decrease) and *vice versa* (Hurst, 1951; Hurst et al., 1965; Feder, 1988; Bassingthwaighte et al., 1994). In other words, it describes how deterministic a trajectory is, i.e. the extent to which a future component of the trajectory is specified by components of its past. Theoretically, its possible values range from 0 to 1, with 0.5 as the crossover point between anti-persistence and persistence (since it is estimated from the log–log plot of variability *versus* epoch length, uncertainty in curve fitting can expand this range slightly). An HE of 0.5 would be obtained if the trajectory were indistinguishable from a random walk. Biological applications of the HE have included investigations of heart interbeat interval sequences (DePetrillo et al., 1999; Sherman et al., 2000) and pulmonary dynamics (Zhang and Bruce, 2000).
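The HE can be estimated in several ways; the sketch below uses classical rescaled-range (R/S) analysis, one standard way of obtaining it from a log–log plot of variability *versus* epoch length, and is not necessarily the estimator used in this study. It is assumed to be applied to the series of frame-to-frame displacements; the window sizes and function name are hypothetical. R/S estimates from short windows are known to be biased slightly upward for uncorrelated data.

```python
import numpy as np

def hurst_rs(x, min_window=16):
    """Hurst exponent by rescaled-range (R/S) analysis.

    x : 1-D series (e.g. frame-to-frame displacements).
    Windows of size min_window, 2*min_window, ... up to len(x)//2 are used;
    H is the slope of log(mean R/S) against log(window size).
    """
    x = np.asarray(x, dtype=float)
    sizes, rs_means = [], []
    w = min_window
    while w <= x.size // 2:
        ratios = []
        for start in range(0, x.size - w + 1, w):   # non-overlapping windows
            seg = x[start:start + w]
            profile = np.cumsum(seg - seg.mean())   # cumulative deviation from the mean
            spread = profile.max() - profile.min()  # range R
            sd = seg.std()                          # standard deviation S
            if sd > 0:
                ratios.append(spread / sd)
        if ratios:
            sizes.append(w)
            rs_means.append(np.mean(ratios))
        w *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return float(slope)

rng = np.random.default_rng(0)
noise = rng.normal(size=4096)
print(hurst_rs(noise))             # close to (slightly above) 0.5 for uncorrelated increments
print(hurst_rs(np.cumsum(noise)))  # close to 1 for a strongly persistent series
```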

#### 5. Relative dispersion (R. Disp.)

R. Disp. measures the dependence of signal variance on the duration of the data set. It ranges from 1.0 to 1.5 (Boffetta et al., 1999) and quantifies the change in the uncertainty of a time series' mean value as the observation period increases. In practice, R. Disp. is obtained from the slope of the linear region of a log–log plot of the coefficient of variation of a signal *versus* the length of the data set. Its primary applications have been in the physics of turbulent flow (Pedersen et al., 1996; Willis et al., 1997), but it has also been used in the quantitative characterization of pulmonary perfusion (Klocke et al., 1995; Capderou et al., 2000).
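One common way of turning a log–log plot of the coefficient of variation into a value between 1.0 and 1.5 is the dispersional analysis described by Bassingthwaighte et al. (1994), which is assumed in the sketch below: the series is divided into groups of *m* consecutive points, the relative dispersion (s.d./mean of the group means) is computed for *m*=1, 2, 4, ..., and the dimension is 1 minus the slope of log(relative dispersion) *versus* log *m*. This gives 1.5 for uncorrelated data and approaches 1.0 for strongly persistent data. Whether this matches the exact computation of Rapp et al. (2002) is not stated; the function name and minimum number of groups are hypothetical, and the series must have a non-zero mean (e.g. speed).

```python
import numpy as np

def relative_dispersion_dimension(x, min_groups=32):
    """Dispersional analysis: D = 1 - slope of log(RD) vs log(group size).

    x : 1-D series with a non-zero mean (e.g. instantaneous speed).
    RD(m) = s.d. of the means of consecutive m-point groups / their mean.
    """
    x = np.asarray(x, dtype=float)
    sizes, rd = [], []
    m = 1
    while x.size // m >= min_groups:
        k = x.size // m                                   # number of complete groups
        group_means = x[:k * m].reshape(k, m).mean(axis=1)
        rd.append(group_means.std(ddof=1) / group_means.mean())
        sizes.append(m)
        m *= 2
    slope, _ = np.polyfit(np.log(sizes), np.log(rd), 1)
    return float(1.0 - slope)

rng = np.random.default_rng(0)
print(relative_dispersion_dimension(10.0 + rng.normal(size=4096)))  # close to 1.5 for uncorrelated data
```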

All of the algorithms used to calculate these measures are sensitive to noise in the data, non-stationarities in the underlying dynamics and the temporal duration of the examined epoch. For example, filtered noise can mimic low-dimensional chaotic attractors (Rapp et al., 1993) and, if inappropriately applied, the method of surrogate data (which is used to validate dynamical calculations) can give false-positive indications of non-random structure (Rapp et al., 1994, 2001b). These are central concerns if one is trying to establish the absolute value of one of these measures, such as the true value of the D_{R}. However, this is a less crucial consideration in the present investigation because we do not presume to calculate the value of any measure in an absolute sense. Rather, we are computing approximations of these empirical measures, which nonetheless may be of value in the classification of these signals. The efficacy of these computed values in the classification was assessed quantitatively in the course of the discriminant analysis, as described below.

### Discriminant analysis

A multivariate discrimination was constructed to ask specific questions about the behavioural data, for example, whether locomotor performance can be distinguished between individual fish. For this purpose, each swimming trajectory was represented by its values for the five nonlinear measures described above plus its mean velocity. Since it was possible that no single measure would provide consistent results for such discrimination, all the measures were incorporated into the discriminant analysis and their relative contributions to the classification process were then assessed, as described in the Results section. The discriminant analysis is thus based on these six measures: each trajectory is represented as a vector in a six-dimensional measure space, and all calculations are made on these vectors. All statistical procedures used are explained in mathematical detail by Rapp et al. (2002). *P*_{SAME}(Fish A, Fish B) is defined as the probability that the six-dimensional measurement distributions corresponding to Fish A and Fish B were drawn from the same parent distribution. The estimate of failure in a pairwise discrimination is *P*_{ERROR}(Fish A, Fish B), the theoretically estimated probability that a trajectory from Fish A will be incorrectly classified as a Fish B trajectory and *vice versa*. Note that *P*_{ERROR} is not the same as *P*_{SAME} and can be much larger. For example, a previous report (Rapp et al., 2002) included an example in which *P*_{SAME}=3.2×10^{–13} while *P*_{ERROR}=0.32, which is relatively large, given that the maximum possible error in a pairwise discrimination (the error rate corresponding to random assignment) is *P*_{ERROR}=0.5. The disparity between *P*_{ERROR} and *P*_{SAME} occurs because they address different questions. *P*_{SAME} determines whether the means of two multivariate distributions are significantly different; when only one measure is used, it is identical to the probability calculated in a *t*-test. *P*_{SAME} can therefore be very small even when the two distributions overlap. If the distributions do overlap, which is the case here, there can be considerable error in a between-group classification.
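The two statistics can be illustrated with a short sketch. It assumes that *P*_{SAME} is the *P*-value of a two-sample Hotelling *T*^{2} test, which reduces to the pooled-variance *t*-test when only one measure is used, and that *P*_{ERROR} is Φ(−Δ/2), the classical error rate of a linear discriminant between two equal-covariance normal groups separated by Mahalanobis distance Δ. Both are stand-ins chosen for illustration; the exact formulations are given by Rapp et al. (2002), and the function names are hypothetical.

```python
import numpy as np
from scipy import stats

def pooled_cov(A, B):
    """Pooled within-group covariance of two (n_trajectories, n_measures) arrays."""
    n1, n2 = len(A), len(B)
    S = ((n1 - 1) * np.cov(A, rowvar=False) + (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2)
    return np.atleast_2d(S)

def p_same(A, B):
    """P-value of Hotelling's two-sample T^2 test for equal mean vectors."""
    n1, n2, p = len(A), len(B), A.shape[1]
    diff = A.mean(axis=0) - B.mean(axis=0)
    t2 = (n1 * n2 / (n1 + n2)) * (diff @ np.linalg.solve(pooled_cov(A, B), diff))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2      # T^2 converted to an F statistic
    return float(stats.f.sf(f, p, n1 + n2 - p - 1))

def p_error(A, B):
    """Theoretical pairwise misclassification rate Phi(-Delta/2)."""
    diff = A.mean(axis=0) - B.mean(axis=0)
    delta = np.sqrt(diff @ np.linalg.solve(pooled_cov(A, B), diff))  # Mahalanobis distance
    return float(stats.norm.cdf(-delta / 2))

# Hypothetical data: 15 trajectories x 6 measures for each of two fish
rng = np.random.default_rng(1)
fish_a = rng.normal(0.0, 1.0, size=(15, 6))
fish_b = rng.normal(1.5, 1.0, size=(15, 6))
print(p_same(fish_a, fish_b), p_error(fish_a, fish_b))
```

Because *P*_{SAME} shrinks as the number of trajectories grows while Φ(−Δ/2) depends only on the separation Δ, large samples can yield a tiny *P*_{SAME} alongside a sizeable *P*_{ERROR}.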

Two criteria were used to classify trajectories. The first is based on the minimum Mahalanobis distance (Lachenbruch, 1975). In the context of the six-dimensional measure space, the Mahalanobis distance is a generalized distance, taking into account the variances and covariances of the measures, between the measure vector of the trajectory to be classified and the collection of measure vectors calculated from all of the trajectories obtained from one fish. The test trajectory is deemed to be a member of the group corresponding to the smallest Mahalanobis distance. The second procedure is based on the Bayesian likelihood (McLachlan, 1992): the trajectory's vector is classified into the group corresponding to the maximum Bayesian membership probability. Both classification schemes incorporate a correction for correlations between the measures, ensuring that dynamically similar measures do not bias the classification results. In practice, the two procedures usually give identical results; cases where they differ correspond to classifications with low confidence levels. Finally, as the descriptive analysis did not reveal consistent time-dependent differences among the three successive 5-min trajectories for most measures, this variable was not incorporated into the discriminant analysis.

A distinction should be made between the out-of-sample classifications used in this study and within-sample classifications. In an out-of-sample classification, the trajectory to be classified is removed from the library before the classification is calculated. For this reason, the error rates of out-of-sample classifications are generally greater than, or at best equal to, the error rates obtained with within-sample classifications, in which the trajectory to be classified remains in the library during the calculation. If the number of elements in each group is small (here, there are 15 trajectories for each fish), the disparity between within-sample and out-of-sample classifications can be large. A comparison showing how within-sample classifications can give unrealistically optimistic results is given in Watanabe et al. (2002).
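The two classification rules and the out-of-sample (leave-one-out) procedure can be sketched as follows. This illustration rests on assumptions not stated in the text: each fish's measure vectors are modelled by their own mean and covariance, the Bayesian rule uses Gaussian class densities with equal prior probabilities (so it differs from the minimum Mahalanobis distance rule only by a log-determinant term, one reason the two usually agree), and all function names are hypothetical.

```python
import numpy as np

def fit_group(X):
    """Mean, inverse covariance and log-determinant for one fish's (n, n_measures) array."""
    cov = np.cov(X, rowvar=False)
    return X.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1]

def classify(x, fitted):
    """Return (min-Mahalanobis label, max-likelihood label) for one measure vector."""
    d2 = np.array([(x - mu) @ icov @ (x - mu) for mu, icov, _ in fitted])
    logdet = np.array([ld for _, _, ld in fitted])
    log_lik = -0.5 * d2 - 0.5 * logdet        # Gaussian log-density up to a constant
    return int(np.argmin(d2)), int(np.argmax(log_lik))

def error_rates(groups, out_of_sample=True):
    """Overall error rates (distance rule, likelihood rule) over all trajectories."""
    errors, total = np.zeros(2), 0
    for g, X in enumerate(groups):
        for i in range(len(X)):
            # Out-of-sample: drop the test trajectory from its own group before fitting
            train = [np.delete(Y, i, axis=0) if (h == g and out_of_sample) else Y
                     for h, Y in enumerate(groups)]
            fitted = [fit_group(Y) for Y in train]
            errors += np.array(classify(X[i], fitted)) != g
            total += 1
    return errors / total

# Hypothetical data: 3 fish x 15 trajectories x 6 measures
rng = np.random.default_rng(2)
groups = [rng.normal(5.0 * g, 1.0, size=(15, 6)) for g in range(3)]
print(error_rates(groups), error_rates(groups, out_of_sample=False))
```

Setting `out_of_sample=False` reproduces the within-sample variant, which allows the optimistic bias discussed above to be examined on any data set.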

## Results

### Characterization of swimming trajectories

A representative trajectory (Fig. 1A) is characterized by a predominance of swimming along the circumference of the cylindrical tank (`wall hugging' effect; Warren and Callaghan, 1975; Steele, 1983; Kato et al., 1996), occasionally interrupted by swimming across the centre and by changes in speed (very fast swimming is indicated by a clear separation between successive data points) and/or direction. Periods of fast swimming were observed as swimming in circles along the wall, without significant change in direction, and as occasional fast sprints across the centre of the tank. Additionally, fish did not only swim forward but sometimes propelled themselves backward, which is not obvious from visual inspection of a trajectory. In general, a trajectory gives the impression of moderate irregularity. However, there are also restricted areas marked by path components of higher density, mostly along the wall; visual inspection of the video tapes suggests that they correspond to small turning movements of the fish while facing the wall or to periods of inactivity. Swimming in the centre is different: the fish swims more calmly and slowly without generating a dense accumulation of path components. The instantaneous velocity calculated from the trajectory in Fig. 1A is shown as a function of time in Fig. 1B and reveals high variability within the 5-min recording period. During this epoch, the instantaneous velocity of this trajectory ranged from 0 mm s^{–1} to 460 mm s^{–1}, with a mean value of 49±45 mm s^{–1} (mean ± s.d.). The velocity trace displays several fast bursts, prolonged periods of slower swimming and periods of inactivity. For this trajectory, the characteristic fractal dimension (CFD) is 1.609, indicating that the trajectory is not straight (for a straight trajectory, CFD=1). The Richardson dimension (D_{R}) is 1.002, which would appear to suggest minimal deviation from a straight line and therefore to conflict with the CFD. As previously mentioned, D_{R} additionally incorporates sensitivity to the measurement scale, while the CFD depends upon the duration of observation. The two measures can also diverge in the case of noisy data or data digitised over a small range of values. However, repeated analysis of individual trajectories indicated that the measurements are not compromised by noise or a limited range. The Lempel–Ziv complexity (LZC) of this trajectory is 242. Since the expectation for a purely random trajectory is approximately 700, this result indicates that velocity does not vary randomly. The Hurst exponent (HE) is 0.938, indicating that the trajectory is highly persistent; in other words, its components are determined, or preserved, and thus the trajectory corresponds to uniform or consistent motion. Finally, the relative dispersion (R. Disp.) is 1.188. This value is close to the midrange of this measure and indicates that the mean value of the time series is relatively stable as a function of time.

Swimming trajectories of different fish are dissimilar in appearance (Fig. 2). The distribution of path components in the centre *versus* the periphery of the tank appears to be the most variable feature. For example, the three consecutive 5-min trajectories of Fish 2 in Fig. 2A show more time spent in the centre of the tank than do the three consecutive trajectories of Fish 5 in Fig. 2B, which show relatively little time spent in the centre or traversing it. Instead, Fig. 2B shows more accumulation of path components near the wall, sometimes forming dense patches that are not seen in the trajectories of Fig. 2A. In addition, there is session-to-session variation in an individual fish's trajectories, as seen by comparing the first (Fig. 2B) and fifth (Fig. 2C) recording sessions of Fish 5. In the fifth session, there is a greater tendency to explore the centre than in the first session. This difference is reflected in a significant difference between the mean velocities of the two 15-min sessions (62.6 mm s^{–1} *vs* 58.8 mm s^{–1}; *P*<0.002). There is also greater variability among the three successive trajectories of the fifth session than among those of the first session; indeed, the third 5-min trajectory of Fig. 2C more closely resembles those of the first session (Fig. 2B) than do the two trajectories preceding it, as it is denser at the periphery and exhibits fewer excursions to the centre of the tank.

An initial overall impression of the nonlinear dynamical analysis can be obtained by determining the range and variability of each measure across all five fish. These results are given in Table 1. The coefficient of variation, CV=(s.d./mean)×100%, provides a quantitative characterization of the spread in the observed dynamical measures. A high degree of variation is observed for some measures. The mean velocity has the highest CV (30.2%) and a nearly 10-fold range in values, and the CV of the LZC is also high (25.7%). By contrast, the CVs of the CFD, the D_{R} and the R. Disp. are less than 7%. With the exception of D_{R}, the mean values of the nonlinear measures are all consistent with the properties of a complex dynamical behaviour. The data summarized in Table 1 are displayed separately for each fish in Table 2. The latter results were obtained by averaging the values from all recording sessions (five per fish) and all trajectories (three per recording session). Appreciably different values were obtained for each fish. Nevertheless, given the large s.d.s, the between-fish distributions overlap. Mean velocity values are similar for Fish 2 and Fish 4 and for Fish 3 and Fish 5. This pattern was repeated for two of the nonlinear measures, CFD and D_{R}, but not for the other three. In general, there did not seem to be a consistent relationship between the mean values of different parameters and individual fish. This suggests that the measures (which, except for mean velocity, are empirical) reflect different properties of the swimming trajectories.

Three of the six measures showed time-dependent changes during the 15-min recording periods. Mean velocity decreased by 23% from the first to the last 5-min recordings (from 58.41 mm s^{–1} to 45.01 mm s^{–1}, i.e. to 77% of its initial value), and the mean CFD decreased by 5%, from 1.62 to 1.54. By contrast, the average LZC increased by 15%, from 214 to 248, while the other measures did not change appreciably. Since the data were pooled for multiple exposures of the five fish, a repeated-measures analysis of variance (ANOVA) was used to ask whether there were significant changes in a given measure among the three successive 5-min epochs of a 15-min recording session. The results, shown in Table 3, indicate significant differences (*P*<0.015) between successive 5-min trajectories for mean velocity and CFD. In the case of LZC, the first 5-min trajectory was significantly different from both the second and the third. These time-dependent changes in the six measures relative to each other during a 15-min recording are illustrated in Fig. 3, where the values for each measure are normalized with respect to the corresponding values obtained in the first 5-min trajectory. The repeated-measures ANOVA was also used to ask whether there were differences among the five sessions in which data were collected from each fish, and none were found. Since the changes that occurred within a 15-min recording session were small, the discriminant analysis did not treat successive 5-min trajectories separately.

### Discriminant analysis classifies individual fish

Three questions were addressed in the discriminant analysis: (1) based on these six dynamical measures, can it be concluded that the five fish are different; (2) given a trajectory and its dynamical characterization, can the fish that produced it be correctly determined; and (3) which of the six measures are the most effective in discriminating between different fish? These questions were addressed by performing a discriminant analysis based on the six measures, with each fish providing a total of 15 trajectories. For this analysis, no distinction was made between first, second and third 5-min trajectories. Using these measures, we calculated *P*_{SAME}(Fish A, Fish B), the probability that the six-dimensional measurement distributions corresponding to Fish A and Fish B were drawn from the same parent distribution (see Materials and methods). The results from the 10 possible pairwise discriminations are shown in Table 4. As an example from that table, *P*_{SAME}(1,2)=0.19×10^{–5}; that is, the probability that Fish 1 and Fish 2 trajectories were produced by the same fish is 0.19×10^{–5}. We conclude that Fish 1 and Fish 2 have very different dynamical profiles. The largest value of *P*_{SAME} is *P*_{SAME}(3,4)=0.9×10^{–2}. While Fish 3 and Fish 4 are the most similar, even in this case the probability that these trajectories were obtained from the same fish is less than 1%. Given the very low values of *P*_{SAME}, it might be supposed that a classification of a single trajectory amongst the five fish would be highly accurate. However, this is not necessarily the case.

*P*_{ERROR} is a theoretical prediction of the pairwise classification error, based on the between-group Mahalanobis distance. In the present study, using six measures, the theoretical *P*_{ERROR} for the 10 pairwise calculations was less than 0.07 in eight cases and ranged from 0.003 (Fish 2 *vs* Fish 4) to a maximum of 0.1118 (Fish 3 *vs* Fish 4).

The error rate can also be determined empirically by performing a classification. The results of an out-of-sample classification are shown in Table 5 for both the minimum Mahalanobis distance and the maximum Bayesian likelihood criteria. For example, the entry 13/12 in the Fish 1–Fish 1 box means that 13 of the 15 Fish 1 trajectories were correctly classified as Fish 1 using the minimum distance criterion and 12 were correctly classified using the maximum likelihood criterion. The entry 2/3 in the Fish 1–Fish 5 box means that two Fish 1 trajectories were classified as Fish 5 using minimum distance and three using maximum likelihood as the criterion. Overall, more than 75% of the trajectories from Fish 1, 2 and 5 were correctly classified with both criteria. Also, although a comparison based on mean velocity alone suggests similarities between Fish 1 and 4 and between Fish 3 and 5, the discriminant analysis, which uses six measures, rarely confuses these fish.

The expectation error rate is the error rate that would be observed if the classifications were performed randomly. With five fish, random assignment would on average misclassify four out of five trajectories, giving an expectation error rate of 80%. For these data, the overall error rate is 36% using minimum Mahalanobis distance as the classification criterion and 28% using maximum Bayesian likelihood.

The third question to be addressed with discrimination analysis asked, `of the measures used, which were the most effective in discriminating between different fish?'. This question is not easily answered when there are five groups (five fish) as opposed to only two. In the case of a pairwise,two-group comparison, a measure's coefficient of determination establishes the amount of total between-group variance that can be accounted for by the measure (Flury and Riedwyl,1988). Then, the larger a measure's coefficient of determination,the more effective it is in discriminating between groups. A large coefficient of determination corresponds to a large between-group Mahalanobis distance(specifically, the partial derivative of the coefficient of determination with respect to the Mahalanobis distance is positive). The effectiveness of the six measures in the 10 pairwise between-group discriminations has been assessed empirically. Table 6 gives the rank ordering of the coefficients of determination for each measure for each pairwise discrimination (ordered from the largest to the smallest). For example, when Fish 1 and Fish 2 are compared, the HE is most effective in discriminating between the two groups while the D_{R} is the least effective. When the rank ordering of the 10 pairwise discriminations is compared, none of the measures stands out as being exceptionally effective. However, if the rank order is treated as a score for each pair, the data indicate that the D_{R} and the HE have the lowest cumulative scores,suggesting they are the most effective. Interestingly, the mean values of these two measures (Table 1)are consistent with trajectories that are relatively stable or determined(i.e. mean of HE=0.82 indicates a high degree of persistence and mean D_{R}=1.06 indicates high similarity to a straight line trajectory). 
The lack of a consistent pattern in Table 6 is not surprising, since our results established that the fish trajectories are highly individualistic (Table 4) using a statistic, *P*_{SAME}, that combines all six measures; no single measure need carry the discrimination on its own. Another way to estimate the comparative effectiveness of each dynamical measure is to calculate its average coefficient of determination over the 10 pairwise discriminations. These averages, shown in Table 7, again suggest that D_{R} and the HE are the most effective measures when used alone.
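Both summaries (cumulative rank scores as in Table 6 and averaged coefficients of determination as in Table 7) can be sketched as follows. This is an illustration under assumptions: the per-measure coefficient of determination is taken to be the between-group share of the total sum of squares (equivalently, the squared point-biserial correlation with the group label), the column labels are hypothetical, and the data are synthetic, with one column deliberately made discriminating.

```python
from itertools import combinations

import numpy as np


def two_group_r2(x1: np.ndarray, x2: np.ndarray) -> float:
    """Share of one measure's total sum of squares explained by group membership."""
    x = np.concatenate([x1, x2])
    grand = x.mean()
    ss_between = len(x1) * (x1.mean() - grand) ** 2 + len(x2) * (x2.mean() - grand) ** 2
    ss_total = ((x - grand) ** 2).sum()
    return float(ss_between / ss_total)


def pairwise_r2(X: np.ndarray, y: np.ndarray):
    """R^2 of every measure (columns) for every pair of groups (rows)."""
    pairs = list(combinations(np.unique(y).tolist(), 2))
    table = np.array([[two_group_r2(X[y == a, j], X[y == b, j]) for j in range(X.shape[1])]
                      for a, b in pairs])
    return pairs, table


def rank_scores(table: np.ndarray) -> np.ndarray:
    """Rank measures within each pair (1 = largest R^2), then sum over pairs; lower = more effective."""
    ranks = np.argsort(np.argsort(-table, axis=1), axis=1) + 1
    return ranks.sum(axis=0)


# Usage on synthetic data; the label order is hypothetical.
labels = ["CFD", "D_R", "RD", "HE", "LZC", "V"]
rng = np.random.default_rng(1)
y = np.repeat(np.arange(5), 15)
X = rng.normal(size=(75, 6))
X[:, 3] += 3.0 * y                      # make the synthetic "HE" column separate the fish
pairs, table = pairwise_r2(X, y)
print(len(pairs))                                         # number of pairwise discriminations
print(dict(zip(labels, table.mean(axis=0).round(2))))     # Table 7-style averages
print(dict(zip(labels, rank_scores(table).tolist())))     # Table 6-style cumulative ranks
```

The two summaries can disagree: rank sums ignore how large the margins between measures are, while averages can be dominated by a few pairs with very large $R^2$.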

## Discussion

The results demonstrate that a set of nonlinear measures can be used in a discriminant analysis, or classification system, to distinguish between swimming trajectories of individual fish. That is, any two trajectories generated by different fish are distinguishable with a high confidence level. This discrimination is possible only when the nonlinear measures, along with the linear measure mean velocity, are applied collectively, as no single measure has a high coefficient of determination. The results also suggest that the nonlinear measures used here provide a perspective on a basic behaviour, swimming in a sparse environment, that complements insights obtained with more classical kinematic measures. In general, the values of the different measures suggest that swimming is not purely random but rather complex, with detectable redundancy.

### Interpretation of fish locomotion with nonlinear measures

Although they are empirical, the tools of nonlinear dynamical analysis are increasingly being used in the analysis of biological phenomena (Faure and Korn, 2001; Giesinger, 2001), including continuously recorded behavioural sequences. One rationale is that, since these measures are sensitive to the spatio-temporal structure of a sequence, they might reveal hidden structure in such continuous signals. Indeed, studies have shown that behavioural data that appear random can reveal highly non-random components when analysed with sequence-sensitive nonlinear measures. For example, a number of behaviours have been described as fractal, from spontaneous locomotion (Dicke and Burrough, 1988; Coughlin et al., 1992; Motohashi et al., 1993; Cole, 1995) and foraging (Alados and Weber, 1999) in diverse species to social behaviour in chimpanzees (Alados and Huffman, 2000) and feeding-related activities in goats (Alados et al., 1996). Related tools have also been used successfully to analyse the pattern of transitions between periods of active swimming and inactivity (Faure et al., 2003). As discussed below, this type of analysis might be used effectively to reveal subtle changes in locomotion that classical methods do not detect.

The five nonlinear measures applied in the present study are empirical measures of the complexity of swimming behaviour, and each reduces a trajectory to a single value. With the exception of the Richardson dimension, the values of these nonlinear measures are consistent with the notion that goldfish swimming, even in a relatively sparse environment, is a mixture of random and nonlinear deterministic activity. Their empirical nature may explain why two of the measures, the characteristic fractal dimension and the Richardson dimension, which are expected to reflect similar properties, often diverged.
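The exact definitions used in the study are not reproduced in this section. As a sketch only, assuming a Katz-type formulation for the characteristic fractal dimension ($\log n / (\log n + \log(d/L))$, with $n$ steps, path length $L$ and planar extent $d$) and a simplified, vertex-based divider walk for the Richardson dimension ($D_R = 1 -$ slope of $\log L(s)$ against $\log s$), the two measures can be computed as below. The first uses only the global extent of the path, the second how measured length changes with ruler size, which illustrates how they can diverge on the same trajectory. Function names are hypothetical and the paths are synthetic.

```python
import numpy as np


def katz_fractal_dimension(xy: np.ndarray) -> float:
    """Katz-type dimension: log(n) / (log(n) + log(d / L)) for a planar path of n steps."""
    steps = np.diff(xy, axis=0)
    L = np.hypot(steps[:, 0], steps[:, 1]).sum()     # total path length
    d = np.hypot(*(xy - xy[0]).T).max()             # largest distance from the first point
    n = len(steps)
    return float(np.log10(n) / (np.log10(n) + np.log10(d / L)))


def divider_length(xy: np.ndarray, ruler: float) -> float:
    """Length measured by hopping from the current anchor to the first later vertex at least
    `ruler` away (a vertex-based simplification of Richardson's divider walk)."""
    anchor = xy[0]
    total = 0.0
    for p in xy[1:]:
        hop = float(np.hypot(*(p - anchor)))
        if hop >= ruler:
            total += hop
            anchor = p
    return total + float(np.hypot(*(xy[-1] - anchor)))  # remaining piece to the end point


def richardson_dimension(xy: np.ndarray, rulers) -> float:
    """D_R = 1 - slope of log L(s) versus log s."""
    lengths = [divider_length(xy, s) for s in rulers]
    slope, _ = np.polyfit(np.log(rulers), np.log(lengths), 1)
    return float(1.0 - slope)


# Usage on synthetic paths.
rng = np.random.default_rng(0)
line = np.column_stack([np.linspace(0.0, 100.0, 1001), np.zeros(1001)])
walk = np.cumsum(rng.normal(size=(5000, 2)), axis=0)
rulers = [2.0, 4.0, 8.0, 16.0]
for name, path in [("straight line", line), ("random walk", walk)]:
    print(name, katz_fractal_dimension(path), richardson_dimension(path, rulers))
```

Both measures equal 1 for a straight line, which is why a mean D_{R} close to 1 is read as near-straight swimming.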

The degree of complexity exhibited in locomotor behaviour and other behavioural patterns can depend on the environment (Coughlin et al., 1992; Motohashi et al., 1993; Anderson et al., 1997). Spatial and temporal complexity of foraging trajectories, for example, can be correlated with the pattern of occurrence of food sources (Cole, 1995; Viswanathan et al., 1996). Similarly, some bird species exhibit nonlinearities in vigilance behaviour (Ruxton and Roberts, 1999), and correlations have been drawn between fractal complexity and the ability to cope with the environment, such as in the presence of toxins or stress (Alados et al., 1996; Alados and Weber, 1999; Alados and Huffman, 2000). One can thus speculate that fish exposed to an environment more heterogeneous than that used in the present study would generate swimming trajectories with higher values of CFD and D_{R}, indicative of a more fractal nature. Such an experiment would give more insight into the extent to which the environment influences these nonlinear properties and their underlying components.

The nonlinear measures and discriminant analysis employed here may then be applied to detect subtle changes in behavioural sequences resulting from changes in the environment. Fish behaviour is increasingly important in toxicology, and it has already been shown that fractal dimension could serve as a sensitive measure for quantifying differences in locomotor activity during sublethal exposure to toxic contaminants (Motohashi et al., 1993; Alados and Weber, 1999; Brewer et al., 2001). The application of multiple measures, including a linear one, may well enhance such discriminations. Indeed, preliminary data obtained using this methodology to distinguish swimming trajectories of goldfish exposed to low doses of Malathion, a pesticide and neurotoxin, confirm this expectation (Neumeister et al., 2001).

Exposure to a novel environment, for a continuous period or for several discrete periods, generally results in a gradual decrease of locomotor activity over the course of several days or weeks (Russell, 1973; Warren and Callaghan, 1976; Clark and Ehlinger, 1987). Novelty represents a potentially stressful situation (Russell, 1973; Csányi and Tóth, 1985; Gervai and Csányi, 1986). For example, male guppies initially show high-velocity swimming at the periphery of an open field, and it has been suggested that this activity is related to some degree of fear (Warren and Callaghan, 1976). In the present study, a relatively small but significant decrease over the 15-min period was detected not only in mean velocity but also in CFD and Lempel–Ziv complexity. The decrease in CFD is consistent with reports that fractal dimension decreases in conditions characterized as stressful (Alados et al., 1996; Alados and Weber, 1999; Alados and Huffman, 2000). Nevertheless, this modification with time can be subtle, and it remains to be seen whether further development of the discriminant analysis would benefit from treating successive 5-min trajectories separately.

### Classifying trajectories

Multivariate discriminant analysis, which allowed us to assign swimming trajectories to the fish that generated them, has a long and successful history in the physical and biological sciences (Lachenbruch, 1975; McLachlan, 1992). The combination of discriminant analysis with nonlinear measures is, however, comparatively recent (Rapp et al., 2002; Watanabe et al., 2002). In the present study, a discriminant analysis based on six measures was used to characterize between-group differences and to classify individual trajectories among the groups, with each fish defining its own group. Five fish were used, and five recordings, each consisting of three consecutive 5-min trajectories, were obtained from each fish. Thus, in the language of discriminant analysis, there are five groups, 15 elements in each group and a six-dimensional measure space.

As outlined above, we addressed a sequence of three questions. First, we asked whether we could conclude that the fish are different, by computing *P*_{SAME} for each pair of fish. Although direct visual observation of the fish did not suggest that their swimming behaviour was dramatically different, the calculated values of *P*_{SAME} indicate that trajectories are highly individual, and that each fish has a distinct swimming profile.

We then addressed the problem of classifying individual 5-min trajectories among the five possible groups, by calculating *P*_{ERROR} for each pairwise classification. As expected (see Results), *P*_{ERROR} is larger than *P*_{SAME}, with an average value of 5.7%. However, *P*_{ERROR} is a theoretical estimate of the error in a pairwise classification based on the between-group Mahalanobis distance (Lachenbruch, 1975). An empirical test of this classification was obtained by computing an out-of-sample classification that used the minimum individual-to-group Mahalanobis distance as the classification criterion. It gave an error rate of 36%, compared with the 80% error rate expected under random assignment. The error rate using maximum Bayesian likelihood as the assignment criterion was lower still, 28%.
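The exact expression behind *P*_{ERROR} is not given in this section. As a sketch, assuming the classical result for two multivariate normal groups with a common covariance matrix and equal prior probabilities, the optimal pairwise misclassification probability is $\Phi(-D/2)$, where $D$ is the between-group Mahalanobis distance and $\Phi$ the standard normal distribution function. The helper names below are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def pairwise_error(mahalanobis_d: float) -> float:
    """Phi(-D/2): misclassification probability for two equal-prior normal groups."""
    return _STD_NORMAL.cdf(-mahalanobis_d / 2.0)


def distance_for_error(p_error: float) -> float:
    """Inverse of pairwise_error: the distance D giving error p_error (0 < p_error < 0.5)."""
    return -2.0 * _STD_NORMAL.inv_cdf(p_error)


print(pairwise_error(0.0))         # groups that coincide: a coin toss
print(distance_for_error(0.057))   # distance implied by a 5.7% pairwise error
```

Under this assumed form, a single pairwise error of 5.7% corresponds to a Mahalanobis distance of roughly 3.2, and the error falls rapidly as the distance grows.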

It might seem surprising that, while the average *P*_{ERROR} is 5.7%, the empirically determined classification error rate is considerably greater. However, *P*_{ERROR} is the predicted error rate of a single pairwise classification. Assigning a trajectory to one of five fish is more appropriately compared with a procedure built from a sequence of pairwise classifications, in which several individual pairwise errors accumulate to produce the overall result. When this distinction between pairwise and global error is taken into account, the error rates are seen to be similar.
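A rough back-of-envelope version of this argument follows, under simplifying assumptions that are ours rather than part of the original analysis: every pairwise error equals the average $p = 0.057$, and errors are independent across pairs. A trajectory is assigned correctly by the minimum-distance rule only if it lies closer to its own group than to each of the other four, so it must survive four pairwise comparisons:

$$
P_{\text{global}} \approx 1 - (1 - p)^{4} = 1 - 0.943^{4} \approx 1 - 0.791 \approx 0.21,
\qquad
P_{\text{global}} \le 4p \approx 0.23 .
$$

Both figures are several times the single-pair rate and of the same order as the empirical 28% to 36%. One expected reason for the remaining gap is that the theoretical rate assumes known group parameters, whereas the out-of-sample classification must estimate them.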

The third question concerned the identification of the measure or measures that were most successful in discriminating between fish. This was investigated by calculating the coefficient of determination of each measure in each pairwise classification. No single measure emerged as the most effective. However, it was possible to conclude that the nonlinear measures were more effective than mean velocity, the most effective being the HE and D_{R}, whose values are consistent with the general conclusion that fish swimming in a sparse environment show a relatively low degree of complexity.
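The persistence reading of the HE can be illustrated with a classical rescaled-range (R/S) estimate: a value near 0.5 indicates uncorrelated fluctuations, values above 0.5 indicate persistence (a mean HE of 0.82 falls here), and values below 0.5 indicate anti-persistence. This is a sketch under assumptions: the estimator and the input series used in the study (for example, which coordinate or derived quantity of the trajectory) are not specified in this section, the function name is hypothetical, and the series are synthetic. Plain R/S is also known to be biased upward for short windows.

```python
import numpy as np


def hurst_rs(x, window_sizes) -> float:
    """Rescaled-range (R/S) estimate of the Hurst exponent: slope of log mean(R/S) versus log n."""
    x = np.asarray(x, dtype=float)
    log_n, log_rs = [], []
    for n in window_sizes:
        ratios = []
        for start in range(0, len(x) - n + 1, n):       # non-overlapping windows of length n
            chunk = x[start:start + n]
            profile = np.cumsum(chunk - chunk.mean())   # cumulative deviation from the window mean
            spread = profile.max() - profile.min()      # range R
            sd = chunk.std()                            # standard deviation S
            if sd > 0:
                ratios.append(spread / sd)
        if ratios:
            log_n.append(np.log(n))
            log_rs.append(np.log(np.mean(ratios)))
    slope, _ = np.polyfit(log_n, log_rs, 1)
    return float(slope)


# Usage on synthetic series (not fish data).
rng = np.random.default_rng(0)
noise = rng.normal(size=4096)          # uncorrelated fluctuations
walk = np.cumsum(noise)                # strongly persistent series
sizes = [8, 16, 32, 64, 128, 256]
print(hurst_rs(noise, sizes), hurst_rs(walk, sizes))
```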

It should be recognized that the ability to classify any given trajectory is limited. By analogy, it can be shown that fingerprints are highly individual, yet a positive identification cannot usually be based on a single fingerprint. We should also point out that these conclusions depend on the measures used in this study. Applying additional measures to these data might improve the classification. Thus, the results presented here are, in a sense, a worst-case calculation.

### Individuality

We have found that discriminant analysis of swimming trajectories using nonlinear dynamical measures convincingly establishes that fish locomotion is highly individualistic. Recent ethological and psychological studies have revealed individual differences in many species (Clark and Ehlinger, 1987; Bell, 1991; Mather and Anderson, 1993; Boissy and Bouissou, 1995). As already mentioned, most of these studies concerned higher order behaviours. To our knowledge, idiosyncratic variability in fish swimming has not been the subject of previous investigations, although it has been noted (Kleerekoper et al., 1974). Locomotion serves a range of behaviours in fish, including exploration, foraging and social interactions. Individuality in these behaviours can be expected to benefit the survival of individuals and, therefore, of the population. For example, it may increase access to food sources by enhancing the search efficiency of shoaling fish (Gotceitas and Colgan, 1988; Colgan et al., 1991). Additionally, it can provide a competitive advantage to some individuals, such as the dominant ones within a hierarchy based upon boldness (Budaev, 1997; Wilson et al., 1993). Again, this would contribute to the fitness of the population by guaranteeing the survival of individuals when resources are limited (Magurran, 1986a; Gotceitas and Colgan, 1988). Thus, the variations observed here may have functional relevance.

Three categories of mechanisms have been proposed to underlie behaviours that are unique to one individual as opposed to another, namely a variable environment, social effects and phenotypic variability (reviewed in Magurran, 1986a). In that context, the present study was designed to quantitatively characterise the swimming of a single fish in a sparse and constant environment, minimising any affective contribution to the resulting pattern. The results demonstrate that, with the appropriate analytical tools, this elementary behaviour can be shown to exhibit individuality. We therefore suggest that this property reflects phenotypic differences of either genetic or experiential origin. Such differences are not simply related to environmental conditions, body size or sex, as these factors were controlled in this study. Rather, they may be embedded in underlying intrinsic processes. It has been suggested that a population benefits from varying phenotypes, or differences among individuals, by being better adapted to environmental conditions (Clark and Ehlinger, 1987). In this context, it would be interesting to know how the individuality observed in the present study would change in other conditions, such as a more heterogeneous environment or one requiring social interactions.

## Acknowledgements

The authors thank I. Cantave and N. Gianattassio for their contributions to data acquisition and analysis. We also thank H. Eckholdt for his valuable advice and assistance with both theoretical and practical statistical analysis. P.E.R. and C.J.C. thank Tanya Schmah of the Mathematical Institute, Warwick University, for essential leadership in the implementation of the nonlinear measures and during the development of the discriminant analysis system. This work has been sponsored by the Defense Advanced Research Projects Agency (DARPA), contract No. N66001-00-C-8012.

## References

**Alados, C. L., Escos, J. M. and Emlen, J. M.**

**Alados, C. L. and Weber, D. N.** … (*Pimephales promelas*): a mathematical model.

**Alados, C. L. and Huffman, M. A.**

**Anderson, A. R. A., Young, M. I., Sleeman, B. D., Griffiths, B. S. and Robertson, W. M.**

**Bassingthwaighte, J. B., Liebovitch, L. S. and West, B. J.**

**Bell, W. J.**

**Boffetta, G., Celani, A., Crisanti, A. and Vulpiani, A.**

**Boissy, A. and Bouissou, M.-F.**

**Brewer, S. K., Little, E. E., Delonay, A. J., Beauvais, S. L., Jones, S. B. and Ellersieck, M. R.** … (*Oncorhynchus mykiss*) exposed to cholinesterase-inhibiting chemicals.

**Budaev, S. V.** … (*Poecilia reticulata*): a correlational study of exploratory behavior and social tendency.

**Budaev, S. V. and Zhuikov, A. Y.** … (*Poecilia reticulata*).

**Budaev, S. V., Zworykin, D. D. and Mochek, A. D.** … *Steatocranus casuarius.*

**Budaev, S. V., Zworykin, D. D. and Mochek, A. D.**

**Capderou, A., Aurengo, A., Derenne, J.-P., Similowski, T. and Zelter, M.**

**Clark, A. B. and Ehlinger, T. J.**

**Cole, B. J.** … *Drosophila.*

**Coleman, K. and Wilson, D. S.**

**Colgan, P., Gotceitas, V. and Frame, J.** … (*Lepomis macrochirus*).

**Coughlin, D. J., Strickler, J. R. and Sanderson, B.** … *Amphiprion perideraion*, larvae.

**Csányi, V. and Tóth, P.** … (*Macropodus opercularis* L.).

**DePetrillo, P. B., Speers, D. and Ruttiman, U. E.**

**Dicke, M. and Burrough, P. A.**

**Faure, P. and Korn, H.**

**Faure, P., Neumeister, H., Faber, D. S. and Korn, H.**

**Feder, J.**

**Ferriere, R., Cazelles, B., Cezilly, F. and Desportes, J.-P.**

**Flury, B. and Riedwyl, H.**

**Francis, R. C.**

**Gervai, J. and Csányi, V.** … *m. o. opercularis* and *m. o. concolor.*

**Giesinger, T.**

**Gotceitas, V. and Colgan, P.**

**Gu, F., Song, R., Wang, J., Fan, S. and Ruan, J.**

**Huntingford, F. and Giles, N.** … (*Gasterosteus aculeatus* L.)

**Hurst, H. E.**

**Hurst, H. E., Blank, R. P. and Simaika, Y. M.**

**Kato, S., Tamada, K., Shimada, Y. and Chujo, T.**

**Katz, M. J.** … *Comput. Biol. Med.* … **19**, 291).

**Katz, M. J. and George, E. B.**

**Kleerekoper, H., Matis, J., Gensler, P. and Maynard, P.** … *Carassius auratus.*

**Klocke, R. A., Schunemann, H. J. and Grant, B. J.**

**Kurths, J., Voss, A., Saparin, P., Witt, A., Kleiner, H. J. and Wessel, N.**

**Lachenbruch, P. A.**

**Lempel, A. and Ziv, J.**

**Magurran, A. E.**

**Magurran, A. E.**

**Mandelbrot, B. B.**

**Mandelbrot, B. B.**

**Marghitu, D. B., Kincaid, S. A. and Rumph, P. F.**

**Mather, J. A. and Anderson, R. C.** … (*Octopus rubescens*).

**McLachlan, G. J.**

**Motohashi, Y., Miyazaki, Y. and Takano, T.**

**Murphy, K. E. and Pitcher, T. J.**

**Neumeister, H., Cellucci, C. J., Rapp, P. E., Faure, P., Korn, H. and Faber, D. S.**

**Paulus, M. P., Geyer, M. A., Gold, L. H. and Mandell, A. J.**

**Pedersen, T. S., Michelsen, P. K. and Rasmussen, J. J.**

**Rapp, P. E., Albano, A. M., Schmah, T. I. and Farwell, L. A.**

**Rapp, P. E., Albano, A. M., Zimmerman, I. D. and Jiménez-Montaño, M. A.**

**Rapp, P. E., Cellucci, C. J., Korslund, K. E., Watanabe, T. A. A. and Jiménez-Montaño, M. A.**

**Rapp, P. E., Cellucci, C. J., Watanabe, T. A. A., Albano, A. M. and Schmah, T. I.**

**Rapp, P. E., Watanabe, T. A. A., Faure, P. and Cellucci, C. J.**

**Richardson, L. F.**

**Rinaldo, A., Rodriguez-Iturbe, I., Rigon, R., Ijjasz-Vasquez, E. and Bras, R. L.**

**Rodriguez-Iturbe, I. and Rinaldo, A.**

**Russell, P. A.**

**Ruxton, G. D. and Roberts, G.**

**Sherman, L. D., Callaway, C. W. and Menegazzi, J. J.**

**Steele, C. W.**

**Viswanathan, G. M., Afanasyev, V., Buldyrev, S. V., Murphy, E. J., Prince, P. A. and Stanley, H. E.**

**Warren, E. W. and Callaghan, S.** … *Poecilia reticulata* (Peters).

**Warren, E. W. and Callaghan, S.** … (*Poecilia reticulata*, Peters) to repeated exposure to an open field.

**Watanabe, T. A. A., Cellucci, C. J., Kohegyi, E., Bashore, T. R., Josiassen, R. C., Greenbaum, N. N. and Rapp, P. E.**

**Willis, D. M., Stevens, P. R. and Crothers, S. R.**

**Wilson, D. S., Coleman, K., Clark, A. B. and Biederman, L.** … (*Lepomis gibbosus*): an ecological study of a psychological trait.

**Xu, J., Liu, Z.-R., Liu, R. and Yang, Q.-F.**

**Zhang, X. and Bruce, E. N.**

**Ziv, J. and Lempel, A.**