The behavioural response study (BRS) is an experimental design used by field biologists to determine the function and/or behavioural effects of conspecific, heterospecific or anthropogenic stimuli. When carrying out these studies in marine mammals, it is difficult to make basic observations and achieve sufficient sample sizes because of the high cost and logistical difficulties. Because of these difficulties, other factors such as social context or the physical environment are rarely considered in the analysis. This paper presents results of a BRS carried out in humpback whales to test the response of groups to one recording of conspecific social sounds and to an artificially generated tone stimulus. Experiments were carried out in September/October 2004 and 2008 during the humpback whale southward migration along the east coast of Australia. In total, 13 ‘tone’ experiments, 15 ‘social sound’ experiments (using one recording of social sounds) and three silent controls were carried out over the two field seasons. The results (using a mixed-model statistical analysis) suggested that humpback whales responded differently to the two stimuli, as measured by changes in course travelled and dive behaviour. Although the response to ‘tones’ was consistent, in that groups moved offshore and surfaced more often (suggesting an aversion to the stimulus), the response to ‘social sounds’ was highly variable and dependent upon the composition of the social group. The change in course and dive behaviour in response to ‘tones’ was found to be related to proximity to the source, the received signal level and the signal-to-noise ratio (SNR). This study demonstrates that the behavioural responses of marine mammals to acoustic stimuli are complex. To tease out such multifaceted interactions, the number of replicates and factors measured must be sufficient for multivariate analysis.
Acoustic playback experiments, ‘behavioural response studies’ (BRS) and ‘controlled exposure experiments’ (CEE) seek to identify and describe potential responses of animals to natural or synthetic acoustic stimuli. Playback experiments (in which animal sounds are played back) have been carried out since the late 1950s on a variety of species including insects (Alexander, 1961; Haskell, 1957), birds (Ficken and Ficken, 1970; Roche, 1966; Verner and Milligan, 1971), fish (Fish, 1968), seals (Watkins and Schevill, 1968), reindeer (Espmark, 1971) and cetaceans (Morgan, 1970). However, there are distinctly fewer published playback experiments on marine mammals than on terrestrial animals. A report prepared in 2006 found that in a 5 year period, over 200 playback experiments were carried out on bird species compared with only 46 on marine mammals (Deecke, 2006). Since that report, only a few further playback studies on marine mammal species have been published in the peer-reviewed literature. This difference is probably due to the relative ease with which wild terrestrial animals can be targeted, or experimental terrestrial animals held in captivity (Falls, 1992), compared with most marine mammal species. There are obvious logistical and monetary constraints in marine mammal research, which limit the sample size and therefore the statistical power of experiments (Dunlop et al., 2012), and there is a lack of background data on marine mammal populations with which to test hypotheses and interpret conclusions. More recently, the terms CEE and BRS have been used for experiments that control the acoustic dosage (the level received by the animal), with exposure metrics measured or modelled at the animal, usually to obtain a dose-response relationship. Many playback experiments do not include this level of control.
Humpback whales (Megaptera novaeangliae, Borowski 1781) are very vocal. Males produce a long, complex, stereotyped, repetitive ‘song’ (Payne and McVay, 1971) on the breeding grounds and during migration. The function of song is currently under debate, but one function is likely to be sexual advertisement directed towards females (Smith et al., 2008; Tyack, 1981). Other proposed functions include song operating as a mechanism for male social sorting (Darling et al., 1983), a method of spacing between singers (Frankel et al., 1995) or a threat display during intra-sexual competition (Baker and Herman, 1984). In addition to song, humpback whales produce ‘social sounds’ (Payne, 1978; Tyack, 1981), which include surface-generated percussive sounds (e.g. breaches, pectoral flipper slaps, tail slaps) and social vocalisations. Social sounds are produced by adult males, adult females (Dunlop et al., 2008) and probably calves (Zoidis et al., 2008). These sounds are thought to convey information on the species, sex, location and size of the signaller and on its readiness to mate and to compete with males; they may also aid group cohesion during joining, instigate and facilitate social interactions between groups or cohorts, maintain contact with other group members and facilitate group splitting (Dunlop et al., 2008; Dunlop et al., 2010). However, the function of specific social sounds remains unknown, and the contextual use of many humpback whale social sounds has yet to be determined.
To date, two playback studies have been carried out on humpback whales, both designed to determine the function of conspecific vocalisations: Tyack tested the behavioural response of humpback whales to conspecific song and social sounds (Tyack, 1983), and Mobley and colleagues included exposure to a synthetic sound along with playback of conspecific song and social sounds (Mobley et al., 1988). Both singers and non-singers demonstrated approach and avoidance responses to playback of social sounds (Tyack, 1983; Mobley et al., 1988), suggesting that these sounds have an important communicative function between different social groups. However, as in many such studies, the sample size was unavoidably small and the experiments were ‘sacrificially replicated’ (Deecke, 2006); that is, focal individuals were used repeatedly (exposed to both stimuli), and statistical independence was violated because the analysis did not account for this repeated-measures design. Other BRS on humpback whales have focused on assessing the response to an anthropogenic stimulus; in the marine mammal literature these have usually been referred to as CEE, although BRS is now the most current and preferred term for this type of experiment in wild cetaceans. In many of these studies, only the received level of the sound was considered as the stimulus variable, and other factors relating to the context of the exposed animal (for example the social environment and the noise environment) were not considered.
In this experiment, we used a typical behavioural response design to test the response of humpback whales to one recording of conspecific social sounds compared with a low-frequency tone swept from 2 to 2.1 kHz, which is within the frequency range of humpback vocalisations. Song units are highly variable in frequency and usually lie between 30 Hz (Payne and Payne, 1985) and 4 kHz (Tyack and Clark, 2000), with harmonics extending beyond 24 kHz (Au et al., 2006), and social vocalisations range from less than 30 Hz to 2.5 kHz (Dunlop et al., 2007). We therefore assumed that both stimuli were audible but hypothesised that humpback whale groups would react differently to an artificial signal (‘tones’) than to a more natural signal (a recording of conspecific social sounds taken from the same population of whales). Behavioural responses to a sound stimulus are likely to be context specific, in terms of both the social context of the animal and the context of the source stimulus (signal-to-noise ratio, proximity of the source, novelty of the source). We therefore used a multivariate analysis to test the effect on the behavioural response to each stimulus type of categorical factors, such as social context and the presence of other cohorts such as singing whales (the social environment), and of continuous variables, such as received signal-to-noise ratio, proximity to the source vessel and background noise level (the external environment).
MATERIALS AND METHODS
Initial experiments were carried out in September/October 2004 during the humpback whale southward migration. Further experiments were carried out in 2008 during the same 2 months. The study site was located at Peregian Beach, which is 150 km north of Brisbane, on the east coast of Australia (26°29'S, 153°06'E) and about 800 km south of the potential breeding grounds in the Great Barrier Reef (Smith et al., 2012). Humpback whales passing Peregian Beach are migrating from the breeding grounds further north and show a range of behaviours typical of breeding grounds (for example singing, forming competitive groups, frequent joining and splitting of groups, meandering and variation in swim speed and direction, and nursing and other maternal behaviours associated with numerous newborn calves) while moving in a general southward direction. A fixed array of hydrophones was moored offshore for acoustic data collection. Each hydrophone was suspended from a buoy that transmitted the acoustic data to a base station on shore. Buoys 1-3 were 1.5 km from the beach, parallel to the shoreline and ∼0.7 km apart. Buoys 4 and 5 extended seaward from buoy 2 in a line perpendicular to the shore and were ∼0.5 km apart. Buoys 1-3 were always operational and were usually adequate to fix the positions of vocalising whales [using Ishmael software (Mellinger, 2001)]; for many observations, this was supplemented with information from buoys 4 and 5. Visual survey teams were based at an elevated survey point, Emu Mountain (73 m), adjacent to the coast. From this vantage point, visual observations were possible out to 15 km from the survey point. Further information on the study site set-up and calibration of the acoustic array can be obtained elsewhere (see Noad et al., 2004; Dunlop et al., 2007; Dunlop et al., 2008; Smith et al., 2008; Dunlop et al., 2010). Visual data collection involved two platforms of observation: ‘ad lib sampling’ and ‘focal follow’.
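Ishmael's localisation method is not reproduced here. As a generic illustration of how arrival-time differences across the array can fix a caller's position, the sketch below runs a least-squares grid search. The hydrophone coordinates are a hypothetical approximation of the layout described above (buoys 1-3 about 0.7 km apart along the shore, buoys 4 and 5 at about 0.5 km intervals seaward of buoy 2), and the 1500 m/s sound speed is an assumed nominal value. Note that three collinear hydrophones alone cannot distinguish a source from its mirror image across the line of buoys, which is one reason off-line hydrophones such as buoys 4 and 5 help.

```python
import math

SOUND_SPEED_MPS = 1500.0  # assumed nominal sound speed in sea water

# Hypothetical layout (metres): x along the shore, y seaward, origin at buoy 2.
HYDROPHONES = [(-700.0, 0.0), (0.0, 0.0), (700.0, 0.0), (0.0, 500.0), (0.0, 1000.0)]
REF = 1  # index of the reference hydrophone (buoy 2)


def predicted_tdoas(src, phones=HYDROPHONES, ref=REF, c=SOUND_SPEED_MPS):
    """Arrival-time differences (s) at each hydrophone relative to the reference."""
    d = [math.dist(src, h) for h in phones]
    return [(d[i] - d[ref]) / c for i in range(len(phones)) if i != ref]


def _grid_best(observed, x0, x1, y0, y1, step, phones, ref, c):
    """Grid point whose predicted TDOAs best match the observed ones (least squares)."""
    best, best_cost = None, float("inf")
    nx = int(round((x1 - x0) / step))
    ny = int(round((y1 - y0) / step))
    for i in range(nx + 1):
        x = x0 + i * step
        for j in range(ny + 1):
            y = y0 + j * step
            pred = predicted_tdoas((x, y), phones, ref, c)
            cost = sum((p - o) ** 2 for p, o in zip(pred, observed))
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best


def locate(observed, phones=HYDROPHONES, ref=REF, c=SOUND_SPEED_MPS,
           extent=(-5000.0, 5000.0, -1400.0, 5000.0)):
    """Coarse 50 m search over the area, then a 5 m search around the best coarse cell."""
    x0, x1, y0, y1 = extent
    cx, cy = _grid_best(observed, x0, x1, y0, y1, 50.0, phones, ref, c)
    return _grid_best(observed, cx - 200, cx + 200, cy - 200, cy + 200, 5.0, phones, ref, c)
```

For noise-free input, such as `locate(predicted_tdoas((612.0, 1337.0)))`, the estimate should fall close to the true position, limited by the 5 m fine-grid spacing; real measurements would also need timing-error weighting and uncertainty estimates.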
Visual platform of observation
Land-based behavioural observations were collected daily (07:00 h to 17:00 h, weather permitting). A theodolite (Leica TM 1100) connected to a notebook computer running Cyclopes software (E.K.) was used to track and observe passing whales. In this study, the sample unit was a group of whales, defined as those whales surfacing within 4 body lengths (about 50 m) of each other. When whales are travelling, successive surfacings of a group are usually several hundred metres apart, a distance much larger than the spacing of whales within a group and far less than the spacing between groups. Groups usually comprise one to three individuals. Cyclopes records the positions of whales from the theodolite elevation and azimuth in real time. Fixes were annotated with observed behaviours and group compositions out to a 10 km limit. Two observers with binoculars were responsible for keeping track of all visible groups in the area as ad lib observations (including the target group during an experiment) and for directing the theodolite operator to groups to be fixed. Data from the visual observers included bearing and distance from Emu Mountain, group composition, group behaviours (blow, breach, pectoral flipper slap, tail slap, splitting apart of a group, joining together of two groups, no-blow rise or surfacing, peduncle slap, inverted tail slap, inverted pectoral flipper slap and head lunge being the majority observed) and direction of travel. These were recorded by the Cyclopes operator as ‘additional observations’, made using binocular bearing and reticule readings.
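Cyclopes' internal calculations are not described here, so the following is only a generic sketch of how a theodolite fix (vertical angle and azimuth) can be converted into a position. It assumes a station height of 73 m (the Emu Mountain figure above, ignoring instrument height and tide), a vertical angle measured from the zenith (90° horizontal), and a spherical Earth without atmospheric refraction. The flat-sea formula is included for comparison: at the 10 km tracking limit, Earth curvature lowers the sea surface by roughly 8 m, which is not negligible relative to the station height.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical Earth assumed)


def flat_distance(height_m, depression_deg):
    """Horizontal distance to the fix assuming a flat sea surface."""
    return height_m / math.tan(math.radians(depression_deg))


def spherical_distance(height_m, depression_deg, radius_m=EARTH_RADIUS_M):
    """Surface distance to where the line of sight meets a spherical sea surface.

    Refraction, tide and instrument height are ignored.
    """
    alpha = math.radians(depression_deg)
    a = (radius_m + height_m) * math.sin(alpha)
    c = 2.0 * radius_m * height_m + height_m ** 2
    disc = a * a - c
    if disc < 0:
        raise ValueError("line of sight passes above the horizon")
    # Nearer root of the ray/sphere intersection, in a cancellation-free form.
    t = c / (a + math.sqrt(disc))
    gamma = math.atan2(t * math.cos(alpha), radius_m + height_m - t * math.sin(alpha))
    return radius_m * gamma


def fix_position(height_m, zenith_deg, azimuth_deg, spherical=True):
    """(east, north) of a fix in metres relative to the theodolite.

    zenith_deg: vertical angle from the zenith (90 = horizontal).
    azimuth_deg: bearing clockwise from north.
    """
    depression = zenith_deg - 90.0
    if depression <= 0:
        raise ValueError("fix must be below the horizontal")
    dist = (spherical_distance if spherical else flat_distance)(height_m, depression)
    az = math.radians(azimuth_deg)
    return dist * math.sin(az), dist * math.cos(az)
```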
Focal follow platform of observation
The focal follow method of observation was introduced in 2008. A dedicated focal follow team was situated slightly apart from the ad lib sampling team to avoid confusion, but within earshot to allow some communication between teams. This team consisted of a theodolite operator and a Cyclopes data recorder. Once the base station had targeted a suitable group for an experiment, the focal follow team concentrated only on this group for the duration of the experiment, recording all visible behaviours.
Digital recording tag platform of observation
A Dtag (a non-invasive digital acoustic recording tag with depth and orientation sensors; acoustic sampling rate 64 kHz, sensor sampling rate 5 Hz) (Johnson and Tyack, 2003) was deployed on a mother within a female-calf group during one of the ‘social sound’ experiments. The tag was attached to the back of the whale as she surfaced in front of a specially equipped boat, using a long pole attached to the bow. The tag was attached by suction cups and pre-programmed to detach after 4 h. It contained a hydrophone and three-axis accelerometers and magnetometers to measure pitch, roll and heading (Johnson and Tyack, 2003). An estimated 3D dead-reckoned track, including the dive profile, could be derived from the Dtag data and an estimate of travel speed (Miller et al., 2009). The Dtag hydrophone provided a high-quality recording of the sound field at the whale.
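The procedure of Miller et al. (2009) is not reproduced here. As a minimal sketch of the dead-reckoning idea, the code assumes a constant swim speed supplied from elsewhere, pitch (degrees, positive nose-up) and heading (degrees clockwise from north) already derived from the accelerometer and magnetometer data, and the 5 Hz sensor rate given above. Each step advances the track by the horizontal component of speed along the current heading, while depth is taken directly from the pressure sensor.

```python
import math


def dead_reckon(pitch_deg, heading_deg, depth_m, speed_mps, fs_hz=5.0):
    """Dead-reckoned track from tag orientation and depth.

    Assumes a constant swim speed. Returns (east, north, depth) per sample, in metres.
    """
    n = len(pitch_deg)
    if not (len(heading_deg) == n == len(depth_m)):
        raise ValueError("sensor series must have equal length")
    dt = 1.0 / fs_hz
    east = north = 0.0
    track = [(east, north, depth_m[0])]
    for i in range(1, n):
        pitch = math.radians(pitch_deg[i - 1])
        heading = math.radians(heading_deg[i - 1])
        horizontal = speed_mps * math.cos(pitch) * dt  # horizontal part of this step
        east += horizontal * math.sin(heading)
        north += horizontal * math.cos(heading)
        track.append((east, north, depth_m[i]))  # depth from the pressure sensor
    return track
```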
A J11 acoustic projector was used as the underwater loudspeaker. It was suspended 10 m below a small boat, which was allowed to drift. A hydrophone was suspended at the same depth from the other end of the boat to measure the J11 source level (by correcting the received level for propagation loss between the J11 and the hydrophone, assuming spherical spreading). The social sound stimulus consisted of a 20 min sequence of social vocalisations compiled from a variety of social sounds recorded using a Dtag deployed previously on a female-calf-escort group passing through the site. The escort was probably a male, as groups with two or more adults and a calf generally consist of an adult female, a calf and one or more male escorts (Baker and Herman, 1984; Tyack and Whitehead, 1983). A collection of different social sounds was spliced together to make a recording of 204 s duration, which was repeated to make up the 20 min stimulus. We decided to use only one recording of social sounds as the goal of the study was to look for differences in response to a recording of ‘natural’ conspecific sounds (which, based on previous work, we assumed would produce a reaction) compared with an unnatural ‘tone’ sound [following a previous design (Mobley et al., 1988)]. We assumed that using different recordings of social sounds would produce highly variable reactions (dependent on the sound types as well as the social context of the recorded group) and, therefore, that using only one recording would reduce the potential variability in the reaction. However, using only one social sound stimulus does limit the conclusions that can be drawn about the observed response to social sounds, as well as about the function of these sounds. To address external validity and draw more general conclusions about differences in response to tones versus social sounds, it would have been better to repeat the study using a different set of social sounds.
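The source-level correction can be written as two short functions. This is a sketch under stated assumptions: pressure samples are taken to be already calibrated in μPa, and the projector-hydrophone separation used in the example (5 m) is hypothetical, since the text does not give it. Spherical spreading corresponds to a loss of 20 log10(r) dB relative to 1 m.

```python
import math


def rms_level_db(pressure_upa):
    """r.m.s. sound pressure level in dB re. 1 uPa from calibrated samples in uPa."""
    rms = math.sqrt(sum(p * p for p in pressure_upa) / len(pressure_upa))
    return 20.0 * math.log10(rms)


def source_level_spherical(received_db, range_m):
    """Back-propagate a received level to 1 m assuming spherical spreading."""
    return received_db + 20.0 * math.log10(range_m / 1.0)
```

For instance, a level of 140 dB re. 1 μPa received 5 m from the projector gives `source_level_spherical(140.0, 5.0)`, about 154 dB re. 1 μPa at 1 m.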
The tone stimulus consisted of a sequence of tones swept in frequency from 2 to 2.1 kHz over a period of 1.5 s, repeated every 8 s for 20 min. Source levels for both stimuli ranged from 148 to 153 dB re. 1 μPa at 1 m root mean square (r.m.s.), similar to source levels of humpback whale social vocalisations (R.A.D., unpublished data). Stimuli were recorded on a CD and played through an amplifier into the J11. A silent control, consisting of a 20 min recording with no signal input, was also recorded on CD.
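The tone stimulus can be synthesised as a frequency sweep. The sketch below assumes a 44.1 kHz sample rate (the audio CD standard), a linear rather than logarithmic sweep, a normalised amplitude of ±1 (the level in the water depends on amplifier and J11 gain, which are not modelled), and no onset or offset ramps, as none are described in the text. One 8 s cycle is generated; the full 20 min stimulus is that cycle repeated 150 times.

```python
import math


def sweep_cycle(fs_hz=44100, f0_hz=2000.0, f1_hz=2100.0, sweep_s=1.5, cycle_s=8.0):
    """One cycle: a linear up-sweep from f0 to f1 followed by silence."""
    n_sweep = int(round(sweep_s * fs_hz))
    n_cycle = int(round(cycle_s * fs_hz))
    rate = (f1_hz - f0_hz) / sweep_s  # sweep rate, Hz per second
    cycle = []
    for i in range(n_sweep):
        t = i / fs_hz
        # Phase is the integral of the instantaneous frequency f0 + rate * t.
        cycle.append(math.sin(2.0 * math.pi * (f0_hz * t + 0.5 * rate * t * t)))
    cycle.extend([0.0] * (n_cycle - n_sweep))
    return cycle


def instantaneous_frequency(t_s, f0_hz=2000.0, f1_hz=2100.0, sweep_s=1.5):
    """Frequency (Hz) of the linear sweep t_s seconds after its start."""
    return f0_hz + (f1_hz - f0_hz) * t_s / sweep_s


def repeats_needed(total_s=20 * 60, cycle_s=8.0):
    """Number of whole cycles in the exposure period."""
    return int(total_s // cycle_s)
```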
This experiment followed a typical BDA (before, during, after) design. The B period was a pre-exposure (stimulus off) control, the D period was the exposure period in which the stimulus was turned on, and the A period was a post-exposure (stimulus off) period. Each period lasted 20 min. Exposure treatments were one of three types: a silent control, a recording of conspecific social vocalisations or the artificially generated 2 kHz tone. To increase the sample size of the control treatment, groups that migrated within 2 km of a moored vessel (the research boat or a similar-sized vessel) were also included in this ‘control’ category. Therefore, not all control treatments involved a J11 being deployed in the water playing ‘silence’.
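Assigning each observation to a BDA period is a simple time-window lookup. This minimal sketch assumes times in minutes relative to stimulus onset and treats each 20 min period as half-open (including its start, excluding its end); observations are tuples whose first element is that time, and the function names are illustrative only.

```python
def bda_period(t_min, period_min=20.0):
    """Label a time (minutes relative to stimulus onset) as 'B', 'D' or 'A'; None outside."""
    if -period_min <= t_min < 0:
        return "B"
    if 0 <= t_min < period_min:
        return "D"
    if period_min <= t_min < 2 * period_min:
        return "A"
    return None


def split_by_period(observations, period_min=20.0):
    """Group observations (tuples starting with time in minutes) by BDA period."""
    out = {"B": [], "D": [], "A": []}
    for obs in observations:
        label = bda_period(obs[0], period_min)
        if label is not None:
            out[label].append(obs)
    return out
```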
Baseline groups migrating through the study area, selected randomly, were focally followed for at least 1 h during times when no experiments were underway. We also selected a number of ad lib sampled, baseline groups for analysis based on the following selection criteria: (1) they had to be visually tracked within the study site for at least 1 h and (2) they did not move within 2 km of a stationary vessel during the hour specified for analysis. This comprised the ‘baseline’ dataset.
Movement response variables
Movement response variables (measures of how the group travelled through the study site) such as speed, course and distance travelled between each surfacing were calculated by examining the difference in position between each theodolite fix. The difference in course between successive fixes was used as a measure of how erratic the group course was. The total distance travelled within each period (taking into account all changes in course) was calculated by summing all distances between consecutive surfacing events for that period.
Only one theodolite fix was chosen (usually the first fix on the group within a surface interval after a deep dive) to represent each group surfacing. Generally, animals within each group were less than 50 m apart; therefore, this tracking method provided the best representation of group movement through the study area. If surfacing events were missed within experimental periods (in the ad lib sampling dataset), the assumption was that groups travelled in a straight line and at constant speed between the two consecutive surfacing events. The mean of all measurements of course travelled (magnetic bearing), variation in course travelled, and speed was calculated for each 20 min experimental period. The ‘course made good’ for each period was estimated using two fixes – the one at the start of the experimental (BDA) period and the one at the end – and calculated as the bearing of the second fix relative to the first.
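The movement variables can be computed from successive fixes once positions are in local metric coordinates (for example, metres east and north of the theodolite). In this sketch, bearings are relative to grid north, whereas the study used magnetic bearings (a constant declination offset would convert between them); course changes are wrapped to [-180°, 180°); and consecutive fixes are joined by straight lines at constant speed, consistent with the assumption above for missed surfacings. The sketch does not average raw bearings, since averaging angles needs care near 0°/360°.

```python
import math


def bearing_deg(e0, n0, e1, n1):
    """Bearing (degrees clockwise from grid north) from point 0 to point 1."""
    return math.degrees(math.atan2(e1 - e0, n1 - n0)) % 360.0


def period_movement(fixes):
    """Movement summaries for one experimental period.

    fixes: time-ordered (t_s, east_m, north_m), one fix per group surfacing.
    """
    legs = []  # (distance_m, bearing_deg, speed_mps) between consecutive fixes
    for (t0, e0, n0), (t1, e1, n1) in zip(fixes, fixes[1:]):
        dist = math.hypot(e1 - e0, n1 - n0)
        legs.append((dist, bearing_deg(e0, n0, e1, n1), dist / (t1 - t0)))
    # Signed change in course between successive legs, wrapped to [-180, 180).
    changes = [((b1 - b0 + 180.0) % 360.0) - 180.0
               for (_, b0, _), (_, b1, _) in zip(legs, legs[1:])]
    _, e_first, n_first = fixes[0]
    _, e_last, n_last = fixes[-1]
    return {
        "total_distance_m": sum(d for d, _, _ in legs),
        "mean_speed_mps": sum(s for _, _, s in legs) / len(legs),
        "course_changes_deg": changes,
        "course_made_good_deg": bearing_deg(e_first, n_first, e_last, n_last),
    }
```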
Behavioural response variables
The behavioural response variables consisted of measures of diving and surface behaviour. The dive profile incorporates ‘surfacing’ dives (short, shallow dives that occur during respiration bouts) and ‘long dives’, in which the group disappears for a longer period of time. A long dive is defined as the time from when the last group member disappears to when the first group member reappears, and the ‘surface interval’ is defined as the time spent at the surface between long dives, which incorporates all surfacing dives. Discriminating between long and surfacing dives can be problematic. Typical humpback whale dive patterns consist of a number of short respiration dives followed by a longer dive (usually lasting 3-5 min). Focal follow data were used to differentiate between shallow respiration dives and long dives, as the majority of surface behaviours of each target group should have been recorded and the timing of these events should be relatively accurate. The time between successive sightings (dive time) was measured within each group, and the dive times were log-transformed (because of non-normality) and plotted as a histogram. This gave a bimodal histogram, with one peak corresponding to respiration dive times and the other to long dive times. We used a probability density function overlaid on the histogram as a guide to determine the two peaks in the dive time dataset, as well as an appropriate cut-off time between respiration and long dives (estimated as the trough between the two peaks). This gave a separation value of 60 s: dive times of less than 60 s were designated short respiration dives and dive times longer than 60 s were designated long dives. The peak respiration dive time was 10 s (range 2-58 s) and the peak long dive time was 3 min (range 60 s to 18 min).
Inspection of the final long dive dataset showed that 18 min was an outlier (it may have been two long dives); therefore, we omitted this point, leaving the range of long dive times to be between 60 s and 11 min. The number of long dives and surface intervals (which included all respiration dives) and the mean durations of these dive profile behaviours were calculated for each experimental period.
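The cut-off procedure can be sketched with a Gaussian kernel density estimate of log dive times, taking the lowest point between the two highest modes. The Gaussian kernel and Silverman's rule-of-thumb bandwidth are assumptions, since the text does not state which density estimator was used; the 60 s default in `classify_dives` is the value reported above, and treating a dive of exactly 60 s as a long dive is also an assumption. Outlier screening (such as the 18 min dive) is left to the analyst.

```python
import math
import statistics


def dive_cutoff_s(dive_times_s, n_grid=400):
    """Trough between the two main modes of a Gaussian KDE of log dive times (s)."""
    logs = [math.log(t) for t in dive_times_s]
    # Silverman's rule-of-thumb bandwidth (an assumed choice, not from the study).
    bw = 1.06 * statistics.stdev(logs) * len(logs) ** -0.2
    lo, hi = min(logs) - 1.0, max(logs) + 1.0
    grid = [lo + (hi - lo) * i / (n_grid - 1) for i in range(n_grid)]
    norm = 1.0 / (len(logs) * bw * math.sqrt(2.0 * math.pi))
    dens = [norm * sum(math.exp(-0.5 * ((g - x) / bw) ** 2) for x in logs) for g in grid]
    peaks = [i for i in range(1, n_grid - 1)
             if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]]
    if len(peaks) < 2:
        raise ValueError("log dive-time density is not bimodal")
    # Keep the two highest peaks, ordered by position, and find the lowest point between them.
    left, right = sorted(sorted(peaks, key=lambda i: dens[i])[-2:])
    trough = min(range(left, right + 1), key=lambda i: dens[i])
    return math.exp(grid[trough])


def classify_dives(dive_times_s, cutoff_s=60.0):
    """'respiration' below the cut-off, 'long' at or above it."""
    return ["long" if t >= cutoff_s else "respiration" for t in dive_times_s]


def count_by_type(labels):
    """Number of each dive type, e.g. within one experimental period."""
    return {kind: labels.count(kind) for kind in ("respiration", "long")}
```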
Surface intervals were classified as either ‘blow only’ (no animal within the group was surface active during the surface interval) or ‘surface active’ (one or more animals within the group were surface active during the surface interval; in other words, breaching, pectoral slapping or tail slapping behaviour was observed). The number of each type of surface interval was counted for each experimental period.
Whale groups were divided into five categories based on the typical composition of groups observed during the southward migration (Table 1): lone animals (singletons, which may or may not have been singing during the experiment), female-calf groups, adult pairs, female-calf-escort groups (the escort may or may not have been singing during the experiment) and groups with three or more adults (female-calf-multiple escorts or groups of three or four adults). However, because of the small sample size of each cohort, groups were reclassified as female and calf groups (containing a female and no adult male), lone animals (many of which were singers and therefore males) and multiple (all other cohorts). In a group containing a mature female, the presence of one or more escorts is likely to have a significant effect on group behaviour compared with a group containing only a mature female, with or without a calf. Female associations are thought to be rare (Brown and Corkeron, 1995; Clapham, 2000), and groups with two or more adults and a calf generally consist of an adult female, a calf and one or more male escorts (Baker and Herman, 1984; Tyack and Whitehead, 1983). The group composition of all other groups in the study area and their distance from the target group were noted throughout the experiment. For this analysis, only the presence of the closest group (the ‘nearest neighbour’), the mean distance of the nearest neighbour from the target group during each experimental period, and the mean distance of the nearest singer from the target group during each experimental period (estimated using acoustic positions overlaid on visual positions) were considered as social factors. We also noted whether a group joined the target group, or the target group split into two smaller groups, within each experimental period.
Wind speed was measured using a weather station placed on the roof of the base station. The mean wind speed was calculated for each experimental period.
In the study area, sounds from singing whales were frequent components of the underwater noise environment, and small recreational vessels were also often audible as they traversed the area. During this experiment, the majority of the samples had little interference from vessel noise, and therefore the background noise level (without singers) was mainly typical ambient noise (Cato, 1997), mostly due to sea surface motion (wind-dependent noise) and snapping shrimps. Traffic noise, the noise from distant shipping, is significant further offshore, but the shallow-water approaches to the site would have limited this contribution. In many cases, noise measurements could be made without a significant contribution from singing humpback whales. When song contributed significantly to the noise, the noise in the absence of song was estimated from the periods between identifiable song units. To do this, a recording was displayed as a waveform (Adobe Audition) and song units were deleted, leaving only the time periods between song units, from which a 20 s noise sample was obtained. Song units were usually separated by 1-3 s, and the song fades out as the singer comes to the surface to breathe. The resulting sample may have contained undetectable song units, but these would not have made a significant contribution to the estimate of wind-dependent background noise levels. During exposure, the noise was estimated in the same way by deleting the periods when the stimuli were present. A 20 s noise sample was taken from each hydrophone in the array every 10 min, starting 10 min before the start of the experiment and ending 10 min after its finish. The noise in each 20 s sample was measured in one-third octave band levels in the range 40 Hz to 2 kHz, and the system calibration was applied to obtain levels in dB re. 1 μPa.
One-third octave bands have bandwidths proportional to their centre frequencies, approximating the way the bandwidths of auditory filters in the mammalian ear increase with frequency, and in humpback vocalisations most of the sound energy of the fundamental frequency is contained within a one-third octave band, making this an appropriate filter. The total background noise level was calculated by summing the mean square pressures of the one-third octave bands across the frequency band of interest and converting this sum to a total broadband noise level (dB re. 1 μPa). Mean broadband noise levels for each experimental period were then calculated from all samples taken from all hydrophones.
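Summing band levels into a broadband level means adding mean-square pressures, not decibels. The sketch below assumes base-10 (exact) one-third octave centre frequencies, 10^(n/10) Hz; base-2 definitions (band edges at fc × 2^(±1/6)) give slightly different edges. Averaging across samples and hydrophones is not shown.

```python
import math


def third_octave_centres(f_lo_hz, f_hi_hz):
    """Exact base-10 one-third octave centre frequencies, 10**(n/10), spanning the range."""
    n_lo = round(10 * math.log10(f_lo_hz))
    n_hi = round(10 * math.log10(f_hi_hz))
    return [10 ** (n / 10) for n in range(n_lo, n_hi + 1)]


def band_edges(fc_hz):
    """Lower and upper edges of a base-10 one-third octave band."""
    return fc_hz * 10 ** (-1 / 20), fc_hz * 10 ** (1 / 20)


def broadband_level_db(band_levels_db):
    """Add band mean-square pressures, then convert back to dB re. 1 uPa."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in band_levels_db))
```

For example, two bands at 100 dB each combine to about 103 dB, and the 40 Hz to 2 kHz range contains 18 one-third octave bands.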
Background noise levels (excluding contributions from singers) at the array were assumed to be similar to those at the location of each humpback whale group, as this noise was predominantly wind dependent and wind speed was generally uniform throughout the study site (snapping shrimp noise did not contribute significantly in the frequency band of interest). This was not the case for noise from nearby singers, which depended on the distance of the singer from the receiver. Therefore, analysed groups were also categorised according to the social environment: ‘none’ (no audible singers present), ‘close singer proximity’ (the nearest singer was within 2 km of the group, or became part of the group, such as a mother and calf being joined by a singing escort), ‘medium singer proximity’ (the nearest singer was between 2 and 5 km from the target group) and ‘far singer proximity’ (the nearest singer was more than 5 km from the group).
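The singer-proximity categories amount to a small classification rule. The text does not specify how distances of exactly 2 or 5 km were treated; here both fall in ‘medium’, which is an assumption. A singer that joined the group is classed as ‘close’ regardless of the recorded distance.

```python
def singer_proximity(nearest_singer_km, singer_joined_group=False):
    """Social-environment category for a target group.

    nearest_singer_km: distance to the nearest audible singer, or None if none was audible.
    """
    if singer_joined_group:
        return "close"
    if nearest_singer_km is None:
        return "none"
    if nearest_singer_km < 2.0:
        return "close"
    if nearest_singer_km <= 5.0:  # boundary handling at 2 and 5 km is assumed
        return "medium"
    return "far"
```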
Received levels and signal-to-noise ratios
All received levels of each stimulus were measured in one-third octave bands from recordings made on the fixed array using SpectraPLUS 5.0 (Sound Technology Inc., Tampa, FL, USA). Three tones were selected for measurement in the first 10 min and three tones in the second 10 min of exposure. The sound pressure level at the array of each selected tone was measured (dB re. 1 μPa) in the one-third octave band centred at 2000 Hz. For the social sounds, one of the highest-level sound types was chosen for measurement, and six samples were measured in the one-third octave bands with centre frequencies from 200 to 400 Hz, which contained most of the energy.
Transmission loss was measured using the noise generated by a noisy boat as the source. The boat conducted runs along lines radiating from the array, from distances of 100 m out to about 10 km from the array. Regression lines were fitted to the data as a function of the logarithm of distance, giving the relative loss over the measured distances in the form $TL = a + b\log_{10}(x)$, where $b$ is the slope of the regression line, $x$ is distance and $a$ is a constant. The received level at the group could then be determined from the received level at the array by $RL_g = RL_a + b\log_{10}(x_a/x_g)$, where $RL_g$ and $RL_a$ are the received levels at the group and the array, respectively, and $x_g$ and $x_a$ are the distances of the group and the array from the playback source, respectively. For most frequencies, $b$ varied with distance but could be well approximated by two values, one applying to distances less than and the other to distances greater than a cross-over value.
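The transmission-loss fit and the received-level correction can be sketched as follows. Distances are ranges in metres and logarithms are base 10 (assumed, as is usual for decibel quantities). `two_slope_tl` represents separate near and far regression lines on either side of the cross-over distance; with a single line, `rl_at_group` reduces to $RL_g = RL_a + b\log_{10}(x_a/x_g)$. Any coefficients or cross-over distance used with these functions are hypothetical, as the fitted values are not given here.

```python
import math


def fit_tl(distances_m, tl_db):
    """Least-squares fit of TL = a + b*log10(x); returns (a, b)."""
    u = [math.log10(x) for x in distances_m]
    u_bar = sum(u) / len(u)
    y_bar = sum(tl_db) / len(tl_db)
    sxy = sum((ui - u_bar) * (yi - y_bar) for ui, yi in zip(u, tl_db))
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    b = sxy / sxx
    return y_bar - b * u_bar, b


def two_slope_tl(near_fit, far_fit, crossover_m):
    """TL(x) using one (a, b) pair below the cross-over distance and another beyond it."""
    def tl(x_m):
        a, b = near_fit if x_m < crossover_m else far_fit
        return a + b * math.log10(x_m)
    return tl


def rl_at_group(rl_array_db, x_array_m, x_group_m, tl):
    """Received level at the group from that at the array; x are ranges from the source."""
    return rl_array_db + tl(x_array_m) - tl(x_group_m)
```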
The received level of each stimulus at whale groups varied over a range of 40 dB while the ambient noise varied over a range of 30 dB. It is possible that some of the lower received levels were masked by the ambient noise background and thus not heard by the whales. Masked thresholds of audibility have not been measured for humpback whales or any other species of baleen whale. However, they have been measured for a range of terrestrial and marine mammal taxa and the results are broadly consistent. While the extent to which this information can be applied to humpback whales is limited, it gives an indication of where the signals may be below the masking threshold and thus inaudible.
One measure of masked threshold is the critical ratio, which is the difference (in decibels) between the level of a tone at the threshold of audibility and the spectrum level of white masking noise at the same frequency (Richardson et al., 1995). This situation is analogous to the masking of playback tones by ambient noise. Masking of a tone is considered to be caused by a limited bandwidth of the noise, typically less than 20% of the tone frequency at 2 kHz, and over this band the ambient noise is a reasonable approximation to white noise. Critical ratio measurements for various species are summarised elsewhere (Richardson et al., 1995; Southall et al., 2007). The value at 2 kHz ranges from 19 to 26 dB across several species of pinnipeds and is 19 dB for the beluga, 20 dB for humans and 25 dB for cats. These results provide the best information we have to infer where the playback of tones might be masked by the ambient noise. Our measurements of SNR for the tones used the noise level in the one-third octave band centred at 2000 Hz, i.e. over the band 1782–2245 Hz, a bandwidth of 463 Hz. Noise levels in this band will be $10\log_{10}(463) = 26.7$ dB higher than the spectrum level, so SNRs calculated using the one-third octave band noise level will be 26.7 dB lower than those calculated using the noise spectrum level. The range of critical ratios of 19-26 dB is thus equivalent to one-third octave band SNRs of -7.7 to -0.7 dB, an average of -4.2 dB. The analysis of the tones experiment was therefore conducted using a subset of the data limited to SNR≥-4 dB, to exclude data that might have been inaudible, as well as using the full dataset.
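The conversion from a critical ratio to a one-third octave SNR threshold is a single subtraction of 10 log10(bandwidth). The sketch assumes band edges at fc × 2^(±1/6), which reproduces the 1782-2245 Hz band quoted for 2 kHz; the record format used by `audible_subset` (dicts with an `snr_db` key) is hypothetical.

```python
import math


def third_octave_bandwidth_hz(fc_hz):
    """Bandwidth of a one-third octave band with edges fc * 2**(-1/6) and fc * 2**(1/6)."""
    return fc_hz * (2 ** (1 / 6) - 2 ** (-1 / 6))


def snr_threshold_db(critical_ratio_db, fc_hz=2000.0):
    """Critical ratio re-expressed as an SNR with noise measured in the one-third octave band."""
    return critical_ratio_db - 10 * math.log10(third_octave_bandwidth_hz(fc_hz))


def audible_subset(records, min_snr_db=-4.0):
    """Keep records (dicts with an 'snr_db' key) at or above the audibility cut-off."""
    return [r for r in records if r["snr_db"] >= min_snr_db]
```

`snr_threshold_db(19)` and `snr_threshold_db(26)` give about -7.66 and -0.66 dB, whose mean of about -4.2 dB motivates the -4 dB cut-off.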
Critical ratios are generally measured for tonal signals, and there do not appear to be measurements applicable to signals like the social sounds. The social sound type chosen for the analysis has most of its energy across four adjacent one-third octave bands (centre frequencies 200, 250, 315 and 400 Hz, i.e. from 177 to 446 Hz), and we measured both the signal and the noise in this band to obtain the SNR. If the masking frequency band is wider than the signal band, the threshold of audibility would occur at SNR>0 dB (signal and noise measured in the same band). However, some social sounds are harmonic, and the masking bandwidth for harmonic sounds may be closer to the masking band for a tone. If this were the case, the threshold of audibility for these sounds would be significantly less than 0 dB with SNR measured in this way. In the analysis, a subset of data that excluded SNR<0 dB at the start of the during phase was used to exclude playbacks that might have been inaudible. As it happened, the highest SNR experienced by the whale groups during exposure exceeded 6 dB (as groups approached the source vessel) for all included groups, so it seems unlikely that any groups in this reduced dataset did not hear the stimulus, at least for some part of the exposure.
All analyses were carried out using the statistical software package R (R Foundation for Statistical Computing). To test for sampling bias between BDA periods, that is, whether observation effort was more concentrated in the D period, the (normalised) mean number of observations per experimental period was compared in both the focal follow and the ad lib data. No sampling bias was apparent. A measure of group visibility was compared between experimental periods to test for any bias in group sightability arising from increased sighting effort, increased time spent at the surface or an increase in surface-active behaviours that would make the group more visible and less likely to be missed. The measure of group visibility used was the total time per BDA period that groups were sighted at the surface (or in a shallow surfacing dive), expressed as a percentage of the total duration of each experimental period. These percentages were compared between periods and no significant difference was found. As a result, all observations were used in the dataset.
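The group-visibility measure can be computed from a list of surface intervals. This is a sketch assuming non-overlapping (start, end) intervals in seconds and 20 min (1200 s) experimental periods; intervals extending beyond a period are clipped to it.

```python
def visibility_percent(surface_intervals_s, period_start_s, period_len_s=1200.0):
    """Percentage of a period during which the group was at the surface.

    surface_intervals_s: non-overlapping (start, end) times in seconds;
    portions outside the period are ignored.
    """
    period_end_s = period_start_s + period_len_s
    visible = sum(max(0.0, min(end, period_end_s) - max(start, period_start_s))
                  for start, end in surface_intervals_s)
    return 100.0 * visible / period_len_s
```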
The mean (±s.d.) of each response variable (course travelled, change in course travelled, speed travelled, number of deep dives, number of surface intervals, length of deep dives, length of surface intervals, number of blow-only surface intervals, number of surface active surface intervals, course-made-good and distance travelled) was calculated per experimental period for each humpback whale group. Standard statistical models assume independence of errors, but measurements taken repeatedly from the same group are correlated. Mixed-effects models account for this interdependence by modelling the covariance structure introduced by grouping the data: the random effect estimates the spread of the group-level means around an overall mean as a standard deviation, instead of estimating a separate mean for every group. To test the effect of stimulus exposure on behavioural measures, we fitted linear mixed-effects models [using the lme4 package (Bates et al., 2011)] to each response variable, with stimulus type, experimental period, environmental and social variables, and measures of RL and SNR as fixed effects, and group ID as a random effect. Models including different terms (null and predictor variables) were compared using Akaike information criterion (AIC) scores and checked for significant (P<0.05) improvement using the likelihood ratio (LR) test, in which the test statistic is compared with a chi-squared distribution whose degrees of freedom equal d.f.1-d.f.2 (where d.f.1 and d.f.2 are the degrees of freedom, i.e. the numbers of estimated parameters, of the two models being compared).
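The likelihood ratio comparison described above can be sketched as follows, assuming the log-likelihoods and parameter counts of two nested models fitted by maximum likelihood (not REML) are already available, for example from lme4. The chi-squared tail probability uses the closed form that holds for even degrees of freedom, which covers the tests with 6 and 8 degrees of freedom reported in the results. This simplification was chosen so that the sketch needs only the standard library, and the function names are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variate with an even number
    of degrees of freedom: exp(-x/2) * sum_{k < df/2} (x/2)**k / k!."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)**k / k!, built incrementally
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(loglik_full, k_full, loglik_reduced, k_reduced):
    """LR test of a full model against a nested reduced model.
    k_* are the numbers of estimated parameters; df is their difference."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    df = k_full - k_reduced
    return stat, df, chi2_sf_even_df(stat, df)
```

For example, `likelihood_ratio_test(-100.0, 6, -104.0, 4)` gives a statistic of 8.0 on 2 d.f. and P = exp(-4) ≈ 0.018.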
Significance testing in mixed-effects models can be problematic because the distribution of the test statistics for the fixed effects is uncertain under the null hypothesis and the denominator degrees of freedom for tests are difficult to determine (D. Bates, personal communication). Therefore, P-values were generated by Markov chain Monte Carlo (MCMC) sampling using the languageR package. Residuals of each model were checked for homoscedasticity and normality. Within-model t-values with associated P-values are also reported for specific comparisons.
In 2008, 15 experiments were carried out: eight using the social sound stimulus, six using the tones stimulus and one silent control. A further six groups were used as controls. All focally followed groups were from the 2008 field season, and only one group was focally followed per experiment; ad lib sampled groups were also used in 2008. In 2004, 16 experiments were carried out: seven using social sounds, seven using the tones stimulus and two silent controls. All groups in 2004 were sampled ad lib (as much data as possible were collected on each group in the area without focusing on one specific group), and multiple groups were sampled during each experiment. A further 19 groups were selected as baseline groups from the two years. Table 2 presents the sample size of groups used for the analysis combining the 2004 and 2008 datasets.
The experiment was carried out on southerly migrating groups (in a population of over 10,000 animals); therefore, it is highly unlikely that any group was sampled more than once. If a group split into two separate groups (N=8), only one of the resulting groups was used (the one that appeared first after the split).
All focally followed samples can be considered independent as only one group was focally followed during each experiment. All baseline samples were also independent (one sample per day). Of the ad lib sampled groups exposed to either social sounds or tones, 28 groups came from experiments in which more than one group was sampled; in other words, up to three groups from any one exposure experiment may have been used in the analysis. In 2008, one of these groups would also have been focally followed; in 2004, all groups were ad lib sampled. If groups do not interact in a way that influences their response to the stimulus, they can be considered independent samples (Miller et al., 2009). We minimised the potential for non-independent sampling by ensuring that the following criteria were met: no groups that were simultaneously used in the analysis interacted with each other (in other words, joined together), and none of these groups came within 3 km of each other (average distance apart 5755 m, range 3000-10,000 m). We used a 3 km limit because any interaction between groups would most likely have been mediated acoustically, and social sounds from groups beyond 3 km are difficult to hear on the array. This minimises the risk that the groups were influencing each other's behaviour. As a further check, we searched the acoustic recordings made during each trial for social sounds and found that no sampled group that was also vocalising was within 4 km of any other simultaneously sampled group.
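The 3 km separation criterion can be checked as in the sketch below. It assumes that the tracks of simultaneously sampled groups have been converted to positions in metres on a local planar grid and resampled to common time steps. These assumptions and the function names are hypothetical, and the joining criterion (no groups joined) would need a separate check.

```python
import math
from itertools import combinations

def closest_approach_m(track_a, track_b):
    """Smallest separation (m) between two groups whose tracks are sampled
    at the same time steps, each a list of (x, y) positions in metres."""
    return min(math.dist(p, q) for p, q in zip(track_a, track_b))

def independent_samples(tracks, min_sep_m=3000.0):
    """True if no pair of simultaneously sampled groups came within
    min_sep_m of each other at any shared time step."""
    return all(closest_approach_m(tracks[a], tracks[b]) >= min_sep_m
               for a, b in combinations(tracks, 2))

tracks = {
    "A": [(0, 0), (0, -500)],
    "B": [(4000, 0), (3200, -500)],
}
print(independent_samples(tracks))  # True: closest approach is 3200 m
```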
We also accounted for the effect of the nearest neighbour group (not usually another sampled group) to determine whether nearby groups influenced the behavioural response parameters. While socially vocalising groups are unlikely to be heard more than a few kilometres away, singing whales are audible over tens of kilometres and could therefore affect the behaviour of any group within audible range. We therefore included the presence of the nearest singing whale as a fixed effect in the analysis (assuming that the nearest singer is more likely to influence the behaviour of the group than more distant singers).
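A sketch of the neighbour covariates follows, assuming each non-focal group has a position in metres on a local planar grid and a flag marking whether it was singing. The text does not say whether the nearest-singer term was coded as presence/absence or as a distance; this sketch returns distances, from which a presence flag within some range could be derived. Names and data layout are hypothetical.

```python
import math

def nearest_distance_m(focal_xy, others_xy):
    """Distance (m) from the focal group to the nearest of the other
    positions, or math.inf if there are none."""
    return min((math.dist(focal_xy, p) for p in others_xy), default=math.inf)

def neighbour_covariates(focal_xy, groups):
    """groups: non-focal groups as dicts with 'xy' (metres) and 'singing'."""
    return {
        "nn_dist_m": nearest_distance_m(focal_xy, [g["xy"] for g in groups]),
        "singer_dist_m": nearest_distance_m(
            focal_xy, [g["xy"] for g in groups if g["singing"]]),
    }

others = [{"xy": (1000, 0), "singing": False},
          {"xy": (0, 6000), "singing": True},
          {"xy": (8000, 0), "singing": True}]
print(neighbour_covariates((0, 0), others))
```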
Only one experiment was carried out using a Dtag. The tagged animal (the female from a female-calf group) changed to shorter, shallower dives while the social sound stimulus was played and did not return to its pre-exposure dive behaviour after exposure (Fig. 1). The animal also changed direction, from consistently travelling at a mean heading of 225 deg (south-westerly) to heading directly west (inshore), then north. After the experiment had finished, the group slowly returned to a southerly course. This group was also tracked from the visual station, though it was lost during exposure, probably because the change in dive behaviour made the animals very difficult to track. From the dead-reckoned track, the group was estimated to be 880 m from the source vessel at the start of exposure (RL of 101 dB re. 1 μPa and SNR of 8 dB, estimated at the group from levels measured at the array) and 660 m from the vessel when it first changed course (RL of 105 dB re. 1 μPa and SNR of 13 dB).
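Averaging headings such as the 225 deg course above requires circular statistics, because the arithmetic mean of 350 and 10 deg is 180 deg rather than north. The sketch below uses the standard circular mean (the direction of the summed unit vectors) and the mean resultant length as a measure of course consistency. The text does not state how mean course was calculated, so this is an illustrative choice and not necessarily the method used. Function names are hypothetical.

```python
import math

def _vector_sums(headings_deg):
    """Sums of the sine and cosine components of compass headings."""
    s = sum(math.sin(math.radians(h)) for h in headings_deg)
    c = sum(math.cos(math.radians(h)) for h in headings_deg)
    return s, c

def circular_mean_deg(headings_deg):
    """Mean direction (0-360 deg) of compass headings, respecting the wrap."""
    s, c = _vector_sums(headings_deg)
    return math.degrees(math.atan2(s, c)) % 360

def mean_resultant_length(headings_deg):
    """0 (headings spread evenly) to 1 (all identical): course consistency."""
    s, c = _vector_sums(headings_deg)
    return math.hypot(s, c) / len(headings_deg)

print(round(circular_mean_deg([215, 225, 235]), 6))  # 225.0
```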
Visual observation data
A total of 15 groups were both ad lib sampled and focally followed at the same time from the two observation platforms. Each response measurement from each platform was averaged over each experimental period. A mixed-effects model was used to test whether any of the response measurements differed between the two platforms, with ‘platform’ (ad lib or focal follow) as a fixed effect and group ID as the random effect.
Movement response variables (course travelled and variation in course travelled) and two of the behavioural response variables (the number of surface intervals and the number of blow-only surface intervals) were comparable between the two visual survey platforms. However, timing variables, such as long dive time and mean surface interval time, differed significantly (Table 3). This suggests that both the focal follow and the ad lib sampling captured essentially all surface intervals, but that the timing of behaviours such as long dives and surface intervals differed because the ad lib sampling team missed a number of the group's surfacings (as shown by the difference in the number of observations per experimental period between the two platforms).
To increase the experimental power (by increasing the sample size) and allow other factors to be incorporated into the analysis model, we pooled the data from the two observation platforms (using the focal data for groups that were both ad lib sampled and focally followed) when testing all movement variables and the numbers of behavioural events, but not when testing the timing of events. Only five groups were exposed to silence; we therefore pooled these data with those from baseline groups, after first comparing response variables between non-exposed and silent groups and finding no significant difference. These pooled groups are hereafter referred to as baseline groups.
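The pooling rule above can be expressed as a small function. This is a sketch under assumptions: each dataset is a mapping from group ID to the value of one response variable for one experimental period, and the assignment of variables to the ‘pooled’ and ‘timing’ sets (including the variable names) is hypothetical, based on the movement, count and timing distinction in the text.

```python
# Hypothetical variable names: movement variables and event counts are pooled
POOLED_VARS = {"course", "course_change", "speed", "course_made_good",
               "distance", "n_deep_dives", "n_surface_intervals",
               "n_blow_only", "n_surface_active"}
# Timing variables differed between platforms, so only focal data are used
TIMING_VARS = {"deep_dive_length", "surface_interval_length"}

def build_dataset(variable, focal, adlib):
    """focal/adlib: dict group_id -> value for one experimental period.
    Pooled variables use focal data where a group has both records and
    ad lib data otherwise; timing variables use focal-follow data only."""
    if variable in TIMING_VARS:
        return dict(focal)
    if variable in POOLED_VARS:
        merged = dict(adlib)
        merged.update(focal)   # the focal record wins when both exist
        return merged
    raise KeyError(f"unclassified variable: {variable}")
```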
The response to stimulus and experimental period
The following analysis includes only groups for which we assumed the stimulus was audible at some stage during exposure (reduced dataset), plus all baseline groups.
The course travelled by groups (N=53 groups) depended on the stimulus type combined with the experimental period (LR χ²₈=31.7, P=0.0002). Results from this model suggest that groups exposed to tones generally travelled on a more south-easterly (offshore) course during exposure (change in course estimated at -20 deg relative to the before phase of baseline groups, s.e.m.=13.6, t=-4.6, P=0.001) and after exposure (change in course estimated at -12 deg relative to baseline groups, s.e.m.=13.6, t=-4.8, P=0.0006) compared with baseline groups (whose course was estimated at 177 deg, s.e.m.=9.0). Groups exposed to the social sounds recording and baseline groups tended to migrate in a south-south-westerly direction, following the coastline, with no significant difference in travel direction between them. However, some groups visibly changed direction when exposed to the social sounds stimulus, though they usually returned to their previous course at some point during exposure. In the focal follow data alone (N=8), some groups clearly changed course and approached the boat to within 100 m (one singleton, one female-calf-escort group and one pair), whereas other groups (for example, the tagged female from the female-calf group) moved inshore and away from the vessel at some stage during the playback of social sounds. In one instance, a singer stopped singing and moved away from the vessel, whereas in two instances, a single animal split from a group and started singing close to the vessel. Therefore, the change in course travelled in response to our recording of social sounds was highly variable but not prolonged, whereas the response to tones was a consistent and prolonged change in course to a more offshore direction.
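Course changes such as the -20 deg shift from the baseline course of 177 deg must be computed with wrap-around at 360 deg. The sketch below, with hypothetical function names, returns a signed change in [-180, 180) deg (negative meaning anticlockwise, i.e. from south towards east, which is offshore on Australia's east coast) and maps a heading to a 16-point compass direction. For example, 177 deg is S and 177-20=157 deg is SSE.

```python
COMPASS_16 = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def signed_course_change(course_deg, reference_deg):
    """Signed difference course - reference in degrees, in [-180, 180).
    Negative values are anticlockwise (e.g. from south towards east)."""
    return ((course_deg - reference_deg + 180) % 360) - 180

def compass_point(deg):
    """Name of the 16-point compass sector (22.5 deg wide) containing deg."""
    return COMPASS_16[int((deg % 360 + 11.25) // 22.5) % 16]

print(signed_course_change(157, 177), compass_point(157))  # -20 SSE
```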
The (normalised) number of surface intervals per 20 min depended significantly on the experimental period combined with the stimulus type (LR χ²₈=32.2, P<0.0001; reduced ‘audible’ dataset), as did mean long dive time (LR χ²₈=32.6, P<0.0001; focal follow dataset). Fig. 2 illustrates the changes in dive time (focally followed groups; N=20) and in the number of surface intervals per experimental period (N=53) during the experiment for baseline and exposed groups. Compared with baseline groups (which surfaced about 3-4 times per experimental period), groups exposed to tones had more surface intervals during exposure (an estimated 1.5 additional surface intervals per experimental period; t=3.7, P=0.0001) and shorter dives (an estimated decrease of 106 s; t=-2.2, P=0.03). The number of blow-only and surface active surface intervals and the length of the surface interval were not significant response variables.
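The normalisation of counts to a rate per 20 min is not spelled out in the text. A minimal sketch, assuming each count is simply scaled by the observed duration of its experimental period, is shown below. The function names and the 'B', 'D', 'A' period keys are hypothetical.

```python
def per_20_min(count, observed_minutes):
    """Scale an event count to a rate per 20 min of observation."""
    if observed_minutes <= 0:
        raise ValueError("observation time must be positive")
    return count * 20.0 / observed_minutes

def normalised_counts(counts, minutes):
    """counts/minutes: dicts keyed by experimental period ('B', 'D', 'A')."""
    return {p: per_20_min(counts[p], minutes[p]) for p in counts}

print(normalised_counts({"B": 4, "D": 9, "A": 3}, {"B": 20, "D": 30, "A": 20}))
```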
The effect of environmental and social variables
Environmental variables, such as wind speed or background noise levels, and social variables, such as the number of groups in the study area, the social composition of the nearest neighbour or the distance to the closest neighbour, were not significant predictor variables in any response model.
We added the social composition of the groups [lone animals and lone singing whales were categorised together as lone animals, female-calf pairs formed their own category, and adult pairs, female-calf-escort(s) and groups with more than two adults were categorised together as multiple adult groups] to the course travelled response model (which included only the stimulus term) and found a significant improvement in this model (LR χ²₆=17.7, P=0.006). In response to social sounds, female-calf groups tended to take a much more westerly (inshore) course than multiple adult groups (t=-3.1, P=0.003). The response to tones, in terms of course travelled, was similar across social categories for all datasets.
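The three-way social grouping above can be written as a small classification function. This sketch assumes each group is summarised by its number of adults and whether a calf is present (hypothetical inputs). Singing status does not change the category because lone singers are pooled with other lone animals.

```python
def social_category(n_adults, has_calf):
    """Map a group to one of the three social categories used in the model."""
    if n_adults < 1:
        raise ValueError("a group needs at least one adult")
    if n_adults == 1:
        return "female-calf pair" if has_calf else "lone animal"
    # Adult pairs, female-calf-escort(s) and groups of 3+ adults
    return "multiple adult group"

print(social_category(2, True))  # female-calf-escort -> multiple adult group
```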
Adding social composition also significantly improved the number of surface intervals response model (LR χ²₆=28.7, P=0.0001), owing to differences in dive behaviour between the social categories. Lone animals tended to surface significantly less often than female-calf pairs (t=-2.8, P=0.03) and multiple adult groups (t=-5.1, P=0.0001). Although most groups responded to tones by increasing the number of surface intervals, the response to social sounds was again highly variable: some groups increased and others decreased the number of surface intervals during exposure, with no significant trend by social category. However, the sample size for each social category was quite small.
The effect of source proximity, received signal levels and received SNR variables
The following analysis includes only groups exposed to either stimulus (N=37 for the pooled ad lib plus focal follow dataset, used to test course and number of surface intervals; N=14 for the focally followed groups, used to test long dive time) and tests the effect of the group's proximity to the source, the RL and the received SNR at the start of exposure on each response variable. To determine which exposure metric (proximity to source, RL or SNR) best predicted the response, we compared four models for each response variable within each of two datasets: the full dataset (including experiments that were probably ‘inaudible’) and the audible dataset (including only those experiments we assume were audible, as defined by the previous criteria). The four models were: (1) stimulus and experimental period only; (2) stimulus, experimental period and proximity; (3) stimulus, experimental period and RL; and (4) stimulus, experimental period and SNR as predictors.
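Because models (2)-(4) each add a different exposure metric to model (1), they are not nested in one another, so AIC rather than a likelihood ratio test is the natural way to rank them. The sketch below computes AIC, differences from the best model and Akaike weights (a common addition, not mentioned in the text). The log-likelihoods and parameter counts in the example are invented for illustration and are not results from this study.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik

def rank_models(fits):
    """fits: dict name -> (loglik, n_params), all ML fits to the same data.
    Returns (name, AIC, delta_AIC, Akaike weight), best model first."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    rel = {name: math.exp(-(s - best) / 2) for name, s in scores.items()}
    total = sum(rel.values())
    rows = [(name, s, s - best, rel[name] / total) for name, s in scores.items()]
    return sorted(rows, key=lambda row: row[1])

# Hypothetical numbers for the four candidate models of one response variable
fits = {
    "stimulus + period": (-210.0, 8),
    "+ proximity": (-207.5, 10),
    "+ RL": (-206.0, 10),
    "+ SNR": (-203.0, 10),
}
for name, score, delta, weight in rank_models(fits):
    print(f"{name:18s} AIC={score:.1f} dAIC={delta:.1f} w={weight:.3f}")
```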
Including SNR as the exposure metric significantly improved the response model for course travelled, though only in the full dataset. For the number of surface intervals, RL and SNR were equally good predictors in the full dataset, and SNR was the best predictor in the audible dataset. For long dive time (using only focal follow data, all of which we assumed were audible), the best exposure metric was proximity to the source (Table 4).
Groups exposed to tones shifted their course further east during (t=-2.2, P=0.02) and after (t=-2.7, P=0.009) exposure as the received SNR at the start of exposure increased (Fig. 3). In these groups, SNRs at the start of exposure ranged from -22 to 15 dB and proximity to the source ranged from 300 m to 8.8 km, although we suspect the tones were only audible at SNRs above about -8 dB (at a distance of about 3.5 km, depending on the background noise). For the social sounds stimulus, received signal levels ranged from 72 to 98 dB re. 1 μPa, SNRs from -23 to 21 dB, and the proximity of the group at the start of exposure from 440 m to 8 km; however, groups did not respond to this stimulus with a consistent change in course, so it was not possible to assess the effect of any exposure metric.
Groups exposed to tones also increased the number of surface intervals as the received SNR at the start of exposure increased (t=2.1, P=0.02; Fig. 3), and SNR was the ‘best’ exposure metric for predicting this response in all datasets. A higher SNR at the start of exposure was also associated with fewer surfacings post-exposure (t=-2.2, P=0.03). Groups tended to surface less often during exposure to social sounds than groups exposed to tones; however, there was no clear trend with SNR in these groups (probably because of the variation in reaction). In other words, a relationship between this response variable and the SNR at the start of exposure was found only in groups exposed to tones.
The long dive time response was significantly related to all three exposure metrics, though the proximity of the group at the start of exposure was the best predictor (Table 4). However, these focally followed groups were always within 2 km of the source at the beginning of the exposure phase (proximity ranged from 300 m to 2 km, RL from 84 to 112 dB re. 1 μPa and SNR from 2 to 14 dB). Proximity to the source vessel had an effect in the post-exposure phase: long dive time decreased with decreasing distance to the source in groups exposed to tones, but increased with decreasing distance to the source in groups exposed to social sounds (t=1.8, P=0.05).
Of all tested response variables, three (course travelled, the number of surface intervals and long dive time) changed significantly in relation to the three exposure metrics: proximity, RL and SNR at the group. An easterly change in course (away from the coast) occurred during and after exposure to tones. These groups also tended to spend more time close to the surface (increasing the number of surface intervals and decreasing dive time) during exposure. The magnitude of the change in course and dive behaviour was related to the proximity, RL and SNR of the stimulus at the start of exposure. Groups exposed to our recording of social sounds did not significantly change their direction of migration, though we did find short-term changes in travel direction. Different social groups reacted quite differently to this stimulus: female-calf groups tended to move inshore and spend more time near the surface, whereas other social groups approached the source vessel but returned to their original travel direction at some point during exposure. This paper presents evidence that migrating humpback whales respond differently to a recording of conspecific social sounds than to artificial tones, and that these responses were influenced by other factors: the social group, the proximity of the group to the source vessel and the initial ‘dose’ (as measured by the start SNR and RL).
A change of course was most evident in groups exposed to tones, which moved away from the source vessel and offshore at some point during exposure, indicative of an avoidance reaction to this stimulus. In comparison, many groups exposed to social sounds (mainly those thought to contain a male because one member was a singer or an escort with a female and calf) first approached the source vessel, then at some point either resumed their previous course or continued along their path towards the source vessel during exposure. Both previous BRS in humpback whales using conspecific social sounds found that the social composition of the group was an important factor in determining the response. Tyack found that singing males stopped singing when either song or social sounds were played and that the majority of them ‘charged’ the boat when exposed to social sounds (Tyack, 1983). However, females with calves and large groups tended to move away from the boat during exposure to these sounds. Mobley and colleagues found rapid approach responses in singletons and adult pairs but no approaches by females with a calf (Mobley et al., 1988). Although our sample size (of focally followed groups) was small, we found similar results: some single animals and adult pairs approached the boat, whilst some females with calves evidently changed course to avoid the source vessel (though they tended to move inshore). This avoidance reaction (a change in the direction of travel during exposure) was very clear in the single tagged group. However, only one recording of social sounds was used here, and inferring the function of these sounds from the observed behavioural reactions is beyond the scope of this study.
Although the sample size of our study was limited for determining social effects, the study demonstrates the complexity of behavioural responses to stimuli and the need to measure as many other factors as possible (and to achieve a large sample size) in order to tease out such complex interactions. It would be beneficial to repeat the study with a different set of social sounds to address the external validity problem of using only one stimulus recording (allowing more general conclusions about the difference in response to tones versus social sounds). Such experiments could also test the function of specific sounds by using a number of recordings from various cohorts to determine whether there are consistent avoidance and attraction responses to each combination of vocal signals.
In this study, diving and surfacing behaviour also changed significantly with exposure to both test stimuli. Previous studies assessing the behavioural response of humpback whales to an M-sequence sound (Frankel and Clark, 1998) and to a recording of a full-scale Acoustic Thermometry of Ocean Climate (ATOC) sound source signal (Frankel and Clark, 2000) found responses such as increased time between surfacing events and greater distance travelled underwater; in other words, exposed humpback whales tended to spend more time underwater and travel further than baseline groups. In the present study, groups consistently increased the number of surface intervals (and consequently decreased dive time, and therefore time spent at depth) in response to tones. This may indicate an avoidance reaction to our signal, but one that differs from the reaction found by Frankel and Clark (2000). Female-calf groups tended to respond to social sounds in a similar way, changing their dive behaviour to more frequent yet brief surfacing events. This may also be a way for females with calves to avoid what they perceive to be a nearby group that might contain a male.
The experiments by Frankel and Clark included group composition, the presence of nearby vessels and the RL as additional predictor variables (Frankel and Clark, 2000). Most cetacean BRS to date have considered only the RL (Southall et al., 2007). However, the level of the signal relative to the background noise (SNR), or the signal excess above masked hearing thresholds, may be a significant predictor of behavioural response and, under certain conditions (such as when the receiver is far from the source and received levels are close to background noise), a better predictor than received sound pressure level. Consistent with this, we found SNR to be a better predictor of the change in behavioural response (in terms of course travelled) than RL or proximity to the source when using the full dataset (where groups ranged from 300 m to 8.8 km from the source). The change in dive behaviour in response to tones (measured by the increase in the number of surface intervals) was also strongly related to the SNR at the start of exposure. We could only measure long dive time using focal follow data (where groups were within 2 km of the source), and with this dataset SNR was not the best predictor term; proximity to the source was. Results of behavioural response experiments are often used to inform management of the effects of noise on marine mammals. This study shows that care must be taken when choosing which exposure metric (proximity to the source, RL or SNR) to use when predicting dose-response relationships, as results can depend strongly on the range of data chosen as well as on the response variable.
The relationship between SNRs and masked auditory detection thresholds of signals against noise is complex. It seems likely that most experiments in this study would have been audible but, given the variability of ocean noise, some of the experiments contributing to the full dataset may not have been audible and some may have been only intermittently so. However, the subset of data should have excluded most samples in which the experiment was inaudible, and as long as a whale can hear a sound, there is the potential for a behavioural response. Higher SNRs might be more likely to attract a listener's attention, and it is possible that SNR is used to judge signal level and thus proximity of the source. Hence, SNR might be expected to be an important exposure metric in determining the response. However, the dose-response relationship may be lost when using only high SNR experiments. Therefore, including experiments with low RLs may help to determine the threshold of response and provide some clue as to the auditory sensitivity of these animals. Whether responses to low level signals have longer term significance is, of course, a different question.
This study is one of the more comprehensive BRS carried out on a large whale species. Sources of pseudoreplication were considered (a limitation of the study being that only one recording of social sounds was used). We used two different stimuli and applied a statistical analysis that accounts for individual variation and includes environmental and social factors. We did, however, have problems with sample size. A power analysis (Dunlop et al., 2012) found that the sample size using only focal follow data was insufficient to confidently detect a significant change in behaviour, but combining focal data with ad lib data increased the power to 0.9. Testing the effect of social context remained problematic because of the large number of social contexts; future studies should therefore focus on achieving a more robust sample size per social group using the focal follow methodology, or on a small number of social group types. These experiments show that sound exposure generates a measurable behavioural response but that different exposure metrics should be considered; this will be useful in future experiments aiming to test the hearing range of humpback whales as well as the function of many different types of social sounds.
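The gain in power from pooling can be illustrated with a generic normal-approximation power calculation for a two-sided, two-sample comparison. This is not the method of Dunlop et al. (2012): it ignores the random group effect and repeated measures, so it will overstate power for the mixed models used here, and the effect size and sample sizes are hypothetical inputs. It only shows the general point that power rises with sample size.

```python
from statistics import NormalDist

def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test for a standardised
    mean difference (Cohen's d) with n_per_group observations per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of rejecting in either tail under the alternative
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (8, 16, 26, 40):
    print(n, round(two_sample_power(0.8, n), 3))
```

For example, with a standardised effect size of 0.8, `two_sample_power(0.8, 26)` is about 0.82.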
We thank everyone involved in the Humpback Acoustic Research Collaboration (HARC), in particular John Noad, Nicoletta Biassoni, Ceri Morris, Melinda Rekdahl and the numerous volunteers who donated their time and energy to this project. We also thank David Paton for his invaluable field expertise and Simon Blomberg for his advice on statistical analysis. We would also like to thank Michael Bryden for taking the time to review earlier drafts of this manuscript as well as the anonymous reviewers for providing valuable input.
This work was supported by the US Office of Naval Research, the Australian Defence Science and Technology Organisation, the Australian Marine Mammal Centre and the Joint Industry Program E&P Sound and Marine Life. Deposited in PMC for immediate release.