Visually dominant animals use gaze adjustments to organize perceptual inputs for cognitive processing. Thereby they manage the massive sensory load from complex and noisy scenes. Echolocation, as an active sensory system, may provide more opportunities to control such information flow by adjusting the properties of the sound source. However, most studies of toothed whale echolocation have involved stationed animals in static auditory scenes for which dynamic information control is unnecessary. To mimic conditions in the wild, we designed an experiment with captive, free-swimming harbor porpoises tasked with discriminating between two hydrophone-equipped targets and closing in on the selected target; this allowed us to gain insight into how porpoises adjust their acoustic gaze in a multi-target dynamic scene. By means of synchronized cameras, an acoustic tag and on-target hydrophone recordings we demonstrate that porpoises employ both beam direction control and range-dependent changes in output levels and pulse intervals to accommodate their changing spatial relationship with objects of immediate interest. We further show that, when switching attention to another target, porpoises can set their depth of gaze accurately for the new target location. In combination, these observations imply that porpoises exert precise vocal-motor control that is tied to spatial perception akin to visual accommodation. Finally, we demonstrate that at short target ranges porpoises narrow their depth of gaze dramatically by adjusting their output so as to focus on a single target. This suggests that echolocating porpoises switch from a deliberative mode of sensorimotor operation to a reactive mode when they are close to a target.
In order to succeed in vital functions such as foraging, predator avoidance and navigation, animals must acquire and correctly interpret information from their environment. Information is extracted by sensory systems that analyze patterns of energy generated by, or reflected from, objects in the surroundings (Nelson and MacIver, 2006). But, while most sensory systems are passive in that they utilize extrinsic energy such as sunlight, biosonar or echolocation is an active system that analyzes an auditory scene generated every time a sound pulse is emitted. Echolocation for prey has evolved independently in two mammalian orders, microbats (Griffin, 1958) and toothed whales (Kellogg et al., 1953), and is the primary sensory modality employed by these animals for navigation and prey finding (Griffin, 1958; Madsen et al., 2005). By interpreting the timing and frequency characteristics of the returning echo energy, echolocating animals obtain information about the position, size and structure of ensonified objects, providing a basis for obstacle avoidance, selective foraging and prey tracking (Griffin, 1958; Au, 1993).
Echolocation involves a feedback between action and perception: the animal processes echo information to guide vocal-motor adjustments that further probe the environment. The tight link between action and perception is not unique to active sensory systems. The sensory environment of most animals is dynamic and characterized by a redundancy of stimuli from which the relevant ones must be extracted. To achieve this, animals control the information flow from sense organs, a well-studied example being the control of eye movements for gaze adjustment in vision. Visually dominant animals tend to sample a scene sequentially by moving their eyes (Yarbus, 1961; Liversedge and Findlay, 2000) or head (Eckmeier et al., 2008) so as to fixate targets that lie in different directions and depths onto the visual fovea (Land and Hayhoe, 2001). Sequential sampling is task dependent (Yarbus, 1961; Hayhoe and Ballard, 2005), and there is a tight coupling between eye movements and attention (Kustov and Robinson, 1996). This exemplifies that, by adjusting the direction and depth of their gaze, animals can select visual inputs for cognitive processing, and so manage the sensory load from complex and noisy scenes.
Analogous to the limited angular extent of visual gaze, the relatively narrow sound beams employed by echolocating animals (Au et al., 1999; Jakobsen and Surlykke, 2010) restrict the spatial sensitivity of this sense, and echolocators must direct their beam sequentially towards different targets to explore their environs. Big brown bats (Eptesicus fuscus) as well as bottlenose dolphins (Tursiops truncatus) point the axis of their sonar beams to selectively ensonify closely spaced targets (Amundin et al., 2008; Surlykke et al., 2009). Echolocators can also control the distance over which they detect echoes by regulating the output level of signals, and both bats (Hartley, 1992) and toothed whales (Au and Benoit-Bird, 2003) adjust the intensity of echolocation calls as they approach targets. Echolocating bats as well as odontocetes also seem to avoid range ambiguities resulting from pulse-echo overlap by adjusting the duration of their sonar signals (Madsen et al., 2002; Schnitzler et al., 2003; Johnson et al., 2006; Surlykke et al., 2009) and the intervals between them (Morozov et al., 1972; Madsen et al., 2005; Surlykke et al., 2009) to accommodate echoes from objects in the ensonified scene. Therefore, the acoustic gaze of a sonar system, i.e. the spatial extent of the collected auditory information, or the sensory volume (Snyder et al., 2007), can be defined by the sonar beam pattern in horizontal and vertical planes and by the sampling rate and output energy in the range dimension. Following the relationship between foveation and spatial attention in animals using vision as their primary sensory modality, acoustic gaze has been put forward as an indicator of a bat’s attention to objects in space (Surlykke et al., 2009; Moss and Surlykke, 2010). The acoustic gaze of an echolocating animal therefore shows many parallels with visual gaze. 
In fact, being an active sense, there may be more opportunities to control information flow in echolocation than are available in vision by adjusting the properties of the sound source: its intensity, timing, spectral characteristics and directionality (Moss and Surlykke, 2010). Moss and Surlykke (Moss and Surlykke, 2001) proposed that the auditory system of an echolocating bat organizes echoes arriving from multiple objects at different distances into echo streams along the delay axis to create an acoustic image. This process of auditory scene analysis utilizes a set of cues and organizational rules to group and segregate echo sources (Bregman, 1990; Barber et al., 2003) while manipulating output parameters to focus on selected targets (Moss and Surlykke, 2001; Moss and Surlykke, 2010). Thus, echolocating animals might employ a variety of behavioral strategies to limit clutter and focus on targets of interest, making echolocation an ideal sensory system in which to study the way animals perceive and react to their environment.
Biosonar performance and gaze adjustments by bats and their implications for auditory scene analysis have received substantial attention in both laboratory and field studies (Moss and Surlykke, 2001; Surlykke et al., 2009; Moss and Surlykke, 2010), while significantly less is known about the way echolocating toothed whales use this sense (e.g. Madsen et al., 2005; Johnson et al., 2008). The sonar systems of bats and dolphins operate under different constraints, dictated by the different media in which they function, and these affect both the kinds of sounds produced and the way they are used. Compared with bats, toothed whales are able to detect objects at significantly longer ranges, which, on the one hand, allows them to engage in detailed discrimination and tracking of relatively distant prey but, on the other hand, provides them with a richer auditory scene to evaluate and process. Hence, toothed whales seem to operate in what Snyder et al. (Snyder et al., 2007) call a deliberative mode in which the ratio between the sensory volume and the motor volume (i.e. the locations in space that the animal can reach within a time interval) is large. Bats, in contrast, appear to have sensory volumes that overlap their motor volumes and thus use a reactive strategy for chance encounters with nearby prey (Snyder et al., 2007). Adjustments to the emitted power, calling rate and beam extent will result in changes in the size of the sensory volume, i.e. opening or narrowing of the acoustic gaze, and may facilitate switching between these behavioral strategies.
Echolocating bats typically adjust the characteristics of their signals when hunting in different environments (Moss and Surlykke, 2001; Schnitzler et al., 2003) and when approaching targets (Schnitzler, 1973; Moss and Surlykke, 2010), indicating the importance of dynamic strategies matched to the different demands of target detection and selection compared with prey tracking and interception. Toothed whales have been shown to adjust at least some of the parameters expected to be involved in acoustic gaze control (Penner, 1988; Au and Benoit-Bird, 2003; Madsen et al., 2005; Johnson et al., 2008; Verfuss et al., 2009). Both captive and field studies have demonstrated that the frequency, level and rates of odontocete echolocation clicks may vary in time and space (Au, 1993; Akamatsu et al., 2007), but have offered little explanation for such variation. Although a general consensus holds that toothed whales reduce inter-click intervals and source levels when approaching targets (Morozov et al., 1972; Au and Benoit-Bird, 2003; Verfuss et al., 2009), there is considerable scatter around these trends, in particular for the rate–range relationships (Morozov et al., 1972; Kadane and Penner, 1983; Verfuss et al., 2009). This variation has either been neglected or hypothesized to be a mechanism for reducing range-ambiguous interference (Kadane and Penner, 1983). However, what if it rather reflects gaze adjustments to targets that are experimentally unaccounted for? Further, such gaze adjustments may relate to the use of different sensorimotor strategies at different stages of target interception. A recent study on a stationed porpoise tasked with detection of a single target has revealed fast adjustment of source level to target range (Linnenschmidt et al., 2012a). 
This observation, together with the considerations outlined above and the gaze control demonstrated for echolocating bats, led us to hypothesize that echolocating toothed whales also control their gaze to exploit and manage their echoic scene. Here we therefore wish to examine: (1) how acoustic gaze is manipulated while first selecting between targets and then approaching a single target in a cluttered multi-target environment; and (2) how gaze is controlled by a combination of beam directional control and range-dependent adaptive changes in output levels and pulse intervals.
A significant limitation of toothed whale echolocation studies to date is that most have involved stationed animals and static auditory scenes, often with just a single point target (Au, 1993). Biosonar performances under these conditions are unlikely to reflect those of animals facing complex naturally occurring soundscapes with constantly changing spatial relationships to a multitude of targets, of which only one or a few are prey. In contrast, studies with free-ranging animals suffer from insufficient means to control the environment or quantify behavior (DeRuiter et al., 2009; Jensen et al., 2009). To overcome these limitations, we designed an experiment with free-swimming harbor porpoises (Phocoena phocoena L.) tasked with actively choosing between two hydrophone-equipped targets and closing in on the selected target in a sea-pen. Echoes from the targets and reflective surfaces in the pen create a dynamic acoustic scene approximating the conditions under which target discrimination and tracking are performed in the wild. But the controlled setting allows the movements and echolocation signals of the study animal to be recorded with high resolution. We show that porpoises exercise active and acute control over their sonar signals to sequentially focus their gaze on different targets while exploring complex echoic scenes.
MATERIALS AND METHODS
The study was carried out in an 8×12 m semi-natural outdoor enclosure at Fjord&Bælt, Kerteminde, Denmark. We recorded echolocation clicks from three harbor porpoises as they swam across the pool and closed in on targets while performing a two-alternative forced choice task (Schusterman, 1980). The task consisted of the animal discriminating between two simultaneously presented targets made of different materials and indicating its choice by swimming towards and touching the selected target with the tip of its rostrum. The targets were 50.8 mm diameter solid spheres made of aluminum, Plexiglas, PVC, brass and stainless steel. The aluminum sphere was selected to be the standard target that the animal had to choose. Spheres were suspended by 0.5 mm diameter nylon line from a 1-m-wide metal frame (Fig. 1A). At the start of a trial the targets were lowered into the water to a depth of 1 m. At that point the stationed, blindfolded (i.e. wearing opaque silicone eyecups) animal was cued to perform the selection task. If the animal made a correct response, it was bridged with a whistle and given a fish reward after returning to its station. If it made an incorrect response, the targets were immediately lifted and the porpoise was recalled to its station without a reward. No time limit was set for completing the task. Each session was started with two warm-up trials and usually terminated with two cool-down trials with the PVC sphere as the comparison target (Schusterman, 1980). These trials were not excluded from the analysis. A Gellermann (Gellermann, 1933) pseudo-random schedule was used to alternate the order of the materials and positions of the standard and comparison targets.
The animals are maintained by Fjord&Bælt, Kerteminde, Denmark, under permit nos SN 343/FY-0014 and 1996-3446-0021 from the Danish Forest and Nature Agency, Danish Ministry of Environment.
Sonar clicks received at the targets were recorded with custom-built 10×20 mm prolate spheroid hydrophones (flat frequency response ±2 dB between 100 and 160 kHz) mounted 2.5 cm above each sphere. The 3-mm-diameter hydrophone cables were tightly led along the nylon lines to the recording setup. A different hydrophone was assigned to each target every day. Each hydrophone was connected to a 40 dB amplifier with a bandpass filter (1–200 kHz comprising a first-order high-pass, and a fourth-order Butterworth low-pass filter). Signals were digitized using a multifunction acquisition board (National Instruments USB-6251, Austin, TX, USA, sampling at 500 kHz per channel, 16 bit resolution). Before and after the experiment, the hydrophones were calibrated against a Reson TC4014 hydrophone (Reson, Slangerup, Denmark) in a test tank using simulated porpoise clicks.
The hydrophone recordings were synchronized with video images from one overhead and two underwater cameras (Fig. 1). The underwater cameras were mounted on a metal frame just above the two targets, covering a range of ∼1 m at a depth of 1 m. The cameras recorded continuously at a rate of 25 frames s–1. At the start of each trial, a short sweep signal was generated by the sound-recording software. This was relayed to one of the hydrophone channels and to the audio inputs of the digital video cards for synchronization (Fig. 1B). Given the frame rate of the cameras, the synchronization was accurate to within 40 ms. Each animal also participated in one session (20 trials) equipped with a DTAG-3 multi-sensor digital recording tag (Johnson and Tyack, 2003; Johnson et al., 2009). The tag sampled sound from two hydrophone channels with 16 bit resolution at 500 kHz per channel and recorded data simultaneously from a pressure sensor, triaxial gyroscopes and triaxial accelerometers (500 Hz, 16 bit resolution). To synchronize these recordings, an additional high-frequency (210–160 kHz) sweep was generated and transmitted into the water at the start of each trial. This synchronization was later verified by comparing the clicking rate time patterns recorded on the tag and the target hydrophones. The first part of data collection was completed in June 2009 for two of the animals (Freja and Eigil) and in December 2009 for the third one (Sif). Recordings with the DTAG-3 were made in May 2010.
Localization procedure and calibration of the range estimation
All analyses were carried out in MATLAB 7.5 (The Mathworks, Natick, MA, USA) and R v. 2.15.1 statistical programming language (R Foundation for Statistical Computing, Vienna, Austria). The hydrophone, video and tag recordings were time-aligned and the video frames corresponding to the duration of each hydrophone recording were exported as pictures. The pictures were examined to establish an analysis time window starting when the animal turned away from its station by more than 90 deg to begin approaching the targets and ending when it turned away from the target after completing the approach. The positions of the animal during the analysis window were extracted using a supervised image tracking program, and the animal’s swim path relative to the targets was reconstructed (Fig. 2). A subset of trials was tracked at full frame rate, i.e. 25 frames s–1, using all three cameras, to determine the minimum video tracking rate that allowed a faithful reconstruction of the animal’s movements. This subset consisted of trials with a tagged animal, as these were further used to calibrate target range estimates. We computed spectra of the angular changes in head direction of the animal in the overhead camera frames (supplementary material Fig. S1), which confirmed that the movements were captured well at a video frame rate of 5 Hz.
The range from the porpoise to each target was derived from the reconstructed tracks. Range estimates from overhead and underwater video tracks were pooled by fitting them with a 12th-order polynomial using a least-squares method. This order was chosen as the minimum order giving an R2>0.998 relatively uniformly across the trials. The polynomial fit was then interpolated at the time cues of the recorded echolocation clicks. The magnitude of ranging errors made by this method was assessed using echo ranges derived from the tag sound recordings as described below. A 12th-order polynomial fit to the echo-based range estimates was interpolated at the time cues used in the interpolation of the video-derived ranges. The video- and echo-based ranges were compared to compute transmission loss errors corresponding to each range estimate. This method was repeated for tracks constructed at 25 and 5 Hz (supplementary material Fig. S2). Error analysis showed that processing of every fifth frame of the overhead video and every single frame of the underwater camera recordings resulted in smaller transmission loss errors (<6 dB compared with >6 dB for the estimates using every single frame for both overhead and underwater videos), and we applied this method of measuring the positions of the moving animals to the remainder of the data set.
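The comparison of video- and echo-based ranges can be illustrated with a short calculation. The sketch below is not the authors' code: it assumes spherical spreading, so that a range mismatch maps to a transmission loss error of 20log10 of the range ratio. The function name `tl_error_db` and the example ranges are hypothetical.

```python
import math

def tl_error_db(video_range_m, echo_range_m):
    """Transmission loss error (dB) that results when the video-derived range
    differs from the echo-derived range, assuming spherical spreading."""
    return 20.0 * math.log10(video_range_m / echo_range_m)

# Hypothetical example: the video track overestimates a 1.0 m echo range by 10%.
print(round(tl_error_db(1.1, 1.0), 2))
```

Under this assumption, a transmission loss error of about 6 dB corresponds to a factor-of-two mismatch between the two range estimates.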
The error reduction resulting from taking into account the refraction of light at the air–water interface improved the transmission loss error by what corresponds to less than 1 dB (received level), and so the refraction error was disregarded. We estimated the effect of location error due to flat-image distortion by calibrating pixel sizes at different ranges from the center of the camera projection. The distortion did not exceed 10% across the ranges covered in the study. For tracking we used the pixel size corresponding to the location of the targets.
Each click from the study animal within the analysis window was identified in the sound recordings using a supervised click detector. Waveshape and spectral cues were used to eliminate occasional mis-detections of echoes or signals from other animals in the neighboring pool. Only clicks with an r.m.s. signal-to-noise ratio ≥6 dB were included in the analysis. The inter-click interval (ICI) was defined as the time between each click and the preceding one. Received levels (RLs) were quantified in terms of energy flux density (dB re. 1 μPa² s) (Au, 1993; Madsen and Wahlberg, 2007) over a time window of 132 μs, which covered the duration of a click.
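The two click measures defined above can be sketched as follows. This is a minimal illustration, assuming that click times are given in seconds and that pressure samples are already calibrated in μPa; the function names and the synthetic window are hypothetical, and only the 500 kHz sampling rate and the 132 μs window come from the text.

```python
import math

def inter_click_intervals(click_times_s):
    """ICI of each click: time since the preceding click.
    The first click has no predecessor, so it gets no ICI."""
    return [t1 - t0 for t0, t1 in zip(click_times_s, click_times_s[1:])]

def energy_flux_density_db(pressure_upa, fs_hz=500_000):
    """Energy flux density in dB re 1 uPa^2 s: the sum of squared pressure
    samples multiplied by the sample period (1 / fs_hz)."""
    energy = sum(p * p for p in pressure_upa) / fs_hz
    return 10.0 * math.log10(energy)

# 66 samples at 500 kHz span the 132 us analysis window.
window = [1.0e6] * 66  # synthetic, constant 1e6 uPa (hypothetical values)
print(inter_click_intervals([0.00, 0.05, 0.06]))
print(round(energy_flux_density_db(window), 1))
```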
Echograms were formed from the sounds recorded on animals during trials and filtered around porpoise sonar frequencies (100–200 kHz). The widths of the time bins along the x-axis of the echograms were adjusted to correspond to the ICIs used by the porpoises, thereby obtaining a time axis compatible with that of the hydrophone and camera recordings and a time resolution that matched the acoustic sampling rate employed by the animals (see Johnson et al., 2004; Johnson et al., 2009). Echo ranges corresponding to the two targets were measured in the echograms and used to verify the video tracking method.
Range-dependent output adjustments
Click levels recorded by the target hydrophones were compared with animal ranges from the video tracks to investigate range-dependent signal adjustments. Clicks produced at ranges shorter than 15 cm were excluded because of the higher relative errors in range estimation (supplementary material Fig. S2) and the risk of recording clicks in the acoustic near field of the animal.
Source level estimation
The apparent source level (ASL) of each click was back-calculated by adding the transmission loss (TL) to the RL measured at the target hydrophone. Assuming spherical spreading, TL is given by 20log10(R) + αR, where R is target range and α is the frequency-dependent absorption that can be ignored at the small ranges considered in this study (DeRuiter et al., 2010). Given the approximate 13–16.5 deg beamwidth of porpoises (Au et al., 1999; Koblitz et al., 2012), the ASL will underestimate the actual source level unless the recording aspect is close to the acoustic axis (Madsen and Wahlberg, 2007). To find the clicks that were likely to have been recorded close to the acoustic axis, we first identified click sequences, here called scans, that were presumed to result from the acoustic beam of the animal passing across the target. Because the porpoises were required to ensonify two targets, we assumed a scan over one of them to end when the animal directed its gaze towards the other one, as indicated by greater RLs at the other target. Scans were accordingly defined as sequences of clicks that were received with a higher level at one target than at the other. Provided that the animal maintains a constant source level and directionality throughout a scan, the highest amplitude clicks in each scan will be the closest to on-axis. Hence, the clicks with maximum RL during each scan were classified as ‘on-axis’ signals (see Madsen et al., 2004). We verified this click selection procedure in trials with tags by comparing the echo level received on the tag for clicks classified as on- and off-axis. Echo levels were highest for the clicks designated as on-axis, supporting the selection method. The ASLs of the on-axis clicks for the standard and comparison targets were regressed against range to each target to parameterize how the animals adjusted ASL with target range. Standard errors of the regression coefficients were estimated by jack-knifing over clicks from 95 trials. The jack-knife was performed by excluding sub-sets of clicks corresponding to entire trials.
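The scan definition and on-axis selection described above can be sketched in a few lines. This is an illustrative reading of the procedure, not the analysis code used in the study: the tuple layout, the function names and the tie rule (equal RLs are assigned to the first target) are assumptions, absorption is ignored, and range is assumed to be in meters.

```python
import math
from itertools import groupby

def transmission_loss_db(range_m):
    """Spherical spreading loss; absorption is ignored at these short ranges."""
    return 20.0 * math.log10(range_m)

def split_into_scans(clicks):
    """Group consecutive clicks by the target receiving the higher RL.
    Each click is (rl_target0, rl_target1, range_target0, range_target1).
    Ties go to target 0 (an assumption made for this sketch)."""
    louder = lambda c: 0 if c[0] >= c[1] else 1
    return [(target, list(run)) for target, run in groupby(clicks, key=louder)]

def on_axis_asls(clicks):
    """For each scan, take the click with maximum RL at the scanned target
    and back-calculate its apparent source level as RL + TL."""
    results = []
    for target, run in split_into_scans(clicks):
        best = max(run, key=lambda c: c[target])
        rng = best[2 + target]
        results.append((target, rng, best[target] + transmission_loss_db(rng)))
    return results

# Hypothetical RLs (dB) and ranges (m) for five consecutive clicks.
clicks = [(140, 120, 2.0, 2.1), (146, 122, 1.9, 2.0), (141, 121, 1.8, 1.9),
          (118, 139, 1.7, 1.8), (119, 143, 1.6, 1.7)]
for target, rng, asl in on_axis_asls(clicks):
    print(target, rng, round(asl, 1))
```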
Acoustic sampling rates
ICIs can be measured accurately in the target hydrophone recordings even for off-axis clicks. We therefore relaxed our criteria to increase the number of signals available for ICI analysis. To quantify ICI adjustments to the two targets over consecutive scans, we considered all the clicks with RLs within 6 dB of the peak level in scans [i.e. within ca. 8–10 deg off-axis (Au et al., 1999; Koblitz et al., 2012)] as being directed towards the target and hence potentially adjusted to it. These clicks were designated as ‘on-target’ clicks. The ICIs of these clicks were regressed against the range to each target.
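The relaxed on-target criterion amounts to a simple threshold relative to the scan peak. The snippet below is a sketch under the assumption that the RLs of one scan are available as a list in dB; the function name and the example levels are hypothetical.

```python
def on_target_indices(scan_rls_db, window_db=6.0):
    """Indices of clicks whose RL lies within window_db of the scan's peak RL."""
    peak = max(scan_rls_db)
    return [i for i, rl in enumerate(scan_rls_db) if peak - rl <= window_db]

# Hypothetical scan: only the three clicks within 6 dB of the 141 dB peak qualify.
print(on_target_indices([130, 138, 141, 136, 134]))
```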
Relationship between output levels and sampling rates
To investigate the relationship between source levels and ICIs we considered the received level at the tag, here coined apparent output level (AOL) (see Madsen et al., 2005), as being representative of the relative acoustic output of the echolocating animal. The tag was placed behind the sound generator and therefore well out of the acoustic beam of the animal. Also, the RLs at the tag can change when the animal moves its head and/or beam independently of its body axis. Hence, AOL is not equivalent to the source level but likely provides some measure of relative changes in the source level (Madsen et al., 2002). We clustered the clicks into three groups: buzz, on-target regular clicks and off-target regular clicks. Following Teloni et al. (Teloni et al., 2008) we used a marked dip in the distributions of ICIs of all the porpoises as the border value between buzz and regular clicks (supplementary material Fig. S3). Clicks with ICIs shorter than 13 ms were accordingly classified as being part of a buzz. Regular clicks complying with the on-target criterion in the target hydrophone recordings were classified as on-target. The remaining clicks were considered off-target regular clicks.
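The three-way grouping of clicks can be written as a small decision rule. This is a sketch only: it assumes each click comes with its ICI in seconds and a flag indicating whether it met the on-target criterion at a target hydrophone; the function name and labels are hypothetical, while the 13 ms border is the value given in the text.

```python
BUZZ_ICI_S = 0.013  # border between buzz and regular clicks (13 ms)

def classify_click(ici_s, on_target):
    """Return 'buzz', 'on-target regular' or 'off-target regular'."""
    if ici_s < BUZZ_ICI_S:
        return "buzz"
    return "on-target regular" if on_target else "off-target regular"

# Hypothetical clicks as (ICI in s, on-target flag).
for ici, flag in [(0.005, True), (0.040, True), (0.120, False)]:
    print(ici, classify_click(ici, flag))
```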
We used a simulated porpoise signal to measure echo returns from the five targets. Signals were generated with a National Instruments USB-6251 analog interface module and broadcast through a calibrated directional Reson TC 2130 transducer (resonant at 104 kHz, usable transmitting band 100–200 kHz). Echoes were received through the same TC 2130 transducer, now working as a hydrophone, connected via a diode transmit–receive switch to the input of the same National Instruments module sampling at 500 kHz. Measurements were made with the target at 1 m distance. The system was calibrated with a steel ball of known target strength (–38 dB). Echo waveforms, spectra and Wigner–Ville time–frequency distributions obtained by ensonifying the targets are shown in Fig. 3.
A total of 535 trials were performed with the three study porpoises (Table 1), giving a minimum of 33 trials per material per animal. Trials for each animal were run in sessions with a maximum of 26 trials per session and up to three sessions per day. Only a subset of these trials met the requirements for range adjustment analysis, namely that animals stayed in the field of view of the camera throughout the approach, and that good quality video (i.e. with limited glare and surface ripples) and audio (i.e. with low ambient noise and relatively few clicks from other animals) were obtained. A minimum of two trials per material per animal was analyzed for range adjustment of output level and rate, amounting to a total of 95 trials.
Task performance and behavior
After being sent by the trainer, the study animals generally swam directly towards the targets. Tag recordings confirmed that the animals were making clicks continuously while approaching the targets. Buzzes were produced in all target approaches and these were typically initiated 0.5–0.8 m from the target and continued until the animal turned away from the selected target (Fig. 4).
The animals’ performance in the target discrimination task is presented in Fig. 3. The porpoises made few mistakes when discriminating the plastic or brass spheres from the aluminum sphere. They had more difficulty distinguishing the steel and aluminum spheres, for which the echoes were quite similar spectrally (Fig. 3, right panel) because of the smaller differences in impedance between these materials (Au, 1993). Although the animals were not given a time limit for the discrimination task, they would typically only take 10–15 s to complete it (Fig. 5). To assess whether the difficulty of the task influenced the way that the animals performed it, we compared trial duration (defined as time from start of the trial to target interception), number of scans per trial and number of on-target clicks per scan for two proxies of task difficulty: comparison target material and time since first training (Fig. 5). To evaluate the effects of the comparison materials, we pooled the data for each porpoise from both years, corrected for the mean of individual years. Because the data met a homoscedasticity but not a normality assumption (Levene’s and Jarque-Bera tests, P<0.05), we tested whether differences between the materials existed using a Kruskal–Wallis test. There was no significant difference in the duration, scan count or clicks-per-scan across the different materials (Kruskal–Wallis on residuals of pooled data from both years, P>0.05). However, when two of the animals were asked to perform the same task after a long break, they required significantly longer to complete it (Mann–Whitney U-test, P<0.05; Fig. 5A,B). Some of this additional time was invested in more scans (Fig. 5D,E), but only significantly so for trials with the PVC target (Mann–Whitney U-test, P<0.05).
The animals also tended to swim slower after the break (median speed before versus after break: 0.89 versus 0.46 m s–1 and 0.93 versus 0.68 m s–1 for Eigil and Freja, respectively, but 0.72 m s–1 for both sampling periods for Sif; supplementary material Fig. S4). None of the animals changed the number of clicks per scan when performing the task with different materials or after a break.
The pen in which the study took place provides multiple reflective surfaces, including the bottom, side structures and the sea surface. Thus animals were required to compare echoes and select between two targets within a complex and dynamic auditory scene. The complexity of this scene is exemplified by representative echograms formed from the sounds recorded on two animals during trials (Fig. 6Ai,Bi). To explore how porpoises select and close in on targets in such a complex scene, we examined: (1) whether and how often porpoises switched gaze between targets, and (2) whether porpoises adjusted gaze direction and range to track the selected target as they approached it.
The animals passed through two geometric zones as they approached the targets. Initially, the animals were far enough away that they could ensonify both targets simultaneously, giving rise to high RLs on both targets at the beginning of trials (e.g. Fig. 6Biii, supplementary material Fig. S6). At shorter ranges, the animals could only ensonify one target at a time with on-axis clicks (Fig. 6Aiii,Biii, supplementary material Fig. S6), but rather than ensonifying one target until selected or eliminated, the study animals scanned the targets sequentially, ensonifying each target in turn for 5–10 clicks. An average of eight scans (i.e. four scans per target) was made during one trial (Fig. 5). After regular clicking, the porpoises switched to rapid buzz clicking as they completed the approach (Figs 4, 6). In this phase, the animals continued to scan their beam over the selected sphere in both horizontal and vertical planes as indicated by the movements of the animals and the RLs at the selected target (supplementary material Movie 1). Both on-animal gyroscope recordings (Fig. 6Aiiii,Biiii) and the video tracks (Fig. 2) showed that at least part of this beam steering was due to whole-body movements, but the animals also moved their heads independently of the axis of their bodies, as seen in characteristic head nodding during buzzes (supplementary material Movie 1) accompanied by fluctuating RLs at the target (Fig. 6Aiii,Biii, supplementary material Movie 1, lower panel).
All study animals adjusted both the levels and rates at which they produced clicks. Combining ICI and on-/off-target status of the clicks recorded on tags, we identified three modes of clicking (Fig. 6, supplementary material Fig. S3A). The ICI distribution of on-target clicks in the target hydrophone recordings (supplementary material Fig. S3B) showed two of these modes: very short ICI buzz clicks (ICI<13 ms) and regular approach clicks (ICI between 13 and 70 ms). The third clicking mode was associated with off-target clicks that had an ICI between ∼70 and 250 ms (Fig. 7, supplementary material Fig. S3A). These clicking modes were also associated with different AOLs as measured by tags on the animals. Buzzes were characterized by low output levels at high clicking rates whereas longer ICIs and higher levels were used when animals scanned targets during the approach or inspected different objects at similar ranges (e.g. Fig. 6Bii,iii, ca. 5–6 s to target contact) as verified using video recordings. The highest levels and longest ICIs were used when animals turned their beam away from targets in between target fixations (Fig. 6Aii,iii,Bii,iii, Fig. 7).
A significant positive relationship between ICI and AOL was found for regular clicks [slope of the least-squares regression between log10(ICI) and AOL was 21.7 dB/log10(ICI), ±0.5 dB 95% confidence bounds, adjusted R2=0.55, F=6370, P<0.0001; Fig. 7]. A similar positive relationship was found between on-axis ASL and ICI, but with a significantly different slope [33.3±4.7 dB/log10(ICI) for ASL versus 19.1±3.5 dB/log10(ICI) for AOL for the same subset of clicks, mean with 95% confidence bounds, adjusted R2=0.45 and 0.32, respectively, P<0.0001 for 10,000 permutations of means of residuals from joint regression; supplementary material Fig. S5], showing that slower clicks were received at proportionally higher levels at the targets than at the tag compared with faster clicks.
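The slopes quoted in dB/log10(ICI) are ordinary least-squares slopes of level against the base-10 logarithm of ICI, i.e. the change in level per tenfold change in ICI. A minimal sketch of that calculation follows, using synthetic noise-free values rather than the recorded clicks; the helper name and the 120 dB offset are assumptions made for illustration.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic clicks that follow a 21.7 dB per decade law exactly.
ici_ms = [15.0, 20.0, 30.0, 45.0, 60.0]
aol_db = [120.0 + 21.7 * math.log10(ici) for ici in ici_ms]  # offset is arbitrary

slope, intercept = fit_line([math.log10(ici) for ici in ici_ms], aol_db)
# `slope` is the level change in dB for a tenfold increase in ICI.
```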
For buzz clicks, the relationship between AOL and ICI appears to be reversed, with the shortest ICI clicks having the highest AOLs [slope of the least-squares regression: –24.7 dB/log10(ICI), ±1 dB 95% confidence bounds, adjusted R2=0.14, F=2438, P<0.0001; Fig. 7], although there are some confounding factors. In particular, animals frequently made head movements to scan the selected target during buzzes, changing the location of the tag with respect to the sound axis and therefore the received level at the tag. A significant positive relationship was observed between ASL and ICI in the on-axis subset of signals recorded at the target hydrophone [for ASL, slope of 29.2±12.6 dB/log10(ICI), mean with 95% confidence bounds, adjusted R2=0.18, F=21, P<0.0001, compared with –18.5±8.8 dB/log10(ICI) for AOL of the same clicks recorded on the tag, adjusted R2=0.15, F=17, P<0.0001; supplementary material Fig. S5].
Level adjustments to target range
The peak RL at each target during scans remained relatively constant as the animals approached (Fig. 6Aiii,Biii), while the echoes received at the tag increased in intensity (Fig. 6Ai,Bi), suggesting that the animals adjusted their output level to compensate for the smaller transmission loss at shorter target range but only in the out-going direction [i.e. a 20 dB/log10(range) compensation scheme]. This compensation is evident in the back-calculated source levels (ASLs) as a function of target range (Fig. 8), but these must be interpreted with care. In Fig. 8A, all recorded clicks are included regardless of whether they were judged to be directed at a target. The solid red line shows the clipping level of the recording system, demonstrating that the level-range characteristic of the strongest clicks is not an artifact of clipping in the system. However, the similar characteristic followed by the lowest-level clicks in the figure is almost certainly a consequence of the signal-to-noise ratio limit for accepting clicks in the analysis. Also, clicks directed at one target exceeded the detection threshold at the second target in some cases, leading to a ghost curve at lower levels that followed the ASL–range relationship of the high-level clicks (Fig. 8A). So while in general the ASL was significantly lower when the porpoises were closer to the targets, the high variance in ASLs precludes any general conclusion. This trend is, however, clearer for on-axis signals (Fig. 8). Pooling data from all animals, the least-squares regression between log10(range) and on-axis ASL has a slope of 23.3 and 20.1 dB/log10(range) for the standard and comparison targets, respectively (standard, adjusted R2=0.83, F=5770, P<0.0001; comparison, adjusted R2=0.77, F=2695, P<0.0001). 
Regressing by individual (Table 2), the coefficients are close to the 20 dB/log10(range) for one-way level compensation, but the relatively large scatter around the general trend may reflect different target-dependent tactics employed by the individual animals. Supporting this, the standard error of the regression coefficients was greater for the data pooled from the four different comparison materials than for the standard-target data (0.9 versus 0.5).
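The link between a constant received level at the target and a 20 dB/log10(range) slope in ASL can be illustrated numerically. The sketch below assumes spherical spreading only and ignores absorption, which may differ from the back-calculation actually used; the function name and the 130 dB received level are hypothetical.

```python
import math


def apparent_source_level(rl_db: float, range_m: float) -> float:
    """Back-calculate the level at 1 m from a level received at range_m.

    Assumes spherical spreading only, TL = 20*log10(range); absorption ignored.
    """
    return rl_db + 20.0 * math.log10(range_m)


# Hypothetical approach in which the level at the target stays at 130 dB.
ranges_m = [8.0, 4.0, 2.0, 1.0]
asl_db = [apparent_source_level(130.0, r) for r in ranges_m]
# Each halving of range lowers the back-calculated source level by about 6 dB.
```

Under these assumptions, holding the level at the target fixed is equivalent to reducing the source level by 20 dB for every tenfold reduction in range.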
Sampling rate adjustment to target range
Fig. 4 shows that, for clicks directed towards a specific target, the ICI was always greater than the two-way travel time (TWTT) to the target, irrespective of whether the animals were producing regular or buzz clicks. We examined the relationship between ICI and TWTT separately for regular and buzz clicks, removing the more variable clicks in the transition zone between these phases. Specifically, the ICI–TWTT relationships were examined for (1) regular clicks made at ranges >1 m, and (2) buzz clicks at the bottom part of the ICI–range relationship (selected by hand) and ranges >0.15 m. Robust linear regression (rlm in R MASS package) lines were fitted for each phase for each trial separately. The slope of the TWTT versus range regression line, and therefore the adjustment needed for ICI to track TWTT in both phases, is 1.3 ms m–1. All three animals over-adjusted ICI during regular clicking (median regression coefficients 2.8, 3.4 and 3 ms m–1 for Sif, Eigil and Freja, P<0.001 for differences between individual regression coefficients and the TWTT–range line, sign test; Fig. 4B). During buzzes, the ICI–range slopes were not significantly different from the TWTT–range slope (median regression coefficients 1.2, 1.3 and 2.6 ms m–1 for Sif, Eigil and Freja, P>0.1, sign test; Fig. 4B).
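The 1.3 ms m–1 reference slope follows directly from the two-way path and the speed of sound. The short sketch below reproduces it, assuming a nominal sound speed of 1500 m s–1 (an assumption, not a value taken from the text); the function names and the example ICI are hypothetical.

```python
SOUND_SPEED_M_PER_S = 1500.0  # assumed nominal value for seawater


def twtt_ms(range_m: float) -> float:
    """Two-way travel time to a target at range_m, in milliseconds."""
    return 2.0 * range_m / SOUND_SPEED_M_PER_S * 1000.0


def lag_ms(ici_ms: float, range_m: float) -> float:
    """Time left between the echo arrival and the next click."""
    return ici_ms - twtt_ms(range_m)


slope_ms_per_m = twtt_ms(1.0)    # about 1.33 ms per meter
example_lag = lag_ms(30.0, 3.0)  # a 30 ms ICI at 3 m leaves about 26 ms
```

Against this reference, the regular-click slopes of roughly 3 ms m–1 mean that ICI shortened more than twice as fast as the TWTT during the approach.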
Switching targets during a buzz
In addition to low output levels and short ICIs, the buzz phase was characterized by ensonification of a single target as the porpoise homed in on it (Fig. 6). In a few trials, animals appeared to change their decision or chose to investigate the other target one last time after the initiation of a buzz. In these cases, ICIs and levels always increased just prior to ensonification of the other target (e.g. Fig. 6B, Fig. 9A), with animals reverting to regular clicking as judged against the 13 ms ICI threshold for buzz clicks. We compared the ICIs used when animals abandoned a buzz with those used at the same target ranges during normal approaches (Fig. 9). ICIs of clicks generated before the onset of a buzz and >1 m from the target were regressed against target range and 95% prediction intervals and residuals were calculated (Fig. 9B). Using the same regression model, residuals of ICIs of the clicks classified as on-axis on the new target after a gaze switch in a buzz were then computed (Fig. 9C) and the two sets of residuals were compared. Twelve examples of gaze change within a buzz were found but only two of these occurred at more than 1 m from the new target and so could be compared against clicks prior to buzzes. Both of these examples were significantly different from the clicks prior to buzzes (Wilcoxon rank-sum test, P<0.0001), but the effect size was small and the ICI–range relationship for gaze changes within buzzes was within the prediction intervals of the ICI–range relationship outside of buzzes (Fig. 9C).
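The comparison against prediction intervals can be sketched as follows. This is a simplified illustration, not the analysis used here: it fits an ordinary least-squares line to made-up pre-buzz clicks, uses a normal quantile in place of Student's t for the 95% band, and omits the rank-sum comparison of residuals. All names and data values are hypothetical.

```python
import math
from statistics import NormalDist


def fit_with_band(x, y):
    """Least-squares line plus an approximate 95% prediction half-width."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))  # residual standard error
    z = NormalDist().inv_cdf(0.975)  # normal quantile instead of Student's t

    def half_width(x_new):
        return z * s * math.sqrt(1.0 + 1.0 / n + (x_new - mx) ** 2 / sxx)

    return slope, intercept, half_width


# Made-up pre-buzz clicks: range (m) and ICI (ms).
range_m = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ici_ms = [22.0, 25.5, 26.0, 29.5, 30.0, 33.0]
slope, intercept, half_width = fit_with_band(range_m, ici_ms)

# Made-up click produced after a gaze switch, 2.2 m from the new target.
new_range, new_ici = 2.2, 26.5
residual = new_ici - (intercept + slope * new_range)
inside_band = abs(residual) <= half_width(new_range)
```

A click whose residual falls inside the band is consistent with the ICI–range relationship of normal approaches, which is the sense in which the gaze-switch clicks were judged to lie within the prediction intervals.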
In this study, we tracked and recorded echolocating harbor porpoises discriminating between, and homing in on, two fixed targets. The targets were arranged so that, through much of the approach, the animals could not ensonify both targets at the same time with their 13–16.5 deg wide sonar beam (Au et al., 1999; Koblitz et al., 2012), forcing them to explore the acoustic scene sequentially to solve the discrimination task. Reflecting walls around the pen created a complex scene (Fig. 6Ai,Bi), approximating what wild harbor porpoises might encounter in their natural surroundings as porpoises are common in shallow waters where they bottom grub or feed on shallow-water fish species (De Pierrepont et al., 2005). Synchronized sound recordings made with tags on the animals and by hydrophones mounted on the targets allowed us to study how animals adjusted their acoustic output to targets at different ranges. In the following we discuss how harbor porpoises control acoustic gaze direction and range while solving a discrimination and interception task in a complex acoustic scene and explore the information integration strategies implied by our observations.
Task performance and behavior
Free-ranging echolocating toothed whales are highly selective foragers, attempting to capture only a small fraction of the organisms they ensonify (Madsen et al., 2005; Jones et al., 2008). Toothed whales may use both temporal and spectral cues to distinguish echoes from preferred prey (Au, 1993; Au et al., 2009). Discrimination performance in the present study was generally high, but dropped when the porpoises were presented with targets giving similar echo spectra (Fig. 3), suggesting that spectral cues in particular were used to distinguish materials.
The way in which an animal invests time inspecting targets in a discrimination task says something about both the complexity of the task and the strategy adopted. In the present study, animals scanned their gaze from one target to the other (Figs 2, 4, 6), allocating only a few clicks (average of eight clicks per scan) to each target (Fig. 5). Relatively few scans per target (average of four) were made per trial, with animals effectively solving the discrimination task with an average of 32 echoes per target. After entering the buzz phase the animals focused on the selected target, only rarely (in 12 out of 94 cases) changing their decision and switching to the other target. The low and relatively stable counts of clicks per scan and scans per trial suggest that few echoes were needed to acquire and compare salient acoustic features of the targets. This could be taken as indicating that the task was simple, involving easily discriminated targets, but the up to 40% error rate on one target combination contradicts this notion. Rather the porpoises likely employed short multiple scans in order to compare echoes from the two targets when they were at similar ranges and fairly constant aspects. This would enable, for example, comparison of the TS of the two targets.
Both dolphins (Au et al., 1982) and harbor porpoises (Teilmann et al., 2002; Kastelein et al., 2008) have been shown to increase the number of clicks, and thus sampling effort, in detection and discrimination experiments when the task becomes more difficult. We did not observe a correlation between sampling effort and target material (Fig. 5) but our sample size may have been too low to detect this. Nevertheless, after a long break two of the animals (Freja and Eigil) required more time to complete the task (Fig. 5A–C). The additional time was invested in swimming slower rather than in more scans (Fig. 5D–F, supplementary material Fig. S4). A third animal, Sif, was allocated less training time and had a shorter break between the two data collection series. The increase in difficulty after the break was therefore likely not as great for Sif. Thus, although Sif required more time to complete the task than the other porpoises, her performance was more stable over time (Fig. 5C, supplementary material Fig. S4). This porpoise frequently made curving approaches to the targets, covering greater distances and likely contributing to the longer trial durations.
Scanning behavior was first reported for echolocating toothed whales by Schevill and Lawrence (Schevill and Lawrence, 1956), who observed that buzzes of a bottlenose dolphin capturing fish in murky water were coupled with horizontal head movements. Scanning was also reported by Norris and colleagues (Norris et al., 1961) for a blindfolded dolphin closing in on fish rewards. In that study, head movements became marked when the dolphin was within ∼1 m from the fish. The authors interpreted these results as indicative of the animal overcoming the limitations of a directional beam, i.e. sequential scanning allowed the dolphin to explore a larger volume while searching for prey.
Our porpoises scanned across the targets while closing in on them, as revealed both by hydrophone recordings on the targets (Figs 5, 6) and by video of head movements (Fig. 2, supplementary material Movie 1). However, the function of this scanning changed in different stages of the approach. When clicking regularly, study animals scanned sequentially between the two targets using short (five to 10 clicks) fixations (Figs 2, 5, 6). During the buzz (i.e. at short ranges) they switched to a continuous horizontal (Fig. 6 in combination with Fig. 2, supplementary material Movie 1) and vertical (supplementary material Movie 1) scanning over the selected target. Thus, scanning was used for (1) sequential sampling for target recognition and localization during approach, and (2) precise tracking of a single target during the buzz. This second function seems to be the one employed by the dolphin in the Norris et al. (Norris et al., 1961) study. We suggest that animals in that study and the present study may be exploiting, rather than compensating for (Norris et al., 1961), the directional properties of their sonar beam to facilitate fine-scale localization using the beam gradient (Yovel et al., 2010).
In addition to directional changes in gaze, the porpoises adjusted the rate and intensity of echolocation clicks throughout each trial, effectively controlling their depth of gaze.
Sampling rate adjustment during approach
ICIs were adjusted continuously, roughly in accordance with the distance to the target ensonified (Fig. 4). The porpoises always maintained their ICI longer than the two-way travel time to any strong reflector (Fig. 4, Fig. 6Ai,Bi), and so consistently avoided range ambiguity of echo returns. This behavior has been noted in other toothed whales (e.g. Johnson et al., 2008) and in bats (e.g. Hartley, 1992), and appears to reflect a constraint in the way that echoes are processed cognitively. Stationed delphinids echolocating on targets at variable ranges use ICIs that are longer than the TWTT by what appears to be a stable lag time of 15–20 ms (Penner, 1988). This delay between the TWTT and the ICI is assumed to be needed for cognitive processing of echoes (Au, 1993). However, for free-moving porpoises in the present experiment, the lag times varied widely from 10 to 144 ms and tended to decrease during the approach (Fig. 4). This shows that the common assumption of a fixed, albeit possibly task-dependent (Verfuss et al., 2005), lag time, widely used in studies of free-ranging odontocetes to infer maximum target range (Akamatsu et al., 2005; Akamatsu et al., 2007), is too simplistic a model to describe sonar performance, especially in complex acoustic environments where echo arrivals other than the assumed target may control the ICI.
The transition from target selection, in which the porpoise is scanning between targets, to continuous ensonification of a single object at the onset of the buzz is marked by a significant shortening of both the ICI and the lag time (Fig. 4). Contrary to what was found in the selection phase, the ICI in the buzz appeared to follow the TWTT to the target with a fairly constant lag of 2–4 ms (Fig. 4B). However, given the proximity of the target during buzzes, the TWTT is almost an order of magnitude less than the lag time and so the constant lag may simply be a consequence of fairly invariant ICIs in the buzz (e.g. compare Fig. 4A and 4B). Another possibility is that the ICI tracks the later-arriving surface bounce of the target echo during buzzes, which has an almost constant TWTT. This would ensure that both the target echo and the consistently strong surface bounce arrive before the following click is issued. The consistent geometry of the animal and the target in the study makes it difficult to choose among these possibilities, but free-ranging Blainville’s beaked whales echolocating in deep water (i.e. free of surface bounces) have been found to adjust ICI during buzzes to track target range (Johnson et al., 2008), suggesting that this might be occurring in the present study also. In any case, the variable lag times during regular clicking and the relatively constant lag times and ICIs during buzzes suggest that a different type of echo processing may be occurring during these two phases. The ICIs are consistently longer than the TWTT (whether to the target or a surface bounce) during both phases, enabling, in principle, processing of each click–echo pair individually. Lag times of 20–50 ms during regular clicking may be compatible with such individual echo processing, but the 2–4 ms lags during buzzes are at least an order of magnitude too short for cognitive processing and motor responses (Ridgway, 2011), making this kind of processing unlikely.
Output adjustment in buzzes
Short ICIs during buzzes provide high temporal resolution but increase the possibility of echo ambiguity. The animals seemed to avoid this by lowering their source levels (Fig. 4, Fig. 6B, Fig. 8). A low source level limits the distance over which echoes will be detected, in effect reducing the range of the sonar. An important consequence of this is that the acoustic scene during buzzes is significantly simpler than during regular clicking, with echoes from non-focal targets, e.g. the water surface, lowering in level or becoming undetectable (Fig. 6). When regular clicking is resumed (Fig. 6Bi), these echo streams become distinct again. A similar result has been found with free-swimming beaked whales foraging in dense layers of organisms such as the deep scattering layer (Johnson et al., 2006), and has been interpreted as a strategy to cope with cluttered acoustic scenes. Because buzzes are initiated at relatively short ranges for which the transmission loss is small, the animal can use low source levels and still obtain sufficient echo returns from the target, while reducing the risk of range ambiguity from more distant targets. Thus, during buzzes, porpoises narrow their depth of gaze dramatically by controlling both source level and ICI to focus on a single target.
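The effect of source level on sonar range can be illustrated with the standard active sonar equation, echo level = SL − 2·TL + TS. The sketch below assumes spherical spreading with no absorption and a fixed detection threshold; the target strength, threshold and source levels are hypothetical values chosen only to show the scaling.

```python
import math


def echo_level_db(sl_db, ts_db, range_m):
    """Echo level at the animal: EL = SL - 2*TL + TS, TL = 20*log10(range)."""
    return sl_db - 40.0 * math.log10(range_m) + ts_db


def max_detection_range_m(sl_db, ts_db, threshold_db):
    """Range at which the echo level falls to the detection threshold."""
    return 10.0 ** ((sl_db + ts_db - threshold_db) / 40.0)


# Hypothetical values chosen only to show the scaling.
TS_DB, THRESHOLD_DB = -30.0, 50.0
r_regular = max_detection_range_m(160.0, TS_DB, THRESHOLD_DB)  # high-level click
r_buzz = max_detection_range_m(140.0, TS_DB, THRESHOLD_DB)     # click 20 dB weaker
```

Under these assumptions a 20 dB reduction in source level shrinks the maximum detection range by a factor of about 3.2 (the square root of 10), so distant reflectors drop out of the scene while a nearby target still returns a usable echo.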
The narrower depth of gaze, unless accompanied by large changes in swim speed (which were not observed here; supplementary material Fig. S4), brings the animals closer to a reactive mode of sensorimotor operation in which the sensory and motor volumes are closely matched (Snyder et al., 2007). In comparison, during target selection and approach, porpoises most likely operate in a deliberative mode, in which the ratio of their sensory volume to motor volume is large (Snyder et al., 2007). This affords them a greater range of movement options before reaching the target at the expense of a complex auditory scene.
Switching targets during a buzz
A consequence of gaze focusing during buzzes is that, if animals change their selection, they must re-open their gaze by increasing source level and ICI to ensonify another target (Fig. 6Bii, Fig. 9). Some 12 examples of gaze re-direction during a buzz were obtained during trials. In each case, the distance to the new target at the moment of attention switch corresponded roughly to the range at which porpoises would usually produce a buzz (Fig. 4, Fig. 9C), so in principle the gaze change could have been accommodated by lengthening the ICI slightly without interrupting the buzz. However, the animals abandoned buzzes to produce a sequence of regular clicks before focusing on the new target, suggesting that porpoises may be unable to switch targets within a buzz (Fig. 9). This implies that switching of attention during a buzz is not a simple procedure and requires an intervening reacquisition of the local acoustic scene to correctly identify the new target among the clutter. The ICIs produced during gaze switches were close to the values that would be predicted from the ICI versus range relationship (Fig. 4A) obtained during regular clicking, i.e. when animals changed their focus, they immediately chose the approximately correct ICI for the distance to the new target (Fig. 9C). This indicates that porpoises already knew the approximate range to the new object of interest. Target locations remained fixed within a session, providing the opportunity for animals to learn their relative positions. Thus, the close ICI adjustment to target range when porpoises shift gaze during buzzes implies a precise vocal-motor control linked to spatial perception and memory akin to visual accommodation (Aivar et al., 2005).
Between target fixations, the animals used long ICIs (70–250 ms) coupled with relatively high output levels (Figs 6, 7). These long ICIs (Fig. 7, supplementary material Fig. S3) are close to the highest values reported for wild harbor porpoises [200 ms (Akamatsu et al., 2007; Villadsgaard et al., 2007)] and roughly correspond to maximum inspection ranges of between 15 and 75 m (Fig. 4). Verfuss et al. (Verfuss et al., 2009) proposed that, in the absence of landmarks, echolocating toothed whales searching for targets sample a stable perceptual range. In complex scenes, this search range may be dictated more by range ambiguity from strong reflectors such as the sea surface or bottom than by the expected range of targets. Under this interpretation, the high ICI values recorded here reflect gaze opening for reacquisition of the local acoustic scene, which includes numerous reflectors at ranges of 10 or more meters formed by pool access ways and support beams.
Level adjustments to target range
The ICI adjustments made by porpoises as they approached targets were accompanied by consistent output level adjustments (Figs 6, 7, supplementary material Fig. S5). The ASL followed a 20log10(range) relationship closely (Fig. 8B), effectively providing a one-way compensation for the increasing echo level with decreasing target range. This compensation matches well with previous findings for dolphins (Au and Benoit-Bird, 2003; Jensen et al., 2009) and porpoises (Beedholm and Miller, 2007; Atem et al., 2009) and results in a constant ensonification level at the target. Several potential explanations have been advanced to account for this range-dependent level adjustment. One possibility is that a constant ensonification at the target may decrease the possibility that acoustically sensitive prey detect the approaching predator (Verfuss et al., 2009), an explanation that has limited applicability to porpoises given the high center frequency of their calls in combination with low source levels overall (Wilson et al., 2011). Another possibility is that weaker clicks are produced passively by a pneumatic sound source as the ICI decreases due to a shorter time available between clicks to re-pressurize the system for the following click (Madsen et al., 2005; Beedholm and Miller, 2007). In either case, the 20log10(range) adjustment will result in an increasing echo level [i.e. following a –20log10(range) law] returning to the animal as it approaches a target. The existence of a complementary gain control on the receiving side of the biosonar system has been proposed for echolocating bats (Henson, 1965) and toothed whales (Supin et al., 2004; Nachtigall and Supin, 2008; Supin et al., 2008; Linnenschmidt et al., 2012b) to stabilize the perceived echo levels. This in turn could facilitate the detection of target-induced echo level variations, e.g. due to changes in aspect or movements of the target.
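The consequence of one-way compensation for the returning echo, and the role of a complementary receive-side gain, can be shown with a few lines of arithmetic. The sketch assumes spherical spreading in both directions without absorption; the output level at 1 m, the target strength and the form of the receive gain are hypothetical.

```python
import math

SL_AT_1M_DB = 140.0  # arbitrary output level at 1 m range
TS_DB = -30.0        # arbitrary target strength


def source_level_db(range_m):
    """One-way compensation: output rises by 20*log10(range)."""
    return SL_AT_1M_DB + 20.0 * math.log10(range_m)


def echo_level_db(range_m):
    """Echo level back at the animal, spherical spreading both ways."""
    return source_level_db(range_m) - 40.0 * math.log10(range_m) + TS_DB


def perceived_level_db(range_m):
    """Echo level after a hypothetical receive gain of 20*log10(range)."""
    return echo_level_db(range_m) + 20.0 * math.log10(range_m)


ranges_m = [8.0, 4.0, 2.0, 1.0]
echo_db = [echo_level_db(r) for r in ranges_m]            # rises ~6 dB per halving
perceived_db = [perceived_level_db(r) for r in ranges_m]  # stays at 110 dB
```

With the transmit side removing half of the 40log10(range) two-way loss, the echo still grows by about 6 dB for each halving of range; a receive sensitivity that falls by the same amount would hold the perceived echo level constant.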
One feature of our source level measurements that is not well explained by the constant ensonification or pneumatic capacitor hypotheses is the increase in AOLs at very short ICIs near the end of buzzes (Fig. 7). The apparent increase in the level received on the acoustic tag attached just behind the sound source (Fig. 7) is accompanied by a drop in the RL at the target (supplementary material Fig. S5). This could conceivably result from a decrease in the directionality of the source: with a wider beam, more energy will reach the tag, but, for a constant power output, the on-axis levels will drop as the power is distributed over a larger solid angle (Urick, 1983). A highly directional beam allows longer detection ranges and lower clutter, but may be a handicap at short ranges, producing a small ensonified volume from which prey can readily escape. Therefore, a dynamic system capable of adjusting beamwidth seems to provide adaptive value. Beamwidth adjustments have been demonstrated in echolocating bats (Jakobsen and Surlykke, 2010) and bottlenose dolphins (Moore et al., 2008), but it is not yet clear whether dolphins can implement these changes in a systematic manner to accommodate changing target ranges. To change beamwidth, an animal must change its effective radiating aperture size relative to the wavelength of the signal (Au, 1993). This can be achieved by physical conformation (i.e. by changing the size of the sound projecting structures) (Norris, 1968; Aroyan et al., 1992) or by changing the center frequency of the signal. We have not found evidence of spectral changes in the porpoise signals, suggesting that any beamwidth adjustments are made by conformational changes in the soft structures of the sound-producing nasal complex.
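The dependence of beamwidth on aperture size relative to wavelength can be illustrated with an idealized circular piston in a rigid baffle. This textbook model is only a rough stand-in for the porpoise sound-producing structures; the 130 kHz center frequency, the aperture diameters and the sound speed are assumptions made for illustration.

```python
import math

X_3DB = 1.616  # argument at which the piston response 2*J1(x)/x falls by 3 dB


def piston_beamwidth_deg(diameter_m, freq_hz, sound_speed=1500.0):
    """Full -3 dB beamwidth of an ideal circular piston in a rigid baffle."""
    wavelength = sound_speed / freq_hz
    sin_half = X_3DB * wavelength / (math.pi * diameter_m)
    return 2.0 * math.degrees(math.asin(sin_half))


# Hypothetical apertures at an assumed 130 kHz center frequency.
narrow = piston_beamwidth_deg(0.06, 130e3)  # larger aperture, narrower beam
wide = piston_beamwidth_deg(0.04, 130e3)    # smaller aperture, wider beam
```

In this model, reducing the effective aperture from 6 to 4 cm at a fixed frequency widens the beam from roughly 11 to 17 deg, which shows how a conformational change alone could alter directionality without any shift in the signal spectrum.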
Like bats, echolocating porpoises need to negotiate a complex auditory scene when selecting and approaching targets, requiring a set of behavioral strategies to make the most of their sensory capabilities. When discriminating between objects, porpoises direct their acoustic gaze at the targets sequentially, employing short target fixations. We propose that few echoes are required to build up a perceptual representation of an object, but target selection is implemented by performing many of these comparisons. In contrast, during buzzes, porpoises focus their attention entirely on a single target and repeatedly scan across it. Scanning therefore seems to have two functions, namely, target recognition and cursory localization during the approach phase and precise target tracking in the terminal phase.
Porpoises couple directional control of their sonar beams with range-dependent adaptation of output levels, click rates and possibly beam directionality. The combination of these adjustments provides fine control over information flow, matched to different biosonar objectives. During target selection, high source levels and long click intervals support target localization and classification but give rise to more complex, cognitively challenging auditory scenes. During buzzes, low source levels and rapid clicking restrict the depth of gaze to the target of interest, simplifying the task of tracking and interception. This indicates that when entering the terminal, or buzz, phase, echolocating porpoises switch from a deliberative mode of sensorimotor operation to a reactive mode. Critically, when porpoises switch their attention to another target during a buzz, they change to the click parameters used for localization and set their depth of gaze accurately for the new target location. Thus, these shallow-water echolocators exert significant dynamic control over their signals to accommodate sequential examination of multiple targets, revealing a more sophisticated echolocation behavior than assumed by the conventional model of a single target and a constant processing lag time. We therefore conclude that echolocating porpoises, like visually dominant animals, use acute gaze adjustments to manage the sensory load from complex and noisy scenes.
LIST OF ABBREVIATIONS
- Apparent output level (AOL)
Received level at the tag.
- Apparent source level (ASL)
The sound pressure level 1 m from the source, back-calculated from a signal recorded at a known distance but unknown angle from the acoustic axis; here based only on the received levels at the targets.
- Buzz clicks
Clicks with inter-click intervals <13 ms.
- Inter-click interval (ICI)
The time between each click and the preceding one.
- Off-target clicks
Clicks with low received levels of energy flux density (below 6 dB of peak of scan) at both targets.
- Off-target regular clicks
Regular signals that do not classify as on-target clicks (distinction specific for tag recordings).
- On-axis clicks
The highest-amplitude clicks in each scan.
- On-target clicks
Signals with received levels of energy flux density within 6 dB of peaks of scans as recorded on the target hydrophones.
- On-target regular clicks
Regular signals fulfilling the on-target criterion in the target hydrophone recordings (distinction specific for tag recordings).
- Regular clicks
Clicks with inter-click intervals of 13 ms or longer.
- Scan
Click sequences presumed to result from the acoustic beam of the animal passing across the target.
We are very grateful to S. F. Hansen, J. H. Kristensen, J. D. Hansen, C. Eriksson, M. Czekała, K. Ozolina, C. I. Botelho de Oliveira and the staff at Fjord&Bælt for help with data collection. We thank N. U. Kristiansen, J. S. Jensen and the staff at the workshops at the Department of Bioscience, Aarhus University for assisting with the construction of the recording equipment and setup, and T. Hurst at Woods Hole Oceanographic Institution for providing the DTAG-3. We also thank A. Fais, J. M. Pérez, K. Clausen, M. Chudzinska, R. Mundry and F. Havmand Jensen for their help with data analyses and valuable inputs, and A. Surlykke for helpful comments on an early draft of the manuscript.
This work was funded by a PhD scholarship to D.M.W. from the Oticon Foundation, Denmark (http://www.oticonfonden.dk/) and via frame grants from the Danish National Research Foundation (http://www.dg.dk/) to P.T.M. and M.W. M.J. was supported by the Marine Alliance for Science and Technology, Scotland (MASTS; a research pooling initiative). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.