ABSTRACT
Echolocating toothed whales face the problem that high sound speeds in water mean that echoes from closely spaced targets will arrive at time delays within their reported auditory integration time of some 264 µs. Here, we test the hypothesis that echolocating harbour porpoises cannot resolve and discriminate targets within a clutter interference zone given by their integration time. To do this, we trained two harbour porpoises (Phocoena phocoena) to actively approach and choose between two spherical targets at four inter-target distances (13.5, 27, 54 and 108 cm) in a two-alternative forced-choice task. The free-swimming, blindfolded porpoises were tagged with a sound and movement tag (DTAG4) to record their echoic scene and acoustic outputs. The known ranges between targets and the porpoise, combined with the sound levels received on target-mounted hydrophones, revealed how the porpoises controlled their acoustic gaze. When targets were close together, the discrimination task was more difficult because of smaller echo time delays and lower echo level ratios between the targets. Under these conditions, buzzes were longer and started from farther away, source levels were reduced at short ranges, and the porpoises clicked faster, scanned across the targets more, and delayed making their discrimination decision until closer to the target. We conclude that harbour porpoises can resolve and discriminate closely spaced targets, suggesting a clutter rejection zone much shorter than their auditory integration time, and that such clutter rejection is greatly aided by spatial filtering with their directional biosonar beam.
INTRODUCTION
Echolocating animals estimate range to a target via the two-way travel time (TWTT) between emission of a biosonar pulse and return of the target echo (Hartridge, 1945; Cahlander et al., 1964; Simmons, 1973), calling for acute auditory time resolution and short integration times (Moore et al., 1984). Within this framework of converting TWTT to spatial target representation along a range axis are the processes of ranging (e.g. Penner, 1988; Thomas and Turl, 1990), jitter detection (e.g. Simmons, 1979; Moss and Schnitzler, 1989; Finneran et al., 2020) and resolving the target echo of interest from a possible multitude of clutter echoes (e.g. Sümer et al., 2009; Brinkløv et al., 2010; Warnecke et al., 2014). Owing to high sound speeds in air and even higher sound speeds in water, echolocating animals must resolve closely timed echoes to effectively forage with echolocation near acoustic clutter (Madsen and Surlykke, 2013). Ultimately, there is a lower echo delay limit, where echolocators face difficulty in resolving target echoes from clutter echoes, and this forms the clutter interference zone (Simmons et al., 1988, 1989).
A recent psychophysical study on a species of leaf-nosed bat (Phyllostomus discolor) showed that echoes of similar levels from closely spaced targets cannot be resolved when the time delays are on par with, or shorter than, the likely auditory integration time of 2 ms reported for active bat biosonar (in big brown bats, Eptesicus fuscus; Surlykke and Bojesen, 1996), forming a clutter interference zone of 34 cm on the same range axis as the target of interest (Wagenhäuser et al., 2020). This problem is exacerbated for toothed whales that echolocate in a medium with a sound speed of ∼1500 m s−1, which is ∼4.5 times faster than for echolocators in air. Perhaps to remedy that problem, or to employ an integration time in keeping with their much shorter echolocation signals, the auditory integration time of 264 µs for bottlenose dolphins, and likely other toothed whales, is about an order of magnitude shorter than for frequency modulated (FM) bats (Vel'min and Dubrovsky, 1975; Vel'min, 1976; Moore et al., 1984; Au et al., 1988). If this integration time, as implied in the bat studies, is a measure of the clutter interference zone (Schnitzler and Kalko, 2001), it follows that toothed whales cannot resolve targets with echo delays shorter than 264 µs, corresponding to a target-clutter spacing of ∼20 cm on the same range axis. The 264 µs integration time is estimated by testing the detection thresholds for click pairs with varying delays; when the delays get short enough, the detection threshold is lowered compared with single clicks of the same amplitude, and the delay at which the threshold starts to decrease defines the integration time (Au et al., 1988).
Another interpretation is that it is the time window beyond which gap detection or pulse-pair experiments indicate separate signals; here, two echoes within the integration time become part of the same auditory percept, presumably precluding the resolution and discrimination of two targets (Branstetter et al., 2020). Accordingly, we hypothesize that the odontocete auditory integration time of 264 µs marks the delay limit between targets of interest and clutter targets, below which echolocation performance deteriorates.
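The spacing figures quoted above follow from halving the distance sound travels during one integration time, because the echo path is two-way. The short Python sketch below illustrates that arithmetic only; the function name and the nominal sound speed in air (343 m s−1) are assumptions introduced here, not values taken from the cited studies.

```python
# Minimal sketch: target spacing along the range axis whose two-way echo
# delay equals the auditory integration time.

C_AIR = 343.0     # m/s, assumed nominal sound speed in air
C_WATER = 1500.0  # m/s, approximate sound speed in water quoted in the text


def clutter_zone_m(integration_time_s, sound_speed_m_s):
    """Return the range spacing (m) that produces an echo delay equal to
    the integration time; the factor of 2 accounts for two-way travel."""
    return sound_speed_m_s * integration_time_s / 2.0


bat_zone = clutter_zone_m(2e-3, C_AIR)          # 2 ms integration time
whale_zone = clutter_zone_m(264e-6, C_WATER)    # 264 µs integration time

print(f"FM bat: {bat_zone * 100:.1f} cm; toothed whale: {whale_zone * 100:.1f} cm")
```

Under these assumptions the sketch gives about 34 cm in air and about 20 cm in water, in line with the values stated in the text.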
The ability to discriminate between ensonified targets is dependent not just on temporal resolution, but also spatial and spectral resolutions in their biosonar system (Schmidt, 1992; Au, 1993; Au et al., 2009; Branstetter et al., 2020). In the psychophysical study by Wagenhäuser et al. (2020) on leaf-nosed bats, it was observed that when echo level differences between different auditory streams were very high (>50 dB), the bats could cope with time delays much shorter than the apparent auditory integration time and still resolve the targets. The flight paths of free-flying bats in cluttered environments suggest that the echoic interpretation of a target is enhanced by echo level variations that would arise from a variable azimuth and/or elevation of the targets relative to beam centre (Moss et al., 2011; Falk et al., 2014; Taub and Yovel, 2020), therefore hinting at the use of their directional beam as part of a spatial filter in an echolocation task (Moss and Surlykke, 2010; Linnenschmidt and Wiegrebe, 2016). The highly directional biosonar beam of toothed whales [with directivity indices (DIs) of ∼24–32 dB] is much narrower than that of bats (with DIs of ∼10–16 dB) (Madsen and Surlykke, 2013; Jakobsen et al., 2013; Jensen et al., 2018), and thus would yield greater differences in the returning echo levels for the same target spacing and ranges.
Here, we conducted a clutter interference experiment on echolocating toothed whales to psychoacoustically investigate the effects of clutter arising from a distracting nearby object. Specifically, we tested the hypothesis that echolocating harbour porpoises cannot resolve and discriminate two targets when they are closer than a clutter interference zone defined by their assumed auditory integration time. To do that, we presented free-swimming, tagged porpoises with a two-alternative forced-choice target discrimination task using targets at four different inter-target spacings, offering discrimination tasks of varying difficulty, owing to the increased clutter from the distracting target at close range. We predicted that when targets are more closely spaced, the auditory stream segregation task would be more difficult and this would be reflected in the porpoises' echolocation performance or effort to complete the task. We further predicted that, despite the presumed advantage of a highly directional beam, successful discrimination between targets would break down as the difference in echo time delays nears the auditory integration time.
List of symbols and abbreviations
- ASL: apparent source level (back-calculated from RL on the off-axis target)
- ΔEL: difference in echo level from the two targets (dB)
- FOV: field of view
- ICI: inter-click interval
- ITD: inter-target distance
- pp: peak to peak
- RL: received level
- SL: source level (back-calculated from RL on the on-axis target)
- ΔT_{at target}: time delay between a single click as it arrives on 2 targets
- ΔT: time delay between echoes arriving at the porpoise (equivalent to 2·ΔT_{at target})
- TL: transmission loss (dB)
- TS: target strength
- TWTT: two-way travel time
MATERIALS AND METHODS
Experimental procedure
The study was carried out on captive harbour porpoises (Phocoena phocoena Linnaeus 1758) at Fjord & Bælt, Kerteminde, Denmark. Two porpoises participated in the experiments: Freja and Sif, both female, and at the facility since they were ∼1–2 years old in April 1997 and July 2004, respectively (Lockyer et al., 2003; Wisniewska et al., 2015). The porpoises were housed in a ∼30×10×3 m outdoor netted enclosure in Kerteminde Harbour.
Echolocation clicks were recorded as the porpoises closed in on targets while performing a two-alternative forced-choice task (Schusterman, 1980). The recording setup included hydrophones on the targets and high-resolution movement and sound recording tags on the porpoises. The task involved a discrimination between two simultaneously presented spherical targets (5.08 cm diameter; Fig. 1) of different material (aluminium or stainless steel), with similar target strengths of −39 and −37 dB (Wisniewska et al., 2012). A spherical target was chosen (rather than a cylindrical target, for example), because the target strength of a sphere is independent of aspect. Each porpoise was trained to always target the aluminium sphere, indicating its selection by touching it with the tip of its rostrum, and wore a blindfold (opaque, silicone eyecups) to exclude visual cues from informing discrimination decisions. Both animals had extensive experience with wearing a tag and eyecups in previous psychoacoustic experiments (e.g. Verfuß et al., 2009; DeRuiter et al., 2009; Linnenschmidt et al., 2012; Wisniewska et al., 2012). The target discrimination abilities of the study porpoises have been previously shown (e.g. Wisniewska et al., 2012); the purpose of including a secondary target in this experiment was to ensure multiple targets would be within the beam swathe, thereby introducing an acoustic distractor or clutter by means of an additional, simultaneous echo stream.
Fig. 1. Experimental set-up. (A) A representative trial, where blindfolded, tagged porpoises use echolocation to discriminate between aluminium and steel targets suspended at varying inter-target distances in randomly varying orientation orders. Target-mounted hydrophones record clicks that are digitized in the recording hut. (B) Schematic demonstrating an echolocation strategy that would maximize the angular offset between targets (thus maximizing returning ΔEL at the echolocator) versus one which maximizes the time delays (ΔT) of returning echoes. For all on-axis clicks, the angle to the off-axis target was calculated. The time delay between each on-axis click being received on each target was used to obtain the relative position of the porpoise to the targets (thus accounting for non-straight swim paths), which in turn was used to calculate the ΔT of the echoes as received at the porpoise. (C) The angle to the distracting target for all on-axis clicks (n=906) across all trials (n=120) as a function of range to the on-axis target for all four inter-target distances, demonstrating the diminishing upper limit of angular resolution that existed as inter-target distance decreased. Dotted lines show theoretical maximum angles for each range, and points to the right of these lines signify errors in range measurement. (D) Demonstration of a porpoise repeatedly scanning its biosonar across two targets in a discrimination task (photo courtesy of Magnus Wahlberg; Movie 1).
The porpoises were free-swimming during the echolocation task to avoid obscuring any variability and/or richness in biosonar behaviours, as is likely the case for experimental designs involving stationary animals (Moore et al., 2008). Additionally, the free-swimming set-up provides information on how the animal uses echolocation in tasks that are both dynamic and more closely resemble those encountered in the wild (Houser et al., 2005). No rolling behaviour was observed during approaches, and so all quantifications concern the horizontal beam pattern, for which no asymmetry was accounted.
For each trial, targets were presented at one of four different inter-target distances (target centres were spaced 108, 54, 27 or 13.5 cm apart; Fig. 1A). Targets hung on microfilament lines from an out-of-water metal frame, and were lowered into the water to a depth of 1 m at the start of each trial (as in Wisniewska et al., 2012; Fig. 1A,B). During one trial, an individual porpoise was instructed to perform the discrimination task (Fig. 1D), whereby the trainer sent the porpoise to the other side of the ∼8×13 m experimental pool to the targets. Upon targeting the aluminium sphere, the behaviour was bridged with a whistle to indicate a correct response, and the porpoise then returned to the starting station for fish reinforcement. No bridge or fish reward was given for the incorrect response of targeting the steel sphere. The frame suspending the targets was pulled up so that the targets were out of the water after each trial. The distances between the targets varied from trial to trial. For each session, a Gellermann pseudo-random schedule (Gellermann, 1933) randomised both the distance between targets and the order in which targets were presented (left/right) to avoid ‘focal expectancy’ (sensu Vandenberghe et al., 2001). After training, a total of 120 data collection trials occurred over 3 days in July 2017. Trials for each porpoise were run in sessions with a maximum of 12 trials per session, and 2 sessions per porpoise, per day.
The porpoises were free to modify their swim paths to alter both the spatial and temporal separation of the targets, but the extent to which this was achievable was limited by the inter-target distance (Fig. 1B). To maximize differences in the time delays of the returning echoes, the porpoise was required to approach from the side, and to maximize the angular offset to the distracting target, the porpoise had to conduct a direct approach perpendicular to the axis defined by the line connecting both targets (Fig. 1B). The bearing offset between the on-axis target and the distracting target is shown for all on-axis clicks (Fig. 1C), demonstrating the maximal angular separation of targets that was obtained with each inter-target spacing. Porpoises had to be closer to the targets to obtain greater angular separation of the two targets, and closer still for smaller inter-target distances (Fig. 1C). Additionally, at close inter-target distances, the porpoise needed to be closer to the targets to obtain greater differences in echo levels (ΔEL) reflecting off the two targets; at large ranges, range to each target was more similar and the angular offset between targets was small. Note that for the smallest inter-target distance value of 13.5 cm, differences in echo time delays between the targets were never greater than the estimated auditory integration time of ∼264 µs (Vel'min and Dubrovsky, 1975; Vel'min, 1976; Moore et al., 1984; Au et al., 1988), no matter how close the porpoise got to the target of interest.
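The two limiting approach geometries described above can be illustrated with simple trigonometry. The Python sketch below is a simplified illustration under stated assumptions, not the authors' analysis: the side-on case places both targets in line with the porpoise (giving the largest possible echo delay, 2·ITD/c), and the perpendicular case assumes the on-axis target is dead ahead with the line between targets at right angles to the beam axis. The function names and the 1500 m s−1 sound speed are choices made for this example.

```python
import math

C_WATER = 1500.0              # m/s, approximate sound speed in water
INTEGRATION_TIME_S = 264e-6   # reported auditory integration time


def max_echo_delay_s(itd_m, c=C_WATER):
    """Largest echo delay at the porpoise for a given inter-target distance:
    side-on approach with both targets in line, two-way path (2 * ITD / c)."""
    return 2.0 * itd_m / c


def perpendicular_offset_deg(itd_m, range_m):
    """Bearing offset to the distracting target when the on-axis target is
    dead ahead and the target line is perpendicular to the beam axis
    (a simplifying assumption for this sketch)."""
    return math.degrees(math.atan2(itd_m, range_m))


for itd in (0.135, 0.27, 0.54, 1.08):
    delay_us = max_echo_delay_s(itd) * 1e6
    within = max_echo_delay_s(itd) < INTEGRATION_TIME_S
    print(f"ITD {itd:5.3f} m: max delay {delay_us:6.0f} µs, "
          f"always within integration time: {within}, "
          f"offset at 1 m range: {perpendicular_offset_deg(itd, 1.0):4.1f} deg")
```

With these assumptions, only the 13.5 cm spacing keeps the echo delay below 264 µs for every possible approach, which matches the note in the text.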
Echolocation clicks received at the targets were recorded by custom-built cylindrical hydrophones (flat frequency response ±2 dB between 100 and 160 kHz) mounted 3 cm above the centre of each sphere (Fig. 1A,D). These hydrophones were calibrated against a TC-4034 hydrophone (Teledyne Reson, Slangerup, Denmark) by using simulated porpoise clicks, and were found to have a sensitivity of −211 dB re. 1 V µPa−1. Both hydrophones were connected to a custom-built amplifier box with +40 dB of gain, where an anti-aliasing filter (180 kHz, 4-pole, low-pass) and a pre-whitening high-pass filter (1 kHz, 1-pole) were applied. In the recording hut (Fig. 1A), signals on each target were digitized with a multifunction acquisition device (National Instruments USB-6251, Austin, TX, USA), sampling at 500 kHz per channel, with 16 bit resolution, and saved as wav files with a custom-built LabView program (National Instruments).
Echolocation clicks and returning echoes were also recorded by an on-animal sound and movement tag (DTAG-4; Johnson and Tyack, 2003; Johnson et al., 2009; www.soundtags.org) non-invasively attached via suction cups behind the sound generating nasal complex and immediately posterior to the blowhole. The multi-sensor digital recording tag continuously sampled audio data on a single hydrophone at 576 kHz (flat frequency response ±2 dB from 0.4 to 150 kHz). The combined recordings of echolocation – both on the animal and on the targets – allowed for insights into sensory focus (Fig. 2). The time delays between click emission and echo reception allowed the tag to both provide range-to-target information and to record the echoic scene as experienced by the porpoise (Fig. 2E). Thus, the complete acoustic circuit could be observed by recording the acoustic information available to the porpoise. While the tag also recorded data from its pressure sensor, tri-axial accelerometers and tri-axial gyroscopes, its placement behind the blowhole prohibited the measurement of any movement signatures arising from head-scanning because movements of the head and thorax are decoupled by flexible cervical vertebrae.
Fig. 2. Example target approaches for easier and more difficult discrimination tasks. The easier example (left panels) refers to a trial with targets furthest apart (1.08 m), and the difficult example (right panels) to a trial with the targets closest together (0.135 m). (A–H) Time series relative to target interception, with on-axis clicks on either target highlighted with triangles (right-target) and diamonds (left-target). Shapes are filled if they passed all on-axis criteria (see text). (A) On-animal recording. (B) Right-target audio recording. (C) Left-target audio recording. (D) Range to chosen target (m) in black, and inter-click interval (ICI, ms) in red. (E) Echogram created from the on-animal recording, offering a visualization of the challenge of separating the echo streams. (F) Time delays (ΔT) and (G) differences in echo level (ΔEL) of the echoes from both targets as received at the porpoise's location. (H) Angle to the off-axis target for all on-axis clicks. Note that the y-axis scales vary for F–H between the two examples.
At the start of each trial, a short high-frequency sweep signal (from 180 to 210 kHz), above the hearing range of harbour porpoises (Kastelein et al., 2002), was projected into the water to time-synchronise the tag data with the target-hydrophone data. The sweeps were generated by the sound-recording multifunction device, tightly synchronized to the onset of recording of the on-target hydrophone signals, which were driven by the same timer. Trials were additionally monitored underwater with a GoPro Hero 2 video camera (GoPro Inc, San Mateo, CA, USA) mounted 2.5 m behind the target frame.
Ethics statement
The porpoises are maintained by Fjord & Bælt, Kerteminde, Denmark, under permits no. SN 343/FY-0014 and 19963446-0021 from the Danish Nature Agency under the Ministry of Environment and Food of Denmark.
Data analysis
Data processing and acoustic analysis were conducted in MATLAB (version 8.5, The MathWorks, Natick, MA, USA). The hydrophone and tag recordings were time-aligned for each trial, using the synchronization sweeps, followed by manual confirmation using the inter-click intervals (ICIs, defined as the time between each click and the previous one) unique to each trial.
Each porpoise echolocation click was identified using a supervised click detector run on both the filtered acoustic data on the tag and on the target-hydrophone recordings (90–180 kHz 4-pole Butterworth band-pass filter). Received levels (RLs) on the targets were quantified as the clip level of the recording system [(171 dB re. 1 μPa)+20·log10(peak-to-peak amplitude)]. Relative peaks in the RLs of consecutive clicks, as recorded by the target-mounted hydrophones, were manually identified as candidate on-axis clicks as the porpoise scanned across a given target (n=2688; Fig. 2A–C; Madsen et al., 2004; Madsen and Wahlberg, 2007; Jensen et al., 2009).
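The received-level formula above can be sketched in a few lines of Python. This is a hedged illustration: it assumes the peak-to-peak amplitude is expressed relative to the full scale of the recording (so that an amplitude of 1 corresponds to the clip level), and the local-peak helper is a hypothetical automatic stand-in for the manual identification of candidate on-axis clicks described in the text.

```python
import math

CLIP_LEVEL_DB = 171.0  # dB re. 1 µPa, clip level of the recording system (from the text)


def received_level_db(pp_amplitude, clip_level_db=CLIP_LEVEL_DB):
    """RL = clip level + 20*log10(peak-to-peak amplitude).
    Assumes the amplitude is normalised to full scale."""
    if pp_amplitude <= 0:
        raise ValueError("peak-to-peak amplitude must be positive")
    return clip_level_db + 20.0 * math.log10(pp_amplitude)


def local_rl_peaks(rls_db):
    """Indices of clicks whose RL exceeds both neighbours. The authors picked
    candidates manually; this automatic rule is only an illustration."""
    return [i for i in range(1, len(rls_db) - 1)
            if rls_db[i] > rls_db[i - 1] and rls_db[i] > rls_db[i + 1]]


# Hypothetical normalised amplitudes for six consecutive clicks in one scan
amplitudes = [0.02, 0.05, 0.03, 0.04, 0.10, 0.06]
levels = [received_level_db(a) for a in amplitudes]
print([round(level, 1) for level in levels], local_rl_peaks(levels))
```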
The distance between the porpoise and the on-axis target was measured using the time delays between on-axis click emission and echo reception. ‘Echograms’, akin to echosounder images from an echolocator's perspective, were created from the tag data (Johnson et al., 2004; 2009; Johnson, 2014), and the echo streams corresponding to the two targets were used to confirm the range of the porpoise to the target that was being scanned (Fig. 2E). For all candidate on-axis clicks on either target, the time delay of the click as received on both target-mounted hydrophones, ΔT_{at targets}, was measured via cross-correlation of triple up-sampled waveforms, with the duration of the search window constrained by the maximum inter-target distance [whereby the search window was click time on the on-axis target ± (inter-target distance/c)·f]. The range of the porpoise to the off-axis target was calculated from the time delay measurement at the targets (whereby range to off-axis target = c·ΔT_{at targets} + range to on-axis target). Owing to low SNR clicks, and/or the multi-pulsed nature of porpoise clicks, spatial aliasing errors arose from incorrect cross-correlations (Gillespie and Macaulay, 2019), manifesting as ranges to off-axis targets that resulted in impossible triangles. Therefore, clicks were removed if the ΔT_{at targets} measurement yielded an impossible triangle, or when low SNRs of the cross-correlated clicks led to a signal that was not obvious, reducing the dataset to n=2000. ΔT_{at targets} was multiplied by 2 to give ΔT_{at porpoise}, and hereafter ‘ΔT’ refers to the time delay at the porpoise location.
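The ranging relationships in this paragraph can be sketched as follows. This Python example assumes a sound speed of 1500 m s−1 and a sign convention in which the delay at the targets is positive when the click reaches the off-axis target later; the function names are hypothetical, and the triangle test is one plausible reading of the "impossible triangle" criterion (the triangle inequality on the two ranges and the inter-target distance).

```python
C_WATER = 1500.0  # m/s, assumed sound speed


def range_from_twtt(twtt_s, c=C_WATER):
    """Range to the on-axis target from the two-way travel time."""
    return c * twtt_s / 2.0


def off_axis_range(range_on_m, dt_at_targets_s, c=C_WATER):
    """Range to off-axis target = c * delay at targets + range to on-axis target.
    The delay is positive when the off-axis target is farther away."""
    return c * dt_at_targets_s + range_on_m


def is_possible_triangle(range_on_m, range_off_m, itd_m):
    """Triangle inequality for porpoise, on-axis target and off-axis target."""
    return abs(range_on_m - range_off_m) <= itd_m <= range_on_m + range_off_m


def echo_delay_at_porpoise(dt_at_targets_s):
    """Echo delay at the porpoise is twice the one-way delay at the targets."""
    return 2.0 * dt_at_targets_s


# Hypothetical click: 2 ms TWTT, 100 µs delay between targets spaced 27 cm apart
r_on = range_from_twtt(2e-3)
r_off = off_axis_range(r_on, 100e-6)
print(r_on, r_off, is_possible_triangle(r_on, r_off, 0.27),
      echo_delay_at_porpoise(100e-6))
```

A delay of 300 µs with the same 27 cm spacing would imply a 45 cm range difference, which fails the triangle test and would be discarded under this reading.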
Given the known distance between the targets, the measured range to the on-axis target, and the calculated range to the off-axis target, the bearing to the off-axis target could be calculated for all on-axis clicks (Fig. 1B). In this way, non-straight swim-paths were accounted for, and porpoise approach tracks were extractable. If the signature of the click as received on the off-axis target was unclear, no time delay (and therefore no localisation point) could be reliably calculated. On-axis click candidates were excluded from further analyses if the time delay arising from the cross-correlation resulted in a manually identified erroneous porpoise localisation (reducing the dataset from 2000 on-axis click candidates to 1810 on-axis click candidates). 2D approach tracks for each trial were created via cubic interpolation between the remaining on-axis click candidates.
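With all three sides of the triangle known, the bearing to the off-axis target follows from the law of cosines. The sketch below shows one way to compute it in Python; it assumes the beam axis points directly at the on-axis target, and the function name is introduced here for illustration only.

```python
import math


def bearing_to_off_axis_deg(range_on_m, range_off_m, itd_m):
    """Angle at the porpoise between the on-axis and off-axis targets,
    from the law of cosines. Assumes the three lengths form a valid
    triangle and that the beam axis points at the on-axis target."""
    cos_theta = (range_on_m ** 2 + range_off_m ** 2 - itd_m ** 2) / (
        2.0 * range_on_m * range_off_m)
    # Clamp to guard against rounding just outside [-1, 1]
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical geometry: 1.5 m to the on-axis target, 1.65 m to the
# off-axis target, targets 27 cm apart
print(round(bearing_to_off_axis_deg(1.5, 1.65, 0.27), 2))
```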
The RL of the same click recorded on both target-mounted hydrophones, along with the known target strengths (TS) of the two targets and the ranges to them from the porpoise location, were used to calculate the difference in echo level (ΔEL) for returning target echoes as received at the porpoise location. Source levels (SLs) of on-axis click candidates, defined as the sound level of this click referenced to 1 m ahead of the animal and along its beam axis, were calculated. Additionally, apparent source levels (ASLs) of the same clicks as received on the off-axis target, defined as the sound pressure back-calculated to 1 m ahead of the animal with an aspect angle that is not 0 deg relative to the centre of their sonar beams, were back-calculated. EL, SL and ASL measurements all assumed spherical spreading [20·log10(R)], and accounted for frequency-dependent transmission loss (TL) due to absorption (0.04 dB m−1 at 130 kHz).
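The level calculations described above can be sketched as below. This Python example assumes one-way transmission loss of 20·log10(R) plus 0.04 dB m−1 absorption, as stated in the text; the sign convention for ΔEL (on-axis minus off-axis echo level) and all function names are assumptions made for this illustration, and the input values in the usage lines are hypothetical.

```python
import math

ABSORPTION_DB_PER_M = 0.04  # absorption at 130 kHz, as quoted in the text


def transmission_loss_db(range_m, alpha=ABSORPTION_DB_PER_M):
    """One-way TL: spherical spreading plus absorption."""
    return 20.0 * math.log10(range_m) + alpha * range_m


def back_calculated_level_db(rl_db, range_m):
    """SL (from the on-axis target) or ASL (from the off-axis target):
    the received level referred back to 1 m from the animal."""
    return rl_db + transmission_loss_db(range_m)


def echo_level_db(rl_at_target_db, ts_db, range_m):
    """Echo level back at the porpoise: level at the target, plus target
    strength, minus the return-path transmission loss."""
    return rl_at_target_db + ts_db - transmission_loss_db(range_m)


def delta_el_db(rl_on, ts_on, r_on, rl_off, ts_off, r_off):
    """Difference in echo level, on-axis minus off-axis (assumed sign)."""
    return echo_level_db(rl_on, ts_on, r_on) - echo_level_db(rl_off, ts_off, r_off)


# Hypothetical click: 150 dB on the on-axis target at 1.5 m,
# 138 dB on the off-axis target at 1.65 m
print(round(back_calculated_level_db(150.0, 1.5), 1),
      round(delta_el_db(150.0, -39.0, 1.5, 138.0, -37.0, 1.65), 1))
```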
To confirm whether on-axis click candidates were truly part of scans across a target – as opposed to being from a scan where the beam was pointed near to, but did not scan across the target – increasing and decreasing patterns in the ASL (back-calculated from each target) of the three clicks preceding and three clicks following each on-axis click candidate were examined (noting that as ranges could only be measured for on-axis clicks, interpolated ranges for the porpoise to each target were used for the preceding and following clicks). So, for example, a click was considered truly on-axis if the ASL signature on the on-axis target increased prior to and decreased after the on-axis click, and if the ASL signature on the off-axis target either increased or decreased in the clicks prior to and following the on-axis click. A total of 906 clicks passed these ‘true-scan’ criteria and were deemed as being recorded truly on-axis.
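One possible reading of the true-scan criterion is sketched below in Python. It assumes a window of seven ASL values (three clicks before, the candidate, three clicks after), strict rises and falls, and that "either increased or decreased" on the off-axis target means a monotonic trend across the whole window. These interpretations, and the function names, are assumptions of this sketch rather than details confirmed by the text.

```python
def _increasing(values):
    """True if every value is strictly greater than the previous one."""
    return all(a < b for a, b in zip(values, values[1:]))


def _decreasing(values):
    """True if every value is strictly smaller than the previous one."""
    return all(a > b for a, b in zip(values, values[1:]))


def passes_true_scan(asl_on, asl_off, n_side=3):
    """asl_on / asl_off: ASL (dB) back-calculated from the on-axis and
    off-axis targets for n_side clicks before, the candidate click, and
    n_side clicks after (2*n_side + 1 values each)."""
    expected = 2 * n_side + 1
    if len(asl_on) != expected or len(asl_off) != expected:
        raise ValueError(f"expected {expected} ASL values per target")
    # On-axis target: ASL rises to the candidate click, then falls
    on_ok = _increasing(asl_on[:n_side + 1]) and _decreasing(asl_on[n_side:])
    # Off-axis target: the beam sweeps steadily toward or away from it
    off_ok = _increasing(asl_off) or _decreasing(asl_off)
    return on_ok and off_ok


# Hypothetical scan sweeping from the off-axis target across the on-axis one
print(passes_true_scan([150, 155, 160, 165, 161, 156, 151],
                       [158, 156, 153, 150, 146, 143, 140]))
```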
Several variables were measured as proxies to assess porpoise biosonar performance in scenes of varying acoustic complexity. For each manually identified on-axis click (n=1810), we measured: (i) the time delay of target echoes at the porpoise location (ΔT, μs), (ii) the difference in echo levels from each target (ΔEL, dB), and (iii) inter-click interval (ICI, ms). For each truly on-axis click, whereby scans across the target were confirmed (n=906), we also measured the SL (dB re. 1 μPa pp) and the bearing to the off-axis target (deg). Note that the larger dataset (n=1810) could be used for ΔT and ΔEL because these values are unaffected by the true-scan criterion. However, to be conservative, only the smaller dataset (n=906) was used for reporting of SL and bearing, as measurements of both were only reliable if they passed the true-scan criterion.
For each trial, several variables were measured to assess task difficulty (sensu Kastelein et al., 2008) and acoustic gaze adjustments (here defined as the spatial extent of echoic information as controlled by the beam pattern, sampling rate, and output energy, as in Wisniewska et al., 2012). These variables were: (i) trial duration (in seconds, from the start of a trial to target interception); (ii) total buzz duration [in seconds, with buzzing defined by inter-click intervals (ICI) <13 ms; Wisniewska et al., 2012]; (iii) range to the on-axis target at buzz onset (in metres); (iv) the number of scans across each target, indicating the number of times the porpoise switched focus between targets (sensu Wisniewska et al., 2012); and (v) the range to the targets at the discrimination decision (in metres). When and at what range the porpoise last focused its biosonar beam on the non-chosen target was taken as a proxy for the target discrimination decision. Additionally, we noted whether this ‘last glance’ occurred before or after the initial buzz onset, and whether it occurred during a buzz.
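The buzz-related measures can be sketched from a list of click times. In this Python example, total buzz duration is taken as the sum of all ICIs shorter than 13 ms and buzz onset as the time of the first click with such an ICI; both are assumptions about how the measures could be computed, and the function names and click times are hypothetical.

```python
BUZZ_ICI_S = 0.013  # buzz threshold: ICI < 13 ms (from the text)


def inter_click_intervals(click_times_s):
    """ICI of each click relative to the previous one (first click has none)."""
    return [b - a for a, b in zip(click_times_s, click_times_s[1:])]


def total_buzz_duration_s(click_times_s, threshold_s=BUZZ_ICI_S):
    """Sum of all ICIs below the buzz threshold (one possible definition)."""
    return sum(ici for ici in inter_click_intervals(click_times_s)
               if ici < threshold_s)


def buzz_onset_time_s(click_times_s, threshold_s=BUZZ_ICI_S):
    """Time of the first click whose ICI is below the threshold, or None."""
    for time_s, ici in zip(click_times_s[1:], inter_click_intervals(click_times_s)):
        if ici < threshold_s:
            return time_s
    return None


# Hypothetical click train: regular clicking, a short buzz, then a pause
clicks = [0.0, 0.05, 0.10, 0.105, 0.110, 0.115, 0.2]
print(total_buzz_duration_s(clicks), buzz_onset_time_s(clicks))
```

Combining the onset time with the interpolated range track would then give the range to the on-axis target at buzz onset.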
Statistical analysis
The statistical analysis was implemented in R software (version 3.6.1; https://www.r-project.org/). To quantify how porpoises modified their echolocation behaviour according to the complexity of the acoustic scene, we used inter-target distance (a proxy for acoustic clutter) as the main explanatory variable, and nine response variables (trial duration, number of scans, buzz duration, range from targets at buzz onset, range-to-targets at the discrimination decision, ICI, the time delay of target echoes, difference in echo levels from each target, and the SL of true on-axis clicks). To estimate these associations, we used generalized linear mixed-models (glmer in the lme4 package, version 1.1-21; https://cran.r-project.org/web/packages/lme4) to account for the dependent nature of data coming from the same animal, as well as the data coming from the same day and session: all models included animal ID, date of the trial, and session as random intercepts. Additionally, all models included a random slope for inter-target distance related to animal ID. Inter-target distance was included as a categorical variable with four categories (13.5, 27, 54 and 108 cm), and hence we additionally performed a Cuzick's test (Cuzick, 1985) to assess whether there was an increasing or decreasing trend for each outcome following the ordered distance categories. When investigating the association between the SL of true on-axis clicks and inter-target distance, we adjusted the relationship by the effect of range-to-target using an asymptotic function, and included an interaction term to account for potentially different relationships between inter-target distance and SL depending on range-to-target (Fig. S1). While ICI is known to decrease as porpoises get closer to a given target, there was no difference in the distributions of ranges to target with different inter-target distances (Fig. S2), and hence, it was not necessary to adjust for the potential confounding effect of range-to-target. 
A Gaussian family function was used for most response variables, where the assumptions of normality and homoscedasticity of residuals were checked. A Poisson (link=log) family function was fitted when the response variable represented counts, such as number of scans. Results are reported by an estimate, α, in the units of each parameter along with 95% confidence intervals (CI) in square brackets, and a P-value or Ptrend when using Cuzick's test.
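For readers unfamiliar with Cuzick's test, the following Python sketch shows the core of the trend statistic using only the standard library. It is an illustration, not the implementation used in the study: group scores default to 1, 2, 3, ... in the order supplied, tied observations receive mid-ranks, the variance is not corrected for ties, and the P-value uses the normal approximation. All function names and the data are hypothetical.

```python
import math


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def cuzick_trend(groups, scores=None):
    """groups: list of samples, ordered by category (e.g. increasing
    inter-target distance). Returns (z, two-sided P). No tie correction."""
    if scores is None:
        scores = list(range(1, len(groups) + 1))
    pooled = [x for group in groups for x in group]
    ranks = midranks(pooled)
    n_total = len(pooled)
    t_stat, start = 0.0, 0
    for score, group in zip(scores, groups):
        t_stat += score * sum(ranks[start:start + len(group)])
        start += len(group)
    l_sum = sum(s * len(g) for s, g in zip(scores, groups))
    l2_sum = sum(s * s * len(g) for s, g in zip(scores, groups))
    expected = (n_total + 1) * l_sum / 2.0
    variance = (n_total + 1) / 12.0 * (n_total * l2_sum - l_sum ** 2)
    z = (t_stat - expected) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Hypothetical response values for three ordered categories
z_value, p_value = cuzick_trend([[1, 2], [3, 4], [5, 6]])
print(round(z_value, 3), round(p_value, 4))
```

A positive z indicates that the response tends to increase across the ordered categories, and a negative z indicates a decreasing trend.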
RESULTS
Both porpoises had high success rates (95.0% for Freja and 93.3% for Sif) in correctly identifying the aluminium target irrespective of spacing to the alternative target (Fig. 3A). While errors by Freja only occurred in trials where inter-target distance was 108 cm, errors by Sif were not related to inter-target distance. Most of the other target discrimination performance-related variables were associated with inter-target distance, after adjusting for the random effects of porpoise ID, session and date (Figs 3, 4 and 7).
Fig. 3. Experiment-wide target discrimination performance for the two porpoises as a function of inter-target distance. Columns are separated by porpoise (left, Freja, n=60; right, Sif, n=60), for a total of n=120. (A) Success rates of correctly targeting the aluminium sphere, indicating correct (green) and incorrect (orange) selection, with success rate overlaid. (B) There was no significant trend in trial duration (s) as a function of inter-target distance. (C) Total number of scans across both targets per trial increased with decreasing inter-target distance. Distributions of the raw data are shown as violin plots, while the black dot and whiskers represent the model estimates and 95% confidence intervals, respectively.
Experiment-wide target discrimination performance for the two porpoises as a function of inter-target distance concerning their echolocation click rate. (A) Inter-click interval (ICI) for non-buzz clicks (ICI≥13 ms) decreased with decreasing inter-target distance. (B) Total buzz duration (s) increased with decreasing inter-target distance. (C) Range to on-axis target at the onset of the buzz (ICI<13 ms) was greater when the targets were more closely spaced. Distributions of the raw data (n=60 for Freja, and n=60 for Sif) are shown as violin plots, while the black dot and whiskers represent the model estimates and 95% confidence intervals, respectively.
Although trial duration did not change significantly in relation to inter-target distance (Ptrend=0.510; Fig. 3B), the total number of scans on both targets per trial increased with decreasing inter-target distance (α13.5 cm=18 scans [16.6, 20.4], α27 cm=15 [13.4, 17.8], α54 cm=15 [13.7, 16.8], α108 cm=14 [12.8, 15.9], Ptrend<0.001; Fig. 3C). Each scan comprised ∼5–10 clicks across a target (Fig. 2B,C). Similarly, we observed that both buzz duration and range to the on-axis target at the onset of the buzz were associated with inter-target distance (Ptrend,duration<0.001; Ptrend,range<0.001, Fig. 4B,C). Shorter inter-target distances were associated with longer total buzz durations that started farther away from the target (buzz duration: α13.5 cm=2.4 s [2.05, 2.83], α27 cm=2.2 [1.48, 2.89], α54 cm=2.0 [1.23, 2.77], α108 cm=1.7 [1.22, 2.21]; range at buzz start: α13.5 cm=0.7 m [0.64, 0.75], α27 cm=0.6 [0.55, 0.66], α54 cm=0.5 [0.42, 0.53], α108 cm=0.5 [0.45, 0.55]; Fig. 4B,C). Additionally, the porpoises made their discrimination decision closer to the targets when the targets were more closely spaced (α13.5 cm=0.5 m [0.34, 0.59], α27 cm=0.6 [0.29, 0.83], α54 cm=0.8 [0.32, 1.33], α108 cm=1.2 [0.92, 1.53], Ptrend<0.001; Fig. 7A). Porpoises more often made their discrimination decision before the onset of the buzz when targets were far apart, and after buzz initiation when targets were closely spaced. When targets were closely spaced, discrimination decisions were often made during the buzz (Fig. 10), and there was evidence of maintaining the buzz phase while scanning across and between the two targets (as seen in Fig. 2E).
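The buzz metrics reported above rest on an ICI threshold of 13 ms (Fig. 4). As a minimal sketch of how such metrics could be derived from a click train, the Python snippet below flags buzz clicks and returns the range at buzz onset and the total buzz duration. The function name, the convention of attributing each ICI to the click that ends it, and the summing of sub-threshold ICIs are all assumptions for illustration, not the authors' processing code.

```python
BUZZ_ICI_S = 0.013  # buzz criterion from the figure captions: ICI < 13 ms


def buzz_metrics(click_times_s, ranges_m):
    """Return (range at buzz onset in m, total buzz duration in s).

    click_times_s: click emission times in ascending order (s).
    ranges_m: range to the on-axis target at each click (m).
    Returns (None, 0.0) when no ICI falls below the buzz threshold.
    """
    # Inter-click interval preceding each click from the second onwards.
    icis = [t1 - t0 for t0, t1 in zip(click_times_s, click_times_s[1:])]
    # Index of every click whose preceding ICI meets the buzz criterion.
    buzz_clicks = [k + 1 for k, ici in enumerate(icis) if ici < BUZZ_ICI_S]
    if not buzz_clicks:
        return None, 0.0
    onset_range = ranges_m[buzz_clicks[0]]
    # Assumed definition: total buzz duration is the sum of buzz ICIs.
    total_duration = sum(ici for ici in icis if ici < BUZZ_ICI_S)
    return onset_range, total_duration


# Hypothetical approach: regular clicks followed by a short buzz.
times = [0.0, 0.035, 0.070, 0.080, 0.085, 0.090]
ranges = [2.0, 1.5, 1.0, 0.7, 0.6, 0.5]
onset, duration = buzz_metrics(times, ranges)
```

In this hypothetical train, the first sub-threshold interval ends at the fourth click, so the buzz onset range is taken from that click.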
The challenge of separating echoes from closely spaced targets is demonstrated (Figs 2 and 6). Visual analogues of the received echo streams from targets show that they were more distinct from one another when targets were spaced farther apart (Fig. 2E). When the targets were closely spaced, the challenge of segregating overlapping auditory streams is also demonstrated with clicks of overlapping amplitudes on the two target-mounted hydrophones (Fig. 2A–C), smaller time delays (ΔT; Fig. 2F), smaller comparative echo strengths (ΔEL; Fig. 2G), and smaller bearing offsets between the targets (Fig. 2H). Fairly direct and comparable swim path approaches to the targets across inter-target distance treatments are observed (Fig. 5).
Bird's-eye view of porpoise approach tracks for all four inter-target distances. Black dots show the locations of the left target (0,0) and right target (inter-target distance, 0). Inter-target distance of (A) 13.5 cm (n=31), (B) 27 cm (n=27), (C) 54 cm (n=28) and (D) 108 cm (n=34). Tracks were created by connecting localized points of on-axis clicks for each trial (n=120). The sending station was at (1,−8).
Modelling results showed that the differences in both the echo levels (ΔEL) and time delays (ΔT) of the returning echoes decreased as inter-target distance decreased (ΔEL: α13.5 cm=7.3 dB [4.54, 10.12], α27 cm=12.4 [7.96, 16.86], α54 cm=18.6 [13.67, 23.59], α108 cm=26.7 [25.19, 28.24], Ptrend<0.001; ΔT: α13.5 cm=44.4 µs [3.01, 85.72], α27 cm=130.1 [86.78, 173.42], α54 cm=265.8 [219.67, 309.91], α108 cm=398.4 [210.35, 586.43], Ptrend<0.001; Fig. 7B,C). While both the maximal ΔT and the bearing to the distracting target relative to the beam axis (Fig. 1C) have an upper bound that is constrained by inter-target distance, these values depended on the porpoise's position relative to the two targets (Fig. 1B). Fig. 6 shows the variability in ΔT for all on-axis clicks (n=1810) and across all inter-target distances. ΔT could theoretically reduce to 0 s in any inter-target distance treatment if the porpoise positioned itself so that the range to both targets was identical. As the separation between targets decreased, the porpoise was constrained in making its discrimination decision with information of reduced contrast, specifically, when the ΔEL was lower (Fig. 2G), when ΔT was smaller (Fig. 2F, Fig. 6) and when the bearing to the distracting target relative to the biosonar beam axis was smaller (Fig. 2H, Fig. 9).
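The geometric bound on ΔT described above can be made concrete with a short Python sketch. It assumes a nominal sound speed of 1500 m/s (not stated in this section), treats the porpoise and both targets as points in the horizontal plane using the coordinate convention of Fig. 5, and uses hypothetical function names.

```python
import math

SOUND_SPEED_M_S = 1500.0  # assumed nominal sound speed in seawater


def echo_delay_us(porpoise_xy, target_a_xy, target_b_xy, c=SOUND_SPEED_M_S):
    """Two-way echo delay difference (µs) between two targets.

    Each echo travels out and back, so the delay difference is twice the
    difference in range to the two targets divided by the sound speed.
    """
    range_a = math.dist(porpoise_xy, target_a_xy)
    range_b = math.dist(porpoise_xy, target_b_xy)
    return 2.0 * abs(range_a - range_b) / c * 1e6


def max_echo_delay_us(inter_target_distance_m, c=SOUND_SPEED_M_S):
    """Upper bound on the delay difference (µs).

    The range difference can never exceed the inter-target distance, and
    reaches it only when the porpoise is in line with both targets.
    """
    return 2.0 * inter_target_distance_m / c * 1e6


# Left target at (0, 0), right target at (d, 0), as in Fig. 5.
d = 0.135
side_on = echo_delay_us((-2.0, 0.0), (0.0, 0.0), (d, 0.0))
head_on = echo_delay_us((d / 2, -2.0), (0.0, 0.0), (d, 0.0))
bounds = {dist: max_echo_delay_us(dist) for dist in (0.135, 0.27, 0.54, 1.08)}
```

Under these assumptions the bound for the 13.5 cm treatment is 180 µs, reached only for a side-on position in line with both targets, while a position equidistant from the two targets gives a delay of essentially zero whatever the inter-target distance.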
Temporal delay differences between echoes returning at the porpoise's position as a function of range to the on-axis target for all on-axis clicks (n=1810). Shapes and colours denote inter-target distance treatments. The red dotted line shows the nominal auditory integration time of 264 µs. Histograms of the echo time delays (ΔT) for each inter-target distance distribution are shown on the right (25 µs bins). Maximal possible time delays based on target spacing geometry are shown with dashed black lines. Note that clicks did not have to fulfil true-scan criteria in order to be included here, as time delay information is insensitive to exclusions brought about by the true-scan criterion.
Cues at the discrimination decision for the two porpoises as a function of inter-target distance. (A) Range at the discrimination decision (m) for each trial (n=120) decreased with decreasing inter-target distance. (B) The time delay between echoes (ΔT, µs) and (C) the echo level ratio (ΔEL, dB) both increased with increasing inter-target distance for all on-axis clicks (n=1810). Distributions of the raw data are shown as violin plots, while the black dot and whiskers represent the model estimates and 95% confidence intervals, respectively.
Closely spaced targets gave rise to echoes from both targets that returned at temporal delays that were within the nominal auditory integration time of 264 µs (Figs 6 and 10). In the smallest inter-target distance treatment, the set-up geometry constrained the ΔT of returning echoes so that they could never exceed the estimated odontocete auditory integration time of 264 µs (Fig. 6). Despite this, target discrimination decisions were made when time delays of the echoes were below the auditory integration time (Fig. 10). For the inter-target distances of 13.5, 27, 54 and 108 cm, respectively, target discrimination decisions were made at a median ΔT of 52, 158, 233 and 238 µs (10th percentiles of 6.9, 33.7, 52.1, 38.3 µs; 90th percentiles of 104, 234, 479, 802 µs) and at a median ΔEL of 6, 11, 20 and 27 dB (10th percentiles of 0.9, 2.2, 3.8, 11.7 dB; 90th percentiles of 15, 24, 33, 40 dB). While we found ΔEL values with a median of 6 dB for the shortest target spacing, the ΔEL differences could be as small as ∼2 dB and yet the porpoises could still successfully discriminate between the targets (Fig. 10). There was no pattern in either the time delays or the echo levels at which any of the seven incorrect target discriminations occurred (Fig. 10).
Inter-click intervals (ICI) of non-buzz on-axis clicks were associated with inter-target distance: ICI decreased when targets were closer together, though no differences were observed between the two closest inter-target distances (α13.5 cm=32.8 ms [30.66, 34.92], α27 cm=32.3 [28.50, 36.02], α54 cm=33.2 [28.63, 37.78], α108 cm=37.4 [33.51, 41.29]; Ptrend<0.001; Fig. 4A). After adjusting for the asymptotic function of range-to-target, the SLs of true on-axis clicks were also associated with inter-target distance (Fig. 8). Although the porpoises presented different average SLs (Sif produced clicks 5 dB higher on average), SLs were lower when targets were closer together (α13.5 cm=142 dB re. 1 µPa pp, α27 cm=149, α54 cm=155, α108 cm=161; Ptrend<0.001; Fig. 8). However, the interaction term was also statistically significant (P<0.001), and while the asymptote lies, in all four treatments, at ∼166 dB re. 1 µPa pp (Fig. 8A,B), the SL at the closest ranges to the target depended on inter-target distance (Fig. 8C). Specifically, at closer target ranges, porpoise clicks were weaker when targets were closer together, but the SL was the same between different inter-target distances when porpoises were at ranges >2–3 m from the target (Fig. 8C).
Source level (SL) as a function of range to target. SL for (A) Freja and Sif, (B) four inter-target distances and (C) in logarithmically spaced bins for four inter-target distances. SLs are shown as peak-to-peak values (dB re. 1 µPa pp) for all true on-axis clicks (n=906) and adjusted for range to target (m). The relationship between SL and range to target is approximated by an asymptotic function in A and B, where the red dashed line represents the asymptote at 166 dB re. 1 µPa pp. The black dashed line represents the overall function estimate. The number of points contributing to each box in C is shown.
The differences between SL and ASL as a function of bearing of the biosonar beam to the off-axis target mostly clustered along previously measured harbour porpoise beam profiles (Macaulay et al., 2020) (Fig. 9). A pattern consistent with production of clicks with wider beamwidths at closer ranges to the target was observed across all inter-target distance treatments, and these broader beamwidths corresponded with buzz clicks (Fig. 9). A pattern of broader beamwidth clicks accompanying small inter-target distances is apparent, but there were more on-axis clicks recorded at close range when inter-target distances were small (Fig. 9), linked to more scans across the targets when inter-target distances were small (Fig. 3C). Outliers (e.g. in Fig. 9D) where the bearing offset to the off-axis target is large and the difference between SL and ASL is low are thought to arise from errors in range estimates (as highlighted in Fig. 1C).
Bearing from the biosonar beam to the off-axis target for all on-axis clicks (n=906) in relation to porpoise biosonar beam pattern. The difference in back-calculated source levels for on-axis (SL) and off-axis (ASL) targets (dB rel. to level at 0 deg), as calculated from the RLs on both target-mounted hydrophones, the known target strengths (TS), and the measured range to each target, is shown as a function of horizontal angle to the distracting target (n=120). Subplots show varying inter-target distances: (A) 13.5 cm, (B) 27 cm, (C) 54 cm and (D) 108 cm. The average horizontal beam pattern of the same porpoises (from Macaulay et al., 2020) is overlaid, as is double this beam pattern. Point shape denotes whether the discrimination decision was made during a buzz click (triangle, ICI<13 ms) or during a regular echolocation click (circle, ICI≥13 ms).
DISCUSSION
In this study, we investigated the echolocation abilities of porpoises as they completed an active target discrimination task with varying target spacing. We hypothesized that the auditory streams of simultaneously presented targets could not be resolved and discriminated from one another when the echoes arrived within the reported auditory integration time of 264 µs and hence within the clutter interference zone (Simmons et al., 1988, 1989). We reject our hypothesis by showing that echolocating porpoises can resolve a target from a distractor when echoes arrive well below this alleged critical interval. We propose that, for toothed whales, the clutter interference zone is shorter than the suggested integration time of 264 µs, and below we discuss both the implications of such time resolution and how the directional biosonar beam helps resolve closely spaced auditory streams via spatial filtering.
Performance and acoustic behaviour
The close proximity of auditory streams generated by closely spaced targets was predicted to present the porpoises with a challenging echolocation task, and this was anticipated to be reflected both in their echolocation performance and effort. However, the high success rate (Fig. 3A) of correctly targeting the aluminium sphere was in agreement with previously reported success rates of target discrimination carried out by Freja and Sif for targets 1 m apart (of 94% and 89%, respectively; Wisniewska et al., 2012). Thus, rather than discrimination performance deteriorating with more intense distractors or more closely spaced distractors, as was the case for bats (Wagenhäuser et al., 2020), we find discrimination performance to be acute in echolocating porpoises subjected to distractors in very small spatial and temporal separation from the target of interest (Fig. 10).
Timing and relative level of target echoes as received at the porpoise during all discrimination decision clicks. (A) Freja (n=60) and (B) Sif (n=60). Both ΔEL and ΔT are plotted on a log scale. Shapes and colours denote inter-target distance treatments. The red dotted line shows the reported auditory integration time of 264 µs, and highlights that many discrimination decision clicks occurred at temporal resolutions beneath this threshold. Incorrect target discriminations are denoted with an overlaid black ‘x’.
A previous echolocation performance study reported that trial duration increased with increasing acoustic complexity and therefore harder discrimination tasks (Wisniewska et al., 2012), but no significant effect was observed in the present study (Fig. 3B). The porpoises scanned more across each target when the targets were closely spaced (Fig. 3C), and although we predicted that this would lead to a longer trial duration, it was likely offset by the porpoise having to spend more time moving its head back and forth to scan across widely spaced targets when at close target range (as seen in Fig. 5D).
When confronted with a more acoustically challenging discrimination task (i.e. targets closely spaced), the buzz onset occurred at a further range (Fig. 4C) and the porpoises buzzed for longer (Fig. 4B). This pattern has been observed in previous experiments on the same porpoises, whereby buzz duration increased when confronted with more acoustic reverberation (Ladegaard and Madsen, 2019). Similarly, Daubenton's bat (Myotis daubentonii) and the big brown bat (Eptesicus fuscus) (Moss et al., 2006; Hulgard and Ratcliffe, 2016), as well as beaked whales (Johnson et al., 2008), produce longer terminal buzzes in cluttered scenes. Thus, longer buzz duration appears to coincide with greater task complexity across different guilds of echolocators.
Increasing the rate of sensory feedback to accommodate a more difficult discrimination task can also be achieved by clicking faster in the approach phase. We show here that the porpoises had lower mean ICIs during approach (Fig. 4A) when the targets were more closely spaced. Dolphins have similarly been observed to increase the number of clicks produced per unit time when a target is near a clutter screen (Au and Turl, 1983), and some bats increase information update rates via higher call rates as the echolocation task increased in difficulty (Lewanzik and Goerlitz, 2021).
Approach angles
Modifying the approach angle offers a way of managing complex echo streams. While high aspect approaches have been observed in echolocating bats and toothed whales, and are reported to be a means of reducing clutter (Turl et al., 1991; Geipel et al., 2019; Moss et al., 2006; Moss and Surlykke, 2001; Greiter and Firzlaff, 2017), bats have behaviourally demonstrated the difficulty of finding and capturing prey using echolocation near clutter screens (Schmieder et al., 2012). In our experiment, although the temporal and spatial cues in the returning echoes were constrained by the proximity between the targets, the porpoises swam freely so they could adjust their approach angles, and thus their orientation relative to the two targets during target approaches. This means they could modify both ΔT and ΔEL (Fig. 1C). While the maximal angular bearing of the distractor to the porpoise's beam axis was constrained by the inter-target distance (Fig. 1C), approaching the two targets from the side (Fig. 1B) would maximize the difference in the echo delay (ΔT), whereas a head-on approach and sequential scanning across the targets would maximize differences in level (ΔEL) of the returning echoes. Contrasts in spatial and hence temporal separation (Fig. 2F–H) increased with decreasing target range.
The fairly direct and stereotyped approach paths across inter-target distance treatments (Fig. 5), along with the absence of side-on approaches that would maximize echo time delays (Figs 1B and 6), show that the porpoise did not seek to maximize echo delays from the two targets. The porpoises could have positioned themselves to maximize temporal resolution, but this was not observed (Figs 5 and 6). Rather, we show that the porpoises could successfully discriminate the targets despite echoes from both targets arriving well within the suggested auditory integration time of 264 µs for many of the trials (Fig. 10).
Auditory integration time and target resolution
The auditory integration time, or ‘critical interval’, for odontocete audition of 200–300 µs was first reported from pulse-pair discrimination experiments with Tursiops (Vel'min and Dubrovsky, 1975; Vel'min, 1976), as determined with a 75% correct discrimination occurring at pulse intervals of 230±40 µs. A similar value of 264 µs was found in Tursiops using simulated echoes (Au et al., 1988). In a backwards masking experiment with Tursiops, Moore et al. (1984) found essentially the same interval of 265 µs, as this was the minimum time delay between target echo and noise masker in a target detection task at which a success rate of 70% was achieved. Accordingly, the value of ∼264 µs obtained in these studies of the bottlenose dolphin auditory system can be interpreted as the time window below which acoustic events merge (Vel'min, 1976) or appear as an acoustic whole (Dubrovskiy, 1990). Recent studies using auditory brainstem responses (ABRs) in dolphins have reported peak amplitudes occurring at latencies of ∼260 µs (Jones et al., 2019; Finneran et al., 2020) and presented this as further support for the previously published estimates of a critical interval of the same duration. However, the interpretation of both ABR findings and modulation rate transfer functions (e.g. Linnenschmidt et al., 2013) to estimate time resolution capabilities is contested (Beedholm and Miller, 2008). In contrast, much shorter integration times for odontocetes have been proposed (Beedholm and Miller, 2008; Zaslavski, 2012). Specifically, time resolution constants as low as 20 µs have been suggested for Tursiops and 50 µs for harbour porpoises in behavioural experiments involving the discrimination of targets placed near a clutter screen (Zaslavskiy, 2003; Zaslavski, 2008, 2012). However, owing to different methodological approaches, these results are difficult to reconcile or compare with those converging on ∼264 µs.
While the auditory integration time has not been psychophysically measured in porpoises, it is expected to be equal to or longer than the dolphin auditory integration time, given that porpoise click duration (∼80 µs; Wisniewska et al., 2015) is longer than dolphin click duration (∼20 µs).
In the present study, the majority of on-axis clicks in all treatments had ΔT values below a 264 µs auditory integration time (Fig. 6), with ΔT never able to exceed 180 µs at the smallest inter-target distance of 13.5 cm. Similarly, many of the discrimination decision clicks occurred below the 264 µs auditory integration time, as well as below the much lower and later proposed auditory integration time of 50 µs for porpoises in a clutter wall experiment (Fig. 10; Zaslavski, 2012). The latter value of 50 µs is shorter than a porpoise click, and therefore also shorter than an echo, but in principle is still feasible given a Woodward time resolution constant of ∼25 µs for a porpoise click. Irrespective of this, our results call into question the use of the auditory integration time of 264 µs as a hard delay limit for the clutter interference zone for toothed whales, below which echoes supposedly cannot be independently processed. Acoustic clutter rejection is conventionally described in the temporal domain, with bats and toothed whales placing echoes of interest between inner and outer windows, as demonstrated in the lab (e.g. Wilson and Moss, 2004) and field (e.g. Kalko and Schnitzler, 1993; Madsen et al., 2005; Stidsholt et al., 2021). However, here we see that the porpoises must be effectively rejecting the clutter of the distracting echo stream given their successful discrimination of closely spaced targets, and in the case of the shortest inter-target distance, they are doing so in a very short overlap-free window (Fig. 6). How can an echolocating toothed whale achieve such clutter rejection? Part of the answer could be that toothed whales can resolve two auditory streams separated by much less than the 264 µs integration time and that the porpoise critical interval is more on par with the 50 µs values suggested by Zaslavski (2012).
However, for very short echo delays, another explanation may pertain to differences in spectral interference depending on whether the porpoises ensonify one target more than the other (de Boer, 1985). Because of the different sound speeds in aluminium and steel, the interference patterns of similarly sized targets of the two materials will be different, perhaps allowing for discrimination based on spectral cues (Au, 1993; Au et al., 2009; Wisniewska et al., 2012). Indeed, when one target is ensonified more than the other, the relative contributions of these interference patterns may offer spectral cues useful for solving the task (Moore et al., 1984; Schmidt, 1992; Branstetter et al., 2020). Finally, this discrimination process may be greatly aided by the weighting of each target echo by level differences in the two echo streams due to sequential scanning (Movie 1) of a directional beam across them, as we discuss in detail below.
Biosonar beam as a spatial filter
As only the targets within the narrow swathe of a directional beam will render strong echoes, the echoes from off-axis targets will be weaker (Kalko and Schnitzler, 1993; Schnitzler and Kalko, 2001; Surlykke et al., 2009a; Schmieder et al., 2010, 2012). In this way, a highly directional biosonar beam could act as a spatial filter for clutter rejection by having one echo stream significantly louder than the other. The ΔEL ratios observed between on- and off-axis targets (Fig. 10), even when the targets were closely spaced and therefore ΔT was smaller than the auditory integration time, likely facilitated clutter rejection. We found that even when ΔEL values were as small as ∼2 dB, the porpoises still successfully discriminated between the targets (Fig. 10). In a phantom target experiment, Eptesicus fuscus bats were confronted with delays between echoes (of 5–50 µs) much lower than their auditory integration time (of ∼2 ms; Surlykke and Bojesen, 1996), and the authors suggested that the echo level differences returning from the two targets aided the discrimination decision (Simmons et al., 1989). We posit likewise that ΔELs of closely spaced objects within different parts of a highly directional porpoise sonar beam substantially aid clutter rejection via spatial filtering at very short echo delays.
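How level differences of this kind follow from the measured quantities can be sketched with the standard sonar equation. The Python snippet below assumes spherical spreading (20 log10 r), an optional linear absorption term set to zero by default, and hypothetical received levels and target strengths; it illustrates the bookkeeping only and is not the processing chain used in the study.

```python
import math


def transmission_loss_db(range_m, absorption_db_per_m=0.0):
    """One-way transmission loss (dB): spherical spreading plus absorption.

    Both the spreading law and the default of zero absorption are
    simplifying assumptions for this sketch.
    """
    return 20.0 * math.log10(range_m) + absorption_db_per_m * range_m


def apparent_source_level_db(rl_db, range_m, absorption_db_per_m=0.0):
    """Back-calculate the level at 1 m in the direction of a target from
    the level received (RL) on that target's hydrophone."""
    return rl_db + transmission_loss_db(range_m, absorption_db_per_m)


def echo_level_db(rl_db, ts_db, range_m, absorption_db_per_m=0.0):
    """Echo level back at the porpoise: incident level plus target
    strength (TS), minus the one-way loss on the return path."""
    return rl_db + ts_db - transmission_loss_db(range_m, absorption_db_per_m)


def delta_el_db(on_axis, off_axis):
    """Echo level difference (dB) between on- and off-axis targets.

    Each argument is a (rl_db, ts_db, range_m) tuple.
    """
    return echo_level_db(*on_axis) - echo_level_db(*off_axis)


# Hypothetical values: equal TS and range, off-axis target ensonified 10 dB less.
contrast = delta_el_db((150.0, -40.0, 1.0), (140.0, -40.0, 1.0))
```

With equal target strengths and ranges, the echo level difference in this sketch reduces to the difference in ensonification between the two targets, which is the contrast a directional beam provides.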
Similarly to our study, an experiment on the biosonar behaviour of Phyllostomus discolor bats confronted with clutter showed that bats could spatially resolve distractors/maskers at temporal delays smaller than the bat auditory integration time when the spatial release from masking increased (Wagenhäuser et al., 2020). Indeed, shifting the clutter/distractor further off-axis has been shown to facilitate target detection in Eptesicus fuscus bats (Sümer et al., 2009; Warnecke et al., 2014). To do this in the present study, the porpoises would have had to be closer to the closely spaced targets to resolve and perceive a gap in the spatial perception of the two targets, and we did indeed observe this (Fig. 7A). Discrimination performance in bats has been shown to deteriorate with both decreasing ΔELs and inter-masker delays (Wagenhäuser et al., 2020), and while this was not observed here, perhaps our minimal ΔEL and delay values (Figs 7C and 10) were not small enough to deteriorate performance.
The smallest inter-target distance used here was a biologically reasonable distance between neighbouring prey items in a prey school (see Benoit-Bird et al., 2017), and this gave rise to ΔT values well below the auditory integration time. Additionally, echolocating odontocetes also face arguably the most intense acoustic clutter when sonar recognition of buried targets is required. Our finding that echo streams can be independently resolved when received at temporal intervals below the critical interval lends credence to the mechanisms facilitating biosonar-mediated foraging when the targets/prey are buried in sediment (Roitblat et al., 1995; Houser et al., 2005) – a topic that warrants further study.
Acoustic clutter rejection thus appears to occur in the spatial and spectral domains when it cannot be resolved fully in the temporal domain. The example auditory scenes (Fig. 2E) and the performance results taken as a whole (Figs 3–10) show that echolocation behaviours vary according to the acoustic complexity of the scene and demonstrate the usefulness of a directional sound beam that reduces ensonification of off-axis clutter. Jensen et al. (2018) proposed a narrow acoustic field of view as the primary evolutionary driver for the highly directional biosonar beams in toothed whales. We argue that a strong driver for this convergence is the clutter rejection demonstrated here via spatial filtering in concert with directional hearing (Kastelein et al., 2005).
Within the convergence on similar biosonar beamwidths of toothed whales, there is increasing evidence for active control of the acoustic field of view around that mean. Active biosonar adjustments, including those to beamwidth, can act to pre-filter the auditory streams (Lewanzik and Goerlitz, 2021). Dynamic adjustments of biosonar beamwidths have been demonstrated, whereby echolocators can adjust the size of the area and volume ensonified. Studies on bats (Jakobsen and Surlykke, 2010; Jakobsen et al., 2013; Linnenschmidt and Wiegrebe, 2016), delphinids (Moore et al., 2008; Finneran et al., 2014) and porpoises (Wisniewska et al., 2015) have shown such dynamic widening of the beam, even in the wild (Jensen et al., 2015; Ladegaard et al., 2017). The adaptive widening of the beam during the final phases of prey capture, which evolved convergently, is likely crucial to hunting since it allows for keeping fast-moving, evasive prey items within the field of view at close range (Jakobsen and Surlykke, 2010).
We would therefore hypothesize, given the demonstrated flexibility in beamwidth, that a narrow beam would be used at close range when echolocating on closely spaced targets. Indeed, recent findings on wild mouse-eared bats (Myotis myotis) and captive Phyllostomus discolor bats showed that just prior to prey capture, the acoustic field of view was narrowed to focus on the echo stream generated from a target of interest (Linnenschmidt and Wiegrebe, 2016; Stidsholt et al., 2021). Narrowing the beamwidth during the final phase of target interception runs counter to the observed beamwidth widening in the buzz phase of porpoises as they intercept a single target (Wisniewska et al., 2015). In the present study, if the beamwidth was constant and static throughout the echolocation sequence, the difference in SL and angle-specific ASL as a function of bearing to the distracting target would be expected to be constant across inter-target spacings (i.e. points in Fig. 9 would cluster along the beam profile). If, as previously demonstrated, the porpoises used a broader beamwidth, akin to a ‘floodlight’ (Wisniewska et al., 2015), we would expect to observe points clustering at great bearings (Fig. 9). That we do observe this (Fig. 9) suggests that the porpoises adjusted the degree of beamwidth steering according to the complexity of the acoustic scene, but in the opposite way from that which was hypothesized. Specifically, a broader functional beamwidth was inferred in on-axis clicks when targets were more closely spaced (Fig. 9). Therefore, while porpoises can adjust their beamwidth, they were not observed to actively narrow their beam to exclude distracting acoustic clutter from non-target objects.
The porpoises buzzed from farther away (Fig. 4C) and buzzed for longer (Fig. 4B) when acoustic complexity was greater and the auditory streams were spatially and temporally closer to one another. When the click beamwidth is broader (during buzz clicks; Fig. 9), the spatial filter offered by the beam is less steep and of lower order (i.e. ΔEL contrasts would be lower). Of course, this assumes that the wide beamwidth is hardwired with the buzz, but the observation of broader beamwidth clicks having ICIs <13 ms supports this (Fig. 9). Thus, the higher contrasts in the auditory streams of on- and off-axis targets, as provided by using a narrow beamwidth, were not available when using broader beamwidth buzz clicks. This is the case for echolocating bats, whose much broader beamwidth does not offer the stark contrast in the level of returning echoes from on- and off-axis targets (Ghose and Moss, 2003; Nelson and MacIver, 2006). Indeed, the bat's broader beam means that almost equal sound energy arrives at objects within the wide swathe of its beam (Surlykke et al., 2009a). While bats are thought to have acute directional hearing, this poorer spatial resolution has been behaviourally demonstrated in bats presented with multiple and simultaneous acoustic reflectors (Geberl et al., 2019). Instead, spectral cues are thought to be more important for guiding auditory stream segregation in bats (Surlykke et al., 2009b).
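To illustrate why a broader beam offers a less steep spatial filter, the transmit-beam contribution to the level contrast between an on-axis and an off-axis target can be sketched with a Gaussian approximation of the main lobe, in which attenuation is −3 dB at the half-power half-angle and grows with the square of the off-axis angle. This approximation, the two beam half-angles and the off-axis bearing used below are illustrative assumptions rather than values or methods from this study, and the sketch ignores range differences, target strength and receiver directionality.

```python
# Minimal sketch: one-way transmit-beam attenuation of an off-axis target
# under an ASSUMED Gaussian main-lobe approximation.


def beam_attenuation_db(off_axis_deg, half_power_half_angle_deg):
    """Attenuation (dB, <= 0) relative to on-axis at a given off-axis angle.

    Equals -3 dB when off_axis_deg == half_power_half_angle_deg.
    Only meaningful within the main lobe of the beam.
    """
    return -3.0 * (off_axis_deg / half_power_half_angle_deg) ** 2


if __name__ == "__main__":
    off_axis_deg = 8.0       # hypothetical bearing to the distracting target
    narrow_half_deg = 5.0    # hypothetical narrow-beam half-power half-angle
    broad_half_deg = 10.0    # hypothetical broad (buzz-like) half-angle

    narrow = beam_attenuation_db(off_axis_deg, narrow_half_deg)
    broad = beam_attenuation_db(off_axis_deg, broad_half_deg)
    print(f"Narrow beam: off-axis target attenuated by {narrow:.2f} dB")
    print(f"Broad beam:  off-axis target attenuated by {broad:.2f} dB")
```

In this simplified picture, doubling the beam half-angle reduces the off-axis attenuation in dB by a factor of four, which is one way to see how broader buzz clicks would lower the level contrast between on- and off-axis echo streams.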
Sequential scanning
The porpoises in this study, along with bats and other toothed whales, exhibit sequential scanning behaviour with their echolocation beams (e.g. Evans, 1973; Ghose and Moss, 2003; Martin et al., 2005; Surlykke et al., 2009b; Wisniewska et al., 2012, 2015; Zimmer et al., 2005; Movie 1), and some bats also have conspicuous ear movements accompanying their echolocation (e.g. Kugler and Wiegrebe, 2017). Such scanning behaviours may aid in the detection or localization of targets by providing cues for binaural reception (Aytekin et al., 2004), as well as spectral cues of the returning echoes if the clicks are broadband (Arditi et al., 2015). The presence of distractors has been shown to influence head-scanning movements in bats (Mao et al., 2016). While the placement of the biologging tag in our study prevented measurements on the degree of head-scanning movement in the porpoises, the number of scans across each target could be quantified: the porpoises scanned across the targets more in the scenarios with close inter-target distances (Fig. 3C, Fig. S1). As each scan comprised ∼5–10 clicks across a target, and the porpoises made more scans across the targets when closely spaced (Fig. 3C), a larger amount of echoic information was needed to resolve more acoustically complex scenes. This is similar to a study on Eptesicus fuscus bats which showed that biosonar adjustment magnitude depended on the angular offset to the distractor (Aytekin et al., 2010). Head movement also increases the effective swathe of the beam if integrating information over several sequential clicks. Therefore, it is worth noting that the functional beamwidth considered on a click-by-click basis is a conservative estimate of the acoustic field of view: spatial memory likely updates an auditory scene spanning several beamwidths (Madsen et al., 2013), and there is spatial redundancy between the ensonified sensory volumes generated by each click (Stidsholt et al., 2021).
A target discrimination study in harbour porpoises by Wisniewska et al. (2012) purposefully placed targets at a 1 m range from one another so that through much of the approach, the porpoise would not be able to ensonify both targets simultaneously, but would rather have to scan the acoustic scene to solve the target discrimination task. In that study, when the porpoises homed in on a target and then changed their decision in the discrimination task, they would often re-enter the regular echolocation click phase before buzzing on the other target. While this was also observed in the present study, we also observed inter-target buzzing (Fig. 2E), demonstrating for the first time that target discrimination can also take place in the buzz phase (Fig. 10). Thus, buzzing is apparently not only a low sensory volume, high-resolution biosonar sequence to guide interception of a chosen target at close range (Madsen et al., 2005), but also a biosonar mode where echo-guided discrimination can happen.
Ecological relevance
Porpoises often hunt in shallow, acoustically cluttered habitats and are thus subjected to a barrage of unwanted echoes. However, they continue to entangle and drown in nets that their biosonar is capable of detecting (Read et al., 2006). The acute time resolution demonstrated by porpoises in this study supports the idea that the biosonar of wild toothed whales would be capable of detecting and resolving both fishing nets and nearby prey, in agreement with net detection experiments (Au and Jones, 1991; Au, 1994; Kastelein et al., 2000). Therefore, the acoustic complexity of an auditory scene comprising prey next to or caught in a net (and therefore rendering echoes with short time delays) is likely not the cause of bycatch. Instead, perhaps net detection is more challenging if the porpoise's attention is focused on prey items within the net, or is impaired by external factors such as anthropogenic stressors. When a task is difficult and attention-demanding, foraging performance can be constrained and the detection of threats may be hindered (Dukas and Kamil, 2000). Noise has also been suggested to act as a distractor and narrow the attention in bats, whereby it reduces hunting performance in biosonar-mediated prey capture and drinking (Allen et al., 2021; Domer et al., 2021). For porpoises, it is plausible that attention on biosonar-mediated prey capture could similarly reduce vigilance to predators or fishing nets.
Acknowledgements
We are grateful to trainers Josefin Larsson, Fredrik Johansson, Mathilde Kjølby and Jakob Højer Kristensen at Fjord & Bælt, Kerteminde, Denmark for assistance in conducting the experiment. Thanks to Kristian Beedholm for building the LabView program for data collection and synchronization. We thank John Svane Jensen for making the targets and their frame. Thanks to Mark Johnson for providing the DTAG4, and to Magnus Wahlberg for providing both office space in Kerteminde and the example target discrimination video. We thank Danuta Wisniewska and Mark Johnson for thoughtful discussions on experimental design, and Lasse Jakobsen for helpful discussions on data interpretation. We also thank Peter Tyack, John Buck, and K. Alex Shorter for helpful comments on early results, and Laura Stidsholt, Kristian Beedholm, Jakob Tougaard and two anonymous reviewers for helpful feedback on earlier versions of the manuscript.
Footnotes
Author contributions
Conceptualization: C.E.M., P.T.M.; Methodology: C.E.M., L.R.-D., P.T.M.; Software: C.E.M.; Validation: C.E.M.; Formal analysis: C.E.M., L.R.-D.; Investigation: C.E.M.; Resources: P.T.M.; Data curation: C.E.M.; Writing - original draft: C.E.M., P.T.M.; Writing - review & editing: C.E.M., L.R.-D., P.T.M.; Visualization: C.E.M., L.R.-D.; Supervision: P.T.M.; Project administration: C.E.M., P.T.M.; Funding acquisition: P.T.M.
Funding
PhD funding was provided by Danmarks Grundforskningsfond (Danish National Research Council) grants to P.T.M. (27125). This, along with Office of Naval Research Global (ONR) awards N00014-18-1-2062 and N00014-20-1-2709, covered research time at Fjord & Bælt.
Data availability
The data supporting this paper are available on Zenodo at https://doi.org/10.5281/zenodo.5031343. Custom software is available upon request.
References
Competing interests
The authors declare no competing or financial interests.