Many echolocating bats forage close to vegetation – a chaotic arrangement of prey and foliage where multiple targets are positioned behind one another. Bats excel at determining distance: they measure the delay between the outgoing call and the returning echo. In their auditory cortex, delay-sensitive neurons form a topographic map, suggesting that bats can resolve echoes of multiple targets along the distance axis – a skill crucial for the forage-amongst-foliage scenario. We tested this hypothesis by combining an auditory virtual reality with formal psychophysics: we simulated a prey item embedded in two foliage elements, one in front of and one behind the prey. The simulated spacing between ‘prey’ (target) and ‘foliage’ (maskers) was defined by the inter-masker delay (IMD). We trained Phyllostomus discolor bats to detect the target in the presence of the maskers, systematically varying both loudness and spacing of the maskers. We show that target detection is impaired when maskers are closely spaced (IMD<1 ms), but improves markedly when the spacing is increased: the release from masking is approximately 5 dB for intermediate IMDs (1–3 ms) and increases to over 15 dB for large IMDs (≥9 ms). These results are comparable to those from earlier work on the clutter interference zone of bats (Simmons et al., 1988). They suggest that prey would enjoy considerable acoustic protection from closely spaced foliage, but also that the range resolution of bats would let them ‘peek into gaps’. Our study puts target ranging into a meaningful context and highlights the limitations of computational topographic maps.
INTRODUCTION
Distance is important: from an ecological perspective, knowledge about one's distance from either prey or predator is vital. The ‘classic’ remote senses, vision and passive audition (hearing externally created sounds), assess distance via indirect cues. In vision, distance perception is achieved via binocular cues (Erkelens and van Ee, 1998; Howard and Rogers, 1995, 2002; Qian, 1997; Rogers and Graham, 1979). In passive hearing, distance estimation is only possible when the sound source is quite close (Kuwada et al., 2010, 2015) or very familiar (Zahorik and Wightman, 2001), and is further facilitated in reverberant environments (Bekesy, 1938; Bronkhorst and Houtgast, 1999; Mershon, 1975). However, in active hearing, i.e. echolocation, the absolute distance to an object can be directly perceived.
Echolocating animals measure distance, also called target range, by the delay between outgoing call and returning echo. In bats, the importance of distance, and its perceptual equivalent echo delay, is reflected in neural specialisations along the entire auditory pathway (Covey and Casseday, 1991, 1999; Grothe et al., 1992): bats possess delay-tuned neurons that respond most strongly when the bat receives echoes from an object at a specific distance. The delay tuning culminates in a topographic representation of echo delay in the cortex: there is a clear relationship between a neuron's position inside the postero-dorsal auditory cortex and its preferred echo delay (Bartenstein et al., 2014; Hagemann et al., 2010; O'Neill and Suga, 1979). The existence of this neurally computed distance map suggests that bats may be able not only to accurately localize objects along the distance axis, but also to resolve multiple (acoustically semi-transparent) objects separated only along the distance axis. In other words, can delay tuning be the sensory basis not just of range accuracy, but also of range resolution?
A related question was addressed by Simmons et al. (1988). Many echolocating bats forage close to vegetation – an environment where multiple reflectors are positioned behind one another. This is commonly called a ‘cluttered environment’, with clutter referring to non-target structures (e.g. foliage) and the echoes reflected off them. Simmons et al. (1988) trained echolocating bats (Eptesicus fuscus) to detect an electronically generated phantom reflection in the presence of masking reflections. They characterized a clutter interference zone along the distance axis, i.e. a range of distances where objects cannot be detected independently of one another. However, their study design gave rise to additional perceptual cues that may complicate the interpretation of their results.
We designed an experimental paradigm that precludes these additional cues and tested the range-resolution hypothesis. Following Lord Rayleigh's definition of spatial resolution – namely, that two closely spaced light sources are spatially resolved when there is a detectable dip in their joint light diffraction patterns (Rayleigh, 1879; Westheimer, 2005) – we generally defined two objects as spatially resolved when there is a detectable gap in their joint perception. Consequently, we predicted that echolocating bats possess range resolution if they (i) can detect a target in the gap between two objects positioned behind one another. We further predicted that (ii) detection performance increases with echo-to-clutter ratio and that (iii) detection performance generally increases with distance between the two objects, potentially revealing a threshold distance, i.e. a resolution limit.
We tested these predictions in a complex acoustic virtual environment. Specifically, we simulated a prey item embedded in two foliage elements, one in front of and one behind the prey. The simulated spacing between ‘prey’ (target) and ‘foliage’ (maskers) was defined by the inter-masker delay (IMD). We report on the results from a psychophysical detection experiment with echolocating bats (Phyllostomus discolor). We demonstrate that with a delay difference of ∼2.2 ms at a reference delay of 6.3 ms, the bats can ‘listen into’ the dip between two masking reflections and detect a target reflection. We conclude that for a target distance of 1 m, P. discolor have a range-resolution limit of approximately 37 cm.
MATERIALS AND METHODS
Animals and permit
We used three adult male individuals of the bat species Phyllostomus discolor Wagner 1843. Husbandry details can be found in Baier and Wiegrebe (2018). All experiments complied with the principles of laboratory animal care and were conducted under the regulations of the current version of the German Law on Animal Protection (approval 55.2-1-54-2532-34-2015, Regierung von Oberbayern).
The experiments were performed in an open Y-maze inside a dark, echo-attenuated chamber. The 3D-printed Y-maze (see Fig. 1A) consisted of a pentagon-shaped starting area (side length 10 cm) and two arms (width×length 8×12.6 cm) and was covered in removable cloth. The loudspeakers and microphones as well as the food dispensers were mounted at the end of each arm. The experimenter was outside the chamber and observed the experiment via an infrared camera (Abus® TV6819) and headphones emitting heterodyned versions of the microphone signals. Stimulus presentation and data recording were controlled via a custom MATLAB® R2007b application (The MathWorks, Inc., Natick, MA, USA) and SoundMexPro (HörTech GmbH, Oldenburg, Germany).
Virtual scenario generation
Bats were trained to detect a virtual target flanked by two virtual maskers. All target and masker reflections were implemented as virtual reflections, generated by a real-time stereo convolution engine that calculated complex echoes from the bats' ultrasonic emissions. The structure of the impulse responses (IRs) loaded into the convolution engine defined the echo-acoustic properties of the virtual reflections (see Fig. S1).
The unrewarded IR (Fig. 1B, left) consisted of two virtual maskers alone; the rewarded IR consisted of two virtual maskers surrounding a virtual target reflection (Fig. 1B, right). The maskers were implemented as short (65 samples=338 µs) white-noise bursts; the target was implemented as a simple reflector (one sample=5 µs impulse). For each trial in the psychophysical procedure, the noise bursts were refreshed. This ensured that there was no systematic spectral interference between the masker and target reflections, which would have generated unwanted spectral cues. The bottom panels of Fig. 1B show the echoes generated when these complex IRs are excited by a standard P. discolor echolocation call. We generated the scenarios by convolving a call recorded through the microphones with either of two IRs (maskers without and with target) and playing back the resulting virtual echo via the loudspeakers (see Fig. S2). Every change a bat chose to make in its emission sequence (e.g. change in call timing, call spectrum or call direction) was immediately reflected in the echoes.
Specifically, the bat's ultrasonic emissions were picked up by two microphones (SPU0410LR5H-QB, Knowles Corporation, Itasca, IL, USA) mounted 45 deg left and right relative to the bat's starting position on the Y-maze. The microphone signals were amplified (octopre LE, Focusrite plc, Bucks, UK) and fed into the inputs of two real-time digital signal processors (RX6, Tucker Davis Technologies, Gainesville, FL, USA; 192 kHz sampling rate). In one processor, the signal was convolved with the rewarded IR (containing both the masker and the target reflections), whereas in the other processor, the signal was simultaneously convolved with the unrewarded IR (containing only the masker reflections). A constant base delay preceded the IRs in both processors such that the overall delay of the target reflection, including digital delays and acoustic delays from the bats' emissions travelling to the microphones and the echoes travelling from the loudspeakers back to the bats, amounted to 6.3 ms (corresponding to 1.07 m distance). The delay between the first and the second virtual masker (inter-masker delay IMD) was set by the experimenter. The masker delays were always geometrically centred on the target delay of 6.3 ms, i.e. the first masker was always closer to the target reflection than the second (see Fig. 1B, top right panel). This was done because the sharpness of cortical tuning to echo delay appears to scale with absolute echo delay (Greiter and Firzlaff, 2017; Hagemann et al., 2010; Suzuki and Suga, 2017). The outputs of the real-time processors were connected via a stereo amplifier (Harman Kardon HK 6150; Harman Deutschland, Heilbronn, Germany) to two ultrasonic speakers (Peerless XT25SC40-04, Tymphany HK Limited, San Rafael, CA, USA). 
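The delay geometry described above can be sketched in a few lines of Python. This is only an illustration under two assumptions that the text does not state explicitly: that ‘geometrically centred’ means the target delay is the geometric mean of the two masker delays, and that the speed of sound is 340 m/s (a value chosen because it maps 6.3 ms to roughly 1.07 m). The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value, not given in the text


def delay_to_distance(delay_ms, c=SPEED_OF_SOUND):
    """Convert a two-way echo delay (ms) into a one-way target distance (m)."""
    return c * (delay_ms / 1000.0) / 2.0


def masker_delays(imd_ms, target_ms=6.3):
    """Return (first, second) masker delays in ms.

    Assumption: the target delay is the geometric mean of both masker delays,
    i.e. first * second == target_ms ** 2 and second - first == imd_ms.
    """
    first = (-imd_ms + math.sqrt(imd_ms ** 2 + 4.0 * target_ms ** 2)) / 2.0
    return first, first + imd_ms


print(f"target distance: {delay_to_distance(6.3):.2f} m")
for imd in (0.80, 2.25, 9.00):
    first, second = masker_delays(imd)
    print(f"IMD {imd:.2f} ms: maskers at {first:.2f} and {second:.2f} ms")
```

Under this reading, the first masker always lies closer in delay to the target than the second, which matches the description in the text.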
The target strength of the target reflection was fixed at −12 dB; the root-mean-square target strengths of the maskers were varied between −72 and −12 dB to obtain a psychometric function and a threshold for that masker strength where the signal was just detectable. Psychometric functions and thresholds were acquired for IMDs of 0 (reference condition), 0.80, 1.13, 1.59, 2.25, 3.18, 9.00, 12.73 and 18.00 ms and for each of three bats (see below). The IMD is the onset difference between the first and second masker reflection.
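To make the stimulus structure concrete, the following Python sketch assembles an unrewarded and a rewarded impulse response offline and convolves one of them with a stand-in call. It is not the real-time implementation used in the study. The function names, the `seed` argument, the example delays (5.27 and 7.52 ms for an IMD of 2.25 ms) and the choice of two statistically independent Gaussian noise bursts are assumptions made for illustration.

```python
import random

FS = 192_000  # Hz, sampling rate stated for the real-time processors


def ms_to_samples(ms):
    """Convert a delay in milliseconds to the nearest sample index."""
    return round(ms * 1e-3 * FS)


def db_to_amplitude(level_db):
    """Convert a level in dB to a linear amplitude factor (0 dB -> 1.0)."""
    return 10.0 ** (level_db / 20.0)


def noise_burst(n_samples, rms_db, rng):
    """Gaussian white-noise burst scaled to the requested RMS level."""
    burst = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    rms = (sum(x * x for x in burst) / n_samples) ** 0.5
    gain = db_to_amplitude(rms_db) / rms
    return [x * gain for x in burst]


def build_irs(first_ms, second_ms, target_ms, masker_db,
              target_db=-12.0, burst_len=65, seed=None):
    """Return (unrewarded_ir, rewarded_ir); delays are relative to IR start."""
    rng = random.Random(seed)  # a new seed per trial "refreshes" the noise
    length = ms_to_samples(second_ms) + burst_len
    unrewarded = [0.0] * length
    for onset_ms in (first_ms, second_ms):
        start = ms_to_samples(onset_ms)
        for i, x in enumerate(noise_burst(burst_len, masker_db, rng)):
            unrewarded[start + i] += x
    rewarded = list(unrewarded)
    # The target is a one-sample impulse added on top of the masker-only IR.
    rewarded[ms_to_samples(target_ms)] += db_to_amplitude(target_db)
    return unrewarded, rewarded


def convolve(signal, ir):
    """Direct-form convolution of an emitted call with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


# Illustrative delays for an IMD of 2.25 ms around the 6.3 ms target delay.
unrewarded_ir, rewarded_ir = build_irs(5.27, 7.52, 6.3, masker_db=-30.0, seed=1)
click = [1.0]  # stand-in for a recorded echolocation call
echo = convolve(click, rewarded_ir)
```

Because the two IRs share the same masker samples and differ only in the single target sample, any difference between the two virtual echoes is attributable to the target reflection.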
Training/recording sessions (one to three per day) each lasted 10 min. Bats were trained on 5 days per week, followed by a 2-day break. The experiment followed a two-alternative, forced-choice paradigm (2AFC) with food reinforcement. Once a bat sat in the starting area of the Y-maze, presentation of the IRs was switched on. The position of the target reflection (left or right) was pseudorandom from trial to trial. Bats had to echolocate to identify and move towards the IR that contained the target reflection, where they were rewarded as soon as they reached the corresponding feeder (prediction i). Once a bat had learned this task with very faint maskers (−72 dB and >70% correct choices on five consecutive days), the strength of all four masking reflections was increased, making the detection task more difficult (prediction ii). Starting each session with three consecutive trials presenting the weakest maskers (−72 dB), data acquisition proceeded by increasing the masker strengths in steps of 6 dB until the bats could not detect the target at all, and then restarting at very low masker strengths until the daily sessions were completed. Testing for one IMD was completed when at least 30 trials were obtained per masker strength and bat.
Behavioural data analysis
Percent correct performance of the animals as a function of masker strength was fitted with a sigmoidal function and the value of this fit at 70% was taken as threshold (for P<0.05 in a binomial test; see Fig. 2). The threshold for a specific IMD is the masker strength that just allows a bat to reliably detect the target in the presence of the maskers. For each bat, we calculated release-from-masking values for IMDs between 0.80 and 9.00 ms as the difference between the respective threshold and the reference threshold at an IMD of 0 ms. Release-from-masking values were fitted with a sigmoidal function whose turning point determined the resolution limit of each bat.
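The threshold logic can be illustrated with a dependency-free Python sketch. It assumes a particular sigmoid (a falling logistic between 100% and the 50% chance level of a 2AFC task) and a coarse least-squares grid search; the text specifies neither the functional form nor the fitting routine, so both are stand-ins, and the data points are synthetic.

```python
import math


def binomial_p_at_least(k, n, p=0.5):
    """One-sided probability of k or more correct choices out of n by chance."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def psychometric(masker_db, midpoint_db, slope_db):
    """Assumed falling logistic from 100% to 50% correct (2AFC chance level)."""
    return 0.5 + 0.5 / (1.0 + math.exp((masker_db - midpoint_db) / slope_db))


def fit_psychometric(levels_db, prop_correct):
    """Coarse least-squares grid search over midpoint and slope (0.5 dB steps)."""
    best = None
    for mid10 in range(-800, -99, 5):       # midpoints from -80 to -10 dB
        for slope10 in range(5, 151, 5):    # slopes from 0.5 to 15 dB
            mid, slope = mid10 / 10.0, slope10 / 10.0
            err = sum((psychometric(x, mid, slope) - y) ** 2
                      for x, y in zip(levels_db, prop_correct))
            if best is None or err < best[0]:
                best = (err, mid, slope)
    return best[1], best[2]


def threshold_at(criterion, midpoint_db, slope_db):
    """Masker strength at which the fitted function equals `criterion`."""
    return midpoint_db + slope_db * math.log(0.5 / (criterion - 0.5) - 1.0)


# 21 of 30 correct trials (70%) is unlikely under pure guessing.
print(f"P(>=21/30 by chance) = {binomial_p_at_least(21, 30):.4f}")

# Synthetic example data, not measurements from the study.
levels = list(range(-72, -11, 6))
performance = [psychometric(x, -50.0, 4.0) for x in levels]
mid, slope = fit_psychometric(levels, performance)
print(f"70% threshold: {threshold_at(0.7, mid, slope):.1f} dB")
```

With 30 trials per masker strength, the 70% criterion corresponds to 21 correct choices, which is why it sits below the 5% significance level in a one-sided binomial test.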
During the psychophysical experiment, the recorded call sequences were saved in a 3-s stereo ring buffer (192 kHz sampling rate, 24 bit resolution; Fireface) in parallel with the virtual-scenario production. Offline, we band-pass filtered the stereo recordings at 100 Hz to 20 kHz applying a second-order Butterworth filter. We applied a synthesized echolocation call (multiharmonic FM-downward sweep of 1 ms duration with a fundamental frequency ranging from 21 to 18 kHz) as a matched filter to separate echolocation calls from other transient events. Temporal and spectral call parameters were taken from the channel corresponding to the rewarded scenario. We calculated the −10 dB call duration, the −20 dB bandwidth (from initial and end frequencies) and the spectral centroid (weighted mean of frequencies present in the signal), the latter from a time-averaged spectrogram with a 1500 Hz bin width (see Table S2).
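As an illustration of the spectral-centroid measure, the Python sketch below averages the power spectra of consecutive frames and returns the power-weighted mean frequency. A frame length of 128 samples is used because 192 kHz / 128 gives the stated 1500 Hz bin width. Weighting by power, using non-overlapping rectangular frames, and the function names are assumptions; the original analysis may differ in these details.

```python
import cmath
import math

FS = 192_000   # Hz, sampling rate of the recordings
NFFT = 128     # FS / NFFT = 1500 Hz bin width


def mean_power_spectrum(signal, nfft=NFFT):
    """Average the power spectra of consecutive non-overlapping frames."""
    n_frames = len(signal) // nfft
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    n_bins = nfft // 2 + 1
    power = [0.0] * n_bins
    for f in range(n_frames):
        frame = signal[f * nfft:(f + 1) * nfft]
        for k in range(n_bins):
            # Naive DFT coefficient for bin k (adequate for short frames).
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / nfft)
                        for n, x in enumerate(frame))
            power[k] += abs(coeff) ** 2 / n_frames
    return power


def spectral_centroid(signal, fs=FS, nfft=NFFT):
    """Power-weighted mean frequency (Hz) of the time-averaged spectrum."""
    power = mean_power_spectrum(signal, nfft)
    freqs = [k * fs / nfft for k in range(len(power))]
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


# Sanity check with a synthetic 48 kHz tone (falls exactly on a frequency bin).
tone = [math.cos(2 * math.pi * 48_000 * n / FS) for n in range(4 * NFFT)]
print(f"centroid: {spectral_centroid(tone):.0f} Hz")
```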
RESULTS
Three male bats (P. discolor) learned to discriminate between a virtual scenario consisting of two masker reflections and a virtual scenario consisting of two masker reflections plus the target reflection. We used the behavioural response of the bats to assess the masking thresholds, i.e. the highest masker strength that still let the bats detect the target reflection. The experiment yielded 27 psychometric functions (one per bat per inter-masker delay, IMD), i.e. a bat's performance in detecting the scenario containing the target reflection as a function of masker strength (Fig. 2). Our results verify the predictions derived from the range-resolution hypothesis: (i) all bats learned to detect the target in between the maskers; (ii) for all bats, performance increased with lower masker strengths, i.e. with echo-to-clutter ratio; and (iii) for all bats, performance increased with IMD, and the particular masking thresholds yielded each bat's resolution limit.
For the seven IMDs ranging from 0 to 9 ms, the bats' behavioural response confirmed our expectations: discrimination performance was good at low masker strengths and deteriorated with increasing masker strength (Fig. 2A–G). For the two IMDs of 12.73 and 18 ms, however, discrimination performance remained above chance level regardless of masker strength (Fig. 2H,I), suggesting that the masking threshold at these large IMDs – if one exists – lies above the levels tested here. Consequently, we only extracted masking thresholds from the psychometric functions for IMDs between 0 and 9 ms (one per bat per IMD).
For IMDs between 0 and 9 ms, all bats reliably (fit at 70–90% correct choices; Fig. 2A–G) detected the target reflection when the maskers were very faint (masker strength of −60 dB and lower). In contrast, when the maskers were very loud (masker strength of −34 dB and higher), none of the bats could solve the detection task for these IMDs (43–65% correct choices; Fig. 2A–G).
For IMDs between 0 and 9 ms, the masking thresholds show a clear trend as a function of IMD: they remain around −50 to −60 dB for IMDs shorter than 2–3 ms, but rise rapidly, i.e. the bats tolerate stronger maskers, when the IMD is increased further (Fig. 3A, Table S1).
Distance resolution threshold
In order to derive a distance resolution threshold from the behaviourally obtained masking thresholds, we assessed the release from masking provided by the separation of the maskers. As outlined in the Introduction, the two masker reflections are perceptually resolved in distance when there is a significant dip in their perceptual representation. This dip is probed with the target reflection. The dip is significant when the masking effect elicited by both maskers is significantly less than that elicited by one of the maskers. In the reference condition with an IMD of 0 ms, the two maskers act as one, because they are presented simultaneously. However, the noise power of both masker reflections adds up, making this one masking reflection 3 dB stronger than one masker reflection would be. Consequently, the perceptual dip is significant when the masking effect elicited by both separate maskers is more than 3 dB weaker than the masking effect elicited by both maskers on top of each other. In other words, the maskers are spatially resolved when the release from masking (relative to the IMD of 0 ms) is at least 3 dB. We calculated release-from-masking values as the difference between each respective masking threshold for IMDs between 0.8 and 9 ms and the masking threshold for an IMD of 0 ms. We extracted exact values as the turning points of fitted sigmoidal functions. On average, a release from masking larger than 3 dB is seen when the IMD exceeds 2.2 ms (Fig. 3B; bat 1: 2.7 ms, bat 2: 1.9 ms, bat 3: 1.9 ms). Converting echo delay into distance measures, bats showed a distance-resolution limit of approximately 37 cm for a target distance of 1.07 m (6.3 ms reference delay).
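The arithmetic behind the 3 dB criterion and the conversion from delay to distance can be checked with a short Python sketch. The threshold values below are invented for illustration and are not data from the study. The linear interpolation is a simple stand-in for the sigmoidal fit described in the text, and the speed of sound of 340 m/s is an assumption.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value


def power_sum_db(*levels_db):
    """Level (dB) of the sum of uncorrelated sources with the given levels."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))


def release_from_masking(thresholds_db, reference_db):
    """Threshold shift (dB) of each IMD relative to the IMD = 0 ms reference."""
    return {imd: thr - reference_db for imd, thr in thresholds_db.items()}


def crossing_imd(release_db, criterion_db=3.0):
    """Linearly interpolate the IMD (ms) at which release reaches the criterion."""
    points = sorted(release_db.items())
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < criterion_db <= y1:
            return x0 + (criterion_db - y0) * (x1 - x0) / (y1 - y0)
    return None


# Two coincident, equally strong maskers are about 3 dB stronger than one.
print(f"{power_sum_db(-30.0, -30.0) - (-30.0):.2f} dB")

# Hypothetical masking thresholds (dB) per IMD (ms), for illustration only.
thresholds = {0.80: -57.5, 1.13: -57.0, 1.59: -56.0,
              2.25: -54.0, 3.18: -51.0, 9.00: -40.0}
release = release_from_masking(thresholds, reference_db=-58.0)
print(f"3 dB release reached at IMD = {crossing_imd(release):.2f} ms")

# Per-bat limits reported in the text, converted to a one-way distance.
limits_ms = [2.7, 1.9, 1.9]
mean_ms = sum(limits_ms) / len(limits_ms)
distance_m = SPEED_OF_SOUND * (mean_ms / 1000.0) / 2.0
print(f"mean limit {mean_ms:.1f} ms, about {distance_m * 100:.0f} cm")
```

The final conversion halves the delay-derived path length because an echo delay covers the distance to the reflector twice.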
DISCUSSION
Echolocating bats perceive absolute distance to objects by measuring the time delay between call and reflection. With the current psychophysical experiment, we show that P. discolor bats can also resolve multiple reflections along the distance axis. We used the target reflection as a probe to characterize the temporal auditory excitation pattern generated by the masking reflections. We show that the average resolution limit is 2.2 ms (1.9–2.7 ms) when the maskers are centred on a reference delay of 6.3 ms. This resolution limit is equivalent to a range of approximately 37 cm around a reference distance of 1.07 m. In the following sections, we first discuss the ‘clutter interference zone’ in terms of experimental design and significance. We then consider the resolution limit in the context of previous experiments on object detection along the distance axis. Lastly, we briefly address the acoustic properties of the bats' echolocation calls.
The clutter interference zone
Simmons et al. (1988) characterized the ‘clutter interference zone’: a range of distances where objects cannot be detected independently of one another. They trained bats to detect a virtual target reflection in the presence of masking reflections off a ring-shaped object. However, with this paradigm, it is unclear which strategy the bats may have used to solve the psychophysical task: besides perceptually resolving target and masker reflections, the bats could also have evaluated (i) overall target strength, (ii) overall echo duration or (iii) spectral interference between the target and masker reflection. In criterion-free psychophysical procedures such as the 2AFC procedure (Green and Swets, 1966), the subject is free to choose the perceptual cue(s) providing the highest success rate. By no means is it certain that these are the perceptual cues that the experimenter expected to probe. Therefore, stimulus design is critical.
The current experiment (see Fig. 1) was designed to preclude the use of additional perceptual cues: (i) the overall target strength for the unrewarded and the rewarded impulse response (IR) differed by the target strength of the target reflection, which was set to −12 dB, far below P. discolor’s threshold for amplitude discrimination (Heinrich et al., 2011); (ii) the overall duration of the stimulus was set by the inter-masker delay (IMD) and was the same for the unrewarded and the rewarded IR; and (iii) the IRs of the masker reflections consisted of noise bursts that were repeatedly refreshed so that they did not create systematic spectral interference with the target reflection. We are confident that our current stimulus design prevented unwanted perceptual cues and let us actually probe biosonar resolution of objects along the distance axis.
Notably, the current paradigm requires many more measurements than the paradigm by Simmons et al. (1988). The clutter interference zone was determined with one psychometric function per bat (albeit at three different reference distances): the target reflection was set to a target strength just detectable by the bat without maskers and performance was then measured as a function of masker position relative to the target. Here, we measured performance as a function of masker strength and recorded a complete psychometric function per bat for each IMD. The current results are therefore based on approximately 10 times the number of data points per bat compared with the clutter interference zone experiment (see Fig. 2).
These crucial points notwithstanding, the resolution limit quantified here is quite similar to the limits of the clutter interference zone by Simmons et al. (1988). They found that for a target distance of 40, 80 or 160 cm, clutter interference zones for one bat extended approximately 25, 32 and 60 cm around the target distance, respectively (see fig. 4 in Simmons et al., 1988). We demonstrate here a resolution limit of approximately 37 cm at a reference distance of 1.07 m. The similarity of the results indicates that in the study by Simmons et al. (1988), bats may have relied on temporal-resolution cues to separate the target reflection from the clutter reflections, despite the presence of multiple other perceptual cues.
Object detection along the distance axis
Aside from the direct comparison with formal psychophysical experiments, the current results belong in the context of object detection along the distance axis. Detecting a target in front of or behind a non-target is a common task in biosonar. To detect prey in clutter, i.e. among non-target structures, bats usually apply one of several foraging strategies (reviewed in Denzinger and Schnitzler, 2013). They use other sensory systems such as vision or olfaction, hunt only moving prey that generate distinctive echoes (flutter detection), or eavesdrop on prey-generated sounds (passive gleaning).
The effects of clutter on biosonar performance are reflected in a number of behavioural studies in bats. In Eptesicus fuscus, target detection is impaired when target and clutter are arranged along the same azimuth and elevation, but shifting the clutter source off-axis leads to a spatial release from masking and facilitates target detection (Sümer et al., 2009; Warnecke et al., 2014). Moss et al. (2011) showed how free-flying bats negotiate such spatial unmasking and temporal resolution to locate and intercept prey in a complex environment. These behavioural experiments provided spatial cues not only along the distance axis, but also along the azimuth and/or elevation axes (bats could adjust their flight paths). However, backward and forward masking of the clutter onto the suspended target – and thus the bats' capability to perceptually resolve target and clutter along the distance axis – will contribute to the bats' performance.
Recently, Geipel et al. (2019) investigated a bat species that hunts silent and motionless prey among dense vegetation, solely using echolocation. The authors tested a hypothesis (modified after Denzinger and Schnitzler, 2013) stating that such a foraging strategy would exploit ‘an isolated additional [prey] echo between the clutter echoes’ (Denzinger and Schnitzler, 2013). This describes exactly the task in the current experiment, where bats had to detect the target reflection between the masker reflections. However, as has been pointed out earlier (Baier, 2019), the study by Geipel et al. (2019) found maximum delays of 0.25 ms between the target and the clutter echoes, because prey items were perched directly on leaves. Considering the current results, it would be impossible for these bats to separate the echoes in the time domain. Accordingly, the authors investigated and confirmed a different strategy, namely the active reduction of clutter echoes by approaching from angles that transform the leaf into a specular reflector (Geipel et al., 2019).
Acoustic properties of echolocation signals
As biosonar is an active sense, bats can change the temporal and spectral properties of their signals according to the perceptual task at hand. Generally speaking, bats produce shorter calls with broader bandwidths and higher frequencies in cluttered environments. Shorter call durations result in more accurate and up-to-date information owing to less overlap between consecutive echoes (or call and echo) even at high repetition rates. Bats tend to avoid overlap of target echo and clutter echoes as well as overlap of call and echo (Kalko and Schnitzler, 1989, 1993). Higher call frequencies result in higher spatial acuity owing to shorter wavelengths and higher directionality (Griffin, 1958). The range of frequencies that an echolocation call covers, its bandwidth, determines accuracy in ranging (Simmons, 1973). Within the Myotis genus of bats, those species with the largest bandwidth are most successful at finding prey suspended in front of a clutter surface (Siemers and Schnitzler, 2004).
In light of this, we analysed the echolocation calls that the bats used throughout the experiment. Remarkably, we found no evidence that bats adapted their call parameters in response to task difficulty or in relation to their individual resolution limit (Table S2). The temporal parameters we observed, however, are a good match given a simulated distance of 107 cm between the virtual target and the bat when compared with other studies on P. discolor detecting real and virtual targets (Baier et al., 2018; Baier and Wiegrebe, 2018; Linnenschmidt and Wiegrebe, 2016).
In summary, our work offers compelling evidence for spatial resolution along the distance axis in echolocating bats. Corroborating earlier work with a more quantitative experimental design, we have introduced a virtual-reality approach with complex echo-acoustic scenarios, precluding non-conclusive perceptual cues. We demonstrated that P. discolor bats can listen into a perceptual dip between multiple reflections and therefore possess range resolution.
We thank B. Fenton and two anonymous reviewers for their constructive comments. We are grateful to E. Lattenkamp and E. Mardus for their kind help during data acquisition and to B. Grothe for providing excellent research infrastructure. This study is dedicated to the memory of our co-author, mentor and friend Lutz Wiegrebe. His ceaseless curiosity and selfless support will remain a perpetual inspiration.
Conceptualization: L.W.; Methodology: L.W., A.L.B.; Software: L.W., A.L.B.; Formal analysis: P.A.W., L.W., A.L.B.; Investigation: P.A.W., A.L.B.; Writing - original draft: L.W., A.L.B.; Writing - review & editing: A.L.B.; Visualization: P.A.W., A.L.B.; Supervision: L.W., A.L.B.; Project administration: L.W., A.L.B.; Funding acquisition: L.W.
This work was supported by the Ludwig-Maximilians-Universität München [LMU-TAU Joint Research Program grant to L.W.].
The authors declare no competing or financial interests.