Sensory systems experience a trade-off between maximizing the detail and amount of sampled information. This trade-off is particularly pronounced in sensory systems that are highly specialised for a single task and thus experience limitations in other tasks. We hypothesised that combining sensory input from multiple streams of information may resolve this trade-off and improve detection and sensing reliability. Specifically, we predicted that perceptive limitations experienced by animals reliant on specialised active echolocation can be compensated for by the phylogenetically older and less specialised process of passive hearing. We tested this hypothesis in greater horseshoe bats, which possess morphological and neural specialisations allowing them to identify fluttering prey in dense vegetation using echolocation only. At the same time, their echolocation system is both spatially and temporally severely limited. Here, we show that greater horseshoe bats employ passive hearing to initially detect and localise prey-generated and other environmental sounds, and then raise vocalisation level and concentrate the scanning movements of their sonar beam on the sound source for further investigation with echolocation. These specialised echolocators thus supplement echo-acoustic information with environmental acoustic cues, enlarging perceived space beyond their biosonar range. Contrary to our predictions, we did not find consistent preferences for prey-related acoustic stimuli, indicating the use of passive acoustic cues also for detection of non-prey objects. Our findings suggest that even specialised echolocators exploit a wide range of environmental information, and that phylogenetically older sensory systems can support the evolution of sensory specialisations by compensating for their limitations.
Specialised sensory systems that provide a high level of detail, such as foveal vision, typically suffer from a limited spatial extent and slow temporal update rate (Land, 2006). In contrast, sensory systems capable of sampling a large area at a high sampling rate are at the same time limited in the acuity and resolution of the acquired information. To solve this trade-off, the brain combines information from different sensory systems using attentional sampling routines (Schroeder et al., 2010). Such sensory integration and attentional switches across different sensory modalities have been studied in depth (Yorzinski et al., 2017). In contrast, less is known about how animals coordinate multiple streams of information within a single modality. Here, we demonstrate how a sensory specialist, the echolocating horseshoe bat, compensates for the limitations of its specialised active echolocation system by relying on the phylogenetically older and less specialised mechanism of passive hearing.
Echolocation as used by bats, toothed whales and some birds is an active sensory system, based on emitting sound energy into the environment and analysing the returning echoes (Nelson and MacIver, 2006; Schroeder et al., 2010). While permitting active control over sensory input, echolocation is at the same time severely limited: the stroboscopic and highly directional emission of calls and the strong atmospheric attenuation of ultrasonic frequencies limit the space that can be probed both temporally and spatially (Jakobsen et al., 2013; Nelson and MacIver, 2006). These limitations are especially pronounced in bats with a highly specialised echolocation system that is based on calls with long constant-frequency components (CF-FM bats): horseshoe bats, hipposiderid bats and Pteronotus parnellii (Fenton et al., 2012; Jones, 1999; Schnitzler and Kalko, 2001). While their morphological and neural specialisations enable them to identify and evaluate prey with high precision (Denzinger and Schnitzler, 2013; Koselj et al., 2011; Ostwald, 1984; Schnitzler and Kalko, 2001; Vater et al., 1985; von der Emde and Menne, 1989), their specialised biosonar is even more directional and short range than that of other echolocators (Grinnell and Schnitzler, 1977; Jakobsen et al., 2013; Schnitzler and Grinnell, 1977; Schuchmann and Siemers, 2010). In consequence, relying solely on echolocation is costly. The small perception volume of echolocation results in longer and delayed average detection times for prey and predators, both of which can have substantial negative fitness consequences.
Given these limitations of high-frequency CF-FM biosonar, we hypothesised that bats compensate for them by listening passively for environmental information. More specifically, we hypothesised that the bat auditory system employs wide-angle perception of peripheral, passive acoustic information in combination with the focused perception of the actively probed environment. This process is paralleled in the visual system of many vertebrates and cephalopods, where blurry and broad peripheral vision is combined with well-resolved and directional foveal vision. Saccadic eye movements sequentially sample a visual scene in fast temporal succession by steering the gaze to points of interest and can be used to infer the underlying attentional process (Hayhoe and Ballard, 2005; Henderson, 2003; Yarbus, 1967). Correspondingly, we here evaluated the scanning movements of the biosonar beam of horseshoe bats to infer the bats' sonar attention (Fujioka et al., 2016; Ghose and Moss, 2003; Seibert et al., 2013; Surlykke and Moss, 2000) towards passive acoustic environmental cues.
We presented greater horseshoe bats either consecutively (Experiment I) or simultaneously (Experiment II) with recordings of moths rustling on vegetation and spectral and temporal control versions of these recordings. We had two main predictions. First, if bats use passive hearing to improve the acquisition of environmental information, we predicted that they would concentrate the scanning movements of their echolocation beam around a sound source for further biosonar-based investigation. Second, if bats use passive acoustic information for prey detection, we predicted that they would show a preference for insect rustling over control sounds. To test these predictions, we compared call levels recorded at different microphones and how often the sonar beam was directed at the playback position between silence and playback conditions and between different acoustic stimuli.
MATERIALS AND METHODS
Experiments were conducted in a large sound- and echo-attenuated flight room (6×3.5×3 m³) at the Max Planck Institute for Ornithology, Seewiesen, Germany. A small wooden plate (5×7 cm², width×height) 215 cm above the ground served as a perch for the bats. It was faced by a spherical arrangement of eight condenser ultrasound microphones (CM16/CMPA, Avisoft Bioacoustics, Glienicke, Germany) and three loudspeakers (NeoCD1.0, Fountek Electronics Co., JiaXing, China), which were mounted on a wooden board (2.85×2.05 m², width×height) covered by echo-attenuating foam (Fig. 1). The microphones were arranged in three symmetrical star-shaped sub-arrays, each with a central microphone surrounded by three peripheral microphones, which were equidistant (87±max. 1 cm) and equiangular (25 deg) to the central one (Fig. 1A–D). The membrane of each microphone was 2 m (±max. 0.9 cm) away from and oriented towards the head of the perched bat. The speakers were mounted 5–8 cm from each of the three central microphones. The acoustic impulse response of each microphone (and thus its sensitivity and frequency response) was calculated based on recordings of white noise presented from a loudspeaker placed at the bat's position. Microphone signals were pre-amplified (Octopre LE, Focusrite, High Wycombe, UK) and recorded at 192 kHz sampling rate and 16 bit resolution (Fireface 800, RME, Haimhausen, Germany). Sound playbacks were played via the same soundcard (Fireface 800), amplified (AVR 445, Harman/Kardon, Stamford, CT, USA) and presented to the bats via the loudspeakers. Sound presentation and recording were controlled by custom-written code for MATLAB (R2007b, MathWorks Inc., Natick, MA, USA) using SoundMexPro (version 184.108.40.206, Hörtech, Oldenburg, Germany).
We recorded the rustling sounds of one individual each from three moth species (Ochropleura plecta, Noctuidae; Pterostoma palpina, Notodontidae; and Sphinx pinastri, Sphingidae) while the moth was fluttering its wings and walking on dry leaves. Rustling sounds were recorded in the flight room with an omnidirectional microphone (CO-100K, Sanken Microphone Co. Ltd., Tokyo, Japan) connected to a preamplifier (Fireface 800, RME) and soundcard (USG 416H, Avisoft Bioacoustics), using Avisoft Recorder software (sampling rate: 250 kHz). Out of a total of 66 recordings, we selected seven recordings for each of the three moth species that differed in temporal structure, intensity pattern and frequency spectrum (Fig. 1E–G), resulting in 21 different rustling sounds used in the experiments. Recordings were subsequently trimmed to a length of 3 s and high-pass filtered at 1 kHz (Hamming window; 1024 taps; zero-phase filter; Avisoft SASLab Pro version 5.2.09). Average sound pressure level of the selected recordings was 42 dB SPL [re. 20 µPa root mean square (RMS) at 0.1 m] and ranged from 31 to 52 dB SPL RMS. Zero-to-peak sound pressure level varied between 62 and 81 dB SPL (re. 20 µPa) with an average of 73 dB SPL.
Three types of acoustic stimuli were used to investigate the bats' reactions to passively presented environmental acoustic cues, namely, the recorded rustling sound and two control sounds (Fig. 1E–G). As control sounds, we generated amplitude-inverted and phase-scrambled versions of each rustling recording in MATLAB, resulting in a total of 63 playbacks. Amplitude inversion inverts the sign of the amplitude values of one randomly chosen half of the samples, resulting in a flat frequency spectrum, while preserving the temporal envelope of the sound. For phase-scrambling, the amplitude and phase spectrum of each rustling recording was calculated, the phase spectrum randomized and the waveform recreated by an inverse fast Fourier transform, resulting in a disrupted noise-like temporal pattern without notable temporal fluctuation but with maintained frequency spectrum.
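The two control manipulations described above can be illustrated with a minimal sketch. This is not the authors' MATLAB code: the function names, the short synthetic test signal, the seeded random generator and the naive O(N²) Fourier transform (used only to keep the example free of external libraries) are all assumptions made for illustration.

```python
import cmath
import math
import random


def amplitude_invert(samples, rng):
    """Flip the sign of a randomly chosen half of the samples (keeps |x[n]|)."""
    flip = set(rng.sample(range(len(samples)), len(samples) // 2))
    return [-v if i in flip else v for i, v in enumerate(samples)]


def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n_samples = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
                for n in range(n_samples)) for k in range(n_samples)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n_samples = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_samples)
                for k in range(n_samples)) / n_samples for n in range(n_samples)]


def phase_scramble(samples, rng):
    """Keep the amplitude spectrum, draw new random phases, return a real signal."""
    n_samples = len(samples)
    spectrum = dft(samples)
    scrambled = [0j] * n_samples
    scrambled[0] = spectrum[0]                    # DC bin stays untouched
    for k in range(1, n_samples // 2 + 1):
        if 2 * k == n_samples:
            scrambled[k] = spectrum[k]            # Nyquist bin must stay real
        else:
            phase = rng.uniform(-math.pi, math.pi)
            scrambled[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
            scrambled[n_samples - k] = scrambled[k].conjugate()  # real output
    return [value.real for value in idft(scrambled)]


rng = random.Random(0)
# Hypothetical stand-in for a rustling recording: a decaying tone burst.
signal = [math.cos(2 * math.pi * 3 * n / 32 + 0.3) * math.exp(-n / 10)
          for n in range(32)]
inverted = amplitude_invert(signal, rng)
scrambled = phase_scramble(signal, rng)
```

In this sketch, `inverted` has the same sample magnitudes as `signal` (so the temporal envelope is kept), whereas `scrambled` has the same amplitude spectrum as `signal` but a different waveform.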
Three captive adult male individuals of the greater horseshoe bat [Rhinolophus ferrumequinum (Schreber 1774)] were perch-trained for 5 days a week in daily sessions of 45 min length, with breaks, from March to November 2015 (pre-experimental training). During these training sessions, the bats were motivated with food rewards (mealworms, larvae of Tenebrio molitor) to land and stay on the perch. Note that during the training, the bats were not presented with nor trained to attend to any particular acoustic stimuli. When the bats had learned to remain on the perch, data collection for Experiment I (single playbacks) and Experiment II (paired playbacks) was conducted, with breaks, from November 2015 until March 2016. Animal husbandry (no. 311.5-5682.1/1-2014-023) and animal protocols (no. 55.2.-1-54-2532-18-2015) were licensed by the relevant authorities (Landratsamt Starnberg and Regierung von Oberbayern, respectively).
Each experimental trial during data collection consisted of three phases: three initial seconds of silence (silence1), 3 s of stimulus presentation (playback) and three terminal seconds of silence after the playback stopped (silence2). The bat's echolocation behaviour was recorded simultaneously on all eight microphones during the whole trial, resulting in eight 9-s-long audio tracks (Fig. 2). In Experiment I (single playbacks), we presented only a single playback to the bats and monitored their echolocation. Each of the 63 sound files was presented twice to every bat in two successive repeats, each repetition consisting of all 63 playbacks, resulting in a total of 126 trials per individual (presented over 18–20 days, with a mean of seven trials per daily session, range: 2–8 trials). Within each repetition of 63 playbacks, stimulus type was block-wise randomised for every three consecutive playbacks, with each sound file only used once and randomly chosen per stimulus type. The active speaker was chosen pseudo-randomly with a Gellermann-like sequence (Gellermann, 1933) adapted for three alternatives [per 15 playbacks, all three speakers were active an equal number of times; each speaker was active at least twice during the first eight and last eight playbacks; the active speaker changed nine times between playbacks (i.e. five times the speaker did not change); and no speaker was active more than twice in a row]. Playback level at 10 cm distance to the loudspeaker was set to natural levels of 42 dB SPL RMS (range: 37–50 dB SPL RMS), with an average zero-to-peak level of 74 dB SPL (range: 63–82 dB SPL). 
Considering spherical (−26 dB) and atmospheric (−2 dB at 30 kHz, −4 dB at 60 kHz, maximally −6.5 dB at 95 kHz, for 20°C and 52% relative humidity; calculated according to ISO 9613-1; International Organization for Standardization, 1993) attenuation on the way to the bat, this results in frequency-dependent received levels at the bat's position of approximately 10–14 dB SPL RMS (range: 5–9 to 18–22 dB SPL RMS), with a zero-to-peak level of approximately 42–46 dB SPL (range: 31–35 to 50–54 dB SPL).
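The received-level arithmetic above can be retraced with a short sketch. It assumes a propagation distance of 2 m from the 10 cm reference point (taken from the microphone-to-bat distance given in the setup description) and uses the atmospheric attenuation values quoted in the text; the function and variable names are hypothetical.

```python
import math


def spreading_loss_db(reference_m, distance_m):
    """Spherical spreading loss between the reference distance and the receiver."""
    return 20 * math.log10(distance_m / reference_m)


def received_level_db(source_db, reference_m, distance_m, atmospheric_db):
    """Source level minus spherical and atmospheric attenuation."""
    return source_db - spreading_loss_db(reference_m, distance_m) - atmospheric_db


# Atmospheric attenuation in dB per frequency in kHz, as quoted in the text.
ATMOSPHERIC_DB = {30: 2.0, 60: 4.0, 95: 6.5}

spreading = spreading_loss_db(0.1, 2.0)  # about 26 dB
levels = {freq: received_level_db(42.0, 0.1, 2.0, loss)
          for freq, loss in ATMOSPHERIC_DB.items()}
for freq, level in levels.items():
    print(f"{freq} kHz: {level:.1f} dB SPL RMS")
```

With a mean playback level of 42 dB SPL RMS, this gives roughly 14 dB at 30 kHz, 12 dB at 60 kHz and 9.5 dB at 95 kHz, in line with the approximate 10–14 dB SPL RMS stated above.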
In Experiment II (paired playbacks), we paired a rustling sound playback from one speaker with one of three different control stimulus types played from another speaker. For this, only the left and right speakers were used. Control stimuli were the amplitude-inverted or phase-scrambled versions of the rustling sounds as in Experiment I, and silence. Pairing the rustling with silence basically resulted in an equivalent situation as in Experiment I. Again, we presented the set of 63 playbacks twice, resulting in 126 trials per individual (presented over 13–14 days, with a mean of eight trials per daily session, range: 3–11). Per repetition of 63 playbacks, each of the 21 rustling sounds was paired once with all three control types, using a Gellermann-like sequence (Gellermann, 1933) adapted for three alternatives [per 21 playbacks, all three controls were presented an equal number of times; each control was presented at least three times during the first and last 11 playbacks; the control type changed 13 times (i.e. seven times the control type did not change); and no control type was presented more than twice in a row]. The specific sound file was chosen randomly and did not repeat per 21 playbacks, ensuring that a rustling sound was never combined with its own control sounds. The rustling speaker was chosen pseudo-randomly according to Gellermann (1933). To exclude any potential bias in the bats' behaviour owing to level differences of the presented stimuli, the RMS of the rustling and control playback (if not silence) was equalised. Playback level at 10 cm distance to the speaker was set to 47 dB SPL RMS, with a roving level of up to ±3 dB with 1 dB step size applied randomly to both stimuli, resulting in frequency-dependent received levels at the bat's position of 13–19±3 dB SPL RMS.
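The constraints of the Gellermann-like sequences used in both experiments can be illustrated with a simple rejection-sampling sketch. How the original sequences were actually generated is not stated, so the shuffle-and-test approach, the function names and the seed below are assumptions; only the constraint values come from the text.

```python
import random


def longest_run(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def satisfies_constraints(seq, n_alt, n_changes, edge_len, edge_min, run_limit):
    """Check number of changes, representation at both ends, and run length."""
    changes = sum(a != b for a, b in zip(seq, seq[1:]))
    edges_ok = all(part.count(alt) >= edge_min
                   for part in (seq[:edge_len], seq[-edge_len:])
                   for alt in range(n_alt))
    return changes == n_changes and edges_ok and longest_run(seq) <= run_limit


def gellermann_like(n_alt, block_len, n_changes, edge_len, edge_min,
                    run_limit=2, rng=None, max_tries=100_000):
    """Shuffle a balanced block until all constraints hold (rejection sampling)."""
    rng = rng or random.Random()
    seq = [alt for alt in range(n_alt) for _ in range(block_len // n_alt)]
    for _ in range(max_tries):
        rng.shuffle(seq)
        if satisfies_constraints(seq, n_alt, n_changes, edge_len, edge_min,
                                 run_limit):
            return list(seq)
    raise RuntimeError("no valid sequence found within max_tries")


rng = random.Random(2016)  # arbitrary seed, for reproducibility only
# Experiment I: 15 playbacks, 3 speakers, 9 changes, >=2 in first/last 8.
speakers_exp1 = gellermann_like(3, 15, 9, 8, 2, rng=rng)
# Experiment II: 21 playbacks, 3 control types, 13 changes, >=3 in first/last 11.
controls_exp2 = gellermann_like(3, 21, 13, 11, 3, rng=rng)
```

Balanced use of each alternative is guaranteed by construction, because every candidate is a permutation of a block in which each alternative appears equally often.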
During experimental sessions, one experimenter was inside the room, hiding motionless behind a wooden board directly behind the bat (Fig. 1B). Another experimenter controlled stimulus presentation and data acquisition from outside the room and viewed the bat via an infrared camera. Trials started when the bat was perched and attentive, which was signalled by the experimenter outside to the experimenter inside the room by switching on an LED hidden behind the board inside the room. Bats were fed with mealworms in irregular intervals between trials by the experimenter inside the room.
All trials during which the bats left the perch were excluded from the analysis (see Table S1 for final sample sizes). Firstly, all recordings were convolved with each microphone's compensatory impulse response to ensure flat frequency response and equal sensitivity between all eight channels (custom-written code for MATLAB 2016a). Secondly, we used batch processing in SASLab Pro (Avisoft Bioacoustics) to automatically detect [call level >−60 dB full scale (FS), i.e. 60 dB below the highest recordable sound level] and separate (6 ms minimal inter-call interval) calls, and to measure their start and end time and their RMS level. Thirdly, we conducted a temporal comparison across all eight recordings of each trial. Owing to the rapid scanning movement of the bats' sonar beam, any given call was not necessarily recorded on all eight channels, especially not with the same start/end time. Also, the recorded call amplitude at a given microphone can strongly fluctuate and even fall below the detection level, thus resulting in the detection of multiple parts of the same call at a single microphone. We used a method analogous to contig assembly in the process of de novo genome mapping (Gregory, 2005) to identify unique calls (MATLAB). We assigned calls recorded on different microphones as belonging to the same unique call if they overlapped in time or were separated by a hold time of up to 2 ms. If a call was split into multiple parts on a single channel, these multiple parts were also assigned to the same unique call by this process. In this case, the RMS level for this channel was calculated as the weighted average of all parts of this call. Start and end time of each unique call was set to the earliest/last start/end time of all contributing call parts from all channels. All further analyses were conducted with these unique calls (see Table S1 for sample sizes).
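The assembly of unique calls from per-channel detections can be sketched as an interval-merging step. The tuple layout, the function names and the example detections are hypothetical, and the text does not specify how the weighted average was computed; the sketch assumes a duration-weighted average in the power domain.

```python
import math
from collections import defaultdict

HOLD_S = 0.002  # hold time of 2 ms, as stated in the text


def weighted_rms_db(parts):
    """Duration-weighted mean power of (duration_s, rms_db) parts, in dB.

    Assumption: weighting by duration and averaging in the power domain.
    """
    total = sum(duration for duration, _ in parts)
    power = sum(duration * 10 ** (db / 10) for duration, db in parts) / total
    return 10 * math.log10(power)


def assemble_calls(detections, hold=HOLD_S):
    """Merge detections (channel, start_s, end_s, rms_db) into unique calls."""
    calls = []
    for channel, start, end, rms_db in sorted(detections, key=lambda d: d[1]):
        if calls and start <= calls[-1]["end"] + hold:
            call = calls[-1]                      # overlaps or within hold time
            call["end"] = max(call["end"], end)
        else:
            call = {"start": start, "end": end, "parts": defaultdict(list)}
            calls.append(call)
        call["parts"][channel].append((end - start, rms_db))
    for call in calls:
        call["rms_db"] = {channel: weighted_rms_db(parts)
                          for channel, parts in call["parts"].items()}
    return calls


# Hypothetical detections: (channel, start_s, end_s, rms_db in dB FS).
detections = [
    (1, 0.0000, 0.0500, -30.0),
    (2, 0.0010, 0.0200, -40.0),   # first part of a split call on channel 2
    (2, 0.0215, 0.0490, -40.0),   # second part of the same call
    (1, 0.1000, 0.1500, -35.0),
    (3, 0.1515, 0.1550, -45.0),   # starts 1.5 ms after the previous end
    (1, 0.1600, 0.2000, -30.0),   # 5 ms gap, so a new unique call
]
unique_calls = assemble_calls(detections)
```

Because the detections are processed in order of start time, each unique call automatically receives the earliest start and the latest end of all contributing parts.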
To analyse whether the bats concentrated the scanning movement of their sonar beam around the different presented acoustic stimuli, we first compared the recorded call levels across all microphones and focused our analyses later on the microphones located next to the three loudspeakers (i.e. on microphones 3, 5 and 6; Fig. 1C,D). Recorded call levels are reported as RMS average in dB FS. Second, we analysed how often the sonar beam was directed towards the active playback speaker, compared with the silent loudspeakers. Call level and number of beam directions towards playback were compared between the three experimental phases (silence1, playback and silence2) to which the calls were assigned based on their start time.
In Experiment I, recorded call levels were analysed as a function of the off-axis angle relative to the active speaker in each trial by grouping all microphones with the same off-axis angle. Hence, the microphone next to the active speaker is always represented by 0 deg (termed central microphone), the directly surrounding microphones by 25 deg (termed peripheral microphones), and the microphones in its wider periphery by 42.9, 47.5 and 65.9 deg (Fig. 1C,D). Because the angles 47.5 and 65.9 deg only occurred for the left and right speakers, but not for the central one, they were excluded from further analyses. In Experiment II, recorded call levels were compared between the two microphones next to the left and right loudspeakers, which presented the paired acoustic stimuli.
We first investigated whether the call level recorded at the playback loudspeaker was higher during the playback phase than before and after. Further analyses explored whether this increase resulted from a general increase in echolocation call level during the playback phase or whether the bats concentrated their sound beams more around the playback source. The small number and coarse spatial arrangement of the microphones in our array prevented us from directly reconstructing the sonar beam direction and beam width of all calls. However, we used three different approaches to investigate whether the bats concentrated the scanning movements of their sonar beams around passive acoustic cues. First, we compared call levels recorded at the microphone next to the playback loudspeaker (0 deg) with call levels at equidistant surrounding microphones. Second, we compared call levels recorded at the three microphones next to the three loudspeakers, to test whether the bats were scanning the active loudspeaker. Third, we determined how often the sonar beam was directed at the active playback loudspeaker versus the other silent loudspeakers. Note that we can only identify the beam direction if a beam is directed onto the central microphones of the array and not on the array edge or outside the array; and that owing to the ongoing scanning movements, any given call can be dynamically directed at several locations within a single call. To obtain the total number of beam directions, we determined for each unique call the maximum peak-to-peak level (PP level) recorded at each of the three loudspeakers, together with the time point when this maximum PP level was recorded. We then compared each maximum PP level with the PP level recorded at all other seven microphones at the same point in time. If the PP level was highest on the loudspeaker microphone and lower on all other microphones, the beam of this call was directed at that particular loudspeaker at that point in time. 
We ran analyses on the number of these events across the different experimental phases, and additionally on the RMS level of these calls as recorded with the microphone at which the call was directed. To estimate the general increase in vocalisation level in response to a playback, we compared the levels of calls directed at the two microphones next to the non-active loudspeakers among experimental phases. We excluded calls directed at the microphone next to the playback loudspeaker in this analysis to avoid the influence of the changed concentration of scanning movements on the recorded RMS levels.
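The beam-direction criterion described above can be sketched as follows. The data layout (one list of PP levels per microphone, sampled at common time frames within a call), the function name and the example values are assumptions for illustration; microphones 3, 5 and 6 are the loudspeaker microphones named in the text.

```python
SPEAKER_MICS = (3, 5, 6)  # microphones next to the three loudspeakers


def beam_directions(pp_levels, speaker_mics=SPEAKER_MICS):
    """Return (microphone, frame) pairs at which the beam pointed at a speaker.

    pp_levels maps each microphone number to a list of peak-to-peak levels
    (dB FS), one value per time frame of a single unique call.
    """
    hits = []
    for mic in speaker_mics:
        track = pp_levels[mic]
        frame = max(range(len(track)), key=track.__getitem__)  # time of maximum
        peak = track[frame]
        others = (pp_levels[m][frame] for m in pp_levels if m != mic)
        if all(peak > level for level in others):
            hits.append((mic, frame))
    return hits


# Hypothetical call with four time frames on eight microphones.
call = {mic: [-60.0, -60.0, -60.0, -60.0] for mic in range(1, 9)}
call[3] = [-50.0, -20.0, -40.0, -55.0]   # loudest overall in frame 1
call[4] = [-60.0, -60.0, -60.0, -25.0]   # louder than microphone 5 in frame 3
call[5] = [-55.0, -45.0, -35.0, -30.0]
call[6] = [-60.0, -50.0, -22.0, -40.0]   # loudest overall in frame 2

directions = beam_directions(call)
```

In this example the single call counts as two beam directions, one at microphone 3 and one at microphone 6, which reflects the point made above that the scanning beam can be directed at several locations within a single call.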
Note that the half-amplitude beam width of greater horseshoe bats is approximately between 13 deg (vertically; Matsuta et al., 2013) and 25 deg (horizontally; Matsuta et al., 2013; Schnitzler and Grinnell, 1977) and that horseshoe bats continuously scan their environment (Yamada et al., 2016). Therefore, we expected to find rather small statistically estimated differences in recorded levels among experimental phases and a large recorded level variation among individual calls owing to the ongoing scanning movements. If bats directed their sonar beams perfectly onto the playbacks for the full playback duration of 3 s, call levels at the central microphone (0 deg) would be approximately 6 dB higher than at the three peripheral 25 deg microphones, and the proportion of on-playback beam directions would rise to 100%. In contrast, under realistic conditions with ongoing scanning movement, the difference in call level should be smaller than 6 dB and the proportion of directed calls should increase, but not to 100%.
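As a brief worked check of the 6 dB figure, presumably derived from the half-amplitude point of the beam, the level difference for a halving of amplitude is given below. The symbols $A$, $A_0$ and $\Delta L$ are introduced here for illustration only.

```latex
\Delta L = 20 \log_{10}\!\left(\frac{A}{A_0}\right),
\qquad
\frac{A}{A_0} = \frac{1}{2}
\;\Rightarrow\;
\Delta L = 20 \log_{10}(0.5) \approx -6.02\ \text{dB}
```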
Statistical analysis was conducted in R version 3.2.1 (https://www.r-project.org/). Data were analysed separately for each individual bat. We used linear mixed effects models (LMMs) run with the package nlme (https://cran.r-project.org/web/packages/nlme/index.html) to model call levels recorded on different microphones as a function of the following factors: (1) Experiment I: phase, stimulus (rustling, amplitude inversion or phase-scramble), angle (of the microphone, either 0, 25 or 42.9 deg) and loudspeaker (left, middle or right) and their interactions; (2) Experiment II: phase, condition (rustling or control) and control type (silence, amplitude inversion or phase-scramble) and their interactions. We fitted generalized linear mixed effects models (GLMMs) with the package lme4 (version 1.1-12; https://cran.r-project.org/web/packages/lme4/index.html) to the number of beam directions at the active loudspeaker relative to the number of identified beam directions at the silent loudspeakers. In both LMMs and GLMMs, we nested phase within the trial ID as a random intercept to model the longitudinal data structure. Hereby, we were only interested in the changes of the dependent variables within the trials and not among them. Additionally, when we compared call levels across the different microphones, we nested angle within call ID, which in turn was nested within phase. We thereby concentrated on call levels of the same calls between different microphone angles. In the call level analysis of calls directed at the non-active loudspeakers, we did not include call ID in the random effect structure, because here there was only one recording per call.
Fixed effects or their interactions were excluded from the models if their removal did not significantly decrease the likelihood, as assessed with log-likelihood ratio tests (LLR with a letter subscript denoting individual bat). Likelihood was computed assuming normally distributed residuals for call levels in dB FS, and a binomial distribution with logit link function for the proportion of beam directions. These distributions were found to be well suited during the exploratory data analysis with Q-Q plots. Only in the call level analysis of calls directed at the non-active loudspeakers was the distribution slightly negatively skewed. Note that the mammalian auditory system analyses sound level on a logarithmic scale; therefore, we can expect that call levels expressed in dB originate from a normal distribution. All maximum likelihood estimates [MLEs, or restricted maximum likelihood estimates (REMLEs) for LMMs] are reported from the final models, from which all higher-level non-significant terms were removed. Model selection was confirmed by the Akaike information criterion (AIC), which in all but a few cases supported the same models as the log-likelihood ratio tests.
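How a P value follows from a log-likelihood ratio statistic can be shown with a small sketch. It assumes the usual chi-squared reference distribution for the test statistic, with degrees of freedom equal to the number of removed parameters; the function name is hypothetical, and the original analysis was run in R, not with this code.

```python
import math


def chi2_sf(statistic, df):
    """Upper-tail probability of a chi-squared distribution with integer df.

    Starts from the closed forms for df = 1 (erfc) or df = 2 (exponential)
    and applies the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1),
    where h is half the statistic and a is half the degrees of freedom.
    """
    if statistic <= 0:
        return 1.0
    half = statistic / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Example with a statistic reported in the Results (LLR = 12.12, d.f. = 4).
p_value = chi2_sf(12.12, 4)
print(f"P = {p_value:.4f}")  # about 0.0165
```

For the example statistic, the sketch reproduces the reported P value of 0.0165, which is consistent with a chi-squared reference distribution having been used.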
We collected more than 38,000 calls during 707 experimental trials (Table S1), with each 9-s-long trial consisting of three phases (3 s of silence, 3 s of sound playback and 3 s of silence; Fig. 2). We compared the recorded call levels and the proportion of sonar beam directions that were oriented towards the playback between experimental phases and acoustic stimuli to test whether bats reacted to environmental acoustic cues by concentrating the scanning movements of their sonar beam around passive acoustic cues (Experiment I) and whether they preferred certain acoustic cues over others (Experiment II).
Call levels recorded next to the active loudspeaker increased during the playbacks
In Experiment I, call levels recorded on the central microphone (i.e. the microphone next to the active loudspeaker; 0 deg) generally increased during the playbacks. For individuals A and B, the call level increased between 2.1 and 9.6 dB and decreased again to pre-playback levels during the silence2 phase (Fig. 3; Fig. S1, Table S2). Call levels increased differently for the different stimuli (Table S2). Therefore, we found a significant interaction between stimulus type and experimental phase (Table S3). To investigate stimulus effect, we analysed the datasets of each stimulus type separately. Individuals A and B reacted to all stimulus types (rustling sounds and both controls) with an increase in call level during the playback phase (i.e. significant phase effect; Table S4). Although the stimulus–phase interaction was not significant in the full dataset for individual C, we found that call levels differed significantly between different phases for the amplitude-inverted stimuli (Table S4), being 1.3 and 1.8 dB higher during and after the playback, respectively, than before the playback.
Increasing vocalisation level and concentrating sonar scanning around the sound source
There are two potential explanations for the increase in call level at the active loudspeaker during the playbacks. Either the bats merely increased the emission level of their calls without changing the direction of emission, or they (additionally) concentrated the scanning movement of their sonar beam around the playback source. We were mainly interested in the latter mechanism and found three lines of evidence for it: (1) significantly higher call levels on the central than the surrounding microphones during playbacks; (2) significantly higher call levels at the active playback loudspeaker than at the two silent loudspeakers; and (3) a significantly larger proportion of beam directions oriented towards the active loudspeaker. We present this evidence in the following three sections. However, we also found evidence for a general increase in vocalisation level during the playback. Analysing the recorded call levels of calls directed at non-playback loudspeakers, we found a significant interaction between stimulus type and experimental phase in individuals A and B (log-likelihood ratio tests, d.f.=4: LLRA=12.12, P=0.0165; LLRB=21.12, P=0.0003; LLRC=5.51, P=0.2387). Specifically, the recorded call levels increased in individual A by up to 8.6 dB with the playback (of phase-scrambled stimuli) and in individual B by up to 4.9 dB (amplitude-inverted stimuli; Table S5). No significant change with either stimulus or phase was found in individual C (log-likelihood ratio test, d.f.=2; experimental phase: LLRC=1.44, P=0.4857; stimulus type: LLRC=3.67, P=0.1599). The angle to the playback loudspeaker (either 25 or 42.9 deg) did not have a significant effect for individuals A and B (log-likelihood ratio tests, d.f.=1: LLRA=0.33, P=0.5678; LLRB=0.38, P=0.5376; LLRC=45.13, P<0.0001), indicating that concentrated scanning movements around the playback source did not influence the call levels recorded at the non-active loudspeakers. 
The general increase in vocalisation level (Table S5) was lower than the observed increase in call level at the playback loudspeaker (Table S2). This indicates that individuals A and B not only increased their vocalisation level in response to playbacks, but also concentrated the scanning of their sonar beam around the playback source. Note, however, that the contribution of the scanning cannot be simply computed by subtracting the vocalisation level increase (Table S5) from the total level increase (Table S2), because both values are not additive. Instead, we confirm the scanning effect statistically in the following three sections.
Call levels recorded at the central microphone were higher than those recorded at the surrounding microphones
If the bats directed their sonar beams towards the playbacks, then the call levels recorded at the central microphone next to the active loudspeaker (0 deg) should be higher than those recorded at the three peripheral microphones at 25 deg off-axis from the bat–loudspeaker axis (Fig. 1). Indeed, during the playback, the average call levels recorded on the central microphone exceeded those recorded on the peripheral 25 deg microphones by approximately 1.5 dB (range: 0.9–2.8; Fig. 3; Fig. S1, Table S6). However, this difference was much smaller or negative during the pre-playback silence for individuals A and B. Because this angle effect should not be present during the silence1 phase, but should be present during the playback phase and potentially also during the subsequent silence2 phase, we expected to find a significant angle–phase interaction. Additionally, if the bats reacted differently to the three stimulus types, we also expected to find a significant angle–phase–stimulus interaction. The results from bats A and B agree with these predictions, whereas individual C only showed a significant angle–stimulus interaction (Table S7). When analysing each phase separately, we found that stimulus did not play a role in the models during silence1 (Table S8). In contrast, we found a significant interaction between microphone angle and stimulus during the playback phase for individuals A and B. This shows that for particular stimuli (e.g. rustling for individuals A and B, phase-scrambled for individual A, and amplitude-inverted for individual B, Fig. 3) recorded call levels were higher at the playback source than in the surroundings. This result was not found for individual C, which showed no significant difference in its response to the different stimuli (Table S8). Note that these models, which were calculated separately for each experimental phase, also included loudspeaker as an effect. 
Significant angle–loudspeaker interactions possibly resulted from preferred scanning directions of the bats causing additional variation in recorded call level across microphones (see Fig. S2).
Call levels recorded at the active loudspeaker were higher than those recorded at the silent loudspeakers
To further exclude the possibility that the bats scanned all three loudspeakers equally during the playback and to show that they instead scanned more around the active playback loudspeaker, we analysed a subset of our data, including only recordings from the microphones next to the three loudspeakers (microphones 3, 5 and 6). For bats A and B, we again found a significant interaction of angle, phase and stimulus (Table S9). This was the case because call levels recorded at the active loudspeaker were higher than those recorded at the silent loudspeakers (Fig. S1, Table S10). During playbacks of phase-scrambled stimuli, call levels of individual A recorded at the active loudspeaker were 2.4 and 7.0 dB higher than those recorded at the loudspeakers 25 and 42.9 deg off-axis, respectively (Table S10). Similarly, during playbacks of amplitude-inverted stimuli, call levels of individual B recorded at the active loudspeaker were 1.9 and 5.3 dB higher than those recorded at the loudspeakers 25 and 42.9 deg off-axis, respectively (Table S10). This pattern was not found during the silence1 phase, indicating that increased call levels at the active loudspeaker were indeed a response to the playbacks. Accordingly, the stimulus effect and its interactions were only supported in the models for the playback phase, but not the silence1 phase (Table S11), confirming that bats scanned more around the active than the silent loudspeakers. We did not find evidence for increased scanning of individual C around the active loudspeaker during playbacks.
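To put the level differences reported above on a linear scale, a difference in decibels can be converted to a ratio of sound pressures. The following is a minimal sketch, assuming the reported values are sound pressure level differences (so the conversion uses a factor of 20 rather than 10); the function name `db_to_pressure_ratio` is hypothetical and not part of the original analysis code.

```python
def db_to_pressure_ratio(delta_db: float) -> float:
    """Convert a level difference in dB to a sound pressure (amplitude) ratio.

    Assumption: the dB values are sound pressure level differences,
    i.e. delta_db = 20 * log10(p1 / p2).
    """
    return 10 ** (delta_db / 20)


# Level differences reported in the text for individual A
# (phase-scrambled playbacks, active vs. silent loudspeakers).
for delta in (2.4, 7.0):
    ratio = db_to_pressure_ratio(delta)
    print(f"{delta} dB corresponds to a pressure ratio of {ratio:.2f}")
```

Under this assumption, differences of 2.4 and 7.0 dB correspond to roughly 1.3-fold and 2.2-fold higher sound pressure at the active loudspeaker.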
More identified sonar beam directions were aimed at the active than the silent loudspeakers
We counted how often the sonar beam direction could be assigned to one of the three microphones next to a speaker (see Materials and methods). For individuals A and B, but not for individual C, the proportion of sonar beam directions aiming at the active loudspeaker changed significantly with experimental phase (Table S12). Specifically, the modelled proportions increased from the silence1 phase to the playback phase from 0.28–0.39 to 0.33–0.53 (Fig. 4; Table S13) and were not influenced by stimulus type (Table S12). Likewise, we also found a clear increase in the proportion of on-playback sonar beam directions when calculated in relation to the total number of calls, with up to 20% (individual B) and 25% (individual A) of all calls directed at the playback (Fig. S3).
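The two proportions reported above differ only in their denominator: the number of calls with an identified beam direction versus the total number of calls. The sketch below illustrates this distinction. It assumes that each call has already been assigned either to one of the loudspeaker microphones (3, 5 or 6, as named in the text) or to no microphone; the assignment procedure itself is described in Materials and methods and is not reproduced here. The function name and the example data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def on_playback_proportions(
    assigned: Sequence[Optional[int]], active_mic: int
) -> Tuple[float, float]:
    """Return two proportions of calls aimed at the active loudspeaker.

    assigned   : per-call microphone assignment (None = direction not identified)
    active_mic : microphone next to the active loudspeaker

    First value : proportion among calls with an identified beam direction.
    Second value: proportion among all calls.
    """
    identified = [m for m in assigned if m is not None]
    on_target = sum(1 for m in identified if m == active_mic)
    prop_identified = on_target / len(identified) if identified else float("nan")
    prop_all_calls = on_target / len(assigned) if assigned else float("nan")
    return prop_identified, prop_all_calls


# Hypothetical assignments for ten calls (illustrative values, not measured data).
calls = [3, None, 5, 3, None, 6, 3, None, None, 5]
print(on_playback_proportions(calls, active_mic=3))
```

Because many calls cannot be assigned to any loudspeaker microphone, the proportion relative to all calls is necessarily the smaller of the two.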
Simultaneous playbacks confirmed the guidance of biosonar by environmental cues
To explore the bats' natural interest in and preference for prey-generated rustling sounds, we conducted a second experiment, in which we paired rustling sounds with simultaneous playbacks of silence, amplitude-inverted or phase-scrambled control stimuli. In addition to microphone-specific variation (Fig. S4), call levels increased during the playback phase, but to different extents dependent on the presented stimulus combination (Fig. 5), except for individual C. Thus, the interaction between the condition (rustling versus control sound), control type (silence, amplitude-inverted and phase-scrambled) and experimental phase had a significant effect in individuals A and B, but not in individual C (log-likelihood ratio tests, d.f.=4: LLR_A=76.31, P<0.0001; LLR_B=66.38, P<0.0001; LLR_C=5.86, P=0.2099). To investigate the effects of control type, we analysed the dataset from each control type separately. The increase in recorded call level differed between the rustling and the control loudspeaker for all control types in individuals A and B (significant effects of the phase–condition interaction, or of phase and condition; Table S14). Individual C did not react to the playbacks (no significant effects of the phase–condition interaction or phase; Table S14). For individuals A and B, call levels recorded at the rustling loudspeaker during the playback were 2.7 and 2.8 dB higher, respectively, than at the silent control speaker, but lower than at the control speaker when paired with the amplitude-inverted (A: −0.5 dB, B: −2.4 dB) or phase-scrambled controls (A: −2.7 dB, B: −0.5 dB; Table S15).
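The P values of the log-likelihood ratio tests can be related to the test statistics and degrees of freedom given above. The following is a minimal sketch, assuming that the likelihood ratio statistic is compared against a chi-square distribution with the stated degrees of freedom (the usual asymptotic approach; the text does not state how the P values were obtained). It uses the closed-form upper-tail probability that holds for even degrees of freedom, so it needs only the standard library. The function name `lrt_p_value` is hypothetical.

```python
import math


def lrt_p_value(llr: float, df: int) -> float:
    """Upper-tail chi-square probability for a likelihood ratio statistic.

    Closed form, valid only for even degrees of freedom:
        P = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    Assumption: the statistic is asymptotically chi-square distributed.
    """
    if llr < 0:
        raise ValueError("the likelihood ratio statistic must be non-negative")
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = llr / 2
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


# Test statistics reported in the text for the three-way interaction (d.f.=4).
for bat, llr in (("A", 76.31), ("B", 66.38), ("C", 5.86)):
    print(f"individual {bat}: LLR={llr}, P={lrt_p_value(llr, 4):.4g}")
```

Under this assumption, the statistic for individual C (5.86 with 4 degrees of freedom) gives a P value of about 0.21, in line with the reported non-significant result, whereas the statistics for individuals A and B give P values far below 0.0001.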
To test whether the observed call level differences were the result of the bats directing their sonar beam towards the respective loudspeaker, and not caused by a general increase in emission level, we again compared the proportion of sonar beam directions aimed at the rustling loudspeaker across all three experimental phases (Fig. 6). For individuals A and B, we found a significant phase–control interaction. In other words, the type of control stimulus influenced how the proportion of on-playback beam directions changed with experimental phase (d.f.=4: LLR_A=20.43, P=0.0004; LLR_B=24.47, P<0.0001; LLR_C=1.45, P=0.8363). When the rustling sound was paired with silence, individuals A and B directed their calls significantly towards the rustling loudspeaker (d.f.=2: LLR_A=14.12, P=0.0009; LLR_B=6.91, P=0.0315), increasing the proportion of on-playback beam directions from 0.48 and 0.44 before the playback to 0.66 and 0.73 during the playback, respectively (Table S16). When the rustling playback was paired with either of the other control stimuli, we found no significant difference in the proportion of sonar beam directions between the playback speakers, except for a slight decrease from 0.37 to 0.32 in individual B when presented with the amplitude-inverted control (d.f.=2: LLR_B=6.91, P=0.0315; Table S16). In individual C, experimental phase again did not explain the variation in the proportion of beam directions (d.f.=2: LLR_C=0.91, P=0.6318). As in Experiment I, the clear increase in the on-playback sonar beam directions was also present when calculated in relation to the total number of calls, with approximately 25% of all calls directed at one of the playbacks (Fig. S5).
These results clearly show that the bats reacted to passive acoustic cues by concentrating their sonar beam movements around the sound source and increasing the level of vocalisations. The individuals varied in the strength of their reactions towards different stimuli. Individuals A and B concentrated their sonar beam movements around the playback source. This reaction was strongest when phase-scrambled and amplitude-inverted stimuli were played back to individuals A and B, respectively. They both also reacted to rustling sounds, as shown by the higher call level at the central microphone than at the peripheral ones and those at the silent loudspeakers, and by the higher proportion of beam directions aimed at the rustling than the silent loudspeakers. Individual C did not show a clear reaction towards a specific stimulus.
Scanning the environment with a narrowly focused echolocation beam for object detection is a time-consuming task prone to overlooking objects. Here, we show for the first time that bats with a highly specialised echolocation system for prey detection and evaluation support this biosonar by attending to prey-generated sounds. Using a spherically arranged eight-microphone–three-loudspeaker array for simultaneous acoustic playback and recording, we demonstrate that passive hearing guides the active biosonar-based sampling of space. Two important corollaries follow from our results. First, exploiting sounds that originate from the environment provides excellent guidance for directing biosonar attention towards potential objects of interest, saving biosonar search time. Second, in addition to saving search time, listening for environmental sounds also increases the spatial volume in which a bat is able to instantly detect prey and predators. Linking information from environmental cues with biosonar-based perception enables an obligate echolocator to overcome the spatio-temporal limitations of its biosonar. Thus, phylogenetically older sensory systems such as passive hearing could facilitate the evolution of sensory specialisations by compensating for their shortcomings.
Relying on prey-generated sounds to find food is common in gleaning bats (Fuzessery et al., 1993; Russo et al., 2007), i.e. species that pick their prey from surfaces, where detection by echolocation is hindered by background masking (Schnitzler and Kalko, 2001). Several gleaning bats possess morphological (e.g. large ears; Coles et al., 1989; Obrist et al., 1993) and neural specialisations that enable them to detect and process prey-generated sounds (Razak et al., 1999, 2007). In contrast to gleaning bats, however, it is not clear to what extent species that use echolocation for hunting might also rely on prey-generated sounds. Horseshoe bats echolocate continuously, even when at rest (Schnitzler, 1968; Speakman et al., 2004), and are highly specialised for the auditory analysis of echoes from wing-beating insects. Our results, however, show that these obligate echolocators still perceive, localise and react to the sounds produced by prey. We therefore predict that the use of prey-generated sounds for foraging will be common in bats with strongly limited echolocation systems, for example, bats with very high call frequencies, low call amplitude, high directionality or low duty cycle, which would most strongly benefit from combining both streams of auditory information. Functionally, this might be implemented by neurons that respond to information from multiple sensory streams. In passive-listening specialists, single cortical neurons indeed process both call echoes and insect-generated transients (Razak et al., 1999), while peripherally, each type of information is still processed in its own separate pathway (Razak et al., 2007). Likewise, information from different modalities can lead to common neural representations (Green and Angelaki, 2010; Hoffmann et al., 2016; Lakatos et al., 2009). A recent study by Hoffmann et al. (2016) showed a spatially congruent representation of visual and acoustic space in the superior colliculus of the bat Phyllostomus discolor, with neighbouring neuronal layers receiving input from the visual and the echolocation system. This close spatial proximity and congruent spatial maps might be the neuronal basis for cross-modal integration of visual and acoustic space. In contrast, Goerlitz et al. (2008) suggested an independent processing of passive and active auditory information, showing that the evaluation of spectral echo features was not influenced by simultaneously presented passive acoustic information in the echolocating bat Phyllostomus discolor. However, the localisation performance in Antrozous pallidus decreased when bats were forced to simultaneously process passive and active auditory information (Barber et al., 2003), indicating that passive and active systems share some processing resources.
A similar perceptual combination of passive and active sensory information was also found in weakly electric fish. Gnathonemus petersii possesses two electrosensory foveae, one of which relies on scanning movements to probe the environment (Pusch et al., 2008). This species increases the rate of electric organ discharges in response to acoustic, visual, passive electrical and active electrolocation stimuli (Post and Von Der Emde, 1999). Likewise, Brevimyrus niger integrates sensory information from its passive and active electrosense and the mechanosensory lateral line system (Pluta and Kawasaki, 2008). Our study did not directly address multisensory integration, which is classically shown as multiplicative combination of neuronal activity (Kayser et al., 2007; Meredith and Stein, 1986) and improved detection or localisation performance (Gomes et al., 2016). However, our finding that passive acoustic information steers biosonar attention towards the passive acoustic sound source forms the basis for a potential integration of both auditory streams for a joint internal representation of the bat's prey. It will be a fascinating task to elucidate how both sensory streams, conveying passive and active acoustic information, are combined and how this is influenced by top-down task-dependent attentional mechanisms.
In contrast to our expectations, bats directed their sonar beam not only towards the rustling sounds, but also towards the control versions (Experiment I), and in fact reacted more strongly to the control than to the rustling sounds (Experiment II). This suggests that horseshoe bats exploit environmental sounds not only during foraging (which we aimed to investigate here), but also for other fitness-relevant behaviours, such as predator detection. Considering this, it is not surprising that the unfamiliar control sounds elicited a stronger reaction, as unfamiliar sounds might signal danger and therefore elicit an exploratory response. This is reminiscent of novel object recognition tasks in behavioural studies, which generate a strong exploratory response to introduced novel stimuli (Antunes and Biala, 2012; Bevins and Besheer, 2006). The reaction to the control sounds cannot be simply explained by the controls being more audible and more salient than the rustling sounds. The playback levels of a rustling sound and its controls were equal, although the perceived loudness likely differed. The amplitude-inverted controls had a flat frequency spectrum, thus containing more energy in the higher frequencies, potentially making them more audible for the bats (Long and Schnitzler, 1975). In contrast, the phase-scrambled controls had the same frequency spectrum as the rustling sounds and were additionally missing any transient high-amplitude peaks, thus making them much less salient than the rustling sounds. Furthermore, the repeatable individual preferences in Experiments I and II contradict the notion that the playbacks simply startled the bats, and instead indicate a consistent reaction to different environmental cues. This is additionally supported by the individual changes in vocalisation level in reaction to different stimulus types, which matched the strength of the reaction in beam scanning movements.
Therefore, it is likely that the increase in vocalisation level results from an active process that reflects the bats' interest in the environmental cue. This is further supported by the lack of a reaction in individual C, which neither focused its scanning movements nor increased its general vocalisation level in response to playbacks.
Aside from the experiments presented here, we conducted a pilot study, similar in design to Experiment I. In that study, we also found that bats reacted to all three playback stimuli. Interestingly, individuals A and B reacted most strongly to the same stimulus type as in the present study. Furthermore, in the pilot study we also found significant reactions of individual C, most strongly in response to amplitude-inverted sounds. We conclude that the small effect sizes for individual C observed in the present study are due to habituation, which we also observed for the other two individuals, but to a much lesser extent. The overall observed habituation rate was very low, and the bats continued to volitionally direct their sonar beams towards repeating environmental acoustic cues. Despite never being rewarded for scanning their surroundings or directing their sonar beam to environmental sounds, all bats exhibited ongoing scanning behaviour and continued reactions to hundreds of playbacks presented over the course of 15 weeks, although no matching echo-acoustic information was ever presented together with the playback. This continued reaction has two important implications: first, it indicates a high adaptive value of reacting to and tracking environmental sounds by biosonar gaze, and second, it suggests a low cost of this process and a much higher cost of failing to detect nearby predators or prey. The adaptive value likely consists of an increase in the sampled volume of space for detecting prey and predators, and of a reduced detection time compared with relying solely on non-guided biosonar scanning. Both the increased volume of sampled space and the reduced search time likely bear important adaptive consequences for foraging success and survival.
We demonstrated that greater horseshoe bats react to environmental sounds by concentrating the scanning movements of their sonar beam towards the sound source, in addition to generally increasing the level of their vocalisations. Passive auditory information thus guides the biosonar sampling of space, enabling bats to extend their perceptive range beyond the reach of their biosonar and to react faster to novel sounds in their surroundings. This has important implications for the neuronal processing of multiple auditory streams, including the potential sensory integration of passive and active auditory information, and the evolution of sensory specialisations. Our results suggest that phylogenetically older sensory systems may facilitate the evolution of novel sensory mechanisms by compensating for potential limitations in the novel system.
We thank Daniela Schmieder for sharing her anecdotal observations of horseshoe bat prey pursuit behaviour with us; Uwe Firzlaff, Wouter Halfwerk, Susanne Hoffmann, Lutz Wiegrebe, Brock Fenton and an anonymous reviewer for helpful comments on a previous version of this article; Jacob Engelmann for discussion about electrolocation; Erich Koch for help with building the experimental setup; and the Max Planck Institute for Ornithology for providing facilities.
Conceptualization: H.R.G.; Methodology: E.Z.L., R.K., K.K., H.R.G.; Software: H.R.G.; Validation: E.Z.L., K.K., H.R.G.; Formal analysis: E.Z.L., K.K., H.R.G.; Investigation: E.Z.L., S.K., R.K., M.G.; Resources: K.K., H.R.G.; Data curation: E.Z.L., S.K., K.K., H.R.G.; Writing - original draft: E.Z.L., K.K., H.R.G.; Writing - review & editing: E.Z.L., K.K., H.R.G.; Visualization: E.Z.L., H.R.G.; Supervision: K.K., H.R.G.; Project administration: K.K., H.R.G.; Funding acquisition: H.R.G.
This research was funded through the Deutsche Forschungsgemeinschaft Emmy Noether Program (grant number GO 2091/2-1) to H.R.G.
Data are archived in EDMOND (http://dx.doi.org/10.17617/3.1b). MATLAB and R code can be obtained on request from the corresponding authors.
The authors declare no competing or financial interests.