Flexible vocal production control enables sound communication in both favorable and unfavorable conditions. The Lombard effect, which describes a rise in call amplitude with increasing ambient noise, is a strategy widely exploited by vertebrates to cope with interfering noise. In humans, the Lombard effect influences lexical stress through differential amplitude modulation at a sub-call syllable level, which so far has not been documented in animals. Here, we bridge this knowledge gap with two species of Hipposideros bats, which produce echolocation calls consisting of two functionally well-defined units: the constant-frequency (CF) and frequency-modulated (FM) components. We show that ambient noise induced a strong, but differential, Lombard effect in the CF and FM components of the echolocation calls. We further report that the differential amplitude compensation occurred only in the spectrally overlapping noise conditions, suggesting a functional role in release from masking. Lastly, we show that both species of bats exhibited a robust Lombard effect in the spectrally non-overlapping noise conditions, which contrasts sharply with the existing evidence. Our data highlight echolocating bats as a potential mammalian model for understanding vocal production control.
The Lombard effect, a classic form of audio–vocal integration, describes a rise in call amplitude with increasing ambient noise (Lombard, 1911). For both animals and humans, the Lombard effect functions to maintain sound communication by directly compensating for the deteriorated signal-to-noise ratio (SNR) of vocalizations (Brumm and Zollinger, 2011; Hotchkin and Parks, 2013; Lane and Tranel, 1971). Mechanistically, a subcortical network seems to be sufficient to initiate the Lombard effect, whereas there is strong evidence that the Lombard effect in humans can be modulated by cortical processes (Luo et al., 2018). One line of evidence supporting the cortical modulation of the Lombard effect is that the Lombard effect in humans is influenced by the linguistic content of the speech material (Arciuli et al., 2014; Garnier and Henrich, 2014; Patel and Schell, 2008). At the sentence level, the amplitude of linguistically salient words such as those referencing objects was raised more than that of modifiers (Patel and Schell, 2008). At the word level, the Lombard effect resulted in stronger compensation for vowels than for unvoiced consonants (Garnier and Henrich, 2014). These data not only indicate that the Lombard effect in humans is possibly more dynamic than that in animals but also show that the Lombard effect in humans is modulated on a millisecond time scale. English speakers in the USA typically produce 2–3 words per second, which allows human subjects a few hundred milliseconds to differentially compensate for linguistically distinct words at the sentence level. Differentially emphasizing some syllables within a word evidently requires precise amplitude control at an even finer time scale.
There is also evidence that such differential amplitude compensation (DAC) helps to improve speech intelligibility by modulating the lexical or prosodic stress, a second benefit of the Lombard effect in addition to improving the overall SNR (Bosker and Cooke, 2018, 2020; Garnier and Henrich, 2014). By contrast, there is currently no evidence that the Lombard effect in animals can be dynamically modulated at the syllable level within single vocalizations.
Here, we addressed the question of whether animals in noise can perform fine-scale vocal amplitude control at a sub-call level. We recorded echolocation calls of two species of Hipposideros bat (Hipposideros armiger and Hipposideros pratti) under various noise conditions and developed analytical tools to conduct automated sub-call level analyses. Hipposideros bats emit echolocation calls consisting of two functionally well-defined units: the constant-frequency (CF) and the frequency-modulated (FM) components (Fig. 1A–C). For these bats, the CF component mainly serves to detect the presence of objects and to gain wingbeat information of insects, while the FM component mainly functions to extract spatial information of the surroundings (Schnitzler and Kalko, 2001; Suga, 1990). There is also behavioral evidence that these bats process the CF and FM components independently during vocal production (Smotherman and Metzner, 2005). We report that not only were the bats able to differentially regulate the amplitude of the CF and FM components in noise but also that DAC requires the noise to spectrally overlap with the dominant frequency content of the echolocation calls.
MATERIALS AND METHODS
Six adult Hipposideros armiger (Hodgson 1835) (4 males and 2 females) and five adult Hipposideros pratti Thomas 1891 (4 males and 1 female) were used for this study. All bats were wild caught with a hand net during the daytime at the Baini Cave, Xianning County, Hubei Province, China, in May 2019. Bats were housed, and experiments were conducted, at the Central China Normal University, Wuhan, China. Hipposideros armiger and H. pratti were kept in separate cages placed in a room (2×3×2.2 m) with a regulated air temperature of 24±2°C (mean±s.d.), relative humidity 71±5% (mean±s.d.), and a reversed light regime of 12 h darkness and 12 h light. Bats had ad libitum access to water and food. Capture, housing and behavioral studies were approved by the Institutional Animal Care and Use Committee of the Central China Normal University.
Sound recording and playback
Bats resting on a hanging stand were recorded individually with an array of 15 ultrasonic microphones (custom made, based on SPU0410LR5H, Knowles Corporation, Itasca, IL, USA) in a flight room (5×4×2.2 m; Fig. 1D). The microphone signals were amplified via two microphone preamplifiers (8 channels each, OctaMic II, RME, Haimhausen, Germany) and sampled with a 16-channel data acquisition card (PXIe 6358, National Instruments, Austin, TX, USA) at a rate of 250 kHz. Before the start of the experiments, each microphone was directed at the bat, guided by a laser assigned to that microphone. The noise was broadcast via an ultrasonic speaker (ES1, TDT, Alachua, FL, USA) connected to its power amplifier (ED1, TDT). We achieved a flat frequency response of the playback system (±1 dB) between 10 and 100 kHz by digitally filtering the noise stimuli with the inverse impulse response of the system before sending them to the power amplifier. The inverse impulse response of the playback system was designed using the Maximum Length Sequence method (Luo et al., 2015). Specifically, the frequency response of the playback system, i.e. the combined frequency response of the speaker, the power amplifier and the distance-related frequency attenuation, was measured at a 50 cm distance with a ¼ inch free-field condenser microphone (Type 7016, ACO, Belmont, CA, USA; with protection grid removed). The sensitivity of the calibration microphone was determined with a sound calibrator emitting a 1 kHz tone of 94 dB sound pressure level (SPL) (Type 521, ACO). With the compensated playback system, the frequency response of each recording microphone was measured, and its compensatory impulse response was designed accordingly.
Sound recording and noise broadcast were controlled by a custom-made LabVIEW program (LabVIEW Professional Development System, version 2018a, National Instruments, Austin, TX, USA). In the first group of experiments, we broadcast wideband noise (10–100 kHz bandpass-filtered white noise) at 40, 52 and 64 dB SPL (root mean square, RMS), in addition to a silence control in which an empty sound file with all sample values set to zero was played (Fig. 1E). The overall background noise amplitude of the flight room in the frequency range of 10 to 100 kHz was ≤28.3±1.5 dB SPL (Fig. S1), which was the highest sensitivity that our microphones could reach. In the second group of experiments, in addition to a silence control as above, we broadcast bandpass or band-removed white noise of distinct spectral content to the bat. The noise level was 52 dB SPL for all noise types. Three types of non-overlapping noise were used: 25–40 kHz passed, 10–45 kHz passed and 45–85 kHz removed for H. armiger and 20–35 kHz passed, 10–40 kHz passed and 40–70 kHz removed for H. pratti. The four types of overlapping noise were: 45–85 kHz passed, 45–100 kHz passed, 25–45 kHz removed and 10–100 kHz passed for H. armiger and 40–70 kHz passed, 40–100 kHz passed, 25–40 kHz removed and 10–100 kHz passed for H. pratti. All playback noise levels refer to the position of the bat; the actual noise level reaching the eardrums of the bat would differ slightly from the specified level because the bat was free to move its head and ears. Similarly, the noise level actually perceived by each bat would be affected by its individual audiogram.
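To make the playback levels concrete, the following minimal sketch converts between dB SPL and RMS sound pressure. It assumes the conventional reference pressure of 20 µPa; the function names are illustrative and are not taken from the authors' LabVIEW or MATLAB code.

```python
import math

P_REF = 20e-6  # assumed reference pressure for dB SPL, in pascals (20 µPa)


def spl_to_pascal(level_db):
    """Convert a level in dB SPL to RMS sound pressure in pascals."""
    return P_REF * 10 ** (level_db / 20)


def pascal_to_spl(pressure_pa):
    """Convert an RMS sound pressure in pascals to a level in dB SPL."""
    return 20 * math.log10(pressure_pa / P_REF)


if __name__ == "__main__":
    # Noise levels of the first experiment plus the 94 dB SPL calibration tone.
    for level in (40, 52, 64, 94):
        print(f"{level} dB SPL -> {spl_to_pascal(level):.5f} Pa (RMS)")
```

Under this convention, each 12 dB step between the noise conditions corresponds to roughly a fourfold increase in RMS pressure, and the 94 dB SPL calibration tone corresponds to about 1 Pa.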
To generate the bandpass white noise, we used 4th-order Butterworth filters (MATLAB function ‘butter’), followed by zero-phase filtering (MATLAB function ‘filtfilt’). For the first group of experiments, we had 13–16 repeats for each stimulus condition of each individual of H. armiger from 5 different recording days and 9–20 repeats for each stimulus condition of each individual of H. pratti from 3–6 different recording days. Each stimulus condition lasted 1 min before switching to a distinct noise condition. For the second group of experiments, we had 15–19 repeats for each stimulus condition of each individual of H. armiger from 4 different recording days and 13–20 repeats for each stimulus condition and each individual of H. pratti from 4 or 5 different recording days. Each stimulus condition lasted half a minute before switching to a distinct noise condition. For both groups of experiments, the playback order of the stimuli for each bat on each recording day was pseudo-randomized.
Echolocation calls were analyzed with custom-written MATLAB scripts (version 2018b, Mathworks, Natick, MA, USA), adapted from our previous studies (Luo and Wiegrebe, 2016; Luo et al., 2017). The general steps of call analysis were as follows. (1) The positions of all potential calls were identified from the smoothed (25-sample window) continuous recording using an amplitude threshold of twice the dynamic background noise floor (σn) measured for each recording channel. σn is defined as the median of the absolute values of the smoothed continuous recording (over the length of each stored file) divided by 0.6745 (Quiroga et al., 2004). The absolute values of the recording were computed before smoothing. Candidate call positions from all 15 recording channels were combined and clustered to represent the final positions for each call, including the start, the peak and the end positions. (2) Based on the start and end positions of the identified call, the candidate call was cut from the continuous recording with 5 ms of extra samples both before and after the identified call. (3) Various spectro-temporal parameters were then estimated, such as call duration, call amplitude (peak and RMS), CF peak frequency, FM end frequency and FM bandwidth, using the recording with the highest call amplitude among the 15 microphones. To estimate the amplitude of the CF and FM components separately, elliptic filters of default parameter settings (MATLAB function ‘ellip’) were used to separate these two components, with the cutoff frequency set 2 kHz below the CF peak frequency of each call. The RMS amplitude of the call/component was measured across the call/component duration. Spectral analyses were based on an FFT size of 8192, which at the 250 kHz sampling rate resulted in a frequency resolution of ∼30 Hz. One example of the automated call analysis routine is presented in Fig. S2. Original scripts for sound analyses of this study can be obtained upon request from the corresponding author.
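Step (1) of the call analysis can be sketched as follows. This is a simplified, single-channel illustration in Python and not the authors' MATLAB routine: the trailing moving average, the function names `smooth` and `detect_calls`, and the synthetic test signal are all assumptions, and the combination and clustering of candidates across the 15 channels is omitted.

```python
from statistics import median


def smooth(values, window=25):
    """Trailing moving average; early samples average over what is available."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]
        out.append(running / min(i + 1, window))
    return out


def detect_calls(samples, window=25, factor=2.0):
    """Return (sigma_n, segments) for one recording channel.

    The recording is rectified first and then smoothed. sigma_n is the
    median of the smoothed envelope divided by 0.6745, and candidate calls
    are runs of envelope samples above factor * sigma_n.
    """
    envelope = smooth([abs(s) for s in samples], window)
    sigma_n = median(envelope) / 0.6745
    threshold = factor * sigma_n

    segments = []
    start = None
    for i, e in enumerate(envelope):
        if e > threshold and start is None:
            start = i                        # envelope rises above threshold
        elif e <= threshold and start is not None:
            segments.append((start, i - 1))  # envelope falls back below it
            start = None
    if start is not None:
        segments.append((start, len(envelope) - 1))
    return sigma_n, segments


if __name__ == "__main__":
    # Synthetic channel: low-level background with one louder burst.
    signal = [0.01 if i % 2 else -0.01 for i in range(1000)]
    for i in range(400, 500):
        signal[i] *= 100
    sigma_n, segments = detect_calls(signal)
    print(f"sigma_n = {sigma_n:.4f}, candidate segments = {segments}")
```

Because the envelope is smoothed with a trailing window in this sketch, the detected end of a segment lags the true end of the burst by up to one window length; a centered window would distribute that lag on both sides.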
In total, we analyzed 288,261 echolocation calls for H. armiger and 123,172 echolocation calls for H. pratti in the first group of experiments. We analyzed 284,339 echolocation calls for H. armiger and 148,664 echolocation calls for H. pratti in the second group of experiments. Because calls from the same individual are correlated rather than independent observations, we used linear mixed models for all statistical inferences. To examine whether the call amplitude of the bats statistically differed between two conditions, e.g. 40 dB SPL noise condition versus silence control, we set the amplitude as the dependent variable, the experiment condition as the fixed effect (categorical) and the identity of the individual as the random effect (categorical). The general formula of the mixed model is ‘amplitude ∼1+condition+(1+condition|individual)’. Mixed models are widely used to investigate correlated observations (Bolker et al., 2009; Kragh et al., 2019). Additionally, to evaluate the degree of statistical relevance, we calculated a large population effect size, defined as 0.8 times the standard deviation, for each pairwise comparison (Nakagawa and Cuthill, 2007).
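For readers less familiar with this notation, the formula ‘amplitude ∼1+condition+(1+condition|individual)’ can be written out as below. This is the conventional reading of such a formula for a pairwise comparison, not a specification taken from the authors' scripts: the symbols, the 0/1 coding of condition, and the normality assumptions are the standard ones for linear mixed models and are assumed here. The index i denotes a call and j an individual.

```latex
\begin{aligned}
\text{amplitude}_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,\text{condition}_{ij} + \varepsilon_{ij},\\
(u_{0j},\, u_{1j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2})
\end{aligned}
```

In this reading, β1 is the population-level amplitude difference between the two conditions, while u0j and u1j let each individual have its own baseline amplitude and its own response to the noise condition.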
We performed two groups of playback experiments with resting H. armiger (6 individuals) and H. pratti (5 individuals) in an echo-attenuated flight room (5×4×2.2 m). Echolocation calls of individual bats were recorded with an array of 15 ultrasonic microphones (Fig. 1D). In the first group of experiments, we used wideband white noise (10–100 kHz) of three noise levels, and in the second group of experiments, we used bandpass white noise of distinct spectral content; in both cases, we included a silence control.
Independent regulation of the CF and FM components in noise
In the first experiment, we recorded and analyzed the echolocation calls of H. armiger in silence and wideband white noise (10–100 kHz) conditions of 40, 52 and 64 dB SPL (relative to 20 µPa; Fig. 1E). In total, 288,261 calls from 6 individuals were analyzed. We found that bats regulated different call parameters differently, with some parameters changing systematically with increasing noise levels, and others changing little across noise conditions. Specifically, both the frequency and the duration of the CF component were relatively stable across the noise levels (Fig. 2A,C). The maximum estimated change in the CF frequency from the mixed models was 62 Hz and the maximum median change in the CF duration was 0.3 ms. By contrast, bats increased the bandwidth and the duration of the FM component with increasing noise levels (Fig. 2B,D; all P<0.05).
Similarly, the amplitude of the calls increased with increasing noise levels, evidencing the Lombard effect (Fig. 2E, all P<0.05). In comparison with the silence control, the estimated amplitude of the calls from the mixed models was 4.1, 6.1 and 7.6 dB greater in the 40, 52 and 64 dB SPL noise conditions. Subsequently, we analyzed the amplitude of the CF component and the FM component separately. In silence, the FM component was approximately 16 dB weaker in amplitude than the CF component. We found that the amplitude of both the CF and the FM components increased with increasing noise levels (Fig. 2F,G; all P<0.05). Interestingly, the bats compensated more for the FM component than for the CF component in all three noise conditions (Fig. 2H; all P<0.05; see Fig. S3 for individual bats). The estimated magnitude of DAC between the CF and FM components from the mixed models became 1, 1.6 and 2.9 dB larger in the 40, 52 and 64 dB SPL noise conditions compared with the silence control. This result shows that H. armiger performed a fine-scale Lombard effect at a sub-call level.
DAC for the CF and FM components
How might DAC between the CF and the FM components work from a computational perspective? Conceptually, to achieve DAC the CF component and the FM component of the call need to be processed independently with reference to the ambient noise (Fig. 3A; green pathway). Data from another CF–FM bat suggest that these bats can indeed process the CF and FM components independently during vocal production (Smotherman and Metzner, 2005). Otherwise, one would expect similar magnitudes of amplitude compensation if the CF and the FM components were to be processed collectively (Fig. 3A; blue pathway). In other words, DAC requires flexible control of the CF and FM components and the magnitude of DAC might be influenced by how flexibly the bat controls these components. To test this idea, we repeated the above experiment with H. pratti, another Hipposideros species that seems to be more flexible in controlling the CF and FM components of the call. Fig. 3B illustrates a typical echolocation call of H. pratti in the silence control. Compared with typical echolocation calls of H. armiger (Fig. 1C), the CF and the FM components in H. pratti are separated more strongly, as indicated by a weaker trough and larger amplitude difference between the trough and the peak (Fig. 3B,C). We predicted that the magnitude of the DAC should be stronger in H. pratti than in H. armiger.
Consistent with the results for H. armiger, we found that the amplitude of both the CF component and the FM component in H. pratti increased with increasing noise levels (Fig. 3D,E; all P<0.05). By contrast, we found that H. pratti compensated more for the FM component than for the CF component only at the highest (64 dB SPL) noise condition, with an estimated magnitude of 4.3 dB from the mixed model (Fig. 3F; P<0.05; see Fig. S3 for individual bats). These data suggest that there are no apparent differences in the magnitude of DAC between these two bat species.
Spectral content of noise modulates DAC
The above two experiments showed that both Hipposideros species exhibited stronger DAC with increasing noise levels (Figs 2H and 3F). One critical question to address is: is the Lombard effect in these CF–FM bats always accompanied by a DAC? To gain insight, we performed additional playback experiments with both Hipposideros species through manipulating the frequency content of the noise. In addition to a silence control, we used four overlapping and three non-overlapping noise conditions, with a reference to the dominant second harmonic of the echolocation calls of each species (Fig. 4, top). The level for all noise conditions was 52 dB SPL. Two consistent results emerged from these experiments.
First, we found that bats showed the Lombard effect for both the CF and the FM components in all noise conditions (Fig. 4A,B,D,E, all P<0.05). For H. armiger, the estimated amplitude increases from the mixed models ranged from 2.6 to 6.5 dB for the CF component, and from 2.9 to 8.4 dB for the FM component. For H. pratti, the estimated amplitude increases from the mixed models ranged from 2.9 to 10.5 dB for the CF component, and from 3.2 to 13.6 dB for the FM component. For both species, the smallest amplitude increase occurred in condition 1 (non-overlapping noise with the narrowest bandwidth, 25–40 kHz for H. armiger, and 20–35 kHz for H. pratti) and the largest amplitude increase occurred in condition 7 (overlapping noise with the widest bandwidth, 10–100 kHz).
Second, we found that DAC occurred only in the overlapping noise conditions (condition 4 to condition 7; Fig. 4C,F; all P<0.05), not in the non-overlapping noise conditions (condition 1 to condition 3; Fig. 4C,F; all P>0.05). For the overlapping noise conditions, the estimated magnitude of DAC from the mixed models ranged from 1.8 dB in condition 6 to 2.9 dB in condition 4 for H. armiger, and from 3.2 dB in condition 6 to 5.2 dB in condition 4 for H. pratti. It is interesting to note that the strongest DAC did not occur in condition 7 in which the strongest Lombard effect was observed for both the CF and the FM components. Thus, these results suggest that the Lombard effect does not always lead to a DAC and the magnitude of the Lombard effect for the call does not predict the magnitude of the DAC.
Recent research on the Lombard effect in both humans and animals has revealed that the Lombard effect is more dynamic than previously believed. Specifically, the Lombard effect is affected not only by acoustic parameters of the noise, such as the intensity, frequency content and duration, but also by potential cognitive processes (Luo et al., 2018). As evidence for cognitive influences on the Lombard effect came largely from human research, one question that arises is: is the Lombard effect in humans more flexible than that in animals? In this study, we examined the capability of echolocating bats to regulate the Lombard effect at a sub-call level, which so far has only been demonstrated in humans. Our data show that, like humans, echolocating bats can perform fine-scale vocal amplitude control at a sub-call level in noise.
The finding that Hipposideros bats exhibit DAC in noise between the CF and FM components not only demonstrates that DAC is not limited to human speech but also shows that the Lombard effect in non-human mammals can be precisely regulated at a millisecond level. One possible explanation for the DAC between the CF and FM components observed in the first group of experiments (Figs 2F–H and 3D–F) is the SNR hypothesis, which predicts a stronger Lombard effect with a decreasing SNR (Lane and Tranel, 1971; Luo et al., 2018). Specifically, the stronger compensation for the FM component than for the CF component can be explained by the fact that the FM component is weaker in amplitude than the CF component for both species (Figs 1C and 3B). This means that for a given level of noise, the FM component is masked more by the noise than is the CF component. The SNR hypothesis also seems suitable to explain the stronger DAC in H. pratti than in H. armiger, as H. armiger produces louder echolocation calls than H. pratti (105 dB versus 96 dB SPL in silence). However, the finding from the second group of experiments that the frequency characteristics of the noise influence the DAC (Fig. 4) shows that the mechanisms for DAC, or the Lombard effect in general, are more complicated than the SNR model. Of note, the Lombard effect is not always accompanied by a DAC.
How might the bats achieve DAC between the CF and the FM components? Because the vocalization amplitude of mammals is largely determined by the subglottal air pressure (Fitch, 2000; Luo and Wiegrebe, 2016), DAC can be achieved by increasing the subglottal air pressure when producing the FM component compared with when producing the CF component. What is less clear is what auditory information the vocal-motor system of the bats uses to perform the DAC tasks. As illustrated in Fig. 3A, one plausible solution would be processing the CF and FM components of the call independently with the ambient noise, i.e. independent processing. This hypothesis assumes two capabilities of the Hipposideros bats: (1) the vocal-motor system of the Hipposideros bats is capable of adjusting the amplitude of the CF and FM components independently; (2) the auditory system of the Hipposideros bats is capable of processing the CF and FM components independently. Both assumptions are likely to be true for CF–FM bats, which recruit the two components for distinct spatial navigation tasks (Schnitzler and Kalko, 2001; Simmons and Stein, 1980; Smotherman and Metzner, 2005; Suga, 1990). Nevertheless, the nature of the auditory information that guides the fine-scaled amplitude control at the sub-call level remains unclear.
What role does DAC have for Hipposideros bats? The relative amplitude of neighboring syllables within words, i.e. lexical stress, plays a central role in speech recognition (Field, 2005). The finding that the Lombard effect increases the degree of lexical stress in human speech supports the notion that the Lombard effect in humans has an additional benefit in communication other than improving the overall SNR (Arciuli et al., 2014; Hotchkin and Parks, 2013; Lane and Tranel, 1971; Patel and Schell, 2008). As the CF component is much stronger than the FM component and thus determines the energy of the call, DAC does not influence the overall SNR of the call. Thus, these data fit with the emerging concept, proposed by researchers from the linguistics community, of viewing the Lombard effect as a two-level adaptation to counteract ambient noise: improving the overall SNR and enhancing the lexical or prosodic stress (Bosker and Cooke, 2018, 2020; Garnier and Henrich, 2014). We speculate that, as in humans, the DAC in bats has additional benefits other than improving the overall SNR of the call. Because the FM component is ideal for localizing objects (Simmons and Stein, 1980), DAC might contribute to improving the precision of spatial localization in noise.
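The statement that DAC barely affects the overall level of the call can be illustrated with a short calculation. The sketch below uses two values reported in the Results for H. armiger (an FM component about 16 dB weaker than the CF component in silence, and a DAC of 2.9 dB at the highest noise level). It assumes that the component energies add incoherently and it ignores the different durations of the two components, so it is a rough illustration and not a reanalysis of the data.

```python
import math


def combine_levels(*levels_db):
    """Energy (incoherent) sum of several levels given in dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


CF_LEVEL = 0.0        # CF component taken as the 0 dB reference
FM_BASELINE = -16.0   # FM component about 16 dB weaker than CF (silence)
DAC = 2.9             # extra FM compensation relative to CF, in dB

# Overall level with and without the extra FM boost, CF held constant.
without_dac = combine_levels(CF_LEVEL, FM_BASELINE)
with_dac = combine_levels(CF_LEVEL, FM_BASELINE + DAC)

if __name__ == "__main__":
    print(f"Overall level without DAC: {without_dac:.2f} dB re CF")
    print(f"Overall level with DAC:    {with_dac:.2f} dB re CF")
    print(f"Change due to DAC:         {with_dac - without_dac:.2f} dB")
```

Under these assumptions, the extra FM compensation changes the overall call level by only about a tenth of a decibel, which is consistent with the view that DAC serves a purpose other than raising the overall SNR of the call.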
One surprising result from this study is that non-overlapping noise caused a significant Lombard effect, with the median amplitude increases ranging from 2.6 to 5 dB for H. armiger and from 3.3 to 9.6 dB for H. pratti (Fig. 4A,B,D,E, conditions 1–3). Considerable evidence has shown that the Lombard effect is mediated by the spectral content of the noise (e.g. birds: Brumm and Todt, 2002; bats: Hage et al., 2013; Luo et al., 2015; Tressler and Smotherman, 2009; primates: Hotchkin et al., 2015; humans: Lu and Cooke, 2009; Stowe and Golob, 2013). One consistent finding from past research is that wideband noise typically results in the strongest Lombard effect, whereas non-overlapping noise results in the weakest or no Lombard effect. Although our current study supports the view that wideband noise is most effective in inducing the Lombard effect, the apparent Lombard effect elicited by non-overlapping noise contrasts sharply with previous research. One explanation for this discrepancy is that although in this study the non-overlapping noise did not spectrally overlap with the dominant second harmonic of the calls on which the analyses were based, it overlapped with the first harmonic of the calls. As in other CF–FM bats, the first harmonic of the echolocation calls in Hipposideros bats is much weaker in energy than the second harmonic. It has long been proposed that the FM component of the first harmonic is required for target ranging based on neurophysiological investigations (Suga, 2015), despite a lack of behavioral data. Further experiments are required to test whether the strong Lombard effect observed for non-overlapping noise is due to its interference with the weak first harmonic of the calls. If so, this would serve as a first dataset to address the question of whether echolocation behavior per se affects the Lombard effect (Hotchkin and Parks, 2013).
In summary, we have shown that echolocating bats exhibit fine-scale amplitude compensation at a sub-call level, demonstrating a high degree of vocal flexibility in interfering noise. Our data highlight echolocating bats as a potential mammalian model for understanding vocal production control, in addition to their decades-long and dominant contribution to understanding spatial navigation (Fenton and Simmons, 2014; Genzel et al., 2018; Schnitzler et al., 2003; Wohlgemuth et al., 2016).
We thank Zhongdan Cui for help during data collection and Ziying Fu for discussions. Two anonymous reviewers provided critical and constructive comments for improving the manuscript.
Conceptualization: J.L.; Methodology: J.L.; Software: J.L.; Validation: M.L., G.Z.; Formal analysis: J.L.; Investigation: M.L., G.Z.; Writing - original draft: J.L.; Writing - review & editing: M.L., G.Z., J.L.; Supervision: J.L.; Funding acquisition: J.L.
This study was funded by the Career Development Award from Human Frontier Science Program (CDA00009/2019-C), the National Natural Science Foundation of China (31970426), and the Open Project Program of Jilin Provincial Key Laboratory of Animal Resource Conservation and Utilization at Northeast Normal University (130028753 and 130028827) to J.L.
The authors declare no competing or financial interests.