We investigated the possibility of conditioned dampening of whale hearing thresholds when a loud sound is preceded by a warning sound. The loud sound was a tone of 20 kHz, 170 dB re. 1 μPa, 5 s. Hearing sensitivity was measured using pip-train test stimuli and auditory evoked potential recording. The same test-sound stimuli served as warning sounds. The durations of the warning sounds were varied randomly to avoid locking an anticipated conditioning effect to the timing immediately before the loud sound. When the warning sound lasted from 1 to 9 s or from 5 to 35 s prior to the loud sound, hearing thresholds before the loud sound increased, relative to the baseline, by 12.7 and 7.3 dB, respectively. When the warning sound duration varied within a range of 20 to 140 s, the threshold increase was as low as 3.0 dB. The observed hearing threshold increase was not a result of the unconditioned effect of the loud sound, like a temporary threshold shift, so it was considered to be a manifestation of a conditioned dampening of hearing when the subject anticipated the quick appearance of a loud sound, most likely to protect its hearing.
Loud anthropogenic sounds have been associated with the stranding of whales and dolphins (Evans and England, 2001). Current mitigation procedures to protect whales and dolphins from intense sound focus on finding and avoiding marine mammals. Given their rapid movement and the difficulty of detecting many marine mammals (Madsen et al., 2005), alternative mitigation strategies might be a reasonable augmentation to current efforts.
One way to mitigate the effects of sound might be to have the animals self-protect by changing their hearing sensitivity. Animals work to avoid and escape from loud sounds that they do not generate themselves. In fact, loud sound can be as noxious a stimulus to rats as electric shock. Belluzzi and Grossman (Belluzzi and Grossman, 1969) trained rats to jump through doors to avoid either loud sound or electric shock and found that the two aversive stimuli were equally effective as motivators to learn to pass through a door when signaled to do so with a light. Bolles and Seelbach (Bolles and Seelbach, 1964) established that the cessation of intense noise was a particularly effective reward when the learned behavior served to provide an escape. An index of how aversive particular types of sounds may be has been established based on the escape/avoidance behavior of various seal species (Götz and Janik, 2010). Based on observations of the animals during Japanese whale drives, and the association of loud sounds with stranded animals, one may assume that loud sounds are, at least sometimes, similarly aversive to whales and dolphins.
Apart from behavioral avoidance, the damping of hearing sensitivity may be an effective mechanism of mitigation of the effects of loud sounds. Changes in hearing sensitivity are perhaps most readily observed with the acoustic, or stapedial, reflex. Humans producing loud sounds reduce their hearing sensitivity by reflexively tightening the muscles in the middle ear (Hung and Dallos, 1972) while some bats, during echolocation, similarly contract their middle ear muscles synchronously with vocalization to attenuate the amount of self-stimulation by as much as 20 dB (Henson, 1965). Further work measuring the cochlear microphonics of echolocating bats (Suga and Shimozawa, 1974) showed that both neural events and the middle ear muscles attenuated hearing during the acoustic reflex. The total attenuation by both the neural and the muscular events was shown to be 35 to 40 dB with 20 to 25 dB contributed by the muscles and the rest by neural events. Generally speaking, mammals have evolved mechanisms to protect their auditory systems from self-produced intense sounds.
Recent work demonstrated that a false killer whale was capable of changing its hearing sensitivity while it echolocated (Nachtigall and Supin, 2008). While there have been no studies of the cochlear microphonics measuring the acoustic reflex of echolocating whales and dolphins, measures of the auditory evoked potentials of the self-hearing of both outgoing clicks and returning echoes have shown that odontocete hearing sensitivity changes to optimize the hearing of echoes (Nachtigall and Supin, 2008; Supin et al., 2010; Linnenschmidt et al., 2012; Li et al., 2011; Supin and Nachtigall, 2013). Overall, the hearing sensitivity of the false killer whale was also shown to be more acute when the animal was searching for targets than when targets were easily found (Supin et al., 2008).
If whales and dolphins can change their hearing sensitivity during echolocation, it is reasonable to assume that they might learn to change their hearing in other situations to protect themselves when faced with intense sounds. This study investigated whether a whale would change its hearing sensitivity when provided with a warning signal that an intense sound was just about to arrive.
MATERIALS AND METHODS
Experimental facilities and subject
The study was carried out at the facilities of the Hawaii Institute of Marine Biology, Marine Mammal Research Program. The subject was an originally wild-caught female false killer whale, Pseudorca crassidens (Owen 1846), assumed to be between 30 and 40 years old. The subject was trained to accept suction-cup electrodes for brain-potential recording, to swim into a hoop station and to listen to the sound stimuli. She had hearing loss for frequencies above 35 kHz; however, her hearing sensitivity within a range of up to 25 kHz (Yuen et al., 2005) was nearly normal compared with the majority of other odontocetes (Supin et al., 2001). The subject was housed in a floating pen complex. Experiments were carried out in a section of the pen complex that was 8×10 m in size.
This research project was approved under a National Marine Fisheries Service permit (978-15670-02) and University of Hawaii IACUC approval.
Each experimental session started by calling the subject to the trainer and attaching surface suction cups containing gold electrodes for brain-potential recording. The 10-m-long thin flexible cables connecting the suction cups to the equipment allowed the whale to move throughout the entire volume of the experimental pen. After the suction cups were attached, 50 experimental trials were run.
Each trial started by sending the subject to a listening station (a hoop fastened at a depth of 80 cm below the water surface). During stationing, low-level test sounds were played which served to measure hearing sensitivity (see below, ‘Signal parameters and presentation timing’). During the presentation of the test sounds, auditory evoked potentials (AEPs), specifically, the envelope following responses (EFRs) to these test stimuli, were recorded. These responses served to measure hearing sensitivity (see below, ‘AEP acquisition and hearing-sensitivity assessment’). Right after the low-level test sound, a high-level sound (referred to below as the loud sound) was played. Thus, because the test sounds always preceded the loud sound, they also served as conditioning stimuli, warning the subject of the forthcoming loud sound. After the end of the loud sound, a secondary reinforcing whistle was blown and the animal received fish reinforcement.
Signal parameters and presentation timing
The duration of the test (warning) sound varied randomly from trial to trial (Fig. 1A). The ranges of variation of the test-signal duration during a session were different in three experimental series performed successively: Series 1, from 20 to 140 s, mean 80 s; Series 2, from 5 to 35 s, mean 20 s; and Series 3, from 1 to 9 s, mean 5 s. Thus, the mean duration of the test signals was four times shorter in Series 2 than in Series 1, and four times shorter in Series 3 than in Series 2. The random variation of the test signal duration in each of the series served to exclude the possibility of linking a conditioning effect to a particular time interval after the onset of the test (warning) signal. Otherwise the conditioning effect could have appeared exactly before the loud sound and not have been revealed by the test signals.
The test signals were rhythmic trains of tone pips. The trains were presented at a rate of 20 s−1 (Fig. 1B). Each train contained 17 pips at a rate of 875 s−1 (Fig. 1C). Each pip contained eight cycles of 20 kHz carrier frequency (Fig. 1D). From trial to trial, levels of the test signals varied up and down from 80 to 120 dB re. 1 μPa r.m.s. According to previous measurements (Yuen et al., 2005; Supin et al., 2008), this range of levels was expected to be from 0 to 40 dB of sensation level. These sounds served also as the conditioning (warning) signals. Irrespective of the response presence or absence, the entire 80–120 dB range was examined to obtain information on the response magnitude at both threshold and supra-threshold levels; i.e. the variation of the test signal level was not induced using a typical adaptive staircase procedure that keeps the stimulus level around the threshold.
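The timing implied by these stimulus parameters can be worked out directly. The sketch below uses only the values stated above (20 kHz carrier, eight cycles per pip, 875 pips per second, 17 pips per train, 20 trains per second); the function name and the assumption that pips are evenly spaced from onset to onset are illustrative and are not taken from the stimulus-generation program.

```python
# Timing of one pip train, computed from the parameters stated in the text.
CARRIER_HZ = 20_000     # carrier frequency of each pip
CYCLES_PER_PIP = 8      # carrier cycles per pip
PIP_RATE_HZ = 875       # pip rate within a train
PIPS_PER_TRAIN = 17     # pips per train
TRAIN_RATE_HZ = 20      # train presentation rate


def pip_train_timing():
    """Return (pip duration, pip period, train duration, train period) in ms."""
    pip_ms = 1000 * CYCLES_PER_PIP / CARRIER_HZ
    pip_period_ms = 1000 / PIP_RATE_HZ
    # Onset of the first pip to the end of the last pip
    # (assumes evenly spaced pip onsets; an illustrative assumption).
    train_ms = (PIPS_PER_TRAIN - 1) * pip_period_ms + pip_ms
    train_period_ms = 1000 / TRAIN_RATE_HZ
    return pip_ms, pip_period_ms, train_ms, train_period_ms


pip_ms, pip_period_ms, train_ms, train_period_ms = pip_train_timing()
print(f"pip: {pip_ms:.2f} ms, pip period: {pip_period_ms:.3f} ms")
print(f"train: {train_ms:.2f} ms, train period: {train_period_ms:.0f} ms")
```

Under these assumptions each train occupies less than half of the 50 ms train period, leaving a silent interval between successive trains.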
Immediately (without a gap) after the test (warning) signal, a loud sound was played. It was always a 20 kHz tone 170 dB re. 1 μPa r.m.s. lasting 5 s (Fig. 1A). In all series, the interval between the loud sound and the beginning of the test signal of the next trial was 90±15 s.
In the initial baseline experimental series, the same test signals, varying within the same level range as in the experimental series, were presented with the same inter-trial intervals; however, they were not followed by a loud sound.
Instrumentation for sound generation and data collection
Both the test and loud sounds were digitally synthesized by a standard personal computer using a custom-made program (Virtual Instruments) designed with the use of LabVIEW software (National Instruments, Austin, TX, USA). The synthesized signal waveforms were played at an update rate of 256 kHz through a 16 bit digital-to-analog converter of a USB-6251 acquisition board (National Instruments). The test signals were amplified by a custom-made power amplifier (passband of 1 to 150 kHz), attenuated by a custom-made low-noise resistor attenuator, and played through an ITC-1032 piezoceramic transducer (International Transducer Corporation, Santa Barbara, CA, USA) positioned at a depth of 80 cm (i.e. the same depth as the hoop station center) at a distance of 1 m in front of the animal's head.
Signals for the loud sound were amplified by a Hafler P3000 power amplifier (Hafler, Tempe, AZ, USA) and played through the same transducer. The transducer was connected alternatively to the test-sound attenuator or to the loud-sound power amplifier through an electromagnetic relay, so the background noise of the power amplifier never overlapped the low-voltage (down to 1 mV) test signals. The transducer was re-connected simultaneously with the loud sound onset, to avoid any cue preceding the loud sound.
Both the test and loud sounds were calibrated by a B&K 8103 hydrophone (Brüel & Kjær, Nærum, Denmark) positioned in the hoop station in the absence of the subject.
Brain potentials were picked up through 10 mm gold-plated surface electrodes mounted within 50 mm silicone suction cups, the active electrode at the vertex, and the reference electrode at the dorsal fin. Brain potentials were fed through shielded cables to a balanced custom-made brain-potential amplifier based on an AD620 chip (Analog Devices, Norwood, MA, USA) and amplified by 60 dB within a frequency range from 200 to 5000 Hz. The amplified signal was entered into a 16 bit analog-to-digital converter that was one of the A/D channels of the same USB-6251 acquisition board that served for sound generation. The digitized signals were stored and processed on a standard personal computer.
AEP acquisition and hearing-sensitivity assessment
The hearing-sensitivity assessment was based on recording the EFRs to the test tone pips. The brain potentials were averaged on-line within every trial. To assess hearing sensitivity, the test signal varied in level from record to record by ±5 dB steps. In the series with both 1–9 s and 5–35 s test signals, one level was presented during each trial, and all the original records during a trial were averaged on-line; in the series of 20–140 s signals, one to five levels (depending on the signal duration) were presented during a trial, and the original records were averaged on-line in 20 or 30 s segments. EFR records obtained by on-line averaging at the same stimulus level were additionally averaged off-line among the trials to obtain a final low-noise EFR record. A 16 ms long part of the record, from the fifth to the 21st millisecond, containing the EFR was Fourier transformed to obtain its frequency spectrum. The spectrum peak magnitude at the stimulation rate (875 Hz) was taken as the EFR magnitude. The EFR magnitudes evaluated in this way were plotted as a function of test-signal level. An oblique part of the function was approximated by a straight regression line. This ‘oblique’ part of the function was defined as the part with point-to-point gradients not less than 10 nV per 5 dB level increment. This criterion allowed the separation of the level-dependent part of the voltage-versus-level function from its flat parts representing the background noise and the ‘saturation’ range at high stimulus levels. The point of intersection of the regression line with the zero response magnitude level was taken as the threshold estimate (Supin and Popov, 2007).
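The threshold estimate described above (select the oblique part, fit a straight line, take its zero crossing) can be sketched as follows. This is a minimal illustration and not the analysis code used in the study: the function names are hypothetical, the levels are assumed to be spaced in 5 dB steps, the retained points are assumed to form one contiguous run, and the magnitude values are synthetic numbers chosen to lie on a line, not measured data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_threshold(levels_db, magnitudes_nv, min_step_nv=10.0):
    """Zero crossing of a line fitted to the oblique part of the function.

    A point is kept if it belongs to at least one pair of adjacent levels
    whose magnitude increment is not less than min_step_nv (10 nV per
    5 dB step in the text). Returns (threshold in dB, slope in nV/dB).
    """
    points = sorted(zip(levels_db, magnitudes_nv))
    keep = set()
    for i in range(len(points) - 1):
        if points[i + 1][1] - points[i][1] >= min_step_nv:
            keep.update((i, i + 1))
    if len(keep) < 2:
        raise ValueError("no oblique part found")
    xs = [points[i][0] for i in sorted(keep)]
    ys = [points[i][1] for i in sorted(keep)]
    slope, intercept = fit_line(xs, ys)
    return -intercept / slope, slope


# Synthetic example (NOT data from the study): noise floor, linear rise
# of 5 nV/dB, then saturation.
levels = [80, 85, 90, 95, 100, 105, 110, 115, 120]
magnitudes = [3.0, 3.0, 12.0, 37.0, 62.0, 87.0, 112.0, 114.0, 115.0]
threshold, slope = estimate_threshold(levels, magnitudes)
print(f"threshold {threshold:.1f} dB, slope {slope:.1f} nV/dB")
```

In this synthetic case the flat noise-floor and saturation points are rejected by the gradient criterion, so only the five points on the rising part enter the regression.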
Behavior associated with loud sound exposure
At the first presentation of the loud sound (after completion of the initial baseline series), an element of aversive behavior of the subject was observed as a short backward movement, but without leaving the hoop station. This ‘aversive’ behavior extinguished during the first experimental (with loud sound exposure) session after the first five or six trials. Later on, no ‘startle’ response similar to that observed by Götz and Janik (Götz and Janik, 2010) or aversive behavior was observed during either the warning or the loud sounds, and the animal stayed quietly in the hoop station until called back by the trainer.
In total, the results were based on 47 initial baseline trials, 139 conditioning trials of 20–140 s signals, 211 conditioning trials of 5–35 s signals and 201 conditioning trials of 1–9 s signals. The number of original records averaged for obtaining each of the final records varied from 2300 to 3900. With this number of averaged original records, background levels of the near-threshold spectra were 2.3–4.9 nV within a range of 500–1250 Hz (i.e. ±375 Hz around the 875 Hz response peak).
The quality of obtained waveforms and their dependence on signal level in the baseline series are presented in Fig. 2A. The brain response records demonstrate a robust EFR as a series of waves at the 875 s−1 rate. A lag of approximately 4.5 ms relative to the stimulus confirms the neurophysiological origin of the waveforms. The frequency spectra of the records (Fig. 2B) featured definite peaks at the frequency of the stimulation rate of 875 Hz.
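The spectral measures referred to here (peak magnitude at 875 Hz and background level within 500–1250 Hz) can be illustrated with a direct Fourier transform of a 16 ms record. The sketch below is an assumption-laden illustration, not the study's processing code: the 16 kHz sampling rate, the function names and the synthetic 50 nV sinusoid are all hypothetical. With a 16 ms window the frequency resolution is 62.5 Hz, so 875 Hz falls exactly on a spectral bin.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16_000  # assumed for illustration only
WINDOW_S = 0.016         # 16 ms analysis window, as in the text
STIM_RATE_HZ = 875       # pip rate, where the EFR peak is expected
BAND_HZ = (500, 1250)    # background band, as in the text


def amplitude_spectrum(record):
    """Single-sided amplitude spectrum via a direct DFT.

    The factor 2/n gives sinusoid amplitudes for all bins except DC,
    which is not used here.
    """
    n = len(record)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(record))
        spectrum.append(2 * abs(acc) / n)
    return spectrum


def efr_peak_and_background(record, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return (magnitude at 875 Hz, mean magnitude of the other band bins)."""
    resolution_hz = sample_rate_hz / len(record)
    spectrum = amplitude_spectrum(record)
    peak_bin = round(STIM_RATE_HZ / resolution_hz)
    lo_bin = round(BAND_HZ[0] / resolution_hz)
    hi_bin = round(BAND_HZ[1] / resolution_hz)
    others = [spectrum[k] for k in range(lo_bin, hi_bin + 1) if k != peak_bin]
    return spectrum[peak_bin], sum(others) / len(others)


# Synthetic, noise-free 50 nV response at the stimulation rate (not real data).
n_samples = round(SAMPLE_RATE_HZ * WINDOW_S)
record = [50.0 * math.sin(2 * math.pi * STIM_RATE_HZ * i / SAMPLE_RATE_HZ)
          for i in range(n_samples)]
peak, background = efr_peak_and_background(record)
print(f"peak {peak:.1f} nV, background {background:.2e} nV")
```

Because the window holds a whole number of 875 Hz cycles, a response at the stimulation rate does not leak into neighbouring bins, which is what makes the comparison between the peak and the surrounding background meaningful.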
Both the waveforms and their frequency spectra demonstrated typical EFR magnitude dependence on stimulus level. As stimulus level increased from 85 to 110 dB re. 1 μPa, response magnitude increased. As is typical, at a level of 110 dB re. 1 μPa, the response magnitude reached a saturation level and further increases in level did not result in a response magnitude increase. Transition from an obvious response presence (a definite 875 Hz spectrum peak) to response absence (no spectrum peak exceeding the spectrum background) appeared within one step of the stimulus variation (from 90 to 85 dB re. 1 μPa).
20–140 s test durations
In this series, the range of EFR magnitude dependence on test stimulus level differed little from that in the baseline series (Fig. 3). Only at a test-stimulus level of 90 dB re. 1 μPa could any difference be noticed: the EFR magnitude (spectrum peak of 8 nV) was a little less than in the baseline series (14 nV). Nevertheless, similar to that seen in the baseline series, the transition from response presence to response absence appeared within the same level interval between 90 and 85 dB re. 1 μPa.
5–35 s test durations
In this series, the response dependence on test stimulus level differed from that in the baseline series (Fig. 4). A stimulus level of 90 dB re. 1 μPa, which produced a small but definite response in the baseline series, produced no response in this series. Response magnitudes at other stimulus levels were also less than in the baseline series.
1–9 s test durations
Records obtained in this series were of lower quality – more contaminated by noise – than in the other series. This was a natural consequence of fewer averaged original records obtained during shorter presentations of the test stimuli (2300 to 2500 as compared with 3500 to 3900 in the previous series). Nevertheless, the background spectrum level of near-threshold records did not exceed 4.9 nV, thus allowing for the detection of low-voltage threshold response peaks. A substantial difference of this series (Fig. 5) from the baseline series was obvious. At each particular test stimulus level, the EFR magnitude was less than in the baseline series. The lowest stimulus level producing a noticeable response was a level of 105 dB re. 1 μPa, as compared with 90 dB re. 1 μPa in the initial baseline series. At a level of 100 dB re. 1 μPa, the response peak in the record spectrum disappeared.
Thresholds at short test duration and long inter-trial interval
In the series with test stimulus durations from 20 to 140 s, the timing of the test stimuli relative to the preceding loud sound differed from that in the 5–35 s and 1–9 s series. In the 5–35 s and 1–9 s series, test stimuli were presented approximately 1.5 min after the loud sound in the preceding trial, whereas in the 20–140 s series, the latest stimuli were delayed up to twice as long. To verify how this difference may have influenced the thresholds, a series was performed with 5–35 s durations of the test stimulus, but with inter-trial intervals prolonged twofold (up to 180±15 s). Thus, the warning sound durations in this series were 5–35 s; however, the inter-trial intervals were within the same range as in the 20–140 s series.
The results of this series are presented in Fig. 6. There was no noticeable difference from the series with equal test duration ranges (5–35 s) but shorter (90±15 s) inter-trial intervals (see Fig. 4). In both of the series, a stimulus level of 95 dB re. 1 μPa produced a minimal response whereas a level of 90 dB re. 1 μPa produced no response. Response magnitudes at other stimulus levels were also similar in the two series.
Thresholds in the early part of the long test stimulus
When the test stimulus duration varied from 20 to 140 s, the earliest stimuli were presented approximately 1.5 min after the loud sound in the preceding trial, whereas the latest stimuli were delayed almost twice as long. To verify how this difference might influence the thresholds, the first of the 20 s long segments of each record in this series was selected, and the final off-line averaged records were obtained for stimuli presented during this 20 s (the 20 s duration was the same as the mean duration in the 5–35 s series). Thus, the obtained final records characterized responses occurring not longer than 20 s after the onset of the test/warning stimulation, although the warning signal lasted from 20 to 140 s.
Records obtained in this way and their frequency spectra are presented in Fig. 7. The records and their spectra look similar to those obtained from the total population of test stimuli varying from 20 to 140 s. The lowest stimulus level that produced a response peak just detectable in the spectrum background was 90 dB re. 1 μPa. Response magnitude at a high stimulus level of 115 dB re. 1 μPa (120 nV) was almost the same as for the total stimulus population in this series (114 nV; see Fig. 3).
Threshold estimates in the baseline and conditioning series
All the results are summarized in Fig. 8 as EFR magnitude-versus-test level functions. The oblique parts of the functions selected as described in the Materials and methods (voltage increments not less than 10 nV per 5 dB level increment, i.e. 2 nV dB−1) could be satisfactorily approximated by straight regression lines (r2=0.97 to 0.99). The slopes of the regression lines (±s.e.m.) ranged from 4.3±0.2 to 5.5±0.3 nV dB−1. These regression lines were used to quantitatively estimate the response thresholds. The results of the regression analysis (zero-voltage crossing ± s.e.m., dB re. 1 μPa) were:
87.0±0.7 dB in the baseline series;
90.0±1.1 dB in the conditioning series with 20–140 s test stimulus durations (+3.0 dB relative to the baseline);
94.3±0.7 dB in the conditioning series with 5–35 s test stimulus durations (+7.3 dB relative to the baseline);
99.7±0.7 dB in the conditioning series with 1–9 s test stimulus durations (+12.7 dB relative to the baseline);
94.1±1.1 dB in the series with 5–35 s test stimulus duration and prolonged inter-trial interval (+7.1 dB relative to the baseline, −0.2 dB difference from the series with the same test durations and non-prolonged inter-trial intervals); and
90.3±0.9 dB for the initial 20 s segment of 20–140 s stimulus duration (+3.3 dB relative to the baseline, +0.3 dB difference from the overall series data).
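The shifts quoted in the list above follow from subtracting the baseline threshold from each series threshold. A short check using the listed values (the dictionary keys and function name are illustrative labels only):

```python
# Threshold estimates (dB re. 1 uPa) as listed in the text.
THRESHOLDS_DB = {
    "baseline": 87.0,
    "20-140 s": 90.0,
    "5-35 s": 94.3,
    "1-9 s": 99.7,
    "5-35 s, prolonged interval": 94.1,
    "20-140 s, initial 20 s": 90.3,
}


def shifts_from_baseline(thresholds):
    """Threshold shift of each series relative to the baseline, in dB."""
    base = thresholds["baseline"]
    return {name: round(value - base, 1)
            for name, value in thresholds.items() if name != "baseline"}


shifts = shifts_from_baseline(THRESHOLDS_DB)
for name, shift in shifts.items():
    print(f"{name}: +{shift:.1f} dB")
```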
It is notable that the inter-series difference manifested itself not only in the threshold estimates. The whole magnitude-versus-level functions were shifted relative to one another. So the inter-series difference could not be attributed to any imprecision in the threshold evaluations themselves.
Thus, the presentation of a loud sound after a test/warning sound resulted in an increase of the hearing thresholds. The maximum increase of 12.7 dB appeared in the series with the shortest delays of the loud sound after starting the warning sound (a delay range from 1 to 9 s).
The data presented above demonstrate changes in hearing sensitivity when a warning signal was presented prior to the presentation of the louder 170 dB 5 s signal. If the warning sound occurred shortly before (within 1–9 s or 5–35 s) the louder sound, the animal's sensitivity shifted and hearing thresholds increased.
The correct interpretation of the data presented above requires an answer to a crucial question regarding the nature of the observed threshold increase: is the change in hearing due to some sort of learning effect or is it some other phenomenon? If it were due to some sort of learning or conditioning, then the test stimulus would have served as a warning signal: the whale would have learned to dampen its hearing to protect it from the loud sound, and would have done so as soon as the test/warning signal was presented. If the change were due to some sort of non-conditioning effect, then some other process must be found to explain the hearing shift.
It is well known that presentation of a loud sound can result in a temporary or permanent decrease of hearing sensitivity – temporary or permanent threshold shift (TTS or PTS), respectively. This effect has been investigated in detail in both terrestrial mammals (reviewed by Miller et al., 1963; Clark, 1991) and humans (reviewed by Melnick, 1991), and is under investigation in cetaceans (reviewed by Southall et al., 2007). In the experiments described above, the test stimulus was presented as soon as approximately 1.5 min after the loud sound in the preceding trial. Moreover, the loud sound was presented many times during an experimental session. Under these conditions, neither a short-term TTS effect – occurring immediately after the previous loud sound – nor a long-term TTS effect, because of multiple presentations of the loud sound, could be excluded a priori, without a careful examination of the data.
A regular control to separate a conditioning and a non-conditioning effect would be to present the non-conditioned stimuli without the conditioning stimuli. In our case, it might be the presentation of the loud sound without the preceding warning sounds. However, this control design was not applicable to our study because the presentation of the test stimuli before the loud sound was necessary for sensitivity measurements in every trial, and the test stimulus was expected to serve as a conditioning signal irrespective of an intention to use it as experiment or control.
Fortunately, an examination of the data themselves provided the necessary evidence to rule out the TTS effect. The evidence follows from the difference in the data between the experimental series. In all of the series, there was the same number of loud sound exposures of the same level and duration. In all the series, there were the same, or similar, delays between the loud sound and the test of the next trial. Note that in the 5–35 s series, the mean duration of the test train was 20 s, so the mean delay after onset of the test sound was 10 s; together with the mean 90 s inter-trial pause (see Fig. 1A), the mean post-exposure delay after the preceding loud sound was 100 s. The mean delay of the test stimuli in the initial 20 s segment of the 20–140 s series was exactly the same. In the 1–9 s series, the mean as calculated in the same way was 92.5 s, i.e. only differing a little from the two preceding cases. If the observed increased thresholds were a result of direct non-conditioned action of a loud sound, such as a TTS, the effects should be negligibly different in all of these cases. But that did not occur.
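The mean post-exposure delays quoted in this paragraph come from adding half of the analysed test duration to the mean 90 s inter-trial pause. A brief check of that arithmetic (the function name is hypothetical; the calculation simply assumes test stimuli are spread evenly over the analysed segment):

```python
MEAN_INTER_TRIAL_PAUSE_S = 90.0  # mean pause after the loud sound, from the text


def mean_post_exposure_delay(analysed_duration_s,
                             pause_s=MEAN_INTER_TRIAL_PAUSE_S):
    """Mean delay of test stimuli after the preceding loud sound, in s.

    Assumes stimuli are spread evenly over the analysed segment, so their
    mean position is at half of its duration.
    """
    return pause_s + analysed_duration_s / 2


delay_5_35 = mean_post_exposure_delay(20.0)       # mean test duration 20 s
delay_first_20 = mean_post_exposure_delay(20.0)   # initial 20 s segment
delay_1_9 = mean_post_exposure_delay(5.0)         # mean test duration 5 s
print(delay_5_35, delay_first_20, delay_1_9)
```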
The effect differed substantially between the series (see Fig. 8): a small threshold increase (3.3 dB) occurred when the delays between the warning onset and the following loud sound varied from 20 to 140 s, even when only the early 20 s segment of the test train was considered; a much higher threshold increase (12.7 dB) occurred at short (1–9 s) delays; and an intermediate increase (7.3 dB) occurred at intermediate (5–35 s) delays. These differences cannot be explained by a direct action of the preceding loud sound, such as a TTS.
Another possibility of TTS manifestation to be considered might be as follows. During the investigation, we first performed the control series, then successively the 20–140 s, the 5–35 s, and then the 1–9 s series. So one might suppose that during the investigation, a long-term TTS effect was accumulated, and thus each successive series featured higher thresholds than the previous one. Although we have not found any indication of this possibility in the literature, a conservative approach requires its consideration. The results of the series with 5–35 s warning signal durations and prolonged inter-trial intervals contradict this possibility. This series was performed after the 1–9 s series. If the observed threshold increase appeared due to a long-term cumulative TTS effect, thresholds in this series should be higher than in the 1–9 s series. However, thresholds in this series were in fact lower than in the 1–9 s series and the same as in the 5–35 s series performed previously. Thus, no long-term TTS effect manifested itself either.
Therefore, the threshold increase resulted from an interaction between the warning signal and the following loud sound. In other words, a conditioned regulation of hearing sensitivity took place. The animal learned to change its hearing sensitivity when warned that a loud sound was about to arrive.
Assuming that the observed threshold increases manifested a conditioning effect, the next conclusion that can be drawn is that this effect is sensitive to the delay between the warning and the loud sound. Effective conditioning appeared only when the warning preceded the loud sound by as little as a few seconds. We may hypothesize, therefore, that the subject was capable of dampening its hearing when it anticipated the quick appearance of a loud sound, but that it also had an intrinsic motivation to keep its hearing sensitivity high and so did not dampen its hearing for a long time. Further studies of this phenomenon must examine the time relationships between stimuli and the course of extinction in order to validate it as a conditioning phenomenon.
Another intriguing question that still cannot be answered is: how quickly does the conditioning appear? To characterize the response quantitatively, with satisfactory precision, it was necessary during this experiment to average records from many trials and several sessions. This procedure resulted in the loss of the temporal dynamics of the conditioning process. Hopefully, further elaboration of the technique may help to answer this question.
If this conditioning process is further validated and replicated, it may have several consequences. On the one hand, the possibility of active protective regulation of hearing sensitivity should be taken into consideration when experimental data are used to assess the effects of loud sounds on marine mammals in the wild. Experienced experimental animals may dampen their hearing when exposed to loud sounds, thus mitigating their effects, whereas naïve animals in the wild may be more susceptible. On the other hand, the conditioning process may prove to be a valuable tool in the practical protection of whale and dolphin hearing. The effects of short, loud anthropogenic noises in the animal's environment might be partially mitigated by providing less intense warning sounds before the loud sound is received by the animal, thus allowing the whale to proactively change its hearing sensitivity for protection.
Continued thanks to Dr Michael Weise and Dr Robert Gisiner. This manuscript is contribution number 1556 of the Hawaii Institute of Marine Biology.
This research project was supported by an Office of Naval Research grant to P.E.N. [N00014-12-1-02-12].
No competing interests declared.