SUMMARY
Echolocating toothed whales produce high-powered clicks by pneumatic actuation of phonic lips in their nasal complexes. All non-physeteroid toothed whales have two pairs of phonic lips allowing many of these species to produce both whistles and clicks at the same time. That has led to the hypothesis that toothed whales can increase the power outputs and bandwidths of clicks, and enable fast clicking and beam steering by accurately timed actuation of both phonic lip pairs simultaneously. Here we test that hypothesis by applying suction cup hydrophones on the sound-producing nasal complexes of three echolocating porpoises (Phocoena phocoena) with symmetrical pairs of phonic lips. Using time of arrival differences on three hydrophones, we show that all recorded clicks from these three porpoises are produced by the right pair of phonic lips with no evidence of simultaneous or independent actuation of the left pair. We demonstrate that porpoises, despite actuation of only one sound source, can change their output and sound beam, probably through conformation changes in the sound-producing soft tissues and nasal sacs, and that the coupling of the phonic lips and the melon acts as a waveguide for sound energy between 100 and 160 kHz to generate a forward-directed sound beam for echolocation.
INTRODUCTION
All studied toothed whales use echolocation for orientation and foraging by the emission of high-powered, directional clicks and subsequent reception and processing of returning echoes with a detection and discrimination performance that rivals man-made sonars at short ranges (Au, 1993). Despite dedicated research over the last 40 years, in part motivated by a desire to design biomimetic sonar systems, we still do not fully understand how toothed whales can generate ultrasonic transients with source levels between 180 and 240 dB re. 1 μPa (p.-p.) with their nasal complexes. The current understanding is that echolocation clicks are generated by pneumatic actuation of pairs of phonic lips that couple sound energy into the water via the fatty melon (Ridgway et al., 1980; Cranford, 2000; Madsen et al., 2002). When a small volume of pressurized air moves from the nasopharyngeal sac to the vestibular air sacs via regulation of the nasal plugs (Ridgway and Carder, 1988), the phonic lips are accelerated, whereby the click is generated (Cranford et al., 1996; Cranford and Amundin, 2003; Dubrovsky et al., 2004).
Most delphinid toothed whales can, besides clicks, generate long tonal whistles that may be emitted concomitantly with echolocation clicks (Brill and Harder, 1991). This capability of dual sound production has been explained by the simultaneous use of two sets of phonic lips so that the left pair of phonic lips is envisioned to be involved in whistle production and the right pair in click production (Cranford, 2000); both systems are powered pneumatically by air that can be recycled (Dormer, 1979). The evidence for simultaneous production of both clicks and whistles has led to the proposition that some toothed whales may also be capable of generating a click by using both pairs of phonic lips simultaneously (Cranford et al., 1996). Dual click production by phonic lip pairs of different sizes, and hence probably different resonance frequencies, has been proposed to be the explanation for the two spectral peaks often seen in delphinid echolocation clicks (Au et al., 1995; Cranford et al., 1996). Further, it has been hypothesized that microsecond timing of the delay between actuation of the two phonic lip pairs may serve to increase the overall acoustic power output of the toothed whale forehead (Cranford et al., 1996) and provide the basis for active beam steering (Moore et al., 2008; Lammers and Castellote, 2009).
The dual sound source model for toothed whale click production has been formulated on the basis of anatomical observations (Cranford et al., 1996), spectral analyses of clicks (Au et al., 1995) and preliminary endoscope observations (Cranford, 2000) but there has been little empirical testing of it. Recently, however, Lammers and Castellote presented intriguing data from two-channel hydrophone recordings of an echolocating beluga whale (Lammers and Castellote, 2009). They report that echolocation clicks recorded on the acoustic axis consist of one single pulse that breaks up into two discrete parts that exhibit increasing interpulse intervals when recorded further and further off axis.
Lammers and Castellote explain this double pulse pattern that arises off the acoustic axis as being the result of near simultaneous actuation of bilateral sound sources with small delays possibly controlled by the whale to allow for beam steering (Lammers and Castellote, 2009). They proceed to advance the hypothesis that the double source sound production may be found in all echolocating toothed whales. However, these conclusions are at odds with previous findings (Dormer, 1979; Mackay and Liaw, 1981; Amundin and Andersen, 1983) and modeling efforts (Aroyan et al., 2000) whose results indicate that the right pair of phonic lips is primarily used to produce echolocation clicks. Further, the interpulse intervals in the Lammers and Castellote study are very long, and hence difficult to reconcile with the physical separation of the two pairs of phonic lips of some 10–15 cm and the speed of sound in tissue around 1500 m s–1. The conflict between some previous reports and the recent intriguing findings motivated us to experimentally test the two-source model for toothed whale sound production using hydrophones applied in suction cups on the foreheads of harbour porpoises (Phocoena phocoena L.), a species that carries nearly symmetrical pairs of phonic lips. Here we present data from three echolocating porpoises showing that for all the clicks measured only the right pair of phonic lips is used for click production, but that porpoises are still capable of dynamic beam formation with a fatty melon that acts as a waveguide for signal energy between 100 and 160 kHz.
MATERIALS AND METHODS
Three porpoises (one adult male, two adult females) kept at the Fjord and Belt Centre in Kerteminde (Denmark), were trained to wear suction cup hydrophones in different configurations on the head while echolocating for a fish reward in the water in front of them. Two hydrophone configurations were used: configuration A involved a hydrophone just above each eye, laterally of the blowhole, but in line with the two pairs of phonic lips, and a third hydrophone on the melon, antero-medially relative to the blowhole (Figs 1 and 2). Configuration B consisted of two hydrophones placed symmetrically on the melon on each side of the midline (Fig. 3).
The suction cups (diameter of 50 mm) were custom made from medical grade silicone. Each suction cup contained a spherical hydrophone element [similar to that of a Reson 4034 hydrophone (Reson, Slangerup, Denmark)] and a 20 dB preamplifier with a 400 Hz first-order high-pass filter. Each of the suction cup hydrophones was calibrated against a Reson 4034 hydrophone in an anechoic tank over the frequency range from 50 to 300 kHz. The sensitivity of the suction cup hydrophones was measured to be –188 dB re. 1 V/1 μPa ±2 dB in the range from 50 to 300 kHz. For the first series of measurements, the hydrophones were connected to a conditioning box with 40 dB gain and a band pass filter (1 pole high-pass filter at 1 kHz and a 4 pole low-pass filter at 200 kHz). The output from the filter box was relayed to two synchronized multifunction cards (National Instruments NI6251, Austin, TX, USA) sampling at 500 kHz per recording channel (16 bits). Identification of very high frequency components around the Nyquist frequency of 250 kHz in the recorded clicks from the first series of recordings led us to repeat the measurements with a different setup: a conditioning box with a 1 kHz high-pass filter, 32 dB gain and no low-pass filter, sampled at 1 MHz per channel. The suction cup hydrophones, filter boxes and ADC channels were swapped between sessions to control for potential biases in the recording chain.
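The motivation for moving from 500 kHz to 1 MHz sampling can be illustrated with a small sketch of frequency folding. It is not part of the original analysis: it assumes ideal sampling with no anti-alias filter (the first setup did include a 200 kHz low-pass filter, which would attenuate such components), and the function name and the 300 and 350 kHz test tones are chosen here for illustration only.

```python
def apparent_frequency_khz(true_khz: float, sample_rate_khz: float) -> float:
    """Frequency at which a pure tone shows up after ideal sampling."""
    # Fold the true frequency back into the band from 0 to the Nyquist frequency.
    folded = true_khz % sample_rate_khz
    return min(folded, sample_rate_khz - folded)


# Sampling rates of the two recording setups described in the text.
for sample_rate_khz in (500.0, 1000.0):
    nyquist_khz = sample_rate_khz / 2
    print(f"fs = {sample_rate_khz:.0f} kHz, Nyquist = {nyquist_khz:.0f} kHz")
    # 130 kHz is the click centroid; 300 and 350 kHz are illustrative components.
    for tone_khz in (130.0, 300.0, 350.0):
        seen_khz = apparent_frequency_khz(tone_khz, sample_rate_khz)
        print(f"  {tone_khz:.0f} kHz tone appears at {seen_khz:.0f} kHz")
```

Under these assumptions, energy above 250 kHz is misplaced within the recorded band at the lower sampling rate but is represented at its true frequency at 1 MHz.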
For configuration A the clicks were analyzed by extracting clicks from the three channels based on the peak of the detected signal on the central melon hydrophone. Clicks on the three channels were then compared with respect to time of arrivals, amplitude and spectral content. For configuration B, clicks were found by peak-detecting clicks on the right hydrophone recording because this channel rendered the highest received levels and hence the best signal-to-noise ratios (SNR). We made 10 recordings with each animal using a sampling rate of 500 kHz per channel, and 10 recordings for each animal with a sampling rate of 1 MHz per channel. We analyzed more than 10,000 clicks from each animal for which the SNRs on all channels were better than 10 dB.
We pursued several analytical avenues for automated, objective derivation of the time of arrival differences between the different hydrophones in order to enable formation of statistical distributions of the time delays. Normally, cross-correlation is a powerful tool for determining time delays but in this case it proved futile as it is not the same waveform that reaches the different receivers (Fig. 1A,B). Any distributions of time delays formed by this method will then be an unknown mix of actual variability in time delays and differences in the waveforms and hence timing of the cross-correlation peaks. Another method would involve the automated measurements of click onsets of the different waveforms on the three channels but such an approach is highly sensitive to SNR, and would require that the SNR for the same waveform is the same on all three channels, which is not the case. However, one robust and easily identified timing feature in the waveforms was the peak of the envelope of the clicks recorded on the center hydrophone. Again it could be tempting to compare that with the peak of the envelopes on the two lateral hydrophone channels but given that they have poorer SNR and a different and variable frequency content this will not be a meaningful approach either. After running trials with these different approaches, we settled on using the peak of the envelope of the clicks from the center hydrophone to form stacked energy plots of the three channels aligned to time zero by the peak of the center envelope (Fig. 2). This method allows for a rapid qualitative assessment of a large number of clicks, where any changes or reversals in time delays from the right to the left side or vice versa will be easily identified. Using this graphical approach allows for the detection of trends that are consistent across many clicks, because these present themselves as lines in the plot (Fig. 2). 
Hence, we used that method to analyze more than 10,000 clicks from 20 different recordings from each of the three porpoises. All analyses were performed using custom written scripts in Matlab 6.5 (MathWorks, Natick, MA, USA).
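The alignment step behind the stacked energy plots can be sketched as follows. This is a minimal illustration, not the Matlab scripts used in the study: the clicks are synthetic Gaussian tone bursts, the lateral delays and amplitudes are invented, and a moving average of the squared signal stands in for a proper envelope (for example one derived from the Hilbert transform). All function names are hypothetical.

```python
import math

FS_HZ = 1_000_000  # 1 MHz, the higher sampling rate mentioned in the text


def synthetic_click(n_samples, centre, freq_hz, amplitude, sigma=10.0):
    """Gaussian-windowed tone burst centred on sample `centre` (made-up waveform)."""
    out = []
    for n in range(n_samples):
        m = n - centre
        window = math.exp(-0.5 * (m / sigma) ** 2)
        out.append(amplitude * window * math.sin(2 * math.pi * freq_hz * m / FS_HZ))
    return out


def energy_envelope(samples, win=5):
    """Moving average of the squared signal; a crude stand-in for an envelope."""
    half = win // 2
    squared = [v * v for v in samples]
    env = []
    for i in range(len(squared)):
        lo, hi = max(0, i - half), min(len(squared), i + half + 1)
        env.append(sum(squared[lo:hi]) / (hi - lo))
    return env


def peak_index(values):
    """Index of the first maximum in a sequence."""
    return max(range(len(values)), key=values.__getitem__)


def stack_aligned(clicks, half_width):
    """Cut every channel around the centre-channel envelope peak, one row per click."""
    stacks = {"left": [], "center": [], "right": []}
    for click in clicks:
        t0 = peak_index(energy_envelope(click["center"]))
        lo, hi = t0 - half_width, t0 + half_width + 1
        if lo < 0 or hi > len(click["center"]):
            continue  # click too close to the edge of the snippet
        for channel in stacks:
            stacks[channel].append(energy_envelope(click[channel])[lo:hi])
    return stacks


# Three synthetic clicks at different absolute positions. The lateral offsets
# (right 15 samples early, left 15 samples late) are invented for the demo.
clicks = []
for centre in (100, 110, 120):
    clicks.append({
        "center": synthetic_click(256, centre, 130_000, 1.0),
        "right": synthetic_click(256, centre - 15, 130_000, 0.3),
        "left": synthetic_click(256, centre + 15, 130_000, 0.3),
    })

HALF_WIDTH = 60
stacks = stack_aligned(clicks, HALF_WIDTH)
for i in range(len(stacks["center"])):
    lags_us = {
        ch: (peak_index(stacks[ch][i]) - HALF_WIDTH) * 1e6 / FS_HZ
        for ch in ("right", "center", "left")
    }
    print(i, lags_us)
```

Each entry of `stacks` is a list of rows with time zero at index `HALF_WIDTH`; rendering the rows as an image, one click per row, would give a plot in which a consistent lateral delay appears as a line, which is the property the graphical approach relies on.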
RESULTS
In recording configuration A, we placed the hydrophones in line with the two pairs of phonic lips using the porpoise eyes and blowhole as landmarks (Fig. 1). Given the high degree of bilateral symmetry of the porpoise nasal complex (Huggenberger et al., 2009), this configuration allowed us to use time of arrival differences to test the hypothesis that both pairs of phonic lips are actuated simultaneously during click production. If both pairs are actuated simultaneously, we would expect the time of arrival to be the same on the two laterally placed hydrophones. By contrast, if only one of the phonic lip pairs generates sound during click formation, we would expect that the difference in time of arrival on the two lateral hydrophones would correspond to the physical separation of the two pairs divided by the sound speed in that tissue. For all clicks that could be analyzed for all three animals we found that the times of arrival on the left side were delayed between 20 and 50 μs compared with the times of arrival on the right side (Figs 1 and 2). Time delays between 20 and 50 μs correspond to travel paths of 3–7.5 cm in soft tissue (using a sound speed of 1500 m s–1), and we therefore infer that all clicks must be produced by a source that is 3–7.5 cm closer to the right hydrophone than the left hydrophone. Our data therefore do not support the hypothesis that two symmetrical pairs of phonic lips are actuated at the same time; rather, they demonstrate that all recorded clicks are produced by the right pair of phonic lips (Figs 1 and 2).
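The conversion from arrival-time difference to path difference is simple arithmetic, shown here as a short sketch. The sound speed of 1500 m s–1 and the 20 and 50 μs delays come from the text; the function name is chosen here for illustration.

```python
SOUND_SPEED_M_PER_S = 1500.0  # soft-tissue sound speed used in the text


def path_difference_cm(delay_us: float,
                       sound_speed: float = SOUND_SPEED_M_PER_S) -> float:
    """Extra travel path (cm) implied by an arrival-time difference (microseconds)."""
    # microseconds * (m/s) gives 1e-6 m; times 100 cm/m leaves a factor of 1e-4.
    return delay_us * sound_speed / 10_000.0


# 0 us is the expectation for simultaneous, symmetric actuation;
# 20 and 50 us bracket the left-minus-right delays reported in the text.
for delay_us in (0.0, 20.0, 50.0):
    print(f"{delay_us:4.0f} us -> {path_difference_cm(delay_us):.1f} cm")
```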
The variations in time delays between 20 and 50 μs are not linked to the relative output, click intervals or particular animals, and may therefore rather be due to small differences (±1 cm) in hydrophone placement between sessions and animals, and conformation changes in the soft anatomy and air sacs of the sound production apparatus during clicking (Cranford, 2000). The waveforms and spectra of sounds recorded on the center hydrophone closely resemble those of clicks recorded at close range on the acoustic axis in the acoustic far field with a centroid frequency around 130 kHz and a narrow –10 dB bandwidth (Fig. 1C, black curve). On the two lateral hydrophones, however, the clicks have received levels that are generally lower by more than 10 dB (Fig. 1), and spectra showing little energy around 130 kHz with frequency centroids above 200 kHz and broad bandwidths (Fig. 1).
If the porpoise nasal complex has a fixed radiation pattern and an acoustic axis with constant offset to the body axis (Au et al., 1999), the relationship between received levels on different, fixed parts of the melon should be constant. To test whether porpoises, like the bottlenose dolphin (Moore et al., 2008), are capable of dynamic beam formation in the horizontal plane when using only one pair of phonic lips, we placed two suction cup hydrophones on the melon symmetrically on both sides of the midline on the three porpoises (configuration B, Fig. 3B). All analyzed clicks arrived on the right hydrophone before they did on the left hydrophone but with a smaller time of arrival difference of around 10 μs compared with the delay between right and center hydrophones in recording configuration A – again corroborating that only the right pair of phonic lips was actuated. The received levels fluctuate up and down on both hydrophones, and in general follow each other, but with a higher received level on the right side of the melon (Fig. 3C). However, Fig. 3A,C show that the difference in received level on the two sides of the melon is not constant. Indeed, for a few of the clicks described in Fig. 3A, it can be seen that the left side hydrophone records a slightly higher level than does the right one (but the right side clicks always arrive first). This flexibility in the received level differences between different points on the surface shows that porpoises can change the direction and/or the width of their sound beams with just a single active sound source in the form of the right pair of phonic lips. In other bilateral recordings of the B configuration, we often found click series where the levels were higher on the left side, although the signal always arrived at the right side receiver first. In those cases the envelope and spectrum of the right side clicks showed deep notches indicating that this amplitude difference might stem from destructive interference.
DISCUSSION
A fundamental problem when linking the functional morphology of the toothed whale nasal complex with click waveforms radiating from the system is to know at which aspect angle the click waveforms are recorded with respect to the sound-generating structures (Moore et al., 2008). Toothed whale clicks are highly directional, and off the acoustic axis they will therefore have lower amplitudes and increasingly distorted waveforms (Au, 1993). A flat piston model is often used to model the radiation patterns from toothed whale foreheads. One of the predictions from this model is that click waveforms should reduce in amplitude, bandwidth and centroid frequency for increasing off-axis angles in the far field, and eventually break up in two pulses formed by edge contributions from the flat piston (Au, 1993).
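The flat piston prediction that off-axis clicks lose amplitude and centroid frequency can be illustrated with the standard far-field directivity of a baffled circular piston, 2J1(x)/x with x = ka sin θ. The sketch below is an illustration only: the 3 cm radius is a hypothetical value not taken from the text, a single sound speed of 1500 m s–1 is assumed, and the Bessel function is evaluated numerically from Bessel's integral to keep the example self-contained.

```python
import math


def bessel_j1(x: float, steps: int = 2000) -> float:
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt."""
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h  # midpoint rule
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi


def piston_level_db(freq_hz: float, angle_deg: float, radius_m: float,
                    sound_speed: float = 1500.0) -> float:
    """Far-field level of a baffled circular piston relative to its on-axis level."""
    wavenumber = 2.0 * math.pi * freq_hz / sound_speed
    x = wavenumber * radius_m * math.sin(math.radians(angle_deg))
    if abs(x) < 1e-9:
        return 0.0  # on axis
    directivity = abs(2.0 * bessel_j1(x) / x)
    return 20.0 * math.log10(max(directivity, 1e-12))


RADIUS_M = 0.03  # hypothetical piston radius, not a value from the text
for angle_deg in (0.0, 5.0, 10.0):
    levels = [piston_level_db(f, angle_deg, RADIUS_M) for f in (100e3, 130e3, 160e3)]
    print(angle_deg, [round(v, 1) for v in levels])
```

With these assumed parameters the higher frequencies fall off faster with angle inside the main lobe, which is why the model predicts a lower centroid frequency off axis.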
The observations made here in the acoustic near field of the porpoise foreheads only support the predictions of the flat piston model in part. While the received levels in general are lower on the two laterally placed hydrophones in recording configuration A relative to the centrally placed hydrophone, we find consistently that the centroid frequencies and –10 dB bandwidths of the click versions recorded on the two lateral hydrophones are higher than on the center hydrophone (Fig. 1). This finding is at odds with the predictions that the centroid frequency and bandwidths should be lower off axis (Au, 1993), and puzzling even given the distortion that is inherent to recordings in the acoustic near field (Au et al., 1978). It turns out that the high frequency parts above 160 kHz of the off-axis waveforms closely resemble the on-axis spectrum (Fig. 1) but with a conspicuous lack of the excess energy around 130 kHz that porpoises use for echolocation. We interpret this to be the result of a filtering process where the coupling between the phonic lips and the melon, and the melon itself, acts as a waveguide with a high frequency cut-off around 160 kHz. The combination of the geometry of the phonic lips and air sacs and the supported modes of the melon waveguide thus seems to couple the bulk of the sound energy centered around 130 kHz into the water, forming a directional sound beam suited for echolocation.
We therefore infer that the phonic lips in the head seem to generate a more broadband transient with most energy around 130 kHz but with significant energy from 160 kHz and at least up to 350 kHz. However, only the sound energy from around 100–160 kHz is effectively collimated through the melon, and the energy at higher frequencies is radiated with less directionality. This is in agreement with the findings of Au et al. (Au et al., 2006) who used suction cup hydrophones to demonstrate that on average the highest output levels can be recorded where the low velocity core of the melon interfaces with the water, and lends weight to the view that the melon and the link between the phonic lips and the melon not only serve an impedance matching function but also to filter and collimate the sound energy from the phonic lips.
Returning to the issue of waveform shapes as a function of off-axis angles, it therefore seems from the outline above that a model incorporating the melon as a waveguide collimating the sound produced at the phonic lips will have more explanatory power for high off-axis angles. Clicks from such a structure recorded at angles close to on-axis will behave much like a flat piston but at higher off-axis angles the delays involved are expected to be larger than what can be generated from edge contributions of a flat piston (Au, 1993; Beedholm and Møhl, 2006). For instance, at 90 deg for the beluga in the Lammers and Castellote (Lammers and Castellote, 2009) study, the first pulse will arrive relatively directly from the side of the head whereas the last pulse will be made up of sound energy that has traveled first the posterior–anterior length of the melon and then exited from the anterior tip of the melon before traveling a longer path to the recording hydrophone. That means an extra traveled distance in water and tissue of at least 40 cm given a beluga melon–head length of some 50 cm. The data set of Lammers and Castellote (Lammers and Castellote, 2009) follows these predictions with time delays between the two pulses at 90 deg of some 250 μs corresponding to travel path differences of some 40 cm in tissue. However, the authors only briefly hint at this alternative geometric explanation, and proceed to interpret the double pulse waveforms as evidence for the dual sound source model proposed by Cranford et al. (Cranford et al., 1996). In order to explain the delay of 250 μs between pulses, Lammers and Castellote suggest that the sound is entering and exiting from an air-filled volume in the head, thereby delaying the sound excessively compared with a pure tissue path (Lammers and Castellote, 2009).
Such a route is difficult to explain physically because the impedance mismatches involved will greatly reduce signal amplitudes in air compared with the sound paths in denser tissue with impedance close to that of the phonic lips producing the clicks. A simple test of the geometric interpretation of the results could be made by recording at different distances but a fixed angle off axis (say 45 deg) from the beluga. If the geometric interpretation is correct, it should result in smaller interpulse intervals when recording further away due to the smaller difference in traveled distance between direct path from the active phonic lip pair and the path from the front of the melon.
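The proposed range test can be made concrete with a simple two-path sketch of the geometric interpretation. It is a simplification under stated assumptions: straight-line paths, a single sound speed of 1500 m s–1 in tissue and water, the active phonic lip pair at the origin, and the second pulse leaving the melon tip after a 40 cm path along the body axis. The function name and the listed ranges are chosen here for illustration.

```python
import math


def interpulse_interval_us(range_m: float, angle_deg: float,
                           melon_path_m: float = 0.4,
                           sound_speed: float = 1500.0) -> float:
    """Delay between the direct pulse and the pulse exiting at the melon tip."""
    theta = math.radians(angle_deg)
    # Receiver position, measured from the phonic lips; the body axis is x.
    rx, ry = range_m * math.cos(theta), range_m * math.sin(theta)
    direct_m = range_m
    # Second path: along the melon to its tip, then from the tip to the receiver.
    via_tip_m = melon_path_m + math.hypot(rx - melon_path_m, ry)
    return (via_tip_m - direct_m) / sound_speed * 1e6


for range_m in (1.0, 2.0, 5.0, 10.0, 50.0):
    at_45 = interpulse_interval_us(range_m, 45.0)
    at_90 = interpulse_interval_us(range_m, 90.0)
    print(f"range {range_m:5.1f} m: 45 deg {at_45:6.1f} us, 90 deg {at_90:6.1f} us")
```

In this sketch the interval at 45 deg shrinks with range towards the far-field limit L(1 − cos θ)/c, whereas two sources separated only by a whale-controlled actuation delay would not be expected to show this range dependence.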
To avoid the problem of interpreting far field recordings around the head, we used suction cup hydrophones for near-field sound recordings (sensu Diercks et al., 1971; Au et al., 2006) to test if there were one or two pairs of phonic lips active during sound production. Changing sound speeds, refraction and reflective air sacs preclude straight line triangulation of sound source locations in the toothed whale forehead (Diercks et al., 1971) but the bilateral symmetry of the porpoise nasal complex (Huggenberger et al., 2009) increases the chance that laterally placed hydrophones in line with the phonic lips have the same sound paths to each of their phonic lip pairs. This implies that simultaneous actuation of both pairs in the formation of a click (Cranford et al., 1996; Lammers and Castellote, 2009) should lead to the same time of arrival at the two receiving hydrophones in porpoises. This is not what we find for any of the many thousands of clicks we have analyzed in three porpoises; rather, the received waveforms consistently arrive at the right suction cup hydrophone before the left one with time delays that correspond to 3–7.5 cm of difference in travel path in soft tissue.
As seen from Fig. 1, such travel path differences are consistent with the spacing between parts of the two pairs of phonic lips in porpoises. We therefore conclude that the three porpoises studied here produced clicks using just their right pair of phonic lips, and that simultaneous actuation of both pairs of phonic lips is not needed to generate porpoise echolocation clicks. We cannot exclude the possibility that porpoises may also at times, perhaps when higher source levels are needed, use their left pair of phonic lips or actuate both pairs simultaneously, but neither of these two sound production modes was seen in any of the many thousands of clicks that could be analyzed from these three animals performing short-range echolocation for fish rewards.
The simultaneous actuation of both pairs of phonic lips in echolocating toothed whales has been argued to serve the purpose of (1) active beam steering by very small, accurately timed changes in the delay between actuation of the two phonic lip pairs, (2) increasing the power output through constructive interference, (3) achieving high repetition rates by the two sources taking turns in producing a click, and (4) increasing the signal bandwidth by having sources of different sizes producing different centroid frequencies (Cranford et al., 1996; Cranford, 2000; Moore et al., 2008; Lammers and Castellote, 2009).
We observed click rates of up to 600 clicks s–1 with only the right pair of phonic lips being active, so porpoises do not need two active sources to achieve high repetition rates. If porpoises, contrary to what we have demonstrated here, at times actuate both sources in synchrony, they can maximally increase their source level by 6 dB but that would require two completely identical pulses to be emitted in phase from the two sources. This could perhaps be achieved through a passive mechanical master-and-slave relationship between the two sets of phonic lips. However, slight changes in the actuation delay of source II to perform beam steering (in combination with a refractive melon) would call for motor neuron spike timing on the order of microseconds, which to our knowledge is unprecedented in any vertebrate. As an example, Haplea et al. described auditory neurons in bats as having ‘extremely low variability’ when responses to sound signals occurred with standard deviations below 100 μs (Haplea et al., 1994). The degree of motor neuron timing that is called for to allow for beam steering is orders of magnitude lower than that, so it would be a most interesting system to study if indeed echolocating whales can provide the neural control to achieve the microsecond timing of their phonic lip pairs required for this type of beam steering.
For dolphins generating broadband clicks it has been proposed that two asymmetrical pairs of phonic lips generate pulses of different centroid frequencies to generate a wider bandwidth than would a single pulse (Cranford et al., 1996). This in turn would dramatically reduce the potential for beam steering due to the limited frequency overlap between the two sources, and the maximal increase in source level would in that case only be around 3 dB. The proposed advantages for actuating two sources at the same time therefore seem limited and in some cases mutually exclusive, and would call for very accurate motor control over actuation of source number II with respect to number I.
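The 6 dB and 3 dB limits, and the timing precision that beam steering would demand, follow from short calculations sketched below. The steering estimate treats the two phonic lip pairs as a free-field two-element array and ignores the refractive melon; the 5 cm source spacing and the 5 deg steering angle are hypothetical values chosen for illustration, as are the function names.

```python
import math


def coherent_gain_db(n_sources: int) -> float:
    """Level gain when n identical pulses add in phase (pressures add)."""
    return 20.0 * math.log10(n_sources)


def incoherent_gain_db(n_sources: int) -> float:
    """Level gain when n equal-energy pulses add without phase alignment."""
    return 10.0 * math.log10(n_sources)


def steering_delay_us(angle_deg: float, spacing_m: float,
                      sound_speed: float = 1500.0) -> float:
    """Inter-source delay that steers a two-element array by angle_deg."""
    return spacing_m * math.sin(math.radians(angle_deg)) / sound_speed * 1e6


SPACING_M = 0.05  # hypothetical spacing between the two sources

print(f"two sources in phase:     +{coherent_gain_db(2):.1f} dB")
print(f"two sources, no overlap:  +{incoherent_gain_db(2):.1f} dB")
print(f"delay to steer by 5 deg:  {steering_delay_us(5.0, SPACING_M):.1f} us")
```

Under these assumptions a steering change of a few degrees corresponds to a delay of only a few microseconds, which is the scale of motor timing discussed above.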
With only the right pair of phonic lips producing clicks, the porpoises can nevertheless form a dynamic sound beam as evidenced by the fluctuations in received levels in recording configuration B. Furthermore, some of the dynamics can be read out of the differences and changes in energy distribution of the clicks from the porpoises in Fig. 2. The large changes in relative levels across the melon surface are consistent with the high standard deviations in received levels reported by Au et al. from suction cup recordings on the melon of porpoises (Au et al., 2006). We therefore conclude that porpoises have dynamic beam formation with a potential to change both the direction and width of the sound beam, as found in bottlenose dolphins (Moore et al., 2008).
As shown here, these beam dynamics do not require two active sources but can be explained by conformation changes in the sound-producing soft tissues and the surrounding air sacs (Dormer, 1979; Huggenberger et al., 2009; Cranford et al., 1996). So while the concepts of directionality indices and acoustic axes are useful parameters for quantifying and comparing the source performance of toothed whale sound production (Au, 1993; Madsen and Wahlberg, 2007), the present findings and those of Au et al. (Au et al., 2006) and Moore et al. (Moore et al., 2008) emphasize that sound radiation patterns from toothed whale foreheads are dynamic, and that directionality indices and beam widths should be reported as distributions rather than as fixed numbers.
To summarize, we have shown that porpoises produce clicks with their right pair of phonic lips with no evidence that they actuate both pairs simultaneously or that they click with their left pair of phonic lips. That does not exclude click production with the left pair of phonic lips, and nor have we demonstrated that they never actuate both sides simultaneously. However, we have outlined a number of problems with the theory of simultaneous sound production, and we have shown that many of the virtues ascribed to the simultaneous actuation of the two pairs of phonic lips can be achieved in porpoises with just the right pair of the phonic lips active.
The present observations are consistent with those of Amundin and Andersen (Amundin and Andersen, 1983), reporting that click production only involved vibrations in the right nasal plug in porpoises. Similarly, in their ultrasound Doppler study of bottlenose dolphins, Mackay and Liaw observed vibrations mainly in the right side of the nasal complexes during clicking (Mackay and Liaw, 1981). These earlier findings and the ones made here thus challenge the hypothesis advanced by Lammers and Castellote (Lammers and Castellote, 2009) that double pulse production is universal among echolocating toothed whales. In addition, both modeling (Aroyan et al., 2000) and some anatomical data (Mead, 1975; Heyning, 1989) (but see Cranford et al., 1996; Cranford, 2000) also suggest that dolphins primarily use their right pair of phonic lips but with the capability to actuate the left pair for clicking instead of, rather than in synchrony with, the right pair. We therefore hypothesize that all toothed whale species only click with one set of their phonic lips at a time, and preferably their right pair. This then raises the question of why some toothed whales that do not whistle still carry two apparently functional pairs of phonic lips. Some non-whistling toothed whales, the sperm whales, have in fact lost the left pair of their phonic lips during the course of evolution (Cranford et al., 1996), lending weight to the contention that clicking with preferably the right pair of phonic lips may be a very old trait in echolocating toothed whales. Still, a large and taxonomically diverse group of non-whistling toothed whales belonging to Delphinidae and Phocoenidae do have two pairs of phonic lips. It may be that the left pair of phonic lips is used for high repetition rate click patterns used in communication.
Why they seemingly carry two identical phonic lip pairs while apparently only using one pair at a time for clicking needs to be addressed in future experiments on this intriguing pneumatic sound generation system that makes up the complex forehead of echolocating toothed whales.
Acknowledgements
S. Hansen, M. Wahlberg, J. Hansen, J. Kristensen, M. Hansen and the staff at the Fjord and Belt Centre provided skilled support and training of the animals. T. Hurst and N. U. Kristiansen were instrumental in designing and making the suction cup hydrophones and their power supplies. We thank T. Cranford, F. Jensen, M. Johnson, L. Kyhn, M. Lammers, L. Miller, B. Møhl and M. Wahlberg for helpful discussions and critique.
This work was carried out under Permits No. J.nr. SN 343/FY-0014 and 1996-3446-0021 from the Danish Forest and Nature Agency, Danish Ministry of Environment, and was funded by the Danish Natural Science Research Council through grants to P.T.M.