Elephants' low-frequency vocalizations are produced by flow-induced self-sustaining oscillations of laryngeal tissue. To date, little is known in detail about the vibratory phenomena in the elephant larynx. Here, we provide a first descriptive report of the complex oscillatory features found in the excised larynx of a 25 year old female African elephant (Loxodonta africana), the largest animal sound generator ever studied experimentally. Sound production was documented with high-speed video, acoustic measurements, air flow and sound pressure level recordings. The anatomy of the larynx was studied with computed tomography (CT) and dissections. Elephant CT vocal anatomy data were further compared with the anatomy of an adult human male. We observed numerous unusual phenomena, not typically reported in human vocal fold vibrations. Phase delays along both the inferior–superior and anterior–posterior (A–P) dimension were commonly observed, as well as transverse travelling wave patterns along the A–P dimension, previously not documented in the literature. Acoustic energy was mainly created during the instant of glottal opening. The vestibular folds, when adducted, participated in tissue vibration, effectively increasing the generated sound pressure level by 12 dB. The complexity of the observed phenomena is partly attributed to the distinct laryngeal anatomy of the elephant larynx, which is not simply a large-scale version of its human counterpart. Travelling waves may be facilitated by low fundamental frequencies and increased vocal fold tension. A travelling wave model is proposed, to account for three types of phenomena: A–P travelling waves, ‘conventional’ standing wave patterns, and irregular vocal fold vibration.
Voice is an important means for communication in all kinds of mammals, including humans (Hauser, 1996; Bradbury and Vehrencamp, 1998). With a few exceptions, such as cat purring (Remmers and Gautier, 1972; Sissom et al., 1991), vocal production is governed by the physical principles of the myoelastic–aerodynamic theory: by muscle-supported, flow-driven, self-sustaining oscillations of laryngeal tissue (Van Den Berg, 1958; Titze, 2006). The mammalian sound generator exhibits a wide variety of oscillatory behaviour, including non-linear phenomena such as subharmonics and deterministic chaos (Fitch et al., 2002; Herbst et al., 2013). This complex system with multiple degrees of freedom has been well studied in humans and a few mammalian species with comparable laryngeal dimensions, such as dogs and sheep (Herzel, 1995; Svec et al., 2000; Tokuda et al., 2008; Döllinger et al., 2011).
African elephants (Loxodonta africana) are the largest terrestrial mammals. Their vocal communication is characterized by a rich repertoire of distinct sounds, spanning a fundamental frequency range from ten to several hundred hertz (Payne et al., 1986; Poole et al., 1988; Langbauer, 2000; Leong et al., 2003; McComb et al., 2003; Garstang, 2004; Soltis et al., 2005; Stoeger-Horwath et al., 2007). The elephant larynx is the largest mammalian sound generation system so far investigated. Recently, we conducted an excised larynx experiment and showed that the infrasonic vocalizations of elephants are produced by muscle-supported, flow-driven, self-sustaining oscillations, comparable to those during human voice production (Herbst et al., 2012). The fundamental frequencies of the generated sounds are determined by the dimensions of the elephant vocal folds, ranging from approximately 8 to 10 cm in length in adult animals (Kühhaas, 2011).
Here, an in-depth analysis of the experimentally observed oscillatory phenomena in elephant voice production is presented. The vibratory characteristics are discussed with respect to the specific larynx anatomy of the elephant, as well as being related to anatomical and physiological aspects of human voice production.
MATERIALS AND METHODS
Larynx specimen and computed tomography scan
The larynx specimen came from a 25 year old female African elephant, L. africana (Blumenbach 1797) (body mass 2500 kg), which died of natural causes in the Tierpark Berlin in October 2010. The larynx was excised several hours post-mortem on the same day, and immediately stored at −20°C [see supplementary material in a previous publication (Herbst et al., 2012)]. The frozen specimen was shipped to the Laboratory of Bioacoustics, Department of Cognitive Biology, University of Vienna. Here, the specimen was slowly thawed over a period of 3 days, and then immediately prepared for and used in data acquisition.
Computed tomography (CT) examination was performed using a Somatom Emotion multislice scanner (Siemens AG, Munich, Germany). The specimen was placed in a ventral recumbency, and was scanned at 130 kV, 200 mA, rotation time 0.6 s, resulting in 1 mm thick slices. Images were reconstructed using OsiriX 3.7.1 64 bit software (copyright, Antoine Rosset). The anatomical measurements shown in supplementary material Fig. S1 were taken with the FIJI Image processing package (Schindelin et al., 2012).
For cross-species comparison with a well-researched model of mammalian sound production, in vivo data from a CT scan of a 42 year old human male are included in this study. These data were acquired with a CT scanner (model LightSpeed VCT, GE Medical Systems, Cleveland, OH, USA) at 120 kVp (peak kilovoltage), 200 mA, 5 s, using a helical mode and transversal slices with a thickness of 0.625 mm.
Excised larynx setup
The vocal folds were adducted by exerting a steady manual pressure on the lateral surfaces of the corniculate processes of the arytenoid cartilages, thus moving the arytenoid cartilages both medially and anteriorly. The excised elephant larynx was phonated by blowing warmed humidified air through the trachea and past the manually adducted vocal folds. Vibration of the laryngeal tissue was documented using a Casio EX-F high-speed camera positioned 52 cm above the vocal fold level [see supplementary material in a previous publication (Herbst et al., 2012) for details].
For the data reported here, two conditions of larynx preparation were evaluated: in preparation stage a, the cranial parts of the larynx (epiglottis and laryngeal vestibulum) were left intact, in order to observe the role of these structures in sound generation; in preparation stage b, these structures were removed by transverse cuts, and the remaining parts of the vestibular folds were pulled away from the glottis with sutures in order to provide an unobstructed view of the oscillating vocal folds.
Electroglottographic, acoustic and air flow data
Electroglottography (EGG) is a method used to monitor relative vocal fold contact area during phonation (Fabre, 1957). A low-intensity, high-frequency current is passed between two electrodes placed on each side of the thyroid cartilage at glottal level. The time-varying change of vocal fold contact during the flow-induced oscillation of laryngeal tissue introduces alterations into the electrical impedance across the larynx, resulting in a variation of the current between the two electrodes (Fourcin and Abberton, 1971; Baken, 1992; Baken and Orlikoff, 2000). This approach provides a method to assess the oscillation of laryngeal tissue during voice production. In this study, the EGG signal was captured with a Glottal Enterprises EG 2-1000 two-channel electroglottograph (lower cut-off frequency at 2 Hz; Syracuse, NY, USA).
Acoustic data were captured with a DPA 4061 omni-directional microphone (DPA Microphones, Alleroed, Denmark) positioned 7 cm from the vocal folds. Both the acoustic and the EGG signals were recorded with an RME Fireface 800 external audio interface (RME, Haimhausen, Germany) at a sampling frequency of 44,100 Hz. Before analysis, the signals were downsampled to 8000 Hz with the software package Cool Edit Pro 2.0 (2095.0; Syntrillium, Phoenix, AZ, USA). In order to compensate for the time delay caused by the larynx-to-microphone distance, the acoustic signal was shifted forward in time by 0.21 ms relative to the video.
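As an illustration only, the delay compensation described above can be sketched in a few lines of Python. The speed of sound (343 m s−1) is an assumption, as the value used in the study is not stated; the function names are hypothetical.

```python
# Propagation delay between the vocal folds and the microphone.
# ASSUMPTION: speed of sound of 343 m/s (air at about 20 deg C); the text
# does not state which value was used.
SPEED_OF_SOUND_M_S = 343.0


def propagation_delay_s(distance_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Time the sound needs to travel from the source to the microphone."""
    return distance_m / speed_m_s


def delay_in_samples(delay_s, sample_rate_hz):
    """The same delay expressed in (possibly fractional) samples."""
    return delay_s * sample_rate_hz


delay_s = propagation_delay_s(0.07)              # 7 cm microphone distance
samples_44k = delay_in_samples(delay_s, 44100)   # at the recording rate
samples_8k = delay_in_samples(delay_s, 8000)     # after downsampling

print(f"delay: {delay_s * 1e3:.2f} ms")
print(f"samples at 44.1 kHz: {samples_44k:.2f}, at 8 kHz: {samples_8k:.2f}")
```

With the assumed sound speed the delay evaluates to roughly 0.20 ms, close to the 0.21 ms quoted above; the small difference would be explained by a slightly lower sound speed. At 8000 Hz the delay is a fraction of two samples, so the correction is small compared with a vibratory period of several tens of milliseconds.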
The sound pressure level (SPL) was measured with a Voltcraft SL-400 sound level meter (Voltcraft, Hirschau, Germany), positioned 30 cm from the vibrating vocal folds. Trans-glottal air flow data were acquired with a Sensirion SDP1000-L differential pressure transducer and an F300L flow head (F-J Electronics, Vedbaek, Denmark). Both SPL and air flow data were collected with a Labjack U6 data acquisition interface (Lakewood, CO, USA) at a sampling rate of 1000 Hz. The time-series data were low-pass filtered with a 201 point moving average.
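A 201 point moving average at a sampling rate of 1000 Hz averages over a window of about 0.2 s. The following is a minimal sketch of such a filter, not the implementation used in the study; the centred window and the shortened window at the signal edges are assumptions.

```python
def moving_average(samples, window=201):
    """Centred moving average (a simple low-pass filter).

    ASSUMPTION: near the edges only the part of the window that lies
    inside the signal is averaged; the original edge handling is unknown.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    # Prefix sums turn every window sum into a constant-time lookup.
    prefix = [0.0]
    for value in samples:
        prefix.append(prefix[-1] + value)
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return smoothed


# A signal alternating between 0 and 1 is flattened to roughly 0.5.
alternating = [float(i % 2) for i in range(1000)]
flattened = moving_average(alternating)
```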
High-speed video data analysis
In digital kymography (Wittenberg et al., 2000), the principles of videokymography (Svec and Schutte, 1996) are applied to high-speed video sequences. In order to create a digital kymogram (DKG), a line perpendicular to the vocal fold axis is selected within a high-speed video sequence, and the corresponding video data pixels are successively extracted for each video frame in the analyzed sequence. The extracted lines are concatenated to form the final graph. The DKGs created for this manuscript were generated with a Python script written by C.T.H. (Herbst, 2012), which was run as a plug-in within the FIJI image analysis software package (Schindelin et al., 2012). Digital kymography allows, amongst others, observation of the following features of laryngeal tissue vibration with high temporal resolution: (a) vertical phase differences between the lower (inferior) and the upper (superior) margins of the vocal folds (Baer, 1981; Titze et al., 1993b) – see supplementary material Fig. S2 for a schematic illustration; (b) mucosal waves (Hirano et al., 1981; Berke and Gerratt, 1993), i.e. air flow-driven travelling waves within the surface of the vocal fold tissue, moving along the trans-glottal air flow from the inferior to the superior vocal fold edge and then laterally across the upper vocal fold surface once every oscillatory cycle; and (c) vibratory asymmetries and different types of irregularities and cycle aberrations (Svec et al., 2007).
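The construction of a DKG described above (one scan line extracted per frame, then concatenated over time) can be sketched as follows. This is not the script by Herbst (2012); the data layout, the function name and the image orientation are assumptions made for illustration.

```python
def digital_kymogram(frames, row):
    """Build a kymogram from a sequence of greyscale video frames.

    frames: list of 2D lists, frames[t][y][x] = pixel intensity
    row:    index y of the scan line. ASSUMPTION: the glottis runs
            vertically in the image, so a horizontal pixel row is
            perpendicular to the vocal fold axis.
    Returns kymo[x][t]: position along the scan line on the vertical
    axis, frame index (time) on the horizontal axis.
    """
    lines = [list(frame[row]) for frame in frames]  # one line per frame
    # Transpose so that successive frames become successive columns.
    return [list(column) for column in zip(*lines)]


# Synthetic example: 5 frames of 3 x 4 pixels with recognisable values.
frames = [[[t * 10 + x for x in range(4)] for _ in range(3)]
          for t in range(5)]
kymo = digital_kymogram(frames, row=1)
```

In the resulting image each column corresponds to one video frame, so the temporal resolution of the kymogram equals the frame rate of the high-speed recording.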
To enable quantitative analysis of the vibrating pattern along the entire length of the vocal folds, a clinically evaluated image processing procedure was applied, which is described in detail elsewhere (Lohscheller et al., 2007). Within each frame of the high-speed data, the algorithm extracts the medial edges of both vocal folds. This is used to create glottovibrograms (GVGs), i.e. a visualization technique that transfers information on the time-varying glottal width (as colour information) along the anterior–posterior (A–P) dimension into a single image (Lohscheller et al., 2008). Such images can also be used to objectively describe the 2D vibration type of glottal closure.
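Once the medial vocal fold edges have been extracted, the data underlying a GVG reduce to a matrix of glottal widths over A–P position and time. The sketch below assumes that edge detection has already been done and is not the algorithm of Lohscheller et al.; the function names and the pixel-based coordinates are hypothetical.

```python
def glottal_width_matrix(left_edges, right_edges):
    """Glottal width for every A-P position and video frame.

    left_edges[t][p], right_edges[t][p]: lateral coordinate (e.g. in
    pixels) of the medial edge of the left / right vocal fold in frame t
    at anterior-posterior position p.
    Returns gvg[p][t]; the width is clipped to 0 where the folds touch.
    """
    n_positions = len(left_edges[0])
    gvg = []
    for p in range(n_positions):
        row = [max(0.0, right[p] - left[p])
               for left, right in zip(left_edges, right_edges)]
        gvg.append(row)
    return gvg


def closed_fraction(gvg):
    """Fraction of frames in which the glottis is closed along its
    entire length (a closed quotient if whole cycles are analysed)."""
    n_frames = len(gvg[0])
    closed = sum(1 for t in range(n_frames)
                 if all(row[t] == 0 for row in gvg))
    return closed / n_frames


# Four frames, three A-P positions: the glottis opens at one end first.
left = [[5, 5, 5], [4, 5, 5], [3, 4, 5], [5, 5, 5]]
right = [[5, 5, 5], [6, 5, 5], [7, 6, 5], [5, 5, 5]]
gvg = glottal_width_matrix(left, right)
```

Mapping the values of such a matrix to colour yields an image of the kind described above, in which an opening that starts at one end of the glottis appears as a slanted edge.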
Anatomical terms of location
The phenomena we will describe are quite complex 3D events unfolding over time, and are difficult to verbalize. To make our descriptions as clear as possible, we will use plain English terms, oriented to the larynx itself, wherever possible (e.g. stating ‘front’ or ‘anterior’, rather than ‘rostro-ventral’ – see Fig. 1). Because the structures of interest are oriented obliquely to the ‘anatomically correct’ rostral/caudal/dorsal/ventral axes, the latter terms would be very cumbersome. We hope this makes our descriptions clear and intelligible. Nonetheless, we suggest watching the videos included as supplementary material first (see supplementary material Movies 1–5).
The vocal folds of the elephant are long and voluminous (see Table 1). They attach far anteriorly, close to the broad base of the epiglottis. Their attachments to the vocal processes of the arytenoid cartilages are located far posteriorly at the level of the cricoid arch. The vocal folds are arranged obliquely, at an angle of about 45 deg relative to the longitudinal axis of the trachea (see Figs 2, 3). Given the position of the larynx in relation to the trachea, it must be assumed that only the posterior three-fifths of the vocal folds are directly exposed to the passing air stream from the lungs and trachea. Immediately above the inferior thyroid notch (where the anterior ends of the vocal folds attach), we observed a small area of distinctly higher ossification in comparison to the remaining parts of the thyroid cartilage (see Fig. 2).
In Fig. 3, the elephant larynx anatomy is juxtaposed with that of a human adult male to illustrate substantial configuration differences. The most obvious difference is found in the structural proportions. When normalizing the tracheal diameter just inferior to the cricoid cartilage in both the elephant and the human, fundamental dissimilarities in the laryngeal configuration become apparent (Fig. 3C–F): (1) relative to the human vocal fold, the elephant vocal fold is ca. 88% longer and ca. 180% thicker, and its cross-sectional area is ca. 406% larger – see Fig. 3E,F; (2) the vocal fold of the human is oriented nearly perpendicularly to the longitudinal axis of the trachea (and hence the tracheal air stream), whereas the elephant vocal fold is tilted by an angle of ca. 45 deg; (3) whereas the human vocal fold borders upon the tracheal space along almost its entire length, the elephant vocal fold is positioned more anteriorly, such that the anterior two-fifths of the vocal fold are not directly adjacent to the tracheal space. As a consequence, simplified air stream–tissue interactions, which are assumed in most physical models of human or canine vocal fold vibration (Ishizaka and Flanagan, 1972; Kob, 2004), might have to be revisited when modelling and simulating sound generation in the elephant larynx.
Periodic vocal fold vibration
A typical example of a periodic infrasonic vocalization created by the excised elephant larynx in preparation stage a is illustrated in Fig. 4 (see also supplementary material Movie 1). Phonation was induced in the larynx with a tracheal pressure of ca. 6 kPa (which is about half the maximum expiratory pressure found in human females) (see Baken and Orlikoff, 2000), resulting in a regular, symmetrical oscillation of laryngeal tissue at ca. 15 Hz. The vestibular folds vibrated regularly (without collision) at the same frequency as the vocal folds, with a phase delay of ca. 180 deg as compared with the superior margins of the vocal folds.
The glottis (i.e. the visible air space between the vocal folds) was closed during ca. 82% of each vibratory cycle, i.e. the closed quotient was ca. 82%. This is considerably higher than, for example, the values measured for human speech (Baken and Orlikoff, 2000; Lohscheller et al., 2012). Such a surprisingly large closed quotient was facilitated by a phase difference between the inferior and the superior vocal fold edge: the initiation of vocal fold contact at the superior vocal fold edge was delayed by ca. 27 ms as compared with that of the inferior vocal fold edge. Thus, the two vocal fold edges vibrated with a phase delay of ca. 150 deg (inferior edge leading, see Fig. 4C). Assuming a vocal fold thickness of 32 mm, the propagation speed of glottal closure (and thus the mucosal wave speed) along the inferior–superior dimension can be estimated as ca. 1.19 m s−1, which is comparable to data from excised canine larynges (Baer, 1981; Titze et al., 1993b).
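The arithmetic behind the phase delay and the propagation speed can be retraced with the rounded values given in the text (oscillation at ca. 15 Hz, contact delay of ca. 27 ms, assumed thickness of 32 mm). The helper names below are hypothetical.

```python
def phase_delay_deg(delay_s, f0_hz):
    """Time delay expressed as a phase angle of one vibratory cycle."""
    return 360.0 * delay_s * f0_hz


def propagation_speed_m_s(distance_m, delay_s):
    """Speed of a disturbance covering distance_m within delay_s."""
    return distance_m / delay_s


# Rounded values quoted in the text.
phase = phase_delay_deg(delay_s=0.027, f0_hz=15.0)
speed = propagation_speed_m_s(distance_m=0.032, delay_s=0.027)

print(f"phase delay: {phase:.1f} deg")
print(f"propagation speed: {speed:.2f} m/s")
```

With these rounded inputs the phase delay evaluates to about 146 deg, in agreement with the reported ca. 150 deg, and the propagation speed to about 1.19 m s−1.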
Analysis of the time-synchronous microphone signal and high-speed video data suggests that acoustic energy was created at two instants within each vibratory cycle. The main excitation event occurred at the separation of the superior vocal fold edges, i.e. at the presumed onset of glottal air flow. A secondary, less pronounced excitation event was found at the initiation of vocal fold contact at the inferior vocal fold edge, i.e. at the presumed termination of glottal air flow. These two events are marked by the two vertical arrows in Fig. 4B,C.
Synchronized vibration of vocal folds and vestibular folds
The effect of vestibular fold oscillation is illustrated in Fig. 5 (see also supplementary material Movie 2). Over the course of 3 s, the manual adduction (i.e. approximation) of the vestibular folds was gradually increased. As a result, the already vibrating vestibular folds started to collide around t=1.5 s, leading to an SPL increase of more than 12 dB over the entire sequence. As in the previous example, the vestibular folds vibrated with a phase delay of ca. 180 deg (in relation to the vibration of the vocal folds). As a consequence of the periodic collision of the vestibular folds during the second half of the sequence, a synchronized ‘airlock’ oscillation of the vocal folds and the vestibular folds emerged. In this kind of vibratory pattern, the vocal folds and the vestibular folds formed a single coupled system with alternating contact of either of the involved tissue structures, such that the glottis was never visible. This system was presumably very efficient in the conversion of aerodynamic energy into mechanical tissue vibrations, and hence the generation of acoustic energy.
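To put the 12 dB figure into perspective, a level difference can be converted into linear ratios using the standard decibel definitions. The short sketch below does this; the function names are hypothetical.

```python
def pressure_ratio_from_db(delta_db):
    """Sound pressure (amplitude) ratio for a level difference in dB."""
    return 10.0 ** (delta_db / 20.0)


def power_ratio_from_db(delta_db):
    """Acoustic power (or intensity) ratio for a level difference in dB."""
    return 10.0 ** (delta_db / 10.0)


pressure_gain = pressure_ratio_from_db(12.0)
power_gain = power_ratio_from_db(12.0)

print(f"pressure ratio: {pressure_gain:.2f}")
print(f"power ratio: {power_gain:.1f}")
```

An increase of 12 dB thus corresponds to roughly a fourfold increase in sound pressure amplitude, or about a 16-fold increase in acoustic power.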
Complex ‘double zipper’ vocal fold oscillation
A case of a complex nearly periodic oscillatory pattern of the vestibular and vocal folds, obtained with a tracheal pressure of 6 kPa, is documented in Fig. 6 (see also supplementary material Movie 3). The most striking feature of this oscillatory pattern is the simultaneous ‘double zipper’ oscillation of the superior and inferior edges of the vocal folds. A ‘zipper’ oscillation (Childers et al., 1986; Hess and Ludwigs, 2000) is characterized by A–P phase differences, strongly resembling the x–21 vibratory mode described previously (Berry et al., 1994; Titze, 2000) (see supplementary material Fig. S3).
Three visible phases of this pattern (during one cycle) can be distinguished: (1) the separation of the upper (superior) vocal fold margins, starting in the front (image 4 in Fig. 6D,E) and propagating to the back; (2) the approximation/contacting of the lower edge of the vocal folds, starting in the back (image 9 in Fig. 6D,E) and propagating to the front; and (3) the approximation/contacting of the upper vocal fold edges, starting in the front (image 10–11 in Fig. 6D,E) and propagating to the back. The separation of the lower edges, presumably occurring during the first 15 ms of the cycle, is obscured by the contacting upper vocal fold edges.
Analysis of the instant of complete approximation/closure of both superior and inferior vocal fold edges revealed a phase difference of ca. 83 deg. The two vocal fold edges were 180 deg out of phase along the A–P axis. The vestibular folds participated in the nearly periodic vibration at a phase difference of ca. 180 deg (as compared with the superior vocal fold edges), just as in the examples shown in Figs 4 and 5.
The mid-sagittal DKG shown in Fig. 6C reveals that both the posterior commissure of the vocal folds and the corniculate processes of the arytenoid cartilages vibrated synchronously with the vestibular and vocal folds, providing evidence for an A–P longitudinal vibratory mode. The posterior commissure of the vocal folds reached its maximum posterior position shortly after the superior vocal fold edges were maximally separated (see lower sinusoidal dashed line in Fig. 6C; supplementary material Fig. S4). The corniculate processes of the arytenoids vibrated with a lesser amplitude and a phase difference of ~100–110 deg as compared with the posterior commissure of the vocal folds (see Fig. 6C; supplementary material Fig. S4).
Analysis of synchronous tissue vibrations (Fig. 6C) and the generated sound (Fig. 6A,B) revealed that the mechanical aspect of the oscillation of laryngeal tissue was more regular than the acoustic output. Minor perturbations in the laryngeal tissue mechanics might have had large impacts on the resulting time-varying air flow, suggesting the presence of complex aerodynamic phenomena.
Irregular vocal fold vibration
A case of irregular vocal fold vibration is illustrated in Fig. 7. Spectral analysis of the acoustic signal revealed only two clear harmonics (Fig. 7A), while most of the spectrum was characterized by non-harmonic energy. Overall, the oscillation of laryngeal tissue assumed a nearly periodic pattern, which was particularly true for the vestibular folds (Fig. 7C,E). The irregularities in the acoustic signal were presumably introduced by complex irregular vibratory modes along the A–P axis (see supplementary material Movie 4).
Alternating A–P transverse travelling waves
As an unusual case of vocal fold oscillation, intermittent episodes of alternating A–P travelling waves were obtained during larynx preparation stage b (see supplementary material Movie 5). These travelling waves occurred at a high degree of manually induced vocal fold elongation, resulting in a fundamental frequency of ca. 27 Hz. One of these episodes is illustrated in Fig. 8. It is characterized by the alternating occurrence of a posterior–anterior (P–A) and an A–P ‘zipper-like’ glottal opening/closure pattern.
This manuscript presents unique data from laryngeal oscillations in an excised elephant larynx, the largest animal sound generator ever studied experimentally. As shown in a previous publication (Herbst et al., 2012), the basic sound production mechanism of elephant vocalizations at the observed fundamental frequencies is similar to that seen in humans and many other mammals, and consists of flow-induced self-sustaining oscillations of laryngeal tissues. Here, an in-depth analysis of the experimentally observed oscillatory patterns in elephant voice production is presented, including some complex phenomena so far not documented in the voice science literature. The data include various combinations of: phase differences of the superior and inferior vocal fold edge; simultaneous phase-shifted A–P ‘double zipper’ oscillation of the superior and inferior vocal fold edges; oscillation of the vestibular folds, simultaneous with vocal fold oscillation, resulting in increased acoustic energy; A–P longitudinal vibratory modes, involving the (apexes of the) arytenoid cartilages; and alternating A–P and P–A travelling waves.
Although without further specimens we cannot rule out the possibility that the observed complex biomechanical phenomena might be unique to the investigated sample, we hypothesize that they are typical features resulting from the anatomy of the elephant larynx. The most obvious difference from previous studies in other species is larynx size and position: the elephant vocal fold is five times longer and three times thicker than that of the largest species (Panthera tigris) so far examined in an excised larynx setup (Titze et al., 2010). However, the elephant larynx is not just a linearly scaled version of the human larynx. When considering the normalized dimensions (scaled to similar tracheal diameter), the elephant vocal fold is still several times larger than that of a human (see Fig. 3C–F).
The vocal fold is oriented to the tracheal air stream at an oblique angle, suggesting that the anterior two-fifths of the vocal folds are not in the direct line of the tracheal air stream. This may lead to complex interactions between the trans-glottal flow and the tissue mechanics, which are not found in humans, in whom the entire vocal fold is positioned almost perpendicular to and entirely within the tracheal air flow.
The forces created by the longitudinal vibratory modes, together with the phase-delayed oscillations of the tissue surrounding the vocal folds (see supplementary material Movies 1–5) might have led to the development of the ossified portion of the thyroid cartilage at the anterior commissure of the vocal folds (see Fig. 2). Such an ossification regularly occurs in mammalian larynges with increasing age. The ossification process is a self-regulating adaptation of the connective and supporting tissue to mechanical stress. Ossification preferentially occurs in areas of deformation forces transferred to the laryngeal framework by contracting muscles. Laryngeal anatomy and the pattern of forces acting on the larynx differ between species and, therefore, the pattern of ossification is species specific (von Glass and Pech, 1983), as well as individually variable.
Transverse travelling wave patterns
The unusual layout of the elephant larynx might also facilitate a special vocal fold vibratory pattern: alternating A–P and P–A travelling waves. Such an oscillation phenomenon has not previously been documented in the literature, and it might elude explanation with the models so far developed for describing flow-induced vocal fold vibration.
When the vocal fold is approximated as a vibrating string, CT data suggest approximate dimensions of a length L of ca. 10 cm, a thickness t of 2.5 cm, and a width w of 2 cm. Assuming a tissue density ρ of 1.04 g cm−3, the mass of one vocal fold is estimated as 52 g. The oscillation shown in Fig. 8 had a period of 37 ms, during which the longitudinally travelling wave covered a distance of twice the vocal fold length, resulting in a wave speed of 5.4 m s−1. Inserting these values into Eqn 1 and solving for the tension Θ predicts ca. 15.16 N, and consequently a tensile stress in the vocal fold of ca. 30.33 kPa (obtained by dividing the tension by the cross-sectional area of the vocal fold, t×w=5 cm2). Vocal fold tensile stress of this magnitude is well within the range of both theoretical predictions (Titze, 1994) and experimental data (Alipour and Vigmostad, 2012) for other mammals.
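Eqn 1 is not reproduced in this section. The sketch below assumes that it is the standard relation for transverse waves on an ideal string, c = sqrt(Θ/(m/L)), where c is the wave speed and m/L the mass per unit length; under this assumption the quoted values are recovered. Variable names are hypothetical.

```python
# String-model estimate of vocal fold tension and tensile stress.
# ASSUMPTION: Eqn 1 is the ideal-string relation c = sqrt(tension / (m / L)).
LENGTH_M = 0.10           # vocal fold length L
THICKNESS_M = 0.025       # thickness t
WIDTH_M = 0.02            # width w
DENSITY_KG_M3 = 1040.0    # tissue density, 1.04 g per cubic centimetre
PERIOD_S = 0.037          # oscillation period in Fig. 8


def string_tension_n(wave_speed_m_s, mass_kg, length_m):
    """Tension of an ideal string with the given transverse wave speed."""
    return wave_speed_m_s ** 2 * mass_kg / length_m


mass_kg = DENSITY_KG_M3 * LENGTH_M * THICKNESS_M * WIDTH_M
# One round trip (twice the length) per period; rounded to one decimal
# place as in the text.
wave_speed = round(2 * LENGTH_M / PERIOD_S, 1)
tension_n = string_tension_n(wave_speed, mass_kg, LENGTH_M)
stress_kpa = tension_n / (THICKNESS_M * WIDTH_M) / 1000.0

print(f"mass: {mass_kg * 1000:.0f} g, wave speed: {wave_speed} m/s")
print(f"tension: {tension_n:.2f} N, stress: {stress_kpa:.2f} kPa")
```

Using the unrounded wave speed (ca. 5.41 m s−1) changes the tension and stress only in the second decimal place, which is well within the uncertainty of the assumed dimensions.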
By analogy with a guitar string, the vocal fold is ‘plucked’ (i.e. deflected laterally) in its posterior three-fifths by an aerodynamically induced glottal opening event. (By hypothesis, the glottal opening cannot originate in the anterior two-fifths of the vocal folds, as this region has no direct contact with the air stream coming from the trachea – see the schematic illustration in Fig. 3E.) In the simplified model proposed here, the lateral deflection is first only propagated anteriorly (driven by the trans-glottal air flow that interacts with the vocal fold at an oblique angle of ca. 45 deg). (In a more complex model, the lateral deflection might initially also propagate posteriorly, but the reflection from the posterior end is considerably damped at the free boundary so that the standing wave cannot fully emerge.) The travelling glottal deflection pulse is then reflected by the rigid anterior boundary and propagated posteriorly, where it is damped by the softer boundary at the vocal processes of the arytenoids.
The following three scenarios are described with the assumption that lateral deflections are initiated at the same position along the A–P axis: (1) if the deflection pulse periodically completes exactly one round-trip along the vocal fold before the next deflection pulse occurs (τ=T), an alternating P–A and A–P travelling wave pattern will emerge; (2) if the consecutive deflection pulses are delayed such that a spatio-temporal synchronization between consecutive deflections is achieved where τ=2T, an oscillatory pattern characterized by a strong P–A phase delay is likely to emerge; (3) in any other case, if no spatio-temporal synchronization between travelling waves and consecutive deflection pulses is achieved, an irregular vocal fold vibratory pattern is likely to occur.
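The three scenarios can be expressed as a simple decision rule on the ratio of the two time constants. The symbols τ and T are used exactly as in the conditions above and are not defined further in this section, so the sketch takes both as inputs without interpreting them; the tolerance is an arbitrary assumption introduced only to make the comparison robust against rounding.

```python
def classify_pattern(tau_s, t_s, rel_tol=0.05):
    """Map the ratio tau/T onto the three scenarios of the model.

    tau_s, t_s: the two time constants tau and T as used in the text.
    rel_tol:    ASSUMPTION, tolerance on the ratio; not part of the model.
    """
    ratio = tau_s / t_s
    if abs(ratio - 1.0) <= rel_tol:
        return "alternating P-A and A-P travelling waves"   # scenario 1
    if abs(ratio - 2.0) <= rel_tol:
        return "pattern with strong P-A phase delay"         # scenario 2
    return "irregular vibration"                             # scenario 3


print(classify_pattern(0.037, 0.037))
print(classify_pattern(0.074, 0.037))
print(classify_pattern(0.050, 0.037))
```

The rule makes explicit that, within this model, regular patterns arise only for specific ratios of the two time constants, whereas every other ratio leads to irregular vibration.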
The travelling wave pattern is facilitated by two biomechanical features: (1) a low fundamental frequency, which effectively increases the temporal delay d between two successive lateral deflections and thus allows the travelling wave to complete an entire round trip along the vocal fold before initiation of the succeeding lateral vocal fold deflection (caused by an air pulse); (2) an increased longitudinal tension in the vocal folds, leading to an increased speed of the travelling wave. Both these facilitating conditions were fulfilled in the example shown in Fig. 8.
The model presented here is not intended to offer a complete explanation of flow-driven self-sustaining oscillation, but it might prove to be a useful addition to the already well-established descriptions of vocal fold vibration. In particular, it may assist in explaining vibratory phenomena in species with very long vocal folds, and it might also shed new light on the phenomenon of ‘zipper-like’ glottal opening and closure in humans and other mammals.
Creation of acoustic energy and the role of the vestibular folds
The data presented in this manuscript suggest that acoustic energy in elephant vocalization is mainly produced at the instant of glottal opening, i.e. during the commencement of trans-glottal air flow (recall Fig. 4). The instant of glottal closure (the cessation of trans-glottal air flow) contributes to a lesser extent to the production of acoustic energy. This is a surprising finding, as it is in contrast to what is known about sound generation in humans, where most of the acoustic energy is created during the instant of glottal closure (Miller and Schutte, 1984; Schutte and Miller, 1988). Further studies are required to make detailed measurements of the time-varying trans-glottal air flow (only averaged air flow rates have been measured in this study), and to investigate the complex aerodynamic phenomena that appear to be found in the elephant larynx during sound generation.
The vestibular folds might play a crucial role in ordinary elephant sound production. Even when not colliding, they tended to vibrate at the same fundamental frequency (but with a phase delay) as the vocal folds in the excised larynx, thus forming a coupled oscillator with the vocal folds. When colliding during periodic sound generation, the vestibular folds were found to facilitate an ‘airlock’ oscillation where the glottis was never visible (see Fig. 5). Such a 180 deg phase-shifted 1:1 entrainment of vestibular fold and vocal fold oscillation might enhance the transfer of aerodynamic energy into the vibrating tissue (Titze, 1988), increasing the output sound level (+12 dB in the case documented here) and thus the efficiency of the oscillator. These results are in line with previous research (Finnegan and Alipour, 2009), highlighting the importance of supraglottal tissue structures in sound generation, similar to what has been documented for humans (Fuks et al., 1998; Lindestad and Södersten, 1999; Sakakibara et al., 2004; Bailly et al., 2010).
To date, no physiological data on in vivo elephant voice production are available. In particular, in contrast to human or canine phonation (Zemlin, 1988; Hunter et al., 2004; Herbst et al., 2011; Chhetri et al., 2012), the subglottal air pressure ranges and the exact laryngeal configuration for vocal fold adduction in elephants are not known. The position of the arytenoid cartilages in the excised larynx experiment had to be inferred from careful examination of the available CT data and the functional and mechanical possibilities offered by the excised elephant larynx. Whether the adductory manoeuvres performed in this study exactly resemble those of in vivo vocalization would need to be established in future studies, which will be very challenging to perform as direct endoscopic evaluation of vocal fold vibration in a live animal is virtually impossible with the current technological means. However, several arguments speak in favour of a faithful duplication of natural vocalization conditions: (1) the laryngeal configuration was created in the only way possible to easily induce phonation in our excised larynx; (2) similar adductory gestures to the ones we used in the excised larynx experiments have been documented in humans; and (3) the acoustic output of the excised larynx was closely comparable to sounds captured from in vivo vocalizations, with fundamental frequencies that were well within the range of those reported for the ‘rumble’ call type (Poole et al., 1988; Langbauer, 2000; Herbst et al., 2012; Stoeger et al., 2012).
The mammalian larynx is a non-linear system capable of exhibiting a wide range of vibratory behaviour, such as periodic vibration, subharmonics and deterministic chaos (Titze et al., 1993a; Herzel et al., 1995; Behrman and Baken, 1997; Fitch et al., 2002; Neubauer et al., 2004; Jiang et al., 2006). In such a system, small changes of boundary conditions can lead to fundamentally different oscillation patterns (Berry et al., 1996; Svec et al., 1999; Tokuda et al., 2008; Herbst et al., 2013). Consequently, the different vibratory regimes documented in this study may have arisen from subtle differences in the adduction of the arytenoids across various flow-induced vocalizations with comparable air pressure conditions.
The elephant larynx is the largest oscillator for mammalian voice production that has received experimental study to date. It is able to produce a wide variety of complex vibratory phenomena, such as: simultaneous ‘double-zippering’ of the superior and inferior vocal fold edges; ‘airlock’ oscillations involving the vestibular folds (increasing the efficiency of the coupled oscillator as a sound generation device); and transverse travelling waves along the A–P axis. With respect to the transverse travelling waves, we propose a new model augmented by flow-driven travelling waves. Such a model is capable of explaining travelling waves, ‘conventional’ standing wave vocal fold vibration and irregular vocal fold vibration.
Our sincere thanks go to Dr Bernhard Blaszkiewitz (Direktor, Tierpark Berlin Friedrichsfelde) for supplying us with the elephant larynx. We thank R. Hofer for contributing to the setup of the excised larynx experiment and P. Pesak for assisting in the computed tomography scan of the larynx specimen.
This research was supported by European Research Council (ERC) Advanced Grant ‘SOMACCA’ (C.T.H. and W.T.F.); a start-up grant from the University of Vienna (W.T.F.); the European Social Fund Project OP VK CZ.1.07/2.3.00/20.0057 (J.G.S.); Deutsche Forschungsgemeinschaft (DFG) grant no. LO1413/2-2 (J.L.); and Austrian Science Fund (FWF) grant P2309921 (A.S.S.).
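LIST OF SYMBOLS AND ABBREVIATIONS
A–P  anterior–posterior
CT  computed tomography
d  temporal delay between two successive lateral vocal fold deflections
DKG  digital kymogram
EGG  electroglottography
GVG  glottovibrogram
L  vocal fold length
P–A  posterior–anterior
SPL  sound pressure level
t  vocal fold thickness
w  vocal fold width
Θ  vocal fold tension
ρ  tissue density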
No competing interests declared.