Despite the functional importance of loud, low-pitched vocalizations in big cats of the genus Panthera, little is known about the physics and physiology of the mechanisms producing such calls. We investigated laryngeal sound production in the laboratory using an excised-larynx setup combined with sound-level measurements and pressure–flow instrumentation. The larynges of five tigers (three Siberian or Amur, one generic non-pedigreed tiger with Bengal ancestry and one Sumatran), which had died of natural causes, were provided by Omaha's Henry Doorly Zoo over a five-year period. Anatomical investigation indicated the presence of both a rigid cartilaginous plate in the arytenoid portion of the glottis, and a vocal fold fused with a ventricular fold. Both of these features have been confusingly termed ‘vocal pads’ in the previous literature. We successfully induced phonation in all of these larynges. Our results showed that aerodynamic power in the glottis was of the order of 1.0 W for all specimens, acoustic power radiated (without a vocal tract) was of the order of 0.1 mW, and fundamental frequency ranged between 20 and 100 Hz when a lung pressure in the range of 0–2.0 kPa was applied. The mean glottal airflow increased to the order of 1.0 l s–1 per 1.0 kPa of pressure, which is predictable from scaling human and canine larynges by glottal length and vibrational amplitude. Phonation threshold pressure was remarkably low, on the order of 0.3 kPa, which is lower than for human and canine larynges phonated without a vocal tract. Our results indicate that a vocal fold length approximately three times greater than that of humans is predictive of the low fundamental frequency, and the extraordinarily flat and broad medial surface of the vocal folds is predictive of the low phonation threshold pressure.
The cats (family Felidae) form an important subdivision of the order Carnivora (class Mammalia). Cats vary significantly in size but are relatively conservative carnivores anatomically. The big cats of the genus Panthera (lions, tigers, jaguars and leopards) are often called the ‘roaring cats’ because of their loud, low-frequency vocalizations that play roles in territoriality and mate attraction (Hast, 1989; Owen, 1834; Peters, 1978; Weissengruber et al., 2002). Although several anatomical differences between the roaring cats and the smaller ‘purring cats’ of the genus Felis (cougar or mountain lion, bobcat, the domestic cat and other small- and medium-sized cats) have long been known (Hast, 1989; Owen, 1834; Weissengruber et al., 2002), the acoustic and physiological bases by which these iconic predators produce their loud low-frequency calls remain poorly understood. This difference in vocalization has traditionally been linked to differences in vocal anatomy (specifically the anatomy of the epihyoid portion of the hyoid apparatus) that have been known since at least 1834 (Owen, 1834) and have played an important role in cat taxonomy (Pocock, 1916). The cheetah, unusual in several aspects, is placed in its own subfamily, the Acinonychinae (Wozencraft, 1993).
The tiger, Panthera tigris, is one of the four species in the genus Panthera and the largest of the great cats. One of the subspecies reported on here is the Siberian tiger (also called the Amur tiger), Panthera tigris altaica. It is the largest and northernmost tiger subspecies; it is highly endangered and survives in the wild only in a small strip of easternmost Siberia near the Pacific coast. There are 500–600 purebred Siberian tigers in captivity, of which 133 are in North American zoos; the zoo-based breeding program aims to maintain a carefully controlled breeding population of 150 animals for the next century. We also report on two other tigers: a purebred Sumatran and a generic (non-pedigreed) tiger with Bengal ancestry.
Vocalization plays an important role in felid communication, and loud, low-frequency vocalizations have been investigated from a functional viewpoint in lions (Panthera leo). Both male and female lions roar, suggesting a major role in territoriality, and females can assess male characteristics and pride size by means of roar acoustics (Grinnell and McComb, 2001; McComb et al., 1994; McComb et al., 1993; Pfefferle et al., 2007). Surprisingly, given the tiger's iconic status, its vocalizations remain little studied. Numerous anatomists have commented on the large size of the tiger larynx and its thick vocal folds (Harrison, 1995; Hast, 1989; Schneider, 1964). Schaller provided a basic description of call types and their function for the Bengal tiger (Schaller, 1967). Based on a more detailed spectrographic investigation, Peters (Peters, 1978) distinguished four main types of vocalizations in the tigers he studied: ‘mews’, ‘main calls’, main calls with grunt elements, and ‘prusten’. The mew is one of the most basic and widely shared felid vocalizations – a relatively quiet contact call, often produced by young animals. Main calls are much more intense variants of the mew and can be accompanied by throaty grunt elements to form a separate call category. Finally, prusten is an atonal snort-like sound made in greeting or other affiliative situations.
The vocalizations of the great cats have not been studied in terms of aerodynamic and acoustic power. In general, vocal power is determined by at least three factors (Titze, 2000): (1) the aerodynamic power available from the pulmonary airstream, (2) the efficiency of conversion of this aerodynamic power into acoustic power in the larynx and (3) the efficiency of transmission and radiation of this acoustic power through the vocal tract into free space. The perception of vocal power appears to be enhanced by aperiodic vocal fold vibration, which produces a rich spectrum of inharmonic frequencies.
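As a rough illustration of how these factors combine, the overall source efficiency of an excised larynx (factors 2 and 3 lumped together, since no vocal tract is attached) can be expressed as the ratio of radiated acoustic power to aerodynamic power. The sketch below uses only the order-of-magnitude values reported in the Abstract (about 1.0 W aerodynamic and about 0.1 mW acoustic); the function names are hypothetical, and the result is a back-of-the-envelope figure rather than a measurement.

```python
import math

def efficiency(acoustic_power_w: float, aerodynamic_power_w: float) -> float:
    """Overall conversion efficiency: radiated acoustic power / aerodynamic power."""
    return acoustic_power_w / aerodynamic_power_w

def to_db(ratio: float) -> float:
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(ratio)

# Order-of-magnitude values from the Abstract (excised larynx, no vocal tract).
aero_w = 1.0         # aerodynamic power in the glottis, W
acoustic_w = 0.1e-3  # radiated acoustic power, W

eta = efficiency(acoustic_w, aero_w)
print(f"efficiency = {eta:.0e} ({to_db(eta):.0f} dB)")
```

An efficiency of about 10^-4 (–40 dB) is only indicative, because both inputs are order-of-magnitude values.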
Virtually all previous studies on animal vocalization deal with recordings and processing of a microphone signal, from which absolute acoustic power cannot be determined unless the precise mouth-to-microphone distance and the acoustic environment are known. Hence, most researchers have reported spectral and temporal features of vocalization. The purpose of this study was to analyze the physical and physiological mechanisms by which tigers produce their loud, low-pitched vocalizations. In particular, we aimed to (1) measure the pressure–flow characteristics of several great-cat larynges on a laboratory bench, (2) compute the aerodynamic power from these pressures and flows, (3) compute the radiated acoustic power from sound pressure levels measured with a sound-level meter at a specified distance from the source and (4) determine the ease of phonation by means of the phonation threshold pressure.
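The section does not state the conversion used for aim (3). One common textbook approach, sketched below, assumes a point source radiating uniformly into free space, so that intensity is p_rms²/(ρc) and the total power is that intensity times the area 4πr² of a sphere of radius r. The air density and sound speed are assumed values, and the function names are hypothetical. At tens of hertz and 15 cm the microphone is well inside the acoustic near field, so the result should be read only as an order of magnitude.

```python
import math

P_REF = 20e-6  # reference sound pressure, Pa
RHO = 1.2      # assumed air density, kg/m^3
C = 343.0      # assumed speed of sound, m/s

def spl_to_pressure(spl_db: float) -> float:
    """RMS sound pressure (Pa) for a given SPL (dB re 20 uPa)."""
    return P_REF * 10.0 ** (spl_db / 20.0)

def radiated_power(spl_db: float, distance_m: float) -> float:
    """Acoustic power (W) of a point source radiating uniformly into a full sphere.

    Assumes free-field, far-field conditions: intensity = p_rms**2 / (rho * c),
    spread over a sphere of area 4 * pi * r**2.
    """
    p_rms = spl_to_pressure(spl_db)
    intensity = p_rms ** 2 / (RHO * C)
    return intensity * 4.0 * math.pi * distance_m ** 2

# Example: 85 dB at 15 cm (the peak SPL reported for the male Amur larynx).
w = radiated_power(85.0, 0.15)
print(f"{w * 1e3:.3f} mW")
```

Under these assumptions, 85 dB at 15 cm corresponds to roughly 0.09 mW, the same order as the 0.1 mW quoted in the Abstract.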
MATERIALS AND METHODS
The first larynx that became available was excised from a 17.5-year-old female Amur (Siberian) tiger Panthera tigris (L) (Isis No. 9684). Two other Amur larynges were later obtained post mortem: one male (Isis No. 6220, 16.4 years old) and one female (Isis No. 12058, 7.6 years old). In addition, two other larynges were obtained post mortem from two other subspecies – one Sumatran female (Isis No. 10153, 22.4 years old) and one generic female tiger with Bengal ancestry (Isis No. 5380, 18.4 years old).
The five larynges were all harvested at the time of death at Omaha's Henry Doorly Zoo. They were quick-frozen in liquid nitrogen to prevent ice crystals from damaging the tissue. Previous work on other animal larynges (Titze, 2006) demonstrated that this technique results in minimal distortion or destruction of laryngeal tissues. The tissues were shipped overnight and stored in a –80°C freezer until experimental use. These specimens included the posterior portion of the tongue, the entire basihyoid bone, the intact larynx and the top six or seven tracheal rings. Portions of the velum and the upper part of the epiglottis were also attached in some cases. There was no detectable damage to any of the inner portions of the larynx, and no evidence of vomitus within the aditus or trachea.
The largest accompanying data set was from the first female Amur. The frozen larynx was examined after its arrival in Denver, and the following initial data were obtained while it was frozen: the mass was 0.58 kg; the length was 22 cm from the tracheal cut to the tongue base. There were 6–7 tracheal rings, so that the distance from the lower cut to the vocal folds was 7.5 cm. The specimen was transported by car in a cooler (with ice) from Denver to the Wendell Johnson Speech and Hearing Center at the University of Iowa in Iowa City (IA, USA), where the first excised larynx experiments were performed. The larynx remained frozen on arrival.
Comments on Siberian tiger anatomy
The vocal folds of the roaring cats have been reported to include a ‘fibrous pad’ (Hast, 1986; Hast, 1989). This pad was previously discussed and photographed (Harrison, 1995). Fig. 1A shows a reproduction of this photograph and a corresponding sketch. Our own dissections revealed two peculiarities of the tiger vocal folds. First, there is a flat cartilaginous surface in the posterior glottis, immediately ventral to the arytenoid cartilage (Fig. 1). We will call this structure the ‘arytenoid flange’ to avoid confusion. In our specimen, its approximate dimensions are 18 mm along the vocal fold length, and 10 mm along the vocal fold thickness (rostro-caudal). This cartilaginous portion of the vocal fold was observed to vibrate if (and only if) the folds were tensed.
Second, the membranous portion of the vocal fold is quite unusual. There are two components or ‘sub-folds’. An upper (cranial) portion is large and seems to represent the ‘vocal pad’ mentioned by Hast and Harrison, whereas the lower (caudal) portion resembles a typical mammalian vocal fold. Although it has been stated that there are no ventricular folds in roaring cats (which are said to possess ‘undivided folds’) (Hast, 1986; Hast, 1989; Peters and Hast, 1994), our investigation suggested that this is questionable, at least in our tiger specimens. We suspect that the cranial portion, the ‘vocal pad’, represents the ventricular fold, which runs alongside and above the true vocal fold, joining it to become a large, up-jutting structure. These two folds are separated by a narrow sulcus that, presumably, represents the ventricle. Our impression agrees with that of Weissengruber and colleagues (Weissengruber et al., 2002), who also observed two folds divided by a small impression or sulcus. During our experiments on excised samples, both sub-folds were observed to vibrate together, thus functioning as a single ‘bipartite’ vocal fold. Hereafter, the term ‘vocal fold’ will be used to refer to both of these components.
Some lengths were determined from the specimen, using calipers. The length of the entire glottis (anterior commissure to the posterior portion of the arytenoid) was 55 mm, stretchable without difficulty to 63 mm. The most important length for vocal fold vibration is the length of the membranous vocal fold – the length from the widest opening towards the bottom of the illustration in Fig. 1B. It averaged 34 mm across the five specimens (42, 36, 35, 30, 27 mm). The largest (42 mm) was for the male Amur, and the smallest (27 mm) was for the female Sumatran.
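A quick arithmetic check of the quoted mean membranous length (with the range and sample standard deviation for reference), using only the five values listed above:

```python
from statistics import mean, stdev

# Membranous vocal fold lengths (mm) for the five specimens, as listed in the text.
lengths_mm = [42, 36, 35, 30, 27]

avg = mean(lengths_mm)
print(f"mean = {avg:.1f} mm, range = {min(lengths_mm)}-{max(lengths_mm)} mm, "
      f"s.d. = {stdev(lengths_mm):.1f} mm")
```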
The epiglottis (bottom of Fig. 1B) was very loosely attached to the thyrohyoid membranous area and easily flopped either onto the vocal folds or ventrorostrally to open the airway. We kept the epiglottis out of the airstream during our bench work and therefore did not observe epiglottal vibration, although such vibration is clearly possible. The epiglottis was closed, lid-like, over the vocal folds when our specimen arrived frozen from Omaha, the same position seen in some of Hast's and Harrison's sections. The hyoepiglotticus muscle is well developed, and the tiger appears to have no difficulty retracting (abducting) the epiglottis from the glottal airstream when necessary.
The trachea was distinctive in two ways. First, the open dorsal (posterior) portion – the paries membranaceus – was quite large and drumlike. This is normal in carnivores and is also found in some other mammals such as goats (G. E. Weissengruber, personal communication). This might allow a large bolus of food passing down the esophagus to bulge into the tracheal cavity during swallowing. Second, the tracheal rings allow a large degree of ‘telescoping’, such that the trachea could be collapsed or extended. This might be associated with lowering of the larynx, which would allow the trachea to compress rather than being forced, at its resting length, into the thorax.
The musculature associated with laryngeal descent was very well developed (both sternothyroid and sternohyoid muscles appeared very robust). The suprahyoid structures, by contrast, were loose. The epihyoid ligament was very elastic. The suprahyoid musculature (especially the geniohyoid) was also flattened and extensible. We made no specific observations of the thyroglotticus muscle discussed by Weissengruber and colleagues (Weissengruber et al., 2002).
Finally, the laminae of the thyroid cartilage in the tiger were unusually narrow rostrocaudally and had a ‘beamlike’ form. The posterior portions on both sides are typical for mammals, with the two pairs of thyroid horns connecting to the hyoid bone above and cricoid cartilage below (and a well-developed thyroid foramen visible on both sides, with a blood vessel entering), but the anterior ‘shield’ portion of the thyroid cartilage is unusually slim – it is only about as tall as the vocal folds themselves. This results in a very extensive cricothyroid space ventrally. This large, flexible subglottal space might have biomechanical or acoustic implications. The total distance from the base of the cricoid to the top of the thyroid, measured anteriorly, was 70 mm. Of this overall height, the thyroid lamina was 19 mm, the elastic membranous portion 39 mm and the cricoid ring 12 mm.
Methods for pressure–flow experiments
Excised larynx procedures
The excised tiger larynges were mounted on a laboratory bench, as described by Alipour and Scherer (Alipour and Scherer, 1995). Further details are published elsewhere (Titze, 2006). A polyvinyl chloride (PVC) tracheal tube of outside diameter approximately 3.5 cm (inside diameter 3.0 cm) supplied humidified and heated air to the larynx. Glottal adduction was accomplished by pressing the arytenoids together with a pair of two-pronged needle probes, and sutures were used to close the vocal processes and posterior commissure. The mean pressure 10 cm below the glottis was monitored with a wall-mounted water manometer (Dwyer No. 1230-8, Michigan City, IN, USA). The mean flow rate was monitored simultaneously with an in-line flowmeter (Gilmont rotameter model J197; Barnant, a division of Thermo Fisher Scientific, Barrington, IL, USA) or, for higher flows, a pneumatic flow meter (Rudolph 4700; Hans Rudolph, Kansas City, MO, USA). The time-varying subglottal pressure was measured by a piezoresistive pressure sensor (Microswitch 136PC01G1; Allied Electronics, Fort Worth, TX, USA) at the same location as the manometer tap.
Two electrodes from an electroglottograph (EGG) made by Synchrovoice (Harrison, NJ, USA) were attached to the exterior of the thyroid cartilage with pushpins. The third (ground) electrode was placed on the posterior surface of the larynx. The EGG, subglottal pressure and flow signals were monitored on a digital oscilloscope and recorded on a PC using a DATAQ A/D board and WINDAQ software (DATAQ Instruments, Akron, OH, USA). The sampling frequency was 10 kHz, and the signals were recorded simultaneously. The sound pressure level (SPL; dB) was also recorded (Brüel and Kjær 2238; Brüel and Kjær, Naerum, Denmark), and the audio pass-through signal of the sound-level meter was recorded as an acoustical channel (10 kHz sampling). Fig. 2 shows two examples of EGG and microphone waveforms – one for a periodic sound and one for a rough growl sound with a period-3 bifurcation in the EGG. In this report, the EGG signal was used only for real-time extraction of the fundamental frequency (F0).
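The real-time F0 tracker is not described here. A simple autocorrelation estimator is one common way to extract F0 from a periodic EGG signal, and the sketch below shows that technique only; it is not the authors' implementation. It assumes a 10 kHz sampling rate (as above) and a hypothetical 15–150 Hz search band chosen to span the F0 values reported in this study. A strong second harmonic can cause octave errors, so estimates should still be checked against the waveform.

```python
import numpy as np

def estimate_f0(signal: np.ndarray, fs: float,
                fmin: float = 15.0, fmax: float = 150.0) -> float:
    """Estimate F0 (Hz) from the largest autocorrelation peak with lag in [1/fmax, 1/fmin]."""
    x = signal - np.mean(signal)                       # remove any DC offset
    r = np.correlate(x, x, mode="full")[len(x) - 1:]   # autocorrelation, lags >= 0
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(r) - 1)
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag

# Synthetic test signal (not measured data): a 50 Hz fundamental plus a weaker
# second harmonic, sampled at 10 kHz for 0.5 s.
fs = 10_000.0
t = np.arange(0, 0.5, 1.0 / fs)
egg_like = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 100 * t)
print(estimate_f0(egg_like, fs))
```

The unnormalized autocorrelation favors the shortest true period because overlap shrinks with lag. Production trackers usually add normalization, voicing decisions and sub-sample interpolation.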
The primary independent control variables for phonation were subglottal pressure Ps and vocal fold elongation ΔL. Elongation was measured optically (Sony Cyber-shot digital camera; Sony Electronics, Tokyo, Japan) by tracking two micro-sutures sewn onto the tissue. The manipulation to achieve different elongations was performed with a micrometer attached to a fixed rod on the table and a suture sewn into each of the arytenoid cartilages, which were pulled backwards. The primary dependent variables were glottal airflow, SPL at 15–18 cm glottis-to-microphone distance and fundamental frequency F0.
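Elongation ΔL is easiest to compare across specimens of different size when expressed as strain, ΔL/L0, where L0 is the cadaveric (resting) membranous length. The sketch below (the function name is hypothetical) uses the values later reported for the first female Amur: 31 mm resting length, elongated by up to 15 mm.

```python
def strain(resting_length_mm: float, elongation_mm: float) -> float:
    """Fractional elongation (strain) of the membranous vocal fold."""
    return elongation_mm / resting_length_mm

# First female Amur: 31 mm cadaveric length, elongated by up to 15 mm.
eps = strain(31.0, 15.0)
print(f"strain = {eps:.2f} ({eps * 100:.0f}%)")
```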
Given that the data set is small and it took five years to collect it, each specimen will be treated individually. It is not possible to make meaningful statistical inferences about gender or subspecies differences.
Fig. 3A shows the pressure–flow relationships in the glottis for the male Amur. The data points are tightly clustered because very little change in vocal fold length was obtainable: the cricothyroid and cricoarytenoid joints were arthritic, allowing only a 2.0 mm length increase of the membranous vocal fold by micrometer manipulation. As will be seen later, with greater length variation, the pressure–flow curves are more variable. A regression table for curve fitting and standard deviations will be given when all individual data have been presented.
Phonation threshold pressure (PTP), the minimum subglottal pressure required to establish vocal fold oscillation, ranged between 0.2 and 0.3 kPa for this ‘at rest’ phonatory length; it can be read from the lower-left points along the horizontal axis in all panels of Fig. 3. The glottal resistance, calculated as the subglottal pressure divided by the mean glottal flow, was 0.75 kPa l–1 s, which is the linear regression slope through the data points of Fig. 3A.
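Because glottal resistance is defined as pressure divided by flow, a least-squares line through the origin of pressure against flow gives a single resistance value for a set of points. The sketch below shows that calculation. The data are hypothetical, generated to lie exactly on a 0.75 kPa l–1 s line, and are not measurements. The function name is also hypothetical.

```python
import numpy as np

def glottal_resistance_through_origin(pressure_kpa, flow_lps) -> float:
    """Least-squares slope (kPa l^-1 s) of pressure vs. flow, constrained through the origin."""
    p = np.asarray(pressure_kpa, dtype=float)
    u = np.asarray(flow_lps, dtype=float)
    return float(np.dot(u, p) / np.dot(u, u))

# Hypothetical illustrative points on a 0.75 kPa l^-1 s line (not measured data).
flows = [0.4, 0.8, 1.2, 1.6, 2.0]        # mean glottal flow, l/s
pressures = [0.75 * u for u in flows]    # subglottal pressure, kPa
R = glottal_resistance_through_origin(pressures, flows)
print(f"R = {R:.2f} kPa l^-1 s")
```

If flow is regressed on pressure instead (pressure as the independent variable), the slope is a conductance in l s–1 kPa–1, and the resistance is its reciprocal.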
Fig. 3B shows the SPL at 15 cm from the glottis. It reached 85 dB (linear, i.e. unweighted, frequency weighting) at a lung pressure of 1.5 kPa.
The aerodynamic power was calculated as the product of the mean subglottal pressure and the mean glottal flow (Bouhuys et al., 1968). This power is plotted in Fig. 3C with open circles. Note that this aerodynamic power is approximately 1.0 W (0 dB re. 1.0 W) at 1.0 kPa lung pressure.
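The unit bookkeeping behind the product of pressure and flow can be sketched as follows: kPa × l s–1 converts to Pa × m³ s–1, which is watts. The function names are hypothetical. The last line applies the same relation in reverse to the human comparison given in the Discussion (0.2 l s–1 mean flow).

```python
import math

def aerodynamic_power_w(pressure_kpa: float, flow_lps: float) -> float:
    """Mean aerodynamic power (W) = mean subglottal pressure x mean glottal flow."""
    return (pressure_kpa * 1e3) * (flow_lps * 1e-3)  # Pa * m^3/s = W

def db_re_1w(power_w: float) -> float:
    """Power level in dB re 1.0 W."""
    return 10.0 * math.log10(power_w)

def pressure_for_power_kpa(power_w: float, flow_lps: float) -> float:
    """Subglottal pressure (kPa) needed to reach a given aerodynamic power at a fixed flow."""
    return power_w / (flow_lps * 1e-3) / 1e3

tiger = aerodynamic_power_w(1.0, 1.0)     # about 1.0 kPa and 1.0 l/s in the tiger larynx
print(tiger, db_re_1w(tiger))             # 1 W, i.e. 0 dB re 1 W
print(pressure_for_power_kpa(1.0, 0.2))   # human-like flow: about 5 kPa needed
```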
Fig. 3D shows the fundamental frequency F0 as a function of subglottal pressure. Because the vocal folds of this aged specimen could not be lengthened significantly, all frequencies were below 40 Hz, and some were below 20 Hz. While the animal was still alive, we recorded growls of this adult male Amur. These growl vocalizations also had a very low and relatively constant pitch. The fundamental frequency averaged 28.5 Hz (N=5; range 21–34 Hz, s.d. 1.8–3.7 Hz).
Fig. 4 shows a similar data set for the first female Amur studied. The data span a greater range (which appears as more scatter) because vocal fold length was changed by 15 mm (the cadaveric vocal fold length was 31 mm). In other words, an increase in vocal fold length of approximately 50% was achievable across the measurement range. Flow measurements were limited to 1.5 l s–1 (Fig. 4A) because the need for a larger rotameter was not anticipated; a larger device was obtained after the first experiment was completed. Note, however, that glottal airflow and aerodynamic power for this female basically follow the trends for the male up to the measurement limit. The acoustic power and SPL data are less reliable because the sound-level meter was set to A-weighting (typical for human sound-level recording), and the readings had to be corrected for its low-frequency roll-off. All other measurements (for the four other specimens) were made with a flat low-frequency response. Fundamental frequency reached 100 Hz (Fig. 4D). The female's growl F0 in live recordings averaged 49.5 Hz (N=6; range 40–67 Hz, s.d. 2.4–9.1 Hz), which falls in the lower part of the excised larynx F0 range, suggesting that the animal does not stretch its vocal folds much for these phonations. The recorded growls from this female had inharmonic partials, suggesting nonlinear phenomena or biphonation. These nonlinear phenomena will be discussed in a later paper.
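The correction procedure is not specified. One plausible approach for a tone-dominated signal is to add back the A-weighting attenuation at F0. The sketch below computes that attenuation from the standard analytic A-weighting curve (IEC 61672-1). Applying it at a single frequency is an assumption: for growls with strong harmonics it over-corrects, because the higher harmonics are attenuated less than the fundamental. The function names are hypothetical.

```python
import math

def a_weighting_db(f_hz: float) -> float:
    """A-weighting gain (dB) at frequency f, from the analytic curve in IEC 61672-1."""
    f2 = f_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00

def unweight_a(level_dba: float, f_hz: float) -> float:
    """Approximate flat (unweighted) level for a signal dominated by a tone at f_hz."""
    return level_dba - a_weighting_db(f_hz)

for f in (25.0, 50.0, 100.0, 1000.0):
    print(f"{f:6.0f} Hz: {a_weighting_db(f):6.1f} dB")
```

At 50 Hz the A-weighting attenuation is about 30 dB, which is why A-weighted readings of tiger phonation substantially underestimate the flat SPL.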
Fig. 5 shows results for the second female Amur. The cadaveric vocal fold length was 35.3 mm, from which the length in vibration was increased by 2.1 mm and decreased by 2.0 mm, again a small range. At the shortest length (33.3 mm), the flow reached the highest value (2.6 l s–1) and the intensity also reached its highest value (89 dB). Both flow and F0 were somewhat bimodal, however, because the mode of vibration was not stable. Often the F0 extracted from the electroglottograph was actually the second harmonic frequency and had to be divided by two to obtain the graph in Fig. 5D. On average, the mean glottal flow and the aerodynamic power were similar to those of the previous two Amur specimens. SPL and acoustic power also varied similarly with lung pressure, but were approximately 5 dB higher in overall level. Thus, for the same pulmonary effort, this second female larynx produced louder sounds than the male larynx. The higher F0 (50 Hz instead of 30 Hz) and a large amount of second harmonic energy around 100 Hz can account for this difference in SPL.
Fig. 6 shows results for the female generic Bengal specimen. The cadaveric vocal fold length was 36 mm, from which the length was increased by 2.0 mm and 3.0 mm to obtain multiple recordings with variable lung pressure. The mean glottal airflow was more scattered because the vibration modes changed frequently. The F0 averaged around 50 Hz, with a range from 25 Hz to 60 Hz. SPL peaked at approximately 90 dB at 15 cm.
Finally, Fig. 7 shows the results for the female Sumatran tiger larynx. The cadaveric vocal fold length was 27 mm, on the order of 5–10 mm shorter than that of the Siberian specimens. The length of the vocal fold was incremented by 3.0 mm and 6.0 mm. In general, the mean glottal flow was about half that of the Amur specimens for the same lung pressures. This is also reflected in the lower aerodynamic power (–3 dB). SPL and acoustic power are not appreciably lower, however, because the fundamental frequency is higher. It averaged around 60 Hz, but 80–90 Hz was common for elongated vocal folds. This is again an excellent demonstration that the acoustic power radiated is a strong function of frequency. Thus, loss of aerodynamic power, which generally produces less time-varying (acoustic) flow at the glottis, can be compensated for by raising F0.
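One way to see why frequency matters so much is the idealized simple (monopole) source. Its far-field pressure amplitude is ρfU/(2r) for a volume-velocity amplitude U at frequency f and distance r. At an unchanged flow amplitude, therefore, each doubling of frequency adds about 6 dB. The sketch below shows only this textbook relation. A real larynx radiates a harmonic-rich flow waveform, so it does not model the tiger data, and the function name is hypothetical.

```python
import math

def monopole_level_change_db(f1_hz: float, f2_hz: float) -> float:
    """SPL change of a simple (monopole) source when its frequency goes from f1 to f2
    at unchanged volume-velocity amplitude: p ~ rho * f * U / (2 r), so dL = 20 log10(f2/f1)."""
    return 20.0 * math.log10(f2_hz / f1_hz)

# Illustration: raising F0 from 30 Hz to 60 Hz at the same glottal flow amplitude.
print(f"{monopole_level_change_db(30.0, 60.0):+.1f} dB")
```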
Table 1 shows the mean values of the regression coefficients for the curves shown in Figs 3–7. The single independent variable is subglottal (lung) pressure. The average regression slope and the standard deviation are given in the last two columns. Table 2 shows the range of the regression coefficients.
DISCUSSION AND CONCLUSIONS
The tiger larynx is remarkable as a sound source because the vocal folds are so easy to set into vibration. The phonation threshold pressure (PTP), the minimum pressure required to set the vocal folds into self-sustained oscillation (Titze, 1988), is very low compared with that of other species. For canine excised larynges without a supraglottal vocal tract, PTP has been measured to be of the order of 0.8 kPa (Titze, 1988), and, in bovine excised larynges, it has been measured at 0.44±0.23 kPa (Alipour and Jaiswal, 2008). The low PTP increases the ease of sound production. Theoretically, PTP is lowered by a large vocal fold thickness, a flattened medial surface, a low mucosal wave velocity, and a low tissue viscosity (Titze, 1988). It appears that nature has given the tiger an optimal design for all of these factors. Key features include long and thick vocal folds and a set of arytenoid flanges that help adduct the inferior portion of the vocal folds. The vibrating component of the larynx appears to consist of a fusion of the ventricular and vocal folds, increasing the effective thickness of the folds and also flattening the medial surface of the membranous (vibrating) component [the combined folds have been termed ‘vocal pads’ in the literature (e.g. Hast, 1989; Peters and Hast, 1994)]. Thus, a bipartite set of vocal folds, together with a set of paired cartilaginous plates, which we term ‘arytenoid flanges’ to avoid confusion, maintain a medial surface that is near the theoretically optimal rectangular configuration for a low PTP.
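For reference, the threshold-pressure expression usually quoted from Titze (1988) for small-amplitude oscillation with a rectangular (uniform) prephonatory glottis makes the factors listed above explicit. It is given here only as a reminder of that theory, not as a fit to the tiger data:

```latex
P_{\mathrm{th}} = \frac{k_t \, B \, c \, \xi_0}{T}
```

Here k_t is the transglottal pressure coefficient, B the tissue damping coefficient (reflecting viscosity), c the mucosal wave velocity, ξ0 the prephonatory glottal half-width and T the vocal fold thickness. Within this approximation, a thicker fold (larger T), lower damping, a slower mucosal wave or a narrower prephonatory glottis all lower the threshold pressure.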
Glottal resistance is also very low in comparison with that of other species. For example, in human, a typical lung pressure for conversational speech is approximately 0.6 kPa, and a typical glottal flow is 0.2 l s–1 (Holmberg et al., 1988), resulting in a glottal resistance of 3.0 kPa l–1 s. The range of glottal resistances for other species is as follows: for pig, 3.4 kPa l–1 s; for sheep, 2.8 kPa l–1 s; and, for cow, 2.6 kPa l–1 s (Alipour and Jaiswal, 2009). Averaged over all tiger specimens, our measured glottal resistance was 1.2 kPa l–1 s, less than half of any of these values.
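The comparison can be tabulated directly from the values quoted above. The human value is derived as pressure divided by flow; the dictionary layout is simply a convenient illustration.

```python
# Glottal resistance (kPa l^-1 s) values quoted in the text. The human value is
# derived from a typical conversational lung pressure (0.6 kPa) and flow (0.2 l/s).
human = 0.6 / 0.2

resistances = {
    "human": human,
    "pig": 3.4,
    "sheep": 2.8,
    "cow": 2.6,
    "tiger": 1.2,  # mean over the five tiger larynges
}

tiger = resistances["tiger"]
for species, r in sorted(resistances.items(), key=lambda item: item[1]):
    print(f"{species:6s} {r:4.1f} kPa l^-1 s  ({r / tiger:.1f} x tiger)")
```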
Fundamental frequency and mean glottal flow scale approximately 3:1 in comparison with human. Thus, human subjects phonate at an average of 160 Hz (male/female average for speech), whereas tigers typically growl at approximately 40–50 Hz. This reduction in F0 is accounted for entirely by the greater length of the membranous vocal fold, which also scales approximately 3:1 with human. Lung pressure does not scale up, which is understandable on the basis of the maximum skeletal muscle stress (force per unit area) that can be generated.
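The statement that length alone accounts for the F0 reduction follows from the ideal-string approximation commonly used for vocal fold F0, F0 = (1/2L)√(σ/ρ). In this approximation F0 is inversely proportional to the vibrating length L when longitudinal stress σ and tissue density ρ are unchanged. The sketch below assumes that model. The default density (1040 kg m–3, a typical soft-tissue value), the stress and the reference length are hypothetical placeholders that cancel in the ratio.

```python
import math

def string_f0(length_m: float, stress_pa: float, density_kg_m3: float = 1040.0) -> float:
    """Ideal-string F0 estimate: F0 = (1 / (2 L)) * sqrt(stress / density)."""
    return math.sqrt(stress_pa / density_kg_m3) / (2.0 * length_m)

# Placeholder values (hypothetical): only the ratio matters, because stress and
# density are held equal for both larynges.
stress = 50e3          # longitudinal tissue stress, Pa (hypothetical)
L_short = 0.010        # shorter membranous length, m (hypothetical)
L_long = 3 * L_short   # three times longer, as for tiger vs. human

ratio = string_f0(L_short, stress) / string_f0(L_long, stress)
print(f"F0 ratio = {ratio:.2f}; 160 Hz / ratio = {160 / ratio:.0f} Hz")
```

Dividing the 160 Hz human average by 3 gives about 53 Hz, close to the 40–50 Hz typical of tiger growls.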
Aerodynamic power averages around 1.0 W but can range between 0.1 W and 10 W (as lung pressure ranges between threshold and 3 kPa). Human larynges, with a mean flow of 0.2 l s–1, would need 5.0 kPa of pressure to produce 1.0 W of aerodynamic power.
The acoustic power radiated from the larynx is of the order of 0.1 to 10 mW when no vocal tract is attached. This radiated vocal power (and the associated SPL of 80–90 dB at a distance of 15–18 cm from the larynx) is remarkable mainly because it is achieved at such a low fundamental frequency. Large canine larynges have produced an SPL of 82 dB at a similar distance, but only with at least twice the subglottal pressure and at least twice the fundamental frequency (Alipour et al., 2007). Considering an increase of 6 dB per doubling of subglottal pressure and another increase of 6 dB for doubling of F0 (Titze and Sundberg, 1992), a tiger larynx driven at the canine's pressure and F0 would gain about 12 dB over its measured 80–90 dB, so we would expect on the order of 100 dB for the tiger compared with 82 dB for the canine. As another comparison, porcine larynges can produce 96 dB at a similar distance, with an average of 88±4.5 dB (Alipour and Jaiswal, 2008), but again only with higher pressures (2.3–3.0 kPa) and higher fundamental frequencies (150–250 Hz).
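The canine comparison rests on a simple rule of thumb, roughly 6 dB per doubling of subglottal pressure and 6 dB per doubling of F0. The sketch below applies that rule to shift the measured tiger SPLs to the canine's operating conditions (one doubling of each). The function name and default increments are hypothetical labels for the rule stated in the text, not a model of either larynx.

```python
def scale_spl(spl_db: float, pressure_doublings: float, f0_doublings: float,
              db_per_pressure_doubling: float = 6.0,
              db_per_f0_doubling: float = 6.0) -> float:
    """Rule-of-thumb SPL scaling: about 6 dB per doubling of subglottal pressure and of F0."""
    return (spl_db
            + db_per_pressure_doubling * pressure_doublings
            + db_per_f0_doubling * f0_doublings)

# Tiger larynx: 80-90 dB at 15-18 cm. Canine: 82 dB with at least twice the pressure and F0.
for tiger_spl in (80.0, 85.0, 90.0):
    print(tiger_spl, "->", scale_spl(tiger_spl, 1, 1))
```

The shifted tiger levels of roughly 92–102 dB, against 82 dB for the canine, are what "on the order of 100 dB" summarizes.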
It has been shown, theoretically and experimentally, that SPL increases by approximately 9 dB if a vocal tract is attached to a larynx (Titze and Sundberg, 1992). The reason for this increase is a better impedance match for radiation into free space. This is a megaphone effect or a trumpet-horn effect. In addition, a vocal tract with a narrow epilarynx tube diameter at the glottal end can provide nonlinear coupling between source and vocal tract, which could add another 10 dB (Titze, 2008). Thus, the addition of a vocal tract could add on the order of 20 dB to the SPL, with the prediction that the tiger would be able to produce 120 dB at a 20 cm distance from the snout at an F0 of approximately 50 Hz. Future studies of radiated power, efficiency and PTP will include the effect of the vocal tract.
Along with a low threshold for self-sustained oscillation, we observed the simultaneous excitation of multiple modes of vibration, making the vocal fold vibration pattern complex. This presumably generates the rough quality of roars and other low-frequency vocalizations. Future studies will also address the specifics of vibration modes and nonlinear dynamic phenomena.
Funding for this work was provided by the National Institute on Deafness and Other Communication Disorders, Grant No. R01 DC008612-01A1, and in part by NSF grant No. 0823417. We express great appreciation to Omaha's Henry Doorly Zoo for making the larynges available for study and to Gustav Peters for providing recordings of tiger vocalization. We also thank the Minnesota Zoo for providing a larynx for photography. Deposited in PMC for release after 12 months.