ABSTRACT
Males of several species of deer have a descended and mobile larynx, resulting in an unusually long vocal tract, which can be further extended by lowering the larynx during call production. Formant frequencies are lowered as the vocal tract is extended, as predicted when approximating the vocal tract as a uniform quarter wavelength resonator. However, formant frequencies in polygynous deer follow uneven distribution patterns, indicating that the vocal tract configuration may in fact be rather complex. We CT-scanned the head and neck region of two adult male fallow deer specimens with artificially extended vocal tracts and measured the cross-sectional areas of the supra-laryngeal vocal tract along the oral and nasal tracts. The CT data were then used to predict the resonances produced by three possible configurations, including the oral vocal tract only, the nasal vocal tract only, or combining the two. We found that the area functions from the combined oral and nasal vocal tracts produced resonances more closely matching the formant pattern and scaling observed in fallow deer groans than those predicted by the area functions of the oral vocal tract only or of the nasal vocal tract only. This indicates that the nasal and oral vocal tracts are both simultaneously involved in the production of a non-human mammal vocalization, and suggests that the potential for nasalization in putative oral loud calls should be carefully considered.
INTRODUCTION
A key objective of mammalian vocal communication research is to determine whether the acoustic structure of vocal signals encodes functionally relevant information. To achieve this aim in a given species, it is important to understand how vocal signals are produced because any potential acoustic variation is primarily constrained by the biomechanical properties and dimensions of the caller's vocal anatomy (Fitch and Hauser, 2002). While the causal links between vocal production and acoustic variation are well established in human speech (Titze, 1989, 1994), the biomechanical and physiological sources of acoustic diversity in non-human animal signals remain poorly understood. Nonetheless, the generalization of the source-filter theory of human voice production (Fant, 1960) to non-human mammal vocal signals has significantly advanced our understanding of the acoustic structure of mammalian calls in light of their production mechanisms (reviewed by Taylor et al., 2016). According to this theory, voiced vocalizations are the result of a two-stage production process. First, the source signal is generated in the larynx by vocal fold vibration. The rate at which the glottis opens and closes determines the fundamental frequency (F0) of the vocalization (the main acoustic correlate of perceived pitch). The source signal is subsequently filtered by the supra-laryngeal vocal tract, whose resonance frequencies shape the spectral envelope of the radiated vocalization, creating broad bands of energy called ‘formants’ (Fitch, 2000a). Because F0 and formants are produced independently, they are subject to separate biomechanical constraints.
Recent anatomical investigations of mammalian supra-laryngeal vocal tracts have revealed an extensive diversity of vocal tract morphology, with, for example, elongated noses (Frey et al., 2007b), air sacs (Frey et al., 2007a) and descended larynges (Frey and Gebler, 2003), suggesting that vocal tract resonance is under strong selection pressure. Yet conclusively determining how such anatomical specializations affect vocal production requires 3D cineradiography to visualize call-synchronous internal dynamic changes in vocal tract shape and document the position of oscillating structures during call production (Fitch, 2000b). This approach is, however, logistically difficult – if not impossible – to perform on large wild animals. An alternative method is to obtain precise static vocal tract geometries from cadavers and use these data to simulate vocal tract dynamics and provide predictions of formant values for multiple vocal tract configurations. Several studies of non-human mammals have used this approach to predict the resonance characteristics of air spaces in the upper respiratory tract, showing good concordance with actual formant patterns in species-specific vocalizations (Adam et al., 2013; Carterette et al., 1979, 1984; Gamba et al., 2012; Gamba and Giacoma, 2006b; Koda et al., 2012; Riede et al., 2005). Studies investigating the evolutionary origins of speech have also used vocal tract models to predict the potential articulatory abilities of human ancestors (Boë et al., 2002; Lieberman et al., 1972) as well as non-human primates (Boë et al., 2002, 2017; Fitch et al., 2016). However, attempts at predicting vocal tract resonances from anatomical data remain scant and largely focused on primate species (Gamba et al., 2012; Gamba and Giacoma, 2006a; Koda et al., 2012; Riede et al., 2005), essentially because of the lack of data on vocal tract geometries available for non-human terrestrial mammals.
During the autumn breeding season, male fallow deer (Dama dama) produce high rates of sexually selected groan vocalizations (Briefer et al., 2010; McElligott and Hayden, 1999). Groans are characterized by a very low F0 and unevenly spaced and modulated formants that obey stereotyped distribution patterns, indicating that they are produced by a consistent but complex vocal tract shape (Reby et al., 1998; Vannoni and McElligott, 2007). Fallow bucks have a descended and mobile larynx that is retracted towards the sternum during groan production (Fitch and Reby, 2001; McElligott et al., 2006), thereby extending the vocal tract. The effect of vocal tract extension on formant frequency has been extensively documented in fallow deer (McElligott et al., 2006) and the closely related red deer (Fitch and Reby, 2001; Frey et al., 2012). As the animal extends its vocal tract, formants are lowered until they reach a minimal plateau corresponding to maximal extension, and formant frequency spacing is inversely correlated with the length of the vocal tract during extension (Fitch and Reby, 2001; McElligott et al., 2006).
Previous attempts at relating formant frequency spacing to vocal tract length have typically modelled the vocal tract as a simple tube of uniform diameter that is closed at the glottis and open at the mouth (Charlton et al., 2011; Reby and McComb, 2003; Vannoni and McElligott, 2007; Fitch, 1997). Under these assumptions, the length of the vocal tract can be predicted from the formant frequency spacing (and vice versa) using the equation: eVTL=c/(2×DF), where eVTL is the estimated vocal tract length, c is the speed of sound in the vocal tract (350 m s−1) and DF is the overall formant frequency spacing measured in the vocalization (Reby and McComb, 2003). Measurements of anatomical oral vocal tract length in adult male fallow deer (taken as the distance from the larynx to the tip of the snout) based on calibrated photographs have produced fully extended vocal tract lengths ranging between 46 and 54 cm (McElligott et al., 2006). Yet, the minimum formant spacing (DF) measured in male fallow deer groans varies between 326 and 281 Hz (Vannoni and McElligott, 2007), which, when the vocal tract is modelled as a simple cylindrical tube, corresponds to vocal tract lengths between 54 and 62 cm. This overestimation of the vocal tract length from the acoustic data indicates that the animal's vocal tract produces more/lower formants than expected from its anatomical length. Combined with the observation that formants are stereotypically unevenly spaced in groans (McElligott et al., 2006), this suggests the vocal tract configuration during groan production is more complex than previously assumed.
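As a worked check of the uniform-tube relation above, the following minimal Python sketch converts between formant spacing and estimated vocal tract length. The function names are illustrative and not taken from the cited studies; the speed of sound is the 350 m s−1 value quoted in the text.

```python
# Uniform-tube approximation (closed at the glottis, open at the mouth):
#   eVTL = c / (2 * DF)   and equivalently   DF = c / (2 * VTL)
SPEED_OF_SOUND_CM_S = 35000.0  # 350 m/s expressed in cm/s (value quoted in the text)


def estimated_vtl_cm(formant_spacing_hz, c_cm_s=SPEED_OF_SOUND_CM_S):
    """Estimated vocal tract length (cm) from overall formant spacing DF (Hz)."""
    return c_cm_s / (2.0 * formant_spacing_hz)


def expected_spacing_hz(vtl_cm, c_cm_s=SPEED_OF_SOUND_CM_S):
    """Formant spacing DF (Hz) expected for a uniform tube of the given length (cm)."""
    return c_cm_s / (2.0 * vtl_cm)


if __name__ == "__main__":
    # Minimum formant spacings reported for groans (Hz)
    for df in (326.0, 281.0):
        print(f"DF = {df:.0f} Hz -> eVTL = {estimated_vtl_cm(df):.1f} cm")
    # Anatomical extended vocal tract lengths from photographs (cm)
    for vtl in (46.0, 54.0):
        print(f"VTL = {vtl:.0f} cm -> DF = {expected_spacing_hz(vtl):.0f} Hz")
```

With these inputs, spacings of 326 and 281 Hz give lengths of about 53.7 and 62.3 cm, whereas anatomical lengths of 46 and 54 cm would imply spacings of about 380 and 324 Hz, which illustrates the mismatch described above.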
Here, we investigated the hypothesis that male fallow deer simultaneously use the oral and nasal vocal tracts as resonating systems during call production, allowing this species to produce a larger number of formants and a more complex formant pattern than predicted by a simple cylindrical tube model. To this end, we performed computed tomography (CT) scans of head-and-neck specimens of male fallow deer to achieve a more detailed description of the complex anatomical structure of the male fallow deer supra-laryngeal vocal tract. The specific aims of this investigation were threefold: (1) to describe the 3D geometry of the supra-laryngeal vocal tract in male fallow deer, including the oral and nasal vocal tracts, while the larynx is maximally retracted; (2) to predict the resonance patterns produced by the fully extended vocal tract configuration, simulating the effect of the involvement of the oral vocal tract only, of the nasal vocal tract only, or of the combined oral and nasal vocal tracts on formant patterns; and (3) to compare resonance patterns predicted by each of these configurations with formant patterns observed in actual vocalizations.
MATERIALS AND METHODS
Specimen collection
Our measures of vocal tract area functions are based on CT scans from two adult male fallow deer, Dama dama (Linnaeus 1758). Male 1 was a 7-year-old buck that died from injuries sustained during a fight with rival males during the breeding season on 20 October 2011 in Home Park, London, UK. Male 2 was an 11-year-old buck culled by park staff during annual population control management practices on 7 February 2011 in Richmond Royal Park, London, UK. The two specimens had similar skeletal dimensions: lower mandible length (male 1: 21.0 cm, male 2: 20.3 cm) and lower hindleg length from the calcaneal tuber to the end of the metatarsus (male 1: 31 cm, male 2: 32.5 cm). Head-and-neck specimens of both individuals were obtained by separation between the 2nd and 3rd ribs. Specimens were chilled with ice within 2 h of death and were frozen at −20°C within 5 h of death.
Specimen preparation and CT scanning
Specimens were thawed, water was flushed through the oral and nasal vocal tracts to remove debris and fluids, and the specimens were left upright to drain for about 30 min before scanning. Specimens were scanned with and without artificially extending the vocal tract. Vocal tract extension was achieved by maximally pulling the sternothyroid muscle and the trachea towards the sternum and fastening them to the sternum using a string. This recreated a naturalistic configuration of the fully extended vocal tract in male fallow deer.
Specimens were positioned as much as possible in configurations typical of vocalizing (see Fig. 1), although it was not possible to stretch the neck as much as desired (Fig. 2). The oral cavity was kept open using a block of Styrofoam. This affected the position of the tongue, which was pushed backward in an unnatural position in one of the specimens. The detrimental effect on the cross-sectional area measures was moderated by extrapolating the typical position of the tongue from other specimens and anatomical examinations.
Glossary
Basihyoid
Unpaired, most ventral, transverse component of the hyoid apparatus intercalated between the paired suspension from the skull and the paired arms to the larynx.
Choanae
The ‘internal nares’, i.e. the two openings at the caudal end of the nasal cavity, where the paired nasal meatuses lead into the nasopharynx.
Cricoid cartilage
The most caudal, ring-shaped cartilage of the larynx that is attached to the trachea; it consists of a dorsal plate and a ventral arch.
Epiglottis
The most rostral cartilage of the larynx; during quiet breathing, it has a so-called intranarial position, i.e. it protrudes dorsally, through the intra-pharyngeal ostium, into the nasopharynx; during an open-mouth call, the larynx is withdrawn from the intra-pharyngeal ostium so that it is now positioned in the oropharynx.
Glottis
The vocal source of the larynx; it consists of the two vocal folds, ventral parts of the arytenoid cartilages and the vocal cleft in between the vocal folds; regarding the laryngeal cavity, it is positioned between the laryngeal vestibule rostrally and the infraglottic cavity caudally.
Hyoid apparatus
A framework of small rod-like bones connecting dorsally to the base of the skull, rostrally to the tongue and caudally to the larynx. It has three components: (1) the arms of the paired dorsal part flank the pharynx on both sides and suspend the entire hyoid apparatus from the skull base – it consists of several parts on both sides termed (dorsal to ventral): tympanohyoid, stylohyoid, epihyoid, ceratohyoid; (2) the arms of the paired caudal part connect to the larynx (in fallow deer via the two thyrohyoid ligaments) – it consists of one element per side termed the thyrohyoid; (3) the unpaired, transverse ventral part connects the two paired structures thereby forming a larger U-shaped fork for suspension from the skull dorsally and a smaller U-shaped fork for connection to the larynx caudally.
Intra-pharyngeal ostium
An opening in the soft palate, creating a passage between the nasopharynx and oropharynx; it is bordered by the palatopharyngeal muscle, which can constrict the intra-pharyngeal ostium.
Isthmus faucium
A narrow short passage between the mouth cavity and the oropharynx, bounded by the soft palate dorsally, the tongue ventrally and the palatoglossal arch (a symmetrical, dorsoventral mucosa fold between the soft palate and tongue) laterally.
Laryngeal entrance
The rostral entrance to the larynx, bounded by the epiglottis, the aryepiglottic folds and the corniculate processes of the arytenoid cartilages.
Nasopharynx
The nasal part of the pharynx, dorsal to the soft palate; it extends from the choanae to the intra-pharyngeal ostium.
Oropharynx
The oral part of the pharynx, ventral to the soft palate; it extends from the isthmus faucium to the base of the epiglottis.
Pharynx
A musculomembranous cross-way of the respiratory and digestive tracts between the oral and nasal cavities rostrally and the oesophagus and larynx caudally; for the sake of simplicity, and in contrast to textbooks, the pharynx is here not subdivided into three parts (nasal, oral and laryngeal) but only into two parts (nasal and oral).
Soft palate
Also known as the palatine velum, the soft palate is a soft tissue structure that is laterally fused to the pharyngeal walls; it completely separates the nasopharynx and oropharynx, except at the intra-pharyngeal ostium, which represents the only communication between the nasopharynx and oropharynx; like the pharynx, it extends from the choanae to the larynx.
Thyrohyoid
A paired caudal element of the hyoid apparatus that, together with the basihyoid, forms the smaller U-shaped fork for establishing the connection between the hyoid apparatus and the larynx.
Thyrohyoid ligament
Replaces the usual thyrohyoid articulation of most mammals by a ligamentous connection between the caudal tip of the thyrohyoid (of the hyoid apparatus) and the rostral horn of the thyroid cartilage on both sides.
Thyroid cartilage
The most superficial and largest cartilage of the larynx, unpaired. Its two lateral laminae are ventrally fused and enclose most of the laryngeal cavity in between them; its rostral horns connect the larynx to the thyrohyoid of the hyoid apparatus; its caudal horns articulate with the cricoid cartilage in the cricothyroid articulations.
Trachea
The windpipe, connecting the lungs to the larynx, extends from its bifurcation into the main bronchi caudally to the cricoid cartilage of the larynx rostrally.
Estimation of vocal tract area function
We measured the supra-laryngeal vocal tract area functions (the cross-sectional area at 1 cm steps along the length of the vocal tract; De Boer and Fitch, 2009) using the 3D curved Multi-Planar Representation viewer in Osirix (version 6.0, 64-bit for Mac, www.osirix-viewer.com) and following the three-step method described in Kim et al. (2009). First, the oral and nasal vocal tract dorsoventral midlines were drawn manually using the ‘3D curved path’ tool on a midsagittal section (Fig. 3A,D). Second, for each vocal tract, cross-sectional areas orthogonal to the midline were produced at 1 cm intervals from the glottis to the lips or nostrils (Fig. 3B–D). Third, the vocal tract area was measured in each cross-sectional slice using the closed polygon selection tool to delineate the vocal tract lumen. Osirix automatically returned the area of the delineated zone (in cm²). Each slice/measure was saved as an image file.
Prediction of vocal tract resonances
where Zc=ρc/Ao is the characteristic impedance (ρ and c being, respectively, the air density and the speed of sound at body temperature) and k=ω/c=2πF/c is the wave number.
If several complex tubes are connected (for example, at the branching point of the oral and nasal tracts), there is continuity of the acoustic pressure and conservation of the acoustic volume flow. The particular frequencies where the input impedance magnitude reaches a local maximum correspond to the vocal tract resonances visible as formants in the spectral acoustic structure of the produced vocalization.
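The idea of locating resonances at maxima of the input impedance can be illustrated with a deliberately simplified Python sketch. It is not the model used in this study: it assumes a single unbranched tract, lossless rigid walls and an ideal open end (zero radiation load), and it chains standard lossless tube-section matrices over an area function sampled at fixed steps. Under those assumptions the input impedance seen from the glottis is B/D, so its maxima coincide with the zeros of the chain-matrix element D. The function names and the air density value are assumptions made for illustration only.

```python
import numpy as np

RHO = 1.14  # air density in kg/m^3 (assumed approximate value near body temperature)
C = 350.0   # speed of sound in m/s (value quoted earlier in the text)


def chain_matrix_d(freq_hz, areas_cm2, step_cm=1.0):
    """D element of the lossless chain matrix from glottis to opening.

    areas_cm2 lists cross-sectional areas from the glottis towards the opening.
    For lossless sections D is real; the input impedance with an ideal open
    end is B/D, so zeros of D mark the impedance maxima (resonances).
    """
    k = 2.0 * np.pi * freq_hz / C      # wave number
    length = step_cm / 100.0           # section length in m
    total = np.eye(2, dtype=complex)
    for area in areas_cm2:
        zc = RHO * C / (area * 1e-4)   # characteristic impedance of the section
        section = np.array(
            [[np.cos(k * length), 1j * zc * np.sin(k * length)],
             [1j * np.sin(k * length) / zc, np.cos(k * length)]]
        )
        total = total @ section
    return total[1, 1].real


def resonances_hz(areas_cm2, step_cm=1.0, f_max=1500.0, df=1.0):
    """Resonance frequencies (Hz) found as sign changes of D on a frequency grid."""
    freqs = np.arange(0.5 * df, f_max, df)
    d = np.array([chain_matrix_d(f, areas_cm2, step_cm) for f in freqs])
    found = []
    for i in np.where(d[:-1] * d[1:] < 0)[0]:
        # Linear interpolation of the zero crossing between grid points
        found.append(freqs[i] - d[i] * (freqs[i + 1] - freqs[i]) / (d[i + 1] - d[i]))
    return found


if __name__ == "__main__":
    # Sanity check: a uniform 50 cm tube should resonate at odd multiples of c/(4L)
    print([round(f, 1) for f in resonances_hz([5.0] * 50)])
```

For a uniform tube this reduces to the quarter-wavelength series (2n−1)c/4L, which provides a check on the implementation. Wall losses, a realistic radiation impedance and the oral and nasal branching described above would all need to be added before such a sketch could be compared with the predictions reported here.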
The input parameters are the area functions of the oral and nasal vocal tracts and the branching point of the tracts. The air temperature inside the fallow deer vocal tract, which affects the absolute frequency of predicted resonances (formant frequencies are proportional to the speed of sound in the air and thus to the square root of the absolute temperature) but not their relative frequency distribution, was set to 38°C. The model assumed the wall damping of rigid walls and the radiation impedance of an open-ended, unflanged tube (Chaigne and Kergomard, 2016).
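The temperature dependence noted above can be made concrete with a short Python sketch. It assumes only the stated proportionality between resonance frequency and the square root of absolute temperature; the function names and the example formant values are hypothetical and are not measurements from this study.

```python
import math


def temperature_scaling(temp_c, ref_temp_c):
    """Factor by which resonance frequencies change between two air temperatures.

    Frequencies scale with the speed of sound, which is proportional to the
    square root of the absolute temperature (in kelvin).
    """
    return math.sqrt((temp_c + 273.15) / (ref_temp_c + 273.15))


def rescale_formants(formants_hz, temp_c, ref_temp_c):
    """Rescale formants predicted at ref_temp_c to the air temperature temp_c."""
    factor = temperature_scaling(temp_c, ref_temp_c)
    return [f * factor for f in formants_hz]


if __name__ == "__main__":
    # Hypothetical resonances of a uniform 50 cm tube, used only for illustration
    at_20 = [175.0, 525.0, 875.0]
    at_38 = rescale_formants(at_20, temp_c=38.0, ref_temp_c=20.0)
    print(f"scaling factor: {temperature_scaling(38.0, 20.0):.4f}")
    print("ratios preserved:", at_38[1] / at_38[0], at_38[2] / at_38[0])
```

Moving from 20°C to 38°C raises every resonance by about 3%, while the ratios between resonances are unchanged, which is why the temperature setting shifts the absolute frequencies but not the relative formant pattern.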
Resonances were predicted for each male, based on the area functions of three possible configurations: (1) the oral vocal tract only (common laryngopharyngeal tract, oropharynx and oral cavity); (2) the nasal vocal tract only (common laryngopharyngeal tract, nasopharynx and nasal cavities); and (3) both the oral and nasal vocal tracts (common laryngopharyngeal tract, oropharynx, oral cavity, nasopharynx and nasal cavities).
The total cross-sectional areas of both (left and right) nasal cavities were approximated by doubling the cross-sectional areas measured for the left nasal cavity connecting to the left nostril. To approximate the potential effect of the partial opening of the sides of the mouth, we only included half of the portion of the oral cavity that is laterally open. The cross-sectional areas for this portion of the oral cavity were estimated by manually linking the upper lip to the lower lip.
RESULTS
Vocal tract anatomy
Anatomically, the complete vocal tract is composed of five distinct and connected sections, one of which is paired: (1) the common laryngopharyngeal tract, shared by the oral and nasal vocal tract, between the glottis and the intra-pharyngeal ostium; (2) the oropharynx, from the intra-pharyngeal ostium to the isthmus faucium (oropharyngeal tract); (3) the oral cavity, from the isthmus faucium to the mouth opening (oral tract); (4) the nasopharynx, from the intra-pharyngeal ostium to the choanae (nasopharyngeal tract); and (5) the paired nasal cavities, from the choanae to the nostrils (nasal tract). The oral vocal tract comprises parts 1, 2 and 3 and the nasal vocal tract parts 1, 4 and 5 (Fig. 2). The choanae and the isthmus faucium are located approximately at the same distance from the glottis.
In the resting configuration (Figs 1A, 2A and 4A,B), the pharynx, soft palate and thyrohyoid ligament are relaxed, and the larynx resides at the level of cervical vertebrae 2 and 3, but farther rostrally in male 2 than in male 1. Accordingly, the rostral end of the trachea is located 41 cm from the lips in male 1 and 34 cm from the lips in male 2. The resting oral vocal tract length (glottis to lips) is 36 cm in male 1 and 30 cm in male 2, and the resting nasal vocal tract length (glottis to nostrils) is 40 cm in male 1 and 35 cm in male 2 (Figs 4 and 5). The flexible region between the rostral edge of the thyroid cartilage and the base of the epiglottis is relaxed and short (4 cm in both male 1 and male 2). Correspondingly, the overall length of the larynx (from cricoid arch to epiglottal tip) is 13 cm in male 1 and 10 cm in male 2. The hyoid apparatus is folded and the distance between the basihyoid and the epiglottis is small (4.5 cm in male 1 and 2.5 cm in male 2). The rostral edge of the intra-pharyngeal ostium (the caudal tip of the palatine velum) is located about 30 cm from the lips in male 1 and 25 cm from the lips in male 2. The epiglottis is in contact with the intra-pharyngeal ostium or overlaps its rostral edge, so that the laryngeal entrance comes to lie in continuation of the nasopharynx (so-called ‘intranarial position’). The mouth is mostly closed.
In the extended phonatory configuration (Figs 1B, 2B, 3A,D and 4C,D), the pharynx, soft palate and thyrohyoid ligament are maximally extended, and the larynx resides at the level of cervical vertebrae 4, 5 and 6 in male 1 and 3, 4 and 5 in male 2. Accordingly, the rostral end of the trachea is located approximately 54 cm from the lips in male 1 and 44 cm from the lips in male 2. The extended oral vocal tract length (glottis to lips) is 48 cm in male 1 and 40 cm in male 2, and the extended nasal vocal tract length (glottis to nostrils) is 50 cm in male 1 and 43 cm in male 2 (Figs 4C,D and 5). The length of the common laryngopharyngeal tract (glottis to intra-pharyngeal ostium) is 12 and 9 cm for males 1 and 2, respectively (Figs 4C,D and 5). The flexible region between the rostral edge of the thyroid cartilage and the base of the epiglottis is maximally tensed and considerably elongated (7.5 cm in male 1 and 5 cm in male 2). Correspondingly, the overall length of the larynx (from cricoid arch to epiglottal tip) is 15 cm in male 1 and 13 cm in male 2. The hyoid apparatus is maximally unfolded and the thyrohyoid rotated caudally. The distance between the basihyoid and the epiglottis is considerably enlarged, being now 12 cm in male 1 and 7.5 cm in male 2. The rostral edge of the intra-pharyngeal ostium (the caudal tip of the palatine velum) is located approximately 37 cm from the lips in male 1 and 31 cm in male 2. The epiglottis is retracted from the intra-pharyngeal ostium so that the laryngeal entrance is positioned in continuation of both the oropharynx and nasopharynx. From the intra-pharyngeal ostium onward, the pharyngeal cavity splits into two tubes (the nasopharynx connected to the nasal cavities and the oropharynx connected to the oral cavity) completely separated by the soft palate (velum). The mouth is open for vocalizing.
Cross-sectional area
The cross-sectional areas measured along the vocal tracts of each male specimen are shown in Fig. 5. The area functions from the glottis (right) towards the lips and nostrils (left) are highly comparable between the two specimens.
Longitudinally, the choanae and the isthmus faucium are located about 25–30 cm rostral to the glottis in male 1 and about 20–25 cm rostral to the glottis in male 2 (Fig. 5).
The decrease of the cross-sectional area from the glottis towards the intra-pharyngeal ostium is a consequence of the large larynx of male fallow deer. Its considerable dorsoventral height causes a relatively large intra-laryngeal cross-sectional area at the glottis (Fig. 3D, cross-section 1). From the intra-pharyngeal ostium, the cross-sectional area of the nasopharynx increases towards the choanae and, similarly, the cross-sectional area of the oropharynx increases towards the isthmus faucium. The choanae and isthmus faucium mark the rostral end of the pharynx, i.e. the transition from the nasopharynx to the nasal cavities and from the oropharynx to the oral cavity, respectively. The gradual, mostly uniform increase in cross-sectional area, from the intra-pharyngeal ostium towards the rostral end of the pharynx, reflects the basic funnel shape of the pharynx, which is narrowest at its connection to the larynx and widest at its connection to the skull and the oral cavity. From the choanae to the nostrils there is an overall decrease of cross-sectional area as a consequence of the narrowing of the nasal cavities towards the muzzle. The particular decrease of cross-sectional area at around 42 cm in male 1 and at around 37 cm in male 2 comes from the ventral nasal conchae, which narrow the nasal cavities by their extensive, scrolled osseous lamellae (Fig. 3D, cross-section 3). From the isthmus faucium to the lips, the cross-sectional area initially increases and then decreases. In between is a particular decrease at around 39 cm in male 1 and around 33 cm in male 2. The initial increase represents the caudal end of the oral cavity between the isthmus faucium and the root of the tongue and the final decrease represents the rostral end of the oral cavity between the lingual fossa and the lips. The intermediate decrease in cross-sectional area results from the lingual torus, an elevation of the tongue in ruminants, which considerably narrows the middle oral cavity. 
When the mouth is opened and the lower jaw depressed for groan emission, the cross-sectional areas of the (then funnel-shaped) oral cavity will increase accordingly in the direction towards the lips.
Predicted vocal tract resonances
The vocal tract resonances predicted from the vocal tract area functions of male 1 and male 2 and corresponding to the three possible configurations are presented in Fig. 6. F1 predicted by the combined oral and nasal vocal tracts is in an intermediate position between the F1 predicted using the oral vocal tract only and that predicted from the nasal vocal tract only. F2 and F3 predicted by the combined oral and nasal vocal tracts correspond, respectively, to the F2 predicted by the nasal vocal tract only and to the F2 predicted by the oral vocal tract only. Likewise, F4 and F5 predicted by the combined oral and nasal vocal tracts correspond, respectively, to the F3 predicted by the nasal vocal tract only and to the F3 predicted by the oral vocal tract only (Fig. 6, Table 1).
The centre frequencies of each predicted formant are reported in Table 1. The models using both oral and nasal vocal tracts predict much lower formants overall (average formant spacing of 255 Hz) than models using the oral vocal tract only (average formant spacing of 446 Hz) or models using the nasal vocal tract only (average formant spacing of 358 Hz).
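For readers unfamiliar with formant spacing, one common way to summarize a set of formants in the uniform-tube framework is to fit F_i = DF × (2i−1)/2 by least squares through the origin. The Python sketch below illustrates that calculation; it is an assumption that this matches how the average spacings quoted above were obtained, and the input values are idealized uniform-tube resonances rather than data from Table 1.

```python
def estimate_formant_spacing(formants_hz):
    """Least-squares formant spacing DF (Hz) for the model F_i = DF * (2i - 1) / 2.

    formants_hz must list F1, F2, ... in order. The regression is forced
    through the origin, so DF = sum(x_i * F_i) / sum(x_i ** 2) with
    x_i = (2i - 1) / 2.
    """
    xs = [(2 * i - 1) / 2.0 for i in range(1, len(formants_hz) + 1)]
    numerator = sum(x * f for x, f in zip(xs, formants_hz))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


if __name__ == "__main__":
    # Idealized resonances of a uniform 50 cm tube with c = 350 m/s (illustrative only)
    ideal = [175.0, 525.0, 875.0, 1225.0, 1575.0]
    df = estimate_formant_spacing(ideal)
    print(f"DF = {df:.1f} Hz, eVTL = {35000.0 / (2.0 * df):.1f} cm")
```

Because the fit weights higher formants more heavily, additional low-frequency resonances, such as those contributed by the nasal tract, reduce the fitted spacing by increasing the number of formants that fall below a given frequency.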
Comparison with acoustic observations
Fig. 7 plots the average centre frequencies of the first five formants observed in groans from 16 adult fallow deer males (reported in Vannoni and McElligott, 2007; see Table 2) against the resonances predicted by our three vocal tract models. The resonances predicted by including both the oral and nasal vocal tracts are a better fit to the observed formants than those predicted by using the oral vocal tract or the nasal vocal tract only: the slope of the regression line is closer to 1 (indicating a better fit of the scaling of the resonances) and R² is also higher (indicating a better fit of the pattern of the resonances). Examination of the regression slopes in Fig. 7 shows that while model 3 (combined oral and nasal vocal tracts) underestimates DF by 9%, model 1 (oral vocal tract only) overestimates DF by 37% and model 2 (nasal vocal tract only) overestimates DF by 23%. Separate correlations for male 1 and male 2 are given in Fig. S2.
DISCUSSION
In this study, the artificially extended vocal apparatuses of two adult male fallow deer were CT-scanned, and the cross-sectional areas of the complete supra-laryngeal vocal tract (common laryngopharyngeal tract, oropharynx, oral cavity, nasopharynx and nasal cavities) of both specimens were measured along the oral and nasal vocal tracts. We then used these data to model resonance patterns produced by these supra-laryngeal cavities including the oral vocal tract only, the nasal vocal tract only or the combined oral and nasal vocal tracts. We found that the configuration combining the oral and nasal vocal tract geometries produced a resonance pattern that more closely matches the formants observed in fallow deer groans, in terms of both formant frequency pattern and formant frequency scaling.
The formants observed in the groans of fallow deer (Briefer et al., 2010; McElligott and Hayden, 1999) and more generally in the sexually selected calls of male polygynous deer with extensible vocal tracts (Fitch and Reby, 2001; Passilongo et al., 2013; Reby and McComb, 2003; Reby et al., 2016) obey stereotyped, uneven formant patterns incompatible with a vocal tract consisting of a simple linear tube. More specifically, in both fallow deer groans (McElligott et al., 2006) and red deer roars (Reby and McComb, 2003), the second and third formants are close to one another and the fourth formant is higher, leaving a gap between the third and fourth formant. Our simulations combining both the oral and nasal vocal tracts predict this pattern. Comparison of the models suggests that F2 and F4 of fallow deer groans are affiliated to the nasal vocal tract while F3 and F5 are affiliated to the oral vocal tract.
In terms of frequency scaling, our predictions resolve the aforementioned mismatch between apparent vocal tract lengths estimated from formant frequencies measured in fallow deer groans and actual anatomical vocal tract length derived from photogrammetric and anatomical data. Presumably, this is because the inclusion of formants affiliated to the nasal vocal tract led to an overestimation of apparent vocal tract length in previous studies modelling the fallow deer vocal tract as a single uniform tube. Indeed, previous investigations of apparent vocal tract length (i.e. vocal tract length estimated from formant frequencies) in polygynous deer with descended and mobile larynges (Corsican deer: Kidjo et al., 2008; fallow deer: McElligott et al., 2006; Mesola deer: Passilongo et al., 2013; red deer: Reby and McComb, 2003) have modelled the vocal tract as a linear tube with a constant cross-section, closed at one end (glottis) and open at the other (mouth), and excluded the involvement of the nasal vocal tract for loud calls produced with an open mouth. While these studies succeeded at characterizing inter-individual differences in formant frequency spacing (Kidjo et al., 2008; Vannoni and McElligott, 2007), apparent vocal tract length and thus body size (Reby and McComb, 2003; Vannoni and McElligott, 2008), they probably yielded overestimations of the anatomical vocal tract length.
The inclusion of nasal resonances in future models should allow for better estimations of apparent vocal tract length from recorded mating calls in these species, thereby potentially enhancing the reliability of bioacoustic tools aimed at assessing body size from vocalizations for research, conservation or wildlife management purposes.
Taken together, our observations strongly suggest that the nasal cavity and oral cavity are simultaneously involved in the vocal production of fallow deer groans. This involvement of the nasal vocal tract due to the non-closure of the intra-pharyngeal ostium during vocal tract extension may be widespread in species with a permanently descended larynx and extensible vocal tract (such as other polygynous deer and goitred gazelles, for example), but also occur in species where callers lower their larynx temporarily for the production of oral (rather than nasal) calls. We suggest that the potential for nasalization of putative oral loud calls should be carefully examined across terrestrial mammals.
The role of nasal cavities in acoustic output has been investigated in humans using anatomical scans, area functions, vocal tract modelling and/or acoustic analysis (Dang et al., 1994; Feng and Castelli, 1996; Hattori and Fujimura, 1958; Pruthi et al., 2007; Story, 2005). Compared with modulation of the oral vocal tract, nasalization plays a relatively minor role in human speech variation and is often left out of vocal models. However, models that include coupling between the nasal and oral cavity can result in transfer functions that more closely match recorded acoustic output (Dang et al., 1994; Feng and Castelli, 1996). The effects of nasalization are strongest in the lower frequencies (Feng and Castelli, 1996; Pruthi et al., 2007) and include the addition of low-frequency formant peaks as observed here in fallow deer groans. Nasal coupling has also been suggested as a likely mechanism for the addition of low spectral peaks in Diana monkey alarm calls (Riede and Zuberbuhler, 2003).
This investigation has clear limitations. Our dead specimens were scanned in artificial positions constrained by the dimensions of the CT scanner, and their vocal tracts were artificially extended. The resulting geometries therefore only approximate the vocal tract of live animals during vocalization, and do not account for internal adjustments such as the possible contribution of palatopharyngeal muscles. Future investigations could involve performing several scans with different combinations of head/neck angle, extent of laryngeal descent or mouth opening, or simulating these parameters (Gamba et al., 2012; Gamba and Giacoma, 2006b). Using a larger sample of specimens would also allow the assessment of inter-individual variation, including the effects of age or size.
Formant frequencies are known to provide cues to the caller's body size in fallow deer groans and red deer roars, because of the close correlation between formant frequency spacing and body size (McElligott et al., 2006; Reby and McComb, 2003), and are used by male and female receivers to assess rivals and potential mates during the breeding season (Charlton et al., 2007; Charlton et al., 2008a,b; Pitcher et al., 2015; Reby et al., 2005). The descended larynx and extensible vocal tract of fallow and red deer males (and some other species) are therefore considered to be adaptations that allow callers to maximize the acoustic impression of their body size conveyed to receivers (the ‘size exaggeration hypothesis’: Fitch and Reby, 2001; Ohala, 1984). Our investigations show that the involvement of the nasal vocal tract adds formants to the lower part of the spectrum, which increases formant density (and thus decreases formant spacing) and may make the caller sound larger than it would with oral-only or nasal-only calls. Similar functional explanations have been suggested for the evolution of air sacs (de Boer, 2009; Harris et al., 2006), which also increase formant density by adding resonances.
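The link between formant density and apparent size can be illustrated with a minimal sketch based on the uniform quarter-wavelength resonator approximation. The sketch is not the analysis code used in this study. The function names, the speed of sound (350 m/s) and the two extra low-frequency peaks (350 and 700 Hz) are assumptions chosen purely for illustration, and none of the values are measurements. Formant spacing is estimated as the least-squares slope, through the origin, of formant frequency against (2i - 1)/2, and apparent vocal tract length is then c/(2 x spacing).

```python
import numpy as np

SPEED_OF_SOUND = 350.0  # m/s; assumed value for air inside a vocal tract


def uniform_tube_formants(length_m, n_formants, c=SPEED_OF_SOUND):
    """Resonances of a uniform tube closed at one end: F_i = (2i - 1) c / (4 L)."""
    i = np.arange(1, n_formants + 1)
    return (2 * i - 1) * c / (4.0 * length_m)


def formant_spacing(formants_hz):
    """Least-squares slope (through the origin) of F_i against (2i - 1) / 2."""
    f = np.sort(np.asarray(formants_hz, dtype=float))
    x = (2 * np.arange(1, f.size + 1) - 1) / 2.0
    return float(np.dot(x, f) / np.dot(x, x))


def apparent_vtl(formants_hz, c=SPEED_OF_SOUND):
    """Apparent vocal tract length (m) implied by the estimated formant spacing."""
    return c / (2.0 * formant_spacing(formants_hz))


# Synthetic formants for an idealized 0.5 m tube (not measured data).
oral_only = uniform_tube_formants(0.5, 4)

# Hypothetical extra low-frequency peaks standing in for nasal resonances.
hypothetical_nasal_peaks = np.array([350.0, 700.0])
combined = np.concatenate([oral_only, hypothetical_nasal_peaks])

print("oral only: spacing %.1f Hz, apparent VTL %.3f m"
      % (formant_spacing(oral_only), apparent_vtl(oral_only)))
print("combined:  spacing %.1f Hz, apparent VTL %.3f m"
      % (formant_spacing(combined), apparent_vtl(combined)))
```

For the idealized tube, the estimate recovers the 0.5 m length by construction. Adding peaks within the same frequency range lowers the estimated spacing and so raises the apparent vocal tract length, which is the direction of the effect described above. The size of the effect in real groans depends on the actual positions of the resonances.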
In conclusion, we contend that, while expensive and technically challenging, using 3D CT scanning to predict vocal tract resonances can greatly assist the interpretation of formant patterns in mammalian vocalizations. We suggest that similar approaches could be generalized to the study of vocal tract resonances in other terrestrial mammals.
Acknowledgements
We thank John Bartram, John Comfort and Paul Douglas at the London Royal Parks for their help obtaining the specimens. We are also very grateful to Jan Bush and the amazing staff at the Clinical Imaging Sciences Centre for facilitating the production of the CT scans.
Footnotes
Author contributions
Conceptualization: D.R., J.G.; Methodology: D.R., J.D., J.G.; Software: J.D., J.G.; Formal analysis: D.R., R.F., J.G.; Investigation: D.R., M.W., R.F., J.G.; Resources: D.R., M.W., R.F., J.G.; Data curation: D.R., M.W., R.F., J.G.; Writing - original draft: D.R., M.W., R.F., J.G.; Writing - review & editing: D.R., M.W., R.F., B.C., J.D., J.G.; Supervision: D.R.; Project administration: D.R.; Funding acquisition: D.R.
Funding
J.G. was supported by a Projet International de Coopération Scientifique grant reference 6188 from Centre National de la Recherche Scientifique. D.R. was supported by an invited professorship from Le Mans Université. M.T.W. was supported by a grant from US National Science Foundation International Research Fellowship (grant number 0908569) and an award from the Systematics Research Fund.
Data availability
Supporting data and scripts for predicting vocal tract resonances are available on request from the corresponding author.
References
Competing interests
The authors declare no competing or financial interests.