The complex and elaborate vocalizations uttered by many of the 10,000 extant bird species are considered a major driver in their evolutionary success, warranting study of the underlying mechanisms of vocal production. Additionally, birdsong has developed into a highly productive model system for vocal imitation learning and motor control, where, in contrast to humans, we have experimental access to the entire neuromechanical control loop. In human voice production, complex laryngeal geometry, vocal fold tissue properties, airflow and laryngeal musculature all interact to ultimately control vocal fold kinematics. Quantifying vocal fold kinematics is thus critical to understanding neuromechanical control of voiced sound production, but in vivo imaging of vocal fold kinematics in birds is experimentally challenging. Here, we adapted and tested electroglottography (EGG) as a novel tool for examining vocal fold kinematics in the avian vocal organ, the syrinx. We furthermore imaged and quantified syringeal kinematics in the pigeon (Columba livia) syrinx with unprecedented detail. Our results show that EGG signals predict (1) the relative amount of contact between the avian equivalent of vocal folds and (2) essential parameters describing vibratory kinematics, such as fundamental frequency, and timing of syringeal opening and closing events. As such, EGG provides novel opportunities for measuring syringeal vibratory kinematic parameters in vivo. Furthermore, the opportunity for imaging syringeal vibratory kinematics from multiple planar views (horizontal and coronal) simultaneously promotes birds as an excellent model system for studying kinematics and control of voiced sound production in general, including in humans and other mammals.
Birds produce a wide variety of sounds crucial to their survival and communication (Marler and Slabbekoorn, 2004). Additionally, birdsong has developed into an important model system to study motor sequence learning (Fee and Scharff, 2010) and imitative vocal learning in humans (Brainard and Doupe, 2013; Doupe and Kuhl, 1999). Birdsong is a highly quantifiable stereotyped behaviour, which, combined with discrete neural substrates, promotes understanding of links between brain and behaviour (Bolhuis and Gahr, 2006; Brainard and Doupe, 2013). Although mammals and birds use two different vocal organs, the larynx in mammals and the syrinx in birds, we recently demonstrated that the same physical mechanism underlies their voiced sound production, i.e. the myoelastic-aerodynamic (MEAD) mechanism (Elemans et al., 2015; Titze, 1980; Titze, 1994; Van den Berg, 1958). This finding constitutes yet another parallel between the motor control of voiced sound production in birds and mammals, and allows us to exploit the much larger literature on human voice production and its control to further our understanding of vocal production and control in birds, and potentially vice versa.
The neuromechanical control of voiced sound production is formed by a closed loop, of which the neural circuitry (brain) and biomechanics (sound production and perception) form an integral part (Düring and Elemans, 2016; Goller and Riede, 2013; Suthers and Margoliash, 2002). To fully comprehend the motor control of sound production, we thus need to understand both the neural mechanisms and the peripheral biomechanics. In birds, great advances have been made in elucidating the activity of individual neurons involved in their vocalization (Berwick et al., 2011; Keller and Hahnloser, 2009; Markowitz et al., 2015) and neuromechanical control models (Amador et al., 2013). However, we have little quantitative data on the peripheral biomechanics in avian sound production that are essential to interpret neural signals and validate models (Düring and Elemans, 2016; Goller and Cooper, 2004; Riede and Goller, 2010; Zollinger and Suthers, 2004).
In humans, the physical and physiological framework of laryngeal sound production for speech and singing has been relatively well described. The cyclical motion of the vocal folds periodically obstructs the passage of respiratory air, leading to pressure fluctuations in the vocal tract that are radiated as sound (Story, 2002). The complex laryngeal geometry, vocal fold tissue properties, airflow and musculature all interact to ultimately control the kinematics of the vocal folds (Herbst et al., 2011; Herbst et al., 2009; Titze and Talkin, 1979), thus causally influencing the radiated sound. The quality of the cyclical motion of the vocal folds determines, amongst other parameters, the fundamental frequency (fo) and the spectral slope (Alipour et al., 2012) of the generated sound, equally relevant for speech communication and singing (Herbst et al., 2015). Because vocal fold motion plays a critical role in setting important acoustical cues in both healthy and pathological voice production, quantifying and analysing it has been a main research focus in humans for several decades. Quantifying syringeal vibratory kinematics is likely to be equally essential for explaining vocal variation in birds. However, we currently lack detailed quantitative insight into syringeal vibratory kinematics.
Ideally, we would like to be able to quantify the time-varying 3D geometry of the syrinx at high speed during sound production in vivo. However, quantifying the motion of the avian equivalent to mammalian vocal folds, the lateral labia and medial labia in songbirds (Düring et al., 2013; Goller and Larsen, 1997b) and the lateral vibrating masses (LVMs) in non-songbirds (Elemans et al., 2008), is experimentally difficult because of the small size and anatomical location of the syrinx deep in the body. Endoscopic high-speed imaging has been used successfully (Goller and Larsen, 1997a; Jensen et al., 2007), but remains very challenging in anaesthetized birds, let alone in freely singing birds. Another approach for imaging syringeal dynamics is using an in vitro (Fee et al., 1998; Paulsen, 1965) or perfused whole-organ ex vivo approach (Elemans et al., 2015). However, we have insufficiently detailed knowledge regarding syringeal motor control to meaningfully mimic song ex vivo. It is thus critical to gain detailed insight into the syringeal vibratory kinematics in vivo and how this affects vocal production.
dEGG: derivative of the EGG signal
H: caudo-cranial height of tissue contact
LVM: lateral vibrating mass
LVMCA: LVM contact area
pICAS: interclavicular air sac pressure
pLVMCA: projected LVM contact area
rVFCA: relative vocal fold contact area
VFCA: vocal fold contact area
A less intrusive technique to investigate in vivo vocal fold dynamics in humans is electroglottography (EGG), which measures the variation in impedance between two electrodes, placed on the thyroid cartilage on each side of the vocal folds. Because air is a much better insulator than tissue, the impedance will depend on how much contact there is between the vocal folds (Rothenberg, 1992; Titze, 1990). An AC current with a frequency of 2 MHz is used to overcome the relatively nonconductive layer of skin and the myelin insulation of muscle fibres between the vocal folds and the electrodes (Rothenberg, 1992). The EGG signal has been shown to correlate well with the time-varying relative vocal fold contact area (rVFCA), which is the area of contact between the two vocal folds (Hampala et al., 2016; Scherer et al., 1988), but cannot predict either the absolute VFCA (Hampala et al., 2016) or the lateral vocal fold displacements.
Here, we test whether EGG can be used to quantify essential parameters for describing syringeal kinematics, such as fo and timing of opening and closing events, and thus whether EGG could be used as a viable tool for examining syringeal dynamics in vivo. We recently established paradigms to study sound production in vitro (excised) and ex vivo (perfused whole organ), which allows for new opportunities to image syringeal kinematics at high temporal (<25,000 frames s−1) and spatial resolution (∼5–30 µm) from different orientations simultaneously (Elemans et al., 2015). Using this approach, we can image and quantify syringeal vibratory kinematics in unprecedented detail, which allows us to test alternative methods to direct imaging for quantifying syringeal vibratory kinematics that may be applicable in vivo.
MATERIALS AND METHODS
Previous work observed syringeal dynamics with high imaging quality in domestic pigeons (Elemans et al., 2015). Therefore, we examined the syringes of eight adult male domestic pigeons (Columba livia Gmelin 1789) that were kept in a 3×6×2 m outdoor aviary at the University of Southern Denmark with food and water provided ad libitum. All procedures were carried out in accordance with the Danish Animal Experiments Inspectorate (Copenhagen, Denmark).
The birds were collected in the aviary in an opaque box to minimize the stress for the animal. In order to avoid blood clotting in the syrinx, the animal was injected intramuscularly with 80 µl kg−1 body mass of 5000 units ml−1 heparin (Amgros, Copenhagen, Denmark) using a 30 G needle on a Hamilton 50 µl syringe 30 min prior to the experiment. Animals were euthanized by isoflurane (Baxter Medicals, Deerfield, IL, USA) overdose. An incision was made 2–3 mm left of the sternum. The ribcage was opened, the bronchi were cut as close to the lungs as possible, and the trachea was cut approximately 10 cm cranial of the syrinx.
The syrinx was extracted and placed in oxygenated Ringer's solution (recipe as in Elemans et al., 2004) in a Sylgard-covered Petri dish on ice. The syrinx was cleaned of blood and excess adipose tissue and mounted in the experimental chamber (see below), ventral side up. The bronchi were tied airtight onto silastic tubing using 5.0 monofilament sutures (Argos Surgical Instruments, Newport Beach, CA, USA). The trachea was fastened airtight onto a custom-made plastic connector using 4.0 sutures (Argos Surgical Instruments). Two EGG electrodes were fixed with 10.0 monofilament sutures (Argos Surgical Instruments) bilaterally onto tracheal ring T1.
Experimental chamber design and hardware
We used an in vitro paradigm to study sound production as described in detail in Elemans et al. (2015). In brief, the excised syrinx was placed in an airtight chamber that allowed for precise control of air pressure, flow, humidity and temperature. The air pressures in the chamber and bronchi were controlled by two dual-valve differential controllers (model PCD, Alicat Scientific, Tucson, AZ, USA) with a 0–10 kPa gauge pressure range. The flow through the bronchi was measured using a flow transducer (model PMFc3000 Posifa Microsystems, San Jose, CA, USA; 3 standard litres per minute full scale, response time 1 ms). Care was taken to mount the syrinx in its natural position. Because in pigeons the LVMs are attached to and encased by the rigid ossified skeleton of the syrinx, LVM adduction and tension are not affected as a result of incorrect mounting. Instead, LVM tension is affected by the differential pressure between airsac and bronchus (termed transmural pressure) (Elemans et al., 2008), and by the syringeal muscles (Elemans et al., 2004, 2006). The pressure difference is carefully controlled for in the experiments as described in detail below. This situation is in contrast with excised larynx experiments where mounting requires assumptions on vocal fold adduction, length, edge approach and position symmetry that critically affect the conditions for vibration.
The experimental chamber had a glass lid allowing imaging of the syrinx in the coronal plane with a high-speed CMOS camera (model HS4, IDT, Pasadena, CA, USA) through a stereomicroscope (Leica, M165 FC, Leica Microsystems, Wetzlar, Germany). This camera could capture up to 5000 frames s−1 at full frame image size of 1024×1024 pixels. Below the syrinx, the experimental chamber had a glass window to allow transillumination of the syrinx by a 1700 lm white LED powered by a stable power source (PS23023, HQ Power, Gavere, Belgium). To image the syrinx from a tracheal perspective (the horizontal plane), we used either the HS4 camera or a videokymographic (VKG) camera (model 2156, Cymo B.V., Groningen, The Netherlands) (Qiu and Schutte, 2006) through a 1.2 mm diameter flexible endoscope (Schölly, Denzlingen, Germany). The VKG system combined a CMOS chip capturing the full frame in colour at 25 Hz and a monochrome line-scan camera capturing a centred horizontal line at 7.2 kHz.
Sound pressure was measured in between 2 and 4 cm distance (accuracy ±1 mm) from the tracheal tube outlet with a 0.5 inch condenser microphone (model 40AF with preamplifier type 26AH, G.R.A.S. Sound & Vibration, Holte, Denmark). The microphone signal was amplified (model 12AQ, G.R.A.S.) and system sensitivity was calibrated prior to each experiment with a microphone calibrator (model 42AB, G.R.A.S.).
EGG signals were recorded using a modified two-channel EGG device (model EG2, Glottal Enterprises Inc., Syracuse, NY, USA) (Rothenberg, 1992). In contrast to the 3 cm diameter gold-plated disc electrodes placed on human skin (Rothenberg, 1992; Scherer et al., 1988), we used wire electrodes attached directly on the vibrating tissue as in Elemans et al. (2015). The electrodes consisted of 28 µm Formvar insulated nichrome wire (A-M Systems, Sequim, WA, USA) soldered to a four-pin connector connected to airtight electrical connectors inside the experimental chamber. Owing to the higher vibration frequencies expected in birds (fo is typically 500–5000 Hz) compared with humans, the frequency response of the EG2 device was extended upwards by the developers (M. Rothenberg and M. Lyaski). Additionally, because trial runs showed that the EGG signals on the syrinx with our electrode design exceeded 20 V peak-to-peak, the EGG device was modified, allowing reduced amplification settings. Two additional three-position switches allowed us to individually change the frequency ranges and amplification of each channel from the standard low-pass filter setting on the device up to 10, 12 or 13.5 kHz, with concurrent amplification settings of 1×, 1/2× and 1/3×, respectively.
Data acquisition and synchronization
All data acquisition, data analysis and control software was custom-written in MATLAB (MathWorks, Natick, MA, USA). A 1 ms transistor–transistor logic (TTL) trigger pulse was generated to synchronize recordings and to trigger the HS4 high-speed camera. Pressure, flow, trigger and audio signals were low-pass filtered with a cut-off frequency of 10 kHz (custom-built filter, Thor Labs, Newton, NJ, USA) and digitized with a 16 bit AD converter (NI USB 6259, National Instruments, Austin, TX, USA) at a sampling frequency of 48 kHz, and saved as binary files in MATLAB's MAT format. The microphone signal was corrected for delay owing to the velocity of sound in air with respect to the other recorded signals. The analogue video output from the VKG was digitized with an audio-video capturing device (Intensity Extreme, Black Magic Design Pty Ltd, Port Melbourne, Victoria, Australia) together with the microphone and trigger signals (on the two respective audio channels), and saved as AVI files. We recorded data with three separate systems, i.e. the VKG, the digitized signals and the HS4 camera, resulting in three separate data streams. To synchronize the data streams of these systems in time, we measured the three time delays in our system: (1) time delay d1: an intrinsic delay in the VKG system between the video and audio signal by design of the VKG system; (2) time delay d2: time between the triggers in the VKG AVI file and the MAT file; and (3) time delay d3: intrinsic delay between trigger and capture of the high-speed camera. Delay d1 was determined prior to experiments for each subject. An LED was connected to the trigger signal and filmed with the VKG system and high-speed camera. The synchronization runs consisted of four consecutive frequency sweeps (40 to 1000 Hz) of a 1 ms wide TTL signal. To determine d1, we calculated the cross-correlation between the audio channel on the AVI file and the intensity value of the LED filmed by the VKG.
The averaged time delay d1 was 42.2±3.7×10−3 ms (N=47). Delay d2 was measured for each recording as the time difference between trigger onsets in AVI and MAT files. Delay d3 was determined prior to experiments. We imaged the intensity of the LED sync signal described above with the high-speed camera at 48 kHz and cross-correlated the sync signal with the light intensity signal. No delay more than one image frame duration (i.e. 21 µs) was detected. Because d3 measured less than 21 µs, we did not correct for this delay.
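The cross-correlation step used for d1 and d3 can be illustrated with a minimal Python sketch. It is not the custom MATLAB code used in this study; the function name, the sampling rate and the synthetic pulse train are hypothetical and serve only to show how the lag of the correlation peak is converted into a time delay.

```python
def xcorr_delay_ms(reference, delayed, fs_hz, max_lag):
    """Return the lag (in ms) at which `delayed` best matches `reference`.

    Positive values mean that `delayed` lags behind `reference`.
    """
    n = len(reference)
    mean_r = sum(reference) / n
    mean_d = sum(delayed) / len(delayed)
    r = [x - mean_r for x in reference]
    d = [x - mean_d for x in delayed]
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum of products over the overlapping part of the two signals.
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < len(d):
                score += r[i] * d[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return 1000.0 * best_lag / fs_hz


# Synthetic illustration: an irregular pulse train and a copy shifted by 5 samples.
fs = 1000  # Hz, hypothetical sampling rate
pulses = [1.0 if i in (20, 55, 75, 130) else 0.0 for i in range(200)]
shifted = [0.0] * 5 + pulses[:-5]
delay_ms = xcorr_delay_ms(pulses, shifted, fs, max_lag=20)
```

In this synthetic case the recovered delay equals the imposed shift of 5 samples, i.e. 5 ms at the assumed sampling rate. An irregular pulse pattern is used because a strictly periodic signal would produce several equally high correlation peaks.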
We induced phonation by ramping up bronchial pressure (pb) from 0 to 3 kPa at a rate of 1 kPa s−1, at the same time applying constant pressure in the experimental chamber, analogous to the interclavicular air sac pressure (pICAS) of 1 kPa. These pressure values are physiologically realistic for ring doves (Beckers, 2003; Elemans et al., 2008) and were used previously to reliably induce self-sustained vibrations in domestic pigeons (Elemans et al., 2015). We first imaged LVM vibrations in the horizontal plane (endoscopically through the trachea) with the full-frame high-speed camera at 3000 frames s−1. To facilitate automatic extraction of the glottal opening area in this orientation, we placed a light source caudally to the LVM, in order to enhance the contrast between the LVM tissues and the opening in between. Second, we simultaneously imaged the LVM vibrations in the coronal plane with the high-speed camera (by transillumination) and in the horizontal plane (endoscopically) with the line-scan camera at 7200 Hz during a pressure ramp (pb=0–3 kPa at 1 kPa s−1, pICAS=1 kPa). It was experimentally challenging to achieve the correct lighting conditions allowing LVM extraction in both the coronal and horizontal planes simultaneously. Out of eight individuals, we were able to generate high quality data in three animals that were used for further analysis.
In laryngeal voiced sound production, the EGG signal relates to the relative vocal fold contact area (rVFCA). Because the sound-producing vibrating structures in birds (i.e. LVMs) are not termed vocal folds, we will use the abbreviation LVMCA in this paper. To verify the physiological relevance and reliability of the EGG signal in birds, we estimated LVMCA from our combined endoscopic (horizontal plane) and transillumination (coronal plane) imaging data (Fig. 1).
First, we investigated the presence of dorsoventral (DV) LVM waves in the horizontal plane using the tracheal endoscopic view. We extracted the lateral distance between the two LVMs as a function of DV position and time (Fig. 2), i.e. a glottovibrogram (Karakozoglou et al., 2012; Lohscheller and Eysholdt, 2008). To determine whether opening/closing events travelled as a wave, we calculated the normalized distance between the LVMs for all DV positions as a function of the phase of the cycle (Fig. 2C). We calculated a linear regression of the phase at maximal opening across the DV positions (Fig. 2D), which relates to the velocity v of a travelling wave in the DV direction. As imaging was done simultaneously in the coronal plane and the horizontal plane, we could use structures seen in both planes to calibrate the endoscopic view from the calibrated coronal view.
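The regression of phase against DV position can be sketched as follows in Python. This is an illustrative least-squares fit with made-up numbers, not the analysis code of this study. The conversion of the phase gradient into a wave velocity, v = 360·fo/slope, is an assumption added here for illustration; the text above states only that the slope relates to v.

```python
def fit_phase_gradient(dv_position_mm, phase_deg):
    """Least-squares line through phase (deg) versus DV position (mm).

    Returns (slope in deg per mm, intercept in deg).
    """
    n = len(dv_position_mm)
    mean_x = sum(dv_position_mm) / n
    mean_y = sum(phase_deg) / n
    sxx = sum((x - mean_x) ** 2 for x in dv_position_mm)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(dv_position_mm, phase_deg))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def wave_velocity_mm_per_s(slope_deg_per_mm, fo_hz):
    """Assumed conversion: one cycle (360 deg) lasts 1/fo seconds."""
    return 360.0 * fo_hz / slope_deg_per_mm


# Hypothetical phases of maximal opening at six DV positions.
dv_mm = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
phase = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
slope, intercept = fit_phase_gradient(dv_mm, phase)
velocity = wave_velocity_mm_per_s(slope, fo_hz=100.0)
```

A slope close to zero corresponds to the LVMs moving in phase along the DV axis, in which case the implied wave velocity becomes very large and the notion of a travelling wave is no longer meaningful.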
Second, we investigated LVM contact area in the coronal plane using the transilluminated view. Obtaining high quality transillumination imaging with clearly visible LVMs was very sensitive to lighting conditions, and we included only experimental runs where the LVMs were easily seen. Even so, automatic extraction of LVM shape was not possible and therefore we manually scored caudal–cranial contact between the two LVMs. To calculate the caudo-cranial height of tissue contact (H), we visually determined the most cranial and most caudal point of contact between the LVMs in 580 frames of 30 oscillations.
LVMCA was approximated as the product of H and the DV length (L) of the LVM. As the contact between the LVMs results from the transillumination projection, we use the term projected LVM contact area (pLVMCA). Ten consecutive cycles were investigated from three specimens (at pICAS=1 kPa, pb=2 kPa). Furthermore, we calculated the closed quotient for these cycles from the pLVMCA signal as the ratio between the duration for which the syrinx was closed (i.e. pLVMCA>0) and the duration of the entire cycle. To avoid observer bias regarding correlation to EGG, the observer was not able to see the synchronized EGG signal during this scoring process.
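The two quantities defined above, pLVMCA = H × L and the closed quotient, can be written out in a short Python sketch. The per-frame heights below are invented for illustration and the function names are hypothetical; only the two definitions are taken from the text.

```python
def projected_contact_area(heights_mm, dv_length_mm):
    """pLVMCA per frame as the product of contact height H and DV length L."""
    return [h * dv_length_mm for h in heights_mm]


def closed_quotient(plvmca_mm2):
    """Fraction of frames within one cycle for which pLVMCA > 0."""
    closed_frames = sum(1 for a in plvmca_mm2 if a > 0)
    return closed_frames / len(plvmca_mm2)


# Hypothetical manually scored contact heights (mm) for one cycle of 10 frames.
heights = [0.0, 0.0, 1.0, 2.2, 1.5, 0.5, 0.0, 0.0, 0.0, 0.0]
area = projected_contact_area(heights, dv_length_mm=5.0)
peak_area = max(area)
cq = closed_quotient(area)
```

Because the quotient is computed from frame counts, its resolution is limited to one frame duration divided by the cycle duration.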
In human laryngeal EGG analysis, the changes in value of the EGG signal, i.e. the first derivative of the EGG signal (dEGG), are often used to approximate key events within a vibratory cycle where contact between the paired vibrating vocal fold tissue is gained or lost (Childers and Krishnamurthy, 1985; Herbst et al., 2014). Therefore, we extracted both maxima and minima of the dEGG signal for each of the 10 analysed vibratory cycles per specimen.
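As a minimal sketch of this step, the dEGG signal can be approximated by a first difference and its extrema located within one cycle. The Python fragment below uses a synthetic EGG cycle and a hypothetical sampling rate; it is not the analysis code of this study, and the choice to assign each difference to the midpoint between two samples is an assumption.

```python
def degg(egg, fs_hz):
    """First difference of the EGG signal, in signal units per second."""
    return [(egg[i + 1] - egg[i]) * fs_hz for i in range(len(egg) - 1)]


def degg_extrema_times(egg, fs_hz):
    """Times (s) of the dEGG maximum and minimum within one cycle."""
    d = degg(egg, fs_hz)
    i_max = max(range(len(d)), key=d.__getitem__)
    i_min = min(range(len(d)), key=d.__getitem__)
    # Each difference is assigned to the midpoint between its two samples.
    return (i_max + 0.5) / fs_hz, (i_min + 0.5) / fs_hz


# Synthetic EGG cycle: steep rise at contact, slower and uneven decay.
cycle = [0.0, 0.0, 0.2, 1.0, 0.9, 0.8, 0.3, 0.1, 0.0, 0.0]
t_max, t_min = degg_extrema_times(cycle, fs_hz=10000.0)
```

With real recordings, noise would normally call for smoothing before differentiation, which is omitted here.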
fo extraction from EGG and LVM motion
To extract the fo of LVM vibrations in the coronal plane, we computed digital kymograms (DKG) from the high-speed imaging data to observe periodicity in the vibrations (Švec and Schutte, 1996). From the DKG we extracted the vibration period of the LVMs, and thereby their vibration frequency. The fo of the EGG signal was extracted using autocorrelation.
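Autocorrelation-based fo extraction can be sketched in Python as follows. The implementation details (a biased autocorrelation sum and a restricted search range of candidate frequencies) are assumptions made for this illustration and are not taken from the analysis code of this study; the test signal is a synthetic sine wave.

```python
import math


def fo_autocorr(signal, fs_hz, fo_min_hz, fo_max_hz):
    """Estimate fo as fs divided by the lag of the autocorrelation peak."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]
    lag_min = int(fs_hz / fo_max_hz)
    lag_max = int(fs_hz / fo_min_hz)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Biased sum: longer lags use fewer terms, which favours the
        # first period over its multiples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs_hz / best_lag


# Synthetic 100 Hz sine sampled at a hypothetical 8 kHz for 0.2 s.
fs = 8000
sig = [math.sin(2 * math.pi * 100 * i / fs) for i in range(1600)]
fo = fo_autocorr(sig, fs, fo_min_hz=60, fo_max_hz=400)
```

The frequency resolution of this simple estimator is set by the integer lag; interpolating around the correlation peak would refine it.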
Wavegram visualization of EGG signals
To be able to follow the development of the EGG waveform representing within-cycle vibration kinematics during dynamic pressure ramps, we constructed wavegrams of the EGG signals (Herbst et al., 2010). In brief, a wavegram isolates individual cycles of the EGG or dEGG signal and normalizes the duration and amplitude of each cycle. The time-varying normalized signal amplitude within each individual cycle is then colour coded. The resulting strips of pixels, each running from bottom to top along the y-axis, are placed consecutively along the x-axis, so that the position on the x-axis represents overall time and the position on the y-axis represents normalized intra-cycle time.
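The normalization underlying a wavegram can be sketched in Python as follows. The fragment assumes that cycle boundaries are already known, uses nearest-sample resampling, and leaves out colour coding and plotting; these simplifications and all names are assumptions made for illustration and do not reproduce the implementation of Herbst et al. (2010).

```python
def wavegram_columns(signal, cycle_starts, n_bins=4):
    """Return one column per cycle with amplitude and duration normalized.

    Each column holds n_bins values between 0 and 1, ordered by
    normalized intra-cycle time; columns are ordered by overall time.
    """
    columns = []
    for start, stop in zip(cycle_starts[:-1], cycle_starts[1:]):
        cycle = signal[start:stop]
        lowest, highest = min(cycle), max(cycle)
        span = (highest - lowest) or 1.0  # avoid division by zero
        column = []
        for b in range(n_bins):
            # Nearest-sample resampling to a fixed number of bins.
            idx = min(len(cycle) - 1, int(b * len(cycle) / n_bins))
            column.append((cycle[idx] - lowest) / span)
        columns.append(column)
    return columns


# Two synthetic cycles with different durations and amplitudes.
cycle_a = [0.0, 1.0, 2.0, 1.0]
cycle_b = [0.0, 1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
columns = wavegram_columns(cycle_a + cycle_b, cycle_starts=[0, 4, 12])
```

The two synthetic cycles differ in length and amplitude but share the same shape, so they yield identical columns. Changes in a wavegram therefore reflect changes in waveform shape rather than in fo or signal level.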
No sample sizes were computed before the experiments. A technical replicate is a replicate of the measurement on the same preparation, and a biological replicate is an individual. Data are presented as means±1 s.d. No outliers were excluded.
Testing the EGG technique required measuring the contact area between the vocal folds (VFCA). Previously, we imaged the inner edges of the LVM in the pigeon syrinx during self-sustained vibrations using transillumination (Elemans et al., 2015). This technique allowed us to look through the tissue, analogous to an X-ray image. Because this technique resulted in a projected image, motion and different positions of the LVM perpendicular to the coronal plane are superimposed and thus averaged in the coronal projection. This may lead to (1) a decreased precision of the LVM position in the coronal projection, but also (2) an oversimplification of the physical processes at play as we would miss the presence of DV waves. Establishing the presence of DV waves is thus essential to better understand the physical processes but also to increase the accuracy of the LVM positioning and estimation of LVMCA.
Dorsoventral syringeal vibrations
Therefore, we first tested the presence of DV oscillatory modes during sound production. We induced sound production in the syrinx of domestic pigeons in vitro and quantified LVM vibration dynamics in the horizontal plane using tracheal endoscopic high-speed imaging in 588 opening/closing events in three different animals (Fig. 2). To detect phase differences along the DV axis, we measured the distance between the LVMs within each oscillatory cycle along the DV axis (Fig. 2C) and regressed the phase of the maximal LVM distance relative to the maximal opening at midline along the DV axis (Fig. 2D). If the LVMs opened, for example, in a zipper-like mode as sometimes observed in humans (Hess and Ludwigs, 2000) and elephants (Herbst et al., 2013), the maximal opening would shift in phase along the DV axis.
Over a range of vibration frequencies, the maximum phase delay did not exceed 1.2 deg mm−1 (length of LVM=5.0±0.2 mm) and was <0.8 deg mm−1 for 95% of all openings in all animals (Fig. 2). The peak of the normal distribution was 0.083±0.15, 0.22±0.19 and 0.07±0.53 deg mm−1 for the three animals, respectively. In conclusion, our data show that the projected inner surface of the LVMs represents the LVM phase in the coronal projection within 0.8 deg mm−1 for 95% of all openings. Furthermore, our data show that at any point along the DV axis the LVMs moved in phase. Therefore, DV travelling waves were not considered to be present.
EGG as a predictor of LVM vocal fold contact area
Because DV waves were absent, we estimated the LVMCA from the coronal projection alone and did this over 10 consecutive oscillatory cycles in three animals (Fig. 3). The pLVMCA changed magnitude consistently within the oscillatory cycle for all three individuals. LVMCA sharply increased after glottal closure up to a maximum pLVMCA of 10.7±0.6 mm2 (N=3 animals) (range: 10.2–11.3 mm2) at 0.6±0.1 ms (range: 0.49–0.71 ms) after first contact of the LVMs. The closed quotient (part of the oscillation where the syrinx is closed as determined by transillumination) was not significantly different between individuals (one-way ANOVA, P=0.402) and measured 0.47±0.03 [N=3, raw data: 0.47±0.02, 0.46±0.03 and 0.48±0.03 (N=10 technical replicates) for the three biological replicates, respectively].
Next, we tested whether the EGG technique can predict essential parameters quantifying syringeal kinematics such as LVMCA, timing of opening and closing events, and fo, and thus whether EGG can be ultimately used as a viable tool for examining syringeal dynamics in vivo. The overall shape of the pLVMCA and EGG signal consisted of a steep increase at first contact followed by a variable decrease until complete loss of contact between the LVMs in all three animals (Fig. 3A,C,E), but their relative differences over the entire vibration cycle varied between −44% and 76% (Fig. 3B,D,F). At LVM closure, first contact between the LVMs was associated with a sharp increase in the EGG signal (in these examples within 0.6 ms) and a peak in the dEGG occurred at −0.10±0.10, −0.12±0.09 and 0.61±0.12 ms (N=10) after contact for the three animals, respectively. At LVM opening, we observed a time difference between the opening event and the minimum dEGG of −1.45±0.14, −0.89±0.11 and 0.48±0.27 ms, respectively. From these values we computed contact quotients (part of the oscillation where the syrinx is hypothesized to be closed, based on the EGG signal) of 0.24±0.01, 0.47±0.02 and 0.40±0.01 (N=10) for the three animals, respectively.
Previous research in a canine excised larynx revealed that the dEGG minima and maxima do not precisely coincide with the events of glottal opening and closure (Herbst et al., 2014), corroborating the notion that glottal closure and opening constitute different physiological events than vocal fold contacting and decontacting (Herbst et al., 2017b). It is therefore not surprising that a number of different algorithms for (arbitrarily) defining the (de)contacting events within the EGG signal have been suggested. In one of those, the opening event in the EGG signal has been defined arbitrarily as the time where the normalized VFCA has dropped to three-sevenths of its maximum value within a cycle (Howard, 1995; Howard et al., 1990). This criterion is met −1.10±0.19, −0.13±0.09 and 0.23±0.26 ms after opening for the three animals, respectively, thus leading to an improvement of 0.25 to 0.76 ms compared with the dEGG concept. Using the three-sevenths approach resulted in contact quotients of 0.32±0.01, 0.46±0.01 and 0.43±0.01 (N=10) for the three animals, respectively.
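A threshold-based contact quotient of this kind can be sketched in Python. The fragment below takes the contacting event at the steepest rise of the normalized EGG signal and the decontacting event at the first sample after the peak where the signal falls below three-sevenths of its maximum. These are simplifying assumptions for illustration with a synthetic cycle; the fragment reproduces neither the exact algorithm of Howard (1995) nor the analysis code of this study.

```python
def contact_quotient_threshold(cycle, threshold=3.0 / 7.0):
    """Contact quotient of one EGG cycle using a fixed decontacting threshold."""
    lowest, highest = min(cycle), max(cycle)
    if highest == lowest:
        raise ValueError("flat cycle: no contact events to detect")
    norm = [(v - lowest) / (highest - lowest) for v in cycle]
    # Contacting event: sample at which the steepest rise starts.
    rises = [norm[i + 1] - norm[i] for i in range(len(norm) - 1)]
    i_contact = max(range(len(rises)), key=rises.__getitem__)
    # Decontacting event: first sample after the peak below the threshold.
    i_peak = max(range(len(norm)), key=norm.__getitem__)
    i_decontact = len(norm)
    for i in range(i_peak, len(norm)):
        if norm[i] < threshold:
            i_decontact = i
            break
    return (i_decontact - i_contact) / len(norm)


# Synthetic EGG cycle of 10 samples.
cycle = [0.0, 0.0, 0.2, 1.0, 0.9, 0.8, 0.3, 0.1, 0.0, 0.0]
cq = contact_quotient_threshold(cycle)
```

Changing the threshold shifts the estimated decontacting event and hence the contact quotient, which is why such criteria require validation against imaging data.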
EGG as a predictor of LVM fo
Next, we tested whether the EGG signal can predict the oscillatory frequency of the LVMs, by extending our analysis using digital kymograms that visualize multiple vibrations (Fig. 4). We measured the elapsed time between consecutive closing events on digital kymograms and EGG peaks (Fig. 4B,C) over a range of fo values resulting from different bronchial pressure excitations. The data show a very strong correlation between the fundamental frequency of the EGG signal and the oscillatory frequency of the LVMs (slope=0.9603, 0.9889 and 0.8786 and R2=1.0, 1.0 and 0.8775 for the three animals, respectively, and R2=0.991 for all animals combined; Fig. 4D).
EGG as a predictor of oscillatory modes
Taken together, the EGG signal fairly accurately predicted several key features of the LVM oscillatory cycle, such as fo and the timing of syringeal closure. Building on these findings, we followed the development of the EGG signal over time during a dynamic phonation situation using wavegrams (Herbst et al., 2010) (Fig. 5). We subjected the syrinx to a pb ramp from 0 to 2 kPa (1 kPa s−1 ramp speed) at constant air sac pressure of 1 kPa. This quasi-steady excitation resulted in a range of oscillatory frequencies from 80 to 160 Hz. The wavegram of one individual suggested that two vocal modes were present with distinct EGG wave shapes (Fig. 6): mode 1 was present when transmural pressure was <0.3 kPa from the phonation onset to 1.5 s into the run and also from 2.8 s to the end, and mode 2 was present in between, when transmural pressure was >0.3 kPa. The fo ranged from 70 to 125 Hz and 125 to 150 Hz for modes 1 and 2, respectively. We confirmed the existence of distinct syringeal oscillatory modes using our endoscopic imaging data of LVM motion (Fig. 6F). Distinctly different modes were observed with contact quotients of 0.94 and 0.72 for modes 1 and 2, respectively.
We show that EGG can be used to estimate syringeal vibratory kinematics in birds in vitro. The relative LVMCA is followed within a cycle, leading to accurate predictions of LVM vibration fo, and to approximation of landmark events, such as syringeal closing and opening event timing, during sound production in pigeons. Determining the timing of opening events from the EGG signal improved when using different combinations of dEGG and EGG signal magnitudes, as also used in human EGG studies (Howard, 1995; Howard et al., 1990). However, in analogy to what has been shown for humans (Echternach et al., 2010; Herbst et al., 2017b; Lã and Sundberg, 2015), without rigorous experimental validation as presented here, care should be taken when presenting contact quotients (Herbst et al., 2017b) based on the EGG signal alone. Furthermore, investigation with a larger number of syringes, potentially from different species, is required to determine which algorithm for estimating the EGG contact quotient best matches the respective closed quotients. The distinction between EGG contact quotient and closed quotient is essential, because only the latter has a true causal relationship to the laryngeal and syringeal sound generation events.
Compared with EGG signals from the human larynx, we did not observe a baseline during the part of the cycle without tissue contact (Rothenberg, 1992; Scherer et al., 1988). Furthermore, we observed additional smaller peaks in both EGG and dEGG signals. Firstly, these disparities could arise from phase distortions within the acquired signal. Secondly, the phenomenon could be due in part to differences in electrode design and placement in birds versus humans (3 cm diameter gold-plated disc electrodes placed on the neck versus 26-µm diameter nichrome wire electrodes placed directly on the LVMs) and variation in electrode placement between individuals. For the EGG signal to be linear and unbiased, the electric field lines between the electrodes must have uniform density across the area of interest. This is accomplished in humans by having electrodes of a size not smaller than the area wherein the tissue collisions happen (Titze, 1990). In our experiments the electrodes were located caudo-cranially, close to where the first contact happens during a cycle (Elemans et al., 2015). During the cycle, the caudo-cranial location of tissue contact moves cranially before the decontacting event. This means that the electric field lines will have a higher density at the location of contact than at the location of decontacting. Placing the EGG electrodes further away from the syrinx and increasing the size of the electrodes to improve linearity of the EGG signal is unfortunately not possible as the syrinx in vivo is suspended in air inside the interclavicular air sac. Thirdly, in addition to the LVMs, the medial tympanic membranes are oscillating during sound production in pigeons. These thin structures are located caudally to the LVMs on the medial side of the primary bronchi, stretching caudally from the pessulus (Goller and Larsen, 1997a; King, 1989). Future studies may evaluate whether and how their vibrations affect the shape of the EGG signal.
Furthermore, distinct EGG waveforms in the wavegrams reflected distinct vibratory regimes of the LVMs, which implies that the EGG signal has the potential to detect such regimes. The transmural pressure in pigeons during sound production has not been directly measured in vivo. However, measurements and models (Beckers, 2003; Elemans et al., 2008) suggest that transmural pressures of 1–2 kPa during dynamic vocal situations are possible in ring doves, a close relative of the pigeon. Driving the syrinx with this physiologically reasonable pressure range, we were able to induce LVM vibrations from approximately 50 to 250 Hz. LVM dynamics showed two distinct vibratory modes in this range, potentially analogous to different laryngeal mechanisms in humans (Henrich, 2006). The fo range for domestic pigeon vocalization is 150–400 Hz (Goller and Larsen, 1997a). This range suggests that pigeons primarily use the second observed oscillatory regime (fo>125 Hz), a hypothesis that remains to be tested in vivo. Nevertheless, it seems physiologically reasonable that pigeons have both, or more, oscillatory regimes at their disposal in vivo, and comparable modes have also been observed in anaesthetised crows (Jensen et al., 2007).
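For readers unfamiliar with wavegrams, the sketch below shows one simple way such a display can be assembled: each cycle is normalised in amplitude and duration and stored as one column of a two-dimensional array, so that changes in waveform shape between vibratory regimes appear as changes in the column pattern. This is a generic illustration in Python, not the implementation used for the figures in this study. The function name `wavegram`, the assumption that cycle boundaries are already known, and the synthetic two-regime signal are all hypothetical.

```python
import numpy as np

def wavegram(signal, cycle_starts, n_points=100):
    """Stack amplitude- and time-normalised cycles as columns of a 2-D array."""
    signal = np.asarray(signal, dtype=float)
    phase_out = np.linspace(0.0, 1.0, n_points)
    columns = []
    for start, stop in zip(cycle_starts[:-1], cycle_starts[1:]):
        cycle = signal[start:stop]
        if cycle.size < 2 or cycle.max() == cycle.min():
            continue  # skip degenerate cycles
        normalised = (cycle - cycle.min()) / (cycle.max() - cycle.min())
        phase_in = np.linspace(0.0, 1.0, cycle.size)
        # Resample every cycle onto the same normalised phase axis.
        columns.append(np.interp(phase_out, phase_in, normalised))
    if not columns:
        return np.empty((n_points, 0))
    return np.column_stack(columns)  # rows: cycle phase, columns: cycle number

# Synthetic signal: ten sinusoidal cycles followed by ten shorter, skewed cycles.
slow = np.tile(np.sin(2 * np.pi * np.arange(200) / 200), 10)
theta = 2 * np.pi * np.arange(100) / 100
fast = np.tile(np.sin(theta) + 0.25 * np.sin(2 * theta), 10)
signal = np.concatenate([slow, fast])
starts = np.concatenate([np.arange(0, 2000, 200), np.arange(2000, 3001, 100)])
image = wavegram(signal, starts)
```

Plotting `image` as a colour map, with cycle number on the horizontal axis, would show an abrupt change in pattern at the eleventh column, where the waveform shape changes.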
The unique possibility of visualising LVM shape by transilluminating the pigeon syrinx, together with the absence of dorsoventral LVM waves, allowed us to quantify LVMCA for the first time in birds. The syringeal LVMCA waveform in pigeons demonstrates several similarities to laryngeal VFCA measurements. First, as in the human and canine larynx, a sharp increase in the amount of tissue contact follows initial contacting (Boessenecker et al., 2007; Doellinger and Berry, 2006; Herbst et al., 2014). In those studies, as observed here, maximal tissue contact is reached within 1 ms after initial tissue contact. In the three animals investigated here, we found similar pLVMCA waveforms within the oscillatory cycle and peak values of around 11 mm² (with H=2.2 mm and DV length=5.0 mm). Second, we found closed quotients of around 0.7, which is at the upper limit of what is found in humans and dogs (Herbst et al., 2009; Howard et al., 1990; Verdolini et al., 1998a,b), but below values of 0.82 measured in an elephant larynx (Herbst et al., 2013). In addition to the data presented here, only very few measurements of absolute VFCA have been reported. To our knowledge, there are no absolute VFCA measurements reported for human vocal folds, only for dogs (Gunter, 2003; Jiang and Titze, 1994). Different techniques have been used to determine VFCA in dogs and deer (Hampala et al., 2016; Herbst et al., 2017a; Jing et al., 2017; Shau et al., 2001). However, because we can image LVM dynamics simultaneously from different angles, birds provide an excellent model system to study the kinematics and control of voiced sound production.
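The two quantities compared above can be illustrated with a short Python sketch. It assumes that the closed quotient is taken as the fraction of a cycle during which the imaged opening area is zero, and that the quoted peak contact area corresponds to a rectangle of height H and dorsoventral length DV. The second assumption is an interpretation of the values in parentheses, not a statement of the method used. The function name `closed_quotient` and the example cycle are hypothetical.

```python
import numpy as np

def closed_quotient(opening_area, tol=0.0):
    """Fraction of one cycle in which the opening area is at or below `tol`."""
    area = np.asarray(opening_area, dtype=float)
    return float(np.mean(area <= tol))

# Hypothetical cycle of 100 frames: closed for 70 frames, then a
# half-sine opening pulse lasting 30 frames (arbitrary area units).
pulse = np.sin(np.linspace(0.0, np.pi, 32))[1:-1]
cycle = np.concatenate([np.zeros(70), pulse])
cq = closed_quotient(cycle)

# Rectangular upper bound on contact area from the dimensions in the text.
H_mm, DV_mm = 2.2, 5.0
peak_area_mm2 = H_mm * DV_mm  # about 11 mm^2
```

In practice, `tol` would need to be set from the noise floor of the area measurement, because imaging noise rarely yields an area of exactly zero.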
In summary, we think that EGG could be a promising tool for predicting essential motion parameters of syringeal vibration in birds in vivo. Here, we studied syringeal vibratory kinematics in pigeons in vitro, which allowed for the quantification of LVMCA. This approach will be more challenging in small songbirds, such as zebra finches, where imaging using tracheal endoscopy results in lower spatial and temporal resolution and is experimentally challenging even in vitro, and where the dense syringeal musculature limits transillumination of the syrinx for observing caudocranial waves (Elemans et al., 2015). Nevertheless, we think that EGG can be an interesting alternative to, for example, bronchial flow measurements for investigating song lateralization of the two individually controlled sound generators in songbirds. Also, by combining EGG measurements with phenomenological or simplified lumped-mass models, as regularly done in humans (Ishizaka and Flanagan, 1972; Story and Titze, 1995), it may be possible to infer other critical data such as vocal fold position and shape, larynx position and even muscle activation to assess syringeal motor control.
The authors wish to thank T. Christensen, S. Jakobsen and P. Martensen for technical support.
Conceptualization: J.H.R., C.T.H., C.P.H.E.; Methodology: J.H.R., C.T.H., C.P.H.E.; Software: J.H.R., C.T.H., C.P.H.E.; Validation: J.H.R., C.P.H.E.; Formal analysis: J.H.R., C.T.H., C.P.H.E.; Investigation: J.H.R., C.T.H., C.P.H.E.; Resources: J.H.R., C.P.H.E.; Data curation: J.H.R., C.P.H.E.; Writing - original draft: J.H.R., C.P.H.E.; Writing - review & editing: J.H.R., C.T.H., C.P.H.E.; Visualization: J.H.R., C.P.H.E.; Supervision: C.P.H.E.; Project administration: C.P.H.E.; Funding acquisition: C.P.H.E.
This research has been supported by Danish Research Council (Natur og Univers, Det Frie Forskningsråd) and Carlsberg Foundation (Carlsbergfondet) grants to C.P.H.E. and an ‘APART’ grant received from the Austrian Academy of Sciences (Österreichische Akademie der Wissenschaften) to C.T.H.
The authors declare no competing or financial interests.