Although the call repertoire and its communicative function are relatively well explored in Japanese macaques (Macaca fuscata), little empirical data are available on the physics and the physiology of this species' vocal production mechanism. Here, a 6 year old female Japanese macaque was trained to phonate under an operant conditioning paradigm. The resulting ‘coo’ calls and spontaneously uttered ‘growl’ and ‘chirp’ calls were recorded with sound pressure level (SPL) calibrated microphones and electroglottography (EGG), a non-invasive method for assessing the dynamics of phonation. A total of 448 calls were recorded, complemented by ex vivo recordings on an excised Japanese macaque larynx. In this novel multidimensional investigative paradigm, in vivo and ex vivo data were matched via comparable EGG waveforms. Subsequent analysis suggests that the vocal range (range of fundamental frequency and SPL) of the macaque was comparable to that of a 7–10 year old human, with the exception of low intensity chirps, the production of which may be facilitated by the species' vocal membranes. In coo calls, redundant control of fundamental frequency in relation to SPL was also comparable to that in humans. EGG data revealed that growls, coos and chirps were produced by distinct laryngeal vibratory mechanisms. EGG further suggested changes in the degree of vocal fold adduction in vivo, resulting in spectral variation within the emitted coo calls, ranging from ‘breathy’ (including aerodynamic noise components) to ‘non-breathy’. This is again analogous to humans, corroborating the notion that phonation in humans and non-human primates is based on universal physical and physiological principles.
Humans and non-human primates (together with other mammals) are believed to share a universal mechanism of phonation (laryngeal sound production), governed by the myoelastic aerodynamic (MEAD) principle (van den Berg, 1958; Titze, 2006; Herbst, 2016). Steady airflow, coming from the lungs, is converted into a sequence of airflow pulses by the passively vibrating vocal folds (and/or other laryngeal tissues), resulting in self-sustaining oscillation. The acoustic pressure waveform generated by this sequence of flow pulses excites the vocal tract, which filters it acoustically, and the result is radiated from the mouth (and/or the nose) (Story, 2002). This phenomenon, in which the laryngeal sound source and the vocal tract each contribute to the quality of the emitted sound, is described by the source–filter theory of sound production (Fant, 1960; Chiba and Kajiyama, 1941; Taylor et al., 2016; Fitch and Hauser, 1995) and its non-linear extension (Titze, 2008; Flanagan, 1968; Rothenberg, 1981).
In human speech and singing, the physics and physiology of phonation and the respective detailed motor control are relatively well investigated, owing to several decades of research in vivo (Baken and Orlikoff, 2000), ex vivo (Döllinger et al., 2011) and in silico (Kob, 2003; Story, 2002). In contrast, much less is known about the actual physical and functional and/or physiological framework of in vivo sound production in non-human mammals. The non-human vocal system is typically treated as a ‘black box’, and its function is inferred from the acoustic output alone. This is true, for instance, for the vocalization of Japanese macaques (Macaca fuscata Blyth 1875). Ever since Itani's groundbreaking work (Itani, 1963), the investigation of this species' vocal communication has received widespread attention (Le Prell and Moody, 1997; Beecher et al., 2008; Blount, 1985; Katsu et al., 2016; Green, 2010; Tokuda et al., 2002; Machida, 1990; Masataka, 2010; Owren et al., 1992; Sugiura, 2008; Bouchet et al., 2017; Koda, 2004). However, most studies have typically focused on the acoustic description and classification of calls, to be regarded in a motivational and social context.
The purpose of this study is thus to provide physiological evidence concerning laryngeal in vivo sound production in Japanese macaques. Addressing the hypothesis that humans and non-human primates share universal sound production principles, the gathered data were compared with those of humans, in order to demonstrate detailed functional similarities.
The compliance of humans with measurement protocols allows for in vivo documentation of a number of physical and physiological key variables of human speech production and singing, such as subglottal and/or tracheal air pressure (Schutte, 1980; Finnegan et al., 1998), glottal airflow (Rothenberg, 1977; Stathopoulos and Weismer, 1985), laryngeal configuration (Herbst et al., 2011; Södersten et al., 1995), vocal tract geometry (Echternach et al., 2008; Story et al., 2003) or the kinematics of vocal fold vibration (Hertegard, 2005; Deliyski and Hillman, 2010; Lohscheller and Eysholdt, 2008). Unfortunately, most of the involved investigative methods are somewhat uncomfortable or invasive, which makes application to non-human primates a challenge.
A non-invasive alternative for assessing the dynamics of laryngeal vocal fold vibration during sound production is electroglottography (EGG) (Baken, 1992; Fabre, 1957). A high frequency, low intensity current is passed between two electrodes attached to the skin on either side of the thyroid cartilage, at the level of the vocal folds (see Fig. 1A). The measured admittance variations are largely proportional to the time-varying vocal fold contact area (Hampala et al., 2016), thus providing detailed physiological information on vocal fold vibration. A schematic model of a stereotypical EGG signal for one vibratory cycle of the vocal folds in humans is shown in Fig. 1B (Berke et al., 1987; Baken and Orlikoff, 2000). The landmarks in that illustration are identified as follows: (a) initial contact of the lower vocal fold margins; (b) initial contact of the upper vocal fold margins; (c) maximum vocal fold contact reached (glottis not necessarily fully closed); (d) de-contacting phase by separation of the lower vocal fold margins; (e) upper margins start to separate; and (f) glottis is open and the contact area is at its minimum.
Several approaches exist for extracting quantitative information from the raw EGG signal (Rothenberg and Mahshie, 1988; Orlikoff, 1991; Baken and Orlikoff, 2000). These are loosely correlated with physical key phenomena of vocal fold vibration, but need to be interpreted with care (Herbst et al., 2017, 2014).
Although EGG, thanks to its relatively inexpensive and non-invasive nature, has seen wide application in human voice science, surprisingly, only one pilot study has been conducted on non-human primates (Brown and Cannito, 1995). Here, we apply EGG data acquisition to in vivo phonation of a female Japanese macaque trained to vocalize on command. EGG data are complemented with sound pressure level (SPL) calibrated acoustic recordings and matched EGG data from an excised larynx preparation of a Japanese macaque ex vivo. This novel multidimensional approach allows for deeper insights into the physiological and physical nature of voice production in this species.
MATERIALS AND METHODS
Data acquisition in vivo
In vivo data acquisition was performed at the Primate Research Institute, Inuyama, Aichi, Japan. All procedures were approved by the ethics committee of the Primate Research Institute of Kyoto University (number 2015-014, 2016-103), with compliance to the Guide for the Care and Use of Laboratory Primates (third edition, the Primate Research Institute, Kyoto University, 2010). The subject animal was a 6.5 year old female Japanese macaque, with a resting vocal fold length of approximately 7.7 mm, as measured from a computerized tomography (CT) scan with a spatial resolution of 0.35×0.35 mm and a slice interval of 0.2 mm.
The animal had been trained over a period of 6 months for another research project (H.K., T.K. and T.N., unpublished) to sit in a custom-made monkey chair wearing a special-purpose jacket (Fig. 1A). Using an operant conditioning approach, the animal was rewarded when producing ‘coo’ calls after presentation of a visual and auditory stimulus. In addition to these trained responses, we also recorded a number of spontaneous calls (see below). For the purpose of this work, a total of three recording sessions, each lasting approximately 50 min, were conducted over a period of 8 days.
EGG signals were recorded with a VoceVista electroglottograph (Roden, The Netherlands). The EGG electrodes were embedded into the collar of a special-purpose jacket that was worn by the animal during data acquisition (see Fig. 1A). In this setup, head movement of the animal resulted in intermittent contact loss between the electrodes and the individual's neck in approximately 60% of all recorded signals. EGG signals were only considered for further analysis if two conditions were fulfilled: (1) a cyclical EGG signal at a fundamental frequency (fo) corresponding to that of the acoustic signal (checked through inspection of respective spectrograms) was present; and (2) there was no evidence of clipping in the acquired EGG signal.
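The two inclusion criteria can be expressed as a simple screening routine. The sketch below is illustrative only: in the study, criterion (1) was checked by inspecting spectrograms, whereas here it is approximated by comparing two fo estimates. The function names, the 5% tolerance, the run length of three samples and the 16-bit full-scale value are assumptions introduced for this example.

```python
def has_clipping(samples, full_scale=32767, min_run=3):
    """Return True if at least `min_run` consecutive samples sit at full scale.

    `full_scale` and `min_run` are assumed values for 16-bit integer data.
    """
    run = 0
    for s in samples:
        if abs(s) >= full_scale:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0
    return False


def fo_matches(egg_fo_hz, acoustic_fo_hz, tolerance=0.05):
    """Return True if both fo estimates agree within a relative tolerance."""
    if egg_fo_hz <= 0 or acoustic_fo_hz <= 0:
        return False
    return abs(egg_fo_hz - acoustic_fo_hz) / acoustic_fo_hz <= tolerance


def egg_is_usable(samples, egg_fo_hz, acoustic_fo_hz):
    """Apply both criteria: matching fo and no evidence of clipping."""
    return fo_matches(egg_fo_hz, acoustic_fo_hz) and not has_clipping(samples)
```

A screening step of this kind would only pre-select candidates; visual confirmation of the waveform and spectrogram would still be advisable.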
The acoustic signal was recorded with a Sennheiser MKE platinum-C microphone (Sennheiser Electronic GmbH & Co. KG, Wedemark, Germany). The microphone was placed at a fixed distance of 10 cm from the animal's mouth. SPLs were calibrated with C frequency weighting for a distance of 30 cm using an SPL meter (ATL SL-8851, ATP Instrumentation Ltd, Ashby-de-la-Zouch, UK), applying method 5 from Svec and Granqvist (2017). Background noise levels were measured at 55.3 dB(C).
Both the EGG and the acoustic signal were simultaneously digitized at a sampling frequency of 48 kHz with a Tascam audio interface (US-144KMII, TEAC America Inc., Montebello, CA, USA). The digitized signals were recorded using the software Audacity (http://www.audacityteam.org/) and stored as 16-bit uncompressed stereo .wav files.
Data acquisition ex vivo
Data acquisition ex vivo was conducted at the Department of Cognitive Biology, University of Vienna, Austria. No ethical approval was required. The larynx was from a female Japanese macaque (mass=7.4 kg, head–body length without tail=72.6 cm) that had died of natural causes, acquired through the specimen acquisition program at the National Museums of Scotland. A detailed description of that specimen's preparation is provided elsewhere (Garcia et al., 2017). The resting vocal fold length was visually determined to be approximately 7.3 mm.
A previously described excised larynx setup was utilized (Herbst et al., 2014). The larynx was mounted on a vertical tube supplying heated (ca. 37°C) and humidified (100% humidity) air. For the purpose of this study, the vocal folds were adducted and elongated manually, in order to have maximum freedom for achieving vocalizations that resemble those documented in vivo.
Vocal fold vibration was documented with acoustic and EGG recordings (see Herbst et al., 2014 for details), while simultaneously measuring the subglottal driving (air) pressure. For comparative analysis of data recorded in vivo and ex vivo, EGG signals from these two scenarios were matched using the following criteria: (1) comparable fo; (2) comparable periodicity and harmonic content (nearly periodic and sinusoidal for coo calls, slightly irregular and slightly aperiodic for growls and chirps); and (3) comparable relative EGG signal level (note that the EGG signal level of chirp calls was typically approximately 15–20 dB lower than that of all other calls; see below).
fo was estimated with the Praat (Boersma and Weenink, 2017) program's autocorrelation-based algorithm [‘To Pitch (ac)…’]. Standard parameters were used, except for minimum and maximum fo, which were set to 50 and 5000 Hz, respectively. fo was estimated every millisecond, resulting in 1000 analysis data points per second.
At the time offset of each successfully estimated fo data point, two further parameters were calculated with a custom algorithm written in Python by C.T.H.: the calibrated SPL, expressed in dB(C), and the dominant frequency (fDOM) (Fischer et al., 2013), representing the frequency with the maximum amplitude within the acoustic spectrum of the analyzed signal portion. The respective source code is available online (www.christian-herbst.org/python/).
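As an illustration of these two parameters, the following minimal sketch computes a level and a dominant frequency for one signal frame. It is not the authors' algorithm: the calibration is reduced to a single additive offset (a hypothetical value obtained from a reference recording), no C frequency weighting is applied, and the spectrum is obtained with a plain discrete Fourier transform without windowing.

```python
import cmath
import math


def calibrated_spl(frame, calibration_offset_db):
    """Level of one frame in dB: 20*log10(RMS) plus an assumed calibration offset."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    if rms == 0.0:
        return float("-inf")
    return 20.0 * math.log10(rms) + calibration_offset_db


def dominant_frequency(frame, sample_rate):
    """Frequency (Hz) of the largest-magnitude DFT bin, excluding the DC bin."""
    n = len(frame)
    best_bin, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        acc = sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


if __name__ == "__main__":
    fs = 48000  # sampling frequency used in the study
    # Synthetic 10 ms test frame: a 1 kHz sinusoid (not recorded data).
    frame = [0.5 * math.sin(2 * math.pi * 1000 * i / fs) for i in range(480)]
    print(dominant_frequency(frame, fs), calibrated_spl(frame, 94.0))
```

With a 10 ms frame the frequency resolution is 100 Hz; longer frames or spectral interpolation would be needed for finer estimates.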
Preliminary perceptual assessment of the acoustic data suggested various degrees of ‘breathiness’ (i.e. aerodynamic noise components) in a subset of the coo calls produced in vivo. In order to assess this quantitatively, the average harmonics-to-noise ratio (HNR) was calculated for all coo calls with Praat. In particular, the function ‘To Harmonicity (ac)’ was called with standard parameters, except for the time step (1 ms) and minimum fo (50 Hz).
where the reference sound intensity I0=10⁻¹² W m⁻². Finally, the aerodynamic power, PAIR, expressed in W, was calculated as the product of the time-averaged glottal airflow and the time-averaged subglottal pressure.
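The power calculation can be sketched as follows. Only I0 and the definition of PAIR are taken from the text; the conversion of SPL to intensity (treating SPL as an intensity level) and the spherical radiation term 4πR²I are standard relations assumed here, and all function and parameter names are hypothetical.

```python
import math

I0 = 1e-12  # reference sound intensity in W m^-2 (from the text)


def intensity_from_spl(spl_db):
    """Sound intensity in W m^-2, treating SPL as an intensity level (assumption)."""
    return I0 * 10.0 ** (spl_db / 10.0)


def radiated_power(spl_db, distance_m):
    """Radiated acoustic power in W, ASSUMING spherical radiation: 4*pi*R^2*I."""
    return 4.0 * math.pi * distance_m ** 2 * intensity_from_spl(spl_db)


def aerodynamic_power(mean_pressure_pa, mean_flow_m3_s):
    """P_AIR in W: time-averaged subglottal pressure times time-averaged airflow."""
    return mean_pressure_pa * mean_flow_m3_s


def glottal_efficiency(spl_db, distance_m, mean_pressure_pa, mean_flow_m3_s):
    """Ratio of radiated acoustic power to aerodynamic power (dimensionless)."""
    return radiated_power(spl_db, distance_m) / aerodynamic_power(
        mean_pressure_pa, mean_flow_m3_s
    )


if __name__ == "__main__":
    # Hypothetical input values, chosen for illustration only.
    print(glottal_efficiency(spl_db=90.0, distance_m=0.3,
                             mean_pressure_pa=2000.0, mean_flow_m3_s=0.0005))
```

Pressure must be given in Pa and airflow in m³ s⁻¹ for the product to be in W.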
A total of 448 calls were recorded in vivo, which were labeled manually according to the classification scheme provided by Green (1975), resulting in 377 coos, 31 growls, 14 chirps, and 26 transitions between coo and growl. Whereas the coo calls were emitted as a trained response of the investigated animal, the growls and chirps were mostly spontaneous vocal emissions uttered when one of the experimenters adjusted the EGG electrodes.
An overview of analysis data for all calls is provided in Table 1. The relationship between fo and SPL for all vocalizations is depicted in Fig. 2A. Such a display, called a phonetogram (Damste, 1970) or voice range profile (VRP) (Pabon and Plomp, 1988), is a typical tool in human voice science and clinical work, utilized to obtain an overview of a person's vocal capacities. The gray diamonds and dashed lines superimposed upon Fig. 2A, allowing for a comparison between the investigated Japanese macaque and humans, are normative VRP data for children aged 7–10 years (Schneider et al., 2010).
In order to corroborate the similitude of VRP data between Japanese macaques and human children on an anatomical level, the vocal fold lengths of the Japanese macaques analyzed in vivo and ex vivo (7.7 and 7.3 mm, respectively) were compared with those of pre-pubertal children according to data from Hirano et al. (1983) (Fig. 2B). A substitution of the vocal fold lengths of the two examined Japanese macaque specimens into the linear regression through the data for children below 12 years of age (Hirano et al., 1983) suggests that comparable vocal fold lengths are found in children aged approximately 7.9 and 7.4 years, respectively.
Preliminary analysis of the coo calls suggested a systematic co-variation between fo and SPL in a large portion of the calls (see Fig. 2C for an example). This co-variation was quantified by calculating first order linear regressions between SPL and fo within all coo calls. Computing the average of all data points where the coefficient of determination, R2, was equal to or greater than 0.8 (39.3% of all cases) resulted in an average slope of 0.28 semitones per dB SPL. The semitone scale (Young, 1939) was chosen in order for the data to be comparable with those in a previous publication in humans (Gramming et al., 1988). For reference purposes, at the mean fo of all coo calls, this value would be equivalent to an increase of approximately 9.5 Hz per dB SPL.
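The regression procedure can be illustrated with a short sketch: fo is converted to semitones, regressed on SPL within each call, and slopes are averaged over calls with R2 of at least 0.8. The function names and the 1 Hz semitone reference are assumptions made for this example (the slope does not depend on the reference), and the conversion to Hz per dB uses a hypothetical fo rather than the study's mean value.

```python
import math


def hz_to_semitones(fo_hz, ref_hz=1.0):
    """Convert a frequency to semitones relative to an arbitrary reference."""
    return 12.0 * math.log2(fo_hz / ref_hz)


def linear_fit(x, y):
    """Ordinary least squares y = slope*x + intercept; returns (slope, intercept, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


def mean_slope(fits, r2_min=0.8):
    """Average the slopes of all (slope, r2) pairs with r2 >= r2_min."""
    kept = [slope for slope, r2 in fits if r2 >= r2_min]
    return sum(kept) / len(kept)


def semitone_slope_to_hz(slope_st_per_db, fo_hz):
    """Change in Hz per dB at fo_hz for a slope given in semitones per dB."""
    return fo_hz * (2.0 ** (slope_st_per_db / 12.0) - 1.0)


if __name__ == "__main__":
    # Synthetic call: fo rises by 0.28 semitones per dB from a hypothetical 500 Hz.
    spl = [60.0 + i for i in range(11)]
    fo = [500.0 * 2.0 ** (0.28 * (s - 60.0) / 12.0) for s in spl]
    print(linear_fit(spl, [hz_to_semitones(f) for f in fo]))
    print(semitone_slope_to_hz(0.28, 500.0))
```

Because the semitone scale is logarithmic, the equivalent change in Hz per dB grows in proportion to fo, which is why the Hz value is quoted only for a specific reference frequency.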
Basic physical data for the excised larynx sound production are listed in Table 2: subglottal pressure, airflow rates, SPL and glottal efficiency. In Fig. 3, stereotypical EGG waveforms from both the in vivo condition and the excised larynx preparation are shown for all three call types. Care was taken to find EGG waveforms that are similar both in appearance and in fo. The EGG waveforms for the growl vocalizations were mostly irregular, with residual traces of periodicity. The coo calls typically resulted in periodic EGG waveforms, approximating a sinusoidal shape in most cases (but see Fig. 5 for an important counter-example). The EGG signals of the chirps also approximated sinusoidal shapes. However, they had markedly weaker amplitudes (−26.6 dB in Fig. 3, compared with −8 dB and −11 dB for growls and coos, respectively). This suggests a lesser degree of vocal fold contact; as a consequence, noise introduced by the measurement equipment had a greater influence on the waveform.
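The level differences implied by the values quoted for Fig. 3 can be checked with a few lines of arithmetic. The sketch below assumes that all three levels are expressed relative to the same reference, and that they are amplitude levels (20 log10); the function names are hypothetical.

```python
def level_difference_db(level_a_db, level_b_db):
    """Difference between two levels given relative to the same reference."""
    return level_a_db - level_b_db


def amplitude_ratio(level_difference):
    """Linear amplitude ratio for a level difference in dB (20*log10 convention)."""
    return 10.0 ** (level_difference / 20.0)


if __name__ == "__main__":
    chirp_db, growl_db, coo_db = -26.6, -8.0, -11.0  # values quoted for Fig. 3
    for name, ref_db in (("growl", growl_db), ("coo", coo_db)):
        diff = level_difference_db(chirp_db, ref_db)
        print(name, round(diff, 1), round(amplitude_ratio(diff), 3))
```

Both differences fall within the range of approximately 15–20 dB stated for chirp calls in the matching criteria.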
In 26 out of the 448 analyzed calls, transitions between the coo and growl call types were found. These transitions typically occurred over a few glottal vibratory cycles. One such example is documented in Fig. 4: fo drops abruptly from approximately 464 Hz to approximately 190 Hz, while the EGG waveform abruptly switches between two distinct shapes around t=280 ms in Fig. 4D.
The average HNR of all coo calls is plotted against the respective average SPL in Fig. 5. The data in panel A suggest an overall trend for HNR to be lower in softer calls. A stereotypical example of a coo call characterized as ‘breathy’ (including aerodynamic noise components) by the experimenters is further analyzed in panels B and C. The spectrogram of the acoustic signal contained only three harmonics above noise level, and the respective EGG waveform was quasi-sinusoidal, containing considerable noise. In contrast, the acoustic signal of a stereotypical coo call characterized as ‘non-breathy’ (panels D and E) contained 12 harmonics above the noise floor, and the corresponding EGG waveform was devoid of visible noise components, resulting in a pronounced wave shape.
This study introduces a new multidimensional investigative paradigm to the fields of primatology and animal bioacoustics: controlled in vivo experiments with accompanying excised larynx experimentation, linked through matched EGG waveforms as a physiological ‘ground truth’. In this manner, advantages from both approaches can be combined. The in vivo setup, thanks to calibrated microphone signals and a controlled mouth-to-microphone position, facilitates assessment of SPLs of targeted call types (see Fig. 2). The supplemented data from the excised larynx experiment allow for the estimation of physical and physiological voice production parameters (see Table 2), which are difficult to obtain in vivo. In this approach, EGG data provide the key evidence through which the two setups (in vivo versus excised larynx) are linked. Although in the current study, larynges of two different animals were examined in vivo and ex vivo, future investigations could, given logistical and ethical feasibility, utilize the same animal in both setups to control for variation in laryngeal anatomy between animals.
The three investigated call types, growls, coos and chirps, had distinct fundamental frequencies and were well separated within the generated phonetogram (Fig. 2). The growl and coo calls were well aligned within normative VRP data published for 7–10 year old children (Schneider et al., 2010) (but note the greater sound levels of the growl vocalizations in comparison with the respective phonations of children around 200–250 Hz). However, even the higher frequencies of the chirps (fo≈3 kHz) can be sung by some children of that age, but typically only at high vocal intensities (C.T.H., personal observation). The VRP comparison is, however, limited by the fact that the VRP data of the children were acquired via instructions to continuously and fully cover their entire voice range (i.e. reaching the minima and maxima of both sound level and fo), whereas the data from the Japanese macaque were acquired through the operant conditioning approach without such restrictions. The actual voice range of the Japanese macaque could thus be greater than that indicated by the collected data. Furthermore, although the children's VRP is continuous, the Japanese macaque's VRP is not, owing to the different methods of data acquisition. Therefore, it cannot be determined whether areas in the Japanese macaque's VRP that are not covered by our current data from growls, coos or chirps (e.g. the frequency region between 750 and 1700 Hz) constitute evidence that the animal would not have the ability to produce sounds at those frequencies and sound levels.
A recent comparative allometric study showed that vocal fold length is a good predictor for the minimum fo across 11 non-human primate species (Garcia et al., 2017). The resting vocal fold length of the Japanese macaques investigated here in vivo and ex vivo was approximately 7.7 and 7.3 mm, respectively. Hirano et al. (1983) found comparable vocal fold lengths for children aged approximately 6–10 years (see Fig. 2B). This evidence thus strongly suggests that the similar fo ranges of the examined Japanese macaque and 7–10 year old children are determined by comparable vocal fold length. This would imply that the string model approximation (Eqn 4) applies to both humans and non-human primates (see also Riede, 2010), supporting the hypothesis of universal sound production principles.
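Eqn 4 is not reproduced in this text. Assuming that it refers to the standard ideal-string approximation of vocal fold vibration, it would take the following form, where L is the vibrating vocal fold length, σ the longitudinal tissue stress and ρ the tissue density (symbols chosen here for illustration):

```latex
f_\mathrm{o} = \frac{1}{2L}\sqrt{\frac{\sigma}{\rho}}
```

Under this assumption, and for equal stress and density, fo scales inversely with L, so the two vocal fold lengths reported here (7.7 and 7.3 mm) would differ in fo by a factor of 7.7/7.3, i.e. about 5%.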
The similarity between the macaque and the human vocal organ is also seen when assessing dynamical aspects of fo control. We found an fo increase of approximately 0.28 semitones per dB SPL. This value is comparable to data from humans, where an increase of approximately 0.4 semitones per dB SPL was found (Gramming et al., 1988). In analogy to the argument made in that study (Gramming et al., 1988) and building on previous research in humans, we hypothesize that subglottal pressure (van den Berg and Tan, 1959; Titze, 1989) is a major influencing factor for fo control in Japanese macaque vocalizations (the other being vocal fold tension; Titze et al., 2016), thus further demonstrating the physiological commonality between Japanese macaques and humans. Rigorous testing of that hypothesis with excised larynx experiments is, however, required.
Normative VRP data from humans suggest that the upper fo limit can typically only be reached at maximum SPLs (Sulter et al., 1994), suggesting high subglottal pressures (Schutte, 1980). In contrast, the investigated Japanese macaque's chirp vocalizations in vivo were produced at relatively low SPLs, a phenomenon which deserves further discussion. We hypothesize that these low SPLs were facilitated by the presence of vocal membranes (sometimes called ‘vocal lips’) in the laryngeal anatomy of the Japanese macaque, i.e. thin upward extensions of the vocal folds with little mass (Fitch, 2002; Schön Ybarra, 1995; Mergell et al., 1999). Unfortunately, we were unable to duplicate these softer chirp vocalizations in the one specimen examined in the excised larynx setup. Further investigation with excised larynx experiments and computational modeling is thus necessary to substantiate this hypothesis.
Exemplary EGG evidence suggested distinct differences in vocal fold vibration patterns for the three call types. The sinusoidal waveforms of the coo calls in Figs 3 and 5C, as well as the chirp call in Fig. 3, are comparable to EGG data from humans phonating in the so-called falsetto register (thyroarytenoid muscle not contracted) with a low degree of vocal fold adduction (Herbst et al., 2017), regularly resulting in a posterior glottal gap and breathy phonation (Sundberg, 1995). This class of EGG signals typically has a low signal amplitude owing to the lack of vocal fold contact.
Interpretation of the other EGG waveforms, including those presented in Figs 3A and 5E, is more difficult because they do not clearly match stereotypical waveforms known from humans. This can be attributed to potential differences in laryngeal anatomy between humans and Japanese macaques. In EGG, the complex three-dimensional (de-)contacting pattern of the vocal folds is reduced to a one-dimensional value, reflecting the time-varying relative vocal fold contact area. Consequently, anatomically induced differences of vocal fold geometry are reflected in the resulting EGG waveform. Further excised larynx experiments with acquisition of simultaneous EGG and high-speed video data are thus necessary to better facilitate interpretation of EGG waveforms in Japanese macaques and other primate species.
This limitation notwithstanding, EGG was quite useful for revealing the dynamics of laryngeal sound generation in vivo. This is perhaps best seen in Fig. 4, where a transition from coo to growl is documented. The EGG evidence reveals an abrupt transition between two distinct states of vocal fold vibration, occurring over the course of about five vibratory cycles. Several insights can be gained from this example: (1) the cause for acoustic differences between these call types is clearly laryngeal, similar to different laryngeal mechanisms in human voice registers (Henrich, 2006); (2) the suddenness of the change between the two call types is evidence for the presence of a bifurcation, i.e. an abrupt change between vibratory states of a non-linear system when gradually varying boundary conditions (Fitch et al., 2002; Herzel et al., 1998); and (3) as expected from a bifurcating system, the two vibratory phenomena do not coexist.
Some of the softer coo calls had a pronounced breathy perceptual quality, as noticed by the experimenters. This phenomenon, which is spectrally characterized by fewer noteworthy harmonics and the appearance of high-frequency noise components, was quantified by calculation of the HNR (see Fig. 5A). Acoustically, the coo calls with lower HNR (see Fig. 5B,C for a stereotypical example) typically had only about two to five harmonics above the noise floor. The respective EGG signals assumed a sinusoidal wave shape, with superimposed noise. As mentioned above, this is analogous to breathy phonation in the falsetto register in humans (Herbst et al., 2017) and strongly suggests that those breathy coo vocalizations were produced with incomplete glottal closure, allowing turbulent airflow to occur, thus causing the audible noise components and giving the perceptual impression of breathiness.
The breathy coo vocalizations were contrasted by non-breathy coo vocalizations, which typically had higher HNR values. The corresponding EGG waveforms were less noisy, deviated from a sinusoidal shape, and bore indicators of vocal fold contacting and de-contacting events, suggesting a greater degree of vocal fold adduction than in the breathy calls. However, as mentioned above, without clearly established landmarks for EGG signals in Japanese macaques, further interpretation requires caution.
Overall, the physiologically based EGG evidence strongly suggests that the investigated macaque varied its glottal configuration while producing the variety of coo calls in vivo. This is, to our knowledge, a novel finding that has not yet been documented at the laryngeal level for vocalizations in non-human primates and other mammals. Laryngeal modification of the voice timbre (i.e. the spectral composition of the sound source) via the degree of glottal adduction would provide an animal with an additional dimension for voice quality modification, potentially allowing macaques to encode arousal and/or valence states in a social communicative context, analogous to what has been shown for humans when using breathy voice in speech (Gobl and Ní Chasaide, 2003; Ishi et al., 2010; Miyazawa et al., 2017).
This study has a few limitations that are worth mentioning. This is a two-subject study, so findings may not be generalized without further evidence. The larynx utilized for the ex vivo experiments was not flash-frozen post mortem (Chan and Titze, 2003), which might have altered the biomechanical tissue properties, thus explaining some surprisingly high values for subglottal pressure and airflow (see Table 2). Repetition of the experiments with flash-frozen larynx specimens is thus warranted.
A novel multidimensional investigative paradigm was introduced with this study: controlled in vivo data acquisition, supplemented by ex vivo recordings from an excised larynx setup, linked via matched EGG waveforms. The data from these experiments, although from only two animals, provide a number of new insights into the sound production of Japanese macaques. When considering growls, coos and chirps, the vocal range of the investigated adult Japanese macaque was comparable to that of a 7–10 year old human, with the exception of low intensity chirps, the production of which may be facilitated by the species' vocal membranes. In coo calls, dynamic control of fo in relation to SPL was also comparable to that in humans. EGG evidence suggested that growls, coos and chirps were produced by distinct laryngeal vibratory mechanisms, analogous to those of humans. EGG data also revealed that the investigated Japanese macaque most likely varied the degree of vocal fold adduction, resulting in variations of the spectral characteristics within the emitted coo calls, ranging from breathy to non-breathy. This is again analogous to what is found in humans, further corroborating the hypothesis that humans and non-human primates share universal physical and physiological principles of vocal production, governed by the MEAD principle.
Conceptualization: C.T.H., H.K., T.N.; Methodology: C.T.H., H.K., T.K., M.G., T.N.; Software: C.T.H., H.K.; Validation: C.T.H.; Formal analysis: C.T.H.; Investigation: C.T.H., H.K., T.K., J.S., M.G., T.N.; Resources: C.T.H., H.K., J.S., W.T.F., T.N.; Writing - original draft: C.T.H.; Writing - review & editing: C.T.H., H.K., M.G., W.T.F., T.N.; Visualization: C.T.H.; Supervision: C.T.H., W.T.F., T.N.; Project administration: T.N.; Funding acquisition: T.N.
This research was supported by an APART grant from the Austrian Academy of Sciences (awarded to C.T.H.), a postdoctoral fellowship from the Fyssen Foundation (awarded to M.G.), and the Japan Society for the Promotion of Science KAKENHI grants no. 16H04848 (awarded to T.N.) and no. 4903, JP17H06380 (awarded to H.K.).
The authors declare no competing or financial interests.