Adult mice emit ultrasonic vocalizations (USVs), sounds above the range of human hearing, during social encounters. While mice alter their vocal emissions between isolated and social contexts, technological impediments have hampered our ability to assess how individual mice vocalize in group social settings. We overcame this challenge by implementing an 8-channel microphone array system, allowing us to determine which mouse emitted individual vocalizations across multiple social contexts. This technology, in conjunction with a new approach for extracting and categorizing a complex, full repertoire of vocalizations, facilitated our ability to directly compare how mice modulate their vocal emissions between isolated, dyadic and group social environments. When comparing vocal emission during isolated and social settings, we found that socializing male mice increase the proportion of vocalizations with turning points in frequency modulation and instantaneous jumps in frequency. Moreover, males change the types of vocalizations emitted between social and isolated contexts. In contrast, there was no difference in male vocal emission between dyadic and group social contexts. Female vocal emission, while predominantly absent in isolation, was also similar during dyadic and group interactions. In particular, there were no differences in the proportion of vocalizations with frequency jumps or turning points. Taken together, the findings lay the groundwork necessary for elucidating the stimuli underlying specific features of vocal emission in mice.
Vocalizations play an important role across the animal kingdom, communicating personal information including physical attributes (Gamba et al., 2012; Ji et al., 2013; Stoeger and Baotic, 2016) and physiological state (Knutson et al., 2002; Sehrsweeney et al., 2019), as well as survival-related information such as the presence of predators (da Silva et al., 2002; Seyfarth et al., 1980). Animals emit vocalizations with varying levels of acoustic complexity. The acoustic complexity of vocalizations ranges from simple continuous sounds, with frequencies that increase, decrease or remain constant, to complex sounds with elements such as instantaneous jumps in frequency or reversals in the direction of frequency modulation (Behr and von Helversen, 2004; Bradbury and Vehrencamp, 1998; White and White, 1970; Zuberbühler et al., 1997). By varying acoustic complexity, signaling animals may alter the information content of the emitted auditory cues and potentially change the behavior of conspecific receivers (Kershenbaum et al., 2016). Consequently, animals are thought to modulate vocal activity and acoustic complexity as a function of their social context. Male zebra finches (Taeniopygia guttata), for instance, sing differently when a female is present than when alone (Chen et al., 2016; Kao and Brainard, 2006). The golden rocket frog (Anomaloglossus beebei) emits vocalizations with greater acoustic complexity during courtship behaviors compared with those produced during aggressive behaviors (Pettitt et al., 2012). The yellow mongoose (Cynictis penicillata) produces calls to indicate the presence of a predator when group members are present, but not in isolation (le Roux et al., 2008). Thus, across numerous species, animals regulate their vocal emissions based upon their social environment.
The house mouse (Mus musculus) also modulates ultrasonic vocal emission across different social contexts, emitting distinct types of vocalizations during opposite-sex interactions and non-social conditions (Hanson and Hurley, 2012; Yang et al., 2013). Mouse ultrasonic vocalizations (USVs), signals ranging in frequency from 30 to 110 kHz (Gourbal et al., 2004; Holy and Guo, 2005), are predominantly emitted during opposite-sex encounters (Wysocki et al., 1982), and these signals are typically assumed to be produced by males (Warburton et al., 1989; Whitney et al., 1973). The features of mouse USVs are context dependent, differing in acoustic complexity (as measured by the percent of vocalizations with instantaneous jumps in frequency), vocalization type and spectral-temporal features across discrete behavioral contexts (Gaub et al., 2016; Grimsley et al., 2016; Hammerschmidt et al., 2012). Moreover, mice produce complex, multi-syllabic vocalizations containing frequency jumps when socially interacting (Matsumoto and Okanoya, 2018; Miller and Engstrom, 2007; Scattoni et al., 2008; Weiner et al., 2016) and USV emission changes over the course of interaction (Matsumoto and Okanoya, 2018). However, direct knowledge of how individual mice alter their vocal emissions across different social contexts is lacking. One study found that males produce more complex vocalizations in the presence of a male listener than when alone (Seagraves et al., 2016). Other work has found that vocalizations in a mixed-sex dyad differ from vocalizations emitted when a male is in isolation (Chabout et al., 2015; Hanson and Hurley, 2012). These studies, however, typically employed a dyadic social encounter, thus not determining the role of group size in vocal emission, and also assumed that all vocalizations were emitted by the male mouse. 
Recent work, however, revealed that female mice emit USVs while interacting with males (Heckman et al., 2017; Neunuebel et al., 2015; Sangiamo et al., 2020; Warren et al., 2018b), indicating that all socializing mice are capable of emitting USVs. As such, to fully elucidate the role social engagement plays in altering mouse vocal emissions, the vocal behavior of each mouse needs to be individually tracked across different social contexts.
In the current study, we combined a novel approach to extract a full repertoire of USVs with our previously established sound-source localization system (Warren et al., 2018a). Together, these tools allowed us to pinpoint where individual vocalizations originated and accurately assign vocalizations to their respective emitter. We tracked the vocal behavior of individual mice in three social contexts: (1) an isolated context involving a single male or female, (2) dyadic social interactions consisting of a male and a female, and (3) group social interactions comprising 2 males and 2 females. We directly showed that, in the presence of a social partner compared with periods of isolation, male mice emitted more acoustically complex vocalizations (i.e. a greater proportion of vocalizations with frequency jumps and turning points in frequency modulation) and altered the types of vocalizations they emitted. Male vocal emission was unaltered by increasing the number of social partners from 1 to 3, as vocal emission was indistinguishable between dyadic and group social interactions. Similarly, female mice showed no vocal differences between dyadic and group encounters. Together, our results indicate that while male and female mice increase the complexity of their vocal repertoire between isolated and social contexts, complexity does not increase further with the size of a mouse's current social group.
MATERIALS AND METHODS
Adult mice (age 2–5 months) on a B6 background were used to examine acoustic complexity across multiple social contexts. In the first context, isolated males were exposed to female cues (previously described in Warren et al., 2018a; n=19 males) or isolated females were exposed to male cues (n=35 females). The second context consisted of a male and female pairing (previously described in Warren et al., 2018b; n=10 males and 8 females). The third context consisted of groups of mice, two males and two females (previously described in Sangiamo et al., 2020; n=22 males and 22 females). All mice were raised in a colony in the Life Sciences Research Facility at the University of Delaware. Colony founders were purchased from Jackson Laboratory (Bar Harbor, ME). At 3 weeks old, mice were weaned, group-housed by sex (maximum of four per cage) and implanted with light-activated microtransponders (P-Chip injector, PharmaSeq Inc., Monmouth Jct, NJ) for identification purposes. Mice were housed in cages containing ALPHA-dri bedding (Animal Specialties and Provisions, LLC, Watertown, TN) and environmental enrichment, and allowed ad libitum access to food and water.
Individually recorded males were housed in isolation (n=2), with same-sex littermates (n=14), or in a breeding pair (n=3). Females recorded in isolation and all mice used in a social encounter were isolate-housed for at least 2 weeks before recordings to minimize group-housing effects on social behavior (Hilakivi-Clarke and Lister, 1992; Jones and Nowell, 1989; Konig, 1994). All mice were maintained on a 12 h:12 h light:dark cycle and experiments were conducted during the dark phase of the light cycle.
All experiments were conducted at the University of Delaware in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. The University of Delaware Animal Care and Use Committee approved all experimental protocols (protocol number: 1275).
To make mice visually distinct, the fur of each mouse recorded in a group social encounter was bleached in a unique pattern with hair dye (Clairol Born Blonde, Proctor and Gamble, Cincinnati, OH; Ohayon et al., 2013). Patterns were two vertical stripes, two horizontal stripes, one slash or five dots. If the dye faded over time, mice were repainted with the same pattern. For dyadic recordings, all males were painted with a five-dot pattern; females remained unpainted. For individually recorded males, as mice did not need to be visually distinguishable, hair dyeing depended on whether that male was being prepared for use in another experiment. All individually recorded females were first used in dyadic or group recordings; therefore, patterns depended on the experimental condition. No mice were used in both dyadic and group recordings.
USV emission is enhanced by previous experience with the opposite sex (Arriaga et al., 2012). Therefore, all mice recorded with a social partner were exposed to a single animal of the opposite sex for 10 min, 1 day after males were first marked with hair dye. Opposite-sex exposures were conducted for 11 singly recorded males and for every singly recorded female. Opposite-sex exposures occurred in a clean cage with no bedding, and sessions ended after 10 min or earlier to prevent successful copulation. Animals used as opposite-sex stimuli were never used in behavioral recordings.
Males are typically non-vocal in isolation unless female scent cues are present (Guo and Holy, 2007; Musolf et al., 2010). Thus, for males recorded without a social partner, female scent cues were introduced into the arena by allowing a female to explore the environment for 3–5 min and/or by introducing freshly soiled bedding from the cage of a female prior to recording (Warren et al., 2018a). For isolated female recordings, male scent cues were provided by having the male in the arena prior to the recording. Recordings of individual animals lasted for 10 min.
Females were only recorded in the estrus stage of the estrous cycle, as determined by non-invasive lavage (Sangiamo et al., 2020; Warren et al., 2018b). Lavages were performed 30 and 120 min before a potential recording for dyadic and group interactions, respectively. The vaginal cavity was flushed with 30 µl of saline, the resultant solution was placed on a microscope slide, and the cells were stained with crystal violet. Pictures of the cells were taken using a camera (World Precision Instruments, cat. #USBCAM50) attached to a microscope (VWR, cat. #89404-890) via a coupler (World Precision Instruments, cat. #501381). Estrous stage was determined by assessing the observed cell types, with estrus defined as the majority of cells being anucleated, cornified squamous epithelial cells (Cora et al., 2015). Mice were recorded with each social partner only once. Mice in dyadic recordings participated in multiple interactions (Warren et al., 2018b).
All experiments were conducted in an anechoic chamber while audio and video data were recorded. Audio data were sampled by an 8-channel microphone array (microphones from Avisoft-Bioacoustics, Glienicke, Germany; cat. #CM16/CMPA40-5V) using equipment from National Instruments (Austin, TX; cat. #PXIe-1073, PXIe-6356, BNC-2110). The signal from each microphone was sampled at 250,000 Hz and low-pass filtered at 200 kHz (filter model 3384, Krohn-Hite, Brockton, MA). Microphones were surrounded by rings of LED lights to visually determine microphone position prior to recordings. Video data were recorded via a single overhead camera (FLIR, Richmond, BC, Canada; cat. #GS3-U3-41C6M-C) using BIAS software (https://bitbucket.org/iorodeo/bias/downloads/) developed by Kristin Branson, Alice Robie, Michael Reiser and Will Dickson. The camera was triggered at 30 Hz via a counter pulse sent from the PXIe-6356 hardware. The triggering pulse was sent to both the camera and the National Instruments equipment concurrently through a BNC splitter to facilitate alignment of the audio and video data. Custom-written MATLAB v. 2014b software (MathWorks, Natick, MA) was used to control all recording devices. All data were stored on a PC (Hewlett-Packard, Palo Alto, CA).
Recordings were conducted in a cage with walls constructed of mesh (McMaster-Carr, Robbinsville, NJ; cat. #9318T25) with a frame of extruded aluminium (cuboid arena: 76.2×76.2×61 cm width×length×height; cylindrical arena: 68.6×91.4 cm, diameter×height; 80/20 Inc., Columbia City, IN) surrounded by Sonex foam (Pinta Acoustic Inc., Minneapolis, MN; cat. #VLW-35). Recordings were conducted in the dark, with infrared lights (GANZ, Cary, NC; cat. # IR-LT30) positioned above the cage. The arena floor was covered with ∼1.5 cm of ALPHA-dri bedding to enhance contrast between the mice and the cage floor. A ruler, placed in the center of the arena, was used to convert camera pixels into meters. Two 15 s pre-tests were run before each recording. In the first, the LEDs surrounding each of the microphones were illuminated and used to determine the position of each microphone. In the second, the LEDs were turned off and the overhead infrared lights were turned on to confirm the focus of the camera. Following the two pre-tests, the ruler was removed while the infrared lights remained on for the duration of each recording. Recordings lasted 10, 30 and 300 min for the mice recorded in isolation, pairs and groups, respectively. To control for potential temporal differences in vocal emission, all analyses focused upon the first 10 min of recording.
A data analysis pipeline was created on the University of Delaware's Farber computer cluster. The pipeline was used to determine each mouse's trajectory, to localize the source of individual vocalizations and to extract information about the vocalizations assigned to each animal.
Machine learning approaches were used to automatically identify and track the position of each mouse (Motr; Ohayon et al., 2013). For every video frame, an ellipse was fitted around each recorded mouse, and the x- and y-positions of their bodies, their length and width, and their heading direction were calculated. Motr used the dyed patterns to identify individual mice within group recordings. Tracking accuracy was manually inspected in MATLAB.
Extracting and assigning vocalizations to individual mice required four major steps, outlined below.
Step 1: audio segmentation
Multi-taper spectral analysis was used to automatically extract continuous stretches of sound (Seagraves et al., 2016; Warren et al., 2018a) from the audio recordings across the 8 microphone channels. All data were bandpass filtered between 30 and 110 kHz. Then, temporally overlapping segments were Fourier transformed using discrete prolate spheroidal sequences as window functions (K=5, NW=3). Each time-frequency point was compared with noise using an F-test (Percival and Walden, 1993; P<0.05). This was repeated for multiple segment lengths to capture a range of spectral and temporal scales (FFT lengths NFFT=64, 128 and 256 points). Data were combined into a single spectrogram and convolved with a rectangular box (11 pixels in frequency by 15 in time) to fill in small gaps before continuous stretches of sound, each containing a minimum of 1500 pixels and lasting a minimum of 5 ms, were extracted. The minimum duration of 5 ms was chosen because localization accuracy plateaued for vocal signals lasting ≥5 ms (data not shown; replicating work from Neunuebel et al., 2015). Extracted signals were stored as frequency contours, i.e. traces of frequency over time.
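The gap-filling and extraction step can be sketched as follows. This is a minimal illustration, not the pipeline used here: it assumes the F-test has already produced a boolean time-frequency mask, treats a pixel as filled if any significant point falls inside the 11 × 15 box, and labels 8-connected components; the function name `extract_segments` and its parameters are hypothetical.

```python
import numpy as np
from scipy import ndimage


def extract_segments(mask, dt_ms, box=(11, 15), min_pixels=1500, min_dur_ms=5.0):
    """Fill small gaps in a binary time-frequency mask and keep large, long components.

    mask:  boolean array (n_freq_bins, n_time_bins); True marks points that passed
           the F-test (assumed to be computed upstream).
    dt_ms: duration of one time bin in ms (assumed known from the spectrogram).
    """
    # Box convolution: a pixel is "on" if any significant point lies inside the
    # 11 (frequency) x 15 (time) box centred on it, which bridges small gaps.
    filled = ndimage.convolve(mask.astype(float), np.ones(box), mode="constant") > 0
    # Connected components with 8-connectivity (an assumption, not stated in the text).
    labels, _ = ndimage.label(filled, structure=np.ones((3, 3), dtype=int))
    segments = []
    for idx, region in enumerate(ndimage.find_objects(labels), start=1):
        component = labels[region] == idx
        n_pixels = int(component.sum())
        duration_ms = (region[1].stop - region[1].start) * dt_ms
        if n_pixels < min_pixels or duration_ms < min_dur_ms:
            continue
        # Frequency contour: mean frequency-bin index of the component in each time bin.
        rows = np.arange(region[0].start, region[0].stop)[:, None]
        contour = (rows * component).sum(axis=0) / component.sum(axis=0)
        segments.append({"n_pixels": n_pixels, "duration_ms": duration_ms,
                         "t_start_bin": region[1].start, "contour": contour})
    return segments
```

Thresholding the box convolution at zero behaves like a morphological dilation, so the surviving components are slightly larger than the raw detections; every time bin inside a connected component contains at least one pixel, so the contour is always defined.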
Step 2: consolidating continuous signals
Mice are known to emit vocalizations containing frequency jumps, i.e. instantaneous jumps between two different frequencies. Owing to this discontinuity in frequency, our previous methods (Warren et al., 2018a) would extract each segment of such a vocalization as a separate signal. To characterize mouse USV complexity more accurately, we aimed to extract whole vocalizations, including complex vocalizations containing one or more frequency jumps. Therefore, in vocal recordings from isolated males, independent viewers manually assessed whether successive vocal signals were part of a single vocalization or belonged to different vocalizations. The independent viewers were four different members of the lab, each with at least a year of experience studying mouse ultrasonic vocalizations; inter-rater reliability was 96.4±1.5% (mean±s.d.), with a minimum value of 94.4%. Based on the assessment of the independent viewers, signals separated by less than 13 ms in time (start time of the second signal minus end time of the first signal) were consistently categorized as segments of a single vocalization that had been split into multiple parts. These vocalizations contained both continuous and complex components.
When an animal emitted a continuous vocalization, we assumed, a priori, that the whole sound came from a single animal. Thus, we could confidently assign the signal to a single animal before estimating the source location of each component. However, because complex signals can consist of multiple discontinuous sounds, it is possible that, instead of forming a single complex vocalization, the individual pieces of sound were actually separate vocalizations emitted by different animals. Consolidating these segments before determining which animal emitted each segment could therefore misidentify the vocalizer and misrepresent the vocal data. To avoid this potential pitfall, we asked trained viewers to assess vocal data from a single male and determine whether or not pairs of successive signals appeared to be single complex vocalizations containing frequency jumps. We found that successive signals separated by more than 10 kHz (|start frequency of the second signal − end frequency of the first signal|) were consistently categorized as vocalizations containing a frequency jump.
Based on the findings of our trained viewers, two criteria needed to be met to collapse vocal signals into vocalizations. First, successive signals had to be separated by less than 13 ms. Second, successive signals needed to differ in frequency by less than 10 kHz. If successive signals met both of these criteria, they were consolidated into a single vocalization. Signals falling within the temporal threshold but exceeding the frequency threshold were kept as independent elements until we could determine which mouse emitted each element.
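The two consolidation criteria can be expressed as a rule applied to each pair of successive signals. The sketch below is illustrative only: the `Signal` record and the function names are hypothetical, times are assumed to be in ms and frequencies in kHz, and signals are sorted by onset before grouping.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    t_start: float  # ms
    t_end: float    # ms
    f_start: float  # kHz
    f_end: float    # kHz


def classify_pair(first, second, max_gap_ms=13.0, max_jump_khz=10.0):
    """Apply the temporal and frequency criteria to two successive signals."""
    gap = second.t_start - first.t_end          # start of 2nd minus end of 1st
    jump = abs(second.f_start - first.f_end)    # |start freq of 2nd - end freq of 1st|
    if gap >= max_gap_ms:
        return "separate"
    if jump < max_jump_khz:
        return "merge"
    # Within the temporal threshold but exceeding the frequency threshold:
    # keep the elements apart until the emitter of each one is known.
    return "candidate_jump"


def consolidate(signals):
    """Group successive signals that meet both criteria; returns a list of groups."""
    groups = []
    for sig in sorted(signals, key=lambda s: s.t_start):
        if groups and classify_pair(groups[-1][-1], sig) == "merge":
            groups[-1].append(sig)
        else:
            groups.append([sig])
    return groups
```

Pairs labelled `"candidate_jump"` stay as separate groups at this stage; they are revisited only after sound-source localization.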
Step 3: sound source localization and signal assignment
Step 4: concatenating complex vocalizations
At this point, all unambiguously continuous vocalizations were assigned to the vocalizing mouse. However, vocal signals meeting the criteria for complex vocalizations remained separated. Thus, we next determined whether these signals were multiple vocalizations emitted by multiple animals or, instead, individual components of a complex vocalization emitted by a single mouse. If successive signals were emitted by a single animal, they were consolidated into a single vocalization. If the two signals were emitted by two different vocalizers, each was extracted as a separate vocalization. If one of the two signals did not meet the threshold for assignment, meaning we were less than 95% confident of the identity of its vocalizer, we assessed the MPI values for the unassigned signal. We temporarily assigned the vocalizer as the mouse with the highest MPI value, i.e. the mouse most likely to have emitted that signal. If that mouse matched the vocalizer of the assigned signal, we consolidated the two signals into a single vocalization. If not, we extracted the two signals as two separate vocalizations, one assigned to a mouse and one unassigned. Once this was completed for all pairs of successive signals, all vocalizations lasting <10 ms were removed.
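The decision logic for a flagged pair of successive signals might look like the sketch below. The function names and the representation of MPI values as a per-mouse dictionary are assumptions for illustration. The case in which neither signal is assigned is not described in the text, so the sketch simply keeps such signals separate.

```python
def resolve_pair(owner_a, owner_b, mpi_a, mpi_b):
    """Decide whether two successive signals form one complex vocalization.

    owner_a / owner_b: ID of the assigned mouse, or None if the signal did not
        reach the 95% assignment threshold.
    mpi_a / mpi_b: dict mapping mouse ID -> MPI value for that signal
        (hypothetical representation; only consulted for an unassigned signal).
    Returns ("merge", mouse_id) or ("separate", None).
    """
    if owner_a is not None and owner_b is not None:
        return ("merge", owner_a) if owner_a == owner_b else ("separate", None)
    if owner_a is None and owner_b is None:
        return ("separate", None)  # ASSUMPTION: this case is not described in the text
    assigned = owner_a if owner_a is not None else owner_b
    mpi_unassigned = mpi_a if owner_a is None else mpi_b
    # Provisional emitter = mouse with the highest MPI value for the unassigned signal.
    provisional = max(mpi_unassigned, key=mpi_unassigned.get)
    return ("merge", assigned) if provisional == assigned else ("separate", None)


def drop_short(vocalizations, min_ms=10.0):
    """Remove vocalizations shorter than 10 ms; each item is a (t_start, t_end) pair in ms."""
    return [v for v in vocalizations if v[1] - v[0] >= min_ms]
```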
Once we had extracted and determined which mouse emitted each vocalization, we next aimed to determine whether acoustic complexity differed across the three social conditions. Only vocalizations assigned to specific animals were included in analyses. For the 1-mouse context, vocalizations assigned to any of the virtual mice were excluded. We quantified complexity using two different metrics. First, we compared the proportion of vocalizations emitted by individual mice in each context that had zero (i.e. continuous), one, or multiple frequency jumps (Fig. 3). Second, we determined the number of vocalizations containing a discrete change in the direction of frequency modulation (i.e. any point in the vocalization where the frequency was increasing and switched to decreasing, or vice versa; Seagraves et al., 2016). If a vocalization contained at least one such turning point, we considered it complex. This second metric required smoothing the vocalizations. Therefore, we first normalized the data such that every vocalization spanned 100 time points and had an average frequency of zero. We then applied an envelope function to each vocalization to mark its upper and lower bounds, and averaged the upper and lower envelopes to generate a smooth line through the signal, which was used to assess complexity.
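Both complexity metrics can be sketched in Python under several assumptions. The text does not say which envelope function was used, so the sketch draws piecewise-linear envelopes through local maxima and minima (found with `scipy.signal.find_peaks`) and assumes linear interpolation when resampling to 100 points. An envelope defined symmetrically around the mean (such as a Hilbert-transform magnitude) would not work here, because the average of its upper and lower bounds is a flat line. All function names are hypothetical.

```python
import numpy as np
from scipy.signal import find_peaks


def jump_category(n_jumps):
    """Metric 1: continuous (0 jumps), 1-jump, or multi-jump."""
    return "continuous" if n_jumps == 0 else ("1-jump" if n_jumps == 1 else "multi-jump")


def normalize(freq, n_points=100):
    """Resample a frequency contour to 100 time points and remove its mean."""
    t_old = np.linspace(0.0, 1.0, len(freq))
    f = np.interp(np.linspace(0.0, 1.0, n_points), t_old, np.asarray(freq, float))
    return f - f.mean()


def _envelope(x, extrema):
    # Piecewise-linear envelope through the extrema, anchored at both end points.
    knots = np.unique(np.concatenate(([0], extrema, [len(x) - 1])))
    return np.interp(np.arange(len(x)), knots, x[knots])


def midline(x):
    """Average of the upper (through maxima) and lower (through minima) envelopes."""
    upper = _envelope(x, find_peaks(x)[0])
    lower = _envelope(x, find_peaks(-x)[0])
    return (upper + lower) / 2.0


def count_turning_points(freq):
    """Metric 2: number of reversals in the direction of frequency modulation."""
    slope = np.sign(np.diff(midline(normalize(freq))))
    slope = slope[slope != 0]  # ignore flat stretches
    return int(np.count_nonzero(slope[1:] != slope[:-1]))
```

A vocalization would then be counted as complex under the second metric when `count_turning_points` returns at least 1.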
Vocal clustering: k-means
As unique components of vocalizations containing frequency jumps could be separated by as much as 13 ms, some of the complex signals contained temporal gaps, or periods of time with no frequency data. However, our clustering algorithm required all input to be continuous. Therefore, we represented temporal gaps in vocalizations with frequencies of zero. Then, we began the clustering program by splitting all vocalizations into two clusters. Across all vocalizations categorized into cluster one, we found the mean and standard deviation of normalized frequency at each of the 100 time points. This process was repeated for all vocalizations categorized into cluster two. If more than 3% of vocalizations within either of the clusters had more than 25 of their 100 data points falling outside of 2.5 standard deviations from the group mean, we considered that there was too much variability in the shapes of the vocalizations within the cluster. Therefore, we reran the clustering program with one additional cluster, continuing to add one cluster at a time until we found the optimal number of clusters to sufficiently encompass the variability across all vocalizations. With each iteration, we saved the cluster identity of each vocalization so that we could track how clusters changed as we further segmented the data.
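A minimal sketch of this progressive k-means procedure is shown below, using only NumPy. Assumptions: temporal gaps are marked as NaN before being set to zero, a plain Lloyd's k-means with k-means++ seeding stands in for whatever k-means implementation was actually used, and the upper limit `k_max` is an arbitrary safeguard not mentioned in the text. The function names are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with k-means++ seeding (NumPy only)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - np.array(centers)[None]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum() if d2.sum() > 0 else None
        centers.append(X[rng.choice(len(X), p=p)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def too_variable(X, labels, sd_mult=2.5, max_points=25, max_frac=0.03):
    """True if >3% of any cluster's members have >25 of 100 points beyond 2.5 SD."""
    for j in np.unique(labels):
        members = X[labels == j]
        mu, sd = members.mean(axis=0), members.std(axis=0)
        n_out = (np.abs(members - mu) > sd_mult * sd + 1e-12).sum(axis=1)
        if np.mean(n_out > max_points) > max_frac:
            return True
    return False


def progressive_kmeans(contours, k_max=40, seed=0):
    """Add clusters one at a time until no cluster is too variable.

    contours: (n_vocalizations, 100) normalized frequencies; NaN marks temporal gaps.
    Returns the final number of clusters and the labels saved at every k.
    """
    X = np.nan_to_num(np.asarray(contours, float), nan=0.0)  # gaps -> frequency 0
    history = {}
    for k in range(2, k_max + 1):
        history[k] = kmeans(X, k, seed=seed)
        if not too_variable(X, history[k]):
            return k, history
    return k_max, history
```

Keeping `history` mirrors the description of saving cluster identities at every iteration, which the later elbow analysis relies on.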
Vocal clustering: elbow function
Once we determined the optimal number of clusters to explain the variability across all of our vocalization data, we separated the vocalizations in each of the clusters into three groups: vocalizations with zero, one, or multiple frequency jumps. As we had saved the cluster identities with each iteration, we knew the cluster identity of each vocalization when there were only two clusters, when there were only three clusters, all the way through the optimal number of clusters. Starting with the continuous vocalizations, we computed the average variability in shape within and between vocalizations from each of the first two clusters. Variability within a cluster was calculated using Euclidean distance to compare the shapes of random pairs of vocalizations within the cluster. Variability between the two clusters was calculated using Euclidean distance to compare the shapes of random pairs of one vocalization from within the cluster of interest and one vocalization from the other cluster. We found the average of all within-cluster distances and all between-cluster distances and generated a ratio of within divided by between. This process was repeated with three through our pre-determined optimal number of clusters to provide us with difference ratios for all possible numbers of clusters with continuous vocalizations.
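The within/between difference ratio for one partition of the data could be computed as in the sketch below. The number of random pairs per cluster (`n_pairs`) and the pooling of distances across clusters are assumptions; in the procedure described above, such a function would be called once per number of clusters, using the labels saved at that iteration.

```python
import numpy as np


def difference_ratio(X, labels, n_pairs=500, seed=0):
    """Mean within-cluster distance divided by mean between-cluster distance.

    X: (n, 100) vocalization shapes; labels: cluster ID per row.
    Distances are Euclidean between randomly drawn pairs of vocalizations.
    """
    rng = np.random.default_rng(seed)
    within, between = [], []
    for j in np.unique(labels):
        inside = np.flatnonzero(labels == j)
        outside = np.flatnonzero(labels != j)
        if len(inside) < 2 or len(outside) == 0:
            continue
        a, b = rng.choice(inside, n_pairs), rng.choice(inside, n_pairs)
        keep = a != b  # avoid comparing a vocalization with itself
        within.append(np.linalg.norm(X[a[keep]] - X[b[keep]], axis=1))
        c, d = rng.choice(inside, n_pairs), rng.choice(outside, n_pairs)
        between.append(np.linalg.norm(X[c] - X[d], axis=1))
    return np.concatenate(within).mean() / np.concatenate(between).mean()
```

A ratio near 1 means clusters are no tighter than the data as a whole; well-separated shapes give a ratio close to 0.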
We fitted a line to the difference ratios across each possible number of clusters (2–23) and applied an elbow function to the result. The elbow function moves along the curve one bisection point (cluster) at a time and fits two lines: one to all points to the left of the bisection and one to all points to the right. The bisection point that minimizes the summed error of the two fits, i.e. the point of optimal trade-off between within- and between-cluster variability, is then determined. This indicates the optimal number of clusters. This analysis was also applied to both the 1-jump and multi-jump data to determine the optimal number of shapes needed to characterize continuous, 1-jump and multi-jump vocalizations. This methodology allowed us to determine the optimal number of vocal groups at each level of acoustic complexity.
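The two-line elbow search can be sketched as follows. As an assumption, the bisection point is included in both fits, so each side always contains at least two points; the function name `elbow` is hypothetical.

```python
import numpy as np


def elbow(x, y):
    """Return the x value whose bisection gives the smallest summed two-line error."""
    x, y = np.asarray(x, float), np.asarray(y, float)

    def sse(xs, ys):
        # Sum of squared residuals of a least-squares straight-line fit.
        coef = np.polyfit(xs, ys, 1)
        return float(((ys - np.polyval(coef, xs)) ** 2).sum())

    # Bisection index b splits the curve into x[:b+1] (left) and x[b:] (right).
    errors = [sse(x[:b + 1], y[:b + 1]) + sse(x[b:], y[b:]) for b in range(1, len(x) - 1)]
    return x[1 + int(np.argmin(errors))]
```

For example, with `x` running over cluster numbers 2 to 12 and ratios that fall steeply up to 7 clusters and only slowly afterwards, `elbow(x, y)` returns 7.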
Previous work using our progressive k-means clustering method has shown that male mice emit behavior-associated vocal signals, and this pattern holds when applying other clustering methods (Sangiamo et al., 2020). However, a vital aspect of our clustering procedure is the fact that we incorporate information about the vocalizing animal. Here, we added the quantification of structurally complex, multi-syllabic vocalizations, which allows us to attribute entire multi-syllabic vocalizations to individual animals. Thus, with this novel capacity to represent the full range of mouse USVs, we used the term vocalization type as a categorical label for the final clustering output.
Quantifying cluster similarity
To confirm that our clustering method was not being driven by a specific social context, we next quantified the within- and between-type similarity for each context separately (Fig. S3A–C). Thus, within a context we used a Euclidean distance measure to assess the similarity of vocalizations within a type. We randomly split the vocalizations from a type into two equally sized groups, excluding a single vocalization in the case of an uneven number of vocalizations, and found the distance between each pair of vocalizations (the first vocalization from each half, the second vocalization from each half, etc.).
To assess similarity between types, we used the same Euclidean distance measure, but compared vocalizations from different types. Thus, we first extracted all vocalizations of the type of interest. We then randomly selected that many vocalizations from the remaining vocalizations (i.e. vocalizations from all other types), and found the distance between each pair of vocalizations (the first vocalization from the type of interest versus the first vocalization from the remaining data, the second vocalization from both categories, etc.). This was repeated for all types of vocalizations to determine whether vocalizations of the type of interest were more similar to each other than they were to vocalizations of other types from the same context. This was repeated separately for all three social contexts.
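The within- and between-type distances for a single context can be sketched as below. Vocalizations are assumed to be arrays of normalized 100-point contours, and drawing the between-type sample without replacement assumes that the remaining types contain at least as many vocalizations as the type of interest. Function names are hypothetical.

```python
import numpy as np


def within_type_distance(X_type, seed=0):
    """Randomly split one type into two equal halves and pair them element-wise."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_type))
    half = len(idx) // 2                      # drops one vocalization if n is odd
    a, b = idx[:half], idx[half:2 * half]
    return np.linalg.norm(X_type[a] - X_type[b], axis=1).mean()


def between_type_distance(X_type, X_other, seed=0):
    """Pair each vocalization of the type with a random vocalization of other types."""
    rng = np.random.default_rng(seed)
    pick = rng.choice(len(X_other), size=len(X_type), replace=False)
    return np.linalg.norm(X_type - X_other[pick], axis=1).mean()
```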
To quantify vocal similarity across social contexts, we used a similar Euclidean distance measure but applied it across all three contexts (Fig. S3D). To quantify within-type similarity, we found all vocalizations from the type of interest across all three social contexts. We determined which context contained the fewest vocalizations of the type of interest, and randomly extracted that many vocalizations from the type of interest in each context. For example, 297, 598 and 723 type 1 vocalizations were emitted in the 1-mouse, 2-mouse and 4-mouse contexts, respectively, so we randomly selected 297 type 1 vocalizations from each context. We next made pairwise comparisons, comparing the first vocalization of the type of interest between all possible pairs of contexts: 1-mouse versus 2-mouse, 2-mouse versus 4-mouse, and 1-mouse versus 4-mouse. This was repeated for the second vocalization through the final vocalization, and all resultant distances were averaged to generate a summary value for the type of interest. To generate a between-type comparison, we also randomly selected 297 vocalizations from each of the other types. Then we compared type 1 vocalizations emitted in the 1-mouse condition only with type 2 vocalizations from the same context. This was replicated for all 3 contexts and averaged to get a summary value for the difference between type 1 and type 2 vocalizations within a social context. This analysis was repeated for each possible pair of types to determine how similar vocalizations within a type were across contexts compared with how similar vocalizations were within a context but across types.
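The cross-context within-type comparison, which subsamples each context to the smallest count (297 in the type 1 example), can be sketched as below. The dictionary layout and the function name are hypothetical; the between-type summary would reuse the same element-wise distance, but within a single context.

```python
import numpy as np
from itertools import combinations


def cross_context_within_type(by_context, seed=0):
    """by_context: dict mapping context name -> (n_c, 100) array of one type's vocalizations.

    Subsample every context to the smallest count, then average the distances of
    element-wise pairs for every pair of contexts. Returns (mean distance, n_min).
    """
    rng = np.random.default_rng(seed)
    n_min = min(len(v) for v in by_context.values())
    sub = {c: v[rng.choice(len(v), n_min, replace=False)] for c, v in by_context.items()}
    dists = [np.linalg.norm(sub[c1] - sub[c2], axis=1) for c1, c2 in combinations(sub, 2)]
    return float(np.concatenate(dists).mean()), n_min
```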
Quantifying proportional vocal emission
To determine how mice vocalized across social contexts, we assessed how frequently every mouse emitted each type of vocalization. To control for differences in the number of signals emitted by individual animals, we generated a vocal proportion. Thus, for each mouse, we determined the total number of vocalizations emitted. Then, we found the proportion of each type of vocalization out of the total number of vocalizations emitted (i.e. we calculated the total number of signals that the mouse emitted of each type and divided those values by the total number of signals emitted by that mouse). This was repeated for all mice in all conditions.
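The per-mouse vocal proportion is a simple normalization. A sketch, assuming each assigned vocalization is available as a hypothetical (mouse ID, vocalization type) pair:

```python
from collections import Counter


def type_proportions(assigned):
    """assigned: iterable of (mouse_id, vocalization_type) pairs for one condition.

    Returns {mouse_id: {type: count_of_type / total_count_for_mouse}}.
    """
    counts = {}
    for mouse, vtype in assigned:
        counts.setdefault(mouse, Counter())[vtype] += 1
    return {m: {t: n / sum(c.values()) for t, n in c.items()} for m, c in counts.items()}
```

Because each mouse's proportions sum to 1, animals that vocalized heavily and sparsely contribute equally to group comparisons.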
All statistical tests were nonparametric. Mann–Whitney tests were used for pairwise comparisons. For group-wise comparisons, a Kruskal–Wallis test was used followed by Dunn–Sidak post hoc tests to correct for multiple comparisons. All alpha values were set to 0.05.
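SciPy provides `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` but no Dunn test, so the sketch below implements Dunn's pairwise z-test on pooled ranks (with the standard tie correction) followed by a Sidak adjustment of the p-values. It illustrates the technique and is not necessarily identical to the software used for the reported statistics; other implementations may differ slightly. The function name `dunn_sidak` is hypothetical.

```python
import numpy as np
from itertools import combinations
from scipy.stats import kruskal, mannwhitneyu, norm, rankdata


def dunn_sidak(groups):
    """Dunn's pairwise z-tests on pooled ranks, with Sidak-adjusted p-values."""
    data = np.concatenate(groups)
    ranks = rankdata(data)  # average ranks for ties
    n_total = len(data)
    _, tie_counts = np.unique(data, return_counts=True)
    tie_term = (tie_counts ** 3 - tie_counts).sum() / (12.0 * (n_total - 1))
    bounds = np.cumsum([0] + [len(g) for g in groups])
    mean_rank = [ranks[bounds[i]:bounds[i + 1]].mean() for i in range(len(groups))]
    pairs = list(combinations(range(len(groups)), 2))
    adjusted = {}
    for i, j in pairs:
        se = np.sqrt((n_total * (n_total + 1) / 12.0 - tie_term)
                     * (1.0 / len(groups[i]) + 1.0 / len(groups[j])))
        z = (mean_rank[i] - mean_rank[j]) / se
        p = 2.0 * norm.sf(abs(z))
        adjusted[(i, j)] = 1.0 - (1.0 - p) ** len(pairs)  # Sidak correction
    return adjusted
```

In use, `kruskal(iso, dyad, group)` gives the omnibus test, `dunn_sidak([iso, dyad, group])` the corrected pairwise comparisons, and `mannwhitneyu(x, y, alternative="two-sided")` a two-group comparison.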
Using an 8-channel microphone array to assign vocalizations (Fig. 1A; Fig. S1) to individual mice, we compared the complexity of male- or female-emitted vocalizations in three different contexts: an isolated context (when mice were exposed to urine and/or other bodily scent cues from the opposite sex), a dyadic context (when one male and one female were present) and a group context (when two males and two females were present). As shown in Fig. 1B–D, male mice vocalized in all three contexts (isolated: n=29 recordings, 3905 assigned vocalizations out of 4641 detected vocalizations; dyadic: n=13 recordings, 12,153 assigned male-emitted vocalizations out of 18,721 detected vocalizations; group: n=11 recordings, 22 males, 15,616 assigned male-emitted vocalizations out of 23,287 detected vocalizations). We therefore aimed to quantify how acoustic complexity differed across social contexts (Fig. 2A–C). Similar to previous work (Arriaga et al., 2012; Chabout et al., 2015, 2016; Holy and Guo, 2005; Matsumoto and Okanoya, 2018; Scattoni et al., 2008; Weiner et al., 2016), we split vocalizations into three groups based on complexity: signals without frequency jumps (continuous), signals with a single jump in frequency (1-jump) or signals with two or more jumps in frequency (multi-jump). We then found the proportion of vocalizations emitted by each male that was classified into each group (for visualization purposes, means averaged across individuals are represented in Fig. 2D). In the isolated condition, a median of 92.73% (IQR=85.12–94.55%) of vocalizations emitted by an individual male were continuous (Fig. 2E), with 6.78% (IQR=4.80–12.86%) having one jump and 0% (IQR=0–1.40%) having multiple jumps. In the dyadic condition, in contrast, a median of only 79.41% (IQR=78.09–90.73%) of vocalizations were continuous, while 15.44% (IQR=8.30–16.65%) had one jump and 3.69% (IQR=1.82–5.41%) had multiple jumps. In a group setting, acoustic complexity was similar to that in the dyadic condition.
For group contexts, 81.00% (IQR=72.07–88.59%) of vocalizations were continuous, 14.82% (IQR=10.40–21.33%) contained a single frequency jump, and 3.53% (IQR=1.01–6.01%) contained multiple jumps. The proportion of continuous vocalizations differed significantly across social contexts (Kruskal–Wallis with Dunn post hoc: χ²=18.34, P=1.04×10⁻⁴), with isolated males emitting a significantly greater proportion of continuous signals than males in a dyad (P=0.0084) or a group (P=0.002). The proportion of continuous vocalizations emitted by males did not differ between dyads and groups (P=0.98). Isolated males emitted a significantly lower proportion of vocalizations with one jump than males in a dyad (P=0.022) or a group (P=0.0007), with no difference between dyads and groups (P=0.97). Lastly, isolated males also emitted a significantly lower proportion of vocalizations with multiple jumps than males in either dyads (P=0.0006) or groups (P=0.0006), with no difference detected between dyads and groups (P=0.93). Thus, our findings suggest that males, regardless of the number of partners, increase the complexity of their vocal emissions in social compared with non-social contexts.
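The jump-based classification and the per-animal summaries can be sketched as follows, assuming each vocalization is available as a sampled pitch contour in kHz. The jump threshold (`JUMP_THRESHOLD_KHZ = 10`) and all helper names are hypothetical and are not taken from the paper, whose jump criterion is not described in this section. Quartiles also depend on the interpolation method, so IQRs from other software may differ slightly from the "inclusive" method used here.

```python
import statistics
from collections import Counter, defaultdict

# Hypothetical threshold: the paper's jump criterion is not given in this section.
JUMP_THRESHOLD_KHZ = 10.0
CLASSES = ("continuous", "1-jump", "multi-jump")


def count_jumps(contour_khz, threshold=JUMP_THRESHOLD_KHZ):
    """Count instantaneous frequency jumps: sample-to-sample changes >= threshold (kHz)."""
    return sum(1 for a, b in zip(contour_khz, contour_khz[1:]) if abs(b - a) >= threshold)


def complexity_class(contour_khz):
    """Label a contour as continuous, 1-jump or multi-jump."""
    n_jumps = count_jumps(contour_khz)
    if n_jumps == 0:
        return "continuous"
    return "1-jump" if n_jumps == 1 else "multi-jump"


def per_animal_percentages(calls):
    """calls: iterable of (animal_id, contour). Returns {animal_id: {class: percent}}."""
    counts = defaultdict(Counter)
    for animal_id, contour in calls:
        counts[animal_id][complexity_class(contour)] += 1
    return {
        animal_id: {c: 100 * cnt[c] / sum(cnt.values()) for c in CLASSES}
        for animal_id, cnt in counts.items()
    }


def median_iqr(values):
    """Median plus 25th and 75th percentiles ('inclusive' interpolation)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3
```

For example, the per-male percentages of continuous calls in one context could be passed to `median_iqr` to produce a "median (IQR)" summary of the kind reported above.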
As a secondary quantification of acoustic complexity, we determined whether each male-emitted vocalization contained a turning point in frequency, that is, a switch from increasing to decreasing pitch or vice versa (see Mahrt et al., 2013; Panksepp et al., 2007; Scattoni et al., 2008; Seagraves et al., 2016). Using this metric, male mice emitted a lower proportion of complex signals during 1-mouse recordings than during 2-mouse recordings (Fig. 2F; Kruskal–Wallis with Dunn post hoc: χ²=15.79, P=0.011; 1-mouse: median=48.39%, IQR=40.98–55.22%; 2-mouse: median=61.49%, IQR=57.26–62.49%) or 4-mouse recordings (P=0.0010; 4-mouse: median=61.29%, IQR=55.32–64.53%). No differences were observed between the 2- and 4-mouse contexts (P=1.0). These results therefore provide further evidence that a single social partner is sufficient for male acoustic complexity to reach an upper bound.
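The turning-point metric can be illustrated in the same way, again assuming a sampled pitch contour in kHz. The noise floor `MIN_STEP_KHZ` is a hypothetical parameter that keeps small fluctuations from counting as reversals. This simple sketch also does not distinguish a gradual reversal from a downward frequency jump, which a complete implementation would likely need to handle.

```python
# Hypothetical noise floor: the smoothing and tolerance used in the paper are not specified here.
MIN_STEP_KHZ = 0.5


def has_turning_point(contour_khz, min_step=MIN_STEP_KHZ):
    """True if the pitch contour reverses direction (rising -> falling or vice versa)."""
    directions = []
    for a, b in zip(contour_khz, contour_khz[1:]):
        step = b - a
        if abs(step) >= min_step:  # ignore flat or noisy steps
            directions.append(1 if step > 0 else -1)
    return any(d1 != d2 for d1, d2 in zip(directions, directions[1:]))


def percent_with_turning_point(contours, min_step=MIN_STEP_KHZ):
    """Percentage of contours that contain at least one turning point."""
    flags = [has_turning_point(c, min_step) for c in contours]
    return 100 * sum(flags) / len(flags)
```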
Although our results suggest that acoustic complexity differs based on the presence or absence of social partners, other factors may be at play. One possible explanation is that differences in prior experience contributed to the differences in acoustic complexity observed between singly and socially recorded mice. For the isolated recordings of male mice, several aspects of prior experience were not controlled. Some of the males used in the isolated recordings were marked with individual dye patterns, while others were unmarked. Some mice had previously been exposed to the opposite sex, while others had not. Some animals lived with other mice, while others were isolate housed. In contrast, all males used in recordings with social partners were marked with dye patterns, had previously been exposed to a female and were singly housed prior to recordings. To directly control for the potential impact of prior experience, we ran two additional analyses using recordings of isolated males that had the same experiences as males in a group context (i.e. prior opposite-sex exposure and dyed fur; n=11 isolated males). When comparing the acoustic complexity of males with similar experiences, we still found significant differences in the proportion of vocalizations containing a change in the direction of frequency modulation between the 1-mouse and 2-mouse contexts (Kruskal–Wallis with Dunn post hoc: χ²=12.75, P=0.01; 1-mouse: median=41.94%, IQR=31.28–51.99%; 2-mouse: median=61.49%, IQR=57.26–62.49%) as well as between the 1-mouse and 4-mouse contexts (P=0.002; 4-mouse: median=61.27%, IQR=55.32–64.53%). These results were replicated when comparing the proportion of vocalizations with a single frequency jump (Kruskal–Wallis with Dunn post hoc: χ²=9.98; 1-mouse versus 2-mouse: P=0.047; 1-mouse: median=6.45%, IQR=4.91–14.78%; 2-mouse: median=15.44%, IQR=8.30–16.65%; 1-mouse versus 4-mouse: P=0.006; 4-mouse: median=14.82%, IQR=10.40–21.33%).
Therefore, these results align with evidence from prior studies (Hanson and Hurley, 2012; Seagraves et al., 2016) and suggest that the presence of a social partner played a substantial role in increasing acoustic complexity.
While females were long thought to be silent during mixed-sex interactions, recent work has shown that female mice vocalize while engaged with males, albeit less frequently than males (Heckman et al., 2017; Neunuebel et al., 2015; Sangiamo et al., 2020; Warren et al., 2020a,b). We therefore sought to determine whether female mice modulate their acoustic complexity across social contexts. We found that females were vocally active in both dyadic (n=13 females, 1614 female-emitted vocalizations) and group (n=11 recordings, 1570 vocalizations from 22 females) settings. However, females were typically silent in isolation (non-social: n=35 females, 39 female-emitted vocalizations). The lack of female vocalizations in isolation is similar to prior findings (Maggio and Whitney, 1985) and precludes a statistically powered comparison with the two social contexts. Thus, we compared female vocal emission only between dyadic and group contexts. Fig. 2G shows the mean proportion of vocalizations categorized as continuous, 1-jump or multi-jump across individual females. Continuous vocalizations constituted 88.12% (IQR=86.63–93.14%) of vocalizations emitted by females in a dyadic condition compared with 89.09% (IQR=86.36–92.94%) in group settings (Fig. 2H; Mann–Whitney: ranksum=232, P=0.96). Vocalizations containing a single frequency jump constituted 9.90% (IQR=6.26–12.14%) of female-emitted signals in dyads compared with 9.21% (IQR=4.12–13.46%) in group settings (Fig. 2H; Mann–Whitney: ranksum=238, P=0.91). Finally, vocalizations with two or more jumps constituted 0.94% (IQR=0–1.93%) of the signals in dyadic settings versus 0.91% (IQR=0–2.67%) in group settings (Fig. 2H; Mann–Whitney: ranksum=263.5, P=0.94). When quantifying acoustic complexity using our secondary metric, whereby a signal was considered complex if it contained at least one reversal in the direction of frequency modulation (Fig. 2I), the proportion of complex signals emitted by females was again similar between dyadic and group contexts (Mann–Whitney: ranksum=275.5, P=0.16). During dyadic interactions, 68.15% (IQR=64.51–69.55%) of female-emitted vocalizations switched direction in frequency modulation. Similarly, 63.06% (IQR=60.71–69.00%) of female vocalizations during group interactions changed direction. Together, these results suggest that the complexity of female-emitted vocalizations, like that of male-emitted vocalizations, is unaltered by increasing the number of social partners.
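For the two-group comparisons, a minimal sketch of the Mann–Whitney (Wilcoxon rank-sum) test is shown below. It returns the rank-sum statistic W, the sum of the pooled ranks of the first sample (the quantity that MATLAB's `ranksum` function, for example, reports as `ranksum`), together with a two-sided p-value from the normal approximation with tie and continuity corrections. The function name is hypothetical, and the exact small-sample p-values that some software uses by default are not implemented.

```python
import math


def rank_sum_test(x, y):
    """Wilcoxon rank-sum / Mann-Whitney test.

    Returns (W, z, p): W is the sum of the pooled ranks of sample x; the two-sided
    p-value uses the normal approximation with tie and continuity corrections.
    """
    pooled = sorted(list(x) + list(y))
    first, last = {}, {}
    for pos, value in enumerate(pooled, start=1):
        first.setdefault(value, pos)
        last[value] = pos
    # Tied values share the mean of the 1-based ranks they occupy
    rank = {v: (first[v] + last[v]) / 2 for v in first}
    nx, ny = len(x), len(y)
    n = nx + ny
    w = sum(rank[v] for v in x)
    tie_term = sum((last[v] - first[v] + 1) ** 3 - (last[v] - first[v] + 1) for v in first)
    mean_w = nx * (n + 1) / 2
    var_w = nx * ny / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    diff = w - mean_w
    correction = 0.5 * ((diff > 0) - (diff < 0))  # continuity correction toward the mean
    z = (diff - correction) / math.sqrt(var_w)
    p = min(1.0, math.erfc(abs(z) / math.sqrt(2)))
    return w, z, p
```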
Previous work has shown that mice emit different types of vocalizations in different behavioral contexts (Matsumoto and Okanoya, 2018). Therefore, we assessed whether mice produced different types of vocalizations across the three social contexts. Using an automated k-means clustering algorithm, we clustered vocalizations pooled across the three contexts based on shape. The clustering algorithm defined 23 unique vocalization types (Fig. S2). Based on evidence that mice emit vocalizations containing frequency jumps in specific behavioral contexts (Matsumoto and Okanoya, 2018), we reasoned that vocalizations containing frequency jumps may convey different meaning than continuous signals. Consequently, we sorted the vocalizations within each of the 23 types into groups based on complexity. This produced 23 types of continuous vocalizations, 23 types of vocalizations with a single frequency jump and 23 types of vocalizations with multiple frequency jumps. Then, to determine the optimal number of types necessary to accurately represent the vocalizations at each complexity level, we applied the elbow method to the vocalization clusters from each group. We found that 26 vocalization types were optimal to encompass the variability in both vocalization shape and complexity (Fig. 3; Fig. S2): 7 continuous types (vocalization types 1–7), 12 types with one jump (types 8–19) and 7 types with multiple jumps (types 20–26). A distinguishing feature of this clustering approach is that it incorporates the identity of the vocalizing animal, ensuring that structurally complex, multi-syllabic vocalizations are produced by a single animal. To confirm that clustering was equally effective across all three social conditions, we used a Euclidean distance measure to compare within-type variability in shape (how similar in shape signals of the same type are) and between-type variability (how similar in shape vocalizations of different types are).
Within-type variability was consistently less than between-type variability (Fig. S3A–C). Furthermore, vocalizations within a type, but emitted in different contexts, were consistently more similar to each other than to different types of vocalizations emitted within the same context (Fig. S3D). These results demonstrate that, even with the addition of new vocalization types containing frequency jumps, our clustering method accurately grouped vocalizations with other similarly shaped signals across all social contexts.
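The clustering and validation steps can be sketched generically as follows: Lloyd's k-means with k-means++ seeding, an elbow heuristic that picks the k farthest from the straight line joining the ends of the normalized inertia curve, and a comparison of mean within-type versus between-type Euclidean distances. The sketch assumes that each vocalization has already been converted to a fixed-length shape vector (for example, a resampled pitch contour). The function names and this particular elbow criterion are assumptions, and the sketch does not reproduce the authors' animal-aware handling of multi-syllabic calls.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length shape vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with k-means++ seeding; returns (inertia, labels) of the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # k-means++: each new center is drawn with probability proportional to D^2
        centers = [rng.choice(points)]
        while len(centers) < k:
            d2 = [min(sq_dist(p, c) for c in centers) for p in points]
            centers.append(rng.choices(points, weights=d2)[0])
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                # Move each center to its members' mean; keep it in place if the cluster is empty
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centers[j]
                )
            if new_centers == centers:
                break
            centers = new_centers
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels)
    return best


def elbow_k(inertias):
    """Return the k (inertias[0] is k=1) whose normalized point lies farthest from the
    straight line joining the first and last points of the inertia curve."""
    lo, hi = min(inertias), max(inertias)
    span = (hi - lo) or 1.0
    xs = [i / (len(inertias) - 1) for i in range(len(inertias))]
    ys = [(v - lo) / span for v in inertias]
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    norm = math.hypot(x2 - x1, y2 - y1)
    dists = [abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / norm for x, y in zip(xs, ys)]
    return dists.index(max(dists)) + 1


def within_between(points, labels):
    """Mean Euclidean distance for same-type pairs versus different-type pairs."""
    within, between = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (within if labels[i] == labels[j] else between).append(math.dist(points[i], points[j]))
    return sum(within) / len(within), sum(between) / len(between)
```

In practice, inertias would be computed over a range of k within each complexity group and `elbow_k` applied to each curve; a well-separated clustering should then show a mean within-type distance well below the between-type distance.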
Having defined the vocalization types that mice emit, we next determined whether the emission of specific types of vocalizations differed across social contexts (Fig. 4; statistics in Table 1). Males in dyadic conditions produced fewer type 1 and 2 (continuous) vocalizations than males in isolation. In contrast, males emitted more of types 4–6 (continuous), types 8–11 and 18 (1-jump), and types 20–26 (multi-jump) during interactions with a female than in the absence of a social partner. During group interactions, males emitted significantly higher proportions of types 4–7 (continuous), types 8–11 and 16–19 (1-jump), and types 20–26 (multi-jump) than in non-social conditions. Most notably, however, no significant differences were observed in male vocal emission between dyadic and group contexts. Similarly, females significantly modulated the emission of only two vocalization types between dyadic and group contexts (Fig. 5; statistics in Table 2). These findings provide further evidence that male mice regulate vocal emission based on the presence (or absence) of a social partner. Moreover, vocal emission by both male and female mice is generally unaffected by the presence of additional social partners.
With the ability to assign a full repertoire of ultrasonic vocalizations to individual mice interacting with conspecifics, we examined the role that social context plays in mouse acoustic complexity. Our results indicate that male mice alter both the complexity of their vocalizations and the relative proportion of the vocalization types produced in isolated and social contexts. The most interesting finding, however, is that the complexity of both male and female vocal emissions, as well as the types of vocalizations emitted, is preserved as the number of social partners increases from 1 to 3.
Context-specific alterations in acoustic complexity have been shown in numerous species. Birds are one of the most studied examples: males often produce complex courtship songs to attract females (Byers and Kroodsma, 2009), and females are generally more attracted to males with larger repertoires of sound, altered phrasing or different song types (Buchanan and Catchpole, 1997; Collins, 1999; Mountjoy and Lemon, 1996; Reid et al., 2004). However, this phenomenon is not specific to birds. In geladas (Theropithecus gelada), females prefer to spend time near speakers playing complex calls rather than simple calls (Gustison and Bergman, 2016). In greater sac-winged bats (Saccopteryx bilineata), females consistently roost in the harems of males whose vocal emissions are more varied, rather than those of males producing fewer syllable types (Davidson and Wilkinson, 2004). In túngara frogs (Engystomops pustulosus), females are more attracted to complex calls (Rand and Ryan, 1981). Thus, across the animal kingdom, enhancing acoustic complexity may be reproductively advantageous for males.
Mice also show context-specific alterations in acoustic complexity, specifically between isolated and social conditions (Matsumoto and Okanoya, 2018). C57BL/6 mice emit more diverse signals while socially engaged than when exploring a novel environment or when isolated and under stress (Chabout et al., 2012). When a female is removed from a male–female dyad, fewer vocalizations containing frequency jumps are emitted, as are fewer vocalizations that decrease in pitch (Yang et al., 2013). However, once the female is returned to the arena, vocal activity resembles that seen in the original dyadic conditions. In contrast, CBA/CaJ mice emit more vocalizations after a female is removed than while she is present (Hanson and Hurley, 2012). Additional work showed that vocalizations are more complex in response to female scent cues than during social interactions (Chabout et al., 2015), which contrasts with our findings. One possible explanation for this discrepancy is that the two studies used different strains of mice: the previous study used B6D2F1/J mice, whereas we used C57BL/6J mice. Indeed, using C57BL/6J mice, Chabout et al. (2016) showed that more complex vocal syntax was produced during dyadic interactions than in isolation. Thus, these studies provide additional evidence that USV acoustic complexity may be regulated by context in different strains of mice.
Another example of the influence of social context on vocal emission in mice is the audience effect, whereby males upregulate the complexity of vocalizations in the presence of another male (Seagraves et al., 2016). However, males in the Seagraves et al. (2016) study were unable to access a live female. Instead, male vocal activity was recorded either solely in the presence of female scent cues or in the presence of female scent cues plus a live male. Our results extend this finding and directly show that male acoustic complexity also increases in the presence of a female listener. Taken together, these results suggest that male mice increase the complexity of vocal emissions in the presence of any social partner. Additionally, our results uncover a potential upper bound for mouse acoustic complexity that is reached with a single social partner, as the acoustic complexity of both males and females is maintained between contexts containing one or three partners. The findings are surprising because increasing the number of audience members is believed to increase acoustic complexity in many other species (Matos and Schlupp, 2005).
The internal states of an animal strongly influence vocal emission (Morton, 1977). In many species, motivational and emotional states may underlie context-dependent changes in the frequency [e.g. rats (Brudzynski, 2007; Burgdorf et al., 2008) and humans (Bachorowski and Owren, 1995)] or temporal structure [e.g. bats (Bastian and Schmidt, 2008) and tree shrews (Schehka et al., 2007)] of vocalizations. Moreover, vocal emission may be affected by arousal (Bell, 1974), as seen in prairie vole pups, where increases in heart rate co-occur with decreases in USV duration and complexity (Stewart et al., 2015). While these examples strongly argue that motivation, emotion and arousal play important roles in vocal emission, it is less clear how internal states influence mouse acoustic complexity and emission. Because we found that the complexity and types of mouse vocalizations are similar across the dyadic and group contexts, one possible explanation is that the underlying emotional, motivational or arousal states are similar in the two social contexts. Alternatively, internal states may have only a limited effect on mouse vocal emission, such that they merely push acoustic complexity towards an upper limit. The exact behavioral repertoire of individual animals may also be a driving factor underlying mouse vocal emissions. For instance, mice emit vocalizations containing frequency jumps prior to and during mounting behavior (Hanson and Hurley, 2012; Matsumoto and Okanoya, 2018). Additionally, more vocalizations are emitted when two males are investigating each other than when males are separated from each other (Seagraves et al., 2016). More directly, male mice have been shown to emit distinct types of signals during specific behaviors (Sangiamo et al., 2020).
Therefore, a potential explanation for the lack of vocal differences between social contexts is that vocalizations are closely linked to behavior, and the behavioral repertoire of mice in the two social conditions may have been similar. In all likelihood, both internal state and behavioral actions influence vocal emission, but determining the specific contributions of each variable requires further investigation.
Female vocal emission is highly regulated and influences dynamic social interactions. For instance, calls by female Alaskan moose (Alces alces gigas) are thought to discourage advances from subordinate males and ensure mating opportunities with more dominant males (Bowyer et al., 2011). Female dunnocks (Prunella modularis), while consistently vocal during territorial conflicts with rival females, are even more vocal when competing for male attention (Langmore and Davies, 1997). Female mice have also been shown to regulate their vocal emissions. For example, the sex of a female's social partner influences her latency to vocalize, with females taking longer to vocalize in same-sex dyads than mixed-sex interactions (Warren et al., 2020). Furthermore, female vocalizations appear to change the behavior of males in specific behavioral contexts (Warren et al., 2020). Familiarity with a social partner also impacts vocal emission, with females vocalizing more in the presence of a novel than a familiar animal (D'Amato and Moles, 2001). Interestingly, our findings indicate that females do not alter the complexity or types of vocalizations emitted based on the size of their social group, with vocalizations being indistinguishable in the presence of one or multiple listeners. However, female vocal rate increased whenever a listener was present, as few vocalizations were detected when recording females in isolation. These results could also be interpreted as an increase in complexity in the presence of a social partner, as any type of vocalization is more complex than silence. Thus, because the vocal behavior of females is strongly influenced by multiple environmental and social conditions and, importantly, these signals directly modulate social interactions, our working models of mouse social communication need to account for female vocalizations.
In conclusion, our ability to localize sounds as animals freely interact enabled us to directly show that the complexity of ultrasonic vocalizations emitted by individual mice increases in the presence of a social partner. Moreover, acoustic complexity appears to reach an upper bound with the addition of a single listener, as vocal emission is similar in dyadic and group interactions. Our findings provide the foundation for understanding the relationship between complex social communication and natural behavior, as well as disentangling the roles that internal states and external cues play in regulating the vocal emissions of both male and female mice.
We thank the staff from the Life Science Research Facility for assistance as well as James Farmer and Jaime Quesenberry for their help in building the acoustic chamber. We thank Anita Schwarz and other members of the University of Delaware Information Technologies for facilitating our ability to process the data. Finally, we thank the reviewers for all their helpful feedback.
Conceptualization: M.R.W., J.P.N.; Methodology: M.R.W., J.P.N.; Software: M.R.W., M.S.S., D.T.S., R.S.C.; Validation: M.R.W., J.P.N.; Formal analysis: M.R.W.; Investigation: M.R.W., M.S.S., D.T.S., R.S.C.; Resources: J.P.N.; Data curation: J.P.N.; Writing - original draft: M.R.W.; Writing - review & editing: M.R.W., M.S.S., D.T.S., R.S.C., J.P.N.; Visualization: M.R.W.; Supervision: J.P.N.; Project administration: J.P.N.; Funding acquisition: J.P.N.
This work was funded by the University of Delaware Research Foundation, General University Research Program, and the National Institutes of Health (2P20GM103653 and R01MH122752). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Deposited in PMC for release after 12 months.
The authors declare no competing or financial interests.