Understanding the regulation of social behavioural expression requires insight into motivational and performance aspects. While a number of studies have independently assessed these aspects of social behaviours, few have examined how they relate to each other. By comparing behavioural variation in response to live or video presentations of conspecific females, we analysed how variation in the motivation to produce courtship song covaries with variation in performance aspects of courtship song in male zebra finches (Taeniopygia guttata). In agreement with previous reports, we observed that male zebra finches were less motivated to produce courtship songs to videos of females than to live presentations of females. However, we found that acoustic features that reflect song performance were not significantly different between songs produced in response to videos of females, and those produced in response to live females. For example, songs directed at video presentations of females were just as fast and stereotyped as songs directed at live females. These experimental manipulations and correlational analyses reveal a dissociation between motivational and performance aspects of birdsong and suggest a refinement of neural models of song production and control. In addition, they support the efficacy of videos to study both motivational and performance aspects of social behaviours.
The extent and quality of various social displays, including communicative and courtship behaviours, reflect an individual's motivation and performance. Motivation refers to the ‘drive’ to display a behaviour, whereas performance refers to the fine motoric aspects of the behaviour. For example, internal and external states can affect the likelihood of displaying maternal behaviours (e.g. pup retrieval and grooming in rodents), and the latency and efficiency of pup-directed behaviours can vary between individuals, as well as within individuals over time (Champagne et al., 2003; Clark et al., 2002; Stolzenberg et al., 2012). Both the motivation to engage in maternal behaviours and the performance of various components of maternal behaviour have been found to have important developmental consequences in rodents, non-human primates and humans, and such findings highlight the importance of investigating both motivation and performance to gain a comprehensive understanding of social behaviour (Meaney, 2001; Rilling and Young, 2014). However, motivation and performance are often studied independently, and relatively little is known about the relationship between mechanisms regulating motivational and performance aspects of behaviour. In particular, little is known about the extent to which factors that affect the motivation to display a behaviour similarly affect the performance of the behaviour.
Birdsong provides an excellent opportunity to assess the degree to which mechanisms underlying motivational and performance aspects of social behaviour are shared or independent. When visually presented with an adult female, adult male songbirds become motivated to sing, dramatically increasing the likelihood of producing courtship song and the amount of time spent singing. Individual differences in this motivation are important because female songbirds tend to prefer males that display greater motivation to sing (i.e. produce more song) (Bradbury and Vehrencamp, 2011; Catchpole and Slater, 2008; Gil and Gahr, 2002; Sakata and Vehrencamp, 2012). Furthermore, males alter a number of vocal performance features when producing courtship songs compared with non-courtship songs (Chen et al., 2016; Moser-Purdy and Mennill, 2016; Sakata and Vehrencamp, 2012; Toccalino et al., 2016; Vignal et al., 2004; Woolley and Kao, 2015). For example, male zebra finches produce songs that are faster and more acoustically stereotyped when courting female conspecifics than when singing in isolation (Chen et al., 2016; Cooper and Goller, 2006; Kao and Brainard, 2006; Sossinka and Böhner, 1980; Woolley et al., 2014). These performance-related song traits can affect a male's attractiveness and reproductive success, since female songbirds prefer the courtship version of an individual male's song, as well as males with song features that are generally characteristic of courtship song (e.g. faster songs; Gil and Gahr, 2002; Podos et al., 2009; Woolley and Doupe, 2008).
Despite knowledge about the functional relevance of motivational and performance aspects of birdsong, little is known about how experimental variation in the motivation to produce courtship song relates to experimental variation in song performance. Brain areas that underlie the motivation to sing project to sensorimotor brain regions that regulate song performance, suggesting that song motivation could influence song performance (reviewed in Riters, 2012; Riters et al., 2004; Woolley and Kao, 2015). In addition, seasonal changes in the motivation to produce courtship song have been found to covary with seasonal changes in song performance (Smith et al., 1997, 1995). On the other hand, some studies have found a dissociation between song motivation and performance (Alward et al., 2013; Ritschard et al., 2011; Toccalino et al., 2016).
Here, we investigated variation in vocal performance across conditions that are known to modulate the motivation to produce courtship song in songbirds. Video playbacks of social stimuli have been used to elicit a wide range of social behaviours in a variety of taxa, including invertebrates, fishes, reptiles and birds (Evans and Marler, 1991; Fleishman and Endler, 2000; Gonçalves et al., 2000; Guillette and Healy, 2017, 2019; Oliveira et al., 1999; Ophir et al., 2005; Ord et al., 2002; Rosenthal, 1999; Uetz and Roberts, 2002; Ware et al., 2016). Video playbacks have also been used to elicit courtship song in songbirds (Galoch and Bischof, 2007; Ikebuchi and Okanoya, 1999; Takahasi et al., 2005). Although male songbirds produce courtship songs toward videos of females, they have been found to produce fewer courtship songs to videos of females than to live presentations of females, suggesting that they are less sexually motivated to produce courtship songs in response to videos (Ikebuchi and Okanoya, 1999). However, it is not known whether performance aspects of courtship song (e.g. tempo and stereotypy) are similarly reduced for songs produced in response to video presentations of females. Previous studies of other social behaviours have found that behavioural performance can be distinct when individuals are presented with video or live presentations of conspecifics (Balshine-Earn and Lotem, 1998; Ord et al., 2002; Swaddle et al., 2006). Consequently, we analysed motivational and performance aspects of male zebra finch song in response to video and live presentations of females.
MATERIALS AND METHODS
Adult male zebra finches [Taeniopygia guttata (Vieillot 1817); >4 months; n=13] were bred and raised in our colony at McGill University. Males were socially housed in same-sex group cages and visually isolated from females. Birds were kept on a 14 h:10 h light:dark photoperiod, with food and water provided ad libitum. All procedures were in accordance with McGill University Animal Care and Use Committee protocols, as well as guidelines from the Canadian Council on Animal Care.
Stimulus females were videorecorded using a Sony DCR-SR 220 HD camcorder at 60 frames per second. We gathered footage of individual females perched at camera-level in front of a neutral background. Adobe Premiere 2017 was used for minor white balance corrections, cropping and trimming. Playback clips featured a silent, perched female engaged in a moderate level of activity (e.g. movements of head and along perch but no flying; Movie 1) and ended on a black screen. A total of six females were filmed with three clips created per individual.
Behaviour testing and song collection
Fig. 1 illustrates the experimental setup used during song collection. Male finches were isolated in individual cages (20×20×20 cm) inside sound-attenuating chambers (‘soundboxes’; TRA Acoustics, Ontario, Canada) for at least 1 day prior to experiments. All songs were recorded using an omnidirectional microphone (Countryman Associates, Inc., Menlo Park, CA, USA) positioned directly above the male's cage. During experiments, song was detected, digitized and recorded using a sound-activated system [Sound Analysis Pro v.1.04 (http://ofer.sci.ccny.cuny.edu/html/sound_analysis.html) digitized at 44.1 kHz]. A Microsoft Surface Pro 3 tablet (2160×1440 pixels) was used to play back videos and was fixed to a wall of the soundbox. The tablet was placed in the soundbox at least 10 min before the onset of testing and was positioned ∼12 cm from the male's cage. We sized the video playback window such that the stimulus bird in the video was approximately life-size at the distance between the cage and tablet. The screen was blank (black) when not displaying video stimuli. A camera mounted above the tablet provided a live stream of the experimental bird for monitoring. All experiments began within 2 h of lights turning on.
During experiments, we collected courtship songs from male zebra finches using a design similar to that described by Toccalino et al. (2016). Specifically, each male was briefly (∼30 s) exposed to six different females, three via live presentations and three via video presentations. During live female presentations, an experimenter opened the soundbox door and placed a cage housing a conspecific female next to the experimental male's cage, and then closed the soundbox door. The female remained in the soundbox for the duration of the presentation and was removed thereafter. During video presentations of females, an experimenter opened the soundbox door, started a video of a female, and then closed the door to the soundbox.
Males were exposed to a total of 6 randomly chosen stimulus females from a pool of 6 videotaped females and 12 live females. Females that were videotaped were distinct from those used for live presentations. Video and live presentations were grouped into three blocks (blocks A–C; Fig. 1C), with each block consisting of three consecutive exposures to either a video or live presentation of an individual female (exposures 1–3), followed by three consecutive exposures to the other stimulus type. Within each block of video presentations, males were exposed to distinct video clips of the same female. All presentations were separated by 5 min intervals. The order of conditions (video versus live presentation) within a block was pseudo-randomly determined to balance the order of conditions. The first condition presented in each block was determined by a coin flip, and, if the first condition of the first two blocks were the same (e.g. video first for blocks A and B), the order was reversed for the last block, ensuring that no experimental session consisted of blocks that each started with the same condition.
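The pseudo-random ordering rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (a coin flip per block, with the last block reversed when the first two blocks start with the same condition); the function names `block_orders` and `session_schedule` are hypothetical and do not come from the study.

```python
import random


def block_orders(rng, n_blocks=3):
    """Return the condition ('video' or 'live') presented first in each block."""
    # Coin flip for the first condition of every block.
    first = [rng.choice(["video", "live"]) for _ in range(n_blocks)]
    # If the first two blocks start with the same condition, the last block
    # starts with the other one, so no session has three identical starts.
    if first[0] == first[1]:
        first[-1] = "live" if first[0] == "video" else "video"
    return first


def session_schedule(rng):
    """Return a list of (block, condition, exposure) tuples for one session."""
    schedule = []
    for label, first in zip("ABC", block_orders(rng)):
        second = "live" if first == "video" else "video"
        for condition in (first, second):
            # Three consecutive exposures per condition within a block.
            for exposure in (1, 2, 3):
                schedule.append((label, condition, exposure))
    return schedule


if __name__ == "__main__":
    for row in session_schedule(random.Random()):
        print(row)
```

Each session produced this way contains 18 exposures (three blocks, two conditions per block, three exposures per condition), matching the nine exposures per condition described in the text.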
We categorized a male's song as directed toward the live or video presentation of a female if at least two of the following conditions were met during song production: (1) the male approached or oriented toward the stimulus female; (2) the male fluffed his plumage; and (3) the male pivoted his body from side to side (James and Sakata, 2015; Kao and Brainard, 2006; Morris, 1954; Toccalino et al., 2016). Typically, male zebra finches produce courtship song within a few seconds of stimulus presentation. Males in this study each produced a minimum of three courtship songs to live or video presentations of females [18.0±3.3 and 10.0±2.1 (mean±s.e.m.) song bouts per male, respectively, toward live and video presentations of females].
We also collected non-courtship, or undirected (UD) songs (i.e. songs produced spontaneously when alone) during the experiment to contrast with courtship songs. Undirected songs were generally produced during the 5 min intervals between female exposures. In cases where few UD songs were produced between female presentations, UD songs produced in the 30 min before and after the testing period were used for analysis (16.2±2.9 UD song bouts per male) (e.g. James and Sakata, 2015; Sakata et al., 2008; Toccalino et al., 2016).
We used the following definitions for our analyses (Fig. 2). ‘Song bouts’ are defined as epochs of singing that are separated by at least 1 s of silence (e.g. Johnson et al., 2002; Poopatanapong et al., 2006). Each song bout consists of a stereotyped sequence of vocalizations called a ‘motif’ that is repeated throughout the bout (Sossinka and Böhner, 1980; Zann, 1996). Motifs consist of distinct vocal elements (‘syllables’) that are separated by at least 5 ms of silence. The first motif of a bout is preceded by repetitions of brief vocal elements called ‘introductory notes’.
Our primary measure of courtship song motivation was the total amount of time (seconds) that males engaged in courtship song across all exposures to live or video presentations of females (‘time spent singing’). We also deconstructed this measure into various components, including the likelihood that males will produce courtship song on a given exposure and the total duration of song during each exposure. We additionally broke down the total song duration during each exposure into the number of bouts produced during each exposure, and the duration of each of those bouts. Bout durations were defined as the interval from the onset of the first syllable to the onset of the last syllable of the bout.
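The bout definitions above lend themselves to a short worked example. The following Python sketch groups syllables into bouts using the 1 s silence criterion and computes bout duration as defined in the text (onset of the first syllable to onset of the last syllable). The input format, a sorted list of (onset, offset) times in seconds, and the function names are assumptions made for illustration; the study's own segmentation used custom MATLAB software.

```python
def segment_bouts(syllables, min_gap_s=1.0):
    """Group (onset_s, offset_s) syllables, sorted by onset, into bouts.

    A new bout starts whenever the silence since the previous syllable's
    offset is at least `min_gap_s` seconds.
    """
    bouts = []
    for onset, offset in syllables:
        if bouts and onset - bouts[-1][-1][1] < min_gap_s:
            bouts[-1].append((onset, offset))
        else:
            bouts.append([(onset, offset)])
    return bouts


def bout_duration(bout):
    """Onset of the first syllable to onset of the last syllable (seconds)."""
    return bout[-1][0] - bout[0][0]


def time_spent_singing(syllables):
    """Sum of bout durations across all bouts in a recording."""
    return sum(bout_duration(b) for b in segment_bouts(syllables))


if __name__ == "__main__":
    # Made-up syllable times, for illustration only.
    example = [(0.00, 0.10), (0.15, 0.30), (0.35, 0.50), (2.00, 2.10), (2.20, 2.40)]
    for bout in segment_bouts(example):
        print(len(bout), "syllables,", round(bout_duration(bout), 3), "s")
```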
We analysed song features that are consistently affected by social stimuli and that have been used as indices of song performance (Sakata and Vehrencamp, 2012). In particular, we measured the number of introductory notes preceding song, song tempo and the variability of the fundamental frequency (FF) of syllables with flat, harmonic structure (Chen et al., 2016; Cooper and Goller, 2006; James and Sakata, 2014; Kao and Brainard, 2006; Sakata et al., 2008; Stepanek and Doupe, 2010). For these analyses, we first manually labelled syllables and introductory notes following amplitude-based element segmentation using custom software written in MATLAB (The MathWorks, Natick, MA, USA). Introductory notes were quantified by starting with the note immediately preceding the first syllable of the bout and counting backwards until we reached ≥1 s of silence. Motif duration was defined as the duration from the onset of the first syllable of the motif to the onset of the last syllable of the motif and was used as the metric for song tempo (e.g. James and Sakata, 2015; Kao and Brainard, 2006; Sakata et al., 2008). We restricted the analysis of song tempo to the first motif of the bout, because motif durations have been found to change across the song bout and because bout durations differ between live and video presentations (see Results; Chi and Margoliash, 2001; Cooper and Goller, 2006; Glaze and Troyer, 2006). Finally, we computed the FF of syllables with flat, harmonic structure (e.g. syllables ‘c,’ ‘d,’ and ‘e’ in Fig. 2D) by calculating the autocorrelation of a segment of the sound waveform and measuring the distance (in Hz) from the zero-offset peak to the highest peak in the autocorrelation function. We measured the FF on each rendition of the syllable, and then computed the coefficient of variation (CV; standard deviation/mean) of FF across all renditions of the syllable (40.7±6.3 renditions) within each condition. 
The CV of FF was used as an index of acoustic stereotypy, with low CVs reflecting high stereotypy (Sakata et al., 2008; Toccalino et al., 2016). We computed these measures of song performance for UD songs, songs directed at videos of females [video-directed (VD) song], and songs directed at live females [live-directed (LD) songs].
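To make the FF and CV computations concrete, here is a minimal pure-Python sketch. It estimates FF as the sample rate divided by the lag of the largest autocorrelation value within a plausible pitch range, then computes the CV across renditions. The search range (`fmin`, `fmax`), the use of the sample standard deviation, and the function names are assumptions for illustration, not details reported in the study.

```python
import math
import statistics


def fundamental_frequency(segment, sample_rate, fmin=400.0, fmax=2000.0):
    """Estimate FF (Hz) from the largest autocorrelation value away from lag 0.

    `fmin` and `fmax` bound the lags searched; they are assumed values.
    """
    n = len(segment)
    mean = sum(segment) / n
    x = [s - mean for s in segment]
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_val = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(x[i] * x[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag


def coefficient_of_variation(values):
    """CV = standard deviation / mean (sample SD assumed here)."""
    return statistics.stdev(values) / statistics.mean(values)


if __name__ == "__main__":
    # Synthetic tone standing in for a flat, harmonic syllable segment.
    sr = 44100
    tone = [math.sin(2 * math.pi * 441.0 * i / sr) for i in range(2000)]
    print(fundamental_frequency(tone, sr))
```

A lower CV across renditions of the same syllable corresponds to higher acoustic stereotypy in the sense used in the text.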
We compared song motivation between experimental conditions for all 13 males. However, five males produced courtship songs only during live presentations of females; therefore, in our direct comparisons of song motivation and performance during live and video presentations of females, data were restricted to the eight males that produced songs during both live and video presentations of females. Data were computed for each exposure in which song was produced (total song duration per exposure, bout duration, first motif duration, introductory notes and fundamental frequency).
Statistical analyses were conducted in R 2.15.1. We used linear mixed models (LMMs) and generalized linear mixed models (GLMMs) within the ‘lme4’ library (Bates et al., 2015) to compare singing behaviour across experimental conditions. Our experimental design consisted of three testing blocks (blocks A–C), with each block consisting of three consecutive exposures to videos of a single female and three consecutive exposures to a live female (exposures 1–3; Fig. 1C). Therefore, we ran three-way factorial models with Block (A–C; ordinal), Exposure (1–3; ordinal), Condition (live versus video; nominal) and all possible interactions as fixed effects. Because of the repeated-measures nature of this design, we also included Bird ID as a random factor. Furthermore, because birds can produce multiple bouts within an exposure and because bout number can affect some song features (see Results), we also ran four-way full-factorial models with the same three fixed effects plus Bout (1–3; ordinal).
In the analysis of the likelihood to produce courtship song, we had one binary response variable (whether the bird produced at least one courtship song bout or not during each exposure); therefore, we ran this model as a GLMM with a binomial error family. The number of song bouts produced during each exposure and the number of introductory notes preceding song bouts (see above) were count responses; consequently, we ran these models as GLMMs with a Poisson error family. Total time spent singing, total song duration per exposure, and song bout durations were highly skewed; therefore, these data were analysed with a gamma error family and a log link (data plotted following log-transformation for ease of presentation). Finally, the total number of exposures in which at least one song bout was produced and the duration of the first motif were analysed with LMMs with a Gaussian error family. Prior to running the statistical models, data were visually screened to assess model fit using Q–Q plots. To test the significance within each mixed model, we ran Type II Wald χ2 tests using the ‘car’ library (Fox et al., 2011).
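For readers who prefer to see the model written out, the binomial GLMM for the likelihood of producing courtship song can be expressed as follows. The notation is ours and is only a sketch of the structure described above: $y_{icbe}$ indicates whether bird $i$ produced at least one courtship song bout in condition $c$, block $b$ and exposure $e$; the Greek terms are the fixed effects and their interactions; and $u_i$ is the random intercept for Bird ID.

```latex
\operatorname{logit}\,\Pr(y_{icbe} = 1)
  = \beta_0 + \alpha_c + \gamma_b + \delta_e
  + (\alpha\gamma)_{cb} + (\alpha\delta)_{ce} + (\gamma\delta)_{be}
  + (\alpha\gamma\delta)_{cbe} + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

The Poisson and gamma models share the same linear predictor, with a log link in place of the logit.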
We used a different statistical model to analyse experimental variation in the CV of FF. This is because the CV cannot be computed for a single rendition and needs to be computed across multiple renditions of the syllable. To provide reliable estimates of the CV of the FF of a particular syllable, we measured the CV across each rendition of a syllable for all songs produced during all blocks and exposures. The statistical model to analyse variation in the CV of FF also differs from those described above because birds can produce multiple syllables for which we calculated the CV of FF (n=13 syllables with flat, harmonic structure across the eight males). Consequently, for the analysis of the CV of FF, we ran an LMM with a Gaussian error family, and with Condition as the fixed effect and Syllable ID nested in Bird ID as a random effect so that we could directly compare the CV of the same syllable across conditions.
In addition to assessing differences in song performance across VD and LD songs, we also compared song performance of VD and LD songs with those of UD songs. For these analyses, we computed data for VD and LD song across all renditions of song (i.e. across blocks, exposures and bouts) and compared these values with those for UD song. We ran one-way models with similar parameterization as before, with Condition as the sole independent variable. We used a Poisson error family for introductory notes and a Gaussian error family for first motif durations and the CV of FF. Bird ID was a random variable for introductory notes and first motif durations, and Syllable ID nested in Bird ID was a random effect for CV of FF. For these analyses, we ran Tukey's tests with the Holm correction using the ‘multcomp’ library (Hothorn et al., 2008) for post hoc contrasts across the three conditions.
To gain further insight into the relationship between song changes across experimental conditions, we also analysed the extent to which motivational and performance changes driven by video presentations of females co-varied with motivational and performance changes driven by live presentations of females. We correlated the total amount of song a male produced in response to live presentations of females with the total amount of song a male produced to video presentations of females. In addition, we computed the percentage change of song features from UD to VD song and from UD to LD song, and correlated these changes. Because we measured the CV of FF of multiple syllables and because each syllable within a bird could change independently, we analysed these relationships using LMMs with the Gaussian error family with Bird ID as a random effect. All other relationships were analysed using Pearson's product–moment correlations.
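The percentage-change and correlation steps can be illustrated with a short Python sketch. The helper names are hypothetical and the numbers in the usage example are invented purely to show the calculation; they are not data from the study.

```python
import math


def percent_change(baseline, value):
    """Percentage change from a baseline (e.g. UD song) to another condition."""
    return 100.0 * (value - baseline) / baseline


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Invented first-motif durations (ms) for three hypothetical birds.
    ud = [600.0, 580.0, 620.0]
    vd = [590.0, 574.0, 605.0]
    ld = [588.0, 570.0, 603.0]
    change_vd = [percent_change(u, v) for u, v in zip(ud, vd)]
    change_ld = [percent_change(u, l) for u, l in zip(ud, ld)]
    print(pearson_r(change_vd, change_ld))
```

Expressing each change relative to the same bird's UD song puts birds with different baseline values on a common scale before the changes are correlated.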
Finally, we examined the extent to which individual variation in the differential motivation to produce courtship song to video and live presentations of females covaried with individual variation in the differential modulation of song performance across video and live presentations. Specifically, we correlated the difference in the amount of LD and VD song with the difference in the modulation of each song feature from UD to LD and UD to VD. Because the differential motivation to produce courtship song to videos or live presentations of females is summarised by one value per bird, we calculated the average change in the CV of FF across all syllables produced by each bird to relate to motivation (i.e. each bird has only one data point representing the average percentage change in the CV of FF of syllables). We used Pearson's product–moment correlations for all analyses.
Differences in the motivation to produce courtship song to video versus live presentations of females
We first counted, for each male (n=13), the number of exposures to live females or videos of females in which a male produced at least one bout of courtship song (out of nine exposures per male for each condition). We found that male zebra finches produced courtship song on significantly more exposures to live females [6.6±0.7 (mean±s.e.m.)] than they did to videos of females (3.6±0.9; χ2=19.0, P<0.0001). Upon further inspection, we noted that, while most birds produced courtship songs to both live and video presentations of females (n=8), five birds sang exclusively towards live females (no males sang exclusively towards videos of females). However, the number of exposures with courtship song remained significantly higher for live presentations of females even when analyses were restricted to males that produced courtship song to both video and live presentations of females (live: 7.8±0.8; video: 5.9±0.8; χ2=8.6, P=0.0034). The reduced courtship song production in response to videos suggests a lower motivation to produce courtship song to videos. It should be noted that variation in female video playbacks did not account for the difference in courtship song production between the five males that failed to produce any courtship song to video presentations of females and the eight males that produced at least one bout of courtship song to videos of females (see Table S1 and Appendix). Furthermore, variation in motivation to produce courtship song to live or video presentations was not linked to general variation in motivation to produce song as there was no significant correlation between the number of courtship songs and the number of undirected (UD) songs produced during the experimental period (r=0.12, P=0.7767), and no significant difference in the number of UD songs produced following video or live exposure to females (GLMM with Poisson error family: χ2=0.05, P=0.8245; see also Appendix).
The total amount of time a male spends singing towards a female is widely considered to be a reliable measure of song motivation. As such, we compared the total amount of time male zebra finches sang to live and video presentations of females (i.e. total duration of song across all exposures). Because this study is focused on experimental differences in motivation and performance features, and because performance features of video-directed songs cannot be computed for males that did not sing to videos of females, we limited our analyses to birds that sang in both conditions (n=8). Overall, we found that these birds produced significantly more song towards live females (116±18 s) than they did to videos of females (46±8 s; χ2=54.5, P<0.0001; Fig. 3A; Table S2).
Differences in the total amount of song produced to live versus video presentations of females could be caused by a number of factors, including variation in the probability of producing courtship song on each exposure, in the total amount of song on each exposure, in the number of song bouts produced on each exposure, and in bout durations. We first analysed whether males differed in their probability of producing courtship song towards live or video presentations of females on each individual exposure to a female stimulus. We ran a three-way GLMM with Condition (live or video), Block (A–C) and Exposure (1–3; ordinal) as independent factors, Courtship (0 or 1; binomial) as the response variable, and Bird ID as a random effect. We found significant effects of Condition (χ2=13.8, P=0.0323) and Block (χ2=21.1, P=0.0122), indicating that birds were significantly more likely to produce courtship song to live females than to videos of females and that the likelihood of a male producing courtship song decreased across blocks (Fig. 3B). In addition, there was a marginally significant interaction between Condition and Block (χ2=7.5, P=0.0588), with differences across conditions being larger for later blocks.
To further reveal the factors that contributed to the overall difference in the amount of courtship song produced to live versus video presentations of females, we examined the total amount of courtship song produced (in seconds) during each video or live exposure to a female (each exposure to a female stimulus was 30 s in duration; Table S2). We found significant effects of Condition (χ2=69.6, P<0.0001), Block (χ2=52.0, P<0.0001) and Exposure (χ2=27.8, P<0.0001) on total song duration per exposure (Fig. 3C). Overall, song durations per exposure were longer in response to live presentations of females than they were to video presentations, and durations decreased across blocks and exposures. In addition, there was a significant interaction between Condition and Block (χ2=20.3, P<0.0001), which was characterized by smaller changes in song durations across blocks for live presentations than for video presentations. As such, the difference in song durations between video and live exposures became larger over the blocks of testing.
Because birds produce courtship songs in bouts (i.e. epochs of song separated by ≥1 s of silence), differences in courtship song duration per exposure could be due to differences in the number of song bouts produced during each exposure as well as differences in the lengths of song bouts. Consequently, we first analysed the number of bouts that male zebra finches produced on each exposure (Table S2). We found a significant effect of Condition (χ2=5.2, P=0.0221; Fig. 3D), with males producing more song bouts per exposure to live presentations of females (2.32±0.18 bouts per exposure) than they did to video presentations (1.70±0.12 bouts per exposure). While the interaction between Condition and Block was not statistically significant, visual inspection of the data indicated a trend for the difference between video and live presentations to become larger over the blocks of testing.
To analyse song bout duration, we ran a four-way factorial mixed effects model with the same fixed factors as above (Condition, Block and Exposure) as well as Bout (i.e. the serial order of bouts within each exposure; ordinal). Because birds rarely produced more than three bouts in an exposure, we limited our analysis to the first three bouts per exposure (models were rank deficient when data for all bouts were included; Table S2). We found significant effects for all main factors (Condition: χ2=40.3, P<0.0001; Block: χ2=46.5, P<0.0001; Exposure: χ2=21.0, P<0.0001; Bout: χ2=129.6, P<0.0001), as well as three-way interactions between Block, Exposure and Bout (χ2=36.5, P<0.0001), and between Exposure, Condition and Bout (χ2=9.8, P=0.0431; Fig. 3E). We also observed a significant interaction between Block and Exposure (χ2=10.6, P=0.0318) and a marginal interaction between Block and Bout (χ2=8.4, P=0.0795). Overall, bout durations were longer for songs produced to live presentations of females than for songs produced to video presentations of females. Additionally, bout durations decreased across blocks, across exposures within blocks, and across bouts produced within each exposure to a female stimulus.
Because of the complexity of the four-way model, we conducted another analysis limited to the data from the first bout (i.e. Bout not included as an effect in the model) to obtain a simplified depiction of variation in song bout duration (Table S2). We observed significant main effects for all three factors (Condition: χ2=38.7, P<0.0001; Block: χ2=34.1, P<0.0001; Exposure: χ2=30.1, P=0.0001), as well as significant interactions between Block and Exposure (χ2=18.3, P=0.0011) and between Block and Condition (χ2=9.9, P=0.0071; Fig. 3F). Overall, the duration of the first bout of courtship song was longer for songs produced to live presentations of females than for songs produced to video presentations, and bout durations became shorter across blocks and exposures. The interactions were characterized by larger decreases across exposures during block A than during blocks B and C, and by larger decreases across blocks for video presentations than for live presentations.
Together, these analyses indicate that differences in total amount of courtship song in response to video and live presentations of females were due to differences in the likelihood of producing courtship song, the number of song bouts per exposure and the duration of individual song bouts.
Despite differences in the amount of courtship song produced to live versus video presentations of females, it is possible that individual variation in the motivation to produce courtship songs to videos of females is related to variation in the motivation to produce courtship songs to live presentations of females. Therefore, we correlated individual variation in the total amount of time males spent singing to live and video presentations of females. Consistent with the notion that motivation to court videos of females is related to the motivation to court live females, we found a significant correlation between the total amount of song produced to live versus video stimuli (n=8; r=0.73, P=0.0382; Fig. 3A).
Lack of differences in performance features of courtship songs produced to video versus live presentations of females
Our results suggest that male zebra finches are less motivated to court videos of females than live females, and we next sought to determine whether performance aspects also varied across songs directed at live or video presentations of females. To this end, we compared various measures of song performance (see Materials and Methods) between VD and LD songs among males that produced both types of songs (n=8 birds).
To analyse differences in the number of introductory notes preceding song, we first ran a four-way factorial model with Condition, Block, Exposure and Bout (limited to the first three bouts; see above) as fixed effects, Bird ID as a random factor, and the number of introductory notes before each bout of song as a Poisson response variable (Table S3). Importantly, we found no significant effect of Condition or interaction between Condition and other variables for the number of introductory notes. We only observed an effect of Bout (χ2=123.9, P<0.0001), with the number of introductory notes decreasing across consecutive bouts produced during an exposure to a stimulus (Fig. 4A). We also ran a similar analysis with data limited to the first bout of courtship song on each exposure (Bout excluded as a factor) and, again, found no significant variation across conditions, blocks and exposures (Fig. 4B).
To analyse variation in song tempo between VD and LD song, we calculated the duration of the first motif of each song bout and analysed experimental variation in first motif durations using the same four-way factorial model as above (only first three bouts for an exposure). Only the first motif in each bout was analysed for this comparison because motif durations change as bout length increases (e.g. Chi and Margoliash, 2001; Glaze and Troyer, 2006; James and Sakata, 2014; James and Sakata, 2015) and because bout lengths differed between VD and LD song (Fig. 3; Table S2). There was no significant effect of any factor, including Condition, on song tempo (Fig. 4C). We also ran a three-way factorial model using only data from the first bout produced per exposure and, again, found no significant effects (Fig. 4D).
The FF of syllables with flat, harmonic structure is less variable from rendition to rendition when males direct song at females (Sakata and Vehrencamp, 2012; Woolley and Kao, 2015). We calculated the CV of FF across all syllable renditions in every bout of song and compared this variability between conditions (i.e. Condition was the only independent variable). We found a marginally significant difference between Conditions (=3.7, P=0.0545; Fig. 4E) with VD song tending to have lower CVs than LD song. However, no significant difference between VD and LD song was observed when only data from the first bout were analysed (=2.3, P=0.1266; Fig. 4F). This change in the size of the difference between VD and LD song is primarily due to a decrease in the CV of FF for LD song when only the first bout of courtship song per exposure was analysed (compared with analysis of all bouts).
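For readers unfamiliar with the measure, the CV of FF can be sketched as below. This is a minimal illustration under stated assumptions: the FF values are hypothetical, the function name `coefficient_of_variation` is introduced here, and the use of the sample (n−1) standard deviation is an assumption rather than a detail taken from the Materials and Methods.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV (sample standard deviation divided by the mean)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Assumption: sample SD (n - 1 denominator); the study may differ.
    return statistics.stdev(values) / mean

# Hypothetical FF values (Hz) for renditions of one syllable; NOT real data.
ff_directed = [612.0, 615.0, 610.0, 613.0, 611.0]
ff_undirected = [605.0, 622.0, 598.0, 618.0, 609.0]

print(f"directed CV:   {coefficient_of_variation(ff_directed):.4f}")
print(f"undirected CV: {coefficient_of_variation(ff_undirected):.4f}")
```

Because the CV is normalized by the mean, it allows variability to be compared across syllables that differ in their absolute FF.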
In our analyses of song motivation, we generally found the largest difference in the amount of courtship song in Block 3 (Fig. 3). Therefore, to further evaluate the relationship between song motivation and performance, we analysed performance measures only during Block 3. Consistent with the analyses of all blocks, there was no significant difference between LD and VD song for introductory notes (=0.5, P=0.4719), motif duration (=0.04, P=0.8504) or the CV of FF (=0.5, P=0.3062).
Courtship songs produced to video or live presentations of females are distinct in performance from undirected song
Overall, the preceding analyses indicate a lack of difference between VD and LD songs for three performance measures: the number of introductory notes, song tempo and spectral stereotypy. However, these analyses do not explicitly indicate whether VD songs are distinct from non-courtship songs (undirected or UD songs) in the same way that LD songs differ from UD songs. We therefore compared performance measures of every VD and LD song of a male to all his UD songs. We found a significant effect of Condition for introductory notes (=51.3, P<0.0001) with post hoc contrasts indicating that both VD and LD songs were preceded by more introductory notes than UD songs and that VD songs were preceded by more introductory notes than LD songs (P<0.0030 for all). We also found a significant effect of Condition on first motif duration (=39.7, P<0.0001), with post hoc contrasts indicating that motifs were shorter during VD and LD songs than during UD songs (P<0.0001 for both). Finally, we found a significant effect of Condition on the CV of FF (=8.8, P=0.0120) with post hoc contrasts indicating that the CV of FF was lower for VD songs compared with UD songs (P=0.0089).
The preceding analyses indicated that VD songs were distinct from UD songs, with mixed results regarding LD songs. However, our analyses above highlight how performance features can change across bouts (Fig. 4) and how the number of bouts produced per exposure differed between video and live presentations of females (Fig. 3); consequently, the previous results are confounded by experimental variation in the number of bouts per exposure. To examine variation without this confound, we conducted the same analyses with data restricted to the first bout of song per exposure. In addition, we limited our UD song data to songs preceded by at least 30 s of silence to approximate the first bout restriction for VD and LD songs (see Materials and Methods). Differences between UD song and either LD or VD song in this analysis were consistent with those in the previous analysis that included data from all bouts. The number of introductory notes was significantly affected by Condition (=67.4, P<0.0001; Fig. 5A), with VD and LD songs being preceded by more introductory notes than UD song (P<0.0001 for each); the duration of the first motif was significantly different across Conditions (=34.3, P<0.0001; Fig. 5B), with first motif durations being shorter for VD and LD songs than for UD song (P<0.0002 for each); and the CV of FF was affected by Condition (=6.6, P=0.0366; Fig. 5C), with the CV of FF being significantly lower during VD song than during UD song (P=0.0318). Importantly, when data from only the first bout of LD and VD song were analysed, none of the song features differed significantly between VD and LD songs. Taken together, these data indicate that males change their song performance when directing songs at video presentations of females and that the nature and degree of these changes are comparable between VD and LD songs.
To further investigate whether the modulation of VD and LD songs was consistent within individuals, we correlated individual variation in the magnitude of change in song performance from UD song to VD song with individual variation in the magnitude of change from UD song to LD song (data from first bout only). The relationship was significantly positive for the number of introductory notes (r=0.72, P=0.0430; Fig. 6A) and the CV of FF (=14.4, P=0.0015; Fig. 6C). The relationship was positive but not statistically significant for first motif duration (r=0.53, P=0.1766; Fig. 6B).
Lack of relationship between experimental variation in motivational and performance aspects of song
The lack of difference between various aspects of LD and VD song performance contrasts with the difference in the motivation to produce LD and VD song. This suggests that the motivation to produce songs to live versus video presentations of females is independent of song performance. To further investigate the relationship between motivational and performance aspects of song, we assessed whether individual variation in the differential motivation to produce courtship songs to video versus live presentations of females correlated with individual variation in the differential modulation of performance features (introductory notes, motif duration and variability of FF) from UD (baseline) song to VD or LD song (first bouts only). Specifically, we calculated the difference in motivation as the difference in total time spent singing LD and VD song and correlated this difference with the difference in performance modulation, measured as the difference in percentage change from UD to LD song (modulation when singing LD song) and from UD to VD song (modulation when singing VD song). Overall, we observed no significant correlations between experimental variation in motivation and performance (Fig. 7; introductory notes: r=−0.15, P=0.7219; first motif duration: r=0.39, P=0.3380; CV of FF: r=0.51, P=0.1975). Relationships were also not significant when we correlated differences in motivation with the differences between LD and VD songs (i.e. performance measures not normalized by UD song).
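The difference scores described above can be made concrete with a short sketch. It is illustrative only: the function names and example values are hypothetical, and the sign convention (LD minus VD) is an assumption rather than a detail stated in the text.

```python
def percent_change(baseline, value):
    """Percentage change of a performance measure relative to UD (baseline)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (value - baseline) / baseline

def modulation_difference(ud, ld, vd):
    """Difference in performance modulation: (UD to LD) minus (UD to VD).

    Assumption: LD minus VD ordering; the opposite sign would flip r.
    """
    return percent_change(ud, ld) - percent_change(ud, vd)

def motivation_difference(ld_time_s, vd_time_s):
    """Difference in total time spent singing LD versus VD song."""
    return ld_time_s - vd_time_s

# Hypothetical values for one male; NOT the study's data.
# Mean number of introductory notes in UD, LD and VD song (first bouts).
ud_notes, ld_notes, vd_notes = 4.0, 6.0, 5.0
print(modulation_difference(ud_notes, ld_notes, vd_notes))  # performance score
print(motivation_difference(130.0, 45.0))                   # motivation score
```

Computing one such pair of scores per male yields the two variables that are then correlated across individuals for each performance feature.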
Male songbirds direct songs at females as part of their courtship ritual to secure copulations. This aspect of courtship can be analysed from both motivational and performance perspectives, with the former referring to the ‘drive’ to produce courtship song and the latter referring to the acoustic features of courtship song (e.g. song tempo and stereotypy). Both aspects of courtship song are important because deficits in either component can affect attractiveness and mating success (Gil and Gahr, 2002; Heinig et al., 2014; Sakata and Vehrencamp, 2012; Woolley and Doupe, 2008). However, little is known about the extent to which these aspects of courtship song are regulated by similar or distinct mechanisms. Indeed, because neural circuits regulating the motivation to sing project to brain areas that regulate song performance (Riters, 2012), it is possible that motivational and performance aspects of courtship song are linked.
Here, we took advantage of previous studies that outline experimental manipulations that affect the motivation to produce courtship song and assessed the degree to which such motivational variation was associated with variation in vocal performance. Previous studies indicate that male songbirds will produce courtship songs to videos of females but tend to be less motivated to sing to videos of females than to live presentations of females, as indicated by a reduction in the amount of time spent singing to video presentations (Galoch and Bischof, 2007; Ikebuchi and Okanoya, 1999; Takahasi et al., 2005). Consequently, we analysed whether the vocal performance of courtship songs produced to videos of females was distinct from songs produced to live females. Consistent with previous studies, we found that male zebra finches produced courtship songs to video presentations of female conspecifics but were less motivated to produce courtship songs to video presentations of females than live presentations of females. Specifically, males produced less than half the amount of courtship song during video presentations of females than during live presentations, and this difference was due to males being less likely to produce courtship songs during video presentations and producing shorter songs when courting videos of females (Fig. 3). Furthermore, whereas all males in this study produced courtship songs to live females, five males (out of 13) did not produce any courtship song to videos of females. Although other factors such as fatigue are important to consider in interpreting differences in the amount of song, we interpret variation in courtship song production as a reflection of variation in the motivation to court. Importantly, whereas zebra finch males appeared less motivated to produce courtship songs to videos of females, they produced courtship songs to videos that were indistinguishable in most ways from courtship songs produced to live females. 
In particular, the number of introductory notes preceding song, song tempo, and the variability of the fundamental frequency of syllables with flat, harmonic structure were not significantly different between VD and LD songs (Fig. 5). Consequently, these data support the notion that the motivation to produce courtship song is controlled by mechanisms independent of the regulation of song performance, a notion that is further supported by the finding that individual variation in motivation to produce VD and LD songs was not related to individual variation in the modulation of VD and LD performance (Fig. 7).
Such a dissociation between motivational and performance aspects of song has also been reported in previous studies (Cornil and Ball, 2010; Hampton et al., 2009; Kao and Brainard, 2006; Ritschard et al., 2011; Toccalino et al., 2016). For example, Toccalino et al. (2016) document that the familiarity of a female (i.e. repeated presentations of the same female) decreases the motivation of a male Bengalese finch to direct courtship song to that female but does not affect performance aspects of his courtship song. In addition, Alward et al. (2013) discovered that testosterone implants into the medial preoptic area increased the number of songs that male canaries produced to females but did not affect song performance measures such as song stereotypy. Collectively, these data suggest that distinct mechanisms contribute to motivational and performance aspects of birdsong and encourage experiments that further tease apart these aspects. Indeed, studies that revealed a dissociation between appetitive and consummatory aspects of copulatory behaviour (Moses et al., 1995; Pfaus et al., 1990; Riters et al., 1998; Seredynski et al., 2013) deeply shaped perspectives on social behavioural control and inspired a range of different experiments (Balthazart and Ball, 2007; Cornil et al., 2018).
Additionally, this interpretation suggests a need to revisit or build upon existing models of song motivation and control. Catecholamine (e.g. dopamine) release from midbrain and hindbrain circuits is hypothesized to contribute to the motivation to produce courtship song. For example, individual variation in the motivation to produce courtship song is correlated with variation in the number of dopamine-synthesizing neurons in the ventral tegmental area (VTA) of male zebra finches (Goodson et al., 2009), and manipulations of catecholaminergic neurons affect the likelihood that male zebra finches will produce courtship song to females (Barclay et al., 1996; Vahaba et al., 2013). The medial preoptic nucleus (POM) provides input to the VTA and the periaqueductal gray (PAG), and these inputs have also been proposed to influence the motivation to produce courtship song (Alward et al., 2013; Riters and Alger, 2004). Dopaminergic neurons in the VTA and PAG, and noradrenergic neurons in the locus coeruleus (LC) in turn project to various brain areas that regulate song control, including the avian basal ganglia nucleus Area X and the sensorimotor nucleus HVC (Appeltants et al., 2000; Castelino and Schmidt, 2010; Hamaguchi and Mooney, 2012; Maney, 2013; Tanaka et al., 2018), and dopamine or norepinephrine release into these areas affects neural activity and song performance (Cardin and Schmidt, 2004; Castelino and Ball, 2005; Ding and Perkel, 2002; Ihle et al., 2015; Leblois and Perkel, 2012; Leblois et al., 2010; Matheson and Sakata, 2015; Sasaki et al., 2006; Sizemore and Perkel, 2008; Solis and Perkel, 2006; Woolley, 2019). Taken together, this model suggests that variation in motivation should lead to variation in the amount of dopamine or norepinephrine released into areas like Area X or HVC, which should lead to variation in song performance. Our data do not support this model, suggesting that modifications or additional data are required.
For example, further knowledge about the precise neural populations that regulate the repetition of introductory notes (e.g. Rajan, 2018; Rajan and Doupe, 2013), song tempo (Long and Fee, 2008; Zhang et al., 2017) and the variability of fundamental frequency (reviewed in Woolley and Kao, 2015), and about the extent to which these specific populations receive catecholaminergic inputs would allow us to refine these models that link motivation and performance. Furthermore, discovery of neurochemical systems that independently modulate song motivation or performance would greatly contribute to our understanding of this dissociation.
In addition to addressing models of vocal communication and social behaviour in songbirds, our results also extend previous studies in important ways by demonstrating that video presentations of female conspecifics lead to changes in song performance comparable to those elicited by live presentations of females. The lack of significant differences in acoustic features between VD and LD song (Fig. 5) and the correlations in the degree of vocal modulations when males directed songs at live or video presentations of females (Fig. 6) indicate that videos of females are effective at eliciting the same suite of vocal performance changes as live presentations of females. From a mechanistic perspective, these data also suggest that videos of females engage the neural circuits for song performance to an extent comparable to live presentations of females. Neural activity in the anterior forebrain pathway (AFP) regulates context-dependent changes in the variability of fundamental frequency (reviewed in Brainard and Doupe, 2013; Murphy et al., 2017; Sakata and Vehrencamp, 2012; Woolley and Kao, 2015), whereas neural activity in the vocal motor pathway (VMP) has been proposed to regulate context-dependent changes to temporal features of songs such as song tempo or the number of introductory notes preceding song (Hampton et al., 2009; Matheson et al., 2016; Rajan and Doupe, 2013; Stepanek and Doupe, 2010). Although positive behavioural correlations do not necessarily indicate shared neural mechanisms, our data suggest that videos of females modulate neural activity in these circuits in the same way and to the same extent as live females and encourage future studies to measure such activity.
While the spectral and temporal features of songs measured in the current study are consistent with previous examinations of song performance (e.g. Sakata and Vehrencamp, 2012; Woolley and Kao, 2015; Murphy et al., 2017), total song duration has been interpreted as reflecting motivation as well as performance. Total song duration has been used as a proxy for male sexual motivation, as influenced by endocrinological activity or female attractiveness (e.g. Alward et al., 2013; Arnold, 1975; Cate, 1985; Cordes et al., 2015; Gil et al., 2006; Riters, 2012; Ritschard et al., 2011; Rutstein et al., 2007). However, other studies have considered song duration as a performance-related trait because courtship songs are longer than non-courtship songs and because female songbirds prefer males that produce more and longer songs (Gil and Gahr, 2002; Wasserman and Cigliano, 1991). Additionally, others have proposed that total song duration reflects both motivation and performance, since song duration can be modulated by female responses during courtship (Riebel, 2009). While we prefer the interpretation of song duration as a measure of motivation, it is important to acknowledge variation in explanations. Regardless of the interpretation, our analyses suggest that song duration is regulated by processes that are distinct from those controlling introductory notes, song tempo and acoustic stereotypy.
The reason for differences in the motivation to produce courtship song to live versus video presentations of females remains unknown. One possibility is that variation in female behaviour across conditions could account for this difference. For example, females in the videos were quiet and provided no real-time feedback to courting males. In contrast, although the behaviour of female stimulus animals was not quantified, live stimulus females can vocalize or posture during exposures to males. These behaviours could serve as feedback signals to the male and affect his motivation to produce song. As such, it is possible that male zebra finches perceived the females in the video as inattentive or uninterested in the male, which could have led to the male producing fewer and shorter songs towards these females (e.g. Rutstein et al., 2007; Ware et al., 2016). Furthermore, the lack of ultraviolet content in our videos could have influenced behavioural variation towards videos of females as ultraviolet visual information has been shown to be biologically important in avian mating behaviours (e.g. Johnsen et al., 1998). In this regard, a useful next step would be to assess how videos with different degrees of female vocalizations and movements influence the motivation to produce courtship song in male zebra finches (see also Carouso-Peck and Goldstein, 2019).
Broadly speaking, our results support the notion that video playbacks are a powerful tool to reveal the mechanisms by which individuals alter evolutionarily important behaviours, including vocal performance (Heinig et al., 2014; Podos et al., 2009; Sakata and Vehrencamp, 2012; Woolley et al., 2014). These findings also suggest that a standardized set of video stimuli can be used to reveal neural mechanisms underlying song motivation and performance (see Tables S1–S3) and provide additional impetus to evaluate how specific visual and/or auditory information regulates song motivation and performance.
Variation in the efficacy of videos to elicit courtship song from male zebra finches
We observed notable variation between videos of different females in the likelihood of eliciting courtship song from experimental males. The 13 experimental male zebra finches were exposed to videos of six individual female zebra finches (three distinct video samples per female). Each female stimulus was presented 19.5±2.6 (mean±s.e.m.) times, and males produced courtship song on 40±10% of exposures to these video stimuli. We observed a large range in stimulus efficacy, where the most effective female videos elicited courtship song on every exposure (high efficacy videos), and the least effective stimulus female elicited courtship song on 25% of exposures (low efficacy videos; Table S1). When comparing males that did not produce any VD song bouts (n=5 males; ‘noVD males’) to males that produced at least one bout of VD song (n=8 males; ‘VD males’), we did not find that noVD males were exposed only to lower efficacy videos. noVD males were exposed to videos of four different females (all females except ‘bl5b’ and ‘p47’), and while these videos did not evoke courtship song from noVD males, these same videos evoked courtship on 47–92% of exposures to VD males. Importantly, the videos of the other two females led to VD song on 25% and 100% of exposures to VD males (Table S1). As such, the ‘efficacy’ of the videos of females presented to noVD males to elicit courtship song was within the range of efficacies observed for other videos that VD males were presented with. While the range of females used for video presentations is limited in this study, this analysis suggests that some males did not produce VD song because they were simply less motivated to produce courtship songs to videos of females and not because these males were exposed to videos of ‘lower quality’ females.
Lack of relationship between the motivation to produce undirected (UD) song and the motivation to produce courtship songs
We analysed the relationship between the motivation to produce UD song and the motivation to produce courtship (VD+LD) songs. To this end, we quantified the number of UD songs produced between video and live presentations of females and found that the number of UD songs during this period did not significantly correlate with the total number of directed songs (VD+LD songs; r=0.12, P=0.7767). This suggests that the motivation to produce courtship song is distinct from the motivation to produce UD song.
We thank A. Jalayer for assistance with data analysis and S. C. Woolley for constructive input and feedback throughout the experiment.
Conceptualization: J.T.S.; Methodology: R.F., J.T.S.; Formal analysis: L.S.J., R.F., J.T.S.; Investigation: R.F.; Resources: J.T.S.; Data curation: L.S.J., R.F., J.T.S.; Writing - original draft: L.S.J., R.F., J.T.S.; Writing - review & editing: L.S.J., R.F., J.T.S.; Visualization: L.S.J., R.F., J.T.S.; Supervision: J.T.S.; Funding acquisition: J.T.S.
This work was supported by funding from the Natural Sciences and Engineering Research Council of Canada (05016 to J.T.S.), McGill University (Faculty of Science to R.F.), Canada Graduate Scholarship Master's (CGS-M to R.F.), Fonds de recherche du Québec–Nature et technologies (258824 to R.F.), Master's research scholarship (B1 to R.F.) and a Heller award (L.S.J.).
The authors declare no competing or financial interests.