The song of the adult male zebra finch is a well-studied example of a learned motor sequence. Song bouts begin with a variable number of introductory notes (INs) before actual song production. Previous studies have shown that INs progress from a variable initial state to a stereotyped final state before each song. This progression is thought to represent motor preparation, but the underlying mechanisms remain poorly understood. Here, we assessed the role of sensory feedback in the progression of INs to song. We found that the mean number of INs before song and the progression of INs to song were not affected by removal of two sensory feedback pathways (auditory or proprioceptive). In both feedback-intact and feedback-deprived birds, the presence of calls (other non-song vocalizations), just before the first IN, was correlated with fewer INs before song and an initial state closer to song. Finally, the initial IN state correlated with the time to song initiation. Overall, these results show that INs do not require real-time sensory feedback for progression to song. Rather, our results suggest that changes in IN features and their transition to song are controlled by internal neural processes, possibly involved in getting the brain ready to initiate a learned movement sequence.
The song motif (referred to as song) of the adult male zebra finch, consisting of a stereotyped sequence of sounds (syllables) interleaved with silent gaps (Fig. 1), is a well-established model for understanding learned movement sequences (Fee and Scharff, 2010). How such learned movement sequences are initiated in the brain remains poorly understood. Song is learned by young birds from a conspecific tutor during a critical period (Fee and Scharff, 2010). While song is typically part of a courtship ritual for mate attraction, birds also sing when they are alone (undirected song) (Sossinka and Böhner, 1980; Zann, 1996), making this an excellent model system to study motor preparation before self-initiated, learned movement sequences.
Song is preceded by the bird repeating a short vocalization called an introductory note (IN; Fig. 1) (Price, 1979; Sossinka and Böhner, 1980). Each song bout consists of a variable number of such INs followed by multiple repeats of the song. We have previously shown that intervals between successive INs and the acoustic properties of successive INs progress from a variable initial state (first IN in each song bout) to a more consistent ‘ready’ state (last IN in each song bout) just before the start of each song (Rajan and Doupe, 2013). Given the similarity to the reduction in variability associated with neural preparatory activity before the onset of simple movements (Churchland et al., 2006c), INs may represent vocalizations that help prepare the zebra finch brain to produce song. However, the mechanisms underlying IN progression to song remain unclear.
One possibility is that real-time sensory feedback could drive the progression of INs to song. Sensory feedback is important for song learning and maintenance in zebra finches (Konishi, 1965; Nordeen and Nordeen, 1992). In a related species, the Bengalese finch, recent work has shown that removal of auditory feedback changes the repeat number of individual syllables within song (Wittenbach et al., 2015). As INs are also repeating syllables, real-time sensory feedback could drive IN progression to song. Consistent with this hypothesis, previous studies disrupting proprioceptive feedback or auditory feedback have reported changes in the number of INs before song in some birds (Bottjer and Arnold, 1984). However, these changes have not been quantified rigorously and the specificity of these changes to removal of feedback has not been determined.
To assess the role of sensory feedback, we analyzed the number and properties of INs soon after removal of two important forms of sensory feedback, namely proprioceptive feedback from the syringeal muscles (Bottjer and Arnold, 1984; Vicario, 1991; Williams and McKibben, 1992) and auditory feedback (Konishi, 1965). We found that mean IN number before song and progression of INs to song were not affected by removal of either form of feedback. Further, the progression of INs to song was not affected by removal of neural input to the syringeal muscles. Finally, we found fewer INs and a quicker transition to song when the first IN was produced soon after calls (non-song vocalizations that are different from INs and song syllables). These data demonstrate that the progression of INs to song does not require real-time sensory feedback. Rather, INs may reflect internal neural processes, potentially involved in getting the zebra finch brain ‘ready’ to produce the learned song sequence.
MATERIALS AND METHODS
Experimental procedures performed at IISER Pune were approved by the Institute Animal Ethical Committee in accordance with the guidelines of the Committee for the Purpose of Control and Supervision of Experiments on Animals (CPCSEA, New Delhi). Experiments performed at UCSF (CA, USA) were approved by the UCSF Institutional Animal Care and Use Committee in accordance with NIH guidelines.
Birds and song recording
All birds (n=42) used in this study were >100 days post-hatch at the time of the experiment and were either purchased from an outside vendor (n=13) or bred at IISER Pune (n=16) or UCSF (n=13). Birds were kept in separate sound isolation boxes (Newtech Engineering Systems, Bangalore, India, or Acoustic Systems, Austin, TX, USA) for the duration of the experiment. All songs were recorded by placing a microphone (AKG Acoustics C417PP omnidirectional condenser microphone or B3 lavalier microphone, Countryman Associates, CA, USA) at the top of the cage. For birds in the tracheosyringeal nerve surgery (ts-cut) and sham-surgery groups (see below), we kept the position of the microphone the same for recording songs before and after surgical manipulations. Signals from the microphone were amplified using a mixer (Behringer XENYX 802) and then digitized on a computer at a sampling rate of 44,100 Hz using custom-written software. Songs were recorded in ‘triggered’ mode before and after surgery, such that data were saved when the microphone signal crossed a pre-set threshold. Along with the data that crossed the threshold, 1–3 s of data before and after threshold crossing were also saved. For a subset of birds, data were saved in ‘continuous’ mode, i.e. all of the data for the entire recording period. All songs were recorded in the ‘undirected’ condition. Songs of three of the birds used for the analysis of calls and their influence on song initiation have been used in a previous study for analysis of INs before song (Rajan and Doupe, 2013). The influence of calls on song initiation was not considered in the previous study. For the analysis of day-to-day changes in IN number and properties, we used data from 14 birds that were recorded on two different days (range: 1–3 days apart). 
Of these 14 birds, one bird was also used at a later time point for ts-cut surgery with a new set of pre- and post-surgery recordings and nine birds were also used for analysis of the influence of calls on INs. Pre-surgery recordings for 18/21 birds (n=5 ts-cut, n=6 sham surgery and n=7 deaf) were performed 0–2 days before surgery. For the remaining three birds (n=3 ts-cut), pre-surgery recordings were made 18, 14 and 5 days before surgery, respectively.
Tracheosyringeal nerve cut and sham surgery
Tracheosyringeal nerves were surgically cut using previously described protocols (Bottjer and Arnold, 1984; Vicario, 1991; Williams and McKibben, 1992). Briefly, birds (n=9) were deeply anesthetized by intramuscular injection of ketamine (30 mg kg−1), xylazine (3 mg kg−1) and diazepam (7 mg kg−1). Absence of a response to toe pinch was used to assess the depth of anesthesia. Birds were then placed on a platform with the ventral side facing up. A rolled tissue under the neck served to stretch and give easy access to the throat. Feathers were plucked and an incision of ∼10 mm was made. The trachea was exposed by removal of fat tissue. Using fine forceps, the tracheosyringeal (ts) nerve bundle on either side of the trachea was pulled away from the trachea and part of the nerve (n=9 birds, median length cut 4 mm, range 2–7 mm) was cut out on both sides using spring scissors (Fine Science Tools, Foster City, CA, USA). The skin was then glued using tissue adhesive (Vetbond, 3M). For sham surgeries (n=6), the same procedure was followed but the ts nerves were not cut. In two of the sham-surgery birds, some cuts were made on the thick membrane enclosing the esophagus. Birds typically resumed singing within 10 days of surgery. We considered songs produced on the second day of singing after surgery (sham surgery: 2–5 days and ts-cut: 3–10 days after surgery) for analysis because of the higher number of songs. For one bird, we did not have pre-surgery songs in the undirected condition, so we excluded this bird from analyses involving comparison with the pre-surgery condition (Figs 2–6). Data from this bird were included only for analysis of the influence of calls on the number and properties of INs (Figs 7 and 8). For the ts-cut group, birds with first motif syllables that were significantly different in duration from the IN were chosen; this made it easier to recognize the onset of the motif after surgery. 
The number of INs was not a consideration while choosing birds for the sham-surgery or ts-cut groups. For the first two birds in which we attempted ts nerve surgery, we did not observe any changes to song after surgery. While doing the other ts nerve surgeries, we realized that we had not cut the ts nerve in these two birds and so both birds were assigned to the sham-surgery group. The rest of the sham-surgery birds were chosen at random based on availability in our colony. IN number and song were not taken into consideration.
Deafening
Deafening was done by bilateral removal of the cochlea under equithesin anesthesia using previously described protocols (Kojima et al., 2013; Konishi, 1965). All of the deaf birds (n=7) were also used in a previous study that examined the effects of deafening on song (Kojima et al., 2013). Here, we only analyzed the effects of deafening on IN number and properties. As we were interested in the role of real-time auditory feedback in progression from INs to song, we only analyzed IN number and properties for songs recorded 1 day post-deafening. Birds for deafening were chosen on the basis of their motif structure; the number of INs was not a consideration.
All analysis was performed using custom-written scripts in MATLAB. All data and scripts for analysis are available on request from the corresponding author (email@example.com).
Audio files were segmented into syllables based on a user-defined amplitude threshold. Syllables with less than 5 ms between them were merged and syllables with a duration shorter than 10 ms were discarded. Individual syllables were given labels in a semi-automatic manner. They were first assigned labels based on a modified template-matching procedure (Glaze and Troyer, 2006) or clustering based on acoustic features calculated using Sound Analysis Pro. Clustering was done using KlustaKwik (http://klustakwik.sourceforge.net/). Labels were then manually checked for all files.
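The segmentation rules above can be illustrated with a minimal sketch. It is written in Python rather than the MATLAB used for the actual analysis, and the function name, the input format (a precomputed amplitude envelope) and the order of the merge and discard steps are assumptions made for illustration only.

```python
def segment_syllables(envelope, fs, threshold, min_gap_ms=5.0, min_dur_ms=10.0):
    """Return (onset_s, offset_s) pairs for supra-threshold runs of an envelope.

    envelope  : sequence of amplitude values (hypothetical input format)
    fs        : sampling rate of the envelope in Hz
    threshold : user-defined amplitude threshold
    """
    # 1. Find contiguous runs of samples above the amplitude threshold.
    segments = []
    start = None
    for i, value in enumerate(envelope):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            segments.append([start / fs, i / fs])
            start = None
    if start is not None:
        segments.append([start / fs, len(envelope) / fs])

    # 2. Merge segments separated by less than min_gap_ms of silence.
    merged = []
    for onset, offset in segments:
        if merged and (onset - merged[-1][1]) * 1000.0 < min_gap_ms:
            merged[-1][1] = offset
        else:
            merged.append([onset, offset])

    # 3. Discard segments shorter than min_dur_ms.
    return [(on, off) for on, off in merged if (off - on) * 1000.0 >= min_dur_ms]
```

Merging before discarding (the order assumed here) means that two short fragments separated by a very brief gap can survive as one syllable.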
The repetitive sequence or song motif for each bird was identified. Song bouts were defined as groups of vocalizations with at least one motif syllable that were separated from other such groups by more than 2 s of silence (Sossinka and Böhner, 1980). For a subset of birds (n=7 deaf birds; n=6 birds for analysis of call-song bouts and n=7 birds for analysis of day-to-day changes in IN number and properties) with triggered recordings, a number of files did not have 2 s of silence before the first vocalization in the file. However, as these were triggered recordings, we assumed that there was silence before the start of the file too and so we considered files with >0.5 s silence at the beginning of the file as valid bouts. For a given bird, we used the same criterion before and after surgery to ensure that the criterion did not affect our results. Syllables that were produced in isolation outside of song bouts were identified as calls. All kinds of calls (distance calls, short calls and intermediate calls) (Zann, 1996) were combined together.
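As a rough illustration of the bout definition, the sketch below groups labeled syllables using the 2 s silence criterion and keeps only groups containing at least one motif syllable. The data layout (tuples of onset, offset and label) and the function name are hypothetical, and the relaxed 0.5 s criterion for the start of triggered files is not modeled.

```python
def group_into_bouts(syllables, motif_labels, min_silence_s=2.0):
    """Split syllables into song bouts and isolated vocalizations.

    syllables    : list of (onset_s, offset_s, label), sorted by onset
    motif_labels : set of labels that belong to the song motif
    Returns (bouts, isolated), where each bout is a list of syllables.
    """
    # Start a new group whenever the silence before a syllable exceeds the criterion.
    groups = []
    for syl in syllables:
        if groups and syl[0] - groups[-1][-1][1] <= min_silence_s:
            groups[-1].append(syl)
        else:
            groups.append([syl])

    # A group counts as a song bout only if it contains a motif syllable.
    bouts, isolated = [], []
    for group in groups:
        if any(label in motif_labels for _, _, label in group):
            bouts.append(group)
        else:
            isolated.extend(group)
    return bouts, isolated
```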
As described earlier (Price, 1979; Rajan and Doupe, 2013; Sossinka and Böhner, 1980), syllables that were repeated at the beginning of a bout were considered as INs. Calls were not considered as INs. As described previously (Zann, 1993), 76.2% of our birds (n=32/42) produced only one IN type. The rest of the birds produced two IN types (n=10/42). For all the analyses described, we combined the multiple types of INs together.
For ts-cut birds, syllables and INs lost their characteristic acoustic structure and were reduced to harmonic stacks without any modulation (Fig. 2A, middle). However, durations of individual syllables and INs remained the same as pre-surgery (Fig. 2A,B, middle, Fig. 3; Fig. S3). In these birds, syllables were labeled using cluster analysis as described above and INs and motif syllables were matched to pre-surgery INs and motif syllables by examining plots for duration versus mean frequency for all syllables.
On average, we analyzed 124 song bouts per bird (median 98 song bouts per bird; range 11–428 song bouts per bird).
Temporal and spectral similarity
We quantified changes in song after removal of sensory feedback using temporal and spectral similarity. Temporal similarity was calculated as the maximum of the cross-correlation function between the normalized amplitude envelopes of a pre-surgery template motif and other pre-/post-surgery motifs (n=9 randomly chosen motifs from pre-surgery and n=10 randomly chosen motifs from post-surgery) (Roy and Mooney, 2007). The template motif was proportionally stretched ±20% to account for differences in duration of the entire motif. As a measure of random temporal similarity between any two zebra finches, we calculated temporal similarity for motifs from 10 random pairs of birds (n=10 motifs each). Spectral similarity (% similarity) was calculated using Sound Analysis Pro (five motifs pre-surgery were compared with five other motifs pre-/post-surgery) (Tchernichovski et al., 2000). Random spectral similarity was measured for 10 random pairs of birds (n=5 motifs each).
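The temporal similarity measure can be sketched as follows, under several stated assumptions: envelopes are normalized by z-scoring, the template is stretched in nine evenly spaced steps between -20% and +20% using linear interpolation, and the cross-correlation is scaled so that identical envelopes score 1. None of these details are specified above, all function names are hypothetical, and Python is used here in place of MATLAB.

```python
import math

def zscore(x):
    """Normalize an envelope to zero mean and unit (population) variance."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x]

def resample(x, n_out):
    """Linearly interpolate x onto n_out evenly spaced points (n_out >= 2)."""
    out = []
    for k in range(n_out):
        pos = k * (len(x) - 1) / (n_out - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(x) - 1)
        frac = pos - lo
        out.append(x[lo] * (1 - frac) + x[hi] * frac)
    return out

def max_xcorr(a, b):
    """Peak of the cross-correlation of two z-scored envelopes over all lags."""
    a, b = zscore(a), zscore(b)
    norm = math.sqrt(len(a) * len(b))
    best = -float("inf")
    for lag in range(-(len(b) - 1), len(a)):
        total = 0.0
        for j, bv in enumerate(b):
            i = j + lag
            if 0 <= i < len(a):
                total += a[i] * bv
        best = max(best, total / norm)
    return best

def temporal_similarity(template, motif, max_stretch=0.2, n_steps=9):
    """Best cross-correlation peak over proportional stretches of the template."""
    best = -float("inf")
    for k in range(n_steps):
        factor = 1 - max_stretch + 2 * max_stretch * k / (n_steps - 1)
        stretched = resample(template, max(2, round(len(template) * factor)))
        best = max(best, max_xcorr(stretched, motif))
    return best
```

Because the unstretched template is always among the candidates, stretching can only raise the score relative to a plain cross-correlation.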
Characterization of IN progression
In each song bout, the last set of consecutive INs with inter-IN intervals <500 ms before the first motif syllable were considered for counting IN number (Rajan and Doupe, 2013; Sossinka and Böhner, 1980). All of our analysis was restricted to such sequences of INs present at the beginning of each bout.
Intervals between INs were measured as the duration between the end of an IN to the start of the next IN. The first interval was the interval between the first two INs satisfying the above criteria. The last interval was measured as the interval between the last IN and the first motif syllable. As a measure of the progression of IN timing, we quantified the ratio between successive IN intervals across all IN sequences. Ratios were averaged across bouts to obtain a mean ratio for each bird. A ratio <1 indicated a speeding up of successive intervals as shown earlier (Rajan and Doupe, 2013). The coefficient of variation (CV) was measured as the standard deviation divided by the mean.
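The IN-counting criterion and the timing measures described above can be sketched as below. This is an illustrative Python version with hypothetical function names and input format. It applies the 500 ms criterion only to inter-IN intervals and uses the sample standard deviation for the CV, both of which are assumptions.

```python
import statistics

def last_in_run(in_times, max_gap_s=0.5):
    """Return the last run of consecutive INs with inter-IN intervals < max_gap_s.

    in_times : list of (onset_s, offset_s) for INs preceding the first
               motif syllable of a bout, in temporal order
    """
    if not in_times:
        return []
    run = [in_times[-1]]
    # Walk backwards from the last IN until an interval violates the criterion.
    for prev in reversed(in_times[:-1]):
        if run[0][0] - prev[1] < max_gap_s:
            run.insert(0, prev)
        else:
            break
    return run

def in_intervals(run, motif_onset_s):
    """Intervals from the end of each IN to the start of the next IN;
    the last interval runs from the last IN to the first motif syllable."""
    gaps = [run[i + 1][0] - run[i][1] for i in range(len(run) - 1)]
    gaps.append(motif_onset_s - run[-1][1])
    return gaps

def successive_ratios(gaps):
    """Ratio of each interval to the preceding one; < 1 means speeding up."""
    return [gaps[i + 1] / gaps[i] for i in range(len(gaps) - 1)]

def cv(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)
```

For a bout with INs ending at 1.05, 1.40 and 1.65 s and a motif starting at 1.80 s, the intervals shrink from 0.30 to 0.20 to 0.15 s, so both successive ratios fall below 1.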
To characterize acoustic properties of INs and their progression to song, we used the acoustic distance to the last IN and the ratio of the distance of successive INs, respectively. The acoustic distance is an inverse measure of similarity in acoustic properties between an IN and all last INs (Rajan and Doupe, 2013). We calculated four acoustic features, namely duration, log amplitude, entropy and mean frequency for each IN using the MATLAB code for Sound Analysis Pro (http://soundanalysispro.com/matlab-library). For each day, we randomly chose 50% of the last INs as the reference. The distance of the remaining last INs and the corresponding first INs in the same bouts was measured as the Mahalanobis distance of the IN from the reference last INs in the 4-dimensional space formed by the four acoustic features. As a measure of acoustic progression of INs, we calculated the ratio of distances of successive INs from the last IN for each IN sequence at the beginning of a bout (50% of the bouts were excluded as the last INs from these bouts were chosen as the reference). A ratio <1 indicated that successive INs became closer in distance (or more similar) to the last IN, as seen in intact birds (Rajan and Doupe, 2013).
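For illustration, the Mahalanobis distance of one IN from a set of reference last INs can be computed as in the following sketch. It uses plain Python with hypothetical function names, estimates the covariance from the reference set with the sample (n-1) normalization, and assumes that the reference covariance matrix is non-singular. Each IN is represented as a tuple of feature values (four features in the analysis above; any dimension works here).

```python
import math

def mean_vector(rows):
    """Per-feature mean of the reference INs."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def covariance(rows, mu):
    """Sample covariance matrix of the reference INs."""
    n, d = len(rows), len(mu)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]

def solve(a, b):
    """Solve a*y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = (m[i][n] - sum(m[i][j] * y[j] for j in range(i + 1, n))) / m[i][i]
    return y

def mahalanobis(x, reference):
    """Distance of feature vector x from the distribution of reference INs."""
    mu = mean_vector(reference)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(covariance(reference, mu), diff)  # y = inverse(cov) * diff
    return math.sqrt(sum(d * v for d, v in zip(diff, y)))
```

The progression ratio for a bout would then be the distance of each IN divided by the distance of the preceding IN, with both distances measured against the same reference set.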
Analysis of the influence of calls on the number and properties of INs
The influence of calls on IN number and properties was analyzed in 16 normal, unmanipulated birds. Bouts where the first IN began <2000 ms after the end of a call were considered as call-song bouts. Bouts with only INs at the beginning were considered as IN song bouts. Birds with a minimum of five IN song bouts and five call-song bouts were considered for this analysis. For each bird, the mean number of INs in IN song bouts was subtracted from the number of INs in each call-song bout. For each bird, the change in IN number in call-song bouts was then binned at 100 ms resolution starting at 40 ms after the end of the call to 1940 ms after the end of the call. Across all birds, we fitted an exponential function (MATLAB fit function) to characterize the dependence of this change in IN number on time between the end of the call and the start of the first IN. Similarly, we also fitted exponential functions to the change in the interval between the first two INs and change in acoustic properties of the first IN (Fig. S4).
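The binning step can be illustrated with a short sketch. The function name and input format are hypothetical, and averaging the change in IN number within each bin is an assumption, because the text specifies only the bin resolution and range. The exponential fit itself is not reproduced here.

```python
def binned_change(call_bouts, mean_in_only, start_ms=40.0, stop_ms=1940.0,
                  width_ms=100.0):
    """Mean change in IN number per call-to-IN delay bin for one bird.

    call_bouts   : list of (delay_ms, n_ins) for call-song bouts, where
                   delay_ms is the time from call end to first IN start
    mean_in_only : mean IN number in IN song bouts of the same bird
    Returns {bin_start_ms: mean change} for non-empty bins.
    """
    n_bins = int(round((stop_ms - start_ms) / width_ms))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for delay, n_ins in call_bouts:
        k = int((delay - start_ms) // width_ms)
        if 0 <= k < n_bins:  # ignore delays outside the binned range
            sums[k] += n_ins - mean_in_only
            counts[k] += 1
    return {start_ms + k * width_ms: sums[k] / counts[k]
            for k in range(n_bins) if counts[k] > 0}
```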
For many of the feedback-deprived birds, we did not have enough call-song bouts to carry out a similar analysis. Instead, we divided call-song bouts into two categories: (1) bouts where the first IN started <200 ms after the end of the call and (2) bouts where the first IN started >200 ms after the end of the call. A 200 ms period was chosen based on the exponential fit (Fig. 7B) and data availability in the feedback-deprived birds. We calculated mean IN number, mean and variability of the interval between the first two INs and acoustic distance of the first IN for both these bout categories and compared them with the corresponding properties for IN song bouts (Figs 7 and 8). Only birds with >3 call-song bouts in both of these categories were considered for analysis. Further, we combined data for ts-cut and deaf birds as our previous results showed that both manipulations had no effect on IN number and properties.
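The two-category comparison can be sketched as follows. The names are hypothetical, only mean IN number is computed, and bouts with a delay of exactly 200 ms are assigned to the later category, which is an assumption because the text does not specify that case.

```python
def summarize_bird(in_only_counts, call_bouts, cutoff_ms=200.0, min_bouts=4):
    """Mean IN number for IN song bouts and the two call-song bout categories.

    in_only_counts : IN counts for bouts with only INs at the beginning
    call_bouts     : list of (delay_ms, n_ins) for call-song bouts
    Returns None if the bird has too few bouts (>3 required) in either category.
    """
    early = [n for delay, n in call_bouts if delay < cutoff_ms]
    late = [n for delay, n in call_bouts if delay >= cutoff_ms]
    if len(early) < min_bouts or len(late) < min_bouts:
        return None

    def mean(values):
        return sum(values) / len(values)

    return {"in_only": mean(in_only_counts),
            "call_lt_200": mean(early),
            "call_gt_200": mean(late)}
```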
We did not perform any power calculations to determine sample sizes for each group. However, sample sizes are comparable with other studies. As detailed below and in Table S1, we used non-parametric tests for most of our statistical comparisons because of small sample sizes (<10). We used parametric tests in only two cases: repeated-measures one-way ANOVA for analysis of the effect of calls on the number and properties of INs, and repeated-measures two-way ANOVA for analysis of the progression of IN features after feedback removal. Birds were excluded from the analysis only if there were no pre-surgery undirected song recordings or if there were too few song bouts (number specified in earlier sections). Both of these conditions were established before the start of the analysis. The investigators were not blinded to either the choice of animals in each group or the analysis of data. However, IN number and properties were not considered while choosing birds, as described in an earlier section.
Wilcoxon signed-rank test was used for paired comparisons of temporal similarity (Fig. 2B), spectral similarity (Fig. 2C), changes in IN/motif syllable acoustic features (Fig. 3; Fig. S1), mean IN number (Fig. 4D), IN number CV (Fig. 4E) and progression in IN features (Fig. S3). For comparing progression in IN timing and IN acoustic structure after removal of feedback (Figs 5D,E and 6D), we used repeated-measures two-way ANOVA using IN position (first versus last) as one factor and time (pre-surgery versus post-surgery) as the second factor (MATLAB code from https://in.mathworks.com/matlabcentral/fileexchange/6874-two-way-repeated-measures-anova). For comparing IN number and properties in bouts where calls preceded the first IN, we used repeated-measures one-way ANOVA (Figs 7C,D and 8). If the ANOVA P-value was <0.05, we used a post hoc Tukey–Kramer test to identify groups that were significantly different (Figs 7C,D and 8). For comparing changes in IN number and properties after surgery with day-to-day changes, we used Kruskal–Wallis ANOVA (Fig. S2). Pearson's correlation coefficient was used to assess the correlation between first IN properties and time to song initiation (Fig. 9).
All of the tests used and the associated P-values are provided in Table S1. A significance level of P=0.05 was used throughout.
RESULTS
To see whether sensory feedback plays a role in song initiation, we analyzed the number and progression of INs after removal of either proprioceptive (n=8 birds) or auditory feedback (n=7 birds). As a control, sham surgeries were performed in a separate group of birds (n=6). As we were interested in self-initiated movement sequences, we focused on undirected songs produced when the bird was alone.
Song spectral structure alone is affected after ts nerve surgery
Proprioceptive feedback was removed by bilaterally cutting the ts nerve (Bottjer and Arnold, 1984; Vicario, 1991; Williams and McKibben, 1992) (ts-cut; n=8 birds, see Materials and Methods) and auditory feedback was eliminated by bilateral removal of the cochlea (Konishi, 1965) (deaf; n=7 birds, see Materials and Methods).
Despite small changes, songs of birds subjected to sham surgery remained more similar to pre-surgery songs than expected by chance in both spectral and temporal structure (Fig. 2). As described earlier (Konishi, 1965; Price, 1979), only minor changes to song characteristics (both temporal and spectral) were seen after deafening, and post-deafening songs remained more similar than expected by chance to pre-surgery songs (Fig. 2). The ts nerve contains both efferent and afferent nerves carrying motor input to the syringeal muscles and proprioceptive feedback from the syringeal muscles, respectively (Bottjer and Arnold, 1984). Nerve cuts disrupted both efferent and afferent nerves, resulting in the loss of song spectral structure immediately after nerve cut surgery (Fig. 2A,C) (Roy and Mooney, 2007; Vicario, 1991; Williams and McKibben, 1992). However, as described earlier, song temporal structure remained more similar than expected by chance to that before ts nerve surgery (Fig. 2A middle and Fig. 2B) as motor input to the respiratory muscles was not affected (Bottjer and Arnold, 1984; Roy and Mooney, 2007; Vallentin and Long, 2015; Vicario, 1991; Williams and McKibben, 1992). Thus, consistent with earlier studies (Bottjer and Arnold, 1984; Roy and Mooney, 2007; Vallentin and Long, 2015; Vicario, 1991; Williams and McKibben, 1992), we also found that cutting the ts nerve altered both proprioceptive and auditory feedback, while deafening disrupted only auditory feedback.
IN acoustic structure, not duration, is affected by ts-cut surgery
We next quantified changes to the acoustic structure of INs after removal of either proprioceptive or auditory feedback. Similar to changes in song syllable structure (Fig. 2A, middle column), INs also became harmonic stacks after surgery in ts-cut birds as seen by increased goodness of pitch and decreased frequency modulation (Fig. 2A middle, Fig. 3E,F, P<0.05, Wilcoxon signed-rank test). Despite this change, we could identify INs because IN duration, mean frequency, entropy and amplitude did not change significantly after surgery (Fig. 3A–D ts-cut; see Materials and Methods for details of IN identification procedure in ts-cut birds). The position of INs at the beginning of the bout was also maintained (Fig. 2A middle). Post-deafening, INs were softer and had reduced mean frequency (Fig. 3B,D, P<0.05, Wilcoxon signed-rank test). However, song syllables were also softer after deafening (Fig. S1, right column, P<0.05, Wilcoxon signed-rank test), suggesting that these changes could have been the result of a change in microphone position after surgery. No significant changes in IN acoustic structure were seen after sham surgery (Fig. 3, P>0.05, Wilcoxon signed-rank test).
Mean IN number before song is not affected by removal of proprioceptive or auditory feedback
We next analyzed the mean and variability of IN number before each song (Fig. 4A–C; see Materials and Methods). The mean number of INs before song (Fig. 4D) and the variability in IN number (measured by the CV – Fig. 4E) did not change significantly soon after surgery in sham-surgery, ts-cut and deaf birds (Fig. 4D and E, P>0.05, Wilcoxon signed-rank test). In fact, changes in mean IN number post-surgery for feedback-deprived birds were not different from day-to-day changes in IN number seen in normal, unmanipulated birds (Fig. S2A). This further strengthened our conclusion that IN number was unaffected by removal of proprioceptive or auditory feedback.
Progression of IN timing to song is not affected by removal of proprioceptive or auditory feedback
We have previously shown that progression of INs to song is accompanied by changes in both the timing and acoustic structure of INs within a bout (Rajan and Doupe, 2013). We first considered IN timing. Specifically, intervals between successive INs progress from a longer, more variable first interval to a shorter, more stereotyped interval between the last IN and song (Rajan and Doupe, 2013). This progression in IN timing was unchanged after surgery in sham-surgery, ts-cut and deaf birds (Fig. 5A–C). After surgery, the interval between the first two INs remained longer and more variable than the interval between the last IN and song in ts-cut, deaf and sham-surgery birds (Fig. 5D,E; P<0.05 for first versus last, repeated-measures two-way ANOVA). Importantly, removal of feedback did not alter the mean or the variability of either the first or the last interval (Fig. 5D,E; P>0.05 for pre- versus post-surgery, repeated-measures two-way ANOVA). A number of other aspects of IN timing were also not affected by removal of auditory or proprioceptive feedback, and changes in IN timing post-surgery were similar to day-to-day changes seen in unmanipulated birds (Figs S2B–D and S3A – see Materials and Methods). These results showed that the timing of INs and their progression did not depend on intact sensory feedback.
Progression of IN acoustic features to song is not affected by removal of proprioceptive or auditory feedback
Similar to IN timing, the acoustic structure of INs has also been shown to progress to a consistent last-IN state just before song (Rajan and Doupe, 2013). Although individual INs in each bout looked very similar (Fig. 6A–C top), we have previously shown that the first IN is less similar to the last IN across bouts. We quantified this by calculating the similarity between the first IN and the last IN before and after surgery (acoustic distance to the last IN: the smaller the distance, the greater the similarity and vice versa; see Materials and Methods; see Fig. 6A–C for representative examples for sham-surgery, ts-cut and deaf birds). As we were interested in the progression, we calculated similarity to the last IN on the same day (pre-surgery last IN for pre-surgery and post-surgery last IN for post-surgery; see Materials and Methods). For each day, half of the last INs across all bouts were randomly chosen as a reference. The rest of the last INs and all of the first INs were then compared with this reference using the acoustic distance as an inverse measure of similarity (see Materials and Methods). The first IN was significantly different from the last IN (larger distance – Fig. 6D) before and after surgery in deaf birds (P<0.05 for first versus last IN, repeated-measures two-way ANOVA) and almost reached significance in ts-cut birds (P=0.0544 for first versus last IN, repeated-measures two-way ANOVA). However, this difference was smaller in sham-surgery birds both before and after surgery and did not reach significance (Fig. 6D, P=0.2469 for first versus last IN in sham-surgery birds, repeated-measures two-way ANOVA). Importantly, in all groups of birds, removal of feedback did not affect any of the measures of progression (P>0.05, pre- versus post-surgery, repeated-measures two-way ANOVA). 
These results showed that INs still progressed from a first IN that was significantly different from the last IN to a more consistent last IN even in the absence of auditory or proprioceptive feedback. A number of other aspects of IN acoustic structure progression to song were also not affected by removal of auditory or proprioceptive feedback and remained similar to day-to-day changes seen in unmanipulated birds (Figs S2E–G and S3B – see Materials and Methods). As mentioned earlier, ts-cut birds lacked neural input to the syringeal muscles in addition to the loss of proprioceptive feedback from the syringeal muscles. The continued progression of IN acoustic features suggested that this progression is a result of changing respiratory drive, as neural input to the respiratory muscles remained intact in these birds.
Overall, these results show that IN number and progression do not depend on intact sensory feedback (auditory or proprioceptive). This suggests that IN progression to song is controlled by internal neural processes.
IN number is reduced when calls precede the first IN of a song bout
Given that IN progression appears to be controlled by internal neural processes, we next asked whether the presence of calls (other non-song vocalizations) just before the first IN influenced the number of INs before song. Calls are partially learned or unlearned vocalizations that are acoustically distinct from song and are initiated by separate neural pathways (Simpson and Vicario, 1990; Vicario, 2004; Zann, 1996). Nevertheless, many aspects of calls are controlled by song motor nuclei and increased neural activity is seen in many of the song motor nuclei before and during calls (Benichov et al., 2016; Danish et al., 2017; Hahnloser et al., 2002; Kozhevnikov and Fee, 2007; Long and Fee, 2008; Rajan, 2018; Simpson and Vicario, 1990; Vyssotski et al., 2016; Yu and Margoliash, 1996). Further, we have previously shown the presence of higher levels of preparatory activity in the premotor nucleus HVC before the first IN when calls precede the first IN of a song bout (Rajan, 2018). Given these changes in neural activity when calls are present before the first IN, we expected changes in IN number. We tested this in a separate set of unmanipulated birds (n=16) by examining the number of INs in song bouts where calls preceded the first IN (call-song bouts – see Materials and Methods).
Calls occurred at variable times before the first IN in a small fraction of bouts (Fig. 7A; mean±s.e.m. interval between end of call and start of first IN: 468.6±48 ms, mean±s.e.m. CV of interval between end of call and start of first IN: 0.85±0.07; n=16 birds). We observed fewer INs when calls occurred before the first IN (n=16 birds; mean±s.e.m. for IN song bouts: 3.7±0.24, for call-song bouts: 3.4±0.25, P=0.03, Wilcoxon signed-rank test). This reduction was dependent on the time between the end of the call and the start of the first IN: the shorter the time, the greater the reduction (Fig. 7B, adjusted R2=0.31 for an exponential fit, see Materials and Methods). In both feedback-intact and feedback-deprived birds, song bouts where the first IN began <200 ms after the end of a call had fewer INs when compared with song bouts with only INs at the beginning or song bouts where the first IN began >200 ms after the end of the call (Fig. 7C,D, P<0.05, repeated-measures one-way ANOVA and post hoc Tukey–Kramer test). These results showed that the presence of calls just before the first IN of a song bout correlated with fewer INs in both feedback-intact and feedback-deprived birds and further strengthened our conclusion that IN progression to song may be controlled by internal neural processes.
Calls just before the first IN of a song bout correlate with altered ‘initial’ state
Given that both IN timing and acoustic features progress towards a consistent ‘ready’ state just before song, we hypothesized that calls might reduce IN number by speeding up this progression. Consistent with this idea, song bouts where the first IN began <200 ms after the end of the call had a significantly shorter interval between the first two INs when compared with song bouts with only INs or song bouts where the first IN began >200 ms after the end of the call (Fig. 8A,B; P<0.05, repeated-measures one-way ANOVA followed by post hoc Tukey–Kramer test). This was true in both feedback-intact (Fig. 8A) and feedback-deprived birds (Fig. 8B). In feedback-intact birds, the decrease in interval between the first two INs in bouts with calls was correlated with the time between the end of the call and the start of the first IN, though the strength of the correlation was moderate (Fig. S4A, adjusted R² for exponential fit=0.16). In contrast to the changes in IN timing, neither the variability of the interval between the first two INs nor the acoustic structure of the first IN differed according to whether or not calls were present before the first IN (feedback-intact birds: Fig. 8C,E; feedback-deprived birds: Fig. 8D,F; P>0.05, repeated-measures one-way ANOVA). However, in feedback-intact birds, relative to bouts with only INs, the acoustic structure of the first IN after a call was more similar to that of the last IN (Fig. S4B). The change in acoustic structure was correlated with the time between the end of the call and the start of the first IN, but the strength of the correlation was weak (Fig. S4B, adjusted R² for exponential fit=0.09). Overall, these results showed that the presence of calls correlated with a change in IN timing (shorter interval between the first two INs), potentially causing the reduction in IN number before song.
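For readers unfamiliar with the adjusted R² values quoted for the exponential fits, the following Python sketch shows one common way to compute the statistic from observed values and the predictions of an already-fitted curve. It is an illustration under stated assumptions, not the fitting procedure of the study (which is described in Materials and Methods): the function name and the example numbers are hypothetical, and conventions for counting parameters in a nonlinear fit vary; here n_params excludes the constant term.

```python
def adjusted_r_squared(observed, predicted, n_params):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - n_params - 1).

    observed, predicted: equal-length sequences of numbers.
    n_params: number of fitted parameters, excluding the constant term
              (an assumed convention; others exist for nonlinear fits).
    """
    n = len(observed)
    if n != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if n - n_params - 1 <= 0:
        raise ValueError("too few data points for this number of parameters")

    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")

    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_params - 1)


# Made-up values purely to exercise the function.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
adj_r2 = adjusted_r_squared(observed, predicted, n_params=1)
```

Because the adjustment penalises additional parameters, adjusted R² is always at or below the unadjusted R² whenever at least one parameter is counted.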
‘Initial’ state of IN progression correlates with time to song initiation
Our results suggested that the progression of IN timing and acoustic features is controlled by internal neural processes possibly related to motor preparation. In other systems, neural preparatory activity is strongly correlated with the time to movement initiation: the greater the progress of preparation, the shorter the time to movement initiation (Churchland et al., 2006a; Shenoy et al., 2011, 2013). Similarly, we found a significant correlation between the length of the interval between the first two INs and the time to song initiation in all birds (see example data from one bird in Fig. 9A; across all 16 birds, mean r=0.77, range=0.57 to 0.90). How similar the first IN was to the last IN was also correlated with the time to song initiation, albeit more weakly (see example from one bird in Fig. 9B; significant in 14/16 birds, mean r=0.32, range=−0.39 to 0.62). These data suggested that IN timing and acoustic features reflect internal neural processes, possibly involved in preparing the zebra finch brain for song initiation.
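The per-bird correlation between the first inter-IN interval and the time to song initiation can be sketched as follows. This is a minimal illustration that assumes a Pearson correlation coefficient (the text reports r without naming the coefficient here); the function name and the example values are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for constant input")
    return s_xy / math.sqrt(s_xx * s_yy)


# Made-up bouts from one hypothetical bird: interval between the first
# two INs and time from the first IN to song onset, both in ms.
first_interval_ms = [150, 180, 210, 260, 300]
time_to_song_ms = [420, 510, 560, 700, 820]
r = pearson_r(first_interval_ms, time_to_song_ms)
```

Computing r separately for each bird and then summarising across birds gives the mean and range format used in the text.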
In this study, we show that real-time auditory and/or proprioceptive feedback is not required for initiation of adult zebra finch song. We also show that the progression of INs, the repeated pre-song vocalizations, from a variable initial state to a more stereotyped final state is likewise independent of real-time sensory (auditory and/or proprioceptive) feedback. Further, we show, in both feedback-intact and feedback-deprived birds, that fewer INs are present when the first IN of a song bout occurs within 200 ms of the end of a call (other non-song vocalization). In such cases, IN timing was closer to the final state. Finally, the ‘initial’ state of IN progression was correlated with the time to song initiation. Overall, these results demonstrate that the progression of INs to song does not require real-time sensory feedback. Rather, progression of INs to song is controlled by internal neural processes possibly involved in preparing the motor system for song initiation.
Contributions of respiratory feedback to song initiation
One form of feedback that we did not alter is respiratory feedback from the air sacs (Méndez et al., 2010). However, previous work strongly suggests that respiratory feedback does not contribute to IN initiation. First, one earlier study showed that disrupting respiratory pressure during short syllables (of the order of 60 ms) did not disrupt song progression (Amador et al., 2013). Given that INs are short syllables of the order of 60 ms, INs may not require real-time respiratory feedback for progression to the next syllable (or song). Second, unilateral disruption of vagal feedback mostly affected syllables at the end of a song (Méndez et al., 2010). Finally, sparse, patterned neural activity of one class of neurons in the premotor nucleus HVC during singing was also not affected by removal of sensory feedback including respiratory feedback (Vallentin and Long, 2015). Together, these data suggest that respiratory feedback does not play a role in IN progression.
Long-term requirement for sensory feedback
Song production in adult birds does not depend on real-time sensory feedback (Bottjer and Arnold, 1984; Konishi, 1965) and our results show that song initiation also does not depend on real-time sensory feedback. However, long-term song maintenance does require intact sensory feedback, as shown by song degradation starting many weeks after deafening (Horita et al., 2008; Nordeen and Nordeen, 1992; Williams and McKibben, 1992). Similarly, it is possible that sensory feedback could be necessary in the longer term for maintenance of IN progression to song (our study focused on songs produced within 10 days of removal of feedback). It would also be interesting to see whether song degradation seen at later time points after deafening is linked to (or caused by) a change in IN progression to song. If INs represent preparatory vocalizations, such a link would be expected as small changes in the neural preparatory state in primates are correlated with changes in features of the upcoming movement (Afshar et al., 2011; Churchland et al., 2006b).
Comparison of INs to motor preparation in other systems
Preparatory neural activity has been described as a slow change in neural activity, starting as early as 1 s before the start of a movement (Chen et al., 2017; Churchland et al., 2006b; Gao et al., 2018; Lee and Assad, 2003; Li et al., 2015; Maimon and Assad, 2006; Murakami et al., 2014; Romo and Schultz, 1987; Tanji and Evarts, 1976). One important characteristic of this preparatory activity appears to be a decrease in variability across trials (Churchland et al., 2006a,c). The decrease in variability as INs progress to song (Rajan and Doupe, 2013) is very similar to the decrease in variability in neural activity seen before the start of a movement. Together with our current data showing that sensory feedback is not important for progression of INs to song, these results suggest that INs may represent preparatory activity. Additionally, earlier studies have shown the presence of preparatory neural activity in song control areas well before the first IN of undirected song bouts (Hessler and Doupe, 1999; Kao et al., 2008; Rajan, 2018; Woolley et al., 2014) and directed song bouts (Daliparthi et al., 2018 preprint). Thus, INs may reflect a continuation of this preparatory activity that begins hundreds of milliseconds before the first IN.
Overt movements in other systems as motor preparation
Our results suggest that overt vocalizations (INs) serve as preparatory activity. Previous studies describing neural preparatory activity in primates and rodents before the onset of a movement have not described similar overt movements as motor preparation (Chen et al., 2017; Churchland et al., 2006a; Gao et al., 2018; Murakami et al., 2014; Romo and Schultz, 1987; Tanji and Evarts, 1976). However, all of these studies involved training animals to perform a task in which they were rewarded for maintaining a stable posture without movements until a ‘go’ signal was provided for movement initiation. Therefore, overt preparatory movements, if present during the early stages of learning, would not be reinforced. This raises two interesting questions for further experiments. (1) Are overt movements present at early stages of learning in primates and rodents too? (2) Given that songbirds learn their song with internal reinforcement cues that only reinforce similarity to the tutor song (or tutor song memory) (Fee and Scharff, 2010), are INs learned in a manner similar to song? Additionally, there are human studies showing the presence of small eye movements (microsaccades) and small limb movements while waiting for a ‘go’ cue to perform an eye or limb movement (Betta and Turatto, 2006; Cohen and Rosenbaum, 2007; Corneil and Munoz, 2014). Changes in pupil size have also been shown to correlate with preparatory activity (Wang et al., 2015). This suggests that overt movements like INs may be more common before the start of naturally learned movements and may reflect motor preparation.
Mechanisms for IN progression to song
Our results show that sensory feedback is not essential for IN progression to song. Rather, the properties of INs correlate with the time to song initiation. How do the properties of INs change to progress to song? In our current study, we showed that the presence of calls prior to the first IN was correlated with shorter intervals between the first two INs and fewer INs before song. Similarly, shorter intervals between the first two INs have also been observed when neural preparatory activity in the premotor nucleus HVC precedes the first IN (Rajan, 2018). As calls are also associated with increased neural activity in many song control areas (Benichov et al., 2016; Danish et al., 2017; Hahnloser et al., 2002; Kozhevnikov and Fee, 2007; Vyssotski et al., 2016; Yu and Margoliash, 1996), the intervals between successive INs may reflect a history of increased activity within these inter-connected motor regions. The shorter interval might also lead to short-term plasticity that could facilitate song initiation by speeding up IN progression. Such short-term plasticity has been observed in the inputs to the premotor nucleus HVC (Coleman et al., 2007). Further experiments disrupting short-term plasticity or disrupting activity in motor control regions during IN production could help us to understand the mechanisms of IN progression to song.
Overall, our results show that real-time sensory feedback is not essential for INs to progress to song. Rather, changes in IN properties just before song initiation may reflect internal neural processes, potentially involved in preparing the zebra finch brain for initiation of the learned song sequence.
We would like to thank Prakash Raut for help with bird colony maintenance. We would also like to thank Michael Long, Hamish Mehaffey, Anand Krishnan, Deepa Subramanyam, Girish Deshpande, Sanjay Sane, Upi Bhalla and members of the Rajan and Krishnan Labs for useful discussions and comments on the manuscript. We also thank Allison Doupe, in whose lab the deafening experiments were carried out.
Conceptualization: D.R., R.R.; Methodology: D.R., R.R.; Software: D.R., R.R.; Formal analysis: D.R., R.R.; Investigation: D.R., S.K.; Writing - original draft: D.R., R.R.; Writing - review & editing: D.R., S.K., R.R.; Visualization: D.R.; Supervision: R.R.; Project administration: R.R.; Funding acquisition: R.R.
This work was supported by a Department of Biotechnology, Ministry of Science and Technology (DBT) Ramalingaswami Fellowship (BT/HRD/35/02/2006), a grant from the Department of Science and Technology, Ministry of Science and Technology (DST SERB EMR/2015/000829) and intramural support from the Indian Institute of Science Education and Research (IISER) Pune to R.R. We would also like to acknowledge a graduate student fellowship from IISER Pune and travel support from Department of Biotechnology, Ministry of Science and Technology-Conference, Travel, Exhibition and Popular Lectures (DBT/CTEP/02/2018 0847433) and the Infosys Foundation Travel Award (IISER-P/InfyFnd/Trv/116) to D.R. The deafening experiments, carried out at UCSF, were supported by a National Institutes of Health award R01 (MH55987) to Allison Doupe. Deposited in PMC for release after 12 months.
The authors declare no competing or financial interests.