Many species are able to vocally recognize individual conspecifics and this capacity seems widespread in oscine songbirds. The exact acoustic features used for such recognition are often not clear. In the zebra finch (Taeniopygia guttata), the song motif is composed of a few syllables repeated in a fixed sequential order and song bouts include several repetitions of the motif. Here, we used an operant discrimination task, the GO/NOGO procedure, to show that zebra finches are capable of individual vocal recognition even if the bird has to distinguish males that all produce an imitation of the same song model. Furthermore, we studied whether such individual vocal recognition was based on spectro-temporal details of song syllables, i.e. the local fine structure of the song, or on the sequential order in which song syllables are arranged in the song bout. To this end, we trained male and female zebra finches to discriminate songs of one male conspecific from those of four others. After learning this baseline discrimination, subjects were exposed to a novel set of stimuli originating from the same individuals, in order to test for their capability to generalise. Subjects correctly classified those novel stimuli, illustrating their ability for individual vocal recognition. Then they were exposed to hybrid stimuli combining the syllable sequences of one individual with the spectro-temporal features of another. Behavioural responses of subjects to hybrid stimuli suggest that they rely on spectro-temporal details of syllables and might pay less attention to syllable sequences for individual vocal recognition.

The capacity for individual vocal recognition is widespread amongst different avian species (Aubin and Jouventin, 2002; Falls, 1982). This capacity has for instance been demonstrated in the context of mate and parent–young recognition (Falls, 1982). Many of the species studied in these contexts are non-vocal learners, in which non-learned vocalizations are shared among individuals. In such species, individual vocal recognition is often based upon individual peculiarities of voice (Aubin and Jouventin, 2002).

In oscine songbirds, which are capable of vocal production learning, the capacity for individual vocal recognition has been particularly well documented in the context of territorial defence. Active singing behaviour plays an important role in such defence, and territorial songbirds often respond less aggressively to song playback of well-established territorial neighbours than to playback of strangers (‘dear enemy phenomenon’; Temeles, 1994), indicating the capacity for neighbour recognition (e.g. Brindley, 1991; Godard and Wiley, 1995).

In vocal learners, individual vocal recognition can be based upon an individually unique set of vocalizations such as in songbird species with song-type repertoires (Gentner and Hulse, 1998). The degree of song sharing might be an important factor here: lower degrees of song sharing might allow for neighbour recognition (Moser-Purdy and Mennill, 2016) whereas larger degrees of song sharing might lead to individuals confusing shared song types of different singers (Beecher et al., 1994).

Individual vocal recognition can also be based upon an individual’s unique song type (Gentner and Hulse, 1998) such as in species in which each individual sings a single (usually individually different) song type, also referred to as ‘signature song’ (Weary et al., 1990). Examples are common yellowthroats (Geothlypis trichas; Wunderle, 1978), indigo buntings (Passerina cyanea; Emlen, 1971) and zebra finches (Taeniopygia guttata).

The zebra finch is an oscine songbird with male-only song production. It is a non-territorial species, meaning that assays other than neighbour–stranger discrimination have to be used to demonstrate individual vocal recognition, such as preference tests (Clayton, 1988) or operant discrimination tasks (Cynx and Nottebohm, 1992; Gess et al., 2011). We have recently found that zebra finches are able to discriminate songs produced by familiar individuals from songs produced by unfamiliar individuals even if all of these songs were shared, i.e. all were imitations of the same song type (N. Geberzahn, S. Zsebők and S. Derégnaucourt, unpublished results). These findings suggest that individual recognition might be possible despite a complete sharing of song types between individuals. Furthermore, individual identity seems to be coded not only in learned song but also in unlearned calls in this species (D'Amelio et al., 2017; Elie and Theunissen, 2018). Both findings suggest that zebra finches are able to recognize each other individually on the basis of vocalizations that differ only slightly between individuals. In the current study, we sought to confirm such individual vocal recognition based on shared songs. In addition, we investigated the perceptual mechanism underlying individual recognition based on songs shared between all individuals. To this end, we made use of a colony of zebra finches that was founded by males that had all learned to imitate one and the same song type (L. Le Maguer, N. Geberzahn, L. Nagle and S. Derégnaucourt, unpublished results).

Zebra finch song begins with introductory syllables followed by a repeated sequence of a relatively rigid succession of syllables: the so-called motif (Sossinka and Böhner, 1980). It is this rigid succession of syllables in motifs that gives zebra finch song its reputation of being highly stereotyped. However, on a higher level of song organization – that is, on the song bout level – there is more variability: Hyland Bruno and Tchernichovski (2019) found that co-tutored zebra finches that acquired similar motifs showed variation in how motifs were strung together, with subsequent motif repetitions linked via a variable number of short ‘connector’ vocalizations. Likewise, L. Le Maguer, N. Geberzahn, L. Nagle and S. Deregnaucourt (unpublished results) found considerable individual variation in the song bout structure in colonies of zebra finches that were founded by males having all vocally learned to imitate one and the same song type.

Do zebra finches pay attention to variability in the overall song structure? Some lines of evidence suggest that they do: in addition to the acoustic form of syllables, zebra finches vocally learn to imitate the song structure from their father (Menyhart et al., 2015) and they can even be trained to swap syllable order during the course of vocal development (Lipkind et al., 2013). They can also be trained to discriminate between different syllable sequences (for review, see ten Cate, 2017, 2018).

However, other perceptual studies testing the ability of zebra finches to discriminate between normal versus reversed syllables as well as normal versus reversed syllable sequences suggest that zebra finches pay more attention to syllable structure and are less sensitive to changes in syllable order (Braaten et al., 2006; Lawson et al., 2018).

In the current study, we investigated the role of syllable sequences versus spectro-temporal details of syllables (or fine structure of syllables) for the process of individual recognition. By using an operant discrimination task, we trained male and female zebra finches to discriminate between songs of conspecifics based on singer identity. We then exposed those subjects to hybrid stimuli constructed from songs of the same singers as during the training. These hybrid stimuli combined the syllable sequences of one individual with the spectro-temporal details of another (or vice versa). The behavioural responses of our subjects to hybrid stimuli allowed us to examine whether zebra finches rely on syllable sequences or instead on spectro-temporal features of syllables for individual vocal recognition.

Subjects and housing conditions

We used 10 adult zebra finches, Taeniopygia guttata (Vieillot 1817) (5 males, 5 females), aged 303±99 days (mean±s.d.) at the start of the experiment from our breeding colony at the Université Paris Nanterre, France. Subjects were maintained in social groups in cages (118 cm×50 cm×50 cm) and had ad libitum access to a commercial tropical seed mixture, egg food, cuttlebone, bird's grit and water; the food was supplemented with fresh lettuce once a week. Throughout the different phases of the operant discrimination task, subjects were housed in the apparatus for operant conditioning (see below). Each bird was individually marked with a numbered red ring. Subjects were held on a 14 h:10 h light:dark cycle (lights on 08:00 h–22:00 h Central European time).

Apparatus for operant conditioning

Experiments were carried out in custom-built sound-isolation chambers with internal dimensions of 58 cm×51 cm and a height of 57 cm, lined with acoustic foam and containing a metal wire cage equipped with two perches. Next to one perch, subjects could peck on either of two keys (one situated to the right, the other to the left of the perch). Opposite the keys, at the other end of the same perch, was a feeder (containing tropical seed mixture and egg food), access to which was blocked by a transparent plastic window. Access to food was possible only by mastering the task (see below), which led to the plastic window being pulled up by a string connected to the arm of a stepping motor (Modelcraft Premium RC-CarServo 4519DBB). A camera (Handykam 420TVL COL CCD 12V PAL) placed inside the cage allowed monitoring of the birds' behaviour. A speaker (YAMAHA Monitor Speaker MS101 III, frequency response: 30 Hz to 20 kHz) was placed behind the wire cage. All experimental events were controlled and recorded by a custom-written MATLAB program (see Gess et al., 2011) that we had slightly modified.

Experimental procedures

Learning of the basic task

During this phase, subjects had to learn the basic task by trial and error. The basic task consisted of pecking on the right key in order to elicit the playback of a song and to subsequently peck on the left key in order to open the feeder for 10 s. Subjects had access to cuttlebone, grit and water at all times. They had access to tropical seed mixture and egg food in the feeder only if they executed the basic task. One and the same stimulus was used for this learning phase. This stimulus was a different song type from the stimuli used in the subsequent operant discrimination task. As long as we could not detect any pecking activity by the subject, the feeder was programmed to open for 60 s every 600–900 s (interval chosen randomly in this range). This allowed the subject to acclimate to the feeder and to get used to the movement of the plastic window and the sound of the stepping motor. We monitored the behaviour and the mass of the subject in order to make sure that it gained regular access to food (if not, we temporarily provided an additional food dish). All but two subjects (one female and one male) eventually learned the basic task, leaving a final sample size of 8 subjects (4 males and 4 females).

Training phase – learning to discriminate the first set of stimuli

Once a subject had mastered the basic task, it received training to discriminate two categories of songs with a first set of stimuli (N=16 song stimuli). As before, pecking the right key elicited playback of one of the stimuli. If a given stimulus belonged to the category GO (N=8 song stimuli, see ‘Stimuli’, below, for further details), the bird had to subsequently peck on the left key (i.e. to give a GO response) in order to obtain access to food for 10 s. However, if the stimulus belonged to the category NOGO (N=8 song stimuli), the bird had to withhold such a pecking response as pecking in this case would cause the light to go off for 10 s (constituting a mild punishment). Once a subject made correct responses to at least 75% of the stimuli for three or more consecutive blocks of 100 trials, it was transferred to the next experimental phase of the protocol, in which it received a second set of stimuli.
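
The transfer criterion described above can be sketched in a few lines of Python. This is only an illustration and not the MATLAB program used in the study; the function name `reached_criterion` and the representation of a subject's history as a list of per-block proportions of correct responses are assumptions made for this example.

```python
def reached_criterion(block_scores, threshold=0.75, n_consecutive=3):
    """Return True once `n_consecutive` blocks in a row score >= threshold.

    block_scores: proportion of correct responses in each block of
    100 trials, in chronological order (hypothetical data layout).
    """
    run = 0  # length of the current streak of blocks at or above threshold
    for score in block_scores:
        run = run + 1 if score >= threshold else 0
        if run >= n_consecutive:
            return True
    return False
```

A block below 75% resets the streak, so `[0.80, 0.80, 0.70, 0.80, 0.80]` would not yet meet the criterion under this reading.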

Generalization phase – transfer to the second set of stimuli

During the generalization phase, birds were exposed to a second, novel set of stimuli (N=16 novel song stimuli). Eight of these stimuli belonged to the same category as the GO stimuli in the previous training phase (see ‘Stimuli’, below, for further details) and the other eight belonged to the same category as the previous NOGO stimuli. The aim was to test whether subjects had formed categories during the training phase, and thus were able to distinguish between these two categories of stimuli, or whether they had learned how to respond to each single stimulus (rote memorization of individual stimuli) without forming categories. The birds had to reach the same criteria for correct responses as in the training phase in order to be transferred to the next phase in which the two previous sets of stimuli were combined.

Two sets of stimuli combined – acclimation to reduced rate of reinforcement

In this phase, subjects were exposed to the two sets of stimuli combined (N=32, 16 GO, 16 NOGO). As soon as they reached the criteria for correct responses (see ‘Training phase – learning to discriminate the first set of stimuli’, above), the rate of reinforcement for correct responses was lowered from 100% to 80% in order to get them used to a reduced rate of reinforcement. Subjects were transferred to the next experimental phase, the test phase, once they reached the criteria for correct responses (as described for the training phase) with this lowered reinforcement rate.

Test phase – exposure to hybrid stimuli

Each subject continued to respond to the 32 song stimuli (the two previous sets combined, now considered as baseline stimuli; reinforcement rate at 80%) and on 6% of the trials one of eight different renditions of hybrid stimuli (see ‘Stimuli’, below) was presented (hybrid stimuli were never reinforced). The test phase went on until a subject had been exposed at least 25 times to each of the eight renditions of hybrid stimuli.

Stimuli

For stimulus selection, we made use of one out of three colonies of zebra finches, each of which was originally founded by males all singing an imitation of the same song type (Derégnaucourt et al., 2014; L. Le Maguer, N. Geberzahn, L. Nagle and S. Deregnaucourt, unpublished results). To this end, founder males were chosen (according to the quality of their imitation) amongst a pool of males that had been trained previously to produce the same song type, either by one-to-one live tutoring or by operant tutoring (Derégnaucourt et al., 2013). Amongst the offspring of this colony, we selected five males producing a close imitation of the colony song type and thus sharing their syllable types. Offspring in this colony nevertheless showed slight variation in the way they sequentially arranged the syllables in their songs (Fig. 1).

All songs used as stimuli were undirected songs recorded from these five different males that were unfamiliar to the subjects (Fig. 1A–E). Their songs had been recorded whilst the birds were individually housed in sound-isolation chambers. A microphone (Behringer C-2) was placed above the cage and songs were digitally recorded into wav files (sampling rate: 44.1 kHz, accuracy: 16 bit) using PreSonus Audiobox 1818VSL and Sound Analysis Pro software run on a PC (SAP freely available at http://soundanalysispro.com; Tchernichovski et al., 2004). Files were high-pass filtered at 0.45 kHz. We rescaled stimuli to root mean square equalized amplitudes using a script implemented in Praat (freely available at www.gbeckers.nl/pages/praat_scripts/rms_equalize.praat_script). All stimuli used from the training phase onwards were different renditions of one and the same song type.

Stimuli used during the training and generalization phase

Subjects had to discriminate between the songs of one individual male (eight different renditions) and those of four other males (two different renditions of each of them). For example, one bird was reinforced for pecking the left key each time it heard a song from bird A, and was not punished when withholding pecking each time it heard a song from bird B, C, D or E. Another bird was reinforced for pecking the left key each time it heard a song from bird B and was not punished when withholding pecking each time it heard a song from bird A, C, D or E, and so on. Half of our subjects learned to peck in response to songs of one individual bird (stimulus category GO) and to withhold pecking for the songs of four other individuals (stimulus category NOGO). We refer to those subjects as the group GO: INDV. The other half of the subjects learned to peck for songs of four different individuals (stimulus category GO) and to withhold pecking for the songs of one other individual (stimulus category NOGO). We refer to this second group as GO: MULT. This design was inspired by and resembled the one used by Gentner and Hulse (1998). We did not use a given stimulus more than 4 times (i.e. for more than four different subjects). To this end, we used 24.6±4.9 (mean±s.d.) different song renditions of each of the five individual stimulus birds (introduced above as A, B, C, D and E), giving a total of 123 different song renditions; 118 of these stimuli were used twice and 5 were used 4 times. Each subject was exposed to 32 different song renditions (16 stimuli each during the training and the generalization phase).

Creation of hybrid stimuli used during the test phase

Hybrid stimuli were created by using stimuli from the pool of 123 different song renditions used for the training and generalization phase. These renditions were chosen in such a way that each subject had not yet been exposed to them previously. Hybrid stimuli were created using two parallel instances of Avisoft-SASLab Pro (http://www.avisoft.com/). We replaced all syllables of a given song rendition of one individual (e.g. individual 1 in Fig. 2) with the equivalent syllables of a given song rendition of another individual (e.g. individual 2 in Fig. 2) by copying and pasting the syllables one by one. Thus, all syllables in a given hybrid stimulus were from one and the same recording of one bird and we did not mix syllables across different individuals. This meant that we kept syllable durations of one individual (e.g. individual 2 in Fig. 2) and silent gap durations of the other (individual 1 in Fig. 2).

The resulting hybrid stimulus thus combined the syllable sequence of one individual (e.g. individual 1 in Fig. 2) with the spectro-temporal features of syllables of another individual (e.g. individual 2 in Fig. 2). Note that whilst this hybrid stimulus represented the syllable sequence of one individual (e.g. individual 1 in Fig. 2), the temporal patterns of this individual (i.e. durational patterns of syllables and intersyllabic silences; see Araki et al., 2016; Sasahara et al., 2015) were not necessarily maintained.

Analysis

Analyses were conducted at individual and group levels. Analysis at the individual level was carried out with a binomial test and a subsequent Benjamini–Hochberg correction for eight comparisons (Benjamini and Hochberg, 1995). Analysis at the group level was done with a paired t-test or a Wilcoxon signed rank test, depending on whether or not data were normally distributed (tested using Shapiro–Wilk tests). We used R (version 3.5.0, R Core Team 2018, http://www.cran.r-project.org), except for the Benjamini–Hochberg corrections, which were calculated using spreadsheets.
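
For readers who prefer a scripted alternative to a spreadsheet, the Benjamini–Hochberg step-up adjustment can be sketched as follows. This is a generic illustration of the published procedure, not the authors' spreadsheet; the function name `benjamini_hochberg` is an assumption, and the p-values in the usage note are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In the design described here, the input would be the eight per-subject p-values from the binomial tests, e.g. `benjamini_hochberg([0.001, 0.2, 0.03, ...])` with made-up numbers; an adjusted value below the chosen alpha is then treated as significant.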

Generalization phase

The proportion of hits (correct responses to stimuli of the category GO) averaged over the first four exposures for each of eight stimuli in the generalization phase was measured against a binomial distribution with a probability equivalent to chance level. The same was done for false alarms (incorrect responses to stimuli of the category NOGO). Chance level was calculated for each subject based on its individual performance just before the transfer to the novel set of stimuli. To this end, we first calculated for each subject the proportion of hits over the last four exposures for each of eight stimuli in the preceding training phase. Then, we likewise calculated for each subject the proportion of false alarms over the last four exposures for each of eight stimuli in the preceding training phase. Chance level was then calculated as the mean of these two proportions following the formula:
\[ \text{chance level} = \frac{P_{\text{hits}} + P_{\text{false alarms}}}{2} \tag{1} \]
We only considered the last four exposures of the training phase in order to capture the performance right at the end of the training phase. Likewise, we selected only the first four exposures of each stimulus in the generalization phase to reduce the risk that birds had already learned to associate this new set of stimuli with the categories GO and NOGO (note that in the generalization phase, all stimuli were reinforced). In addition to the individual-based analysis, we conducted a comparison of the proportion of correct responses to each stimulus of the category GO (first four exposures for each of eight stimuli) and the incorrect responses to each stimulus of the category NOGO (first four exposures for each of eight stimuli) of the generalization phase at the group level.
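
The individual-level analysis of the generalization phase can be illustrated with a short Python sketch: chance level as the mean of the two training-phase proportions, followed by an exact binomial test over the 32 trials (four exposures of eight stimuli). The function names are assumptions, and the two-sided definition used here (summing the probabilities of all outcomes no more likely than the observed one) is also an assumption, as the text does not state whether the test was one- or two-sided.

```python
from math import comb


def chance_level(p_hits, p_false_alarms):
    """Mean of the hit and false-alarm proportions at the end of training."""
    return (p_hits + p_false_alarms) / 2


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_test_two_sided(k, n, p):
    """Exact two-sided binomial test.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k (assumed definition, see lead-in).
    """
    observed = binom_pmf(k, n, p)
    tolerance = 1 + 1e-7  # guards against floating-point ties
    total = sum(
        binom_pmf(i, n, p)
        for i in range(n + 1)
        if binom_pmf(i, n, p) <= observed * tolerance
    )
    return min(1.0, total)
```

With invented numbers, a subject that ended training with 90% hits and 20% false alarms would have a chance level of 0.55, and 28 hits in the first 32 GO trials of the generalization phase would be tested as `binom_test_two_sided(28, 32, chance_level(0.9, 0.2))`.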

Test phase – exposure to hybrid stimuli

The proportion of GO responses to hybrid stimuli was measured against a binomial distribution with a probability equivalent to chance level. This probability was individually calculated as the grand mean of the average of hit responses to the last six presentations of GO stimuli and the average of false alarms to the last six presentations of NOGO stimuli prior to a given hybrid stimulus presentation, thus reflecting the performance in response to already well-known stimuli (baseline stimuli) during the test phase.
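
One possible reading of this running baseline is sketched below in Python. The trial log format (a chronological list of `(category, responded)` pairs), the category labels and both function names are assumptions made for illustration; the sketch averages, over all hybrid presentations of a subject, the mean of the hit rate and false-alarm rate in the six preceding GO and six preceding NOGO baseline trials.

```python
def baseline_chance(trials, hybrid_index, window=6):
    """Chance level for the hybrid trial at position `hybrid_index`.

    trials: chronological list of (category, responded) tuples, where
    category is "GO", "NOGO" or "HYBRID" and responded is True when the
    bird gave a GO response (hypothetical log format).
    """
    prior = trials[:hybrid_index]
    go = [resp for cat, resp in prior if cat == "GO"][-window:]
    nogo = [resp for cat, resp in prior if cat == "NOGO"][-window:]
    if len(go) < window or len(nogo) < window:
        raise ValueError("not enough baseline trials before this hybrid")
    hit_rate = sum(go) / window            # GO responses to GO stimuli
    false_alarm_rate = sum(nogo) / window  # GO responses to NOGO stimuli
    return (hit_rate + false_alarm_rate) / 2


def subject_chance(trials, window=6):
    """Grand mean of the per-hybrid chance levels for one subject."""
    values = [
        baseline_chance(trials, i, window)
        for i, (cat, _) in enumerate(trials)
        if cat == "HYBRID"
    ]
    if not values:
        raise ValueError("no hybrid trials in this log")
    return sum(values) / len(values)
```

The resulting probability would then serve as the null proportion in the binomial test on the subject's GO responses to hybrid stimuli.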

Ethical note

The study was conducted according to the Association for the Study of Animal Behaviour guidelines on animal experimentation. Experimental authorization was provided by the French Ministry for National Education, Higher Education and Research (authorization no. 02609.02).

Discrimination between songs produced by one individual versus songs of four other individuals

All eight subjects that had successfully learned the basic operant task by trial and error also learned to discriminate between stimuli of the category GO versus NOGO of the first set of stimuli. This means they learned to discriminate between songs produced by one unfamiliar individual conspecific versus songs produced by four other unfamiliar individual conspecifics. The number of training blocks (each block containing 100 trials) necessary to pass to the generalization phase ranged from 8 to 70 (mean=24.8), which corresponded to a number of days ranging from 3 to 20 (mean=8.1).

Performance after transfer to the second (novel) set of stimuli

Just after having been transferred to the second and novel set of GO and NOGO stimuli, four out of eight subjects gave a higher proportion of correct responses to the GO stimuli (i.e. hits) than expected by chance (green bars in Fig. 3, Table 1). Seven out of eight subjects responded incorrectly to NOGO stimuli significantly less often than expected by chance (red bars in Fig. 3, Table 1), i.e. they gave significantly fewer false alarms. At a group level, birds responded with a significantly higher proportion of GO responses to GO stimuli (i.e. hits) than to NOGO stimuli (i.e. false alarms) (Fig. 4, Wilcoxon signed rank test with continuity correction: V=36, n=8, P=0.014).

Responses to hybrid stimuli

Six out of eight birds gave GO responses to hybrid stimuli significantly more often than expected by chance when hybrids were created using the syllables of the category GO; for the remaining two subjects, the proportion of GO responses did not differ from chance level (Fig. 5, Table 2). All subjects gave GO responses to hybrid stimuli significantly less often than expected by chance when hybrids were created using the sequences of the category GO (Fig. 5, Table 2). At a group level, birds responded with a significantly higher proportion of GO responses to hybrid stimuli created by syllables of the category GO than to hybrid stimuli created by sequences of the category GO (Fig. 6; paired t-test: t=−11.89, d.f.=7, P<0.001).

Male and female zebra finches were able to individually recognize unfamiliar conspecifics by their song even though all the unfamiliar conspecifics, between which they had to distinguish, produced imitations of one and the same song type and thus sang very similar songs. The perceptual mechanism for such individual vocal recognition was based on spectro-temporal features, i.e. the local fine structure, of syllables rather than the sequence in which syllables were arranged in a song bout.

Individual vocal recognition despite acoustically highly similar songs

Zebra finches correctly classified novel stimuli in the generalization phase, suggesting that they were able to discriminate between songs of unfamiliar conspecifics based on singer identity. Thus, although all males sang an imitation of the same song type, their songs were nevertheless discriminable by receivers. In order to correctly classify novel stimuli in the generalization phase, subjects must have made use of acoustic features that varied with singer identity.

We have recently found that male zebra finches are able to discriminate between songs from familiar and unfamiliar conspecifics, despite the fact that all singers sang the same song type (N. Geberzahn, S. Zsebők and S. Derégnaucourt, unpublished data). The current study confirms that individual vocal recognition is the most likely candidate mechanism for such discrimination between familiar and unfamiliar song.

The current study also shows that male and female zebra finches are capable of individual vocal recognition of unfamiliar conspecifics despite a complete sharing of song types between individuals. Another species, the song sparrow (Melospiza melodia), seems to lack such capabilities, as males confounded the shared song types of different singers in an operant discrimination task (Beecher et al., 1994). Such differences between these two species might be due to the fact that the song sparrow has a repertoire of several song types whereas zebra finches have only one song type. Alternatively, or in addition, such differences might be due to socio-ecological factors as, in contrast to song sparrows, zebra finches are highly gregarious and non-territorial birds living in flocks, with a high level of dispersion and probably fission/fusion dynamics regarding group structure (Zann, 1996). Having advanced capabilities of individual vocal recognition of conspecifics by auditory cues might thus be highly adaptive in zebra finches.

Acoustic feature used for individual vocal recognition

Zebra finches responded frequently with GO responses when hybrid stimuli combined the syllable structure of the category GO with the syllable sequence of the category NOGO. In contrast, they responded frequently with NOGO responses when hybrid stimuli combined the syllable structure of the category NOGO with the syllable sequence of the category GO. Thus, when confronted with such conflicting information, i.e. the local fine structure of syllables suggesting the identity of one singer and the syllable sequence suggesting the identity of another, the fine structure of syllables was the more salient acoustic feature in guiding individual vocal recognition. There is one possible caveat to this conclusion: whilst in hybrid stimuli the syllable sequence suggested the identity of another singer, the temporal pattern of alternating syllables and intersyllabic silent gaps represented neither of the two singers. Thus, if zebra finches cannot perceive syllable sequences independently of such temporal patterns, they might in fact not have recognized the syllable sequences.

Nevertheless, our conclusion is in line with other perceptual studies suggesting that zebra finches pay more attention to syllable structure and are less sensitive to changes in syllable order when tested for their ability to discriminate between normal versus reversed syllables as well as normal versus reversed syllable sequences (Braaten et al., 2006; Lawson et al., 2018; for review, see Fishbein et al., 2019). Likewise, in a meta-analysis, Kriengwatana et al. (2016) found that zebra finches learned phonetic discriminations (discriminations based on variation in the acoustic features of syllables) faster than discrimination based on variation in artificial syllable sequences. However, zebra finches can in principle be trained to discriminate artificial syllable sequences and thus can use syllable sequence for song discrimination (ten Cate, 2017, 2018). They might thus have some flexibility to use different song features to discriminate song, even though they may have a preference for using spectro-temporal features. Here, we used several song renditions of each stimulus bird in the training and generalization phase. We speculate that this might have resulted in the perception of more within-individual variation in the syllable sequences than in the spectro-temporal features of the syllables by the subjects. If this is true, then the spectro-temporal features of the syllables might have provided a stronger signature of the individual singer than the syllable sequence and might have led subjects to ignore syllable sequence even more.

In contrast to previous perceptual studies using reversed syllables, reversed syllable sequences or artificial grammars, our study used only naturalistic stimuli in which all syllables were broadcast in a normal forward fashion (as opposed to reversed), and all syllables were arranged in natural forward sequences. Yet, despite using more naturalistic stimuli and thus less-pronounced differences, our results indicate a higher sensitivity in zebra finches to the local syllable structure than to the syllable sequence. Furthermore, we tested discrimination ability in a biologically meaningful context: individual vocal recognition. Our results therefore allow us to not only confirm earlier findings but also generalize them to a broader and more meaningful context.

It is also worth noting that most perceptual studies confine themselves to the song motif as the relevant song unit (e.g. Braaten et al., 2006; Cynx and Nottebohm, 1992; Lawson et al., 2018). It is perhaps not surprising that those studies revealed a low sensitivity for changes in syllable order given the rigid succession of syllables in motifs. The current study was conducted on a higher level of song organization, the level of song bouts. There seems to be some variability in the way zebra finches string together repeated motifs in song bouts (Hyland Bruno and Tchernichovski, 2019; L. Le Maguer, N. Geberzahn, L. Nagle and S. Deregnaucourt, unpublished results). Thus, one could have expected that individual identity might be coded at the level of song bout organization and accordingly that subjects would use such cues for individual vocal recognition. This seems not to be the case: zebra finches did not use the sequence of syllables in a song bout as a cue for individual vocal recognition.

Conclusions

Gentner and Hulse (1998) suggested four ways in which information about the individual identity of oscine singers might be coded: (1) individually unique spectral or voice characteristics, (2) individually unique song types, (3) in case of song-type sharing between different individuals, individual variation within such shared song types, and (4) individual variation at the level of sequences. Are perceptual mechanisms for individual recognition in zebra finches based on any of these four coding strategies? Our experiment does not allow us to draw conclusions on whether unique spectral characteristics play a role. Under natural conditions, each male zebra finch produces a different song type, so unique song types could play a role in individual vocal recognition. However, here we used song stimuli from five different males that all produced an imitation of the same song model. Thus, the individual to be recognized shared his song type with the birds from which he had to be distinguished. Our results suggest that subjects were nevertheless capable of individual vocal recognition and that they used individual variation in those shared songs, namely variation in the local fine structure of syllables. In contrast, zebra finches do not seem to use individual variation at the level of the sequencing of syllables in a song bout for individual recognition. At least in the current study, they did not do so when information provided by the local fine structure of syllables contradicted information provided by syllable sequence. In the future, it would be interesting to test whether individual variation at the level of sequences in the song bout might nevertheless be salient to receivers when such variation does not conflict with other cues. To this end, it could be interesting to apply a recently developed operant-training procedure (Lim et al., 2016; Elie and Theunissen, 2018). In this procedure, the bird does not have to wait for the end of the sequence to respond. Instead, it can stop the stimulus playback as soon as it reaches a decision. In such a setup, response latencies could be informative, because a bird that relies solely on the local fine structure of syllables would not have to wait until the end of the sequence before responding.
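As a minimal sketch of how response latencies from such a setup might be summarised, the following Python example computes the proportion of each stimulus that had played before the bird responded, and the share of trials with a response before a chosen cut-off. The `Trial` record, the function names and all numerical values are hypothetical and serve only as illustration; they are not data or methods from the present study or from the cited procedures.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One hypothetical trial in an interruptible playback task."""
    stimulus_duration_s: float  # full duration of the song stimulus
    response_latency_s: float   # time from stimulus onset to the response


def fraction_heard(trial: Trial) -> float:
    """Proportion of the stimulus played before the response, capped at 1."""
    if trial.stimulus_duration_s <= 0:
        raise ValueError("stimulus duration must be positive")
    return min(trial.response_latency_s / trial.stimulus_duration_s, 1.0)


def early_response_rate(trials: list[Trial], threshold: float = 1.0) -> float:
    """Share of trials with a response before `threshold` of the stimulus had played."""
    if not trials:
        raise ValueError("at least one trial is required")
    early = sum(1 for t in trials if fraction_heard(t) < threshold)
    return early / len(trials)


# Hypothetical values for illustration only (not measured data).
trials = [
    Trial(stimulus_duration_s=4.0, response_latency_s=1.0),
    Trial(stimulus_duration_s=4.0, response_latency_s=2.0),
    Trial(stimulus_duration_s=5.0, response_latency_s=5.5),  # response after stimulus end
    Trial(stimulus_duration_s=4.0, response_latency_s=3.0),
]

print(early_response_rate(trials))                 # responses before stimulus end
print(early_response_rate(trials, threshold=0.5))  # responses within the first half
```

Under the reasoning above, a high rate of early responses would be compatible with reliance on the local fine structure of syllables, whereas responses clustered near the end of the stimulus would leave open a role for sequence information.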

Acknowledgements

We thank Sarah Woolley for providing the MATLAB script for the GO/NOGO procedure and Chloé Huetz for adapting it. We thank Sándor Zsebők for help with setting up the apparatus for operant conditioning. We thank Thierry Aubin, Hélène Courvoisier and Laurent Nagle for their help with building the sound-proof chambers. We thank Philippe Groué, Emmanuelle Martin and Ophélie Bouillet for taking care of the birds. Many thanks go to two anonymous reviewers for their constructive comments and suggestions.

Author contributions

Conceptualization: N.G., S.D.; Methodology: N.G., S.D.; Formal analysis: N.G.; Writing - original draft: N.G.; Writing - review & editing: S.D.; Project administration: S.D.; Funding acquisition: S.D.

Funding

Funding was provided by the Agence Nationale de la Recherche (ANR-12-BSH2-0009) and the Institut Universitaire de France (IUF) to S.D.

References

Araki, M., Bandi, M. M. and Yazaki-Sugiyama, Y. (2016). Mind the gap: neural coding of species identity in birdsong prosody. Science 354, 1282-1287.
Aubin, T. and Jouventin, P. (2002). How to vocally identify kin in a crowd: the penguin model. Adv. Stud. Behav. 31, 243-277.
Beecher, M. D., Campbell, S. E. and Burt, J. M. (1994). Song perception in the song sparrow: birds classify by song type but not by singer. Anim. Behav. 47, 1343-1351.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57, 289-300.
Braaten, R. F., Petzoldt, M. and Colbath, A. (2006). Song perception during the sensitive period of song learning in zebra finches (Taeniopygia guttata). J. Comp. Psychol. 120, 79-88.
Brindley, E. L. (1991). Response of European robins to playback of song: neighbour recognition and overlapping. Anim. Behav. 41, 503-512.
Clayton, N. S. (1988). Song discrimination learning in zebra finches. Anim. Behav. 36, 1016-1024.
Cynx, J. and Nottebohm, F. (1992). Role of gender, season, and familiarity in discrimination of conspecific song by zebra finches (Taeniopygia guttata). Proc. Natl. Acad. Sci. USA 89, 1368-1371.
D'Amelio, P. B., Klumb, M., Adreani, M. N., Gahr, M. L. and ter Maat, A. (2017). Individual recognition of opposite sex vocalizations in the zebra finch. Sci. Rep. 7, 5579.
Derégnaucourt, S., Poirier, C., Van der Kant, A., Van der Linden, A. and Gahr, M. (2013). Comparisons of different methods to train a young zebra finch (Taeniopygia guttata) to learn a song. J. Physiol. Paris 107, 210-218.
Derégnaucourt, S., Nagle, L., Gahr, M., Aubin, T. and Geberzahn, N. (2014). Cultural evolution of birdsong in the laboratory. Neuroscience Meeting Planner, Online. Program No. 365.09/UU29. Washington, DC: Society for Neuroscience.
Elie, J. E. and Theunissen, F. E. (2018). Zebra finches identify individuals using vocal signatures unique to each call type. Nat. Commun. 9, 4026.
Emlen, S. T. (1971). The role of song in individual recognition in the indigo bunting. Z. Tierpsychol. 28, 241-246.
Falls, J. B. (1982). Individual recognition by sound in birds. In Acoustic Communication in Birds (ed. D. E. Kroodsma and E. H. Miller), pp. 237-278. New York: Academic Press.
Fishbein, A. R., Idsardi, W. J., Ball, G. F. and Dooling, R. J. (2019). Sound sequences in birdsong: how much do birds really care? Philos. Trans. R. Soc. B 375, 20190044.
Gentner, T. and Hulse, S. (1998). Perceptual mechanisms for individual vocal recognition in European starlings, Sturnus vulgaris. Anim. Behav. 56, 579-594.
Gess, A., Schneider, D. M., Vyas, A. and Woolley, S. M. N. (2011). Automated auditory recognition training and testing. Anim. Behav. 82, 285-293.
Godard, R. and Wiley, R. H. (1995). Individual recognition of song repertoires in two wood warblers. Behav. Ecol. Sociobiol. 37, 119-123.
Hyland Bruno, J. and Tchernichovski, O. (2019). Regularities in zebra finch song beyond the repeated motif. Behav. Process. 163, 53-59.
Kriengwatana, B., Spierings, M. J. and ten Cate, C. (2016). Auditory discrimination learning in zebra finches: effects of sex, early life conditions and stimulus characteristics. Anim. Behav. 116, 99-112.
Lawson, S. L., Fishbein, A. R., Prior, N. H., Ball, G. F. and Dooling, R. J. (2018). Relative salience of syllable structure and syllable order in zebra finch song. Anim. Cogn. 21, 467-480.
Lim, Y., Lagoy, R., Shinn-Cunningham, B. G. and Gardner, T. J. (2016). Transformation of temporal sequences in the zebra finch auditory system. eLife 5, e18205.
Lipkind, D., Marcus, G. F., Bemis, D. K., Sasahara, K., Jacoby, N., Takahasi, M., Suzuki, K., Feher, O., Ravbar, P., Okanoya, K., et al. (2013). Stepwise acquisition of vocal combinatorial capacity in songbirds and human infants. Nature 498, 104-108.
Menyhart, O., Kolodny, O., Goldstein, M. H., DeVoogd, T. J. and Edelman, S. (2015). Juvenile zebra finches learn the underlying structural regularities of their fathers’ song. Front. Psychol. 6, 1-12.
Moser-Purdy, C. and Mennill, D. J. (2016). Large vocal repertoires do not constrain the dear enemy effect: a playback experiment and comparative study of songbirds. Anim. Behav. 118, 55-64.
Sasahara, K., Tchernichovski, O., Takahasi, M., Suzuki, K. and Okanoya, K. (2015). A rhythm landscape approach to the developmental dynamics of birdsong. J. R. Soc. Interface 12, 20150802.
Sossinka, R. and Böhner, J. (1980). Song types in the zebra finch Poephila guttata castanotis. Z. Tierpsychol. 53, 123-132.
Tchernichovski, O., Lints, T. J., Derégnaucourt, S., Cimenser, A. and Mitra, P. P. (2004). Studying the song development process rationale and methods. In Behavioral Neurobiology of Birdsong (ed. H. P. Zeigler and P. Marler), pp. 348-363. New York: New York Acad Sciences.
Temeless, E. (1994). The role of neighbours in territorial systems: when are they ‘dear enemies’? Anim. Behav. 52, 856-859.
ten Cate, C. (2017). The linguistic abilities of birds. In Avian Cognition (ed. C. ten Cate and S. Healy), pp. 249-269. Cambridge: Cambridge University Press.
ten Cate, C. (2018). The comparative study of grammar learning mechanisms: birds as models. Curr. Opin. Behav. Sci. 21, 13-18.
Weary, D., Norris, K. and Falls, J. (1990). Song features birds use to identify individuals. Auk 107, 623-625.
Wunderle, J. M. J. (1978). Differential response of territorial yellow throats to the songs of neighbors and non-neighbors. Auk 95, 389-395.
Zann, R. (1996). The Zebra Finch: A Synthesis of Field and Laboratory Studies. Oxford: Oxford University Press.

Competing interests

The authors declare no competing or financial interests.