Summary
Barn owls can capture prey in pitch darkness, or even under snow, by homing in on the sounds that the prey make. First, the neural mechanisms by which the barn owl localizes a single sound source in an otherwise quiet environment will be explained. The ideas developed for the single-source case will then be extended to environments in which there are multiple sound sources and echoes – environments that are challenging for humans with impaired hearing. Recent controversies regarding the mechanisms of sound localization will be discussed. Finally, the case in which both visual and auditory information are available to the owl will be considered.
Introduction
When a barn owl (Tyto alba) hears a sound of interest, it turns its head rapidly to train its gaze on the source. If it decides that the sound comes from prey, such as a vole or mouse, it plunges headlong into the darkness all the while keeping its head centered on the sound source. Just before impact, it retracts its head, replacing it with its talons, and lands atop the prey (Konishi, 1973a).
That this behavior could be elicited in complete darkness, coupled with anecdotal reports that barn owls dive into deep snow to extract prey, suggested that vision is not necessary (Payne, 1971). Payne and Drury (Payne and Drury, 1958) demonstrated the crucial role of hearing by having a mouse tow a crumpled wad of paper across the floor of a darkened room containing a barn owl. The owl struck the paper, thus ruling out vision and olfaction. In a more controlled setting, Konishi (Konishi, 1973b) confirmed the role of audition and measured the accuracy of these strikes with an owl named ‘Roger’ (after Payne), trained to strike at loudspeakers.
It has been nearly 40 years since Konishi (Konishi, 1973a) published the paper with the same title as this Commentary. In the meantime, the owl has become an established model of sound localization, and the study of this animal has contributed not just to the understanding of sound localization, but also to an understanding of neuroplasticity and neural computation. Below, I will attempt to review some of the more recent contributions.
Acoustical cues for the location of a sound source
In the visual system, objects in the environment cast an image on the retina, much as a camera lens casts an image on film. In the auditory system, there is no such spatially organized ‘image’ on the peripheral receptor, i.e. the array of hair cells along the basilar membrane. Space in the auditory system is, instead, computed primarily from the interaural differences in the arrival times (interaural time difference or ITD) and sound-pressure levels (interaural level difference or ILD) of the acoustical waveforms.
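As a concrete illustration of the two cues, the short sketch below (a toy calculation in Python with made-up numbers, not measured owl values) imposes an ITD and an ILD on a noise and then recovers the ITD from the peak of the interaural cross-correlation and the ILD from the ratio of root-mean-square levels in the two ears.

```python
import numpy as np

fs = 100_000                       # sampling rate (Hz); chosen arbitrarily for illustration
rng = np.random.default_rng(0)
source = rng.standard_normal(5000)              # 50 ms of broadband noise from one source

true_itd = 50e-6                   # assume the sound reaches the left ear 50 us earlier
true_ild_db = 6.0                  # and is 6 dB more intense in the left ear

d = int(round(true_itd * fs))      # ITD in samples
left = np.concatenate([source, np.zeros(d)])
right = np.concatenate([np.zeros(d), source]) * 10 ** (-true_ild_db / 20)

# ITD estimate: the lag at which the right-ear signal best matches the left-ear signal
lags = np.arange(-len(left) + 1, len(left))
xcorr = np.correlate(right, left, mode="full")
itd_est = lags[np.argmax(xcorr)] / fs

# ILD estimate: difference of root-mean-square levels, in decibels
ild_est = 20 * np.log10(np.sqrt(np.mean(left ** 2)) / np.sqrt(np.mean(right ** 2)))

print(f"estimated ITD = {itd_est * 1e6:.0f} us, estimated ILD = {ild_est:.1f} dB")
```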
Unlike human ears, which are symmetrically located on either side of the head, the ears of the barn owl are asymmetrical in position and shape. As a result, ILD provides information on the vertical position of the sound source (the elevation), while ITD provides information on the horizontal position (the azimuth), as it does in humans. The correspondence between ITD and azimuth and between ILD and elevation, as well as their dependence on frequency, can be seen in Fig. 1 (Keller et al., 1998). Each diamond represents the space in front of the owl (down is below the owl; left is to the owl's left), and the colors represent ITD and ILD for different frequency bands. In the plots of ITD, the color changes from left to right, indicating that ITD varies with the sound source's azimuth for the entire range of frequencies used for sound localization (3–10 kHz) (Konishi, 1973b). ITD remains relatively, but not completely, constant across frequencies (Keller et al., 1998). ILD also varies with the source's azimuth at low frequencies (e.g. 3.5 kHz), but, as frequency increases, the axis along which ILD changes becomes increasingly vertical, allowing for the representation of elevation. Thus, a location in space can be specified by a spectrum of ILDs and, to a first approximation, a single ITD value. The manner in which the ears and head alter the magnitude and phase of sounds in a location-specific manner is called the ‘head-related transfer function’ (HRTF).
Filtering a sound with the location-specific HRTFs of the two ears and presenting the filtered signals over headphones reproduces the signals at the eardrums from a real source at that location. The sound source is said to have been placed in ‘virtual auditory space’, and the perception of space thus achieved is highly realistic. By contrast, if one simply alters the magnitude or timing of the sound in one ear by the same amount across all frequencies, i.e. ignoring HRTFs, the experience is that of a sound ‘inside’ the head.
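The procedure can be summarized in a few lines of code (a minimal sketch; the ‘HRTFs’ here are hypothetical pure delays and gains, whereas real HRTFs are measured filters that also shape the magnitude spectrum).

```python
import numpy as np

fs = 100_000
rng = np.random.default_rng(1)
source = rng.standard_normal(2000)               # a broadband noise burst

def toy_hrir(delay_us, gain_db, length=64):
    """A stand-in head-related impulse response: a pure delay and gain.
    Real HRTFs, measured at the eardrums, also shape the magnitude spectrum."""
    h = np.zeros(length)
    h[int(round(delay_us * 1e-6 * fs))] = 10 ** (gain_db / 20)
    return h

# Hypothetical filters for one source location: the right-ear signal arrives
# 40 us later and 5 dB weaker than the left-ear signal.
hrir_left = toy_hrir(delay_us=0, gain_db=0)
hrir_right = toy_hrir(delay_us=40, gain_db=-5)

# 'Placing' the source in virtual auditory space: filter the waveform for each ear.
left_ear = np.convolve(source, hrir_left)
right_ear = np.convolve(source, hrir_right)
# left_ear and right_ear would then be presented over headphones.
```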
Behavioral evidence for the correspondence between cues and source position comes from studies in which an owl, trained to point its head at a sound source, had one of its ears plugged (Knudsen et al., 1979). Measurements showed that the signal in the plugged ear is not only attenuated but also delayed. Plugging the left ear, for instance, therefore causes the right ear's signal to seem louder and to lead, and as a result, the owl mislocalizes the sound, turning its head above and to the right of the loudspeaker (Knudsen et al., 1979). Other owls with asymmetrical ears, such as the Tengmalm's owl (Aegolius funereus), are presumed to use the binaural cues in a manner similar to the barn owl (Payne, 1971).
Humans use ITD and ILD to localize, respectively, the low- and high-frequency components of sounds along the azimuth (Strutt, 1907). This contrasts with the barn owl, which assigns the two cues to two different spatial axes. For humans to localize sounds in elevation, monaural cues, based on the shape of the magnitude spectrum in each ear, are thought to be important (reviewed by Blauert, 1997). Recent studies suggest that such ‘spectral-shape’ cues are not used for localization by owls. In a clever experiment, Egnor (Egnor, 2000) took the HRTFs of owls for certain test locations, inverted them, and applied them to the opposite ears. This ‘inverted-reversed’ filter preserves the ILD spectra for each test location but dramatically alters the monaural spectral shapes. Her data showed that the owls' ability to localize sounds filtered through the inverted-reversed HRTFs was as accurate and precise as their ability to localize sounds filtered through normal HRTFs.
In the visual, electrosensory and somatosensory systems, stimulus space (e.g. skin surface) is represented in the central nervous system by preserving the neighbor-relationships of axons in the primary projection. In the auditory system, space must be computed. Yet, the external nucleus of the barn owl's inferior colliculus (ICx) contains such a spatiotopic representation of the frontal hemisphere (Knudsen and Konishi, 1978). This map is composed of neurons, called space-map or space-specific neurons, which have well-circumscribed spatial receptive fields (SRFs). Lesions of the barn owl's auditory space map lead to scotoma-like defects in sound localization, and microstimulation of the space-map evokes a rapid head turn to that area of space represented at the point of stimulation (du Lac and Knudsen, 1990; Masino and Knudsen, 1990; Masino and Knudsen, 1992; Masino and Knudsen, 1993; Wagner, 1993).
As shown in Fig. 2, the two cues for sound localization, ILD (blue) and ITD (red), are processed in anatomically separate neural pathways starting at the cochlear nuclei, the nucleus angularis (NA) and the nucleus magnocellularis (NM), respectively. The two pathways converge in the central nucleus of the inferior colliculus (ICc), one synaptic step before the space map in the ICx (Takahashi and Konishi, 1988a; Takahashi and Konishi, 1988b; Takahashi et al., 1984). Below, the processes by which binaural cues are transformed into a spatiotopic map of auditory space are described.
Neural computation of ITD
The computation of ITD begins with the preservation and encoding of the monaural phase angles of each spectral component by phase-locking neurons of the NM (Sullivan and Konishi, 1984). In the barn owl, phase locking extends to neurons with best frequencies as high as 9 kHz at which strong ILDs are generated. ITD and ILD can therefore operate over the same frequency range, allowing the owl to localize sounds in two dimensions. The NM projects bilaterally to the nucleus laminaris (NL), the avian analog of the mammalian medial superior olivary nucleus. In the NL, interaural differences in the phase angles of each spectral component are computed by a binaural cross-correlation-like mechanism very similar to that originally proposed by Jeffress in 1948 (Carr and Konishi, 1990; Jeffress, 1948; Yin and Chan, 1990). As a result, neurons in the NL are selective for interaural phase difference (IPD).
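The essence of the Jeffress-style computation can be captured in a toy model (illustrative parameters only; real NL neurons operate on phase-locked spikes rather than raw waveforms): the signal from one ear is passed through a range of internal delays, each ‘coincidence detector’ multiplies its delayed input with the signal from the other ear, and the detector whose internal delay compensates the acoustic ITD responds most strongly.

```python
import numpy as np

fs = 100_000
f = 5000.0                                   # a 5-kHz tone, within the owl's range
t = np.arange(0, 0.02, 1 / fs)

true_itd = 60e-6                             # acoustic ITD imposed on the stimulus
left = np.sin(2 * np.pi * f * t)
right = np.sin(2 * np.pi * f * (t - true_itd))

def delayed(x, delay_s):
    """Crude internal delay line: shift the signal by an integer number of samples."""
    return np.roll(x, int(round(delay_s * fs)))

# An array of coincidence detectors, each compensating a different internal delay.
# The range spans one period of the 5-kHz tone, so the maximum is unambiguous here;
# the across-frequency convergence described below resolves the ambiguity over
# wider delay ranges.
internal_delays = np.arange(-100e-6, 101e-6, 10e-6)

# Each detector multiplies its delayed left input with the right input and sums
# over time - effectively one point of the interaural cross-correlation function.
responses = [np.sum(delayed(left, d) * right) for d in internal_delays]

best = internal_delays[int(np.argmax(responses))]
print(f"best internal delay = {best * 1e6:.0f} us (stimulus ITD = {true_itd * 1e6:.0f} us)")
```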
The NL projects directly to the core of the contralateral ICc (ICc-core), which, in turn, projects to the lateral shell of the opposite ICc (the ICc-ls) (Takahashi et al., 1989). Within the ICc-ls and the ICc-core, neurons are organized into tonotopic columns and, within each column, all cells are tuned to the same ITD. Cells in a column of the ICc-ls project convergently onto a cluster of space-map neurons in the ICx, which endows them with selectivity for the ITD preserved by the column and a sensitivity to a broad range of frequencies (Takahashi and Konishi, 1986; Wagner et al., 1987). In other words, a neuron of the space map (or of a column in the ICc) responds maximally at a particular ITD, called the ‘characteristic delay’, regardless of the frequency of the stimulus tone.
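The benefit of the across-frequency convergence can be sketched numerically (a schematic model with illustrative frequencies, not recorded tuning curves): each narrow frequency channel produces an ITD curve that is periodic at its own period and therefore ambiguous on its own, but summing the channels leaves a single dominant peak at the characteristic delay that they share.

```python
import numpy as np

itd_axis = np.arange(-300e-6, 300.1e-6, 5e-6)    # candidate ITDs (s)
characteristic_delay = 60e-6                     # the ITD shared by all channels
freqs = [3000, 4000, 5000, 6000, 7000, 8000]     # channels converging on one ICx cell (Hz)

# Within a narrow frequency channel, the response is periodic in ITD at 1/f,
# so each channel's curve has peaks at the characteristic delay plus whole periods.
per_channel = np.array([
    0.5 * (1 + np.cos(2 * np.pi * f * (itd_axis - characteristic_delay)))
    for f in freqs
])

summed = per_channel.sum(axis=0)                 # across-frequency convergence in the ICx

print("each channel alone is ambiguous; the summed curve peaks at "
      f"{itd_axis[np.argmax(summed)] * 1e6:.0f} us")
```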
In the rodent IC, neurons with low best frequencies (BFs) have best interaural phase differences that are shifted away from zero by roughly 0.125 cycle. This shift in IPD puts the peaks of the ITD tuning curves of a large proportion of neurons outside of the naturally occurring range of ITDs for gerbils and guinea pigs (Grothe, 2003; McAlpine and Grothe, 2003). By contrast, in the owl, the peaks fall within the owl's range of naturally occurring ITDs [see fig. 1 of McAlpine and Grothe (McAlpine and Grothe, 2003)]. The rodents' distribution of ITD tuning, which seems perplexing in the context of a spatial map, makes sense if one considers that the IPD shift places the steepest slope of the neurons' ITD functions across the midline (Brand et al., 2002), thus optimizing discrimination of ITD. Indeed, in another rodent, the rat, spatial discrimination is finest at the midline (Kavanagh and Kelly, 1986).
It has been proposed that owls on the one hand, and mammals on the other, use, respectively, place codes and slope codes (Grothe, 2003; McAlpine and Grothe, 2003). Although this dichotomy frames the discussion effectively, the details are more complex. First, slope codes can be, and are, implemented by the owl. A single source drives a neuron in the owl's space map to the extent that the source falls within its SRF. A focus of neural activity, or ‘neural image’, develops across the map, as illustrated in Fig. 3A, which shows the SRFs (blue Gaussians) of neurons (colored circles) tuned for loci along the azimuth. Neurons at the edge of the image (e.g. those at –20 deg and 0 deg) have the source on the slopes of their SRFs. If the source moves (Fig. 3B), the firing of the cells at the edge of the neural image will change maximally, as would be the case in the mammalian IC under the slope-code hypothesis. The smallest change of sound-source position that an owl can resolve can be explained by the slopes and variances of the SRFs of the space-map cells (Bala et al., 2003; Bala et al., 2007).
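The slope argument can be made with a small numerical sketch (Gaussian SRFs of arbitrary 20-deg width, not fitted to owl data): when the source moves by a few degrees, the firing of the neurons whose SRF slopes straddle the source, i.e. those at the edge of the neural image, changes the most.

```python
import numpy as np

preferred_azimuths = np.arange(-60, 61, 10)          # map neurons' best azimuths (deg)
srf_width = 20.0                                      # assumed Gaussian SRF width (deg)

def population_response(source_az):
    """Firing of each map neuron to a single source (Gaussian SRF, arbitrary units)."""
    return np.exp(-0.5 * ((preferred_azimuths - source_az) / srf_width) ** 2)

r1 = population_response(-10.0)        # source at -10 deg
r2 = population_response(-5.0)         # source moved 5 deg to the right

change = np.abs(r2 - r1)
most_informative = preferred_azimuths[np.argmax(change)]
print(f"firing changes most for the neuron tuned to {most_informative} deg, "
      "which has the source on the slope of its SRF")
```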
Second, in the IC of the guinea pig, as well as in those of the cat and rabbit, there is an abundance of cells whose ITD curves peak within the naturally occurring range (reviewed by Joris and Yin, 2007). When IC neurons are studied using stimuli in virtual auditory space, more than half of the cells in the guinea pig's IC have SRFs in frontal space (Sterbing et al., 2003). These cells are not organized as a map, but they are clustered, such that cells representing similar spatial loci are closer together. A sound source at a given location will therefore excite a particular cluster of IC cells, and it is, thus, the identity of the cluster that signals location. This is a place code. What is more, a topographic map of azimuth has been described in the superior colliculus of the guinea pig and in other mammals such as rats, cats, ferrets and monkeys (Gaese and Johnen, 2000; Hirsch et al., 1985; King and Hutchings, 1987; King and Palmer, 1983; Sterbing et al., 2002; Wallace et al., 1996), and one cannot ignore its role in spatial hearing. Thus, differences between owls and mammals might be more quantitative than qualitative and might be related to the degree of acuity afforded by each organism's collicular neurons.
Neural computation of ILD
Our understanding of the processing of ILD in the barn owl is less complete than our understanding of ITD processing. As indicated above, the spatial axis along which ILD varies is horizontal at low frequencies and becomes progressively more vertical at higher frequencies. As a result, for the high frequency neurons in the ICc, selectivity for ILD generally determines the position of the SRF along the vertical axis.
The sound level in the ipsilateral ear is encoded by cells in the NA (Sullivan and Konishi, 1984), which project contralaterally to the posterior subdivision of the dorsal nucleus of the lateral lemniscus [LLDp after Arends and Zeigler (Arends and Zeigler, 1986); formerly, the nucleus ventralis lemnisci lateralis pars posterior (VLVp) of Leibler (Leibler, 1976)]. The LLDp of the two sides are interconnected by a commissural projection that mediates a mutual inhibition. The neurons of the LLDp are therefore excited (E) by stimulation of the contralateral ear, via direct input from the NA, and are inhibited (I) by stimulation of the ipsilateral ear, via the commissural input (Fig. 2) (Takahashi et al., 1995; Takahashi and Keller, 1992). LLDp neurons are thus ‘EI’ cells, the discharge rates of which are sensitive sigmoidal functions of ILD (Goldberg and Brown, 1969).
The sigmoidal rate-ILD functions characteristic of the EI cells in the LLDp are transformed into peaked tuning curves in the ICc-ls by combining the ILD-response functions of the left and right sides (Adolphs, 1993). This transformation requires a bilateral projection from the LLDp to the ICc-ls, but the existence of an ipsilateral projection is controversial. The axon-tracing study of Adolphs (Adolphs, 1993) has demonstrated bilateral projections from the LLDp to the ICc, but Takahashi and Keller (Takahashi and Keller, 1992; Takahashi et al., 1995), who used a different tracer, reported only a contralateral projection.
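The proposed transformation from sigmoidal to peaked ILD tuning can be sketched as follows (assumed sigmoid parameters and a simple multiplicative combination, one of several plausible ways in which the two inputs could be combined): the left and right LLDp supply mirror-image rate-ILD functions, and a cell combining the two acquires a peaked ILD tuning curve.

```python
import numpy as np

ild = np.linspace(-30, 30, 121)          # ILD axis (dB, right-ear level minus left-ear level)

def ei_sigmoid(x, midpoint=-5.0, slope=0.4):
    """Rate-ILD function of an EI cell in the LLDp (arbitrary units)."""
    return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

# Mirror-image inputs from the LLDp of the two sides:
from_left_lldp = ei_sigmoid(ild)         # excited by the contralateral (right) ear
from_right_lldp = ei_sigmoid(-ild)       # excited by the contralateral (left) ear

# A multiplication-like combination of the two sigmoids yields a peaked tuning curve.
icc_ls_response = from_left_lldp * from_right_lldp

best_ild = ild[np.argmax(icc_ls_response)]
print(f"peak of the combined ILD tuning curve: {best_ild:.1f} dB")
```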
Finally, it is in the ICc-ls that ILD and ITD cues are merged (Mazer, 1995). A clear-cut topographic representation of ILD, however, has never been found in the lateral shell (Mazer, 1995).
Combining ITD, ILD, and frequency
The neurons of the ICc-ls are sensitive to both binaural cues, within each frequency channel. The final step in the formation of neurons that are selective for space is the combining of ITD and ILD information across frequency.
The application of virtual auditory space techniques has recently made it possible to assess the contribution of ILD to the SRF of the space-specific neuron. For any given location, ILD varies as a function of frequency in a complex manner. The HRTFs were altered so that the ILD spectrum of each location was preserved, but the ITD was held constant at a cell's preferred value, thus generating the SRF the cell would have had were it sensitive only to ILD (Euston and Takahashi, 2002). These altered HRTFs were used to filter broadband noises that served as the stimulus. These ILD-alone receptive fields (RFs) were generally horizontal swaths of activity at the elevation of the normal SRF of the cell (Fig. 4). An ITD-alone RF formed a vertical swath at the azimuth of the normal SRF of the cell. The normal SRF thus lies at the intersection of the ITD- and ILD-alone SRFs (Euston and Takahashi, 2002). At this intersection the optimal ITD and ILD spectra of the cell are present and are combined by a multiplication-like process (Peña and Konishi, 2001).
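A toy grid illustrates the intersection (Gaussian swaths with assumed widths, not measured receptive fields): the ILD-alone field is a horizontal band, the ITD-alone field is a vertical band, and their element-wise, multiplication-like combination is a circumscribed receptive field at the crossing point.

```python
import numpy as np

azimuth = np.linspace(-60, 60, 121)        # deg
elevation = np.linspace(-60, 60, 121)      # deg
az, el = np.meshgrid(azimuth, elevation)

best_az, best_el = 20.0, -10.0             # assumed preferred location of the cell
width = 15.0                               # assumed tuning width (deg)

itd_alone = np.exp(-0.5 * ((az - best_az) / width) ** 2)   # vertical swath (azimuth only)
ild_alone = np.exp(-0.5 * ((el - best_el) / width) ** 2)   # horizontal swath (elevation only)

srf = itd_alone * ild_alone                # multiplication-like combination

peak = np.unravel_index(np.argmax(srf), srf.shape)
print(f"SRF peak at azimuth {az[peak]:.0f} deg, elevation {el[peak]:.0f} deg")
```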
The shape of the ILD-alone RF is ultimately determined by the frequency bands that innervate the space-specific neuron and by the tuning of the cell for ILD at each of these frequencies (Arthur, 2004; Delgutte et al., 1999; Fuzessery and Pollak, 1985; Gold and Knudsen, 2000a). One simple hypothesis would be that a space-specific neuron is broadly tuned to frequency, and for each frequency band it would be sharply tuned to the ILD value that occurs at its SRF, as it was for ITD (Brainard et al., 1992). We therefore determined whether ILD tuning, measured with tones of different frequencies, could predict the shape of the ILD-alone RF or, conversely, whether the ILD-alone RF measured with noise could predict the frequency-specific ILD tuning (Euston and Takahashi, 2002; Spezio and Takahashi, 2003). Both of these approaches revealed, contrary to our initial hypothesis, that space-map neurons are not necessarily tuned sharply to ILD over a wide, continuous frequency range. Instead, their frequency tuning seems to be patchy, and, at some frequencies, the ILD tuning functions retained the sigmoidal shape seen at lower stages, specifically in the LLDp (Euston and Takahashi, 2002; Spezio and Takahashi, 2003). Whether this seemingly pattern-less tuning has advantages over that hypothesized earlier is not known.
Transformation from binaural cues to space
Once the binaural cues are computed, these cues must be translated into space. In a baby bird, which has a small head, the maximal ITD, generated by a source to the extreme right or left, might be some 90 μs. A neuron tuned to an ITD of, say, 30 μs represents a location about one-third of the distance to the extreme periphery. The adult, by contrast, has a maximal ITD of about 200 μs (Keller et al., 1998), so the same neuron would represent a smaller angle. This process of translating cues to space occurs during a critical period, of up to ca. 200 days, during which visual input is crucial (Brainard and Knudsen, 1998). Birds that are blind from birth form inverted and otherwise distorted maps of auditory space (Knudsen, 1988). Moreover, if a baby bird matures with a pair of prisms that laterally shift the visual scene, the bird, upon maturing, mislocalizes a sound by the amount that the prisms shifted the visual world (Brainard and Knudsen, 1993; Knudsen and Knudsen, 1989). Correspondingly, a space-specific neuron will shift its auditory SRF by an amount equal to the prism shift. The plasticity is thought to occur in the ICx and involves the inputs from the optic tectum, which contributes visual input, and in the ICc, which contributes the spatially selective auditory input (Brainard and Knudsen, 1993; Feldman and Knudsen, 1997; Gold and Knudsen, 2000b; Hyde and Knudsen, 2000; Zheng and Knudsen, 1999). The process of audiovisual calibration in the owl is a clear example of supervised learning in the auditory system and has been thoroughly investigated and reviewed (Knudsen, 2002).
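Returning to the head-size example at the start of this section, the dependence of the cue-to-space mapping on the maximal ITD can be put in numbers (a back-of-the-envelope sketch using a simple sine-law approximation of ITD, not the owl's measured cues): the same ITD corresponds to a much larger azimuth for a small head than for the adult head.

```python
import numpy as np

def azimuth_from_itd(itd_s, max_itd_s):
    """Azimuth implied by an ITD, assuming ITD ~ max_itd * sin(azimuth)."""
    return np.degrees(np.arcsin(np.clip(itd_s / max_itd_s, -1, 1)))

itd = 30e-6                              # the ITD to which a given neuron is tuned
for label, max_itd in [("juvenile (assumed 90 us max)", 90e-6),
                       ("adult (about 200 us max)", 200e-6)]:
    print(f"{label}: 30 us ITD -> about {azimuth_from_itd(itd, max_itd):.0f} deg azimuth")
```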
Multiple sound sources
In nature, there are typically several concomitantly active sources of sounds, as well as the reflections of these sounds, or echoes, from surfaces in the environment. Under such conditions, the sound waves from each source and from the reflective surfaces will add in the ears, and if the sounds have broad, overlapping spectra, the binaural cues will fluctuate over time in a complex manner (Blauert, 1997; Keller and Takahashi, 2005; Takahashi and Keller, 1994). Not only might such dynamic cues make it difficult to localize sounds, they might compromise the ability of the auditory system to signal the temporal characteristics of the sounds that are intrinsic to each source. Thus, the tasks of identifying a sound and localizing its source are intertwined. Yet, despite this complexity, people with normal hearing are able to determine ‘what came from where’.
The auditory space-map of the owl allows us to analyze the representation of the natural auditory environment in which there are multiple, concomitantly emitting sources, as well as reflections. When two sounds with overlapping spectra are presented simultaneously from two sources in the free field, the frequency-specific binaural cues at a given instant become vector sums of the cues corresponding to the location of each source. The resultant vector, moreover, reflects the amplitude and phase of each source's signal. If two sounds are identical in all respects, the vector summation results in a single auditory image located midway between the two sources. If the signals from the two sources are uncorrelated broadband noises, then, within any given frequency band at any given instant, the amplitude is likely to be higher for one source than the other. The ITD and ILD for that frequency band will approximate the values of the higher-amplitude source. In other bands, the situation would be reversed. With two uncorrelated noises, over the course of the stimulus, the cues will spend roughly equal amounts of time assuming the values corresponding to the loci of the two sources. There would thus be two separate foci of activity on the space map. However, because the binaural cues do not remain stable at any location for very long, the foci on the space map should generally be weaker and more diffuse than a focus evoked by a single source.
We can confirm that this is the case by recording from the owl's auditory space map, the neurons of which are tuned to the frequency-specific ITDs and ILDs at their SRFs (Fig. 5). When two sources emitting uncorrelated noises, diagonally separated by about 40 deg (in virtual auditory space), are passed through the SRF of a cell, the neuron fires when each source falls in its SRF but not when the two sources flank the SRF. A plot of firing rate against the sound pair's mean position generates two foci (Fig. 5B; see figure legend), indicating that the space-map neuron resolves the two sources (Keller and Takahashi, 2005; Takahashi and Keller, 1994).
If two sources, Source A and Source B, emit uncorrelated noises that are sinusoidally amplitude-modulated (SAM) at, for example, 55 and 75 Hz, respectively, the binaural cues will reach Source A's ITD(f) and ILD(f) values 55 times per second, and Source B's ITD(f) and ILD(f) values, 75 times per second (Fig. 5C,D). Thus, in the space map, the neurons that represent Source A and Source B will fire in synchrony at, respectively, 55 and 75 Hz. Information about the amplitude-modulation (AM) rates and the loci is thus preserved. Note that with two sources, the AM noises drive the neurons at the appropriate rates, indirectly, by modulating the binaural cues. These acoustical principles have been suggested to play a role in the segregation of speech from the background (Faller and Merimaa, 2004; Roman et al., 2003).
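A simplified simulation of the dominance principle underlying these observations (toy signals without HRTF filtering; the ITDs and modulation rates are illustrative): two uncorrelated, amplitude-modulated noises are assigned the ITDs of two hypothetical source locations, and in each short analysis window the composite cue is taken to be that of the momentarily more intense source. Over the whole stimulus the cue visits each source's value roughly half the time, and the alternation follows the sources' envelopes, which is what drives the corresponding space-map neurons at the sources' modulation rates.

```python
import numpy as np

fs = 20_000
t = np.arange(0, 1.0, 1 / fs)               # 1 s of signal
rng = np.random.default_rng(2)

# Two uncorrelated noises, sinusoidally amplitude-modulated at different rates
env_a = 1 + np.sin(2 * np.pi * 55 * t)       # Source A, 55-Hz SAM
env_b = 1 + np.sin(2 * np.pi * 75 * t)       # Source B, 75-Hz SAM
sig_a = env_a * rng.standard_normal(t.size)
sig_b = env_b * rng.standard_normal(t.size)

itd_a, itd_b = -80e-6, +80e-6                # assumed ITDs of the two source locations

# In each short analysis window, the composite binaural cue approximates the
# cue of whichever source is momentarily more intense in that band.
win = int(0.002 * fs)                        # 2-ms windows
n_win = t.size // win
dominant_itd = np.empty(n_win)
for i in range(n_win):
    sl = slice(i * win, (i + 1) * win)
    pa, pb = np.mean(sig_a[sl] ** 2), np.mean(sig_b[sl] ** 2)
    dominant_itd[i] = itd_a if pa > pb else itd_b

frac_a = np.mean(dominant_itd == itd_a)
print(f"cues near Source A's ITD in {frac_a:.0%} of windows, "
      f"near Source B's in {1 - frac_a:.0%}")
```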
Echoes
Acoustical reflections constitute a special case of multiple sound sources in that the reflection and the sound arriving directly from the active source are related in structure. Early psychoacoustical studies demonstrated that directional information conveyed by the direct (or leading) sound dominates perception and that listeners are relatively insensitive to directional information conveyed by reflections (or lagging sources) (Wallach et al., 1949). The perceptual dominance of the leading sound and a host of related perceptions are collectively referred to as the ‘precedence effect’, and this has been reviewed extensively (Blauert, 1997; Litovsky et al., 1999). We concentrate below on those aspects that are directly relevant to the owl.
A subject's perceptual experiences under experimental precedence conditions depend upon the lead/lag delay. When the delay between leading and lagging sources is in the 1–5 ms range in humans, a single, fused auditory image is localized close to the leading source. This is termed ‘localization dominance’ or ‘the law of the first wavefront’ (Litovsky et al., 1999; Wallach et al., 1949). At the same time, spatial information regarding the lagging source is degraded, a phenomenon termed ‘lag discrimination suppression’ (Spitzer et al., 2003; Zurek, 1980). As the delay increases to about 10 ms, the lagging sound becomes more perceptible and localizable, and the ‘echo threshold’ is said to have been crossed. Psychoacoustical evidence for the precedence effect has been reported across taxa [for a detailed listing see Dent and Dooling (Dent and Dooling, 2004)].
The precedence effect is most often studied using clicks, which help to avoid the acoustical superposition of the leading and lagging sounds. Physiological studies, typically in the IC, show that the response of neurons to a lagging click is weak when the delay is short, but that this response increases when the delay is long (Fitzpatrick et al., 1995; Keller and Takahashi, 1996; Tollin et al., 2004; Tollin and Yin, 2003; Yin, 1994). Echo threshold is therefore thought to be related to the inter-click interval at which the strength of the response to the lagging click approaches that of the leading click. Such observations have led to the view that neurons responding to the leading sound inhibit and preempt the responses of neurons to the echo (Akeroyd and Bernstein, 2001; Fitzpatrick et al., 1995; Keller and Takahashi, 1996; Tollin et al., 2004; Tollin and Yin, 2003; Yin, 1994; Zurek, 1980). The lagging sound is thought to become localizable when the delay is long enough to allow the lateral-inhibition-like process, or ‘echo suppression’, to subside before the arrival of the lagging click.
Although clicks afford advantages for experimentation, sounds in nature often overlap temporally with their reflections, which arrive after short delays. As a result, there is a period of time when both leading and lagging sounds are present, called the ‘superposed segment’, flanked by periods of time when the leading or lagging sound is present alone (‘lead-alone’ and ‘lag-alone’ segments, Fig. 6A). What determines the echo threshold in this case? If the results obtained with clicks are applied directly to longer sounds, one would expect the echo threshold to depend upon the delay between the onsets of the leading and lagging sounds, which is equivalent to the length of the lead-alone segment. In other words, as the lead-alone segment is lengthened beyond roughly 10 ms, echo suppression should have subsided by the time the lagging stimulus arrives, and the lagging sound should be localized more frequently. Note, however, that when the leading and lagging sounds have the same duration, as they normally do in nature, the lead-alone and lag-alone segments are equally long. Therefore, an alternative possibility is that the lagging sound becomes perceptible when the lag-alone segment is long enough.
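The geometry of the segments is simple to state (a sketch assuming, as in nature and in the experiments described below, that lead and lag have equal durations): the lead-alone segment equals the onset delay, the superposed segment equals the common duration minus the delay, and the lag-alone segment again equals the delay.

```python
def segment_durations(lead_duration_ms, lag_duration_ms, delay_ms):
    """Durations of the lead-alone, superposed, and lag-alone segments (ms)."""
    lead_alone = min(delay_ms, lead_duration_ms)
    superposed = max(0.0, min(lead_duration_ms, delay_ms + lag_duration_ms) - delay_ms)
    lag_alone = max(0.0, delay_ms + lag_duration_ms - max(lead_duration_ms, delay_ms))
    return lead_alone, superposed, lag_alone

# Equal 30-ms noise bursts (as in Nelson and Takahashi, 2008) with a 6-ms delay:
print(segment_durations(30, 30, 6))    # -> (6, 24, 6): lead-alone and lag-alone are equal
```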
To discriminate between the alternative hypotheses, Nelson and Takahashi (Nelson and Takahashi, 2008) varied the length of one segment while holding the other constant, and examined the owl's head turns to the leading and lagging stimuli. Stimuli consisted of identical copies of a broadband noise burst (30 ms duration), one of which was delayed to serve as a simulated echo arriving from a reflective source at a different location.
The results supported the alternative hypothesis. First, the duration of the lag-alone segment was held constant at a short value of 3 or 6 ms, and the lead-alone segment was extended from 3 to 24 ms. If the lag were localized whenever it arrives after the cessation of echo suppression – about 10 ms after the lead's onset – then the owls should have localized the lagging sound more frequently when the lead-alone segment was extended beyond 10 ms. Instead, the owls continued to localize the leading sound even when echo suppression should have subsided. Second, when the lead-alone segment was held constant and the lag-alone segment was varied from 3 to 24 ms, the obverse was found: the owls' tendency to localize the lag scaled with the duration of the lag-alone segment. These results suggest that the delay at the sound-pair's onset does not affect the localization of the reflection as one might infer from extrapolating the results from clicks to longer sounds. The idea that echo threshold is determined by the delay at the onset, per se, might thus be true only to the extent that this delay determines the length of the lag-alone segment when lead and lag are equally long.
In the natural condition, where the lead-alone and lag-alone segments are equal in length and the delay is short (<12 ms), the bird localizes the leading sound preferentially (Nelson and Takahashi, 2008; Spitzer and Takahashi, 2006). In other words, when the two segments are equal in length, the lead-alone segment seems to have a higher perceptual salience than the lag-alone segment. A possible physiological reason is illustrated in Fig. 6B, which shows the post-stimulus time histogram of responses evoked in a population of space-map cells by a single sound emitted from the cells' SRFs (‘target alone’). Like cells encountered throughout the central auditory system, the cells' discharges are highest at the onset and decline to a lower, steady-state level. The response of the population to a lead/lag pair when the target (sound in the SRF) led or lagged is shown in Fig. 6C,D. Cells representing the location of the leading stimulus respond vigorously at the start of the sound pair, as they would to the onset of a single sound, i.e. during the lead-alone segment (between blue arrowheads). When the lagging sound arrives, the leading sound is already ‘in progress’ and the firing rate drops dramatically owing to spatial decorrelation (Fig. 6D; see ‘multiple sound sources’). When the leading sound terminates, only the lagging sound remains, and the cells tuned to the lagging sound's location discharge, but at a rate similar to the lower, steady-state level (between red arrowheads). The weaker response of the cells at the lag location can therefore be explained by the response dynamics of space-map neurons to single sources, and specialized circuitry devoted to the suppression of echoes need not be invoked, at least, for the stimuli used in this study (Nelson and Takahashi, 2008).
Finally, Fig. 6E plots the recovery of the population response to the lagging stimulus against the duration of the lag-alone segment (red line). The proportion of saccades to the lagging stimuli is superimposed (green line), and it is clear that the forms of the two functions are similar, which is consistent with the idea that the neural response to the lag-alone segment determines the salience of the lagging sound.
Audiovisual integration
Although barn owls are known for their feats of sound localization, their vision is also keen. What is localization like when both auditory and visual information are available? Studies of mammalian multisensory integration have suggested that the combination of visual and auditory information increases performance beyond that afforded by one modality by itself, especially at low light and sound amplitudes (Stein and Meredith, 1993). This idea was examined in the barn owl by Whitchurch and Takahashi (Whitchurch and Takahashi, 2006).
One of the simplest models for audiovisual integration is the ‘stochastic facilitation’ model of Raab (Raab, 1962). According to this hypothesis, also called the ‘race model’, auditory and visual information ‘race’ along their pathways to reach a common neural station that triggers a reaction, such as the redirection of the head or eyes (i.e. a saccade). This trigger zone responds to the information that arrives earlier. If, on the whole, auditory and visual information have an equal chance of arriving earlier, the overall reaction time would be slightly faster than that evoked by either modality alone. This faster response occurs stochastically and requires no additional neural processing.
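The race model's prediction can be illustrated with a small Monte Carlo sketch (hypothetical latency distributions, chosen only to make auditory responses faster on average, as they are in the owl; these are not measured values): on each bisensory trial the saccade is triggered by whichever modality arrives first, so the mean bisensory latency is only slightly shorter than the faster unisensory latency, with no interaction between the modalities.

```python
import numpy as np

rng = np.random.default_rng(3)
n_trials = 100_000

# Hypothetical unisensory saccade-latency distributions (ms); not owl data
auditory = rng.normal(loc=80, scale=15, size=n_trials)
visual = rng.normal(loc=110, scale=20, size=n_trials)

# Race model: on each bisensory trial, whichever signal arrives first triggers the saccade
bisensory = np.minimum(auditory, visual)

print(f"mean latency: auditory {auditory.mean():.0f} ms, visual {visual.mean():.0f} ms, "
      f"race-model bisensory {bisensory.mean():.0f} ms")
```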
By using auditory and visual stimuli having a wide range of intensities, Whitchurch and Takahashi (Whitchurch and Takahashi, 2006) examined the effects of stimulus strength on saccade latency and saccade error. The statistics of saccades to unisensory targets were used to predict responses to the bisensory targets, and the predictions were compared to empirical observations.
Acoustically guided saccades were found to have shorter saccade latencies than visually guided saccades, but the visually guided saccades were more precise. Interestingly, audiovisual saccades occurred no earlier than did auditory saccades, nor were they more accurate than visual saccades, which is at odds with the idea that the combination of two modalities yields better performance than does either modality alone. The stochastic facilitation model is based on the competition between two separate streams of sensory information. Does the modality that ‘wins’ the race determine all characteristics of the ensuing saccade or does the presence of the stream that ‘lost’ have an influence? In other words, do audiovisual saccades with auditory-like saccade latencies also have auditory-like precision?
Interestingly, the answer was no. Audiovisually guided saccades in the owl had, instead, the best features of both acoustically and visually guided saccades. They were both fast and accurate. These results support a model of sensory integration in which the faster modality initiates the saccade and the slower modality remains available to refine the saccade trajectory (Corneil et al., 2002).
Conclusions
In the roughly 40 years since Mark Konishi published his seminal review, the barn owl has not only become an established model for the neural mechanisms of sound localization, but has also contributed to fields beyond spatial hearing. Detractors have questioned the relevance of discoveries made in a specialized bird to human hearing. This criticism misses the point. The specializations of the owl, such as its ability to phase lock at high frequencies or the formation of the space map, make it easier to study the fundamental processes, such as the coding of ITD or the integration of information across frequencies. Having understood these principles in a species specialized for spatial hearing, one can then determine the modifications that must be made in the theories and models gleaned from the specialist so that they work in more generalized animals. This is the power of the comparative approach.
Glossary
- EI cells
EI cells are excited by sounds in the contralateral ear and inhibited by sounds in the ipsilateral ear. This pattern of innervation renders the cells especially sensitive to differences in the magnitudes of the sounds in the two ears (ILD).
- HRTF (head-related transfer function)
A mathematical description of the way in which the magnitude and phase angles of spectral components in a sound are altered by the head and ears in a direction-specific manner.
- IC (inferior colliculus)
The IC is a major way-station in the central auditory system. Nearly all information must pass through the IC to reach higher auditory areas, such as the medial geniculate nucleus (nucleus ovoidalis in birds).
- ICc (central nucleus of the inferior colliculus)
A subdivision of the inferior colliculus lying medial to the ICx, where the space map is found. It consists of at least two further subdivisions, the core and the lateral shell.
- ICc-core (core of the central nucleus of the inferior colliculus)
The ICc-core, a subdivision of the ICc, consists of a tonotopic array of neurons that are sensitive to ITD. It receives a direct projection from the NL and projects directly to the contralateral ICc-ls.
- ICc-ls (lateral shell of the central nucleus of the inferior colliculus)
The ICc-ls, a subdivision of the ICc, consists of a tonotopic array of neurons that are selective for both ITD and ILD. It receives a direct projection from the contralateral ICc-core and projects directly to the ipsilateral ICx.
- ICx (external nucleus of the inferior colliculus)
A subdivision of the IC found along the lateral margin of the IC just deep to the tectal ventricle in birds. In the barn owl, the neurons in the ICx are sharply tuned for the location of sound sources and are organized to form a topographic representation of auditory space.
- ILD (interaural level difference)
The difference in the magnitude of the sound in the left and right ears. In owls, ILD is a cue for the location of a sound source along the horizon at low frequencies. At high frequencies, it is a cue for the elevation of a sound source. In the literature, ILD is sometimes referred to as the interaural intensity difference (IID).
- ITD (interaural time difference)
The difference in the time-of-arrival of sounds in the left and right ears. In owls, ITD is the acoustic cue for the location of the sound source along the horizon.
- Lag discrimination suppression
A perceptual phenomenon encountered when a sound and its echo are presented. When subjects experience lag-discrimination suppression, their ability to discriminate the location of the echo's source is degraded.
- LLDp (posterior subdivision of the dorsal nucleus of the lateral lemniscus)
One of the auditory nuclei found in the rostral pontine tegmentum with its somata interspersed amongst the fibers of the lateral lemniscus. It is thought to compute ILD. The nucleus LLDp was formerly called the VLVp.
- Localization dominance
A perceptual phenomenon encountered when a sound and its echo are presented. When subjects experience localization dominance, they hear only one sound and localize it at the location of the direct sound.
- NA (nucleus angularis)
One of the cochlear nuclei. Found in the caudal pons, its cells are sensitive to the amplitude of sounds.
- NL (nucleus laminaris)
A nucleus of the caudal pons located just medial to the nucleus magnocellularis (NM). Innervated by the NM of both sides, it is thought to be involved in the computation of ITD.
- NM (nucleus magnocellularis)
One of the cochlear nuclei. Found in the caudal pons, its cells synchronize their firing to the fine structure of sounds.
- Place code
A place code describes the way in which an object in space (e.g. a loudspeaker) is represented in the brain as the activity of a cluster of neurons. Importantly, the identity of the neuron signifies the location of the object.
- Saccade
A rapid turn of the eyes or head towards an object. Owls, whose eyes are nearly immobile, make only head saccades.
- Slope code
Slope codes refer to the way in which a change in a stimulus' location is signified by a change in the firing rates of neurons.
- SRF (spatial receptive field)
The locations in space from which a cell can be excited.
- Virtual auditory space
Sounds can be filtered with HRTFs and presented over headphones. The filtering imparts the frequency-specific ITDs and ILDs that are characteristic of a sound source at a particular location, giving a highly realistic auditory-spatial experience. Sounds delivered in this way are said to be presented in virtual auditory space.
This work was supported by the NIH (RO1 DC003925). Deposited in PMC for release after 12 months.