The role of sound in Drosophila melanogaster courtship, along with its perception via the antennae, is well established, as is the ability of this fly to learn in classical conditioning protocols. Here, we demonstrate that a neutral acoustic stimulus paired with a sucrose reward can be used to condition the proboscis-extension reflex, part of normal feeding behavior. This appetitive conditioning produces results comparable to those obtained with chemical stimuli in aversive conditioning protocols. We applied a logistic model with generalized estimating equations to predict the dynamics of learning, which successfully predicts the outcome of training and provides a quantitative estimate of the rate of learning. Use of acoustic stimuli with appetitive conditioning provides both an alternative to models most commonly used in studies of learning and memory in Drosophila and a means of testing hearing in both sexes, independently of courtship responsiveness.
Over half a century of research has established the role of sound in Drosophila courtship. A male extends one wing and vibrates it at a nearby female, producing a stereotyped pattern of pulse and sine song. Throughout the genus Drosophila, these wingbeat songs are species specific in their pattern and harmonic content (Hoikkala and Lumme, 1987; Hoy et al., 1988); D. melanogaster is no exception (Bennet-Clark and Ewing, 1967; Bennet-Clark and Ewing, 1969). Females detect this acoustic signal with their antennae (consisting of the arista and Johnston's organ), as has been shown by recordings of sound-evoked field potentials in Johnston's organ and from the antennal nerve (Ewing, 1978; Eberl et al., 2000; Tauber and Eberl, 2003).
Research into learning in Drosophila has been going on for about as long as study of courtship. In this time, most work has used aversive stimuli to test learning of chemical cues (e.g. Quinn et al., 1974; Dudai et al., 1976; Tully and Quinn, 1985; Pitman et al., 2009). For example, an odor is presented along with an electrical shock via the substrate; after a number of such trials, flies learn to associate the odor with the shock and make avoidance responses when presented with the (previously neutral) odor alone. In terms of classical conditioning, shock is an unconditional stimulus (US), avoidance is an unconditional response (UR) and odor is a conditional stimulus (CS). If associative learning occurs, the UR is evoked by the CS alone after a series of CS–US pairings. Some studies have used appetitive rather than aversive conditioning, with sucrose as the US and proboscis extension as the UR (e.g. Tempel et al., 1983; Chabaud et al., 2006). When a fly steps in sugar water, its tarsal chemoreceptors trigger a feeding reflex that extends the proboscis, through which it sucks the fluid (Dethier, 1976). This proboscis-extension reflex (PER; Fig. 1D) is a fixed act, common to many insects, that has been used in studies of olfactory and taste learning in a variety of insects in addition to Drosophila (e.g. Nelson, 1971; Bitterman et al., 1983; Daly and Smith, 2000).
Much of the interest in learning and courtship in Drosophila is due to its status as a model organism in which mutants can be easily screened. Although many auditory mutants have been identified (Caldwell and Eberl, 2002), their study is limited by the fact that 'the only known acoustic behavior of fruit flies is their response to courtship songs' (Inagaki et al., 2010). In fact, all current methods of screening for auditory mutants are based on courtship, testing either the receptivity of females or the tendency of males to court one another when stimulated with pulse song (Eberl et al., 1997; Inagaki et al., 2010). A new method of behaviorally testing hearing in both sexes, independent of courtship, could advance the study of hearing in Drosophila.
In the present study, we employed an appetitive conditioning protocol using sugar water as a reward, a non-courtship sound as a neutral stimulus and proboscis extension as an indicator of learning. We also recorded from Johnston's organ to verify that our stimuli were audible. To our knowledge, no prior studies of learning in Drosophila have used acoustic stimuli and only a few have used appetitive conditioning.
MATERIALS AND METHODS
Training and testing were done according to the timeline in Fig. 2. Each fly was given six training trials. In 'paired' trials, flies were rewarded with 5 s access to sucrose (1 mol l–1 solution) 5 s after the onset of a 10 s sound stimulus. In 'unpaired' trials, sucrose was presented 30 s after the end of the sound, whereas in 'no-reward' trials sucrose was never presented. In all three types of trial, PER strength was rated during the first 5 s of sound stimulation, before the onset of any reward, and flies were presented with water 120 s after the sound stimulus to prevent dehydration and to wash off any sucrose remaining on the tarsi. Water presentation was separated from sound stimulation by 2–3 min on either side, making it unlikely to affect training. The six training trials were followed by two retention tests, one at 15 min and one at 25 min after training, in which only the sound was presented and PER strength was rated.
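The trial timing can be written down as a small schedule, which makes it easy to check that PER is always rated before any sucrose is available and that water stays well clear of the sound. This is a minimal sketch, not the apparatus control code; it assumes that each trial occupies one 300 s drum rotation and that the 120 s water delay is counted from the end of the 10 s sound. The names `Trial`, `make_trial`, `rating_window_is_clean` and `water_isolation` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# All times are in seconds from sound onset within one trial.
SOUND_DURATION = 10.0    # 10 s tone
PER_RATING_END = 5.0     # PER rated during the first 5 s of sound
ROTATION_PERIOD = 300.0  # assumption: one trial per 5 min drum rotation


@dataclass(frozen=True)
class Trial:
    kind: str                       # "paired", "unpaired" or "no-reward"
    sucrose_onset: Optional[float]  # None when no sucrose is presented
    water_onset: float


def make_trial(kind: str) -> Trial:
    """Build the event schedule for one training trial."""
    if kind == "paired":
        sucrose = 5.0                    # 5 s after sound onset
    elif kind == "unpaired":
        sucrose = SOUND_DURATION + 30.0  # 30 s after the sound ends
    elif kind == "no-reward":
        sucrose = None
    else:
        raise ValueError(f"unknown trial type: {kind!r}")
    # Assumption: the 120 s water delay is counted from the end of the sound.
    return Trial(kind, sucrose, SOUND_DURATION + 120.0)


def rating_window_is_clean(trial: Trial) -> bool:
    """True if sucrose cannot be reached while PER is being rated."""
    return trial.sucrose_onset is None or trial.sucrose_onset >= PER_RATING_END


def water_isolation(trial: Trial) -> Tuple[float, float]:
    """Seconds from the end of the sound to water, and from water to the next sound."""
    return (trial.water_onset - SOUND_DURATION,
            ROTATION_PERIOD - trial.water_onset)
```

Under these assumptions, `water_isolation(make_trial("paired"))` returns `(120.0, 170.0)`: water arrives 2 min after the sound ends and about 2.8 min before the next sound, consistent with the 2–3 min separation stated above.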
The training device (Fig. 3) was built on a rotating 16 cm diameter kymograph drum (Bird Kymograph no. 70-060; Phipps & Bird Inc., Richmond, VA, USA), based on published designs (Vargo et al., 1983; Holliday and Hirsch, 1986; Brigui et al., 1990). The drum rotated fully in 5 min, presenting sound, sucrose and water on the schedule shown in Fig. 2. The sound stimulus was activated when a magnet on the drum moved past a reed switch 1 cm away from the drum. Closing the switch made no audible sound and did not transmit vibration to the drum. Sucrose solution and water were delivered from two separate 8×160 mm strips of filter paper (Whatman no. 2300 916) mounted on the drum and fed by reservoirs on top of the drum. Three flies were tested at a time, each loaded into a pipette tip and placed with its foreleg tarsi in contact with the drum. The three fly heads filled the frame of a video camera used to record sessions for later analysis.
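Because the drum turns once every 5 min, the width of each filter-paper strip in the direction of rotation sets how long the tarsi touch it. The sketch below treats the tarsi as a fixed point on the surface of a 16 cm drum, which is an idealization; the function names are hypothetical.

```python
import math

DRUM_DIAMETER_MM = 160.0   # 16 cm drum (from the text)
ROTATION_PERIOD_S = 300.0  # one full turn in 5 min (from the text)


def surface_speed_mm_per_s(diameter_mm=DRUM_DIAMETER_MM, period_s=ROTATION_PERIOD_S):
    """Linear speed of the drum surface: circumference divided by rotation period."""
    return math.pi * diameter_mm / period_s


def contact_time_s(strip_width_mm, diameter_mm=DRUM_DIAMETER_MM, period_s=ROTATION_PERIOD_S):
    """Time a strip of the given width (along the rotation) spends under a fixed point."""
    return strip_width_mm / surface_speed_mm_per_s(diameter_mm, period_s)


def strip_width_mm_for(seconds, diameter_mm=DRUM_DIAMETER_MM, period_s=ROTATION_PERIOD_S):
    """Strip width needed for a given contact time."""
    return seconds * surface_speed_mm_per_s(diameter_mm, period_s)
```

On this idealization the surface moves at about 1.68 mm s–1, so a strip about 8.4 mm wide gives 5 s of contact, and an 8 mm strip gives about 4.8 s.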
The conditional stimulus was a 10 s, 400 Hz tone (DynaScan Corp. 3011 function generator; Irvine, CA, USA) broadcast through a 16 cm paper cone woofer and directed at the flies through a plastic funnel, with the 24 mm opening of the funnel 20 mm from the flies, placing the flies in the near field. To avoid transmission of vibration to the rotating drum, the speaker was held by a stand not attached to the drum. Intensities were calibrated in dB SPL (re. 20 μPa) with a Brüel & Kjær type 2209 sound level meter and a type 4138 1/8 in microphone placed at the location that would be occupied by the central fly. Three intensity levels were used: 65 dB SPL (quiet, just above auditory threshold), 85 dB SPL (moderate, near the natural level of courtship song; Bennet-Clark, 1971) and 108 dB SPL (loud). All three training conditions (paired, unpaired and no reward) were tested at each of the three intensities.
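The near-field statement can be checked with the usual point-source criterion: the acoustic near field is the region where kr ≪ 1, with k = 2πf/c. The sketch also converts the three nominal levels to RMS pressure. It assumes c = 343 m s–1 and an ideal point source, which the funnel only approximates; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air, about 20 °C)
P_REF = 20e-6           # Pa, reference pressure for dB SPL


def wavelength_m(freq_hz, c=SPEED_OF_SOUND):
    """Acoustic wavelength in metres."""
    return c / freq_hz


def kr(freq_hz, distance_m, c=SPEED_OF_SOUND):
    """Dimensionless distance k*r; k*r << 1 indicates the near field of a point source."""
    return 2.0 * math.pi * freq_hz * distance_m / c


def spl_to_pa_rms(db_spl):
    """RMS sound pressure (Pa) for a level in dB SPL re. 20 µPa."""
    return P_REF * 10 ** (db_spl / 20.0)
```

For 400 Hz at 20 mm, kr ≈ 0.15 (wavelength ≈ 0.86 m), well inside the near field by this criterion; 65, 85 and 108 dB SPL correspond to roughly 0.036, 0.36 and 5.0 Pa RMS.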
Proboscis extension is an unconditional response when the fly's tarsi contact sucrose solution (Nelson, 1971; Médioni and Vaysse, 1975; McKenna et al., 1989). All training and test sessions were recorded on video for later frame-by-frame analysis. PER strength was categorized as no response, weak response or strong response (Fig. 1) (Chabaud et al., 2006). Rating was carried out by two observers, at least one of whom did not know the training protocol in use.
Three-day-old virgin female D. melanogaster (Canton-S-5 strain) (McKenna et al., 1989) were taken from our laboratory cultures. They were starved for 24 h before testing to increase their motivation to feed.
In Pavlovian conditioning, learning is an increase in response over time when a stimulus predicts a reward (paired trials), compared with a constant response over time when stimulus and reward are uncorrelated (unpaired trials). To test whether learning occurs, we used a parametric statistical model in which differences between paired and unpaired groups and the change in response within each group over time are integrated into a logistic regression model:

$$\log\left[\frac{P(t)}{1-P(t)}\right] = \alpha + \beta_{T}\,t \quad \text{(paired group)},$$

$$\log\left[\frac{P(t)}{1-P(t)}\right] = \alpha + \beta_{C}\,t \quad \text{(unpaired group)},$$
where P(t) is the probability that a fly responds at trial t (t=0 being the initial trial). These equations describe sigmoidal growth curves for probability of response as a function of trial number. The rate of growth is reflected by the regression coefficients βT (for the test, paired condition) and βC (for the control, unpaired condition), while the baseline response is reflected by the intercept, α.
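A short sketch shows how the two parameters shape the curves. Under the model in which the log-odds of a response are α + βT·t for the paired group and α + βC·t for the unpaired group, both groups start at the same probability at t=0, and exp(β) is the multiplicative change in the odds of responding per trial. The parameter values below are hypothetical, chosen only for illustration.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def response_probability(t: int, alpha: float, beta: float) -> float:
    """P(t) from log[P/(1-P)] = alpha + beta * t, with t = 0 the initial trial."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * t)))


# Hypothetical values: 40% baseline in both groups, learning only when paired.
alpha = logit(0.4)
beta_T, beta_C = 0.5, 0.0

paired = [response_probability(t, alpha, beta_T) for t in range(6)]
unpaired = [response_probability(t, alpha, beta_C) for t in range(6)]
```

With βC = 0 the unpaired curve is flat at the baseline, while a positive βT produces the rising, sigmoidal trend that the tests below look for.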
Consequently, we can ascertain whether learning occurs using just three statistical tests. First, the log-odds of a response at t=0 are equal to α for both groups. Thus, we test whether the initial response rate is the same for the paired and unpaired groups, using a test for the difference between two proportions or a related procedure such as Fisher's exact test. We call this test T0 and its null hypothesis H0(0). Next, any change in response rate over time should be monotonic, as the log-odds are linear functions of t with slopes βT and βC for the paired and unpaired groups, respectively. In particular, if the stimulus induces learning, there should be an increase in the response rate of the paired group (i.e. the regression coefficient βT is positive) but not of the control group (i.e. βC is zero or less). Thus, our second (T1) and third (T2) statistical tests concern the hypotheses:

$$H_0^{(1)}: \beta_{C} \le 0 \quad \text{versus} \quad H_1^{(1)}: \beta_{C} > 0,$$

$$H_0^{(2)}: \beta_{T} \le 0 \quad \text{versus} \quad H_1^{(2)}: \beta_{T} > 0.$$
Rejection of H0(2) along with failure to reject H0(0) and H0(1) would be evidence of Pavlovian learning.
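T0 and the overall decision rule can be sketched with the standard library alone. `fisher_exact_two_sided` implements the usual two-sided Fisher's exact test, summing the hypergeometric probabilities of all tables no more probable than the observed one. `shows_learning` encodes the criterion just stated at the Bonferroni-adjusted level of 0.05/3 used in this study. The P-values for T1 and T2 are taken as inputs, since they come from the one-sided GEE tests; both function names are hypothetical.

```python
from math import comb

BONFERRONI_ALPHA = 0.05 / 3  # three tests, overall Type I error 0.05


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    For T0: rows are the paired and unpaired groups; columns are flies that
    did and did not respond on the initial trial (t = 0).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # hypergeometric probability of x responders in row 1
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum over all tables no more probable than the observed one
    # (small tolerance so that numerically tied tables are included).
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def shows_learning(p_t0, p_t1, p_t2, alpha=BONFERRONI_ALPHA):
    """Reject H0(2) while failing to reject both H0(0) and H0(1)."""
    return p_t2 <= alpha and p_t0 > alpha and p_t1 > alpha
```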
Because three tests are performed, the significance level for each test must be adjusted to ensure that the overall experimental Type I error is controlled at a level of 0.05. Using the conservative Bonferroni procedure, hypotheses H0(0), H0(1) and H0(2) are tested at the reduced level 0.05/3=0.0167. To account for the fact that responses across trials within each fly are dependent on each other, estimates of the regression coefficients (i.e. learning rates), their standard errors, and P-values used in tests of H0(1) and H0(2) are obtained using the technique of generalized estimating equations (GEE) (Zeger and Liang, 1986; Verbeke and Molenberghs, 2001). Using GEE rather than likelihood-based methods ensures that the standard error estimates for regression coefficients reflect the dependence of the responses recorded for each fly across the series of trials, thereby avoiding inappropriate assessments of statistical significance.
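The GEE fit can be sketched with NumPy. This version uses an independence working correlation, for which the point estimates coincide with ordinary logistic regression and the dependence among a fly's trials enters only through the robust (sandwich) covariance, whose middle term sums score contributions within each fly. The text does not state which working correlation was used, so the independence structure is an assumption, as are the function name `fit_lgee` and its one-sided Z-tests of H0: β ≤ 0. An established GEE implementation would be the more usual choice for a real analysis.

```python
import math
import numpy as np


def fit_lgee(y, t, paired, fly, n_iter=50, tol=1e-10):
    """Logistic GEE (independence working correlation) for the paired/unpaired model.

    Model: logit P = alpha + beta_T * t * paired + beta_C * t * (1 - paired).
    y: 0/1 responses; t: trial index (0 = initial); paired: 1 or 0; fly: subject id.
    Returns (estimates, robust SEs, one-sided P-values for H0: coefficient <= 0),
    each ordered as (alpha, beta_T, beta_C).
    """
    y = np.asarray(y, float)
    t = np.asarray(t, float)
    paired = np.asarray(paired, float)
    fly = np.asarray(fly)
    X = np.column_stack([np.ones_like(t), t * paired, t * (1.0 - paired)])

    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):  # Fisher scoring (IRLS) on the estimating equations
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - mu))
        beta += step
        if np.max(np.abs(step)) < tol:
            break

    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    bread = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    meat = np.zeros((X.shape[1], X.shape[1]))
    for f in np.unique(fly):  # within-fly score sums carry the dependence
        idx = fly == f
        s = X[idx].T @ (y[idx] - mu[idx])
        meat += np.outer(s, s)
    cov = bread @ meat @ bread
    se = np.sqrt(np.diag(cov))
    z = beta / se
    p_one_sided = np.array([0.5 * math.erfc(zi / math.sqrt(2.0)) for zi in z])
    return beta, se, p_one_sided
```

The second and third P-values correspond to T2 (βT) and T1 (βC) and would be compared with 0.05/3.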
As an independent test of stimulus audibility, we recorded from Johnston's organ. Recording and stimulus calibration were done as described previously (Arthur et al., 2010). A fly was dorsally tethered with paraffin and positioned such that the plane of the arista was parallel to sound waves emanating from a speaker. A tungsten electrode was inserted in the second antennal segment (Johnston's organ) and the Bayesian QUEST procedure (Watson and Pelli, 1983) was used to adaptively quantify thresholds to a 400 Hz stimulus. Thresholds to periodic oscillations at the fundamental and second harmonic of the stimulus frequency were tracked in parallel (King-Smith et al., 1994), with a stimulus deemed above threshold if the response was greater during the 0.5 s stimulus presentation than during an equal amount of time immediately preceding it. The customary Weibull function was used as the assumed neurometric function with the threshold criterion set to 75%, the slope (β) to 0.05 based on pilot data, the probability of failure at infinitely high intensity (δ) to 0.01, and the probability of success at infinitely low intensity (γ) to 0.5. To facilitate validation of the estimated slope parameter, each fly was tested in ≥200 trials, with a uniform random intensity within the 55–95% range of the current estimated threshold for each trial (supplementary material Fig. S1). The final threshold estimate was taken to be the mean of the posterior density function when its standard deviation dropped to 3 dB. Immediately after Johnston's organ recordings were concluded, control recordings were conducted in the contralateral eye of each fly to check for stimulus artifact due to coupling with the speaker. Both antennae were removed for control recordings because attenuated potentials from antennae may be recorded throughout the head, as we and others have observed in mosquitoes (Wishart et al., 1962; Arthur et al., 2010).
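The Bayesian threshold search can be sketched as a grid approximation to the posterior density of the threshold, using the Weibull parameterization of Watson and Pelli (1983) with the parameter values given above. Several parts are assumptions rather than details from the text: intensities and thresholds are in dB, the prior is Gaussian, and trials are placed at the current posterior mean (the original QUEST rule) rather than at the randomized intensities used in this study. `GridQuest`, `p_detect` and `run_until_sd` are hypothetical names; tracking F0 and F1 in parallel would simply use two instances.

```python
import math

# Psychometric-function parameters from the text; x and threshold in dB (assumed).
GAMMA = 0.5       # probability of success at infinitely low intensity
DELTA = 0.01      # probability of failure at infinitely high intensity
BETA = 0.05       # slope
CRITERION = 0.75  # threshold criterion

# Offset chosen so that p_detect(T, T) == CRITERION.
_K = (1 - (CRITERION - DELTA * GAMMA) / (1 - DELTA)) / (1 - GAMMA)
_EPS = math.log10(-math.log(_K)) / BETA


def p_detect(x_db, threshold_db):
    """Weibull function in the QUEST parameterization."""
    u = 10 ** (BETA * (x_db - threshold_db + _EPS))
    return DELTA * GAMMA + (1 - DELTA) * (1 - (1 - GAMMA) * math.exp(-u))


class GridQuest:
    """Posterior over threshold on a grid of candidate values (dB)."""

    def __init__(self, lo_db=30.0, hi_db=110.0, step_db=0.5,
                 prior_mean_db=70.0, prior_sd_db=20.0):
        n = int(round((hi_db - lo_db) / step_db)) + 1
        self.grid = [lo_db + i * step_db for i in range(n)]
        # Gaussian prior: an assumption, not taken from the text.
        self.log_post = [-0.5 * ((g - prior_mean_db) / prior_sd_db) ** 2
                         for g in self.grid]

    def update(self, x_db, detected):
        """Bayes' rule: add the log-likelihood of one trial's outcome."""
        for i, g in enumerate(self.grid):
            p = p_detect(x_db, g)
            self.log_post[i] += math.log(p if detected else 1.0 - p)

    def _weights(self):
        top = max(self.log_post)
        w = [math.exp(lp - top) for lp in self.log_post]
        total = sum(w)
        return [wi / total for wi in w]

    def mean(self):
        return sum(w * g for w, g in zip(self._weights(), self.grid))

    def sd(self):
        w, m = self._weights(), self.mean()
        return math.sqrt(sum(wi * (g - m) ** 2 for wi, g in zip(w, self.grid)))


def run_until_sd(quest, respond, sd_stop_db=3.0, max_trials=5000):
    """Test at the posterior mean until the posterior SD falls to sd_stop_db."""
    n = 0
    while quest.sd() > sd_stop_db and n < max_trials:
        x = quest.mean()
        quest.update(x, respond(x))
        n += 1
    return quest.mean(), n
```

Here `respond(x)` stands for the detection decision at intensity x, for example the pre-stimulus versus stimulus comparison described above; the returned posterior mean is the threshold estimate.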
In all cases the reference electrode was placed in the thorax through a coxal stump following removal of a leg.
Sound intensity and calibration
Because antennae respond to the particle-velocity component of sound (Bennet-Clark, 1971), intensities ought to be calibrated with velocity microphones and presented in dB SPVL (re. 50 nm s–1). However, measurement of particle velocity is difficult in confined spaces; indeed, both velocity and pressure can vary considerably over small distances in echoic environments. Most studies of Drosophila hearing and courtship take place in small and echoic chambers (e.g. Eberl et al., 1997; Inagaki et al., 2010), with calibration microphones often placed near but outside the chamber. Some use pressure microphones, some use velocity microphones, and some give no calibration detail at all. As a result, it is difficult to compare intensities across studies.
Calibration of stimuli for learning was done as described above with a pressure-sensitive microphone at the central fly position. We subsequently mapped the sound field between the funnel exit and the kymograph drum with both a pressure microphone and a pressure-gradient (particle-velocity) microphone as detailed below. Intensity was highest at the center of this space, dropping off at the edges and near the drum, with velocity varying more than pressure. At our moderate intensity level, for example, at the surface of the drum we measured 85 dB SPL and 95 dB SPVL at the location of the central fly, and 82 dB SPL and 90 dB SPVL at the edge of the funnel; moving the microphones away from the drum and toward the center of the funnel exit, we measured 88 dB SPL and 109 dB SPVL. The nominal values we report came from the center near the drum; intensities experienced by the three flies would vary somewhat by position and possibly by trial. While it is unlikely that intensities were precisely 65, 85 or 108 dB SPL for each fly in each trial, it is certain that the 108 dB stimulus was louder than the 85 dB stimulus, which was louder than the 65 dB stimulus.
Without the constraint of putting flies in a training apparatus, physiological recording took place in a far cleaner acoustic environment. In an anechoic far field, dB SPL and dB SPVL are numerically equal (Bennet-Clark, 1971) because a pressure of 20 μPa (0 dB SPL) corresponds to a velocity of 50 nm s–1 (0 dB SPVL). We calibrated in dB SPVL using a pressure-gradient microphone (Knowles NR-23158) as described previously (Arthur et al., 2010) and in dB SPL using a pressure microphone (Brüel & Kjær type 4138). The results confirm that our recording location was in fact anechoic and far-field at 400 Hz.
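The numerical equivalence of dB SPL and dB SPVL in a free far field follows from the plane-wave relation v = p/(ρc). The sketch below assumes ρ = 1.204 kg m–3 and c = 343.2 m s–1 (dry air at 20°C) and, for contrast, computes the velocity excess of an ideal spherical wave close to a point source; the function names are hypothetical.

```python
import math

RHO_AIR = 1.204  # kg m^-3, assumed (dry air, 20 °C)
C_AIR = 343.2    # m s^-1, assumed (dry air, 20 °C)
P_REF = 20e-6    # Pa, 0 dB SPL
V_REF = 50e-9    # m s^-1, 0 dB SPVL


def far_field_velocity(p_rms, rho=RHO_AIR, c=C_AIR):
    """Particle velocity of a free plane progressive wave: v = p / (rho * c)."""
    return p_rms / (rho * c)


def spl_to_far_field_spvl(db_spl):
    """dB SPVL expected for a given dB SPL in an anechoic far field."""
    v = far_field_velocity(P_REF * 10 ** (db_spl / 20.0))
    return 20.0 * math.log10(v / V_REF)


def near_field_excess_db(freq_hz, r_m, c=C_AIR):
    """Velocity level of an ideal spherical wave above p/(rho*c), in dB."""
    kr = 2.0 * math.pi * freq_hz * r_m / c
    return 10.0 * math.log10(1.0 + 1.0 / kr ** 2)
```

With these constants, 20 µPa corresponds to about 48 nm s–1, so 0 dB SPL and 0 dB SPVL differ by less than 0.3 dB. Close to a point source the picture changes: at 400 Hz and 20 mm the spherical-wave excess is roughly 17 dB. This idealization does not describe the funnel, but it shows why velocity can exceed the pressure-derived value by many decibels near a source.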
The results of the training study are shown in Fig. 4. For the purpose of analysis, weak and strong PER responses were lumped to create simple response and no-response categories. Results in the no-reward condition were indistinguishable from those in the unpaired condition, so our analysis considered only the paired and unpaired trials. Flies in the unpaired condition clearly did not learn to associate sound with sucrose: the proportion responding stayed at the initial level throughout. Flies in the paired condition showed an increase in response probability as trials progressed, indicating that they associated the sound with the sucrose reward. Statistically, these observations are borne out at each intensity level by the three tests described above. The initial response probability did not differ between paired and unpaired groups (Fisher's exact test: P=0.45 at 65 dB, P=0.49 at 85 dB, P=0.24 at 108 dB), regression coefficients of unpaired trials were not significantly greater than zero (GEE Z-test: P=0.43 at 65 dB, P=0.13 at 85 dB, P=0.30 at 108 dB), and regression coefficients of paired trials were significantly greater than zero (GEE Z-test: P=0.0167 at 65 dB, P=0.0006 at 85 dB, P<0.0001 at 108 dB). Thus, the results meet the criteria for learning defined by our logistic regression model. In the two test trials, responses remained high, showing that the association between sound and sucrose was retained 15 and 25 min after the end of training.
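The lumping step amounts to recoding each video rating as a binary response before computing the proportion responding per group and trial. A minimal sketch, assuming ratings are stored as one `(group, trial, rating)` tuple per fly per trial; the labels and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical rating labels; weak and strong PER are both scored as a response.
RESPONSE_CODE = {"none": 0, "weak": 1, "strong": 1}


def proportion_responding(records):
    """Map (group, trial) to the proportion of flies scored as responding.

    records: iterable of (group, trial, rating) tuples, one per fly per trial.
    """
    counts = defaultdict(lambda: [0, 0])  # [responders, flies rated]
    for group, trial, rating in records:
        tally = counts[(group, trial)]
        tally[0] += RESPONSE_CODE[rating]
        tally[1] += 1
    return {key: responders / n for key, (responders, n) in counts.items()}
```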
To test whether learning required the auditory modality, we conducted a set of paired trials at 108 dB with 14 flies in which the antennae were ablated. These flies did not show an increase in response as trials progressed (GEE Z-test: βT=–0.18±0.12, estimate ± s.e., P=0.11; data not shown).
It is clear from Fig. 4 that stimulus intensity affected the proportion of flies responding in all trials, in both paired and unpaired groups. As sound intensity increased, the proportion responding decreased (chi-square test: P<0.0001). However, the slope of the logistic regression (βT) did not vary significantly with intensity.
Recordings from Johnston's organ (Fig. 5) show that all stimuli were audible. The response consists of a phasic negative deflection of the baseline followed by a tonic positive deflection. In addition, the field potential oscillates at the frequency of the stimulus and its harmonics. Our Bayesian threshold search was based on the oscillatory potential; post hoc analysis of the baseline shift showed its threshold to be higher than that of the oscillatory potential. In one fly, the baseline shift was not phasic but was a tonic negative deflection (supplementary material Fig. S2) as in Aedes mosquitoes (Cator et al., 2009; Arthur et al., 2010). Across all 11 flies tested, thresholds with a 75% criterion were 65±1 dB (mean ± s.e.m.) for the stimulus frequency (F0) and 64±1 dB for twice the stimulus frequency (F1) with a 0.5 s stimulus. These levels are approximately those used as the least intense CS in training experiments. However, these may be conservative estimates of threshold. With longer stimuli, such as the 10 s used in our learning experiments, thresholds could be lower if flies integrate over long periods.
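The oscillatory component can be quantified as the Fourier amplitude at the stimulus frequency (F0) and at twice that frequency (F1), compared between the 0.5 s stimulus window and the 0.5 s immediately before it, as in the detection criterion described in Materials and methods. The text does not say how response magnitude was computed, so the single-bin DFT used here is an assumption; the function names are hypothetical.

```python
import math


def tone_amplitude(samples, fs, freq):
    """Amplitude of one frequency component, from a single-bin DFT."""
    re = im = 0.0
    for n, s in enumerate(samples):
        phase = 2.0 * math.pi * freq * n / fs
        re += s * math.cos(phase)
        im -= s * math.sin(phase)
    return 2.0 * math.hypot(re, im) / len(samples)


def oscillatory_response(pre, during, fs, f0=400.0):
    """Is the F0 and 2*F0 amplitude larger during the stimulus than just before it?"""
    return {
        label: tone_amplitude(during, fs, f) > tone_amplitude(pre, fs, f)
        for label, f in (("F0", f0), ("F1", 2.0 * f0))
    }
```

A window containing an integer number of cycles of each frequency avoids spectral leakage; at 400 and 800 Hz a 0.5 s window satisfies this.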
Appetitive conditioning with acoustic stimuli
Our results and analysis show that female Drosophila learned to associate a tone with a sucrose reward. While Drosophila has long been a model system for the study of learning and memory, most previous work has used chemical stimuli as the CS and aversive stimuli, generally electrical shock, as the US (e.g. Quinn et al., 1974; Dudai et al., 1976; Tully and Quinn, 1985; Pitman et al., 2009). In contrast, we used an acoustic CS and a rewarding US. The prior study most similar to ours was by Chabaud and colleagues, who examined appetitive conditioning with an olfactory CS (Chabaud et al., 2006). Their wild-type flies started from a baseline of 25–40% PER, increasing to 60–75% PER after five trials, similar to the baseline and change we found in the 65 dB group.
Sound is neutral in the context of feeding but salient in the context of courtship. We chose 400 Hz as the CS in the hope of finding an auditory stimulus that was neutral but audible. This frequency is sufficiently far from that of the sinusoidal component of courtship song (160 Hz) (Wheeler et al., 1988) that it is unlikely to be perceived as courtship, but likely to be within the audible range for insects that hear with their antennae. Recordings from Johnston's organ verified that the CS was above threshold.
Given our assumption that 400 Hz is neutral, neither attractive nor aversive, the effect of intensity on proboscis extension is puzzling. While the 65 dB group started from a baseline of 40% PER, the 108 dB group started at only 10% PER. Many animals, including insects, respond to loud sounds with an acoustic startle response that, while not eliciting escape behavior, freezes or disrupts ongoing behavior (Eaton, 1984; Hoy, 1989). This may account for the low PER at 108 dB. Indeed, informal analysis of video collected before each trial found flies in the 108 dB group extending their proboscises, in the absence of stimuli, to an extent approximately equal to that found during the 65 dB trials. Thus, it appears that proboscis extension was reduced by loud sound. Despite its effect on background responsiveness, intensity had no effect on the rate of learning (βT in Fig. 4), contradicting the usual expectation that rate increases with CS intensity (Davey, 1981). Taken together, these observations suggest that sound intensity can be kept near threshold in learning studies, avoiding startle without slowing the rate of learning. If a lower background response is desired, it can be achieved by increasing sound intensity.
Analysis of learning
Because our statistical tests for learning differ from those commonly used in Drosophila learning research, we offer an explanation and comparison with other methods. A logistic model with GEE estimation, hereafter referred to as LGEE, is generally applicable to associative learning. It has been used in several other learning studies; our method is most similar to that recommended by Hartz and colleagues (Hartz et al., 2001) and used by Shafir and colleagues (Shafir et al., 2005). Use of LGEE is motivated by two statistical considerations. First, the logistic model captures the dynamic, sequential nature of learning in a simple model that permits meaningful tests for time trends within groups as well as for differences in learning rates between groups. Indeed, logistic regression is the most commonly used statistical tool for modeling trends in proportions (Collett, 2002). Second, LGEE uses all of the data collected from each subject on each trial while accounting for the dependence of responses within a subject across trials.
Although established as a learning model in other research communities, the use of LGEE with Drosophila is novel. Many studies simply report the difference between post-training and pre-training responses and compare it between paired and unpaired groups. While this is a valid way to determine whether learning occurs, it provides no information about the rate of learning. Chabaud and colleagues (Chabaud et al., 2006) improved on this by assessing learning as follows. (1) For each trial, employ a chi-square or related test (such as Fisher's exact) to check whether the proportion of responses in the paired group differs from that in the unpaired group. (2) For each group, apply Cochran's Q-test to determine whether the proportion responding is constant over time. Thus for six training trials, a total of eight statistical tests are required, six to compare responses between groups in each trial and two to test for constant response in each group. We henceforth refer to this set of procedures as CHIQ.
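For reference, Cochran's Q statistic used in CHIQ is Q = (k−1)[kΣC_j² − N²]/(kN − ΣR_i²), where C_j are trial (column) totals, R_i are subject (row) totals, N is the grand total and k is the number of trials; Q is referred to a chi-square distribution with k−1 degrees of freedom. This standard-library sketch uses closed-form chi-square tail probabilities for integer degrees of freedom. The function names are hypothetical, and the validity caveats raised for Q in this context still apply.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (closed forms)."""
    h = x / 2.0
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    tail = math.erfc(math.sqrt(h))  # df = 1 term
    return tail + math.exp(-h) * sum(
        h ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))


def cochran_q(table):
    """Cochran's Q for a subjects x trials table of 0/1 responses.

    Returns (Q, df, P) using the chi-square approximation on k - 1 df.
    """
    k = len(table[0])
    col = [sum(row[j] for row in table) for j in range(k)]
    row_tot = [sum(row) for row in table]
    n = sum(row_tot)
    denom = k * n - sum(r * r for r in row_tot)
    if denom == 0:
        raise ValueError("Q is undefined: every subject responded identically on all trials")
    q = (k - 1) * (k * sum(c * c for c in col) - n * n) / denom
    return q, k - 1, chi2_sf(q, k - 1)
```

In the CHIQ scheme this test would be run once per group, alongside a between-group test for each trial.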
The CHIQ procedure enforces no preconceived notion of learning, in the form of either trends over time or the direction of differences between paired and unpaired groups. Thus, there is ambiguity in how one detects learning. Must each of the chi-square tests be statistically significant? If not, which of them must be significant in order to conclude that learning has occurred? The role of Cochran's Q-test is unclear when used with the full set of between-group tests. It is not even clear that the test is valid in this context. As originally designed, Cochran's Q-test was not meant to evaluate the equality of proportions across trials within a group unless more stringent assumptions are imposed on the probability of response for all subjects in all pairs of trials (Bhapkar, 1973; Somes and Bhapkar, 1977). A further drawback of CHIQ is the need to adjust significance levels for the larger number of tests. For an overall type I error of 0.05, each of the eight tests that would be required for our data would have to be done at a level of 0.05/8=0.00625, as opposed to the 0.0167 required for LGEE. This, combined with the fact that none of the tests used in CHIQ are directional, greatly increases its conservatism in assessing learning. The associated price is a loss of statistical power, possibly severe, for detecting learning when learning in fact exists.
LGEE equates learning with a monotonic increase in the proportion responding over a series of trials in the paired group but not in the unpaired group, while CHIQ detects any kind of change in response profiles, either between or within groups. In theory, this gives CHIQ flexibility, trading loss of statistical power for greater robustness. For example, the LGEE assumption of monotonicity could be violated by satiation, although this is not a concern with a relatively small number of trials and a well-designed experiment. In principle, however, we argue that true learning must manifest itself, at least initially, in non-decreasing (if not monotonically increasing) responses, and the primary goal should be detection of that directional trend. For this purpose, LGEE is powerful and robust. Furthermore, it would be easy to extend the logistic model to allow both non-decreasing and non-monotonic trends, although as with CHIQ, adding such flexibility complicates the definition of learning.
Recordings from Johnston's organ confirm that the auditory stimuli in learning tests were above sensory threshold. Our threshold of 64–65 dB SPVL in females is close to the 72 dB SPL threshold found behaviorally in males (Eberl et al., 1997). The oscillatory component of the evoked field potential was qualitatively similar to that recorded in mosquitoes (e.g. Tischner, 1953; Wishart et al., 1962). Most baseline shifts were phasic, as described for Culex mosquitoes by Warren and colleagues (Warren et al., 2009), but one fly showed a sustained deflection such as we found in Aedes mosquitoes (Cator et al., 2009; Arthur et al., 2010). Atypical recordings from one of 11 flies might be dismissed as damage during preparation, but this fly had a normal threshold and its responses were consistent and similar to those found in other species. We suspect that the nature of the baseline shift varies with electrode placement in a heterogeneous population of scolopidia (Kamikouchi et al., 2006; Kamikouchi et al., 2009); future studies could systematically vary placement to test this.
Recording from Johnston's organ in Drosophila is not difficult, and we suggest that future studies using auditory stimuli in associative learning assays be combined with recordings to narrow down the anatomic location of deficits. Similarly, if auditory mutants are used in associative learning assays, it is important to test them with other learning assays to control for possible pleiotropic effects on learning.
Aside from the nice historical coincidence that Pavlov's original model works for flies as well as for dogs, there are good scientific reasons to develop diverse learning models for Drosophila. For example, the learning mutants dunce and rutabaga fail to learn odors in a shock-avoidance protocol but learn normally in an appetitive task (Tempel et al., 1983), while aversive and appetitive learning of odors take place through different neural and biochemical pathways in Drosophila larvae (Honjo and Furukubo-Tokunaga, 2009). Learning mutants may act differently not only in different protocols but also with different CS modalities. In general, a greater variety of tests would allow a finer parsing of the array of learning mutations.
In principle, our methods could be adapted to test hearing in both sexes independently of courtship, facilitating the discovery of new auditory mutations. While screening for auditory mutations in males is well established (Eberl et al., 1997), there is no known way to screen auditory mutations in females beyond testing for courtship receptivity (Inagaki et al., 2010). Given that factors other than hearing are likely to affect courtship in both sexes, a test of hearing that does not rely on courtship could be valuable; our method seems to be the only such test available at present.
Finally, we urge that researchers in Drosophila learning adopt the logistic regression model with generalized estimating equations. It is a more statistically valid means of analysis than those commonly used today. At the same time, it allows different learning mutants to be quantitatively compared not only in their overall level of learning but also in their rate of learning, which may differ between mutants and could eventually provide insight into mechanisms.
This research was supported by NIH award 5R01DC103-36 to R.R.H. Deposited in PMC for release after 12 months.
We thank Dustin Rubinstein for thoughtful comments and discussion, Margaret Nelson for advice on the setup, Kristin Gawera for drawing the fly in Fig. 3, and two anonymous reviewers for helpful comments on an earlier version of the manuscript.