SUMMARY
A continuous schedule of reinforcement (CR) in an operant conditioning procedure results in the acquisition of associative learning and the formation of long-term memory. A 50 % partial reinforcement (PR) schedule does not result in learning. The order in which PR and CR training are given has different and significant effects on memory retention and resistance to extinction. A CR/PR schedule results in a longer-lasting memory than a PR/CR schedule. Moreover, the memory produced by the CR/PR schedule is resistant to extinction training. In contrast, extinction occurs following the PR/CR schedule.
Introduction
Operant conditioning is a form of associative learning and results from the contingency established between a behavioural response and the presentation of a reinforcing stimulus (Mackintosh, 1974). The response-reinforcer contingency can be defined as the probability of receiving the reinforcement for performing the specific behaviour compared with the probability of receiving the reinforcing stimulus in the absence of the behavioural response. In operant conditioning, the frequency of emitting the behavioural responses can be increased or decreased as a result of applying positive or negative reinforcing stimuli, respectively (Domjan and Burkhard, 1993). For optimal conditioning, it is important that the reinforcing stimulus not be presented when the subject is not performing the particular behaviour. Operant conditioning, therefore, results in a specific association between a behavioural response and an external stimulus.
Different schedules of reinforcement are used in studies of operant conditioning and memory of the association. A schedule of continuous reinforcement (CR) involves 100 % contingency between the behaviour and the reinforcement; that is, reinforcement is presented every time the animal performs the behaviour. Partial reinforcement (PR), however, refers to any schedule in which there is less than 100 % contingency so that there are instances when the animal's behaviour is not reinforced. In some operant learning experiments, PR may interfere with the initial acquisition of the operant response, especially when a negative stimulus is used as the reinforcing stimulus; in others, PR has been found to lead eventually to superior performance (e.g. Weinstock, 1958).
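The contingency described above can be made concrete with a small calculation. The sketch below is illustrative only: it assumes that "compared with" is read as a simple difference between the two conditional probabilities, which is one common formalisation and not necessarily the one intended by the cited authors. The function name `contingency` and all counts are hypothetical and are not data from this study.

```python
def contingency(reinforced_responses, total_responses,
                reinforced_non_responses, total_non_responses):
    """Contingency as P(reinforcer | response) - P(reinforcer | no response).

    Assumption: the comparison of the two probabilities is taken as a difference.
    """
    p_with_response = reinforced_responses / total_responses
    # If there were no non-response intervals, treat the second probability as 0.
    if total_non_responses:
        p_without_response = reinforced_non_responses / total_non_responses
    else:
        p_without_response = 0.0
    return p_with_response - p_without_response


# Hypothetical counts: 20 openings, 50 intervals without an opening,
# and no stimulus ever delivered in the absence of an opening.
cr_contingency = contingency(20, 20, 0, 50)  # every opening reinforced
pr_contingency = contingency(10, 20, 0, 50)  # every second opening reinforced
print(cr_contingency, pr_contingency)
```

Under these assumptions, a CR schedule gives a contingency of 1 (100 %) and a 50 % PR schedule gives 0.5, provided the stimulus is never delivered when the behaviour is absent.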
If, following conditioning, animals receive extinction trials (in which there is no reinforcement), the acquired association is lost, resulting in a behavioural phenotype that resembles the naïve state (Pavlov, 1927). The most important variable determining the magnitude of the behavioural effects of an extinction procedure is the schedule of reinforcement used in the acquisition phase of learning (Domjan and Burkhard, 1993). A PR-induced behaviour is more resistant to extinction than a CR-induced behaviour. This phenomenon has been termed the partial reinforcement extinction effect (PREE) and was first described in the work of both Skinner and Humphreys in the late 1930s (Skinner, 1938; Humphreys, 1939). How can partial, but not continuous, reinforcement offer resistance to extinction? One suggestion proposed by Amsel (1972) is that a `disruptive process', based on non-reward, emerges in partial reinforcement acquisition. This disruptive process does not occur in ordinary CR training, so there is no `counterconditioning'; extinction is therefore rapid after CR training. This has sometimes been referred to as the `frustration theory'. Another possibility is that, if the subject does not receive reinforcement after each response during training, it may not immediately `notice' when reinforcement ceases, as in extinction training. The change in reinforcement conditions is more dramatic and noticeable if reinforcement ceases after continuous reinforcement. This particular explanation of the PREE is called the discrimination hypothesis (Domjan and Burkhard, 1993) and is somewhat similar to the Pearce-Hall model in that a PR schedule maintains attention because trial outcomes are always `surprising' (Bouton and Sunsay, 2001).
Operant conditioning protocols have been used in both vertebrates (Chen and Wolpaw, 1995; Feng-Chen and Wolpaw, 1996) and invertebrates (Horridge, 1962; Hoyle, 1980; Forman, 1984; Hawkins et al., 1985; Susswein et al., 1986; Cook and Carew, 1989a,b,c; Nargeot et al., 1997). We have studied operant conditioning of aerial respiratory behaviour in the freshwater mollusc Lymnaea stagnalis (Lukowiak et al., 1996). The advantage in using Lymnaea stagnalis as a model system for the study of the neuronal basis of associative learning and its memory is that it has a relatively simple behavioural repertoire and a relatively simple nervous system that is easily accessible to neurophysiological analysis (Spencer et al., 1999; Benjamin et al., 2001; Inoue et al., 2001). Lymnaea stagnalis is a pulmonate mollusc that makes periodic visits to the water surface to replenish its air supply. It is a bimodal breather that possesses a three-interneuron network whose necessity and sufficiency have been demonstrated to mediate aerial respiratory behaviour (Syed et al., 1990, 1992). Recently, we have demonstrated neural correlates of associative learning and its memory in one of the central pattern generator (CPG) neurons, RPeD1 (Spencer et al., 1999).
Aerial respiration in Lymnaea stagnalis occurs at the water interface and is achieved by opening and closing movements of its respiratory orifice, the pneumostome. Respiratory behaviour can be operantly conditioned by applying a mechanical stimulus to the open pneumostome whenever the animal attempts to breathe. This aversive reinforcement to the open pneumostome results in its immediate closure and a significant reduction in the overall aerial respiratory activity (Lukowiak et al., 1996).
Although operant conditioning has been studied before in invertebrates, little or no effort has been made to explore the effects of different contingency patterns on the ability to learn in these model systems. Three questions are addressed in the present paper. (i) Is PR sufficient to induce learning and a subsequent memory similar to that produced by a CR procedure? (ii) Once the acquisition of learning and its consolidation into memory using a CR procedure has occurred, can a PR procedure extend memory persistence? (iii) Finally, in snails subjected to a CR/PR training procedure, will the behaviour be more resistant to extinction (i.e. PREE), as has been demonstrated in various vertebrate preparations for over 60 years? The present findings serve as a basis for future experiments in which the neuronal mechanisms that occur under partial reinforcement in comparison with continuous reinforcement may be elucidated.
Materials and methods
Lymnaea stagnalis
Lymnaea stagnalis (L.), originally obtained from Vrije Universiteit in Amsterdam, were laboratory-bred in our snail facility at the University of Calgary. Snails were 20-25 mm in length. All animals used in these studies had continuous access to food (lettuce) in their home aquaria.
Training and testing procedures
Apparatus and the general operant training procedure
A 1 l beaker filled with 500 ml of eumoxic water was made hypoxic by bubbling N2 through it for 20 min. The individually marked snails were placed into the hypoxic water for a 10 min period of acclimation. During this period, they were free to open and close their pneumostome. At the end of this period, all snails were gently pushed under the water, and the training then began.
The reinforcing stimulus used in these experiments was a tactile stimulus to the pneumostome area applied as the pneumostome began to open. The tactile stimulus resulted in closure of the pneumostome; the snails usually remained at the water's surface. The tactile stimulus used throughout these experiments did not elicit the whole-body withdrawal/escape response. The time of each stimulus was recorded.
Experiment 1
Continuous reinforcement (CR). Snails were subjected to three 30 min training sessions with each training session separated by 1 h. A 30 min memory test session was administered the following day to test for long-term memory. Every pneumostome opening was immediately followed by a tactile stimulus to the pneumostome area, resulting in immediate closure of the pneumostome.
Partial reinforcement (PR). Snails were subjected to three 30 min training sessions separated by 1 h. Snails receiving PR were given reinforcement on every second opening (50 % of openings were reinforced) and fell into two categories: those receiving reinforcement every odd-numbered opening (i.e. reinforced on the first, third, fifth pneumostome opening, etc.) and those receiving reinforcement every even-numbered opening (i.e. reinforced on the second, fourth, sixth pneumostome opening, etc.). The time of each pneumostome opening was recorded.
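The CR, `odd' PR and `even' PR rules described above amount to a simple decision on the running count of pneumostome openings. The following sketch illustrates that rule only; the function name `is_reinforced` and the schedule labels are hypothetical and were not part of the original procedure, in which stimuli were delivered by hand.

```python
def is_reinforced(opening_number, schedule):
    """Return True if the given opening (counted from 1) receives the tactile stimulus.

    schedule: "CR" (every opening), "PR_odd" (1st, 3rd, 5th, ...)
              or "PR_even" (2nd, 4th, 6th, ...). Labels are illustrative.
    """
    if opening_number < 1:
        raise ValueError("openings are counted from 1")
    if schedule == "CR":
        return True
    if schedule == "PR_odd":
        return opening_number % 2 == 1
    if schedule == "PR_even":
        return opening_number % 2 == 0
    raise ValueError(f"unknown schedule: {schedule}")


# Reinforcement pattern over the first six openings of a session.
odd_pattern = [is_reinforced(n, "PR_odd") for n in range(1, 7)]
even_pattern = [is_reinforced(n, "PR_even") for n in range(1, 7)]
cr_pattern = [is_reinforced(n, "CR") for n in range(1, 7)]
```

Over any even number of openings, both PR variants reinforce exactly half of them; they differ only in whether the first opening of a session is reinforced.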
Experiment 2
CR only. Snails received two 45 min training sessions separated by 1 h. A 45 min memory test session was administered either 2 or 3 days later. Again, every pneumostome opening was immediately followed by a tactile stimulus to the pneumostome area, resulting in immediate closure of the pneumostome. The time of each stimulus was recorded.
CR followed by PR. Snails received two 45 min training sessions separated by 1 h on day 1 in which every pneumostome opening resulted in reinforcement (i.e. the CR schedule). On day 2, they received two further 45 min training sessions separated by 1 h. However, these snails now received reinforcement on every odd-numbered pneumostome opening (i.e. the PR schedule). Three days later (day 5), a 45 min memory test was given. During the memory test, all openings were reinforced (i.e. the CR schedule).
PR followed by CR. Snails first received two 45 min PR training sessions separated by 1 h on day 1. On day 2, they received a further two 45 min CR training sessions separated by 1 h. Three days later (day 5), a 45 min memory test was given. During the memory test, all openings were reinforced (i.e. the CR schedule).
Experiment 3
CR followed by PR and extinction training. Snails initially received two 45 min CR training sessions separated by 1 h on day 1. On day 2, they received a further two 45 min PR training sessions separated by 1 h. On day 3, they received two 45 min extinction training sessions separated by 1 h. In the extinction training sessions, no reinforcement stimuli were administered. That is, animals were allowed to open their pneumostome without receiving any reinforcement. The following day (day 4), a 45 min memory test was given. During the memory test, all openings were reinforced (i.e. the CR schedule).
PR followed by CR and extinction training. Snails first received two 45 min PR training sessions separated by 1 h on day 1. On day 2, they received two 45 min CR training sessions separated by 1 h. On day 3, they received two 45 min extinction training sessions separated by 1 h. The following day (day 4), a 45 min memory test was given. During the memory test, all openings were reinforced (i.e. the CR schedule).
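For readers who wish to compare the two Experiment 3 timelines side by side, they can be written down as a small data structure. This is only a sketch of the protocols as described in the text; the class name `Day`, the labels "EXT" and "TEST" and the helper function are hypothetical, and the 1 h interval between sessions on the same day is not encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Day:
    day: int        # day of the experiment
    schedule: str   # "CR", "PR", "EXT" (extinction, no stimuli) or "TEST"
    sessions: int   # number of sessions on that day
    minutes: int    # duration of each session


# Experiment 3 timelines; the day 4 memory test used the CR schedule.
CR_PR_EXTINCTION = [
    Day(1, "CR", 2, 45), Day(2, "PR", 2, 45),
    Day(3, "EXT", 2, 45), Day(4, "TEST", 1, 45),
]
PR_CR_EXTINCTION = [
    Day(1, "PR", 2, 45), Day(2, "CR", 2, 45),
    Day(3, "EXT", 2, 45), Day(4, "TEST", 1, 45),
]


def minutes_before_test(protocol):
    """Total session time (training plus extinction) preceding the memory test."""
    return sum(d.sessions * d.minutes for d in protocol if d.schedule != "TEST")


total_cr_pr = minutes_before_test(CR_PR_EXTINCTION)
total_pr_cr = minutes_before_test(PR_CR_EXTINCTION)
```

Written this way, it is apparent that the two groups received the same total session time and differed only in the order of the CR and PR days.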
Operational definitions of learning, memory and extinction
We have operationally defined learning and memory as we have previously done (Lukowiak et al., 1996, 2000; Spencer et al., 1999). Briefly, associative learning is defined as a significant effect of training on the number of attempted pneumostome openings (one-way analysis of variance, ANOVA, P<0.05; followed by a post-hoc Fisher's LSD protected t-test, P<0.05 within each separate group). The number of attempted pneumostome openings in the final training session had to be significantly less than the number of attempted pneumostome openings in the first session.
Memory was defined as being present if: (i) the number of attempted pneumostome openings in the memory test session was not significantly different from the number of attempted openings in the last training session and (ii) the number of attempted openings in the memory test session was significantly less than the number of attempted openings in the first training session.
Extinction was defined as being present if the number of attempted pneumostome openings in the memory test session was significantly greater than the number of attempted openings in the last training session.
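The three operational definitions above can be summarised as a set of logical rules applied to the outcomes of the pairwise comparisons. The sketch below is illustrative only: it assumes that the P values have already been obtained (for example from the ANOVA and post-hoc tests described above) and does not compute them. The function name `classify` and its parameter names are hypothetical.

```python
ALPHA = 0.05  # significance threshold stated in the operational definitions


def classify(first, last, test, p_last_vs_first, p_test_vs_last, p_test_vs_first):
    """Apply the operational definitions to mean numbers of attempted openings.

    first, last, test: mean openings in the first training session, the last
    training session and the memory test session. The p_* arguments are the
    P values of the corresponding pairwise comparisons (assumed precomputed).
    """
    # Learning: last training session significantly lower than the first.
    learning = p_last_vs_first < ALPHA and last < first
    # Memory: test not different from last session AND significantly below first.
    memory = p_test_vs_last >= ALPHA and (p_test_vs_first < ALPHA and test < first)
    # Extinction: test significantly greater than last training session.
    extinction = p_test_vs_last < ALPHA and test > last
    return {"learning": learning, "memory": memory, "extinction": extinction}


# Hypothetical values, not data from this study.
retained = classify(10.0, 4.0, 4.5, 0.01, 0.60, 0.01)
extinguished = classify(10.0, 4.0, 9.0, 0.01, 0.01, 0.70)
```

Note that memory requires both conditions to hold, so a test session that lies between the first and last training sessions without differing significantly from either would be classified as neither memory nor extinction.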
Results
The first question we asked was whether it was necessary for the snail to receive the reinforcing stimulus every time it began to open its pneumostome to produce associative learning. Using a 50 % partial reinforcement (PR) schedule in which we delivered the tactile stimulus to the first and then every other opening of the pneumostome (the `odd' reinforcement schedule), snails (N=20) received three 30 min PR training sessions with each training session separated by a 1 h interval. A second cohort of snails (N=20) received a slightly different PR schedule of training. These snails (the `even' group) received the tactile stimulus on the second and then every other attempted pneumostome opening. A control group of snails (N=15) received the reinforcing stimulus on every opening for three 30 min sessions with a 1 h rest interval between each session (the continuous reinforcement group, CR). The control CR group (Fig. 1A) demonstrated learning and long-term memory. However, neither the `odd' nor the `even' PR group demonstrated learning, and we did not, therefore, test for long-term memory (Fig. 1B,C). In both these groups, there was no significant difference in the number of attempted pneumostome openings across the three training sessions. Thus, a PR schedule does not result in learning in operant conditioning of aerial respiration in Lymnaea stagnalis.
Next, we wished to examine whether the PR schedule before or after the CR schedule differentially influenced learning and/or the duration of memory retention. To perform these experiments, we used a slightly different CR training procedure to produce learning and long-term memory. We used two naive groups of snails (N=20 and N=19) to show that two 45 min CR training sessions separated by a 1 h interval result in learning and long-term memory that persists for 2 days but not for 3 days (Fig. 2). We then turned our attention to the effect that a combined PR/CR schedule had on the ability of snails to learn and form memory using the two 45 min training session procedure. Thus, one group of snails (the CR/PR group, Fig. 3A) was given two 45 min CR training sessions with an interval of 1 h between each training session on day 1 and two 45 min PR training sessions on day 2. A second group (the PR/CR group, Fig. 3B) of snails received the PR training sessions on day 1 and the CR training sessions on day 2. In the CR/PR group, the snails exhibited learning and long-term memory when tested 3 days after the final PR training session (i.e. on day 5). That is, the number of attempted pneumostome openings in the memory test session was not significantly different from the number in session 4 (the last training session) but was significantly different from that in session 1 (P<0.01) (Fig. 3A). In the PR/CR group, there was learning, but long-term memory was not demonstrated 3 days later. That is, the number of attempted pneumostome openings in the second CR training session (session 4) was significantly smaller than the number of attempted openings in the first CR training session (session 3). However, the number of attempted openings in the memory test session was significantly greater than the number in the last CR training session (session 4) but was not different from the number in the first CR training session (session 3) (Fig. 3B).
Two main conclusions can be drawn from these data. The first is that the order in which snails receive CR and PR training alters their ability to form long-term memory. The second conclusion is that partial reinforcement occurring after the acquisition of learning prolongs the persistence of memory.
Previously, we have demonstrated that this associatively learned behaviour can be extinguished (McComb et al., 2001). We now wished to explore whether the different reinforcement schedules (CR versus PR) used above affected the process of extinction. We therefore subjected two different groups of snails to CR/PR or PR/CR reinforcement schedules prior to administering extinction training (see Materials and methods). As in the experiments shown in Figs 2 and 3, each group of animals received two 45 min training sessions separated by a 1 h interval of either CR or PR on day 1. On day 2, they again received two further 45 min training sessions separated by a 1 h interval (if they received CR on day 1, they received PR on day 2 and vice versa). On day 3, both groups (CR/PR and PR/CR) received two 45 min extinction training sessions, with each training session separated by a 1 h rest interval. Twenty-four hours later, we tested for extinction in both groups. If extinction had occurred, we would expect there to be no memory. That is, the number of attempted pneumostome openings observed in the extinction test session should be significantly greater than the number in the last operant conditioning training session. The CR/PR group showed no evidence of extinction (Fig. 4). That is, memory was still observed [i.e. the extinction test session was not significantly different from session 4 (the last training session) but was significantly different from the first CR training session (session 1)]. In contrast, memory was not found in the PR/CR group, showing that extinction had occurred. That is, following the extinction training sessions, the number of attempted pneumostome openings in the extinction test session was significantly different from that in the last operant training session (session 4).
Previously, we have demonstrated that yoked control snails do not exhibit learning or long-term memory. Although we performed yoked control experiments on all CR procedures used here, the results have not been presented because they are similar to those published previously (see Lukowiak et al., 2000; Haney and Lukowiak, 2001).
Discussion
In this study, we examined the effect of partial reinforcement (PR) on the acquisition of learning, its consolidation into memory and the resistance of memory to extinction in Lymnaea stagnalis. Although these phenomena have been examined previously in various mammalian species (Mackintosh, 1974), this is the first time to our knowledge that they have been successfully investigated in a molluscan model system. Here, we have shown that a 50 % PR schedule is not sufficient for acquisition of associative learning. Typically, Lymnaea stagnalis can be quickly operantly conditioned with as little as one brief 15 min session of continuous reinforcement (CR) (Lukowiak et al., 2000; Smyth and Lukowiak, 2001). However, neither the `odd' nor the `even' PR schedule was sufficient for the acquisition of associative learning (Fig. 1B,C). In contrast, the three 30 min sessions of CR were sufficient for the acquisition of associative learning and long-term memory. Gonzalez et al. (2000) recently reported similar results from a study involving rats trained on the Morris place task. They showed that animals trained to escape from water failed to learn the location of a submerged platform when it was presented on only 50 % of trials. However, animals exhibited improvement on the acquisition task when the platform was present on 75 % and 100 % of trials. We have not yet attempted to determine whether a 75 % PR schedule would be sufficient for the acquisition of associative learning in Lymnaea stagnalis. Taken together, these results suggest that, while rapid learning occurs over a CR schedule, the use of PR prior to CR has detrimental consequences on learning and subsequent memory formation.
A number of hypotheses have been developed to explain why learning does not occur with a PR schedule. However, none of them adequately explains why learning was not observed. For instance, Williams' invariance hypothesis (Williams, 1989) suggests that the reduced rate of acquisition with PR training may be due to a decrease in the number of reinforcement stimuli delivered. That is, using a PR schedule, the snails receive fewer tactile stimuli in each training session than occurs with the CR schedule. Fewer reinforcement trials would, in this scenario, lead to poorer or no learning. As appealing and intuitive as this hypothesis is, our data are not entirely consistent with it. Both associative learning and long-term memory occur with 15 min training sessions (Lukowiak et al., 2000; Smyth and Lukowiak, 2001). In those two studies, snails received approximately the same number of tactile stimuli as the snails did in the present study with the PR schedule. It may be that with a longer PR training session, such as a 1 h session, learning could be observed. A single 1 h CR session is sufficient to produce learning and long-term memory that persists for at least 1 day (Lukowiak et al., 2000). Such experiments are planned.
In addition to the snails' inability to form a learned association with the 50 % PR schedule, we showed that this PR training procedure has a number of significant effects on subsequent memory formation. The first of these effects was the detrimental (i.e. decreased length of memory persistence) effect on the ability to form memory following subsequent CR training sessions (Fig. 3B). That is, even though there was no significant change in the number of tactile stimuli delivered over the two PR training sessions, these `PR-challenged' snails could not form memories as long-lasting as could naïve snails with subsequent CR training (Fig. 3). This result could be interpreted as a form of `blocking', which has previously been seen in both vertebrate and molluscan preparations (Sahley et al., 1981). It is not understood why this blocking effect occurs at a mechanistic level. However, the effect is not due to the fact that there were two PR sessions together with two CR sessions. When the two CR sessions occurred first, long-term memory was still observed (in fact, it existed for longer; see below) following the PR sessions (compare Fig. 2B with Fig. 3A). Thus, the order of CR versus PR sessions is of obvious importance. One advantage of our model system over most other systems is that we may be able to determine at the level of a single neuron, RPeD1, known to be necessary for aerial respiratory behaviour, what the cellular changes are that accompany PR/CR or CR/PR training.
The second effect of PR training on memory formation was an increase in the persistence of long-term memory when PR training occurred after learning had occurred with CR training. Two 45 min CR training sessions result in a memory that persists for 2 but not for 3 days (Fig. 2). However, we found that if, following the two CR training sessions, snails received two 45 min PR sessions, long-term memory persisted for at least 3 days (Fig. 3A). That is, long-term memory was extended by at least 1 day. Again, this demonstrates that the presentation order of PR and CR has significantly different effects. The order of CR/PR presentation can thus either increase or decrease memory persistence. These data parallel the differences in memory persistence that occur when `massed' versus `spaced' training are compared. While the same level of performance is achieved (i.e. learning) with massed versus spaced training, spaced training normally produces a much longer-lasting memory (Lukowiak et al., 1998). Continuous reinforcement at the beginning of the acquisition phase of learning appears to be necessary but, once some threshold has been reached, a PR schedule can be implemented to maintain the acquired response (Hothersall, 1966).
A third effect of PR on memory retention was the finding that, following the CR/PR training sequence, memory was resistant to extinction (Fig. 4). Previously, we have demonstrated that the associatively learned decrease in aerial respiration can be extinguished (McComb et al., 2001). Using similar extinction protocols, we found that, following the CR/PR training sequence, extinction was not observed. That is, memory was still present. This has been termed the partial reinforcement extinction effect (PREE) (Skinner, 1938; Humphreys, 1939). Experimenters have noted this phenomenon in learning studies involving vertebrates ranging from toads (Muzio et al., 1994) to humans (Svartdal, 2000; Leonard, 1975). To our knowledge, this is the first time that PREE has been demonstrated for operant conditioning in a mollusc.
We still do not understand why the PR schedule of reinforcement following learning acquisition results in a more persistent memory. Amsel's `frustration theory' (Mackintosh, 1974) proposes that non-reinforced conditioning, as occurs with PR, leads to an internal state called frustration. Frustration ultimately leads to increased attention, thus allowing for stronger associations between the behaviour and the reinforcing stimuli. This would explain why a PR schedule after acquisition with a CR schedule might lead to a longer-lasting memory or one more resistant to extinction. However, in the context of our experiments, it is uncertain how the absence of a `poke to the pneumostome' would lead to `frustration' in the snails. A second hypothesis is termed the Pearce-Hall model (Bouton and Sunsay, 2001). In this scenario, the intermixture of reinforced and unreinforced trials increases the `attention' of the subject. Attention is increased because the trial outcomes are always `surprising'. As stated above, increased attention should increase association strength and thus allow memory to be more persistent. Whether any of these hypotheses is adequate to explain the underlying neuronal mechanisms of learning and retention of memory will be studied in our model system.
Although the PREE has been well documented at the behavioural level, few studies have been performed to determine its underlying neuronal mechanisms. Evidence from lesion studies in vertebrates points to the hippocampus as the site responsible. Thus, the PREE is prevented in mammals if lesions are made before training to the hippocampus (Gonzalez et al., 2000; Jarrard et al., 1986) or to the dorsal ascending noradrenergic bundle, which projects to the hippocampus (Owen et al., 1982). Moreover, lesions to the medial pallium, the amphibian homologue of the hippocampus, also prevent the PREE (Muzio et al., 1994). The discovery of the PREE in Lymnaea stagnalis offers the opportunity to explore this effect at the level of single neurons known to be both necessary and sufficient for aerial respiratory behaviour (Syed et al., 1990, 1992; Spencer et al., 1999).
Acknowledgements
This study was supported by the CIHR.