
Setpoint Model

(4) y_n = x_n + s_n + d_rot
(5) u_n = K_s s_n - K y_n

1. The state (x) is updated by the decaying memory (A) of the previous state plus the sensitivity (B) to the feedback controller (u).
2. Target error (y) on movement (n) is the implicit state (x) plus the contribution of the visuomotor rotation (d_rot).
3. For simplicity, the controller (u) was modeled as output feedback: the error (y) multiplied by a negative gain (-K).
4. Error (y) is the sum of the strategy (s), the implicit state (x), and the rotation (d_rot).
5. The strategy (s), multiplied by a strategy gain (K_s), was incorporated into u, with s ≡ -d_rot for 41 < n < 121.
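The disturbance model (Eqs. 1-3) and the setpoint model (Eqs. 1, 4, 5) described above can be sketched as a short simulation. This is a minimal sketch: the parameter values (A, B, K, K_s) and the 30° rotation are illustrative placeholders, not the fitted values from the poster.

```python
# Illustrative parameter values only (the poster fits these to data).
A, B, K, Ks = 0.99, 0.2, 0.3, 0.2
d_rot = 30.0                                     # visuomotor rotation magnitude

def simulate(use_strategy):
    """Disturbance model (use_strategy=False) vs. setpoint model (True)."""
    x, errors = 0.0, []
    for n in range(1, 161):
        rot = d_rot if 41 <= n <= 120 else 0.0   # rotation on for 80 movements
        # Instructed strategy, s = -d_rot for 41 < n < 121 (item 5 above)
        s = -d_rot if use_strategy and 41 < n < 121 else 0.0
        y = x + s + rot                          # Eq. 4 (Eq. 2 when s = 0)
        u = Ks * s - K * y                       # Eq. 5 (Eq. 3 when s = 0)
        x = A * x + B * u                        # Eq. 1: implicit state update
        errors.append(y)
    return errors
```

With these illustrative values the setpoint model shows the qualitative prediction of interest: the error is near zero when the strategy is first applied, then drifts away as the leaked strategy setpoint (K_s s) keeps driving implicit adaptation, and a negative aftereffect remains after washout.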

[Figure 2: "Disturbance Model" panel; "Aftereffect" insets; y-axis: Target Error]

Figure 2: Model predictions. Disturbance model (black). Setpoint model (cyan). Setpoint model with variable leakage gain (K_s; magenta). Rotation is present for 80 movements (white region) and absent before and after this phase (shaded regions).

Figure 1. Replication experiment. The rotation was presented for 80 movements (white region). Participants experienced the rotation for the first two movements before being instructed to use the strategy (magenta circle). Setpoint model fit (black).

Disturbance Model

(1) x_{n+1} = A x_n + B u_n
(2) y_n = x_n + d_rot
(3) u_n = -K y_n
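As a quick check on Eqs. 1-3, the steady-state target error under a constant rotation follows by setting x_{n+1} = x_n = x*:

```latex
x^* = A x^* + B u^*, \quad u^* = -K y^*, \quad y^* = x^* + d_{\mathrm{rot}}
\;\Rightarrow\;
x^* = \frac{-BK\,d_{\mathrm{rot}}}{1 - A + BK},
\qquad
y^* = \frac{(1-A)\,d_{\mathrm{rot}}}{1 - A + BK}.
```

So the disturbance model predicts incomplete adaptation (a residual error proportional to 1 - A) and, once the rotation is removed, an aftereffect of size x* in the opposite direction.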

[Figure 3: panels No-Reward, Reward, No-Aiming Targets; x-axis: Movement Number (1-80)]

Figure 3. Experiment 2. Participants first practiced moving to the cued target without a rotation (black) and while using the strategy without a rotation (orange). Rotation block (white region). The rotation was experienced for the first 2 movements without using the strategy (Xs). A) No-Reward (rotation, blue; washout, cyan). B) Reward (rotation, red; washout, magenta). C) No-Aiming Targets (rotation, green; washout, lt. green). D) Aftereffects (binned).

Strategy Reward Model

(6) s_{n+1} = s_n + δ(r_n - w_r E{r_{1:n-1}})
  a. r_n = exp(-y_n^2 / (2σ^2))

Aiming Target Model

(7) u_n = K_s s_n - K[w_v v_n + (1 - w_v) p_n]
  a. v_n ≡ y_n
  b. p_n ≡ y_n - d_rot

6. The strategy (s) is updated from the previous strategy by a reward prediction error: the difference between the previous reward and the weighted (w_r) expectation (E) of reward over previous rewards, scaled by the reward sensitivity (δ). The reward (r) is a Gaussian function of the error (y), with the reward window given by σ.
7. The control signal (u) is a weighted (w_v) combination of vision (v) and proprioception (p) relative to the strategy setpoint (s). The visual feedback (v) is the same as the error (y); the proprioceptive feedback (p) equals (y) with the rotation removed.

Equations 1, 4, 5, and 6 were used to model the differences between the No-Reward and Reward conditions. To model the differences between the Reward and No-Aiming Targets conditions, the parameters from Eq. 6 were held fixed and Eq. 7 replaced Eq. 5.
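The combined model (Eqs. 1, 4, 6, with the Eq. 7 controller, which reduces to Eq. 5 when w_v = 1) can be sketched the same way. All parameter values below are hypothetical placeholders chosen for illustration, not the fitted values.

```python
import math

# Illustrative parameter values only (the poster fits these per condition).
A, B, K, Ks = 0.99, 0.2, 0.3, 0.2
delta, w_r, sigma = 2.0, 1.0, 5.0   # reward sensitivity, expectation weight, reward window
d_rot = 30.0

def simulate(reward_on, w_v=1.0):
    """Eqs. 1, 4, 6 with the Eq. 7 controller (Eq. 5 when w_v = 1)."""
    x, s, rewards, strategies = 0.0, 0.0, [], []
    for n in range(1, 81):                           # 80 rotation movements
        if n == 3:
            s = -d_rot                               # strategy instructed after 2 movements
        y = x + s + d_rot                            # Eq. 4: target error
        v, p = y, y - d_rot                          # Eq. 7a-b: visual vs. proprioceptive error
        u = Ks * s - K * (w_v * v + (1 - w_v) * p)   # Eq. 7
        x = A * x + B * u                            # Eq. 1: implicit state update
        if reward_on and n >= 3:                     # Eq. 6: reward-based strategy update
            r = math.exp(-y ** 2 / (2 * sigma ** 2))            # Eq. 6a: Gaussian reward
            r_bar = sum(rewards) / len(rewards) if rewards else 0.0
            s += delta * (r - w_r * r_bar)
            rewards.append(r)
        strategies.append(s)
    return strategies
```

Without the reward update the strategy stays clamped at -d_rot (the No-Reward case under Eq. 5); turning the update on lets the strategy move away from the instructed aim as rewards deviate from their running expectation, and lowering w_v down-weights the rotated visual error in favor of proprioception (the No-Aiming Targets case).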

Figure 4. A) Model fits for the No-Reward (blue), Reward (red), and No-Aiming Targets (green) conditions. B) Changes over time in the hidden implicit state (dashed) and strategy (solid). C) Estimated reward window (σ) for the No-Reward group (blue) and Reward group (red). Shading is the 95% confidence interval of the mean. D) Model parameters of interest between the No-Reward (blue) and Reward (red) conditions: strategy leakage (K_s), reward sensitivity (δ), and expectation of reward (w_r). Weighting of visual error relative to the strategy (w_v) was fit separately for Reward (red) and No-Aiming Targets (green).