Using Inverse Optimal Control To Predict Human Reaching Motion in Collaborative Tasks

Jim Mainprice1, Rafi Hayne2, Dmitry Berenson2
{[email protected], [email protected], [email protected]}

1 Max-Planck-Institute for Intelligent Systems, Autonomous Motion Department, Paul-Ehrlich-Str. 15, 72076 Tübingen, Germany
2 Robotics Engineering Program, Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA 01609
A great deal of work in neuroscience [1], [2], [3] and biomechanics [4] has sought to model the principles underlying human motion. However, human motion in environments with obstacles has been difficult to characterize. Human motion in collaborative tasks, where two humans share a workspace, is harder still to model because the social, interference, and comfort criteria involved are unclear.

In this work we present a method to learn the cost function of a motion planner so that its output mimics human motion in collaborative manipulation tasks. Our approach is based on Inverse Optimal Control (IOC), which, given a set of demonstrations, finds a cost function that balances different feature functions. The demonstrations are recorded through motion capture, while the feature functions encode avoidance of interference and collision with the partner as well as smoothness of the trajectories. Prediction of human motion is then performed by iterative replanning using the trajectory optimizer STOMP [5], which is able to handle difficult environmental constraints.

IOC, sometimes called Inverse Reinforcement Learning, is the problem of finding the cost or reward function that an agent optimizes when computing a trajectory or policy. The early apprenticeship learning approach [6] iteratively solves the forward problem, modifying the weights at each iteration. Our approach is based on the more recent PIIRL algorithm [7], which does not require solving the forward problem. Because it requires only local optimality of the demonstrated trajectories, it can handle high-dimensional continuous state spaces.

This work is supported in part by the Office of Naval Research under Grant N00014-13-1-0735 and by the National Science Foundation under Grant IIS-1317462.
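The idea of learning feature weights from a demonstration without solving the forward planning problem can be illustrated with a minimal sketch. It assumes a cost that is linear in the features and a likelihood in which the demonstration competes against trajectories sampled around it. The function names, the toy feature values, the learning rate, and the projection onto non-negative weights are all hypothetical choices made for illustration; this is not the authors' implementation, nor the exact PIIRL algorithm of [7].

```python
import math

def trajectory_cost(weights, features):
    """Linear cost: weighted sum of the feature values of one trajectory."""
    return sum(w * f for w, f in zip(weights, features))

def demo_neg_log_likelihood(weights, demo_features, sample_features):
    """-log P(demo), with P proportional to exp(-cost) over demo and samples."""
    costs = [trajectory_cost(weights, f)
             for f in [demo_features] + sample_features]
    c_min = min(costs)  # shift costs for numerical stability
    log_z = -c_min + math.log(sum(math.exp(-(c - c_min)) for c in costs))
    return costs[0] + log_z

def learn_weights(demo_features, sample_features, steps=200, lr=0.1):
    """Projected gradient descent on the negative log-likelihood."""
    n = len(demo_features)
    weights = [1.0] * n
    all_feats = [demo_features] + sample_features
    for _ in range(steps):
        costs = [trajectory_cost(weights, f) for f in all_feats]
        c_min = min(costs)
        probs = [math.exp(-(c - c_min)) for c in costs]
        z = sum(probs)
        probs = [p / z for p in probs]
        # Gradient = demo features minus expected features under probs.
        expected = [sum(p * f[i] for p, f in zip(probs, all_feats))
                    for i in range(n)]
        grad = [demo_features[i] - expected[i] for i in range(n)]
        # Assumption: weights are kept non-negative by projection.
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, grad)]
    return weights

# Toy data (hypothetical): feature 0 could stand for a smoothness term,
# feature 1 for a proximity-to-partner term.
demo = [1.0, 2.0]
samples = [[3.0, 2.0], [2.5, 2.1], [4.0, 1.9]]
learned = learn_weights(demo, samples)
```

In this toy setting the demonstration has the lowest value of feature 0 among all trajectories, so the learned weight on that feature grows: the optimizer explains the demonstration as one that cares about keeping feature 0 small.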
TABLE I. DTW performed between the demonstration of Figure 1 and the planned trajectories. Results are averaged over 10 runs.

Metric                  Type             µ      σ     min    max
Joint center distances  No replanning    52.89  9.66  39.94  67.09
Joint center distances  With replanning  44.91  6.62  36.15  55.20
Task space              No replanning    49.22  8.25  37.75  63.78
Task space              With replanning  36.20  8.13  24.81  50.77
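For readers unfamiliar with the metric in Table I, the following is a minimal sketch of a Dynamic Time Warping distance between two trajectories. It assumes the textbook dynamic-programming formulation with a Euclidean point-wise distance; the source does not state which DTW variant or point-wise metric was used, and the function name `dtw_distance` is hypothetical.

```python
import math

def dtw_distance(traj_a, traj_b):
    """Accumulated DTW cost between two sequences of equal-dimension points."""
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # acc[i][j] = best accumulated cost aligning the first i points of
    # traj_a with the first j points of traj_b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(traj_a[i - 1], traj_b[j - 1])
            # Extend the cheapest of the three admissible predecessors.
            acc[i][j] = d + min(acc[i - 1][j],      # advance in traj_a only
                                acc[i][j - 1],      # advance in traj_b only
                                acc[i - 1][j - 1])  # advance in both
    return acc[n][m]

# Usage: the second path repeats a waypoint, i.e. it is the same path
# traversed with different timing.
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
slower_path = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
score = dtw_distance(path, slower_path)
```

Because DTW aligns points by shape rather than by time index, two trajectories that follow the same path at different speeds score as close, which makes it a reasonable way to compare a predicted reach with a recorded one.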
Fig. 1. Shared workspace assembly experiment (left) and a demonstration of the benefits of replanning on a difficult example (right). Original motion (red) and predicted motions with (blue) and without (green) replanning.
We applied our framework to data gathered from two humans performing pick-and-place tasks in close proximity (see Figure 1). To demonstrate the efficacy of our approach, we provide test results that compare the learned cost functions against hand-tuned versions, with and without iterative replanning. We computed the Dynamic Time Warping (DTW) distance between the demonstrations and the trajectories obtained by planning with the human kinematic model used in the learning phase (see Table I). We found that we are able to capture a cost function for collaborative reaching motions that outperforms baseline methods in terms of generalizing to unseen reaching examples. Motions obtained with and without replanning are shown in Figure 1.

REFERENCES

[1] T. Flash and N. Hogan, “The coordination of arm movements: an experimentally confirmed mathematical model,” The Journal of Neuroscience, vol. 5, no. 7, pp. 1688–1703, 1985.
[2] E. Burdet, R. Osu, D. W. Franklin, T. E. Milner, and M. Kawato, “The central nervous system stabilizes unstable dynamics by learning optimal impedance,” Nature, vol. 414, no. 6862, pp. 446–449, 2001.
[3] E. Todorov and M. I. Jordan, “Optimal feedback control as a theory of motor coordination,” Nature Neuroscience, vol. 5, no. 11, pp. 1226–1235, 2002.
[4] Wu et al., “ISB recommendation on definitions of joint coordinate systems of various joints for the reporting of human joint motion – part II: shoulder, elbow, wrist and hand,” Journal of Biomechanics, vol. 38, no. 5, pp. 981–992, 2005.
[5] M. Kalakrishnan, S. Chitta, E. Theodorou, P. Pastor, and S. Schaal, “STOMP: Stochastic trajectory optimization for motion planning,” in ICRA, 2011.
[6] P. Abbeel and A. Y. Ng, “Apprenticeship learning via inverse reinforcement learning,” in ICML, 2004.
[7] M. Kalakrishnan, P. Pastor, L. Righetti, and S. Schaal, “Learning objective functions for manipulation,” in ICRA, 2013.