This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON ROBOTICS


Short Papers

On Intensity-Based Nonmetric Visual Servoing

Geraldo Silveira

Abstract—This paper considers the problem of vision-based robot stabilization where the equilibrium state is defined via a reference image. Differently from most solutions, this study directly exploits the pixel intensities with no feature extraction or matching, and uses only nonmetric information about the observed scene. Intensity-based techniques provide higher accuracy, whereas not requiring metric information increases their versatility. In this context, this paper further exploits the epipolar geometry and its intrinsic degeneracies. Such degeneracies always occur when the stabilization is sufficiently close to the equilibrium, regardless of the object shape. This remarkable fact allows the development of new vision-based control strategies with varying degrees of computational complexity and of prior knowledge. Importantly, they are arranged hierarchically from the simplest to the state-of-the-art ones, all in a unified framework. Three new local methods are then presented, and their closed-loop performances are experimentally assessed using both planar and nonplanar objects, under small and large displacements, both in simulation and employing a six-degree-of-freedom robotic arm.

Index Terms—Direct methods, epipolar geometry, image registration, vision-based control, vision-based estimation, visual servo control.

I. INTRODUCTION

Visual servoing refers to the control of a robot from the feedback of images. Its typical application consists of stabilizing the robot at a pose defined via a reference image. The vast majority of solutions to this problem require some metric information to provide a provably stabilizing control law. This holds even for the so-called image-based visual servoing technique [1], which requires depth estimates. Indeed, there exist only a few works on nonmetric visual servoing, in spite of its increased versatility and robustness [2], most likely because of the difficulty of finding an interesting control error that is diffeomorphic to the camera pose and is regulated by a control law that does not depend on any metric knowledge. An early work on nonmetric visual navigation is described in [3], where a ground robot is used. In [4], four degrees of freedom (DOFs) of a holonomic robot are stabilized. The nonmetric methods presented in [5] and [6] take control of all six DOFs but consider only planar objects or pure rotations between the reference and initial frames. A general technique to stabilize all six DOFs is proposed in [7], which is called Direct Visual Servoing (DVS). It is general in the sense that it deals with both planar and nonplanar objects, under both translational and rotational displacements, all in a unified control law.

Manuscript received June 13, 2013; revised December 31, 2013; accepted April 1, 2014. This paper was recommended for publication by Associate Editor R. Eustice and Editor B. J. Nelson upon evaluation of the reviewers’ comments. The author is with the Center for Information Technology Renato Archer (CTI), Division of Robotics and Computer Vision (DRVC), CEP 13069-901 Campinas, Brazil (e-mail: [email protected]). This paper has supplementary downloadable material available at http://ieeexplore.ieee.org. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TRO.2014.2315712

Independent of whether metric information is used or not, techniques of vision-based estimation can generally be classified into feature-based or intensity-based. The former requires the extraction of a sufficiently large set of primitives (e.g., points, lines, etc.) and their association across images. The corresponding features represent the input to the estimation procedure. The nicest property of this class in terms of estimation is its relatively large domain of convergence. On the other hand, it depends on some particular features, on an error-prone feature matching, and on special tuning schemes. Despite these drawbacks, the vast majority of existing visual servoing strategies are based on this class. Differently, there are no steps of feature extraction or matching within intensity-based techniques. These techniques directly exploit the intensity value of the pixels in order to estimate the needed parameters. Thus, they make use of raw and dense image data, which allows them to attain high levels of versatility and accuracy. Another advantage refers to their possibility of ensuring robustness to arbitrary illumination changes, even in color images [9] and in omnidirectional ones [10]. On the other hand, real-time algorithms based on this class rely on local optimization methods, since global ones [11] are usually too time-consuming to be considered in such a setting. Intensity-based techniques have been applied to visually servo six-DOF holonomic robots either requiring metric information (see, e.g., [12]–[15]), given a reference image or desired pose, or not requiring any metric knowledge (see, e.g., [7], [9], and [16]), given the former. Given numerous images of the workspace, they have been used for offline learning and afterward servoing those robots in [17] and [18]. This paper addresses intensity-based nonmetric general techniques to control six-DOF holonomic robots, given a reference image.
The DVS technique falls into this class, and its control law uses optimal estimates. The main novelty of this study concerns the investigation of different suboptimal (although still general and effective) control techniques resulting from the simplification of that general one. This is possible because of the intrinsic degeneracies of the epipolar geometry. As will be shown, such degeneracies always occur when the vision-based stabilization is sufficiently close to the equilibrium, regardless of the object shape and of the estimation algorithm. This remarkable fact allows the development of a family of new visual servoing strategies with varying degrees of computational complexity and of prior knowledge, all in a unified framework. Three new local methods are presented, and their closed-loop performances are experimentally assessed in different scenarios.

II. THEORETICAL BACKGROUND

This section defines the notation used throughout this paper, as well as recalls essential models and methods. Let ‖v‖ and v̂ denote the Euclidean norm and an estimate of the variable v, respectively. A superscripted asterisk, e.g., v∗, is used to indicate that v is defined relative to the reference frame F∗. The notations [w]× and vex([w]×) represent, respectively, the antisymmetric matrix associated with the vector w = [w_1, w_2, w_3]⊤ and its inverse mapping:

[w]× = [ 0 −w_3 w_2 ; w_3 0 −w_1 ; −w_2 w_1 0 ],   vex([w]×) = [w_1 w_2 w_3]⊤.   (1)

Finally, let us define the operator

Pa(W) = W − W⊤,   (2)

1552-3098 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
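For concreteness, the operators in (1) and (2) can be sketched in a few lines of NumPy; the test vector below is arbitrary:

```python
import numpy as np

def skew(w):
    """[w]x in (1): the antisymmetric matrix associated with w = [w1, w2, w3]^T."""
    w1, w2, w3 = w
    return np.array([[0.0, -w3,  w2],
                     [ w3, 0.0, -w1],
                     [-w2,  w1, 0.0]])

def vex(W):
    """Inverse mapping in (1): recover w from its antisymmetric matrix."""
    return np.array([W[2, 1], W[0, 2], W[1, 0]])

def Pa(W):
    """Operator (2): twice the projection of W onto its antisymmetric part."""
    return W - W.T

# Round trip of (1), and Pa acting on an antisymmetric matrix.
w = np.array([0.3, -1.2, 0.7])
assert np.allclose(vex(skew(w)), w)
assert np.allclose(Pa(skew(w)), 2.0 * skew(w))
```

Note that vex(Pa(·)) extracts, up to the factor of two, the antisymmetric content of a matrix; this combination reappears in the control errors of Section III.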


Fig. 1. Projective geometry. Given the 3-D point m∗, its projection p∗ in the image I∗ is related to its projection p in I by the point Gp∗ and the point e (the epipole) multiplied by the projective parallax. For a 3-D point m∗_π that lies on π, which can be at infinity, the corresponding point of p∗ is the point Gp∗. The latter also holds if e = 0, ∀m∗.

which is equal to two times the projection operator of a square matrix W ∈ R^{3×3} into its antisymmetric part.

A. Projective Epipolar Geometry

In projective geometry, the general relation between corresponding points p ↔ p∗ in two perspective images (see Fig. 1) is given by

p ∝ Gp∗ + ρ∗e ∈ P²,   (3)

where the symbol "∝" denotes proportionality up to a nonzero scale factor, G ∈ SL(3) (the Lie group SL(3) is the special linear group of (3 × 3) matrices having determinant one) is a projective homography relative to a basis plane, e ∈ R³ denotes the epipole (strictly speaking, it is an element of the projective space P²), and ρ∗ ∈ R is the projective parallax, relative to that plane, of the 3-D point whose projection in the reference image I∗ is p∗. This parallax is proportional to the distance of that 3-D point to that plane and is inversely proportional to its depth. The epipole is proportional to the translation between F∗ and F.

The basic epipolar constraint that arises from the existence of two viewpoints can be obtained by multiplying both sides of (3) on the left by p⊤[e]×, i.e., 0 = p⊤Fp∗, where F ∝ [e]×G is the so-called fundamental matrix. Although, in general, this matrix is uniquely determined for a pair of images, there are degenerate cases where at least two distinct (i.e., linearly independent) fundamental matrices verify that constraint. Degenerate cases of particular importance for visual servoing are described next.

Remark 2.1 (Degenerate Cases): Important degenerate cases of the epipolar constraint arise for:
1) pure rotational displacements, regardless of the object shape and of the estimation algorithm. In this case, the translation is zero, and thus, e = 0, which always occurs at the equilibrium;
2) objects at infinity, regardless of their actual shape and of the estimation algorithm. In this case, ρ∗ = 0 for all of their points;
3) planar objects, regardless of the estimation algorithm. In this case, ρ∗ = 0 for all of their points.

Let us note that all these degenerate cases verify

ρ∗e = 0   (4)

irrespective of the object point, i.e., ∀ρ∗_i, i = 1, 2, . . . , n, where n is the number of pixels that describe the object. The general relation for a pair of corresponding points (3) encompasses all degenerate cases stated in Remark 2.1 and can be rewritten as

p ∝ [ I  0 ] Q [ p∗⊤ ρ∗ ]⊤,   with   Q = [ G  e ; 0  1 ] ∈ SA(3),   (5)

where the Lie group SA(3) is homeomorphic to SL(3) × R³. Hence, all global parameters are encoded in the matrix Q, i.e., the camera displacement and the projective basis. The object structure relative to this basis and to the reference frame is described by the vector of all parallaxes ρ∗ = [ρ∗_1, ρ∗_2, . . . , ρ∗_n]⊤, which represents local parameters.

B. Basic Estimation Framework

The basic framework for intensity-based estimation is direct image registration, which consists in searching for the parameters that best transform the current image I such that each pixel intensity I(p) matches as closely as possible the corresponding one in the reference image, I∗(p∗). Hence, an appropriate transformation model I′(·) is needed. For the sake of simplicity, let us consider a purely geometric transformation model (see [19] for a photogeometric one):



I′(x(z), p∗) = I(w(x(z), p∗)) = I(p) ≥ 0,   (6)

where a warping operator w : SA(3) × Rⁿ × P² → P² can be defined from (5), with variables x = {Q, ρ∗} and their respective parameterization¹ z = [υ⊤ γ⊤]⊤ ∈ Rᵐ, i.e., x = x(z) = {Q(υ), ρ∗(γ)}. A typical direct image registration system can then be stated as the following nonlinear optimization problem [19]:

min_{z ∈ Rᵐ}  (1/2) Σ_{i=1}^{n} [ I′(x(z), p∗_i) − I∗(p∗_i) ]²  =  (1/2) ‖d(x(z))‖²,   (7)

which seeks the parameters z describing the variables x that minimize the norm of the vector of intensity differences

d(x(z)) = [ I′(x(z), p∗_1) − I∗(p∗_1) ; I′(x(z), p∗_2) − I∗(p∗_2) ; ⋯ ; I′(x(z), p∗_n) − I∗(p∗_n) ] ∈ Rⁿ.   (8)

The nonlinear optimization problem in (7) can be solved by standard iterative methods, e.g., Gauss–Newton. These methods are based on an approximation of the cost function by its Taylor series. Thus, their convergence properties strongly depend on such approximation and on the initial guess. These methods consist of the following steps (a more in-depth treatment can be found in, e.g., [21]). Given an initial guess x̂_0 on the transformation variables sufficiently close to the solution (x̂_0 might be the identity element), its increment z_k ∈ Rᵐ is computed at iteration k by

z_k = −α L_x⁺ d(x̂_k),   (9)

with an optimal α > 0 (α ≈ 1 around a local minimum) and, for standard methods,

L_x⁺ = Ĥ_x⁻¹ J_x⊤,   (10)

where J_x ∈ R^{n×m} denotes the Jacobian matrix² of (8) with respect to x at ẑ, and Ĥ_x ∈ R^{m×m} is a positive-definite matrix that suitably

¹The natural local parameterization of Q ∈ SA(3) is through the related Lie algebra sa(3) [20], whose coordinates of an element of this tangent space are denoted υ ∈ R^{8+3}, i.e., let Q = Q(υ). In turn, the object structure ρ∗ must also be regularized. This can be performed by choosing a real-valued function and a number of parameters γ, with dim(γ) ≪ n, that adequately describe its surface, i.e., let ρ∗ = ρ∗(γ).
²The analytical determination of the Jacobian matrix depends on various factors, such as the chosen parameterization z of each element of x, and even on the chosen approximation method of the cost function. A compendium of all possible analytical forms for this matrix is thus out of the scope of this paper. In any case, it can also be calculated via numerical differentiation.


approximates³ the Hessian matrix of the cost function. The increment z_k obtained via (9) updates the variable x̂_k,

x̂_{k+1} = x(z_k) ∘ x̂_k,   (11)

and the process is iterated until convergence. The symbol "∘" denotes the composition operator associated with the involved group. For example, if a matrix Lie group is involved, the operation to be performed is the matrix multiplication (see [20] for details). Finally, the convergence can be established, e.g., when x(z_k) is sufficiently close to the identity element of the involved group, i.e., when ‖z_k‖ < ǫ_e for some small ǫ_e > 0, or when a maximum number of iterations is reached.

Remark 2.2: Although every optimization method (including the different choices for the parameterization and approximation and, thus, for the Jacobian, for the Hessian, etc.) impacts differently on the overall servoing performance, the general ideas of this paper hold independently of the chosen one, as will be shown next.

III. PROPOSED TECHNIQUES

This section proposes new nonmetric techniques for servoing directly from pixel intensities. Indeed, the estimation system is built upon the vector of intensity differences d ∈ Rⁿ and is constructed using only nonmetric parameters. These new methods are partially inspired by the DVS technique, together with the intrinsic degeneracies of the epipolar geometry. In particular, it is shown here how to benefit from such degeneracies to develop more computationally efficient direct methods. In the sequel, consider a camera-mounted six-DOF holonomic robot observing a motionless rigid object of unknown shape. Let the control inputs ν ∈ R³ and ω ∈ R³ denote, respectively, the translational and rotational velocities of the camera, whose intrinsic parameters are gathered in K ∈ R^{3×3}. These parameters (or at least their estimates) are always needed to control all six DOFs.

A. Full Optimization

This intensity-based nonmetric visual servoing technique relies on the simultaneous, Full Optimization (FO) of all geometric variables. The control system thus uses optimal estimates at all times.
It is, therefore, the method of choice when no prior knowledge of the system is available. On the other hand, its estimation procedure can be computationally demanding. This method can be briefly formulated as follows. Given an initial guess x̂_0 = {Q̂_0, ρ̂∗_0}, the increments υ_k ∈ R^{8+3} and γ_k ∈ R^{dim(γ)} are computed at iteration k for the current image by

[ υ_k⊤ γ_k⊤ ]⊤ = −α [ L_Q  L_ρ ]⁺ d(x̂_k),   (12)

where L_Q ∈ R^{n×11}, L_ρ ∈ R^{n×dim(γ)}, and n is the number of pixels considered for exploitation. Those increments update the variables via

Q̂_{k+1} = Q(υ_k) ∘ Q̂_k,   (13)

ρ̂∗_{k+1} = ρ∗(γ_k) ∘ ρ̂∗_k,   (14)

and the process is iterated until convergence, e.g., ‖[υ_k⊤ γ_k⊤]⊤‖ < ǫ_e.
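The iterations (9)–(14) can be illustrated on a deliberately small problem. The sketch below registers a 1-D signal under a pure translation, using the Gauss–Newton choice Ĥ = J⊤J and a numerical Jacobian (cf. footnote 2); the signal, the shift, and the thresholds are illustrative assumptions, not the paper's SA(3) warp.

```python
import numpy as np

# Toy direct registration: find the translation t aligning a "current" 1-D
# signal with the reference one, iterating the increment (9) with the
# Gauss-Newton choice H = J^T J, i.e., L+ = pinv(J). In the group (R, +),
# the compositional update (11) reduces to an addition.
xs = np.linspace(0.0, 2.0 * np.pi, 200)
I_star = lambda x: np.sin(x) + 0.5 * np.sin(3.0 * x)   # reference intensities
t_true = 0.3
I_cur = lambda x: I_star(x - t_true)                    # current signal: shifted reference

def d(t):
    """Vector of intensity differences (8) for the candidate parameter t."""
    return I_cur(xs + t) - I_star(xs)

def jac(t, h=1e-6):
    """Numerical Jacobian of (8) with respect to t (cf. footnote 2)."""
    return ((d(t + h) - d(t - h)) / (2.0 * h))[:, None]

t, alpha = 0.0, 1.0
for k in range(50):
    z = -alpha * np.linalg.pinv(jac(t)) @ d(t)   # increment (9)-(10)
    t = t + z[0]                                  # update (11) in (R, +)
    if abs(z[0]) < 1e-10:                         # convergence test on ||z_k||
        break

assert abs(t - t_true) < 1e-6
```

As in the full problem, convergence relies on the initial guess (here t = 0) lying sufficiently close to the solution.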

³Examples of this approximation are 1) for the steepest descent, it is simply Ĥ_x = I; 2) for the Gauss–Newton method, it is Ĥ_x = J_x⊤J_x, where [·]⁺ then corresponds to the pseudoinverse operation on rectangular matrices; and 3) for the Levenberg–Marquardt method, it is Ĥ_x = J_x⊤J_x + σD, where σ > 0 and D ∈ R^{m×m} is a diagonal matrix, e.g., D = I or D = diag(J_x⊤J_x).
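The group-valued updates (11), (13), and (18) can be sketched for the SL(3) block of Q, using the Lie-algebra parameterization of footnote 1: an increment lives in the tangent space, is mapped through the matrix exponential, and is composed by matrix multiplication. The sl(3) basis arrangement and the increment values below are arbitrary illustrative choices.

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential via a truncated Taylor series (adequate for small A)."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i
        out = out + term
    return out

def sl3(v):
    """Map 8 coordinates onto a traceless 3x3 matrix (one basis choice for sl(3))."""
    a, b, c, d, e, f, g, h = v
    return np.array([[a,      b, c],
                     [d, -a - e, f],
                     [g,      h, e]])

G_hat = np.eye(3)
for z_k in ([0.01, -0.02, 0.5, 0.03, 0.01, -0.4, 1e-4, 2e-4],
            [-0.005, 0.01, -0.2, 0.0, 0.002, 0.1, 0.0, -1e-4]):
    G_hat = expm(sl3(np.asarray(z_k))) @ G_hat   # compositional update, cf. (11)

# det(exp(A)) = exp(tr(A)) = 1 for traceless A, so every update stays in SL(3).
assert abs(np.linalg.det(G_hat) - 1.0) < 1e-9
```

The composition thus keeps the estimate on the group by construction, with no explicit renormalization.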

As for the control aspect, let us define the translational and rotational nonmetric control errors ε_ν^F ∈ R³ and ε_ω^F ∈ R³ as

[ ε_ν^F ; ε_ω^F ] = [ [K⁻¹ 0] (Q̂ − I) [p∗⊤ ρ̂∗]⊤ ; vex( Pa( [K⁻¹ 0] Q̂ [K⊤ 0]⊤ ) ) ],   (15)

where Q̂ ∈ SA(3) is the optimal matrix, and ρ̂∗ ∈ R is the optimal parallax of the chosen control point p∗ ∈ P². Both estimates can be used as initial guesses for the next image. Finally, let us define the control law as

[ ν ; ω ] = λ [ ε_ν^F ; ε_ω^F ],   (16)

with λ > 0. Convergence of the servo can be established when, e.g., ‖[ε_ν^F⊤ ε_ω^F⊤]‖ < ǫ_c for some small ǫ_c > 0.

Proposition 3.1: This FO technique corresponds to a reduced version of the DVS [7]. The equilibrium ε^F = [ε_ν^F⊤ ε_ω^F⊤]⊤ = 0 of the closed-loop system is, hence, locally exponentially stable.

Proof: The proof is given in Appendix A. ∎

B. Partial Optimization

Within this visual servoing technique, the estimation strategy performs the optimization of only a subset of the variables. The idea consists in optimizing only the parameters related to the camera displacement, i.e., Q. Those related to the structure, i.e., ρ∗, are provided by the user and are left unoptimized. This scheme is partially inspired by the degenerate cases (see Remark 2.1): e.g., if the servo is around the equilibrium or if the object is nearly planar, then ρ∗_i e → 0, ∀i ∈ {1, 2, . . . , n}. It is, thus, less computationally expensive than the previous method, but relies on these particular situations.

This method can be stated as follows. Given a guess x̂_0 = {Q̂_0, ρ̂∗_0}, the increment υ_k ∈ R^{8+3} is calculated at iteration k for the current image via

υ_k = −α L_Q⁺ d(x̂_k).   (17)

This increment is used to update the estimate Q̂_k via

Q̂_{k+1} = Q(υ_k) ∘ Q̂_k,   (18)

and the process is iterated until convergence, e.g., ‖υ_k‖ < ǫ_e, whereas the vector of parallaxes ρ̂∗_0 is not adjusted.

As for the control aspect, let us define the respective translational and rotational nonmetric control errors ε_ν^Q ∈ R³ and ε_ω^Q ∈ R³ as

[ ε_ν^Q ; ε_ω^Q ] = [ [K⁻¹ 0] (Q̂ − I) [p∗⊤ ρ∗_0]⊤ ; vex( Pa( [K⁻¹ 0] Q̂ [K⊤ 0]⊤ ) ) ],   (19)

where Q̂ ∈ SA(3) is the obtained estimate, which can be used as initial guess for the next image, and ρ∗_0 ∈ R is the parallax of the chosen control point. Finally, the respective control law can be defined as

[ ν ; ω ] = λ [ ε_ν^Q ; ε_ω^Q ].   (20)

Convergence of the servo is reached when, e.g., ‖[ε_ν^Q⊤ ε_ω^Q⊤]‖ < ǫ_c.

Corollary 3.1: The technique presented in [6] corresponds to a particular case of the partial optimization (PO). Indeed, that case considers only planar objects or pure rotations, i.e., ρ∗_i e = 0 ∀i, leading to x = G [see (5)]. In this case, υ_k ∈ R⁸ and L_Q = L_G ∈ R^{n×8} in (17), and the equilibrium ε^Q = [ε_ν^Q⊤ ε_ω^Q⊤]⊤ = 0 of the closed-loop system is proven to be locally asymptotically stable.


C. Poor Man's Method

This section presents the simplest technique in terms of computational complexity. Its low requirements are mainly because of the applied estimation strategy: Only one iteration of (17) is performed, as its computation can be relatively costly, and, as in the previous method, it only involves the parameters related to the camera displacement. Those parameters associated with the object structure are provided by the user and are not adjusted.

This method can be stated as follows. Given a guess on x̂_0 = {Q̂_0, ρ̂∗_0}, the increment υ_0 ∈ R^{8+3} is computed only once for every captured image via

υ_0 = −α L_Q⁺ d(x̂_0).   (21)

The update of Q̂_0 is also performed only once for every image:

Q̂_1 = Q(υ_0) ∘ Q̂_0,   (22)

which can be used as guess for the next image. Again, the variable ρ̂∗_0 is provided by the user and is not updated.

As for the poor man's translational and rotational control errors, respectively, ε_ν¹ ∈ R³ and ε_ω¹ ∈ R³, let us define them as

[ ε_ν¹ ; ε_ω¹ ] = [ [K⁻¹ 0] (Q̂_1 − I) [p∗⊤ ρ∗_0]⊤ ; vex( Pa( [K⁻¹ 0] Q̂_1 [K⊤ 0]⊤ ) ) ].   (23)

Finally, its respective control law can be defined as

[ ν ; ω ] = λ [ ε_ν¹ ; ε_ω¹ ].   (24)

Fig. 2. Setup of the task for a camera translation of about 50% of the depths. (Top) Configurations of the camera frame relative to the object, seen from different viewpoints. (Bottom) Reference and initial images, respectively.

Convergence of the servoing task can be established when, e.g., ‖[ε_ν¹⊤ ε_ω¹⊤]‖ < ǫ_c. The technique presented in [13] has some similarities to the Poor Man's method (PM), e.g., both methods perform a single-step descent with fixed (and guessed) structure parameters while in closed-loop operation. As for the main differences, the proposed method does not perform any step in open loop, allows motion to be integrated in (22), which leads to an estimate closer to the correct one, and does not require any metric information. In spite of these improvements, the PM still presents weak convergence properties, as stated next in Proposition 3.2. As a consequence, its success still highly depends on the initial conditions and on prior knowledge of the system.

Proposition 3.2: The monotone convergence domain of the PM is included in the monotone convergence domain of the FO.

Proof: The proof is given in Appendix B. ∎

IV. SIMULATION RESULTS

This section presents simulation results obtained by the proposed visual servoing techniques in different scenarios. The considered object is a sphere whose centroid is chosen as the control point. A radial basis function with a total of 25 centers is used to model that object. The focal lengths of the camera are of 500 pixels with no skew, and its principal point is the middle of the image. The sampling period for all cases is of 30 ms. The convergence criteria for the servoing and (when applicable) the estimation use ǫ_c = 10⁻⁵ and ǫ_e = 10⁻⁷, respectively. All pixel intensities within a region of interest of size 200 × 200 pixels are exploited. Although the entire image could be used, this number of pixels is imposed since computational constraints exist in realistic situations. The nonmetric parameters needed by the control laws are estimated using a constant α = 1, the Jacobians as described in [19] and references therein, and [·]⁺ as the classical pseudoinverse operation.
In this case, the time complexity per iteration grows approximately linearly with respect to the number of parameters and of pixels (see [19]).
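As a rough sanity check on the image counts reported in the following subsections, the closed loop can be idealized as the first-order system ε̇ = −λε induced by the control law, discretized at the 30-ms sampling period. This model is a simplifying assumption: it deliberately ignores the estimation error, the robot dynamics, and the coupling between DOFs.

```python
# Idealized first-order model of the servo: with [nu; omega] = lambda * eps,
# the error contracts by (1 - lambda*T) per image, for sampling period T.
# The initial error magnitude is normalized to 1; the threshold follows eps_c.
T, eps_c = 0.03, 1e-5

def images_to_converge(lam, eps0=1.0):
    eps, n = eps0, 0
    while abs(eps) >= eps_c:
        eps *= (1.0 - lam * T)   # one sampling period of eps_dot = -lam * eps
        n += 1
    return n

assert 300 < images_to_converge(1.0) < 500      # a few hundred images for lambda = 1
assert 5000 < images_to_converge(0.06) < 8000   # thousands of images for lambda = 0.06
```

These orders of magnitude are consistent with the convergence figures observed below for the well-behaved methods.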

Two scenarios have been set up to assess the performance of the three new visual servoing strategies. The scenarios consist of a large and a small camera displacement between the initial and the reference poses. This quantity has been varied to show the impact of such an important aspect and to allow for the consideration of actual nonplanar objects (i.e., ∃i : ρ∗_i e ≠ 0) in visual servoing systems. Indeed, as that amount of displacement approaches zero, the entire scene (regardless of its shape) behaves as the plane at infinity (see Remark 2.1). In both scenarios, the control gain is initially set to λ = 1, and the initial guesses for the very first image are the correct values corrupted with independent Gaussian noise on the parallaxes. Let us remark that they can be estimated, e.g., by a global optimization method [11] with no feature extraction or matching. Although those initial conditions can generally be given to the FO much further away from the correct values than to the other techniques, for comparison purposes, the same guesses are provided to all of them. For subsequent images, the estimates obtained by each method for the current image are used as initial guesses for the next image. The results for all strategies are described next.

A. Large Displacement

This first scenario consists of a camera displacement between the reference and the initial frames that is relatively large with respect to the scene depths, whose median is 0.72 m away from the reference camera pose. Indeed, a translation of [0.1, 0.1, 0.3] m (norm: 0.33 m) and a rotation of π/6 × [0.1, 0.3, −1] rad (norm: 31.46°), both relative to the initial frame, have been carried out. Hence, this scenario comprises a translational displacement of approximately 50% of the scene depths. These values are obviously not available to the algorithms. See Fig. 2 for this setup. The stabilization task converges for all techniques, except for the PM with λ = 1 (which, however, converges with λ = 0.06). The evolution of the control errors during the visual servo for all strategies is shown in Fig. 3. The behavior of each technique is described as follows.

Fig. 3. Evolution of the translational and rotational control errors, respectively, under relatively large displacement and λ = 1. (Top) Results for FO. (Middle) Results for PO. (Bottom) Results for PM which fails but converges for λ = 0.06 (not shown).

1) Poor Man's Method (PM): In this scenario, the PM does not converge for any λ > 0.1 but works with λ = 0.06. This gain tuning is indeed crucial for task convergence using this method. This comes directly from its construction: It performs a single-step descent with fixed structure parameters. The obtained estimates are then unlikely to be the correct ones, especially for such large perturbations. Moreover, those estimates may change rapidly from image to image. The gain has, thus, to be reduced to avoid moving the estimates too far from the solution. Using λ = 0.06, the task takes many images to stabilize, not only because the gain had to be small but also because of the shaking "robot" motions. This technique takes 6028 images to converge for such a scenario.

2) Partial Optimization (PO): Using this technique, the servoing task converges for both control gains λ = 1 and λ = 0.06, and the "robot" performs fewer undesirable motions than in the previous case. Nevertheless, some of those can still be observed for λ = 1. This does not occur for λ = 0.06, whose performance is similar to that of the full optimization method below. Using λ = 1, this technique takes 320 images to converge, whereas for λ = 0.06, convergence is established after 5215 images for such a setup.

3) Full Optimization (FO): Using this technique, the servo is successfully completed for both control gains λ = 1 and λ = 0.06. Further, the computed velocities are smooth, and the convergence is faster than with the previous techniques for both control gains. Using λ = 1, the FO takes 311 images to converge, whereas for λ = 0.06, convergence is established after 4883 images for such a scenario.

These results show that not only the rate but also the domain of convergence are increasingly augmented for the PM, PO, and FO methods


Fig. 4. Evolution of the translational and rotational control errors, respectively, under relatively small displacement and λ = 1. (Top) Results for FO. (Middle) Results for PO. (Bottom) Results for PM, where a peak can be seen.

(see also Proposition 3.2). This occurs at the expense of increasingly requiring more computational power to estimate more parameters and/or to perform more iterations per image.

B. Small Displacement

Let the camera displacement between the reference and the initial frames of the previous section simply be divided by a factor of 10. Therefore, this scenario comprises a translational displacement of approximately 5% of the depths. In this setup, all strategies converge using λ = 1. See Fig. 4 for the evolution of their control errors during the servo. The behavior of each technique is described next.

1) Poor Man's Method: In this scenario, the PM converges even for λ = 1. However, the "robot" still performs an undesirable motion at the beginning of the task (see the peak at the bottom left of Fig. 4). This technique takes 256 images to converge for such a case. This result, together with the convergence for λ = 0.06 under a relatively large camera displacement, suggests that the PM requires the application of a variable gain to improve its convergence properties.

2) Partial Optimization: Using this technique, the "robot" does not perform undesirable motions as in the previous case. In fact, its behavior is comparable with that of the state-of-the-art full optimization-based method, while it is less computationally intensive. This is expected, as the entire scene behaves as the plane at infinity (i.e., ρ∗_i e → 0, ∀i). Convergence is established after 215 images for this scenario.

3) Full Optimization: The visual servo has been successfully performed using this technique, as expected. In this case, the servoing task is completed after only 180 images.


Fig. 6. Evolution of the control errors and Cartesian displacement, respectively, for the setup shown in Fig. 5. (Top) Results for PO, which converges at image #879 using λ = 0.3. (Bottom) Results for the PM, which only converges at image #3990 using a small gain of λ = 0.06.

Fig. 5. Experimental setup with a camera translation of about 1.5 times the object depths. Both robot and camera are only coarsely calibrated. (Top) Reference and initial poses, respectively. (Bottom) Reference and initial images, respectively. All pixels within the white square are exploited.

V. EXPERIMENTAL RESULTS

This section reports the results using a camera-mounted six-DOF robotic arm. These experiments use a planar object, which is placed at about 0.3 m away from the reference camera pose. The initial robot pose relative to the reference one consists of a translation of [−0.08, 0.38, −0.21] m (norm: 0.44 m) and a rotation of about [−0.28, −0.28, −0.28] rad (norm: 28.13°). Hence, this scenario comprises a translational displacement of approximately 150% of the scene depths. Obviously, this information is not made available to the proposed algorithms. To show the robustness of the techniques, a coarsely calibrated, conventional webcam is used. Its focal lengths are simply set to 400 pixels with no skew, and its principal point is set as the middle of the image, which has 320 × 240 pixels and is captured at 30 Hz. Furthermore, the hand/eye calibration is also very poor: Only a tilt angle of 20° relative to the end effector is provided. The reference template has 70 × 70 pixels, and its center is chosen as the control point. Although the entire image could be exploited, that size is used because of our system's available computational power. As in the previous section, the needed nonmetric parameters (i.e., the homography in this case) are estimated using a constant α = 1, the Jacobian as described in [19], and [·]⁺ as the classical pseudoinverse operation. The initial guess for the first image is provided by a global optimization routine. The experimental setup is shown in Fig. 5. In spite of all those perturbations and miscalibrations, the servo is successfully performed with final Cartesian errors of 0.2 mm in translation and of 0.2° in rotation, for a stop condition of ǫ_c = 10⁻⁴. Some results are shown in Fig. 6 and are briefly presented as follows.

1) Poor Man's Method: In this scenario of relatively large displacement, the PM only converges using a small control gain of λ = 0.06. In this case, the positioning takes a total of 3990 images.
2) Partial Optimization: The PO technique converges using a higher control gain of λ = 0.3. In this case, the servo takes 879 images. This result is also provided as supplementary multimedia material.

3) Full Optimization: Given that a planar target has been used in this setup, ρ* = 0 for all of its points. Therefore, the FO technique is equivalent to the PO in this scenario.
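The coarse calibration and the servo loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the explicit proportional law v = −λε and the function names are my assumptions; only the intrinsic parameters, the gains, and the stop threshold come from the reported experiments.

```python
import numpy as np

# Coarse intrinsics from the experiments: focal lengths simply set to
# 400 px, zero skew, and the principal point at the middle of the
# 320 x 240 image.
width, height = 320, 240
K = np.array([[400.0,   0.0, width / 2],
              [  0.0, 400.0, height / 2],
              [  0.0,   0.0,        1.0]])

def servo_step(eps, lam=0.3, eps_c=1e-4):
    """One iteration of a proportional servo loop (illustrative).

    eps   : current stacked nonmetric control error (6-vector)
    lam   : control gain (0.3 for the PO, 0.06 for the PM runs)
    eps_c : stop threshold on the error norm
    """
    done = np.linalg.norm(eps) < eps_c   # stop condition
    v = -lam * np.asarray(eps)           # velocity command v = -lambda * eps
    return v, done
```

The small gain needed by the PM (λ = 0.06 versus λ = 0.3 for the PO) is what stretches its run to 3990 images in this large-displacement scenario.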

VI. CONCLUSION

This paper has investigated different techniques for intensity-based nonmetric visual servoing. It has shown that they can be effective even if the control parameters are not optimal. This especially holds around the equilibrium state, regardless of the object shape and of the estimation algorithm. The reasons are twofold: that state is associated with the intrinsic degeneracies of the epipolar geometry, and the pixel intensities are directly exploited without extracting or matching any image features. While the former permits gracefully simplifying the problem, the latter ensures that the task is completed if and only if the current frame converges to the reference one. This framework has indeed allowed the development of a family of methods, which are arranged hierarchically in terms of computational power and prior knowledge. As examples, three new local techniques have been described in this paper. The proposed suboptimal methods are of special interest when the available computing power is low compared with the amount of visual data, or when the camera displacement between the current and reference frames is small relative to the scene depths. If no such prior information is available, optimal methods remain the method of choice. Future work can be devoted to the categorization of all possible subtypes of methods within the proposed unified framework, including particular displacements, different objects, initial conditions, optimization procedures, etc. This raises other research topics, such as the definition of criteria for online selection and switching among the methods. It is, thus, believed that this study opens up a new research direction within visual servoing. Finally, another direction consists in extending the proposed framework to the control of other mechanical systems, such as nonholonomic and underactuated robots.

APPENDIX A
PROOF OF PROPOSITION 3.1

The translational and rotational nonmetric control errors proposed in the DVS technique, i.e., εν ∈ R³ and εω ∈ R³, are given as

    \begin{bmatrix} \varepsilon_\nu \\ \varepsilon_\omega \end{bmatrix}
    = \begin{bmatrix} (\mathbf{H} - \mathbf{I})\,\mathbf{m}^{*\prime} + \rho^*\,\mathbf{e}' \\ \vartheta\,\boldsymbol{\mu} \end{bmatrix}   (25)

where

    \mathbf{H} = \mathbf{K}^{-1}\mathbf{G}\,\mathbf{K}, \quad \mathbf{e}' = \mathbf{K}^{-1}\mathbf{e}, \quad \mathbf{m}^{*\prime} = \mathbf{K}^{-1}\mathbf{p}^*   (26)

and ρ* ∈ R is the parallax of the chosen control point p* ∈ P*. The rotational error εω in (25) is computed from H ∈ R^{3×3} via

    \mathbf{r} = \tfrac{1}{2}\,\operatorname{vex}\!\big(\mathbf{H} - \mathbf{H}^\top\big)   (27)

    \vartheta = \begin{cases} \operatorname{real}\!\big(\arcsin(\|\mathbf{r}\|)\big), & \text{if } \operatorname{tr}(\mathbf{H}) \ge 1 \\ \pi - \operatorname{real}\!\big(\arcsin(\|\mathbf{r}\|)\big), & \text{otherwise} \end{cases}   (28)

    \boldsymbol{\mu} = \mathbf{r} / \|\mathbf{r}\|   (29)

where tr(·) denotes the trace of a matrix. If ‖r‖ = 0, then µ is not determined and, thus, can be chosen arbitrarily (e.g., µ = [0, 0, 1]⊤).

The demonstration that (15) is a slightly modified version of (25) is straightforward. Let us start with the translational control error. Given Q ∈ SA(3) in (5), by substituting (26) into (25), one obtains

    \varepsilon_\nu^F = \varepsilon_\nu.   (30)

With respect to the rotational control error, by substituting into (25) the matrix Q, the operator defined in (2), and r ∈ R³ from (27) and (29), one has

    \varepsilon_\omega^F = 2\,\mathbf{r} = 2\,\vartheta^{-1}\|\mathbf{r}\|\,\varepsilon_\omega.   (31)

Around the equilibrium, one can also show that ε^F_ω ≈ 2εω since ϑ⁻¹‖r‖ ≈ 1. The control error ε^F_ω can then be viewed as a reduced (local) version of εω and is used in this study to provide a unified framework. The stability analysis is, hence, similar.

APPENDIX B
PROOF OF PROPOSITION 3.2

Let β₁, β_F, β_e ∈ [0, 1), δ₁, δ_F > 0, and let t ≥ 0 index the images. Provided that the servo is stable, the monotone convergence domain of the PM is defined by

    \|\varepsilon_{t+1}(\widehat{\mathbf{x}}_1) - \varepsilon_{t+1}(\mathbf{x})\| \le \beta_1\,\|\varepsilon_t(\widehat{\mathbf{x}}_1) - \varepsilon_t(\mathbf{x})\|   (32)

for all ε_t(x̂₁) in the domain

    \mathcal{D}_1 = \big\{\, \varepsilon_t(\widehat{\mathbf{x}}_1) : \|\varepsilon_t(\widehat{\mathbf{x}}_1) - \varepsilon_t(\mathbf{x})\| < \delta_1 \,\big\}   (33)

so that the poor man's control error is monotonically decreasing, i.e., ‖ε_{t+1}(x̂₁)‖ ≤ ‖ε_t(x̂₁)‖ since 0 ≤ β₁ < 1. Similarly, the monotone convergence domain of the FO is defined by

    \|\varepsilon_{t+1}(\widehat{\mathbf{x}}_F) - \varepsilon_{t+1}(\mathbf{x})\| \le \beta_F\,\|\varepsilon_t(\widehat{\mathbf{x}}_F) - \varepsilon_t(\mathbf{x})\|   (34)

so that the respective control error is monotonically decreasing for all ε_t(x̂_F) in the domain

    \mathcal{D}_F = \big\{\, \varepsilon_t(\widehat{\mathbf{x}}_F) : \|\varepsilon_t(\widehat{\mathbf{x}}_F) - \varepsilon_t(\mathbf{x})\| < \delta_F \,\big\}.   (35)

Although it is not straightforward to rigorously determine the "size" of the convergence domains D₁ and D_F separately, it is less difficult to compare them relative to each other. Given the initial guess x̂₀, if the iterative estimation procedure converges toward the solution x, then

    \|\widehat{\mathbf{x}}_{k+1} - \mathbf{x}\| \le \beta_e\,\|\widehat{\mathbf{x}}_k - \mathbf{x}\| \le \beta_e^2\,\|\widehat{\mathbf{x}}_{k-1} - \mathbf{x}\| \le \cdots \le \beta_e^k\,\|\widehat{\mathbf{x}}_1 - \mathbf{x}\| \le \beta_e^{k+1}\,\|\widehat{\mathbf{x}}_0 - \mathbf{x}\|   (36)

    \widehat{\mathbf{x}}_k \to \widehat{\mathbf{x}}_F \quad \text{as} \quad k \to \infty.   (37)

Given the unified servoing framework, applying (37) and (36) to the control error for an image at t + 1, and then using (32), yields

    \|\varepsilon_{t+1}(\widehat{\mathbf{x}}_F) - \varepsilon_{t+1}(\mathbf{x})\| \le \|\varepsilon_{t+1}(\widehat{\mathbf{x}}_1) - \varepsilon_{t+1}(\mathbf{x})\|   (38)
        \le \beta_1\,\|\varepsilon_t(\widehat{\mathbf{x}}_1) - \varepsilon_t(\mathbf{x})\|.   (39)

This result shows that if the PM converges, then the FO also converges. As a remark, such a result is in fact very conservative, as only a subset of the parameters is updated in the PM.

REFERENCES

[1] F. Chaumette and S. Hutchinson, "Visual servo control, Part I: Basic approaches," IEEE Robot. Autom. Mag., vol. 13, no. 4, pp. 82–90, Dec. 2006.
[2] L. Thaler and M. A. Goodale, "Beyond distance and direction: The brain represents target locations non-metrically," J. Vis., vol. 10, no. 3, pp. 1–27, 2010.
[3] P. A. Beardsley, I. D. Reid, A. Zisserman, and D. W. Murray, "Active visual navigation using non-metric structure," in Proc. IEEE Int. Conf. Comput. Vis., 1995, pp. 58–64.
[4] V. Kallem, M. Dewan, J. Swensen, G. Hager, and N. Cowan, "Kernel-based visual servoing," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., San Diego, CA, USA, 2007, pp. 1975–1980.
[5] M. Vargas and E. Malis, "Visual servoing based on an analytical homography decomposition," in Proc. Joint IEEE Conf. Decision Control Eur. Control Conf., 2005, pp. 5379–5384.
[6] S. Benhimane and E. Malis, "Homography-based 2D visual servoing," in Proc. IEEE Int. Conf. Robot. Autom., Orlando, FL, USA, 2006, pp. 2397–2402.
[7] G. Silveira and E. Malis, "Direct visual servoing: Vision-based estimation and control using only nonmetric information," IEEE Trans. Robot., vol. 28, no. 4, pp. 974–980, Aug. 2012.
[8] M. Irani and P. Anandan, "All about direct methods," in Proc. Workshop Vis. Algorithms: Theory Pract., 1999.
[9] G. Silveira and E. Malis, "Visual servoing from robust direct color image registration," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., St. Louis, MO, USA, 2009, pp. 5450–5455.
[10] G. Silveira, "Photogeometric direct visual tracking for central omnidirectional cameras," J. Math. Imag. Vis., vol. 48, no. 1, pp. 72–82, 2014.
[11] R. Horst and P. M. Pardalos, Eds., Handbook of Global Optimization. Norwell, MA, USA: Kluwer, 1995.
[12] G. Silveira, E. Malis, and P. Rives, "The efficient E-3D visual servoing," Int. J. Optomechatron., vol. 2, no. 3, pp. 166–184, 2008.
[13] C. Collewet, E. Marchand, and F. Chaumette, "Visual servoing set free from image processing," in Proc. IEEE Int. Conf. Robot. Autom., Pasadena, CA, USA, 2008.
[14] S. Han, A. Censi, A. D. Straw, and R. M. Murray, "A bio-plausible design for visual pose stabilization," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2010, pp. 5679–5686.
[15] A. Dame and E. Marchand, "Mutual information-based visual servoing," IEEE Trans. Robot., vol. 27, no. 5, pp. 958–969, Oct. 2011.
[16] G. Silveira and E. Malis, "Direct visual servoing with respect to rigid objects," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., San Diego, CA, USA, 2007, pp. 1963–1968.
[17] S. K. Nayar, S. A. Nene, and H. Murase, "Subspace methods for robot vision," IEEE Trans. Robot. Autom., vol. 12, no. 5, pp. 750–758, Oct. 1996.
[18] K. Deguchi, "A direct interpretation of dynamic images with camera and object motions for vision guided robot control," Int. J. Comput. Vis., vol. 37, no. 1, pp. 7–20, 2000.
[19] G. Silveira and E. Malis, "Unified direct visual tracking of rigid and deformable surfaces under generic illumination changes in grayscale and color images," Int. J. Comput. Vis., vol. 89, no. 1, pp. 84–105, 2010.
[20] F. W. Warner, Foundations of Differentiable Manifolds and Lie Groups. New York, NY, USA: Springer-Verlag, 1987.
[21] D. G. Luenberger, Linear and Nonlinear Programming. Reading, MA, USA: Addison-Wesley, 1984.
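As a numerical sanity check of the rotational error construction in (27)–(29) of Appendix A, the following sketch extracts εω from a homography that reduces to a pure rotation. The function names (`vex`, `rotational_error`, `rot_z`) are mine; the computation follows the appendix, including the real(arcsin(·)) and the trace branch of (28).

```python
import numpy as np

def vex(S):
    # Inverse of the skew-symmetric operator [.]_x: vex([v]_x) = v.
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

def rotational_error(H):
    # Eqs. (27)-(29): the vector r, the angle theta, and the axis mu.
    r = 0.5 * vex(H - H.T)                 # eq. (27)
    norm_r = np.linalg.norm(r)
    if norm_r == 0.0:
        # mu is undetermined; it can be chosen arbitrarily, e.g., [0, 0, 1].
        return np.zeros(3)
    theta = np.arcsin(min(norm_r, 1.0))    # real(arcsin(||r||))
    if np.trace(H) < 1.0:                  # "otherwise" branch of eq. (28)
        theta = np.pi - theta
    mu = r / norm_r                        # eq. (29)
    return theta * mu                      # eps_omega = theta * mu

def rot_z(a):
    # Pure rotation about the z-axis, standing in for H = K^-1 G K.
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
```

For H = rot_z(0.3), the recovered error is the rotation vector [0, 0, 0.3]; at the equilibrium H = I, the error vanishes, consistent with the degeneracy discussion. The trace branch matters: for angles beyond π/2 (e.g., rot_z(2.0)), arcsin alone would fold the angle back, and the π-complement restores it.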
