Simplified Epipolar Geometry for Real-time Monocular Visual Odometry on Roads

Sunglok Choi, Jaehyun Park, and Wonpil Yu

Abstract: Simplified epipolar geometry is proposed in this paper to accelerate monocular visual odometry for ground vehicles. Vehicles on roads or indoors exhibit locally planar motion, and such a prior has been effectively utilized to speed up monocular visual odometry. However, we observed that the previous planar motion models frequently failed because their over-simplification did not accept small non-planar motion caused by abrupt bumps or camera vibration. In this paper, the simplified motion models are relaxed and their corresponding algorithms for relative pose estimation are derived. The effectiveness of the proposed algorithms is demonstrated by two types of experiments: relative pose estimation with synthetic data, and monocular visual odometry with real image sequences. In the first experiment, the proposed approximated 5-point algorithm provided accuracy similar to (sometimes better than) the original 5-point algorithm, but spent almost 15 times less computing time. In the second experiment, we also observed that monocular visual odometry with our algorithm had almost 9 times faster outlier rejection than previous approaches.

Keywords: epipolar geometry, relative pose estimation, planar motion, visual odometry, monocular camera

1. INTRODUCTION

Visual odometry has attracted increasing attention due to its reliable performance and various applications. It provides a more accurate trajectory compared to wheel odometry and inertial measurement units (IMU). Wheel odometry suffers from wheel slippage, and an IMU requires integrating linear acceleration and angular velocity over time. Visual odometry directly measures camera motion including slippage and generates a trajectory by accumulating displacement between adjacent frames. In contrast to visual SLAM, it is computationally light and scalable to the number of landmarks and the traversal distance. Basically, it only relies on feature correspondence between adjacent images or locally built feature maps. Visual odometry is much preferred when a vehicle moves along a long trajectory without loops (e.g. daily driving [1] and Mars rovers [2]). Even though stereo visual odometry is popular and provides enough accuracy with little computation, monocular visual odometry needs to be investigated more.

Manuscript received April 17, 2014; revised August 14, 2014; accepted December 22, 2014. This work was supported by two R&D programs of the Korea Ministry of Knowledge and Economy (MKE), 3D Perception and Robot Navigation Technology for Unstructured Environments (M002300090) and Development of Low-cost Autonomous Navigation Systems for a Robot Vehicle in Urban Environment (10035354). Sunglok Choi, Jaehyun Park, and Wonpil Yu are with the Intelligent Cognitive Technology Research Department, Electronics and Telecommunications Research Institute (ETRI), Daejeon, South Korea (e-mail: {sunglok, jaehyun, ywp}@etri.re.kr).

Fig 1: Computing time of each step of monocular visual odometry, measured on the 7th image sequence of the KITTI odometry dataset [3] (unit: msec). All variants were based on the monocular version of LIBVISO2 and ran in the same configuration described in Section 5 (a single thread on an Intel Core i7 2.8 GHz). Feature extraction and matching took 74 msec in every variant; outlier rejection took 54 msec with the 8-point algorithm, 58 msec with the 5-point algorithm [9], 6.7 msec with the proposed approximated 5-point algorithm, and 2.6 msec with the proposed 3-point algorithm (R′2t′2); scale estimation (0.2 msec) and pose refinement (5.6 msec) were common to all variants.

First of all, there are many devices equipped with a single camera, for example, vehicular video recorders and cellular phones. Its hardware simplicity also enables a smaller size and economic advantages. However, monocular visual odometry intrinsically suffers from two major problems: scale ambiguity and a high computational burden. In this paper, we focus on the second problem, the high computational burden. In the case of the well-known open-source library LIBVISO2 [4], its stereo version spends nearly 0.01 seconds per frame, but its monocular version needs roughly 10 times more computing time. It devotes nearly half of its computation to rejecting feature correspondences which are inconsistent with camera motion, so-called outliers. Epipolar geometry is a key criterion to remove such outliers in monocular visual odometry, and the 5-point or 8-point algorithms are common tools to reject them. Since the two algorithms


need more points than stereo odometry (3 points), they spend much more computing time in rejecting the outliers. Moreover, the 5-point algorithm involves quite complex computation to solve high-order polynomials even though it provides more accurate results than the 8-point algorithm. The high-computation problem is described in Figure 1. The original LIBVISO2 and its variant with the 5-point algorithm did not satisfy the 10 fps constraint which comes from the KITTI odometry dataset. Epipolar geometry is briefly reviewed in Section 2.1, and the computation issue in outlier rejection is explained in detail in Section 2.2.

In this paper, we simplify epipolar geometry under the assumption of planar motion and derive algorithms to estimate the relative pose between two adjacent images. Many ground vehicles work on flat or locally flat surfaces (e.g. roads, rails, and indoors) and show approximately planar motion. Based on this reasonable prior, planar motion, we can significantly accelerate relative pose estimation and outlier rejection. In addition to the previous 1-point [5] and 2-point [6] simplifications, we introduce 3-point and approximated 5-point algorithms. The previous works are shortly reviewed in Section 3.2, and the two types of proposed algorithms are described in Sections 3.3 and 3.4, respectively. We also evaluated their accuracy and computing time using synthetic data and real image sequences. The two experiments verified that the proposed algorithms improved not only computing time but also accuracy. Our benchmark with synthetic data focuses on the performance of epipolar geometry estimation itself, which is described in Section 4. Our second experiment with real image sequences aims at monocular visual odometry including the proposed algorithms as a part of outlier rejection. Section 5 contains our experimental configuration and results in detail.

2. BACKGROUND

2.1. Epipolar Geometry
Epipolar geometry (a.k.a. two-view geometry) describes the geometric relation of two images incorporated with their common observation. When a point in 3D space is observed on two images, its two projections on each normalized image plane hold a transformation,

x′ = Rx + t, (1)

where x and x′ are the projections on the 1st and 2nd normalized image planes, respectively, and R and t are the rotation and translation which localize the 1st camera with respect to the 2nd camera. The two observations, x and x′, are written as 3-dimensional vectors in homogeneous coordinates, R is a 3-by-3 rotation matrix, and t is also a 3-dimensional vector (up to scale). Reversely, the rotational and translational motion from the 1st to the 2nd camera is derived as

R′ = Rᵀ, (2)
t′ = −Rᵀt. (3)

Since x′, Rx, and t are coplanar, x′ · (t × Rx) = 0, and an essential matrix E together with its key constraint, the so-called epipolar constraint, is derived as

x′ᵀEx = 0 with E = [t]×R, (4)

where [t]× is the skew-symmetric matrix which converts the cross product into a matrix multiplication. An essential matrix is estimated from minimally five points [7-9], the so-called 5-point algorithm, whose process involves calculating coefficients and finding roots of 10th-order polynomials. Rotation and translation are easily decomposed from an essential matrix through singular value decomposition (SVD) [10].
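To make this step concrete, a minimal sketch of the SVD-based decomposition follows (after the standard construction in [10]; of the four candidate (R, t) pairs, the valid one is selected in practice by checking that triangulated points have positive depth):

    #include <opencv2/core.hpp>

    // Sketch of decomposing an essential matrix E into rotation and translation
    // by SVD, following [10]. Two rotations and a translation up to sign result.
    void decomposeE(const cv::Mat& E, cv::Mat& R1, cv::Mat& R2, cv::Mat& t)
    {
        cv::Mat w, U, Vt;
        cv::SVD::compute(E, w, U, Vt);
        cv::Mat W = (cv::Mat_<double>(3, 3) << 0, -1, 0,
                                               1,  0, 0,
                                               0,  0, 1);
        R1 = U * W * Vt;                            // R = U W V^T
        R2 = U * W.t() * Vt;                        // R = U W^T V^T
        if (cv::determinant(R1) < 0) R1 = -R1;      // enforce det(R) = +1
        if (cv::determinant(R2) < 0) R2 = -R2;
        t = U.col(2).clone();                       // t up to sign and scale
    }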

Two uncalibrated observations, x̂ and x̂′, on the images (unit: pixel) have a relationship with their calibrated projections, x and x′, on the normalized image planes (unit: meter) as follows:

x̂ = Kx, (5)
x̂′ = K′x′, (6)

where K and K′ are 3-by-3 camera calibration matrices which contain the intrinsic parameters of each camera. Similarly, from the uncalibrated observations, a fundamental matrix F is defined as

x̂′ᵀFx̂ = 0 with E = K′ᵀFK. (7)

To estimate a fundamental matrix, at least seven points are necessary (the so-called 7-point algorithm), but the normalized 8-point algorithm is popularly utilized because of its simplicity and robustness against noise. Sometimes locally extracted features and their matching are inconsistent with camera motion because of moving objects and imperfect matching (due to repetitive or monotonous patterns). The epipolar constraint (4) is a quite effective tool to reject such outliers.

2.2. RANSAC and Its Computing Time
RANSAC is the most popular approach to reject outliers. RANSAC is an iterative algorithm whose each iteration is composed of hypothesis generation and its evaluation. A hypothesis (e.g. an essential matrix) is calculated from random samples of the data. If all selected data are inliers, their hypothesis may be close to the ground truth. In contrast, if some of the selected data are outliers, their hypothesis will be far from the truth. Therefore, in RANSAC, the number of iterations should be large enough to select samples which are all inliers at least once over the whole run. From this requirement, the number of necessary iterations T [11] is probabilistically derived as

T = log(1 − Ps) / log(1 − γ^M), (8)

where γ is the ratio of inliers to the whole data, M is the number of sampled data for each hypothesis, and Ps is the probability of succeeding in sampling only inliers at least once.
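For instance, the required number of iterations in Equation (8) is a one-line computation (a minimal sketch; the function name is ours):

    #include <cmath>

    // Required RANSAC iterations from Equation (8).
    // Ps: success probability, gamma: inlier ratio, M: samples per hypothesis.
    int requiredIterations(double Ps, double gamma, int M)
    {
        return (int)std::ceil(std::log(1.0 - Ps) / std::log(1.0 - std::pow(gamma, (double)M)));
    }

For example, with Ps = 0.999 and γ = 0.5, M = 5 needs about 218 iterations while M = 8 needs about 1765, which illustrates the exponential growth shown in Figure 2.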

Fig 2: The necessary number of iterations T increases exponentially with respect to the number of samples M for hypothesis generation (plotted for inlier ratios γ = 0.75, 0.50, and 0.25). We used the success probability Ps = 0.999.

Fig 3: Two camera locations are described from (a) top-view (XZ plane), (b) front-view (XY plane), and (c) side-view (YZ plane) perspectives. Full degree-of-freedom (DoF) motion is represented by 6 variables in the spherical coordinate (green: rotation θx, θy, θz; red: translation ρ, φxz, φyz).

Figure 2 presents the number of required iterations with respect to the number of sampled data M and the inlier ratio γ. It is clear that a larger number of sampled data induces an exponentially larger number of iterations. In RANSAC, the overall computing time t_RANSAC [11], including hypothesis evaluation, is calculated as

t_RANSAC = T (t_g + N t_e), (9)

where N is the number of whole data, and t_g and t_e are the computing times to generate a hypothesis and to evaluate it with a single datum, respectively. Therefore, while keeping the reliability Ps constant, RANSAC can be accelerated by two approaches: 1) cutting down M or 2) reducing t_g or N t_e.

3. SIMPLIFIED EPIPOLAR GEOMETRY

3.1. Motion Analysis: Smooth and Planar
First of all, we analyzed trajectories generated by ground vehicles on roads and floors to discover their tendency of motion. We utilized three public datasets: the KITTI odometry dataset [3], the Rawseeds-Bicocca datasets [12], and the ualberta-csc-flr3-vision dataset [13]. Their overall traveled distance was almost 27 km with more than 57 × 10^3 frames. A vehicle's trajectory is a series of the vehicle's poses, each represented by a 4-by-4 transformation matrix P. Since the trajectory is an accumulation of relative poses between adjacent frames, the relative pose (a.k.a. motion) from the i-th frame to the j-th frame is derived by

[R′i→j t′i→j; 0 1] = Pj · Pi⁻¹. (10)

The motion is decomposed into ZYX Euler angles and a 3D vector in the spherical coordinate as follows:

R′ = Rz(θz) · Ry(θy) · Rx(θx), (11)
t′ = ρ [cos φyz sin φxz; sin φyz; cos φyz cos φxz], (12)

where ρ is a scale factor which means the amount of motion, and φxz and φyz are the directions of translation on the XZ and YZ planes, respectively. Figure 3 describes them more clearly from three different viewpoints. We extracted motion from the true trajectories of the given public datasets and analyzed their range and distribution. Table 1 shows the range of each motion variable, and Figure 4 presents each distribution for the KITTI dataset (No. 0-10).

We could observe useful prior knowledge from our motion analysis. First, trajectories of ground vehicles were so smooth that their piecewise rotation and translation were quite small. Between two adjacent image frames, rotation was mostly within ±3 degrees, and translational direction was also within ±9 degrees. This is a significant difference from conventional structure-from-motion (SfM) using photos taken at random viewpoints. Second, their motion was almost planar even though roads had bumps, uphill, and downhill. In the KITTI dataset, θx and θz were within 1 degree, and φyz was within 3 degrees. Their extreme values appeared due to environmental factors (e.g. bumps and the beginning of up/downhill) and the vehicle's motion itself (e.g. vibration, turning, acceleration, and deceleration). Third, the range of translational direction, φxz and φyz, is mostly larger than the range of rotation, θx, θy, and θz. In the three datasets, φxz was nearly 2.6, 1.1, and 4.9 times larger than θy, respectively. It means that their motion was far from circular motion, which should hold φxz = 0.5θy. It is mainly because of the large longitudinal offset from a vehicle's driving axle to its mounted camera.

3.2. 2 DoF and 3 DoF Simplification: A Review
There has been meaningful research on simplified epipolar geometry to reduce the number of sampled data M.
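For reference, the decomposition (10)-(12) used in our motion analysis can be sketched with OpenCV as follows (a minimal sketch; the function and variable names are ours, angles are in radians, and poses are 4-by-4 CV_64F matrices):

    #include <cmath>
    #include <opencv2/core.hpp>

    // Sketch of the motion decomposition (10)-(12): split the relative motion
    // between the i-th and j-th poses into ZYX Euler angles and a
    // spherical-coordinate translation.
    void decomposeMotion(const cv::Mat& Pi, const cv::Mat& Pj, double& thx,
                         double& thy, double& thz, double& rho, double& pxz, double& pyz)
    {
        cv::Mat M = Pj * Pi.inv();                  // Equation (10)
        cv::Mat R = M(cv::Rect(0, 0, 3, 3));        // rotation R'
        double tx = M.at<double>(0, 3), ty = M.at<double>(1, 3), tz = M.at<double>(2, 3);
        // ZYX Euler angles from R' = Rz(thz) Ry(thy) Rx(thx), Equation (11)
        thx = std::atan2(R.at<double>(2, 1), R.at<double>(2, 2));
        thy = -std::asin(R.at<double>(2, 0));
        thz = std::atan2(R.at<double>(1, 0), R.at<double>(0, 0));
        // Spherical-coordinate translation, Equation (12)
        rho = std::sqrt(tx * tx + ty * ty + tz * tz);
        pyz = std::asin(ty / rho);
        pxz = std::atan2(tx, tz);
    }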


Table 1: Various information on the three public datasets: sensor, trajectory, and motion. Set # means the number of independent trajectories (D: raw images with distortion, R: rectified images without distortion), and Frame # means the total number of images. The range of each motion variable is determined by the 1st and 99th percentiles.

Name | Place | Set # | Frame # | Resolution | Frame Rate | Distance | Altitude | Avg. Speed | θx [deg] | θy [deg] | θz [deg] | ρ [m] | φxz [deg] | φyz [deg]
KITTI Odometry | outdoor (road) | 11 (R) | 23,201 | 1241x376 | 10 Hz | 22,177 m | -58 m – +66 m | 9.563 m/s | -0.48 – +0.50 | -3.31 – +3.14 | -0.41 – +0.41 | 0.056 – 2.631 | -9.92 – +8.92 | -3.58 – +1.56
Rawseeds – Bicocca | indoor (library) | 5 (D) | 30,896 | 320x240 | Varying | 4,533 m | 0 | 0.147 m/frame | – | -7.34 – +6.63 | – | 0.043 – 0.541 | -27.0 – +24.0 | –
ualberta-csc-flr3-vision | indoor (corridor) | 1 (R) | 513 | 640x480 | Varying | 73 m | 0 | 0.144 m/frame | – | -1.46 – +7.06 | – | 0.035 – 0.210 | -12.8 – +20.0 | –

Fig 4: Distribution of each motion variable in the KITTI odometry dataset is presented through histograms: (a) θx, (b) θy, (c) θz, (d) φxz, (e) φyz, and (f) ρ.

Ortin and Montiel [6] proposed epipolar geometry under planar motion (3 DoF) and applied it to indoor image sequences. Planar motion between two images is described by

R′1 = Ry(θy) and t′2 = ρ [sin φxz; 0; cos φxz]. (13)

In contrast to 6 DoF general motion, 3 DoF planar motion makes it possible to estimate epipolar geometry (up to scale) using only two points. The two variables, θy and φxz, are solved by Newton's method [5, 6] or a linear formulation [14]. Scaramuzza et al. [5] introduced a much more simplified motion model, planar circular motion (2 DoF). Since the relation φxz = 0.5θy is valid under circular motion, the single unknown φxz can be calculated from only one point. Its one-point solution is a simple closed form, so it can significantly decrease the computation of RANSAC (due to small t_g and M = 1). RANSAC with the 1-point algorithm is called 1-point RANSAC [5] and has been applied in many other works because of its low computation and robustness. Civera et al. [15] utilized 1-point RANSAC as a preliminary step of outlier filtering. Scaramuzza et al. [16] also exploited the circular motion approximation to estimate the motion scale ρ incorporated with Ackerman kinematics and the known longitudinal offset of the camera from the vehicle's driving axle. Recently, Lee et al. [17] applied such motion approximation and kinematics to generalized cameras attached on a vehicle. In addition, Choi et al. [18] improved 1-point RANSAC through normalization and a simple geometric error function. The normalization is effective when 3D points lie on a flat XZ plane, and the error function accelerates RANSAC further by reducing t_e. However, our two experiments (Sections 4 and 5) revealed that the 2 DoF or 3 DoF planar approximation is

sometimes an over-simplification. Even though a road is locally planar, a vehicle sometimes faces abrupt non-planar motion due to environmental and motion factors such as bumps and vibration. In this paper, we propose less simplified epipolar geometry which can serve non-planar motion with higher DoF.

3.3. 4 DoF Simplification
We additionally impose one more DoF on the 3 DoF motion model (13) to compensate for its missing non-planarity. At first, we describe motion with one more translational component, φyz. It is inspired by vibration modeling of a vehicle, which approximates vibration as y-directional translation. The augmented motion model is represented as

R′1 = Ry(θy) and t′3 = ρ [cos φyz sin φxz; sin φyz; cos φyz cos φxz], (14)

which has 1 DoF rotation and 3 DoF translation. Moreover, such up-and-down motion also can be described by θx instead of φyz. We also add one more DoF to the rotation part, and the alternative motion model becomes

R′2 = Ry(θy) · Rx(θx) and t′2 = ρ [sin φxz; 0; cos φxz], (15)

which has 2 DoF rotation and 2 DoF translation. We had applied such an approximated model to our driving event analyzer, Personal Driving Diary [1].

3-point Algorithms: These 4 DoF motion models make it possible to estimate relative pose (up to scale) with three pairs of correspondences. The epipolar constraint with these models is a nonlinear equation,

f(ζ; x, x′) = x′ᵀEx = x′ᵀ[−R′ᵀt′]×R′ᵀx = 0, (16)

where ζ is the vector of unknown motion variables, [θy, φxz, φyz]ᵀ or [θx, θy, φxz]ᵀ. We use Newton's method (a.k.a. the Newton-Raphson method) to find a root of this nonlinear equation. Newton's method is an iterative algorithm which searches for a locally optimal root of the given equation using its derivative (or Jacobian for multiple variables). From the 1st-order Taylor expansion, f(ζk+1) ≈ f(ζk) + J(ζk)(ζk+1 − ζk), the refined root is derived as

ζk+1 = ζk − J⁺(ζk) f(ζk) (17)

under the assumption of f(ζk+1) = 0, where J⁺ denotes the pseudo-inverse of the matrix J. Newton's method reaches a locally optimal root by iterating Equation (17). To accelerate our implementation, we calculated the update δ by solving Jδ = f directly instead of calculating the pseudo-inverse J⁺ (in the case of OpenCV, it is much faster to use cv::solve() instead of cv::invert()). When M pairs of matched points are given, the constraint becomes an M-by-1 matrix and its Jacobian an M-by-3 matrix as follows:

f(ζ) = [f(ζ; x1, x′1); f(ζ; x2, x′2); …; f(ζ; xM, x′M)], (18)
J(ζ) = [∂f(ζ; xi, x′i)/∂θy, ∂f(ζ; xi, x′i)/∂φxz, ∂f(ζ; xi, x′i)/∂φyz], one row for each i = 1, …, M, (19)

where Equation (19) is the Jacobian for ζ = [θy, φxz, φyz]ᵀ; the Jacobian for ζ = [θx, θy, φxz]ᵀ is derived similarly. We assign the initial variable as ζ = [0, 0, 0]ᵀ, which is based on our motion analysis that rotation and translation are small for ground vehicles between two adjacent images. Experimentally, one or two iterations were enough to find a reliable root using Newton's method with the epipolar constraint because the true motion was close to zero.

3.4. 6 DoF Simplification
We derive less simplified epipolar geometry from full DoF motion. According to our motion analysis, two rotations, θx and θz, were quite small. All previous simplifications assign their values as 0, but we avoid such extreme approximation. Instead, we assume that their values are small, not zero, so that their sine and cosine functions can be linearized as follows:

sin θx ≈ θx, sin θz ≈ θz, cos θx ≈ 1, and cos θz ≈ 1. (20)

We also apply one more simplification,

sin θi sin θj ≈ 0 (i, j ∈ {x, y, z}). (21)

As a result, the 3D rotation matrix becomes a quite simple form,

Ra = [ cos θy, −θz, sin θy;
       θz cos θy, 1, −θx;
       −sin θy, θx cos θy, cos θy ]. (22)

Additionally, for more simplicity, we represent translation in the rectangular coordinate, not the spherical coordinate. The 3D translation vector (up to scale) is written as

t3 = [ρx; ρy; ρz] = ρz [tx; ty; 1], (23)

where tx and ty are the ratios of x- and y-directional translation to z-directional translation, respectively. Experimentally, estimating translation in the rectangular coordinate had similar accuracy to estimation in the spherical coordinate but required much less computation.

Approximated 5-point Algorithm: To estimate relative pose (up to scale), this simplified 6 DoF motion requires at least 5 points, similar to the conventional 5-point algorithms [7-9]. However, our approximation provides a computationally more tractable equation,

f(ζ; x, x′) = x′ᵀEx = x′ᵀ[t]×Rx = 0, (24)

where ζ is a vector of five unknowns, [θx, θy, θz, tx, ty]ᵀ. In contrast to the previous models, our approximated motion is based on the motion from the 2nd image to the 1st image, R and t, not R′ and t′. It is another key to making the epipolar constraint simple and tractable even though our model keeps full DoF motion. Similarly, we solve the five unknowns using Newton's method as described in Section 3.3. Newton's method finds a locally optimal root of Equation (24) numerically. Even though it cannot find all available solutions together, it mostly finds the correct motion because the motion is close to zero (smooth motion) and we give the initial unknown as zero. Above all, it is much faster than the conventional 5-point algorithms, which need to solve 10th-order polynomial equations. According to our benchmark, the approximated 5-point algorithm was almost 15 times faster than the original 5-point algorithms. Even though our algorithm cannot reduce the number of sampled data M, it can speed up RANSAC by reducing the computing time to generate a hypothesis, t_g in Equation (9). In summary, Table 2 enumerates various relative pose estimators which come from different DoF of the simplified motion models.
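To make the procedure concrete, the following is a minimal sketch of one Newton step on the approximated constraint (24); it uses a numerical Jacobian for brevity (our implementation uses the analytic Jacobian), and the function names are ours:

    #include <cmath>
    #include <vector>
    #include <opencv2/core.hpp>

    // Approximated epipolar constraint (24) with zeta = [thx, thy, thz, tx, ty];
    // x and xp are matched points on the 1st and 2nd normalized image planes.
    static double f24(const cv::Vec<double, 5>& z, const cv::Point3d& x, const cv::Point3d& xp)
    {
        double thx = z[0], thy = z[1], thz = z[2], tx = z[3], ty = z[4];
        double c = std::cos(thy), s = std::sin(thy);
        double R[3][3] = { { c,       -thz,     s    },   // approximated rotation (22)
                           { thz * c,  1,      -thx  },
                           { -s,       thx * c, c    } };
        double Rx[3];
        for (int i = 0; i < 3; i++)
            Rx[i] = R[i][0] * x.x + R[i][1] * x.y + R[i][2] * x.z;
        // x'^T [t]x R x with t = [tx, ty, 1]^T up to scale, Equation (23)
        return xp.x * (ty * Rx[2] - Rx[1]) + xp.y * (Rx[0] - tx * Rx[2])
             + xp.z * (tx * Rx[1] - ty * Rx[0]);
    }

    cv::Vec<double, 5> newtonStep(cv::Vec<double, 5> z, const std::vector<cv::Point3d>& x,
                                  const std::vector<cv::Point3d>& xp)
    {
        int M = (int)x.size();                      // M >= 5 correspondences
        cv::Mat f(M, 1, CV_64F), J(M, 5, CV_64F);
        const double h = 1e-6;                      // finite-difference step
        for (int i = 0; i < M; i++)
        {
            f.at<double>(i) = f24(z, x[i], xp[i]);
            for (int j = 0; j < 5; j++)
            {
                cv::Vec<double, 5> zh = z;
                zh[j] += h;
                J.at<double>(i, j) = (f24(zh, x[i], xp[i]) - f.at<double>(i)) / h;
            }
        }
        cv::Mat d;
        cv::solve(J, f, d, cv::DECOMP_SVD);         // solve J d = f, cf. Equation (17)
        for (int j = 0; j < 5; j++)
            z[j] -= d.at<double>(j);                // zeta <- zeta - d
        return z;
    }

Starting from ζ = 0, one or two such steps were sufficient in our experiments.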


Table 2: Diverse simplification levels of relative pose estimators. M means the required number of points, and tg is the computing time to estimate motion with the minimal number of points; computing times with more points are presented in Figure 6 (a). We set only 1 iteration for Newton's method in our benchmark and odometry experiments. 3-pt algo (R′1t′3) is the 3-point algorithm with 1 DoF rotation and 3 DoF translation as shown in Equation (14); similarly, 3-pt algo (R′2t′2) is the other 3-point algorithm with 2 DoF rotation and 2 DoF translation as presented in Equation (15). The two 5-point algorithms are from Li and Hartley [8] and Stewenius et al. [9], respectively.

Name | Model | Rotation | Translation | M | Solver | Computing Time (tg)
1-pt algo [5] | E | 1 DoF | 1 DoF | ≥1 | closed (1) / SVD (>1) | 0.000 / 0.003 [msec]
2-pt algo [5] | E | 1 DoF | 2 DoF | ≥2 | Newton (solve) | 0.004 [msec]
3-pt algo (R′1t′3) | E | 1 DoF | 3 DoF | ≥3 | Newton (solve) | 0.004 [msec]
3-pt algo (R′2t′2) | E | 2 DoF | 2 DoF | ≥3 | Newton (solve) | 0.004 [msec]
Aprx. 5-pt algo | E | Aprx. 3 DoF | 3 DoF | ≥5 | Newton (solve) | 0.006 [msec]
5-pt algo [8] | E | 3 DoF | 3 DoF | ≥5 | SVD, rpoly | 0.158 [msec]
5-pt algo [9] | E | 3 DoF | 3 DoF | ≥5 | SVD, eigen (QZ decomp.) | 0.091 [msec]
8-pt algo [10] | F | 3 DoF | 3 DoF | ≥8 | SVD | 0.021 [msec]
Norm. 8-pt algo [10] | F | 3 DoF | 3 DoF | ≥8 | SVD | 0.021 [msec]

4. BENCHMARK: RELATIVE POSE ESTIMATION

4.1. Configuration
We evaluated various relative pose estimators in view of accuracy and computing time. Our evaluation included the nine algorithms shown in Table 2. We performed them in three different configurations: circular planar motion, planar motion, and non-planar motion. In each configuration, we executed each algorithm while changing eight variables: the magnitude of noise σ, the number of observed points M, and the six motion elements ζ* = [θx*, θy*, θz*, φxz*, φyz*, ρ*]ᵀ. We used synthetically generated data to control motion, observation, and noise. We spread 10^5 three-dimensional points around the initial camera location. Randomly selected M points were projected onto two images separated by the given true motion ζ*. The camera calibration matrix for the two images was

K = [600, 0, 320; 0, 600, 240; 0, 0, 1]. (25)

We also applied unbiased Gaussian noise to the simulated observations as follows:

x̂ = Kx + [εx, εy, 0]ᵀ (εi ∼ N(0, σ²)). (26)

We define the magnitude of noise as the standard deviation of the Gaussian, σ. We quantified the rotational and translational accuracy of an estimated ζ through

er(ζ; ζ*) = Σ_{i∈{x,y,z}} |θi − θi*| and (27)
et(ζ; ζ*) = Σ_{i∈{xz,yz}} |φi − φi*|. (28)
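The observation model (25)-(26) can be sketched briefly as follows (a minimal sketch with our own function name; the random spreading and relative projection of the 3D points are omitted):

    #include <opencv2/core.hpp>

    // Sketch of the synthetic observation model (25)-(26): project a point on
    // the normalized image plane with K and add unbiased Gaussian pixel noise.
    cv::Vec3d observe(const cv::Vec3d& x, double sigma, cv::RNG& rng)
    {
        const cv::Matx33d K(600,   0, 320,
                              0, 600, 240,
                              0,   0,   1);         // Equation (25)
        cv::Vec3d xh = K * x;                       // x-hat = K x
        xh[0] += rng.gaussian(sigma);               // eps_x ~ N(0, sigma^2)
        xh[1] += rng.gaussian(sigma);               // eps_y ~ N(0, sigma^2)
        return xh;                                  // Equation (26)
    }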

Each computing time was also measured by the OpenCV API, the cv::getTickCount() function. All algorithms were implemented in C++ using the linear algebra functions in OpenCV, and they were executed on an Intel Core i7 2.8 GHz (using a single core). For statistically meaningful results, we ran each algorithm 1000 times with randomly sampled points under the same configuration and variables. Among the 1000 values, we chose the median as the representative value in order to avoid the effect of algorithm failures.

4.2. Results and Discussion
The performance of the nine algorithms is presented in Figures 5, 6, and 7. To highlight our discussion, we selected 12 results from 72 combinations (3 configurations, 8 variables, and 3 criteria). We learned several meaningful lessons from this performance evaluation. First, a motion prior (e.g. planar motion) is quite effective to improve accuracy and reduce computing time if the prior is valid. The 1-point and 2-point algorithms had the best accuracy with only a small number of points as shown in Figure 5. However, their accuracy was significantly degraded when their prior was invalid. Even with a small amount of non-planar motion (e.g. θz > 0.2 degrees), the two algorithms became worse than the others (Figures 6 and 7). Similarly, as shown in Figure 6, the two 3-point algorithms were more accurate than the others when their additional DoF was effective, but they also deteriorated when the given motion was out of their DoF. Second, the approximated 5-point algorithm had the best performance in the 3rd configuration, which simulated the extreme motion of the KITTI odometry dataset. Especially, it was much more robust to image noise compared to the 5-point and 8-point algorithms as presented in Figures 5 (a)-(b) and 7 (a)-(b). The original 5-point algorithms were also more robust than the 8-point algorithms, but they were less accurate as the number of points was increased (Figures 5 (c)-(d) and 7 (c)-(d)).

Fig 5: In planar and circular motion, default variables were σ = 0.5, M = 8, and ζ* = [θx*, θy*, θz*, φxz*, φyz*, ρ*]ᵀ = [0, 4, 0, 2, 0, 1]ᵀ whose units are degrees or meters. Panels: (a) er while varying σ, (b) et while varying σ, (c) er while varying M, and (d) et while varying M.

Fig 6: In planar motion, default variables were σ = 0.5, M = 8, and ζ* = [0, 4, 0, 8, 0, 1]ᵀ whose units are degrees or meters. Panels: (a) tg while varying M, (b) et while varying φyz*, (c) er while varying θx*, and (d) er while varying θz*.

Fig 7: In non-planar motion, default variables were σ = 0.5, M = 8, and ζ* = [1, 4, 1, 2, 2, 1]ᵀ whose units are degrees or meters. Panels: (a) er while varying σ, (b) et while varying σ, (c) er while varying M, and (d) et while varying M.

Fig 8: In extremely non-planar motion, default variables were σ = 0.5, M = 5, and ζ* = [0, 0, 0, 0, 0, 1]ᵀ whose units are degrees or meters. The variable K is the number of iterations used in Newton's method (K = 1, 5, 10). Panels: (a) er while varying θy* and (b) er while varying θz*.

However, the approximated 5-point algorithm sustained accuracy similar to the 8-point algorithms. The approximated 5-point algorithm seems to effectively utilize the motion prior, small x- and z-directional rotation. In addition to accuracy, it spent nearly 15 times less computing time than the original 5-point algorithms as demonstrated in Table 2 and Figure 6 (a). Third, the approximated 5-point algorithm was effective as long as its linearized components were given within ±20 degrees. The approximated 5-point algorithm is based on local optimization (Newton's method) and involves linearization of the angular components θi. Therefore, the algorithm cannot reach correct values if the given values are far from the initial values for iterative optimization. As shown in Figure 8, the approximated 5-point algorithm had reasonable rotational error, less than 5 degrees, when the true rotation was given within ±20 degrees. Surprisingly, the number of iterations was not sensitive when estimating rotation within ±20 degrees. Finally, we also observed differences and similarities between the two different versions of the 3/5/8-point algorithms, respectively. First of all, the 3-point algorithm (R′2t′2) was more reliable than the 3-point algorithm (R′1t′3) in the 3rd configuration as shown in Figure 7. It seems to result from an intrinsic limitation of epipolar geometry, whose translation is less accurately estimated than rotation. The two versions of the 5-point algorithm had similar accuracy, but their computing time was inconsistent as shown in Figure 6 (a). The implementation of Stewenius's 5-point algorithm used thin SVD, not full SVD, so that it was much faster than the implementation of Li's algorithm. For a given m-by-n matrix A with m > n, thin SVD generates an m-by-n U matrix instead of the full m-by-m U matrix (A = USVᵀ). We also observed that the normalized 8-point algorithm was more reliable than its original when a larger number of points or a larger magnitude of noise was given as shown in Figure 5 (a)-(b).

5. EXPERIMENT: MONOCULAR VISUAL ODOMETRY

5.1. Monocular Visual Odometry
We evaluated each algorithm's performance as a part of monocular visual odometry. Our visual odometry was


based on the monocular version of LIBVISO2 [4], so-called VISO2-M. It consists of five steps: feature extraction, feature matching, outlier rejection, relative pose refinement, and motion scale estimation. First of all, it detects locally salient points and associates them with points on the previous image [19]. The salient points are detected by 5-by-5 corner kernel responses with non-maxima suppression (NMS). The feature points are associated by the sum-of-absolute-differences (SAD) of two 11-by-11 windows. Bucketing is also adopted to spread the feature points uniformly over the images. Some correspondences are inconsistent with camera motion due to incorrect association or moving objects. Therefore, VISO2-M needs to exclude such outliers because they make pose estimation completely wrong. To reject outliers, RANSAC is utilized in conjunction with each algorithm (for hypothesis generation) and the Sampson distance (for hypothesis evaluation) defined as

ed(F; x̂, x̂′) = (x̂′ᵀFx̂)² / ((Fx̂)₁² + (Fx̂)₂² + (Fᵀx̂′)₁² + (Fᵀx̂′)₂²), (29)

where (Fx̂)i is the i-th element of the vector Fx̂. Table 3 presents our experimental configuration including the two RANSAC parameters, the number of iterations T and the inlier threshold τ. Next, the normalized 8-point algorithm was utilized to refine the relative pose using the inliers. From our benchmark, the normalized 8-point algorithm had enough accuracy and reasonable computing time with a hundred points without any approximation on motion. Finally, the unknown motion scale is resolved from the known and constant height of the camera from the ground. In order to find the scale factor, VISO2-M adopts kernel density estimation (KDE) with a Gaussian kernel. Instead of the Gaussian kernel, we applied an asymmetric Gaussian kernel [20], which is more robust to complex urban structures. Local bundle adjustment may refine the estimated relative pose and scale factor more, but we exclude it to examine each algorithm's performance more clearly.
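For reference, the Sampson distance (29) is compact in OpenCV's fixed-size matrix types (a minimal sketch; the function name is ours, not LIBVISO2's API):

    #include <opencv2/core.hpp>

    // Sampson distance (29); F is the fundamental matrix and p1, p2 are the
    // uncalibrated observations x-hat and x-hat' in homogeneous coordinates.
    double sampsonDistance(const cv::Matx33d& F, const cv::Vec3d& p1, const cv::Vec3d& p2)
    {
        cv::Vec3d Fx  = F * p1;                     // F x-hat
        cv::Vec3d Ftx = F.t() * p2;                 // F^T x-hat'
        double num = p2.dot(Fx);                    // x-hat'^T F x-hat
        double den = Fx[0] * Fx[0] + Fx[1] * Fx[1] + Ftx[0] * Ftx[0] + Ftx[1] * Ftx[1];
        return num * num / den;
    }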

5.2. Results and Discussion
We applied the nine methods of monocular visual odometry to the KITTI odometry dataset (on-road) and the ualberta-csc-flr3-vision dataset (indoor) mentioned in Table 1. Among the 21

image sequences in the KITTI dataset, we used the first 11 image sequences because they have true trajectories for performance evaluation. We used accuracy measures similar to those proposed in the KITTI Vision Benchmark Suite [3], and we also recorded computing time using the same computer and OpenCV API mentioned in Section 4. Table 3 contains each method's averaged rotational error, translational error, and computing time. Figure 9 shows the rotational and translational errors in detail, and two example trajectories are presented in Figure 10.

As a result, we can analyze the nine methods on the two datasets in view of accuracy and computing time. First of all, the 5-point algorithms and the approximated 5-point algorithm had the best accuracy in the two datasets, respectively. The 5-point algorithms were slightly better in the on-road dataset (as much as 0.009 deg/m and 0.1 %), and the approximated 5-point algorithm was better in the indoor dataset (as much as 0.04 deg/m and 0.4 %). Their accuracy was only slightly different in the two datasets. However, the approximated 5-point algorithm was almost 9 times faster than its originals in outlier rejection. Even though the approximated algorithm itself was nearly 15 times faster in the benchmark, it became 9 times faster in outlier rejection because of the shared computation of RANSAC, N te, in Equation (9). Overall, visual odometry with the approximated version was nearly 2 times faster than visual odometry with the 5-point and 8-point algorithms. This was because VISO2-M spent almost 74 milliseconds (in the KITTI dataset) on feature extraction and matching, which was another major computation. From the perspective of computing time, the 1-point algorithm was the best, but its accuracy was worse than the approximated 5-point algorithm's. The performance of the 1-point and 3-point (R′2t′2) algorithms was quite different in the two datasets: better in the indoor dataset and worse in the on-road dataset. The two algorithms provided reasonable accuracy in the indoor dataset compared to the 5- and 8-point algorithms, but they became quite worse in the on-road dataset. It resulted from more non-planar motion in the on-road dataset, which was also observed in our benchmark as shown in Figures 6 and 7. Finally, we also observed that our asymmetric kernel was better than the Gaussian kernel for scale estimation.


Table 3: In our experiments, we utilized nine methods including VISO2-M [4] and 1pt/1&2pt [5, 18]. T and τ are the number of iterations and the inlier threshold in RANSAC, which were adjusted to have the best performance in the two datasets. VISO2-M used the Sampson distance defined in the normalized image coordinate, so its threshold, 0.00001, is quite small compared to the others. Visual odometry '1pt/1&2pt' also adopted another error function, a simple geometric error [18], whose threshold is 0.4 degrees. In the experimental results of each dataset, R. Error and T. Error mean the averaged rotational and translational errors defined in [3], whose units are [deg/m] and [%], respectively. C. Time means the average of the total computing time from feature extraction to scale estimation, and t_RANSAC is the averaged computing time for only outlier rejection; their units are milliseconds.

Configuration:
Name | Outlier Rejection (Algorithm, M, T, τ) | Relative Pose Refinement | Scale Estimation
VISO2-M [4] | Norm. 8-pt algo, 8, 2000, 0.00001 | Norm. 8-pt algo | Gaussian
1pt/1&2pt [5] | 1-pt algo, 1, 20, 0.2 | 1-pt/2-pt algo | Asymmetric
3pt(R′1t′3)/8pt(N) | 3-pt algo (R′1t′3), 3, 200, 0.4 | Norm. 8-pt algo | Asymmetric
3pt(R′2t′2)/8pt(N) | 3-pt algo (R′2t′2), 3, 200, 0.4 | Norm. 8-pt algo | Asymmetric
5pt(Aprx)/8pt(N) | Aprx. 5-pt algo, 5, 400, 0.2 | Norm. 8-pt algo | Asymmetric
5pt[8]/8pt(N) | 5-pt algo [8], 5, 400, 0.2 | Norm. 8-pt algo | Asymmetric
5pt[9]/8pt(N) | 5-pt algo [9], 5, 400, 0.2 | Norm. 8-pt algo | Asymmetric
8pt/8pt(N) | 8-pt algo, 8, 2000, 0.2 | Norm. 8-pt algo | Asymmetric
8pt(N)/8pt(N) | Norm. 8-pt algo, 8, 2000, 0.2 | Norm. 8-pt algo | Asymmetric

Results (R. Error, T. Error, C. Time, t_RANSAC):
Name | KITTI Odometry Dataset (No. 0 - 10) | ualberta-csc-flr3-vision Dataset
VISO2-M [4] | 0.0241, 12.2910, 136.2176, 54.3616 | 0.3057, 7.7843, 86.3782, 49.2720
1pt/1&2pt [5] | 0.1377, 20.9022, 78.4252, 0.0869 | 0.2695, 5.3737, 36.1370, 0.0623
3pt(R′1t′3)/8pt(N) | 0.1472, 27.1240, 85.5055, 2.5599 | 0.9425, 17.1357, 40.1787, 1.9268
3pt(R′2t′2)/8pt(N) | 0.0356, 8.5931, 83.9500, 2.5751 | 0.2336, 5.2154, 38.9146, 1.9890
5pt(Aprx)/8pt(N) | 0.0124, 5.5956, 87.6342, 6.6617 | 0.2399, 4.1650, 42.2703, 5.4133
5pt[8]/8pt(N) | 0.0115, 5.4411, 154.5291, 72.9371 | 0.2802, 4.5548, 106.1110, 68.9995
5pt[9]/8pt(N) | 0.0115, 5.5480, 140.4536, 57.9916 | 0.2571, 4.4264, 85.4588, 48.0260
8pt/8pt(N) | 0.0145, 5.8645, 135.4325, 52.2455 | 0.2898, 6.3030, 83.3840, 46.0656
8pt(N)/8pt(N) | 0.0155, 5.7668, 141.1565, 57.5542 | 0.3116, 5.3972, 89.6711, 52.1340

Fig 9: Rotational and translational errors of five representative methods are presented with respect to all possible subsequences of the given lengths: (a) rotational and (b) translational errors for the KITTI odometry dataset, and (c) rotational and (d) translational errors for the ualberta-csc-flr3-vision dataset.

In the two datasets, the translational error with the asymmetric kernel was improved almost 2 times over the original Gaussian kernel, as shown in Figure 9 (b) and (d).

6. CONCLUSION

In this paper, we introduced simplified epipolar geometry and evaluated it on synthetic and real data. From our motion analysis and experiments, we verified that 2 and 3 DoF motion models are often over-simplifications. We proposed less simplified motion models (4 and 6 DoF) which can still take advantage of the prior knowledge for ground vehicles: nearly planar and smooth motion. Such a motion prior makes our proposed relative pose estimators faster and more accurate, and we demonstrated their effectiveness in two types of experiments: on the algorithms themselves and on monocular visual odometry. We believe that our work contributes to building a bridge between the previous full DoF algorithms and the 2 and 3 DoF algorithms.

From our experiments, we would like to conclude that the approximated 5-point algorithm is the best choice for monocular visual odometry for ground vehicles in view of computing time and accuracy. In addition, the more simplified algorithms can be applied to ground vehicles if the vehicles work on quite flat planes (e.g. indoors) with little camera vibration. Especially, even though circular motion is not valid, the 1-point algorithm seems quite helpful as a preprocessing step [15] to roughly filter out outliers because of its computational superiority. We also think that our proposed algorithms can be applied to embedded devices because of their low computational burden.

As further work, we will investigate fast feature extraction/matching (or direct methods) and local bundle adjustment for more accurate monocular visual odometry.


Fig 10: Two example trajectories: (a) the 7th image sequence in the KITTI odometry dataset and (b) the ualberta-csc-flr3-vision dataset. Each plot compares the true trajectory with those of VISO2-M, 3pt(R′2t′2)/8pt(N), 5pt(Aprx)/8pt(N), 5pt(St06)/8pt(N), and 8pt(N)/8pt(N).

ACKNOWLEDGEMENT
The authors would like to thank Dr. Andreas Geiger, Dr. Nghia Ho, Dr. Henrik Stewenius, and Jonathan Klippenstein for their open-source software and datasets.

REFERENCES
[1] M. S. Ryoo, S. Choi, J. H. Joung, J.-Y. Lee, and W. Yu, "Personal driving diary: Automated recognition of driving events from first-person videos," Computer Vision and Image Understanding, vol. 117, pp. 1299-1312, 2013.
[2] Y. Cheng, M. W. Maimone, and L. Matthies, "Visual odometry on the Mars exploration rovers," IEEE Robotics and Automation Magazine, vol. 13, pp. 54-62, 2006.
[3] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI Vision Benchmark Suite," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
[4] A. Geiger, "LIBVISO2: Library for VISual Odometry 2." [Online]. Available: http://www.cvlibs.net/software/libviso2.html
[5] D. Scaramuzza, F. Fraundorfer, and R. Siegwart, "Real-time monocular visual odometry for on-road vehicles with 1-point RANSAC," in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2009.
[6] D. Ortin and J. Montiel, "Indoor robot motion based on monocular images," Robotica, vol. 19, pp. 331-342, 2001.
[7] D. Nister, "An efficient solution to the five-point relative pose problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, pp. 756-770, 2004.
[8] H. Li and R. Hartley, "Five-point motion estimation made easy," in Proceedings of International Conference on Pattern Recognition (ICPR), 2006.
[9] H. Stewenius, C. Engels, and D. Nister, "Recent developments on direct relative orientation," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 60, pp. 284-294, 2006.
[10] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[11] S. Choi, T. Kim, and W. Yu, "Performance evaluation of RANSAC family," in Proceedings of British Machine Vision Conference (BMVC), 2009.
[12] "The Rawseeds Project." [Online]. Available: http://www.rawseeds.org/home/
[13] A. Howard and N. Roy, "The Robotics Data Set Repository (Radish)," 2003. [Online]. Available: http://radish.sourceforge.net/
[14] S. Choi, J. Lee, J. H. Joung, M. Ryoo, and W. Yu, "Numerical solutions to relative pose problem under planar motion," in Proceedings of International Conference on Ubiquitous Robot and Ambient Intelligence (URAI), 2010.
[15] J. Civera, O. G. Grasa, A. J. Davison, and J. M. Montiel, "1-point RANSAC for EKF-based structure from motion," in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2009.
[16] D. Scaramuzza, "1-point-RANSAC structure from motion for vehicle-mounted cameras by exploiting non-holonomic constraints," International Journal of Computer Vision, vol. 95, pp. 74-85, 2011.
[17] G. H. Lee, F. Fraundorfer, and M. Pollefeys, "Motion estimation for self-driving cars with a generalized camera," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[18] S. Choi, J. H. Joung, W. Yu, and J.-I. Cho, "What does ground tell us? Monocular visual odometry under planar motion constraint," in Proceedings of the International Conference on Control, Automation, and Systems (ICCAS), 2011.

[19] A. Geiger, J. Ziegler, and C. Stiller, "StereoScan: Dense 3D reconstruction in real-time," in Proceedings of IEEE Intelligent Vehicles Symposium (IV), 2011.
[20] S. Choi, J. Park, and W. Yu, "Resolving scale ambiguity for monocular visual odometry," in Proceedings of International Conference on Ubiquitous Robot and Ambient Intelligence (URAI), 2013.
[21] H.-K. Lee, K. Choi, J. Park, and H. Myung, "Self-calibration of gyro using monocular SLAM for an indoor mobile robot," International Journal of Control, Automation, and Systems (IJCAS), vol. 10, pp. 558-566, 2012.

Sunglok Choi received a B.S. degree in mechanical and aerospace engineering from Seoul National University, Seoul, South Korea in 2006, and received an M.S. degree in robotics from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea in 2008. From 2006 until January 2008, he was a research assistant in the Robot Intelligence Technology Lab at KAIST. Since 2008, he has been a research scientist at ETRI, Daejeon, South Korea. His research interests include autonomous navigation, geometric computer vision, and robust regression.

Jaehyun Park received a B.S. degree in computer engineering from Dongeui University, Busan, South Korea in 2008 and received a Ph.D. degree in electrical engineering from Pusan National University in 2012. Currently, he is a research scientist at ETRI, Daejeon, South Korea. His current research interests include localization, 3D perception, and intelligent systems.

Wonpil Yu is with the Intelligent Cognitive Technology Research Department, ETRI, where he is currently developing autonomous navigation for urban and unstructured environments. Prior to joining ETRI in 2001, he worked for ADD in Daejeon, South Korea, where he was involved in the development of a precision stabilizer for a radar seeker system. He received the B.S. degree in control and instrumentation engineering from Seoul National University, Seoul, South Korea in 1992 and M.S. and Ph.D. degrees in electrical engineering from KAIST, Daejeon, South Korea in 1994 and 1999, respectively. His areas of research include robot vision, image processing, and robot navigation.
