Appearance-Based Topological Bayesian Inference for Loop-Closing Detection in a Cross-Country Environment

Cheng Chen and Han Wang
Intelligent Robotics Lab, School of EEE, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798
[email protected], [email protected]

Abstract

In this paper, an appearance-based environment modeling technique is presented. Based on this approach, probabilistic Bayesian inference can work together with a symbolic topological map to relocalize a mobile robot. One prominent advantage of this algorithm is that it can be applied to a cross-country environment where no features or landmarks are available. Furthermore, loop-closing can be detected independently of the estimated map and vehicle location. High-dimensional laser measurements are projected into a low-dimensional space (the map space) which describes the appearance of the environment. Since laser scans from the same region share a similar appearance, after the projection they are expected to form a distinct cluster in the low-dimensional space. This small cluster essentially encodes the appearance of the specific region in the environment, and it can be approximated by a Gaussian distribution. This Gaussian model can serve as the “joint” between the topological map structure and probabilistic Bayesian inference. By employing such “joints”, Bayesian inference at the metric level can be conveniently implemented at the topological level. Being based on appearance, the proposed inference process is thus completely independent of local metric features. Extensive experiments were conducted using a tracked vehicle traveling in an open jungle environment. Results from live runs verified the feasibility of using the proposed methods to detect loop-closing. The performance is also reported and thoroughly analyzed.

1. Introduction
During the process of simultaneous localization and mapping (SLAM), when a vehicle revisits a place (called closing a loop), it needs to associate its observations with the environment it mapped previously. This problem is often referred to as loop-closing detection or the revisiting problem, and it is widely acknowledged as a major problem within the SLAM community. Given consecutive measurements from a 2D range scanner and inertial sensors, the goal of this work is to localize the loop-closing at the topological level. More specifically, after dividing the map into a series of topological nodes, our objective is to identify the node where loop-closing takes place. In this work, the challenge comes from the fact that no features or landmarks can be robustly detected in the cross-country environment. Here we compare the testing field with two other popular outdoor environments, as shown in Figure 1. It can be seen that, in both Victoria Park and the car park, there exist quite a few observable landmarks such as trees, walls and corners. However, in the open jungle environment, there are no apparent geometrical patterns. The first contribution of this work is an appearance model built from raw 2D range scans. All the measurement frames are segmented into a sequence of groups. Each group corresponds to a certain region in the environment; in the mapping context, it is regarded as a submap of the environment. This process is illustrated in Figure 2. Subsequently, all measurements are projected into a low-dimensional space using principal component analysis (PCA). Suppose we have measurement frames from two submaps A and B, as shown in Figure 3(b); for a typical 2D laser range scan at a resolution of 0.5 degrees, each frame contains 361 data points. A 3D coordinate frame is employed to illustrate this high-
KEY WORDS—appearance, SLAM, topology, Bayesian, PCA, loop-closing detection
The International Journal of Robotics Research Vol. 25, No. 10, October 2006, pp. 953-983 DOI: 10.1177/0278364906068375 ©2006 SAGE Publications
Fig. 1. A comparison of environments where SLAM is performed. Sub-figures (a), (b) and (c) are photos taken at Victoria Park, Sydney; car park E, NTU; and a cross-country environment. Sub-figures (d), (e) and (f) are, respectively, the raw 2D range scans taken from these environments.
Fig. 2. The consecutive scans collected when the vehicle moves are segmented into a sequence of submaps. This procedure will be elaborated in Section 3.
Fig. 3. Combining symbolic map topology with probabilistic Bayesian inference. (a) The topological graph built by segmenting the successive 2D range scans. For each new 2D scan, the conditional probability of observing this frame conditioned on each node can be calculated. This probability model makes it possible to build a Bayesian network (e) at the topological level.
dimensional measurement space, which is denoted as x−y−z. By conducting PCA, these measurement frames are projected into a low-dimensional (here, 2D) space where it is more convenient to segment them. This space is called the map space, denoted as x′−y′; see Figure 3(c). In the map space, projected measurements from the same submap are expected to gather within a compact cluster. For each of the above clusters, the distribution is approximately Gaussian, as in Figure 3(d). Compared with the huge quantity of raw range data collected in a submap, the mean/variance representation can drastically reduce the computational complexity, yet still capture the information contained in the raw data. The second contribution of this paper is to utilize such an appearance model to bridge the gap between the topological map representation and probabilistic Bayesian inference.
It is known that the map topology is inherently symbolic, which is difficult to infer in a numerical way. The above Gaussian distributions make it possible to build probabilistic observation models for the map topology. With these models, the popular metric level probabilistic Bayesian inference can be conveniently transplanted to the topological level, see Figure 3(e). Informed decisions can thus be made for loop-closing detection given a sequence of measurements. We demonstrate that, by combining these two techniques, the advantages of both are exploited: the presented algorithm is capable of performing Bayesian inference on a topological level, without any reliance on metric-level features. This characteristic enables the algorithm to detect loop-closing in a cross-country environment, where feature-extraction algorithms are prone to yield poor performance. Additionally, the
proposed algorithm uses sensory information that is outside the central SLAM estimation loop, so it does not use a potentially erroneous state (vehicle pose) to make decisions regarding the fusion of measurements. Therefore, even if the vehicle's self-location provided by SLAM has a large error, the loop-closing detection algorithm can still work properly. This paper is organized as follows. The next section gives some background on loop-closing detection; Section 3 describes how the map topology is constructed; Section 4 sets out the appearance modeling process for the map topology using 2D laser scans; Section 5 explains how to incorporate the appearance-based techniques into a Bayesian inference framework; finally, the results and performance analysis are given in Section 6.
2. Background

Simultaneous localization and mapping (SLAM) algorithms try to build a model of the environment and concurrently localize the robot itself. From the fundamental work by Smith, Self, and Cheeseman (1988) and Leonard, Durrant-Whyte, and Cox (1991), to the recent convergence proof by Dissanayake et al. (2001), many algorithms addressing different SLAM issues have been developed. According to the widely referenced convergence proof presented by Dissanayake et al. (2001), when the robot revisits a place and propagates the error in the loop-closing back to all the components of the map, the map becomes more correlated. As this process iterates, the components inside the map finally become fully correlated, and thus the map converges. Given the role loop-closing plays, it is widely acknowledged as crucial to solving the SLAM problem. Before closing a loop, the robot must first detect it; more specifically, the robot needs to associate its current observation with a certain part of the environment it mapped some time ago. As elaborated in Kuipers and Beeson (2002), the difficulties lie in two aspects: perceptual aliasing, in which different places appear the same; and measurement variability, in which the same place appears different. When the algorithm cannot handle perceptual aliasing, it may take an unexplored place for somewhere already mapped and give a false positive report. On the other hand, if the algorithm is too conservative, measurement variability becomes difficult to deal with; the algorithm may then report a false negative, i.e., the vehicle cannot detect the loop-closing although it is already in the mapped place. Various schemes have been proposed to accurately detect loop-closing. Most of them fall into two classes. The first is to exploit information from a single observation and then make a deterministic judgement.
The second, used when a single frame of measurement is not sufficient, is to accumulate information from a sequence of observations and then make a "batch" decision. A review of these two techniques follows.
2.1. Deterministic Methods

A feature map is the most popular way to describe an environment and thus detect loop-closing. In a feature map, the environment is modeled as a combination of certain geometrical patterns such as circles (Guivant and Nebot 2001), corners (Arras et al. 2003), lines (Jensfelt and Kristensen 2001) and, more recently, polylines (Veeck and Burgard 2004). A more general corner detection technique has also been developed by Madhavan and Durrant-Whyte (2004), which can be regarded as a 1D version of SIFT (Lowe 2004). For visual sensors, Zhou, Wei, and Tan (2003) developed the multi-dimensional histogram to represent the rich information within the observed image, such as colors, edges and textures. Lamon et al. (2001) introduced a low-dimensional representation called an image fingerprint sequence for measuring the similarity between image frames. A similar comparison can also be applied to the image histogram, as proposed by Ulrich and Nourbakhsh (2000). For a map representation such as an occupancy grid (Elfes 1987; Konolige 1997), Gutmann and Konolige (1999) used correlation to detect possible matches between the robot's current observation and the previously built map. Such a correlation-based technique was also used by Duckett and Nehmzow (2001), in which the correlation was applied to histograms of occupancy grids. Similar to the grid map, point matching has also recently been used in SLAM to serve as the observation model in Lu and Milios (1997) and Thrun (2001). From the perspective of tracking, loop-closing can be detected when some already-mapped features fall into the uncertainty gate of their predictions (Guivant and Nebot 2001); such techniques essentially calculate a weighted distance between the observed feature and the estimated one.
Since static features always have fixed positions relative to each other, Neira and Tardós (2001) proposed the joint compatibility test to exploit the inter-feature relations.

2.2. Non-Deterministic Methods

If one frame of measurement is not sufficient to make a reasonable judgement, a straightforward alternative is to accumulate information over time. A sequence of measurements taken at different times can be analyzed in a "batch" manner to verify the loop-closing. A multiple-hypothesis-tracking based loop-closing detection algorithm was proposed by Tomatis, Nourbakhsh, and Siegwart (2002). The possible closing positions compete with each other until two of them finally become dominant; these two represent the vehicle's current position and the corresponding position in the already-built map. Markov localization, proposed by Fox, Burgard, and Thrun (1998), is another implementation of the above idea to localize a mobile robot. In contrast to Fox, who used only a single beam of laser range, Gutmann and Konolige (1999) used the correlation between the map and a whole frame of measurement for the observation model. Topological Bayesian inference reduces the high-dimensional robot pose space to the space of topological nodes, whose dimensionality is much lower. The combination of map topology and Bayesian inference is found in Kuipers and Beeson (2002), in which the authors used unsupervised learning to let the algorithm learn by itself how to map observations onto different topological nodes. Unfortunately, with such a scheme it is not yet known how serious the perceptual aliasing problem could be. Recently, Modayil, Beeson, and Kuipers (2004) combined the dynamic Bayesian network with map topology to build a large-scale map. In the Monte Carlo sampling scheme, the distribution of the vehicle's pose within the map is represented as a set of weighted particles (Doucet, de Freitas, and Gordon 2001). Thrun, Fox, and Burgard (2000) first introduced the Monte Carlo approach to robot localization, and demonstrated attractive robustness and efficiency. Ranganathan and Dellaert (2004, 2005) further combined Monte Carlo sampling with map topology under the Markov localization framework; by doing so, the correct map topology can be learned from the space of all possible topologies. Stewart et al. (2003) recently developed a hierarchical Bayesian approach for the revisiting problem. This approach divides the environment into a set of connected local map patches, and a hidden Markov process models the transitions between these patches. However, it is not yet clear how well the priors over this map structure can be learned.
3. Building the Map Topology

Segmenting the observed measurements (2D range frames) into different places is a crucial procedure in submap-based SLAM (Kuipers et al. 2004; Beeson, Jong, and Kuipers 2005). From the topological perspective, such segmentation abstracts continuous sensory experience into a graph of atomic structures, and these structures constitute the basic components of a topological map. From the perspective of learning, just as with other appearance-based techniques, the loop-closing detection technique in this paper needs a supervised learning process to teach itself how to distinguish measurement frames from different regions. To perform such supervised learning, the sequence of measurement frames must first be automatically labeled into distinct places (submaps). All of these labeled frames then form a pool of training samples, from which further classification rules are learned. An important property of the labeling process in this work is that it is carried out online and incrementally, so the map structure is to some extent encoded in the vehicle's trajectory. If the incoming observations are only labeled according to the
similarities between them (Kuipers and Beeson 2002), e.g., using a clustering technique such as K-means (Duda, Hart, and Stork 2000), the map structure information acquired as the vehicle travels cannot be incorporated in the training process. For example, given two similar measurement frames taken far from each other, a purely appearance-based classification technique will label them as coming from the same place. On the other hand, if the map is only segmented according to the map structure, e.g., the volume of the submap as in Guivant and Nebot (2001), observations within the same submap could be distinct, and consequently the topological reasoning would be difficult to carry out. In this paper, the above two labeling strategies are integrated: either a change in the environment's structure or a change in the exterior sensor observations will divide the vehicle's experience into disjoint segments, i.e., initialize a new submap. Here, road intersections are used to detect shifts in the environment's structure. Such intersections are indicated by changes in the heading direction of the vehicle, because the vehicle is supposed to always navigate along the road; when its heading changes greatly, a reasonable assertion is that the vehicle has moved from one place to another. Appearance-based segmentation is not trivial because of measurement variability. The desired algorithm must capture the major structure of the input range scan, which is often encoded in the low-frequency domain, while being insensitive to local distractors, which lie in the high-frequency domain. In this paper, a wavelet transform is employed to remove those high-frequency details and preserve the structural information of the range scan.
The wavelet transform is a well-understood technique for information compression and noise removal. Here we apply the 3-level db1 wavelet to each frame of 2D range scan, yielding a vector whose length is only 1/8 of the original measurement, as in Figure 4. By comparing the Euclidean distances between these vectors, the shifts between submaps can be detected. Interested readers can turn to Daubechies (2002) for more details. Readers may wonder why two dimensionality reduction techniques, PCA and the wavelet, are both used here. Note that the above wavelet segmentation does not have any recognition capability: it is only employed to detect "new" regions, which results in the segmentation of the map into a topological graph. Whether a detected new region has already been mapped can only be answered by going through all the previously acquired information; this recognition process is elaborated in the following section. The segmentation strategy was tested in a large cross-country jungle environment, as detailed in Section 6. During the trial, 19,053 frames of 2D scans were collected; the total length of the trajectory was over 3500 meters. A reference map was built to illustrate the shape of the environment, as shown in Figure 5. As indicated, there is a loop in this trajectory.
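The 3-level db1 compression and the scan-to-scan dissimilarity check can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the padding of odd-length scans, and the absence of a detection threshold are our assumptions. The db1 wavelet is the Haar wavelet, whose approximation coefficients at each level are scaled averages of adjacent samples, so three levels compress a scan to roughly 1/8 of its length.

```python
import numpy as np

def haar_approx(x, levels=3):
    """Level-`levels` db1 (Haar) approximation coefficients of a scan.

    Each level averages adjacent pairs (scaled by sqrt(2)) and drops
    the high-frequency detail band, halving the length; three levels
    therefore compress the scan to about 1/8 of its original size.
    """
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        if len(a) % 2:                 # pad odd-length signals (our choice)
            a = np.append(a, a[-1])
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    return a

def dissimilarity(scan_a, scan_b, levels=3):
    """Euclidean distance between compressed scans; a large value
    suggests a transition from one submap to another."""
    return np.linalg.norm(haar_approx(scan_a, levels)
                          - haar_approx(scan_b, levels))
```

In use, a peak of `dissimilarity` between successive frames would trigger the initialization of a new submap; the paper does not state the threshold, which would have to be tuned.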
Fig. 4. The db1 wavelet used in this work, showing the raw data and the level 1, 2 and 3 approximations. Because a 3-level wavelet is employed, the raw data are finally compressed to a vector whose size is only 1/8 of the original. It can be seen that the high-frequency noise is mostly removed in the compressed result.
The whole map is finally segmented into 35 submaps, based on both the heading direction of the vehicle and the similarities between consecutive measurements. The result of this online segmentation is plotted in Figures 6 and 7. The solid dots represent the changes of the vehicle's heading, ranging from −π to π. The thin curve is the change of the sensor measurements; the peaks of this curve represent large dissimilarities between successive observations, which indicate possible transitions from one submap to another. How the submaps are segmented can be easily observed from this figure: for example, submap No. 25 is initialized when both the vehicle's heading and the range observations change sharply; for submap No. 19, the heading direction of the vehicle does not change much, but the wavelet check reports a high dissimilarity.
Although the above submap representation makes it convenient to build an observation model for the map topology, it is difficult to encode the vehicle's metric-level motion into the Bayesian inference. For example, a submap can be long enough that, given a reading from the inertial sensors, it is impossible to predict whether the vehicle is still inside this submap or has moved out of it. In this paper, a topological node is defined as a segment of a submap with a fixed resolution. By dividing the submap into topological nodes, it is much more convenient to incorporate the vehicle's inertial sensor measurements into the Bayesian framework. For example, in this paper the topological node's length is set to 10 meters, so if the vehicle is reported to have moved 12.7 meters, it can be predicted that the vehicle has probably moved to a certain nearby node.
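The paper does not specify how a travelled distance is converted into a prediction over node transitions. One simple hypothetical scheme, shown only for illustration (the function name and the linear split of probability mass are our assumptions), spreads the belief between the two nearest whole-node shifts:

```python
def predict_node_shift(distance, node_length=10.0):
    """Hypothetical motion model: distribute probability over the two
    nearest whole-node shifts for a forward travelled `distance`.

    E.g. 12.7 m with 10 m nodes gives mostly a one-node shift,
    with some mass on a two-node shift.
    """
    steps = distance / node_length     # fractional number of nodes moved
    lo = int(steps)                    # assumes forward (positive) motion
    frac = steps - lo
    return {lo: 1.0 - frac, lo + 1: frac}
```

A real implementation would also account for odometry noise, e.g. by widening this distribution; the sketch only captures the discretization step described in the text.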
Fig. 5. The environment where the trial was conducted; the start and end of the trajectory and the loop-closing location are annotated. This map was built by rendering the 2D laser scans onto the vehicle poses read from GPS/INS, which can be regarded as ground truth. Please note that this map is only used for reference and illustration purposes. It does not provide any information to the loop-closing detection algorithm.
Fig. 6. The incoming measurements are segmented according to either the similarities between them or the changes of the vehicle's heading direction. The dissimilarities computed from the wavelet coefficients are in fact much larger; to display them in a single figure with the heading changes, the dissimilarities are scaled down. For clarity, only one of every three submap IDs is marked.
Fig. 7. The submaps are represented as rectangles; the size of each rectangle is determined by the length and maximum width of the road boundary.
4. Appearance Model for the Topological Map

4.1. Advantages of Using an Appearance Model

In contrast to techniques based on a geometrical model (Guivant and Nebot 2001; Lamon et al. 2001; Duckett and Nehmzow 2001), which try to register the sensor measurements to a model of the environment, appearance-based techniques (Krose et al. 2001; Crowley, Wallner, and Schiele 1998; Porta, Verbeek, and Krose 2005) are not designed to capture the relations between observations and the map geometry. Instead, they directly build the environment's representation in the sensor space, i.e., the space spanned by the sensor values themselves. For example, an appearance-based approach can learn what makes the observations in a corridor different from those in a square room, but it does not necessarily distinguish doors or walls. From the perspective of feature extraction, the "features" extracted by appearance-based techniques could be uninformative to human eyes. The advantage of appearance-based techniques is that the a priori knowledge required to model the map, such as the definition of walls and corners, is no longer necessary. In other words, appearance-based techniques can work without the conventional "feature extraction" routine; this is highly beneficial when the vehicle navigates in cross-country environments where features are difficult to extract. Principal component analysis (PCA) is a widely used tool to handle high-dimensional measurement spaces. A PCA-based recognition/localization algorithm was originally introduced in the computer vision community by Turk and Pentland (1991). Thereafter, PCA and its derivatives have achieved tremendous popularity in the pattern recognition domain, and have been successfully applied to various artificial intelligence applications, e.g., face recognition (Yang et al. 2004), object detection (Ali and Shah 2005) and bio-informatics (Yeung 2001). Also using PCA, Vlassis and Krose (1999) proposed a robot localization algorithm which used appearance information to localize a mobile robot in an indoor environment. A similar implementation for 2D range data was developed by Crowley, Wallner, and Schiele (1998), in which synthetic range scans were calculated and used to train the appearance model. To the authors' knowledge, that was the first time 2D range data were used for appearance-based mobile robot localization. Although innovative, this approach is not completely appearance-based because it still relies on a composite range map to generate synthetic scans, and building such a map could be quite challenging in an outdoor environment. A known problem of the original PCA is that it can be time-consuming to build the eigenspace as the robot travels; Artac, Jogan, and Leonardis (2002) employed an incremental approach to conduct PCA for image data. The appearance-based solution most similar to ours is the one proposed by Krose et al. (2001), in which a sophisticated algorithm is presented to calculate the probability of observing a certain scene given a robot pose. This approach,
which used a panorama camera, showed attractive localization results in an office environment. However, we believe its performance could be even better if the topological structure of the environment were incorporated in the localization process, as presented in this paper. Using a laser scanner's range data for PCA can outperform image-based PCA in two ways. First, the laser scanner's measurement frame is much smaller than an image frame: a typical 2D scan contains 361 range readings, whereas a typical 320 × 240 gray image contains 76,800 pixels. Second, images captured from cameras are prone to being affected by illumination conditions, so PCA may give false results under variable light; range data, on the other hand, are not sensitive to illumination.

4.2. Eigen-Representation

Suppose that at time t, the labeling approach discussed in Section 3 has segmented m submaps, $S_1, S_2, \ldots, S_m$; each of them comprises a series of measurements $s_{i,1}, s_{i,2}, \ldots, s_{i,n(i)}$, where $i = 1, \ldots, m$. Each measurement is a typical laser scan including 361 scalar range readings, one for each angle at a 0.5 degree resolution (this may differ for other operational modes of the sensor). It can be regarded as a vector of dimension 361, or, equivalently, a point in a 361-dimensional space. One advantage of the appearance modeling is that it can automatically handle out-of-range data, for which the laser beams are not reflected by any object. Such readings are often encoded as a special value by the sensor; appearance modeling processes these values without any discrimination and lets the data manifest the low-dimensional manifold by themselves. Frames of each submap will not be randomly distributed in this huge measurement space and thus can be described by a relatively low-dimensional subspace.
PCA can find the vectors that best account for the distribution of frames within the entire measurement space. These vectors define a subspace of the measurement space. Each vector is of length 361, and is a linear combination of the original measurements. At time t, the average frame of the whole measurement set is computed by:

$$\Psi_t = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n(i)} s_{i,j}}{\sum_{i=1}^{m} n(i)} = \left[\psi_t^1\ \psi_t^2\ \cdots\ \psi_t^{361}\right]^T. \tag{1}$$

As the vehicle moves and new range scans are observed, this average frame evolves over time, as in Figure 8. The j-th frame of the i-th submap differs from this average by a vector:

$$\phi_{i,j} = s_{i,j} - \Psi_t = \left[\phi_{i,j}^1\ \phi_{i,j}^2\ \cdots\ \phi_{i,j}^{361}\right]^T.$$

Then all the $\phi_{i,j}$ from all the submaps are subjected to PCA, which seeks a set of normal vectors that can best describe the distribution of the data.
In most cases, only the dominant part of the distribution is necessary; the details can be ignored (Turk and Pentland 1991). So the set of normal vectors which describes the distribution of the data can be much smaller than the original measurement data. In this implementation, the first λ eigenvectors corresponding to the λ biggest eigenvalues are chosen. These eigenvectors are denoted $u_t^1, u_t^2, u_t^3, \ldots, u_t^\lambda$, where:

$$u_t^k = \left[u_t^{k,1}\ u_t^{k,2}\ \cdots\ u_t^{k,361}\right]^T, \quad k = 1, 2, \ldots, \lambda. \tag{2}$$

These eigenvectors define a space with dimensionality λ, which represents the most predominant information about the measurements. In this paper, all the map modeling is conducted in this space; for convenience, it is referred to as the map space. The first four eigenvectors are shown in Figure 9. These eigenvectors capture the statistical features of the measurements: e.g., in the first sub-figure, the points in the lower part are much denser than those in the upper part, because this is a common property shared by all measurement frames (due to the fixed angular resolution of the 2D laser scanner); and the four sub-figures have the basic shape of a road, because for most of the time the vehicle traveled in a road-like environment. When a new measurement $s_x$ is available, it is projected into the map space by a simple operation:

$$w_x^k = (u_t^k)^T (s_x - \Psi_t) \tag{3}$$
where $k = 1, 2, \ldots, \lambda$. This describes a set of point-by-point multiplications and summations. These scalar weights form a low-dimensional vector which can be used to represent the measurement frame $s_x$:

$$W_x = \left[w_x^1\ w_x^2\ \cdots\ w_x^\lambda\right]^T. \tag{4}$$

The vector $W_x$ essentially describes the contribution of each eigenvector in representing the input measurement frame $s_x$, by treating the eigenvectors as a basis set for measurement frames. This low-dimensional vector is the core of our appearance model. It provides SLAM with a convenient tool to represent measurement frames, as well as the local environment (by averaging all the measurement frames inside a submap). For convenience, in this context, such a projection of a measurement frame in the eigenspace is called an eigenframe. As can be observed, this modeling process is completely independent of any metric features or landmarks.

4.3. Computing the Probabilistic Observation Models

A probabilistic model in the map space is necessary for a Bayesian inference process. Given a certain submap $S_i$, where $i \in \{1, \ldots, m\}$, we can project all measurement frames within this submap into the map space using (3). This yields n(i) vectors of length λ, denoted:

$$W_i^1, W_i^2, \ldots, W_i^{n(i)}. \tag{5}$$
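Equations (1)-(4) amount to a standard PCA pipeline; a minimal NumPy sketch follows. The function names and the SVD route are our choices (the paper does not prescribe an implementation), and the eigenvectors of the sample covariance are obtained here as the right singular vectors of the centered data matrix.

```python
import numpy as np

def build_map_space(frames, lam=2):
    """Eqs. (1)-(2): mean frame and top-`lam` eigenvectors of the scans.

    frames: (N, 361) array stacking all scans s_{i,j} from all submaps.
    Returns (psi, U): psi is the average frame (eq. 1) and U is a
    (361, lam) matrix whose columns are the eigenvectors u_t^k (eq. 2).
    """
    S = np.asarray(frames, dtype=float)
    psi = S.mean(axis=0)                      # eq. (1): average frame
    Phi = S - psi                             # deviations phi_{i,j}
    # Right singular vectors of the centered data are the principal axes,
    # ordered by decreasing singular value (hence decreasing eigenvalue).
    _, _, Vt = np.linalg.svd(Phi, full_matrices=False)
    U = Vt[:lam].T                            # eq. (2): first lam eigenvectors
    return psi, U

def project(scan, psi, U):
    """Eqs. (3)-(4): the eigenframe W_x of a raw scan s_x."""
    return U.T @ (np.asarray(scan, dtype=float) - psi)
```

In an online system the eigenspace would have to be rebuilt or updated incrementally as the mean frame evolves (cf. the incremental PCA of Artac, Jogan, and Leonardis cited above); this sketch recomputes it from scratch.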
Fig. 8. The average of all the collected measurements, computed over frames (a) No. 1-1001, (b) No. 1-1301, (c) No. 1-1601 and (d) No. 1-2001. It can be seen that as the vehicle moves forward, the average of the training pool evolves accordingly.
The center of this cluster corresponds to a vector which can best describe this submap:

\bar{W}_i = \frac{1}{n(i)} \sum_{j=1}^{n(i)} W_i^j.   (6)

This center's estimate comes with a variance, which is computed by:

\sigma_i = \frac{1}{n(i) - 1} \sum_{j=1}^{n(i)} \| \bar{W}_i - W_i^j \|^2   (7)

where σ_i can also be regarded as the trace of a diagonal covariance matrix which shows how these n(i) points are distributed in the map space. Here this distribution is approximated as Gaussian:

W_i \sim N\left[ \bar{W}_i, (\sigma_i)^2 \right].   (8)
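As a sketch (hypothetical function name, assuming the eigenframes of one submap are stacked in an array), eqs. (6)-(7) amount to:

```python
import numpy as np

def submap_model(eigenframes):
    """Gaussian appearance model of one submap, eqs. (6)-(8).

    eigenframes: (n_i, lam) array of the eigenframes W_i^j collected
    inside submap S_i.
    Returns the cluster center W-bar_i and the scalar spread sigma_i.
    """
    center = eigenframes.mean(axis=0)                          # eq. (6)
    # Mean squared distance to the center, eq. (7): a scalar playing
    # the role of the trace of a diagonal covariance matrix.
    sigma = np.sum((eigenframes - center) ** 2) / (len(eigenframes) - 1)
    return center, sigma
```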
Given an incoming measurement z_t at time t, after projecting it into the map space using (3), we obtain its eigenframe W_t. Let Y_t^i represent the fact that the vehicle is within submap S_i at time t; the probability of observing z_t
Chen and Wang / Loop-Closing Detection

[Figure 9 shows four panels, (a)-(d): the eigenvectors corresponding to the 1st, 2nd, 3rd and 4th biggest eigenvalues, plotted in x-y coordinates (unit: m).]

Fig. 9. The eigenvectors corresponding to the biggest 4 eigenvalues. Please note that these figures are expressive: they catch the basic shape of the measurements acquired in the testing field. Since these eigenvectors are not supposed to be at the scale of the original measurement frames, what is displayed here is the result after normalization.
5. Appearance-Based Bayesian Inference on a Topological Level
conditioned on Y_t^i can therefore be calculated as:

p(z_t \mid Y_t^i) = \exp\left( -\frac{1}{2} \, \frac{\|\bar{W}_i - W_t\|^2}{(\sigma_i)^2} \right).   (9)
This formulation is of great importance in vehicle localization. It provides a probabilistic way to model the connection between the 2D scanner’s observations and the places (nodes) in the topological map, without any knowledge about features or landmarks.
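A minimal sketch of this observational model (hypothetical names; the scalar sigma is the submap spread σ_i from eq. (7)):

```python
import numpy as np

def observation_likelihood(W_t, center, sigma):
    """Unnormalized likelihood of an eigenframe under a submap's
    Gaussian appearance model, eq. (9).

    W_t:    eigenframe of the current scan z_t.
    center: the submap's cluster center W-bar_i.
    sigma:  the submap's scalar spread sigma_i.
    """
    d2 = np.sum((center - W_t) ** 2)          # squared eigenspace distance
    return np.exp(-0.5 * d2 / sigma ** 2)
```

The value is 1 when the eigenframe sits exactly on the cluster center and decays with eigenspace distance; as noted for Figure 21, these raw values are only comparable across submaps after normalization.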
Because of perceptual aliasing and measurement variability, the probability in (9) alone is not sufficient to detect loop-closing. Therefore, to improve the robustness of loop-closing detection, a Bayesian inference process is introduced so that information acquired from a sequence of observations can be fused. In most cases, matching a sequence of measurements against the previously built map is an exhausting task, because the number of possible solutions can grow exponentially.
The Markov assumption offers an effective way to bypass the above matching problem. It assumes that only the one-step-previous actions/states can affect the vehicle's current state. This essentially divides the whole expensive matching problem into a chain of small matching problems. The correct candidate is expected to survive all the individual tests, while the false ones are not. Bayes' rule then provides an efficient way to connect these individual tests in a probabilistic manner, so that the matching results can be propagated through the whole Markov chain to the end.

5.1. Topological Bayesian Inference

From the perspective of loop-closing detection, the task of Bayesian inference is to localize the vehicle's current position within its previously built map (the topological network). A probability will be assigned to each topological node T^x; the goal of loop-closing is then to find which topological node T has the highest probability that the vehicle is currently within it:

T = \arg\max_x \; p(T_t^x \mid Z_t, U_{t-1}, SH_{t-1}), \quad x = 1, 2, \ldots, n   (10)

where n is the total number of topological nodes, Z_t is the whole set of observations up to time t, and SH_{t-1} and U_{t-1} are respectively the sets of detected shifts between submaps and transitions between topological nodes up to time t − 1. At each time instance t, Bayesian inference calculates the vehicle's position distribution over the topological node space, p(T_t | Z_t, U_{t-1}, SH_{t-1}). To be consistent with Section 4.2, here m is used to denote the total number of submaps, so this probability can be further marginalized over submaps:

p(T_t \mid Z_t, U_{t-1}, SH_{t-1}) = \sum_{j=1}^{m} p(T_t \mid Y_t^j, Z_t, U_{t-1}, SH_{t-1}) \, p(Y_t^j \mid Z_t, U_{t-1}, SH_{t-1}).   (11)
Since each topological node is inside a definite submap, the conditional probability p(T_t | Y_t^j) can be calculated as:

p(T_t \mid Y_t^j) = \begin{cases} p(T_t) & \text{if } T_t \in Y_t^j \\ 0 & \text{otherwise.} \end{cases}   (12)

This probability can be expressed by a function ν(T_t, Y_t^j) which takes the value 1 when T_t ∈ Y_t^j and 0 in other cases. Therefore, eq. (11) can be re-written as:

p(T_t \mid Z_t, U_{t-1}, SH_{t-1}) = \sum_{j=1}^{m} \nu(T_t, Y_t^j) \, p(T_t \mid Z_t, U_{t-1}, SH_{t-1}) \, p(Y_t^j \mid Z_t, U_{t-1}, SH_{t-1}).   (13)
The second item on the right side of (13) is the estimation of the vehicle's state in the topological node space. By applying Bayes' rule and assuming that the estimation problem is Markovian, it can be calculated as:

p(T_t \mid Z_t, U_{t-1}, SH_{t-1}) = p(T_t \mid z_t, Z_{t-1}, U_{t-1}, SH_{t-1}) \propto p(z_t \mid T_t, Z_{t-1}, u_{t-1}, sh_{t-1}) \, p(T_t \mid Z_{t-1}, u_{t-1}, sh_{t-1}).   (14)
Since the observation is not affected by the vehicle motion and previous observations, Z_{t-1}, u_{t-1} and sh_{t-1} in the first item on the right side can be omitted. We may further notice that the topological nodes do not have any appearance characteristics; therefore the observation z_t is actually independent of the topological node, and p(z_t | T_t) can be regarded as a constant:

p(z_t \mid T_t, Z_{t-1}, u_{t-1}, sh_{t-1}) = p(z_t \mid T_t) = c.   (15)
The second item on the right hand side of (14) calculates the prior over the topological nodes. As explained in Section 3, the topological nodes only model the vehicle's motions and do not encode any appearance information. Therefore, they are labeled continuously and the transitions on the topological level are completely independent of the shifts on the submap level. We can then drop the item sh_{t-1}, which gives:

p(T_t \mid Z_{t-1}, u_{t-1}, sh_{t-1}) = p(T_t \mid Z_{t-1}, u_{t-1}) = \sum_{i=1}^{n} p(T_t \mid T_{t-1}^i, u_{t-1}) \, p(T_{t-1}^i \mid Z_{t-1})   (16)

where p(T_t | T_{t-1}^i, u_{t-1}) is the transitional model in the topological node space, and p(T_{t-1}^i | Z_{t-1}) is the state of the topological node at the previous step. The second item on the right side of (11) is the estimation of the vehicle's state in the submap space. As with the estimation for the topological node in (14), this likelihood can be computed using Bayesian inference based on a Markovian assumption:
p(Y_t^j \mid Z_t, U_{t-1}, SH_{t-1}) \propto p(z_t \mid Y_t^j, Z_{t-1}, u_{t-1}, sh_{t-1}) \, p(Y_t^j \mid Z_{t-1}, u_{t-1}, sh_{t-1})   (17)
where the first item can be simplified into p(z_t | Y_t^j) because the observations are apparently independent of the vehicle's movement and previous observations. This probability represents exactly the observational model we constructed in Section 4. The second item p(Y_t^j | Z_{t-1}, u_{t-1}, sh_{t-1}) is the prior probability of the vehicle's state in the submap space. Given the fact that the motion between submaps is independent of the motion between topological nodes, we can drop the item u_{t-1}
and then this probability is calculated as:

p(Y_t^j \mid Z_{t-1}, u_{t-1}, sh_{t-1}) = \sum_{k=1}^{m} p(Y_t^j \mid Y_{t-1}^k, sh_{t-1}) \, p(Y_{t-1}^k \mid Z_{t-1}).   (18)

All the above inferences can be illustrated by the Bayesian inference network in Figure 10.

5.2. Motion Models in Submap Space and Node Space

It can be seen that, in contrast to conventional Markov localization (Fox, Burgard, and Thrun 1998), two kinds of transitional models (also referred to as motion models) exist here: transitions among submaps, and transitions among topological nodes. The transition between topological nodes is formulated as:

p(T^i \mid T^j, u_{t-1})   (19)
where i and j are the IDs of two certain topological nodes. This probability is used to model the predicted motion read from inertial sensors. Given an odometry input v_{t-1}, the number of topological nodes that the vehicle has traveled since the last time instance can be computed as:

u_{t-1} = \left[ \frac{v_{t-1}}{\phi_{node}} \right]   (20)

where φ_node is the fixed size of the topological node. This equation reveals the advantage of introducing the topological node level in the map representation hierarchy: by doing so, the continuous dead-reckoning process becomes discrete and therefore can be integrated with other discrete variables in the Bayesian inference process. Obviously u_{t-1} is not a very accurate estimate because of the round operator [·]; the transitional probability can thus be finally formulated as:

p(T^i \mid T^j, u_{t-1}) = \exp\left( -\frac{\left( u_{t-1} - (i - j) \right)^2}{\sigma_t^2} \right)   (21)
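Eqs. (20)-(21) can be sketched as follows. The function name is hypothetical, Python's round stands in for the paper's [·] operator, and we assume the exponent compares u_{t-1} with the node offset i − j:

```python
import numpy as np

def node_transition_prob(i, j, v, phi_node, sigma_t):
    """Transitional model between topological nodes, eqs. (20)-(21).

    v:        odometry translation since the last step (meters).
    phi_node: fixed topological node size (meters).
    sigma_t:  manually set confidence in the odometry-derived node count.
    """
    u = round(v / phi_node)                               # eq. (20)
    return np.exp(-((u - (i - j)) ** 2) / sigma_t ** 2)   # eq. (21)
```

With the paper's example (50 meters traveled, φ_node = 10), the probability peaks at an offset of 5 nodes and decays for 4 or 6, with σ_t controlling how sharply.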
where σ_t is a manually set parameter to model the confidence we have when we calculate the number of topological nodes from odometry. For instance, suppose the vehicle has traveled 50 meters and the topological node's size φ_node is 10 meters. Equation (20) will show that the vehicle has traveled 5 nodes, while actually it could be only 4 nodes. There is an ambiguity here, and σ_t is employed to model such uncertainty. Basically, the above equation gives a bias toward non-loop-closing. It assumes that if one loop-closing has happened, in the following few steps the vehicle's trajectory should be consecutive in terms of both time and geography; in other words, another loop-closing is not so likely to happen again. We recognize that this may be quite a strong assumption, especially in an indoor environment, which is quite compact and complicated. However, as we noticed in the experiment, in the outdoor jungle environment for which the proposed algorithm is designed, the environment is quite sparse and loop-closing does not happen so frequently. On the other hand, removing this component is equivalent to assuming that the motion of the vehicle could be "random" as a result of the potential loop-closing, and therefore the valuable odometry information would not be exploited in the inference process. The transitional probability from submap i to submap j is denoted as:

p(Y^i \mid Y^j, sh_{t-1})   (22)
where sh_{t-1} represents the report from the submap segmentation routine. Apparently, if a submap shift is detected, there could be two possible explanations: first, the detection is correct and the vehicle has moved to the next submap, with probability γ; or the detection is a false alarm and the vehicle is still in the current submap, with probability 1 − γ. The transitional probability can be calculated as follows:

p(Y^i \mid Y^j, sh_{t-1}) = \begin{cases} \gamma & i = j + 1 \\ 1 - \gamma & i = j \\ 0 & \text{else.} \end{cases}

Please note that in the above equation, the situation could exist in which Y^i is the first submap and Y^j is the last one; here i = j + 1 still stands.
6. Experimental Results

6.1. Platform

Experiments were carried out to test the performance of the appearance-based topological Bayesian inference. The platform for the experiment is a tracked vehicle. For testing purposes, the sensors were also mounted on a pickup (Figure 11) in the same layout as on the tracked vehicle. More details about the experiments can be found in Ng et al. (2004). Since the test platform is a tracked vehicle, its motion cannot be read through dead-reckoning. As elaborated in previous sections, the proposed technique utilizes only the local vehicle transformations, i.e., translation and steering. These data can be conveniently calculated from the measurements of the IMU. Although its error accumulates over time, a commercial IMU is already sufficient for the segmentation purpose. It must be noted that the GPS data used in this paper are only for reference and result analysis. During the experiments, no GPS/INS information is involved.

6.2. Testing Environment

The testing field is a square-like environment in a cross-country jungle. A photo of the testing field is shown in Figure 1. In the experiment, the vehicle traveled a trajectory of
Fig. 10. The Bayesian inference network in our topological Bayesian inference.

Fig. 11. The pickup used to simulate the layout of the sensors.

about 700 meters, and collected more than 4000 frames of range measurements. Because of some mechanical problems, the vehicle stopped for a short while during the trial, so we manually truncated the data frames corresponding to the duration when the vehicle stopped. In this environment, the map is segmented into 12 submaps. For the readers' convenience, we only plot the first 8 of them, which correspond to the first loop (see Figure 13). The topological nodes are labeled accordingly in Figure 14. As can be observed, the topological nodes are indexed sequentially throughout the map, rather than within each individual submap. This is consistent with the Bayesian inference network in Figure 10, which shows that the transitions in the topological nodes' space are independent of the submap space.

6.3. The λ

To explore the eigenvalue spectrum, in Figure 15 we plot the percentages representing how much variance the first n eigenvectors account for. This figure shows that, although we have a large quantity of data, the first few eigenvectors are already sufficient to describe them in the eigenspace. In this paper, the first 40 of them are used to represent the environment information; mathematically, λ = 40. As demonstrated, these 40 eigenvectors can provide about 90 percent of the total observed information. To examine how λ in (4) can affect the performance of the Bayesian inference, in Figure 16 we plot the estimation results under different λ. This figure shows that the estimation error is big when λ is either too small (e.g., 3 or 5) or too big (150 or 300). This agrees with the appearance model's theory: a too small λ means that only very little of the observed information is used in the calculation; in other words, a lot of valuable knowledge is ignored in the PCA process. Bayesian inference then gives poor estimations with insufficient information. In contrast, a large λ results in the incorporation of a lot of redundant information, such as noise and small distracters. Such redundant information can easily confuse the Bayesian inference and then lead to a false loop-closing report.
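The λ = 40 choice corresponds to keeping roughly 90 percent of the accumulated eigenvalue mass plotted in Figure 15. A sketch of that selection rule (hypothetical function name):

```python
import numpy as np

def choose_lambda(frames, target=0.90):
    """Pick the smallest lambda whose leading eigenvectors account for
    a target fraction of the total variance (cf. Figure 15)."""
    centered = frames - frames.mean(axis=0)
    # Singular values squared are proportional to the eigenvalues of
    # the sample covariance.
    s = np.linalg.svd(centered, compute_uv=False)
    ratios = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(ratios, target) + 1)
```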
[Figure 12 plots the vehicle trajectory in East/North coordinates (unit: m).]

Fig. 12. Map of the testing environment. As with the map in Figure 5, this map is built from GPS/INS data and is for reference use only.
[Figure 13 plots submaps M1–M8 (and submap 13) in East/North coordinates (unit: m).]

Fig. 13. The environment is divided into a set of submaps. Each submap has its own coordinate system, which is indicated by two arrows. The size of the submap is represented by a dashed rectangle.
[Figure 14 plots topological node indices (1, 4, 7, ..., 31) along the trajectory in East/North coordinates (unit: m).]

Fig. 14. Topological nodes are obtained by further segmenting the submaps at a fixed resolution. All the topological nodes are indexed sequentially; because of limited space, we plot only one index for every three topological nodes.
[Figure 15 plots the normalized accumulated eigenvalues (0 to 1) against the sorted eigenvector ID (0 to 400).]

Fig. 15. The X-axis represents the sorted ID of eigenvectors; the Y-axis represents the normalized accumulated eigenvalues up to the corresponding ID.
[Figure 16 plots, for λ = 3, 5, 60, 150 and 300, the error (in topological nodes) between the estimations and the GPS ground truth over estimations 2–18, with the initialization region marked.]

Fig. 16. The error between the estimation and the GPS ground truth. The X-axis is the ground truth; coordinates on the Y-axis represent the distances from the estimation to the vehicle's real position, in units of topological nodes. For example, the coordinate (x = 5, y = 2) means that at the fifth estimation, the error is 2 topological nodes away from the topological node in which the vehicle actually is.
In short, we can regard the appearance modeling as a process that collects only the information useful for distinguishing different places, and removes the useless information. Too small a λ over-removes, damaging the useful information, while too large a λ under-removes and produces false "features". Although such over-removal and under-removal situations exist, how to choose λ is actually not a problem: according to our experiments, for λ ranging from 20 to 80, the inference gives satisfactory results.
Two measurement frames from submaps No. 1 and No. 2 are depicted in Figure 19. Their corresponding eigenframes are shown in Figure 20. These two frames are distinct, which agrees with the fact that they are from different submaps. By comparing them with Figures 18(a) and (b), the high similarity between submap No. 1's eigen-representation and eigenframe No. 3161 can be observed. Such similarity is also found between submap No. 2 and eigenframe No. 3351. These facts prove the validity of detecting loop-closing in the mapspace.

6.5. Loop-Closing Detection Results
6.4. From Euclidean Space to Mapspace

The map modeling in eigenspace can capture most of the properties of the environment in the Euclidean space. In Figure 17, submaps No. 1 and No. 2 are depicted in global coordinates; it can be observed that these two submaps are very different. As a result, their representations in the eigenspace also demonstrate great dissimilarity, as in Figure 18.
We calculate the observation probability for each measurement frame conditioned on each submap; the results are shown in Figure 21. The flat lines in the center correspond to the time when the vehicle stopped (the sensor was still working). It can be noticed that there are at least 4 submaps with similar probabilities at the point of loop-closing (near measurement No. 3000). In this case, submap No. 1, which has the
[Figure 17 shows submaps No. 1 and No. 2 plotted in global East/North coordinates (unit: m).]

Fig. 17. The two different submaps in the global coordinates.

[Figure 18 shows the averaged eigenframe values (item IDs 2–14) for submaps No. 1 and No. 2.]

Fig. 18. The averaged eigenframes used to represent the two submaps in Figure 17. Since the submaps' appearances are completely different, their representations using eigenframes are also distinct.
[Figure 19 shows 2D scans No. 3161 and No. 3351 plotted in x-y coordinates (unit: m).]

Fig. 19. These two frames are from different submaps.
[Figure 20 shows eigenframes No. 3161 and No. 3351 (item IDs 0–40).]

Fig. 20. The eigenframes computed for the two range scans in Figure 19. These two eigenframes are significantly different. By comparing them with Figures 18(a) and (b), respectively, it can be seen that a measurement and the submap from which it is observed are close in the eigenspace. This property, along with the dissimilarities shown previously, demonstrates the validity of conducting loop-closing detection in the eigenspace.
[Figure 21 plots p(z|submap) for submaps 1–8 against measurement ID (1000–3500), with annotations marking where the vehicle stopped for a while, where each new submap is initialized (vertical lines), and the re-visiting of submap 1.]

Fig. 21. The observational probability for measurement z conditioned on different submaps. For the readers' convenience, only the first 8 submaps' curves are plotted. These probabilities are all high because they are not yet normalized. Please note that no GPS or other positioning sensors' measurements are used to acquire this result.
highest probability, is the correct result. However, it is still dangerous to simply use a nearest-neighbor criterion to detect the loop-closing. When possible loop-closings are detected, the topological Bayesian inference procedure is initialized to confirm them. The results of the Bayesian inference are depicted in the Appendix (Figures 24 to 35). In each figure, the probability distribution over the topological space is plotted. The corresponding geometric information about each topological node can be found in Figure 14. The vehicle's actual position read from GPS is marked by an arrow. The proposed algorithm's performance can therefore be observed by comparing the node with the highest probability against the indicated one. Given that the whole sequence of measurements verifies the loop-closing hypothesis, a batch decision can thus be made that the vehicle revisited submap No. 1 at measurement ID 3000.
6.6. Computational Efficiency

We also computed the time requirement of conducting PCA, more specifically, of computing the parameters of the projection from measurement space to map space. The algorithms are implemented in Matlab and run on a computer with one Pentium IV 2.0 GHz processor. We found that within 2000 frames, or correspondingly 300 meters, the initialization costs less than 10 seconds. Since the loop-closing detection algorithm only builds the topological-level environment model, it is not required to be embedded into the local boundary map building. So the loop-closing detection and Markov localization can be run on an individual computer. Additionally, PCA is conducted only when a new submap is initialized; as noticed in Figures 12 and 5, such shifts do not happen very often. According to our trials, a submap shift is detected on average every 1 minute. As a result, a 10 second lag is still affordable for the algorithms. After the projection from measurement space to the map space is computed, the eigenframe can be computed in constant time (less than 20 ms) for each incoming measurement. Even though the submap number scales with travelled distance, in most current SLAM literature this number will not surpass 100, and the Bayesian inference can still run in real time.

7. Conclusion and Discussion

This paper presents an innovative approach for detecting loop-closing for SLAM in a highly unstructured cross-country environment where no geometrical landmarks are available. We elaborate how to use a linear dimensionality reduction technique, i.e., PCA, to model the environment's appearance.
[Figure 22 plots the time requirement (unit: sec, 0–30) for initializing each submap against measurement ID (500–4000).]

Fig. 22. The time requirement when a submap is initialized. As the measurement frames accumulate, the algorithm requires more and more time to calculate the mapping parameters from measurement space to the map space.
[Figure 23 is a 3D scatter plot of the projected data from submaps 1, 2 and 3.]

Fig. 23. The 3-dimensional illustrative map space. Three submaps are plotted here: triangles, crosses and circles represent submaps 1, 2 and 3 respectively. These three submaps' geometrical information can be found in Figures 5 and 7. As can be seen, PCA sometimes cannot properly handle the nonlinearity of the manifold.
After the high dimensional measurements are projected into a low dimensional manifold, their distributions are approximated by a series of Gaussian models; the major contribution of this paper is to use these Gaussian models to calculate the observation probability for Bayesian inference. By doing so, the appearance model of the environment can be integrated into the topological Bayesian inference process, so that metric-level feature information is no longer required. The experimental results demonstrate that the proposed technique can robustly detect large scale loop-closing in a cross-country environment. Another desirable characteristic of the proposed algorithm is that, in contrast to conventional loop-closing detection techniques which rely on feature tracking, no vehicle pose estimation is necessary in our inference framework. In other words, the loop-closing detection in this work lies outside the central SLAM estimation loop. So even when the vehicle's localization error is huge, the detection algorithm can nevertheless work properly. This is especially important for a tracked vehicle moving through the cross-country environment, because in this case the error of the vehicle's self-localization can become quite large after even a short distance. With respect to such a large error, it is impossible to use any "gating" mechanism to find whether the vehicle has re-visited a certain place. Although the presented algorithm has several advantages, it nevertheless has its own limitations. Currently, we can see that the presented algorithm could be further improved in the following two aspects.

7.1. Handling Nonlinear Manifolds

PCA and linear discriminant analysis (LDA) assume that the extracted features are linear functions of the input data. However, such a linearity assumption may sometimes not lead to good results. Such a nonlinear manifold is illustrated in Figure 23.
As can be seen, the distributions of the projected input data from the three submaps cannot be linearly separated in the 3D space. This limitation of PCA has been widely discussed in the pattern recognition domain. Recent research has found that the observations are often controlled by a small number of factors such as the view angle. Such a relationship, even though nonlinear globally, is often smooth and approximately linear in a local region. More powerful algorithms have been employed to exploit such local linearity, e.g., LLE (Roweis and Saul 2000) and ISOMAP (Tenenbaum, de Silva, and Langford 2000). Recently, innovative applications of nonlinear dimensionality reduction techniques in mobile robotics have been independently developed by Kumar, Guivant, and Durrant-Whyte (2004) and Kumar et al. (2005). Here the authors argue that integrating more powerful dimensionality reduction tools into the appearance-based topological Bayesian inference framework is a promising research direction and deserves further investigation.

7.2. Handling Viewpoint Variance

As elaborated in previous sections, appearance-based loop-closing detection essentially combines two steps. The first is a supervised learning process which teaches the algorithm how to distinguish different places. In the second step, it uses the learned knowledge to classify the newly observed data. It is then important to understand that the appearance-based approaches completely rely on the given samples to understand the environment. If there are no samples from a certain aspect of the environment, the appearance-based technique cannot learn from them. Consequently, it cannot recognize such a place in the future, even if the vehicle has already been there with a different pose. In short, such techniques are not originally supposed to be viewpoint-invariant. However, as observed in Figure 13, when the vehicle visits and re-visits submaps No. 1, 2, . . . , 5, its trajectories are not exactly the same. The proposed algorithm still works properly in this situation, which demonstrates the appearance model's capability to handle moderate viewpoint variance.

Meanwhile, it can also be noticed that, in a cyclic environment such as in Figure 13, because of the road constraint and the fact that the vehicle is non-holonomic, the viewpoint of a fixed sensor cannot change vastly during the revisiting unless the vehicle follows a different trajectory. So in a cyclic environment, the viewpoint is often naturally constrained.
Appendix

See Figures 24–35.
[Figure 24 shows four panels, (a) Step 3001, (b) Step 3051, (c) Step 3101, (d) Step 3151, each plotting the probability over node IDs 0–35, with the estimation and the GPS position marked.]

Fig. 24. The probability distribution over submaps and the topological network at the positions corresponding to measurements No. 3001 to No. 3151. As observed in (b), the estimation of the proposed algorithm does not match the actual position read from GPS because the Markov process is currently in initialization; the current information is not sufficient to correctly compute where the vehicle is.
[Figure 25 shows two panels, (a) 3201 and (b) 3251, each plotting the probability over node IDs 0–35, with the estimation and the GPS position marked.]

Fig. 25. The probability distribution over submaps and the topological network at the positions corresponding to measurements No. 3201 to No. 3251. Compared with Figure 24, it can be seen that the estimated vehicle position has become much more accurate as new sensor data is received.
[Figure 26 shows (a) the vehicle's GPS position at frame No. 3301 and (b) the observational probabilities p(z_t | Y_t^j) over the submaps, in East/North coordinates (unit: m).]

Fig. 26. A topological shift detected at iteration No. 3301. The vehicle is supposed to move from its previous submap to a new one. All the submaps' observational probabilities are recalculated and plotted as above.
[Figure 27 plots the probability over node IDs 0–35 at step 3351, with the estimation and the GPS position marked.]

Fig. 27. The probability distribution over submaps and the topological network at the position corresponding to No. 3351. Up to this point, the estimations from the Bayesian inference are satisfactory.
[Figure 28 shows (a) the vehicle's GPS position at frame No. 3401 and (b) the observational probabilities p(z_t | Y_t^j) over the submaps, in East/North coordinates (unit: m).]

Fig. 28. Another topological shift detected at iteration No. 3401. According to the information gathered previously, this shift should have been detected a short while later. Such a shift is reported because the vehicle's current trajectory is different from the one in the first loop. So there is now actually a conflict between the Bayesian estimation and the new observation. If such a conflict persists in the following iterations, the Bayesian inference will simply report that the loop-closing hypothesis at iteration 3000 is false.
[Figure 29 shows four panels, (a) 3451, (b) 3501, (c) 3551, (d) 3601, each plotting the probability over node IDs 0–35, with the estimation and the GPS position marked.]

Fig. 29. The probability distribution over submaps and the topological network at the positions corresponding to measurements No. 3451 to No. 3601. The conflict introduced in Figure 28 "confused" the Bayesian inference process. Consequently, the error of our algorithm grows during these 4 steps. The question to be answered here is: how should we judge whether such errors are caused by a wrong loop-closing hypothesis, or by temporary observation error?
Chen and Wang / Loop-Closing Detection
[Figure: a single panel plotting Prob against node ID with the GPS ground truth and the Bayesian estimation marked.]
Fig. 30. The probability distribution over submaps and the topological network at the position corresponding to measurement No. 3651. The correct observations that have kept arriving since iteration No. 3351 finally compensate for that error; in this figure, the estimate is quite close to the vehicle's actual position.
[Figure: two panels. (a) GPS position — the vehicle's current position at frame 3701, with the detected topological shift marked (axes East/North, unit: m). (b) p(zt |Ytj ) over the topological network.]
Fig. 31. A topological shift is detected at iteration 3701. The vehicle's actual position is depicted in (a); the observation probability of each submap is plotted in (b) by gray shading.
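The per-submap observation probability p(zt |Ytj ) shaded in these figures comes from evaluating the Gaussian appearance model of each submap on the projection of the current scan into the PCA map-space. A minimal sketch, assuming a d-dimensional map-space and one (mean, covariance) pair per submap; the function signature and names are ours:

```python
import numpy as np

def submap_likelihoods(scan, pca_basis, mean_scan, gaussians):
    """Normalized p(z_t | node j) for every submap j (illustrative).

    scan      : (M,) raw laser range frame (M = 361 for 0.5-degree scans)
    pca_basis : (M, d) principal components spanning the map-space
    mean_scan : (M,) mean frame subtracted before projection
    gaussians : list of (mu, cov) pairs, one Gaussian appearance
                model per submap, in the d-dimensional map-space
    """
    y = pca_basis.T @ (scan - mean_scan)     # project scan into map-space
    likes = []
    for mu, cov in gaussians:
        diff = y - mu
        inv = np.linalg.inv(cov)
        norm = np.sqrt((2 * np.pi) ** len(mu) * np.linalg.det(cov))
        likes.append(np.exp(-0.5 * diff @ inv @ diff) / norm)
    likes = np.asarray(likes)
    return likes / likes.sum()               # normalize over submaps
```

The gray shading in panel (b) then simply maps each node's normalized likelihood to an intensity.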
[Figure: a single panel plotting Prob against node ID with the GPS ground truth and the Bayesian estimation marked.]
Fig. 32. The probability distribution over submaps and the topological network at the position corresponding to measurement No. 3751. The topological Bayesian inference gives an excellent estimate of the vehicle's pose. Put in the loop-closing context, the continually arriving information verifies the loop-closing hypothesis.
[Figure: two panels. (a) GPS position — the vehicle's current position at frame 3801, with the detected topological shift marked (axes East/North, unit: m). (b) p(zt |Ytj ) over the topological network.]
Fig. 33. A topological shift is detected at frame No. 3801, and the likelihood of the vehicle's current position is updated accordingly.
[Figure: a single panel plotting Prob against node ID with the GPS ground truth and the Bayesian estimation marked.]
Fig. 34. The probability distribution over submaps and the topological network at the position corresponding to measurement No. 3851.
[Figure: two panels, (a) 3901 and (b) 3951, each plotting Prob against node ID with the GPS ground truth and the Bayesian estimation marked.]
Fig. 35. The probability distribution over submaps and the topological network at the positions corresponding to measurements No. 3901 and No. 3951. The estimates from the Bayesian inference match the ground truth well, demonstrating the accuracy and robustness of the proposed algorithm.
Acknowledgments
We appreciate the help of Dr Javier Ibañez-Guzmán and Ng Teck Chew during the experiments.