In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

Multiresolution Feature-Based Image Registration Chiou-Ting Hsu1 and Rob A. Beuker2 1

Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan Video Processing Group, Systems Laboratory Eindhoven (PS-SLE), Philips Semiconductors B.V.

2

ABSTRACT Image registration is the fundamental task used to match two or more partially overlapping images and stitch these images into one panoramic image comprising the whole scene. To register two images, the coordinate transformation between a pair of images must be found. In this paper, a multiresolution feature-based method is developed to efficiently estimate an eight-parametric projective transformation model between pairs of images. The proposed method has been tested and work well on many images of static scene taken with a hand-held digital still camera. Keywords: image registration, projective coordinate transform, feature points, wavelet transform

1. INTRODUCTION Image registration is used to match two or more partially overlapping images and stitch them into one panoramic image of the scene. To register two images, the coordinate transformation between a pair of images must be found from class of transformations. The optimal transformation depends on the types of relation between the overlapping images. To find the relationship between two images we rely on the estimation of the parameters of the transformation model. The number of parameters depends on the chosen transformation model. A common assumption is that the coordinate transformations between two images are rigid planar models. Rigid planar transformation is composed of scaling, rotation, and translation changes, which map the pixel (x1,y1) of image I1 to the pixel (x2,y2) of another image I2:  x2   x1  cosθ − sinθ   x1  t x   y  = s R  y  + T = s sinθ cosθ   y  + t   2  1    1  y  The rigid transformation is sufficient to match two images of a scene taken from the same viewing angle but from different position. That is, the camera can rotate about its optical axis. In the case of remote sensing, where the distance approaches infinity, the transformation between the captured images behaves like a planar rigid transformation. The projective transformation model involves eight parameters, mk ( k = 0 ~ 7) , m0 m1   x1  m2    +   x 2  m3 m4   y1  m5  y  = x  2 [m6 m7 ]  1  + 1  y1  that account for distortions which occur when images of a 3D static scene are taken at fixed location, or when multiple images of a planar scene are taken from arbitrary locations. In practice, the projective model is also a good approximation for a 3D scene when the movement of the camera is small compared to its depth, and for a scene when the relative distances between objects in the scene is small compared to its depth. In this paper, the projective transformation model is estimated efficiently by using the rigid planar parameters as a good initial guess. And the images are assumed to be taken from a hand-held digital still camera (DSC) while the rotation about its optical axis and scaling change are due to manual operation.

1

In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

As the number of possible transformation grows, the computational costs quickly become unmanageable. To reduce the computational costs, measurements are based only on features points of the images instead of using the entire images. In this work, a multiresolution feature-based technique is applied. The input images are initially decomposed into their multiresolution representations and a number of feature points in the coarest resolution are determined. Next, feature point pairs are matched at the coarest resolution level and then used to derive the projective parameters. Lastly, a hierarchical coarse-to-fine refinement is performed to efficiently derive the projective parameters at full image resolution and finally merge the warpped images into one image mosaic.

2. MATCHING ALGORITHM The overall flow chart of the proposed algorithm is illustrated in Figure 1. 2.1 FEATURE POINTS EXTRACTION Initially, input images are decomposed into their multiresolution representation by a discrete wavelet transform. Based on the multiresolution structure, a number of feature points in the coarsest level are determined by multiscale edge correlation method [1][2]. Herein, the multiscale edge detection [1] and its improved edge correlation method [2] will be described successively. 2.1.1 Multiscale edge detection In [1], the multiscale edge detection method based on wavelet transform is derived. Let ψ 1j ( x, y ) = 21 ψ 1 ( 2x , 2y ) and ψ 2j ( x , y ) = 21 ψ 2 ( 2x , 2y ) be two wavelets which are dilated of ψ1(x,y) and ψ2(x,y) by a factor of 2 j, respectively . j

j

j

j

j

j

The 2D wavelet transform of an image I(x,y) at scale 2 j (i.e. level j) can be decomposed into two independent directions as W j1 I ( x, y ) = I *ψ 1j ( x , y ) and W j2 I ( x , y ) = I *ψ 2j ( x , y ).

Let S(x,y) be a 2D smoothing function. The multiscale sharp variation points (i.e. edge points) can be obtained from a wavelet transform if ∂S ( x, y) ∂ S( x, y) ψ 1 ( x, y) = and ψ 2 ( x, y) = . ∂x ∂y It is derived in [1] that

∂Sj

∂ (I * Sj )(x, y) and ∂x ∂x ∂S ∂ Wj2 ( x, y) = I *ψ 2j (x, y) = I *(2 j j )(x, y) = 2 j (I *S j )(x, y). ∂y ∂y

Wj1 (x, y) = I *ψ 1j (x, y) = I *(2 j

)(x, y) = 2j

That is, r r W j1 ( x, y)   ∂ (I * S j )(x, y )   2  = 2 j  ∂∂x  = 2 j ∇( I * S j )( x, y) where ∇(I * S j )(x, y) denotes the gradient vector.   W j ( x, y)     ∂y ( I * S j )(x, y )  Hence, the two components of the wavelet transform are proportional to the two components of the gradient vector. At each level, the modulus of the gradient vector is proportional to 2

2

M j ( x, y) = W j1 I ( x, y ) + W j2 I ( x, y ) . If the local maxima of Mj are located and thresholded with a given value, then the edge points at scale 2 j are detected. 2.1.2 Edge correlation

2

In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

For image registration, more specific and robust feature points are preferred than noisy edge information. Another criterion named “edge correlation” is introduced in [2]: n −1

ER n ( j, x, y) = ∏ M j −i ( x, y) , i=0

where n indicating the number of scales involved in the multiplication, and j is the initial scale for edge correlation. Based on the edge correlation, the direct multiplication of wavelet transform data at adjacent scales is employed to distinguish important edges from noise. This method is based on the fact that sharp edges have large signal over multiple scales, while noise vanishes across scales. Therefore, correlation of the wavelet gradient over different scales will sharpen the major edges and suppress noise or unsignificant features. For an image with many high frequency components, only local maxima of the edge points are retained as image feature points [3], as shown in Figure 2. This is done in order to improve the efficiency for calculating matching feature points between the two images. Moreover, the estimation of accurate transformation is then applied only on the matched feature points set instead of the entire images, this reduces computational cost quite significantly. An additional cost saving is due to the down sampling performed during multiresolution decomposition, the computation of measurements and pairwise matching is carried out in the coarsest resolution level. 2.2 FINDING THE MATCHING PAIRS 2.2.1 Find all possible matching pairs After the feature points in the coarest level are extracted from both images, the set of all possible matching pairs { p ⇔ q } between the two images is determined. The similarity between any two feature points (px,py) and (qx,qy) is measured by their normalised cross-correlation. Notably, the cross-correlation must be normalised since local image intensity would otherwise influence the measure. This measure has the property that it measures correlation on a scale ranging from [-1,1]. For each feature point p, find the peak value of metric (which should be larger than a predefined threshold) to determine its best match point q. 2.2.2 Eliminating the false matching pairs Furthermore, we devised a consistency test which is based on the assumption that when rotation and scaling changes are small, the difference between the translational vectors of two consistent matching pairs is smaller than a threshold [4]. For example, let MP = {pk ⇔ qk , k = 1,2,L, N m } be a set of matching pairs, where Nm represents the number of elements in MP, pk is a feature points in I1 and qk is a feature points in I 2 . Then the relationship of pk and qk is qk = s R pk + T cos(0) - sin(0) 1 0 . if s ≅ 1 and R ≅  =    sin(0) cos(0) 0 1 v v If the pairs {pi ⇔ qi} and {pl ⇔ ql} are correct matching pairs, then the difference between Ti = qi − pi and Tl = ql − pl should be small. Figure 3 shows an example. ≅ pk + T ,

v v Therefore, the difference between Ti and Tl , determines whether matching pairs are consistent, and for the set of all possible matching pairs, a list of counters records how many other pairs are consistent with current pair. For thoses matching pairs with the largest consistency count, select the matching pair with maximum matching similarity (i.e. normalised correlation value) as a reliable pair, and eliminate all the matching pairs not consistent with the selected pair. 2.3 DERIVING THE TRANSFORMATION PARAMETERS

3

In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

After the elimination of the false-matching pairs, the retained correct matching pairs are used to derive the image transformation parameters. 2.3.1 Deriving four rigid planar parameters Initially, four rigid planar parameters (i.e. scaling, rotation angle, and two translation shifts) are calculated by a singular value decomposition (SVD) method [5][3]. Let M = {pi ⇔ qi , i = 1,2, L, N c } be the set of correct matching pairs, where Nc is the total number of correct matching pairs. In order to derive four rigid planar parameters (s, R and T) based on the relation:  tx  cosθ − sinθ  qi = s R pi + T = s  pi +   .   sinθ cosθ   ty  The error function is defined as Nc

Φ = ∑ s R pi + T − qi

2

i =1

By minimising Φ, a set of solutions can be derived [3] as follows . 1 N 1 ∂Φ pi and q = From = 0 , we can obtain, T = q - s R p , where p = ∑ ∂T Nc i=0 Nc ∂Φ And from = 0 , we can derive ∂s c

Nc

∑q . i

i=0

Nc

s=

∑ q~ R p~ t i

i =1 Nc

∑ ~pit ~pi

i

, where ~pi = pi − p and q~i = qi − q .

i=1

Then, Φ is reorganised as 2

 N ~t ~  ∑ qi R pi  N i =1  . t~ ~ Φ = ∑ qi qi −  N t ~ i =1 ~ pp c

c

∑ c

i

i

i =1

Therefore, minimising Φ can be converted into maximising the term  N Φ′ = ∑ q~i t R ~pi  .  i =1  c

The singular value decomposition (SVD) method is used to solve R by following the steps: 1. Calculate the 2×2 matrix H, that is N ~ H= p q~ t .

∑ c

i

i

i=1

t

2. Find the SVD of H, i.e. H=PΛQ . 3. R = QPt After the rotation matrix R is found, the scale factor s and translation vector T can be found according to the above equations. 2.3.2 Deriving eight projective parameters Subsequently, the four rigid planar parameters are used as the initial guess to resolve the eight projective parameter problem by iteratively minimizing the distances between the coordinates of the feature points and the projected coordinates of their matching points using the Levenberg-Marquardt method [6].

4

In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

That is, if pi = ( pix , piy ) and qi = ( qix , qiy ) are one matching pair of feature points, 1. Use SVD-based method described above to find four parameters ( s,θ , t x and t y ) of rigid planar transform. 2. Use the four rigid planar parameters as an initial guess for the projective parameters, that is, q ′ix  q ′  =  iy 

 m0 m  3

m1   pix  m2   + m4   piy  m5  = p  [m6 m7 ]  ix  + 1  piy 

− sin θ   pix  t x   +  cos θ   piy  t y  . p  [0 0]  ix  + 1  piy 

cos θ s  sin θ

3. Iteratively minimize the sum of the distance between ( qix , qiy ) and ( q′ix , q′iy ) for each matching pair by LevenbergMarquardt method. That is, to minimize χ 2 = ∑ d i2 , where di = (qix′ − qix )2 + (qiy′ − qiy )2 , by calculating partial derivatives for each Nc

i =1

parameter ( mk , k = 0 ~ 7) : ∂q′  ∂d i ∂q′ 1 = (q ix′ − q ix ) ix + (q iy′ − qiy ) iy  , ∂ mk  ∂mk d i  ∂ mk and then update mk ( k = 0 ~ 7) for each iteration.

The above minimization process converges quickly to the accurate eight projective parameters and the solution is not trapped within local minima since the four rigid planar parameters behave as a good initial estimate. It is worthwhile to note that the minimization is only applied to the coordinates of the matching pairs instead of on the image intensities of the entire image as is currently typical of the optical flow based method [7]. Therefore, the efficiency is significantly improved over the optical flow based method. 2.4 HIERARCHICAL MATCHING REFINEMENT Next, a hierarchical coarse-to-fine refinement is performed to refine the eight parameters estimated in the coarsest resolution. The refining steps are as follows: 1. Reconstruct both images in the coarse resolution up to one higer resolution layer by inverse wavelet transformation. 2. Multiply the coordinates of each matching pairs by two and adjust the eight estimated parameters to fit the corresponding coordinates to the current resolution layer. That is, m2 pi ⇐ pi × 2 and m5 qi ⇐ qi × 2, i = 1,L , N c m6 m7

⇐ m2 × 2 ⇐ m5 × 2 .

⇐ m6 / 2 ⇐ m7 / 2

3. Based on the adjusted parameters m0 ~ m 7 , the exact corresponding point of p i in current resolution layer should be in the neighborhood of q ′i , where m0 m1   m2  m m  pi + m   5 . 4 q′i =  3 [m6 m7 ] pi + 1

Then, only the set of pixels within a small window centered around pi is warped according to m0 ~ m 7 . And, its best matching point is found by local search method within the neighborhood centerd around q′i . Hence, a refined set of matching pairs in the current resolution is obtained. 4. Next, based on the refined macthing pairs, the accurate parameters in current resolution layer is derived by applying the Levenberg-Marquardt iterative method using the adjusted parameters as an initial guess. 5. Recursively perform the refinement until we are at the full resolution level.

5

In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

3. EXPERIMENTAL RESULTS The proposed method has been tested on a number of images of static scene taken with a hand-held DSC (Philips Digital Camera ESP80). Figure 4(a) and (b) are two images taken indoors, where the projective distortions are quite visible. Fig. 4 (c) and (d) are the stitched mosaic using (a) and (b) as the reference coordinates, respectively. Figure 5(a)~(d) are four partially overlapping images taken outdoors with a hand-held still camera. The stitched mosaic is shown in Fig. 5(e) using (c) as the reference coordinates. The speed of the proposed method depends on level of decomposition, the number of retained feature points, and local search range for refinement. For two images of size 600×400 (as shown in Fig. 5(a)~(d)), with 3 decomposition levels, it takes about 2 seconds (on a Pentium-II 300MHz PC) to align and stitch by weighted average into one image mosaic.

4. ACKNOWLEDGEMENT The research described in this paper has been sponsored by Philips Innovation Center Taipei (PICT), Philips Research. We would like to acknowledge the support given by B. S. Paul Lin, Ir. Marcel Führen, Christopher Wei and the AFDC group members.

5. REFERENCES 1. S. Mallat and S. Zhong, “Characterization of Signals form Multiscale edges,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 14, No. 7, pp. 710-732, July 1992. 2. Y. Xu, J. B. Weaver, D. M. Healy and J. Lu, “Wavelet Transform Domain Filters: A Spatially Selective Noise Filtration Technique,” IEEE Trans. Image Processing, Vol. 3, No. 6, pp. 747-757, November 1994. 3. J. W. Hsieh, H. Y. M. Liao, K. C. Fan, M. T. Ko and Y. P. Hung, “Image Registration Using a New Edge-Based Approach,” Computer Vision and Image Understanding, Vol. 67, No. 2, pp. 112-130, August 1997. 4. Candy Hsu and Rob Beuker, “Planar Image Registration using Feature Point Extraction,” PICT TR 98/2, Philips Innovation Center Taipei. 5. K. S. Arun, T. S. Huang, and S. D. Blostein, “Least-square fitting of two 3-D point sets,” IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-9, no. 5. Pp. 698-700, Sept. 1987. 6. W. H. Press et al., Numerical Recipes in C: the Art of Scientific Computing, 2 nd ed., Cambridge University Press, Cambridge, 1992. 7. S. Mann and R. W. Picard, “Virtual Bellows: Constructing High-Quality Images for Video,” Proc. First ICIP, vol. 1, IEEE CS Press, 1994, pp. 363-367. 8. H. Y. Shum and R. Szeliski, “Panoramic Image Mosaics,” Microsoft Research Technical Report MSR-TR-97023. 9. S. Mann and R. W. Picard, “Video Orbits of the Projective Group: A simple approach to featureless estimation of parameters,” Oct. 1996.

6

In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

Image 1

Image 2

Multiresolution decomposition

Multiresolution decomposition

Feature points extraction

Feature points extraction Find all possible matching pairs Eliminate falsematching pairs

Initial estimation θ, s, tx, ty Find 8 parameters mk, k=0~7 One layer reconstruction

One layer reconstruction

Refine matching mk, k=0~7

Merge into one Image mosaic

Figure 1. Flowchart of multiresolution feature-based image registration.

(a) (b) (c) Figure 2. (a) shows two partially overlapping images. The feature points extracted by multiscale edge correlation method are shown in (b). To improve the matching efficiency, only local extreme points (as shown in (c)) are retained for matching.

7

In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

p1

T1 q1

p2

q1 T2

Figure 3. Consistent test for eliminating false-matching pairs. For example, if { p1 ⇔ q1 } and { p2 ⇔ q2 } are the correct v v r v matching pairs, then the difference between T1 = q1 − p1 and T2 = q2 − p1 should be small. That is, T1 − T2 ≤ thresholddis tan ce v v

and angle (T1 , T2 ) ≤ threshold angle .

(a)

(b)

(c)

(d)

Figure 4. (a) and (b) are two partially overlapping images with projective transformation. (c) is the registerd image mosaic using (a) as the reference coordinates; while (d) is the registerd image mosaic using (b) as the reference coordinates .

8

In Visual Communications and Image Processing 2000, Proceedings of SPIE vol. 4067 (2000), pp. 1490-1498, Perth, Australia, 20-23 June 2000.

(b)

(c)

(a)

(d)

(e) Figure 5. Registration of four images taken with one hand-held camera: (a)~(d) partially overlapping images; (e) the registerd image mosaic, where the image coordinates of (c) is chosen as the reference coordinates.

9

## Multiresolution Feature-Based Image Registration - CiteSeerX

Jun 23, 2000 - Then, only the set of pixels within a small window centered around i ... takes about 2 seconds (on a Pentium-II 300MHz PC) to align and stitch by ... Picard, âVirtual Bellows: Constructing High-Quality Images for Video,â Proc.

#### Recommend Documents

image fidelity - CiteSeerX
errors in the data, such as amplitude, phase, and pointing errors, and also on .... The smaller antennas are well suited for mapping large source structure, and ...

interpreting ultrasound elastography: image registration ...
brains and prostates due to the availability of anatomical structures that can be readily ... parameters to fit onto the reference point set (considered as data.

Wavelets in Medical Image Processing: Denoising ... - CiteSeerX
Brushlet functions were introduced to build an orthogonal basis of transient functions ...... Cross-subject normalization and template/atlas analysis. 3. .... 1. www.wavelet.org: offers a âwavelet digestâ, an email list that reports most recent n

Wavelets in Medical Image Processing: Denoising ... - CiteSeerX
Wavelets have been widely used in signal and image processing for the past 20 years. ... f Ï , defined in the frequency domain, have the following relationships.

Evaluating Content Based Image Retrieval Techniques ... - CiteSeerX
(âMountainâ class); right: a transformed image (negative transformation) in the testbed ... Some images from classes of the kernel of CLIC ... computer science.

Non-Rigid Image Registration under Non-Deterministic ...
aDepartment of Electrical and Computer Engineering ... â*This work was partially supported by the National Science Foundation ... al.10 propose a groupwise, non-registration algorithm to simultaneously register all subjects in a population to.

Localization and registration accuracy in image guided ... - Springer Link
... 9 January 2008 / Accepted: 23 September 2008 / Published online: 28 October 2008. Â© CARS 2008. Abstract. Purpose To measure and compare the clinical localization ..... operating room, the intraoperative data was recorded with the.

Lucas-Kanade image registration using camera ...
a photograph. We then estimate only the extrinsic parameters for image registration, considering two types of camera motions, 3D rotations and full 3D motions with translations and rotations. As the known ..... [1] Lucas, B. and Kanade, T., âAn ite

Medical image registration using machine learning ...
Medical image registration using machine learning-based interest ... experimental results shows an improvement in 3D image registration quality of 18.92% ...

Image Registration by Minimization of Residual ...
plexity of the residual image between the two registered im- ages. This measure produces accurate registration results on both artificial and real-world problems that we have tested, whereas many other state-of-the-art similarity mea- sures have fail

Multi-Modal Medical Image Registration based on ...
(spatial index), s (imaging plane index); so that the 3-D snake function to be optimized is defined as f(v) = Ä®1ÅvrÅ +. Ä®2ÅvsÅ+ È1ÅvrrÅ + È2ÅvssÅ + È4ÅvrsÅ+ E, where {Ä®i} are constants imposing a tension constraint, and {Èi} are cons

LNCS 4191 - Registration of Microscopic Iris Image ... - Springer Link
Casey Eye Institute, Oregon Health and Science University, USA. {xubosong .... sity variance in element m, and I is the identity matrix. This is equivalent to.

Aerial Image Registration For Tracking ieee.pdf
terrain scenes where man-made structures are absent, the blob. detector is more appropriate. If no information about the images. is available, one may use one detector as the default and, if that. fails, use the alternate detector. In our work, since

Visual Servoing from Robust Direct Color Image Registration
as an image registration problem. ... article on direct registration methods of color images and ..... (either in the calibrated domain or in the uncalibrated case).

Visual Servoing from Robust Direct Color Image Registration
article on direct registration methods of color images and their integration in ..... related to surfaces. (either in the calibrated domain or in the uncalibrated case).

a niche based genetic algorithm for image registration
Image registration aims to find the unknown set of transformations able to reduce two or more images to ..... of phenotypic similarity measure, as domain-specific.

Image Registration by Minimization of Mapping ...
deformation we put a uniform grid of 6 Ã 6 control points over the image, randomly .... is a trade-off between the computational load and accuracy of the registration. ... quire storage of eigenvectors, and can be a choice for the large data-set ...

non-rigid biomedical image registration using graph ...
method, we discuss two different biomedical applications in this paper. The first ... biomedical images in a graph cut-based framework, we compare our method with .... typical desktop with Intel(R) Core(TM)2 Duo processor with a speed of 2.8 ...

A local fast marching-based diffusion tensor image registration ...
relatively low resolution of DTI and less advanced alignment techniques in the initial works, global brain registration was also applied to quantitatively ...... Illustration of the local neighborhood and the deformed tensors. (a) The tensors in the

Removing mismatches for retinal image registration via ...
16 Aug 2016 - 1(d), while Fig. 1 (i) shows the perfect image registration after removing mismatches. Furthermore, vector field interpolation shows the smooth vector field learning from the feature matching. For retinal image registration, however, mu

Localization and registration accuracy in image guided neurosurgery ...
navigation system (Medtronic Inc, USA) on a laptop installed with the ... 9(Pt. 2):637-44. 3. Fitzpatrick, J.M., J.B. West, and C.R. Maurer, Jr., Predicting error in ...

Generalized Multiresolution Hierarchical Shape Models ...
one-out cross-validation. Table 1 shows the results obtained for the multi-object case. Compared with the classical PDM (avg. L2L error: 1.20 Â± 0.49 vox.; avg.