A Block-Based Gradient Descent Search Algorithm for ...


IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 6, NO. 4, AUGUST 1996


A Block-Based Gradient Descent Search Algorithm for Block Motion Estimation in Video Coding

Lurng-Kuo Liu and Ephraim Feig


Abstract: A block-based gradient descent search (BBGDS) algorithm is proposed in this paper to perform block motion estimation in video coding. The BBGDS evaluates the values of a given objective function starting from a small centralized checking block. The minimum within the checking block is found, and the gradient descent direction where the minimum is expected to lie is used to determine the search direction and the position of the new checking block. The BBGDS is compared with full search (FS), three-step search (TSS), one-at-a-time search (OTS), and new three-step search (NTSS). Experimental results show that the proposed technique provides competitive performance with reduced computational complexity.


Manuscript received September 21, 1995; revised March 10, 1996. This paper was recommended by Associate Editor Y. Wang.
The authors are with IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA.
Publisher Item Identifier S 1051-8215(96)05194-4.

I. INTRODUCTION

The high correlation between successive frames of a video sequence makes it possible to achieve high coding efficiency in a video coding system by reducing the temporal redundancy. Motion-compensated video coding, which predicts the current frame from the previous frame (or reference frame), has been used to exploit the temporal redundancy between successive frames. Motion estimation plays an important role in such an interframe predictive coding system. Among the many types of motion estimation algorithms, the block-matching technique has been adopted in many video compression standards, such as MPEG [1], [2], H.261 [3], and H.263 [4], due to its simplicity. In the block-matching technique, frames are divided into blocks and one motion vector is associated with each block. For each block in the current frame, motion estimation searches for a motion vector that points to the best-matching block in the reference frame. The best-matching block is then used as the predictor for the current block.

The full search (FS) block-matching algorithm is the simplest, but it is computationally very intensive. It provides an optimal solution by exhaustively evaluating all the possible candidates within the search range in the reference frame. Several fast algorithms, such as the three-step search (TSS) [5], the 2-D logarithmic search (LOGS) [6], the one-at-a-time search (OTS) [7], and the new three-step search (NTSS) [8], have been developed to reduce the computational complexity by reducing the number of checking points. Of these, the first three can easily be trapped in a local minimum, thereby degrading performance; see, for example, [9]. The NTSS takes into account the fact that the distribution of the global minimum in real-world video sequences is centered at zero. This is especially true for head-and-shoulder sequences typical in video conferencing. The NTSS adds checking points which are centered about zero to the first step of the TSS. Usually the search stops after the first step; the second most likely scenario has the search ending after a second step that searches close to the first. So, on average, NTSS is somewhat faster than TSS. More significantly, for head-and-shoulder sequences, NTSS yields significant SNR gains over TSS.

[Figure 1: two grids of candidate displacements, panels (a) and (b), with both axes running from -7 to 7.]
Fig. 1. (a) The first pass of BBGDS is centered around pixel (1); the second pass is centered around either pixel (2) or pixel (3). The shading highlights those new pixels that have to be checked in the second pass. (b) Example of the BBGDS search procedure, where motion vector (-2, -4) is found.

In the present work we push the idea of utilizing the statistical nature of the motion even further. We further assume that the distortion is monotonic in the neighborhood of the global minimum. In our first step we only search around the center point. If the optimum is found at the center, the procedure stops. This happens more than 80% of the time. Otherwise, we proceed to search around the point where the minimum was found. The procedure continues until the winning

1051-8215/96$05.00 © 1996 IEEE

Authorized licensed use limited to: MIT Libraries. Downloaded on January 3, 2009 at 18:08 from IEEE Xplore. Restrictions apply.


             Foreman            Salesman           Miss America       Car Phone          Average
Algorithm    MSE    Complexity  MSE    Complexity  MSE    Complexity  MSE    Complexity  Complexity
FS           26.82  100%        6.52   100%        4.99   100%        21.17  100%        100%
BBGDS        28.80  5.40%       6.66   4.22%       5.05   5.08%       22.34  5.40%       5.03%
TSS          34.23  11.43%      6.97   11.36%      5.50   11.39%      24.80  11.40%      11.40%
OTS          36.62  3.75%       7.06   2.96%       5.20   3.71%       25.67  3.75%       3.54%
NTSS         28.63  8.93%       6.62   7.83%       5.04   8.89%       22.11  9.11%       8.69%

TABLE II
THE PERFORMANCE COMPARISON OF THE ALGORITHMS WITH SEARCH RANGE ±15 PIXELS IN BOTH HORIZONTAL AND VERTICAL DIRECTIONS
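As a quick consistency check on the tabulated figures, the Average complexity entry for each fast algorithm appears to be the arithmetic mean of its four per-sequence complexity percentages. The sketch below recomputes those means with exact decimal arithmetic. The dictionary layout and the function name `mean_complexity` are illustrative choices, and half-up rounding to two decimals is an assumption about how the averages were rounded, not something the text states.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-sequence complexity (% of full search) in the order
# Foreman, Salesman, Miss America, Car Phone, copied from the table.
COMPLEXITY = {
    "BBGDS": ["5.40", "4.22", "5.08", "5.40"],
    "TSS": ["11.43", "11.36", "11.39", "11.40"],
    "OTS": ["3.75", "2.96", "3.71", "3.75"],
    "NTSS": ["8.93", "7.83", "8.89", "9.11"],
}

# Average complexity as printed in the table.
REPORTED_AVERAGE = {"BBGDS": "5.03", "TSS": "11.40", "OTS": "3.54", "NTSS": "8.69"}


def mean_complexity(values):
    """Arithmetic mean of percentage strings, rounded half-up to 2 decimals (assumed)."""
    total = sum(Decimal(v) for v in values)
    mean = total / Decimal(len(values))
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for name, values in COMPLEXITY.items():
        print(name, mean_complexity(values), "reported:", REPORTED_AVERAGE[name])
```

Under that rounding assumption the recomputed means agree with the printed Average column for all four fast algorithms.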

[Figure 2: two panels, each titled "Foreman", plotted against Frame Number.]
Fig. 2. Performance comparison on Foreman sequence with search ranges (a) ±7 pixels and (b) ±15 pixels in both horizontal and vertical directions.

point is the center point of the checking block or the checking block hits the boundary of the predefined search range. Hence, we call our method the block-based gradient descent search (BBGDS) algorithm. Of course, various modifications suggest themselves, such as stopping after K searches (say, K = 3), or once the improvement in the objective function falls below some fixed threshold. It is also clear that a larger checking block can reduce the chance of being trapped in a local minimum.

The details of the proposed algorithm are described in Section II. Experimental results are given in Section III. Finally, conclusions are drawn in Section IV.

II. BLOCK-BASED GRADIENT DESCENT SEARCH ALGORITHM

The search procedure of the BBGDS algorithm is illustrated in Fig. 1. Checking blocks are squares of 3 x 3 pixels. The BBGDS starts by initializing the checking block so that its center pixel is at the origin.


1) Evaluate the objective function for all nine points in the checking block.
2) If the minimum occurs at the center, stop; the motion vector points to the center. Otherwise, reset the checking block so that its center is the winning pixel, and go to step 1).

Note that, except for the first iteration, most of the pixels in the checking block have already been checked in a previous pass. This is highlighted in Fig. 1(a), where the second pass is centered around either corner pixel (2) or edge pixel (3). In the first case, five new checking pixels have to be visited; in the second case, only three. The search procedure of the BBGDS always moves the search in the direction of optimal gradient descent. This is the direction in which one expects the objective function to approach its minimum. The procedure is illustrated in Fig. 1(b), where the motion vector (-2, -4) is found.

Fig. 3. Performance comparison on Salesman sequence with search ranges (a) ±7 pixels and (b) ±15 pixels in both horizontal and vertical directions.
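The two-step BBGDS procedure of Section II can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `mad`, `bbgds`, and `demo_frames` are hypothetical, frames are plain lists of rows indexed as frame[row][col], and displacements use the same (row, column) order. Two simplifications are also assumed: candidates outside the search range or the frame are skipped rather than ending the search at the boundary, and ties are resolved in favor of the center. The objective is the mean absolute difference that Section III adopts.

```python
def mad(cur, ref, m, n, x, y, size=8):
    """Mean absolute difference between block (m, n) of cur and block (m+x, n+y) of ref."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(cur[m + i][n + j] - ref[m + x + i][n + y + j])
    return total / (size * size)


def bbgds(cur, ref, m, n, size=8, search_range=7):
    """Return (motion_vector, points_checked) for the block at (m, n)."""
    height, width = len(ref), len(ref[0])
    cache = {}  # displacement -> MAD, so each point is evaluated only once

    def cost(x, y):
        if (x, y) not in cache:
            cache[(x, y)] = mad(cur, ref, m, n, x, y, size)
        return cache[(x, y)]

    def valid(x, y):
        # Assumed boundary handling: skip points outside the range or the frame.
        return (abs(x) <= search_range and abs(y) <= search_range
                and 0 <= m + x <= height - size
                and 0 <= n + y <= width - size)

    cx, cy = 0, 0  # the first checking block is centered at the origin
    while True:
        # Step 1: evaluate the 3 x 3 checking block centered at (cx, cy).
        candidates = [(cx + dx, cy + dy)
                      for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                      if valid(cx + dx, cy + dy)]
        # Ties go to the center, so every move strictly lowers the cost.
        best = min(candidates, key=lambda d: (cost(*d), d != (cx, cy)))
        # Step 2: stop if the center wins, otherwise recenter and repeat.
        if best == (cx, cy):
            return (cx, cy), len(cache)
        cx, cy = best


def demo_frames(rows=32, cols=32):
    """Synthetic smooth frames: cur is ref displaced by (+2, -1)."""
    ref = [[r * r + c * c for c in range(cols)] for r in range(rows)]
    cur = [[(r + 2) ** 2 + (c - 1) ** 2 for c in range(cols)] for r in range(rows)]
    return cur, ref


if __name__ == "__main__":
    cur, ref = demo_frames()
    print(bbgds(cur, ref, 12, 12))  # prints ((2, -1), 17)
```

On this synthetic pair the search moves from (0, 0) to the edge point (1, 0) and then to the corner point (2, -1), checking 9 + 3 + 5 = 17 points in total, which matches the three-new-points and five-new-points cases described above.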

III. EXPERIMENTAL RESULTS

Four QCIF video sequences, “Foreman,” “Salesman,” “Miss America,” and “Car Phone,” are used in our simulations. The block size is fixed at 8 x 8. Two different search ranges, ±7 and ±15 pixels in both horizontal and vertical directions, are used. We use the mean absolute difference (MAD) as the objective function. For a given displacement $(x, y)$, the MAD between block$(m, n)$ of the current frame and block$(m+x, n+y)$ of the reference frame is defined as

\[
\mathrm{MAD}_{(m,n)}(x, y) = \frac{1}{64} \sum_{i=0}^{7} \sum_{j=0}^{7} \left| f_k(m+i,\, n+j) - f_{k-1}(m+x+i,\, n+y+j) \right|
\]

where $f_k(i, j)$ is the pixel intensity at position $(i, j)$ of frame $k$ (the current frame, with frame $k-1$ taken as the reference frame), and block$(m, n)$ is the block with its upper left corner at position $(m, n)$ of a frame.

Fig. 4. Performance comparison on Miss America sequence with search ranges (a) ±7 pixels and (b) ±15 pixels in both horizontal and vertical directions.

The first 100 frames of each video sequence are used in our simulations. We use the mean square error (MSE) per pixel as the measure of performance. The required number of search points of each

block is used as the measure of computational complexity. Each video sequence is processed by five algorithms: full search (FS), three-step search (TSS), one-at-a-time search (OTS), new three-step search (NTSS), and the proposed block-based gradient descent search (BBGDS). The computational complexity of each algorithm relative to the full search algorithm is calculated, together with its MSE. The simulation results are shown in Tables I-II and Figs. 2-5. The simulations show that both NTSS and BBGDS outperform TSS and OTS. The distortions of BBGDS are very slightly greater than those of NTSS, but the computational complexity of BBGDS is significantly less than that of NTSS.

Fig. 5. Performance comparison on Car Phone sequence with search ranges (a) ±7 pixels and (b) ±15 pixels in both horizontal and vertical directions.

IV. CONCLUSION

In this paper, the BBGDS algorithm is proposed to perform block motion estimation in video coding. Based on the observation that the global minimum distribution is centralized in real-world video sequences, the BBGDS exploits this property by using a center-biased checking block in its initial search step. The BBGDS also employs the concept of a checking block instead of a checking point in each of its search steps. The BBGDS searches for the motion vector along the block-based gradient descent direction where the minimum is expected to lie. The BBGDS has reduced susceptibility to local minima because of the use of the checking block. Experimental results show that the proposed technique provides competitive performance with reduced computational complexity.

REFERENCES

[1] ISO/IEC JTC1/SC29/WG11, “ISO/IEC CD 11172: Information technology,” MPEG-1 Committee Draft, Dec. 1991.
[2] ISO/IEC JTC1/SC29/WG11, “ISO/IEC CD 13818: Information technology,” MPEG-2 Committee Draft, Dec. 1993.
[3] International Telecommunication Union, “Video codec for audiovisual services at p x 64 kbits,” ITU-T Recommendation H.261, Mar. 1993.
[4] International Telecommunication Union, “Video coding for low bitrate communication,” ITU-T Draft H.263, July 1995.
[5] T. Koga, K. Iinuma, A. Hirano, and T. Ishiguro, “Motion-compensated interframe coding for video conferencing,” in Nat. Telecommun. Conf., pp. G5.3.1-G5.3.5, 1981.
[6] J. Jain and A. Jain, “Displacement measurement and its application in interframe image coding,” IEEE Trans. Commun., vol. COM-29, pp. 1799-1808, Dec. 1981.
[7] R. Srinivasan and K. Rao, “Predictive coding based on efficient motion estimation,” IEEE Trans. Commun., vol. COM-33, pp. 888-896, Aug. 1985.
[8] R. Li, B. Zeng, and M. L. Liou, “A new three-step search algorithm for block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol. 4, pp. 438-442, Aug. 1994.
[9] L. Chen, W. Chen, Y. Jehng, and T. Chiueh, “An efficient parallel motion estimation algorithm for digital image processing,” IEEE Trans. Circuits Syst. Video Technol., vol. 1, pp. 438-442, Dec. 1991.

A Note on “Block Wavelet Transforms for Image Coding”

A. Sharaf and F. Marvasti

Abstract: We note that the arrangement of the rows of the block wavelet transform (BWT) matrix as given by (3) in the above paper¹ is not in an increasing order of frequency as implied by the equations and figures in the same paper. The rows of the matrix are rearranged to follow an increasing order of frequency.

Manuscript received October 17, 1995; revised April 23, 1996. This paper was recommended by Associate Editor J. Brailean. This work was supported by Signals and Software (Ltd) and the EPSRC.
The authors are with the Department of Electrical and Electronic Engineering, King’s College London, Strand, London, WC2R 2LS U.K.
Publisher Item Identifier S 1051-8215(96)05459-6.
¹A. E. Cetin, O. N. Gerek, and S. Ulukus, IEEE Trans. Circuits Syst. Video Technol., vol. 3, no. 6, pp. 433-435, Dec. 1993.

I. THE REARRANGEMENT AND ITS JUSTIFICATION

The arrangement of the rows of the block wavelet transform (BWT) matrix corresponding to the Daubechies’ 8-tap filter as given by (3) in the above paper¹ is not in an increasing order of frequency as implied by (1) and Fig. 1 of the same paper. Equation (1)¹ states implicitly that the transform domain coefficients are to be ordered
