
IEICE TRANS. FUNDAMENTALS, VOL.E90–A, NO.4 APRIL 2007


PAPER

Special Section on Selected Papers from the 19th Workshop on Circuits and Systems in Karuizawa

Lossy Strict Multilevel Successive Elimination Algorithm for Fast Motion Estimation

Yang SONG†a), Student Member, Zhenyu LIU††, Nonmember, Takeshi IKENAGA†, Member, and Satoshi GOTO†, Fellow

SUMMARY This paper presents a simple and effective method to further reduce the number of search points in the multilevel successive elimination algorithm (MSEA). Because the sea values calculated at the best matching search points are much smaller than the current minimum SAD, the calculated sea values can simply be enlarged to increase the elimination ratio without much affecting the coding quality. Compared with the original MSEA algorithm, the proposed strict MSEA algorithm (SMSEA) provides an average speedup of 6.52 times. Compared with other lossy fast ME algorithms such as TSS and DS, the proposed SMSEA maintains more stable image quality. The proposed technique can also be applied to the fine granularity SEA (FGSEA) algorithm, with almost the same calculation process.

key words: motion estimation (ME), successive elimination algorithm (SEA), multilevel successive elimination algorithm (MSEA), strict multilevel successive elimination algorithm (SMSEA)

1. Introduction

Motion estimation (ME) is widely used in current video coding standards to remove the temporal redundancy between frames. The full search block matching algorithm (FSBMA) is a popular ME technique that achieves the best matching performance, but at the cost of heavy computation. To reduce the computation, many fast ME algorithms have been proposed based on the following two techniques. (1) Reducing the number of search points, i.e., ME is conducted only on a subset of the search area. The three-step search (TSS) [1], four-step search (FSS) [2], diamond search (DS) [3] and hexagon search (HEXBS) [4] are well-known examples. In recent years, hybrid search patterns [5], [6] have become more popular because of their better image quality. (2) Decreasing the computation overhead of each search point, i.e., the matching cost is replaced by a partial or simpler one with lower complexity. Pixel decimation [7], pixel truncation [8], and the successive elimination algorithm (SEA) [9] are some instances. SEA [9] has attracted much interest because it provides the same image quality as FSBMA with less computation.

Conceptually, SEA reduces the computation through low-pass-filter-based sub-sampling. Both the current macroblock (MB) and the candidate block are reduced to one pixel by additions, and the absolute difference between the two pixels is evaluated to decide whether or not to skip the search point. The sum operation works as a low-pass filter with the following merits: (1) High-frequency components are removed and only low-frequency ones are kept, which reduces the computation; because the low-frequency components carry more information than the high-frequency ones, the image quality is not greatly affected. (2) The data word length is increased, so more information is preserved. To further reduce the computation, the multilevel SEA (MSEA) [10] and fine granularity SEA (FGSEA) [11] algorithms have been proposed. By splitting the block into sub-blocks, larger thresholds are obtained and thus more search points can be eliminated, at the cost of additional computation overhead. To facilitate hardware implementation, a global SEA (GSEA) algorithm has been proposed [12], which removes the conditional branches of SEA and makes the workflow regular. In GSEA, the whole search area is first scanned, and SEA-based values are used as the criterion to choose M positions; full search block matching is then conducted only on these positions. However, its computation is still intensive. To avoid these drawbacks, a strict MSEA (SMSEA) algorithm is proposed in this paper. The algorithm is based on MSEA but uses much stricter elimination criteria. Compared with the original MSEA algorithm, the proposed SMSEA algorithm achieves an average speedup of 6.52 times with stable image quality, which is better than that of the well-known TSS [1] and DS [3] algorithms. Compared with the lossy GSEA algorithm, the proposed SMSEA provides competitive image quality with a speedup of 5.75 times.

Manuscript received June 24, 2006. Manuscript revised October 18, 2006. Final manuscript received November 30, 2006.
† The authors are with the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu-shi, 8080135 Japan.
†† The author is with Kitakyushu Foundation for the Advancement of Industry Science and Technology, Kitakyushu-shi, 8080135 Japan.
a) E-mail: [email protected]
DOI: 10.1093/ietfec/e90-a.4.764
The rest of this paper is organized as follows. Section 2 describes related work. The proposed SMSEA algorithm is discussed in Sect. 3. The computation and performance comparisons are presented in Sect. 4. Finally, Sect. 5 concludes the paper.

2. Related Works

Copyright © 2007 The Institute of Electronics, Information and Communication Engineers

SONG et al.: LOSSY STRICT MSEA FOR FAST MOTION ESTIMATION

2.1 FSBMA Algorithm

For an N×N MB, the SAD at search point (m, n) is calculated as

$$\mathrm{SAD}(m,n) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left|C(i,j) - R(i+m,\, j+n)\right|, \tag{1}$$

where C(i, j) and R(i + m, j + n) represent pixel (i, j) in the current MB and the candidate block, respectively. For a given search range, the best motion vector (MV) is the search point with the smallest SAD.

2.2 SEA Algorithm

According to the inequality |a + b| ≤ |a| + |b| (a, b ∈ ℝ), the mathematical basis of SEA [9] can be summarized as

$$\mathrm{SAD}(m,n) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left|C(i,j) - R(i+m,\, j+n)\right| \ge \left|C_0 - R_0\right| \equiv \mathrm{sea}(m,n), \tag{2}$$

where C0 and R0 are the sum norms of the current MB and the candidate block, respectively:

$$C_0 = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} C(i,j), \tag{3}$$

$$R_0 = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} R(i+m,\, j+n). \tag{4}$$

The workflow of the SEA algorithm is illustrated in Fig. 1. For each search point, its sea value is calculated first. If the sea value is larger than the current minimum SAD (SADmin), the SAD of this point is guaranteed to be larger than SADmin, so the search point can be skipped without any loss of image quality. For each MB, C0 needs to be calculated only once, and R0 can be quickly derived from previously computed data [12] or by the frame-based method [9]; the introduced computation overhead is therefore negligible. Because many search points can be skipped, a large amount of computation is saved.

Fig. 1 SEA algorithm workflow.

2.3 Multilevel SEA (MSEA) Algorithm

In the MSEA [10] algorithm, the N×N MB and the candidate block are divided into N/2×N/2 sub-blocks, then further into N/4×N/4 sub-blocks, and so on down to 1×1. Therefore, there are L = log2 N + 1 levels in total, and level l (0 ≤ l ≤ L − 1) has K = 4^l sub-blocks. The sum norms of the sub-blocks give the msea value:

$$\mathrm{SAD}(m,n) = \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left|C(i,j) - R(i+m,\, j+n)\right| \ge \sum_{k=0}^{K-1}\left|C_k - R_k\right| \equiv \mathrm{msea}_l(m,n),$$

where Ck and Rk are the sum norms of the kth sub-block of the current MB and of the candidate block, respectively. The msea value at level 1 with a 16 × 16 block is shown in Fig. 2. The msea value at level 0 equals the sea value, and the msea value at level L − 1 equals the SAD value. The msea values of different levels obey the inequality

$$\mathrm{msea}_0 \le \mathrm{msea}_1 \le \cdots \le \mathrm{msea}_{L-1} = \mathrm{SAD}. \tag{5}$$

Fig. 2 MSEA algorithm at level 1 with 16 × 16 block.
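The definitions in Eqs. (1)–(5) can be checked numerically. The following is a minimal Python sketch, not the authors' code: blocks are lists of rows of integer pixels, and (m, n) is taken as a non-negative offset into a larger reference window, which is an assumption made here to keep indexing simple. The helper names `sad`, `candidate`, `sum_norms` and `msea` are hypothetical.

```python
import random


def sad(cur, ref, m, n):
    """Eq. (1): SAD between the N x N current block and the candidate at offset (m, n)."""
    N = len(cur)
    return sum(abs(cur[i][j] - ref[i + m][j + n]) for i in range(N) for j in range(N))


def candidate(ref, m, n, N):
    """N x N candidate block whose top-left pixel is ref[m][n]."""
    return [row[n:n + N] for row in ref[m:m + N]]


def sum_norms(block, size):
    """Sum norms of all size x size sub-blocks of a square block, in raster order."""
    N = len(block)
    return [sum(block[i][j] for i in range(bi, bi + size) for j in range(bj, bj + size))
            for bi in range(0, N, size) for bj in range(0, N, size)]


def msea(cur, ref, m, n, level):
    """msea_l: sum of |C_k - R_k| over the 4**level sub-blocks (level 0 is sea, Eq. (2))."""
    N = len(cur)
    size = N >> level  # sub-block side N / 2**level
    ck = sum_norms(cur, size)
    rk = sum_norms(candidate(ref, m, n, N), size)
    return sum(abs(c - r) for c, r in zip(ck, rk))


random.seed(0)
N = 16
cur = [[random.randrange(256) for _ in range(N)] for _ in range(N)]
ref = [[random.randrange(256) for _ in range(N + 8)] for _ in range(N + 8)]

L = N.bit_length()  # log2(N) + 1 levels; valid for power-of-two N
chain = [msea(cur, ref, 3, 5, l) for l in range(L)]
print(chain)
assert chain == sorted(chain)            # Eq. (5): non-decreasing with the level
assert chain[-1] == sad(cur, ref, 3, 5)  # level L-1 (1 x 1 sub-blocks) is the SAD
```

Each level-l sum norm is the sum of four level-(l+1) sum norms, so the triangle inequality gives msea_l ≤ msea_(l+1), which is exactly what the assertions check.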

In the MSEA algorithm, the msea values of each search point are evaluated sequentially from level 0 to level L − 2 to try to eliminate the point. Because the msea values are monotonically non-decreasing with the level, more and more search points can be eliminated as the level increases.

2.4 Fine Granularity SEA (FGSEA) Algorithm

In the MSEA algorithm, there is a large gap between two adjacent levels. For instance, with a 16 × 16 block, msea level 1 (8 × 8 sub-blocks) only needs the sum norms of 4 sub-blocks, whereas msea level 2 (4 × 4 sub-blocks) needs 16 sub-blocks. Therefore, as the level increases, both the msea value and the computation complexity grow sharply.
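To make the level gap concrete, the short Python sketch below counts the sub-blocks processed at each level for a 16 × 16 block. The MSEA counts follow directly from K = 4^l. For FGSEA, it assumes, following the description in the next paragraph, that exactly one block is split into four per level, so each level adds three sub-blocks. The function names are illustrative only.

```python
def msea_subblock_counts(N):
    """Sub-blocks at each MSEA level l = 0 .. log2(N): K = 4**l."""
    levels = N.bit_length()  # log2(N) + 1 for power-of-two N
    return [4 ** l for l in range(levels)]


def fgsea_subblock_counts(N):
    """Sub-block count per FGSEA level, assuming one block is split into four per level."""
    counts = [1]
    while counts[-1] < N * N:
        counts.append(counts[-1] + 3)  # one block replaced by four
    return counts


msea_counts = msea_subblock_counts(16)
fgsea_counts = fgsea_subblock_counts(16)
print(msea_counts)            # [1, 4, 16, 64, 256]
print(len(fgsea_counts) - 1)  # 85
print(fgsea_counts[:6])       # [1, 4, 7, 10, 13, 16]
```

The MSEA count quadruples between adjacent levels, while the FGSEA count grows by only three; the 85 single-block splits needed to go from one block to 256 pixels match L = (N×N − 1)/3 for N = 16.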

To decrease the level gaps, the FGSEA [11] algorithm adopts different block sizes. An N×N block is first divided into four N/2×N/2 blocks, then these four N/2×N/2 blocks are divided one after another, and so on. Therefore, there are L = (N×N − 1)/3 levels in total in the FGSEA algorithm. FGSEA level 2 with a 16 × 16 block is illustrated in Fig. 3 as an example. In the FGSEA algorithm, the fgsea value of each level is calculated sequentially and used to try to eliminate the search point. Because two adjacent levels have a smaller gap, more search points can be skipped at earlier levels, and thus the extra computation overhead is decreased.

Fig. 3 FGSEA algorithm at level 2 with 16 × 16 block.

2.5 Global SEA (GSEA) Algorithm

The hardware-friendly GSEA [12] is based on the assumption that the msea values can effectively represent the SAD values. In GSEA, the whole search area is first scanned and the M positions with the smallest msea_l values are stored. Then, FSBMA is conducted only on those chosen search points to save computation. Larger l and M mean more computation and better image quality, and vice versa. In [12], based on experiments, the authors recommend setting l = 2 and M = 7 for QCIF and CIF frames. That is, for each search point, its msea_2 (4 × 4 sub-block) value is calculated and compared, and only the 7 positions with the smallest values are kept for FSBMA. Because the computation complexity of the msea_l calculation is 4^l/256 of that of FSBMA, the GSEA algorithm requires more than 1/16 of the FSBMA computation.

3. Lossy Strict MSEA Algorithm

3.1 SEA Algorithm Observations

According to our experiments, about 48–79% of the search points can be directly skipped by the SEA algorithm. However, many search points still pass the evaluation, while SADmin is in fact rarely updated, as shown in Table 1. The average effective search points per MB (AESP/MB) is defined as the average number of points per MB that pass the SEA evaluation and update the current SADmin.

Table 1 AESP/MB in SEA algorithm.
Sequence | AESP/MB
Container | 1.08
Foreman | 1.94
Mobile | 2.00
Football | 9.89
Test conditions: (a) QCIF: 16 × 16 block, [−16, +16] search range, 100 frames. (b) CIF: 16 × 16 block, [−32, +32] search range, 100 frames.

Therefore, many search points that pass the evaluation are in fact ineffective and should be eliminated. In the SEA algorithm, these ineffective points incur an extra performance penalty, because not only must their SAD values be calculated as in FSBMA, but their sea values must also be computed. To further improve the skip ratio, the MSEA and FGSEA algorithms divide the block into sub-blocks to increase the elimination thresholds. However, these approaches require more computation and incur a larger penalty for such ineffective points.

In practice, the sea value of a search point can be used to estimate its SAD value. In GSEA [12], the sea value is assumed to reflect the SAD value, i.e., a smaller sea value indicates a smaller SAD value. In this paper, the ratio between the current SADmin and the sea value is used to increase the elimination ratio. We have observed that the sea values of the effective search points are much smaller than the current SADmin. Therefore, we can simply enlarge the calculated sea values to skip the ineffective search points without much affecting the effective ones. This is because in the SEA algorithm, both the current MB and the candidate blocks are reduced to 1 pixel by a low-pass filter, so that the high-frequency components are removed and only the low-frequency ones are kept. For the effective search points, these low-frequency components are much more similar to those of the current MB, so their sea values are much smaller than those of other search points. For instance, the SADmin/SEA ratio distribution of 8 sequences was tested, as shown in Fig. 4. The SADmin/SEA ratio is defined as the ratio between the current SADmin and the sea value of an effective point. In Fig. 4, for all the sequences, about 90% of the effective search points have an SADmin/SEA ratio larger than 2, which means we can simply double the sea value in Eq. (2) to increase the elimination ratio while maintaining almost the same image quality.

3.2 Lossy Strict SEA (SSEA) Algorithm

Based on these observations, instead of directly using the sea value as the elimination criterion as in the SEA algorithm, we multiply the sea value by λ and use this strict sea (ssea) value to evaluate the search points, as shown in Eq. (6):

$$\mathrm{ssea} = \mathrm{sea} \times \lambda \quad (\lambda \ge 1). \tag{6}$$

The workflow of the proposed SSEA algorithm is almost the same as that of the original SEA algorithm, except that the enlarged ssea value replaces the original sea value. Because the ssea value is λ times the sea value, more search points can be skipped. A smaller λ makes the results more reliable, and a larger λ saves more computation; λ thus provides a tradeoff between image quality and computation complexity.
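The strict test of Eq. (6) changes only the comparison inside the search loop. The Python sketch below is an illustration under stated assumptions, not the authors' implementation: it scans every offset in a small reference window, applies msea_l × λ_l > SADmin for each level in turn, and computes the full SAD only for points that survive. With a single factor it is the SSEA test of Eq. (6); with all factors equal to 1 it reduces to the lossless SEA/MSEA test; with one factor per level it anticipates the multilevel variant of Sect. 3.3. The hypothetical `SHIFT_ADD` table multiplies by λ ∈ {1, 1.5, 2, 2.5, 4} using only shifts and additions, in the spirit of the λ guideline that follows; its right shift truncates odd values, which makes the test very slightly less aggressive. Sum norms are recomputed per point for clarity, whereas a real encoder would precompute them.

```python
import random

# Multiply a non-negative integer by lambda using only shifts and adds.
# x >> 1 truncates, so the .5 factors round odd products down.
SHIFT_ADD = {
    1: lambda x: x,
    1.5: lambda x: x + (x >> 1),
    2: lambda x: x << 1,
    2.5: lambda x: (x << 1) + (x >> 1),
    4: lambda x: x << 2,
}


def sum_norms(block, size):
    """Sum norms of all size x size sub-blocks, in raster order."""
    N = len(block)
    return [sum(block[i][j] for i in range(bi, bi + size) for j in range(bj, bj + size))
            for bi in range(0, N, size) for bj in range(0, N, size)]


def msea(cur, cand, level):
    """msea_l between the current block and a candidate block (level 0 = sea)."""
    size = len(cur) >> level
    return sum(abs(c - r) for c, r in zip(sum_norms(cur, size), sum_norms(cand, size)))


def sad(cur, cand):
    return sum(abs(a - b) for ra, rb in zip(cur, cand) for a, b in zip(ra, rb))


def strict_search(cur, ref, lambdas):
    """Return (best offset, SADmin, number of full SADs computed).

    A point is skipped at the first level l where msea_l * lambdas[l] > SADmin."""
    N = len(cur)
    span = len(ref) - N
    best, sad_min, full_sads = None, float("inf"), 0
    for m in range(span + 1):
        for n in range(span + 1):
            cand = [row[n:n + N] for row in ref[m:m + N]]
            # any() stops at the first level that eliminates the point.
            if any(SHIFT_ADD[lam](msea(cur, cand, l)) > sad_min
                   for l, lam in enumerate(lambdas)):
                continue
            full_sads += 1
            s = sad(cur, cand)
            if s < sad_min:
                best, sad_min = (m, n), s
    return best, sad_min, full_sads


random.seed(1)
N = 16
cur = [[random.randrange(256) for _ in range(N)] for _ in range(N)]
ref = [[random.randrange(256) for _ in range(N + 8)] for _ in range(N + 8)]
for i in range(N):  # plant a noisy copy of the current block at offset (2, 5)
    for j in range(N):
        ref[2 + i][5 + j] = min(255, max(0, cur[i][j] + random.randint(-3, 3)))

print("SEA   (lambda = 1):", strict_search(cur, ref, [1]))
print("SSEA  (lambda = 4):", strict_search(cur, ref, [4]))
print("SMSEA (4, 2.5, 2, 1.5):", strict_search(cur, ref, [4, 2.5, 2, 1.5]))
```

With all factors equal to 1, the returned SADmin always equals the full-search minimum; with larger factors, it can only be equal or worse, which is the lossy tradeoff controlled by λ.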

The key problem in the proposed SSEA algorithm is the choice of λ, which is an empirical factor with 2 basic guidelines: (1) Computation complexity. Because λ is used frequently in the calculation, multiplying by it should be cheap. Therefore, values such as 2, 2.5, and 4 are preferable because they can be easily realized by shift and addition operations. (2) Observation and experiments. The chosen λ should not affect too many effective search points. In our experiments, we found 30% to be a good threshold, which means that about 70% of the effective search points are not affected, as shown in Fig. 4. Moreover, the chosen λ also needs to be evaluated by thorough experiments. Based on the above guidelines, λ is set to 4 in the

proposed algorithm. To further evaluate this value, four test cases (the Foreman and Football sequences in QCIF and CIF formats) were run, and the results are listed in Table 2. When λ equals 1, the image quality is the same as that of FSBMA, which is used as the basis for performance evaluation. As λ increases, more search points are skipped, but the video quality becomes worse. According to our experiments, λ can be set to 4 for QCIF and CIF sequences.

Table 2 Average PSNR/Bitrate versus λ in SSEA algorithm (PSNR (dB) / Bitrate (kbps)).
Sequence | λ=1 | λ=2 | λ=4 | λ=8
QCIF(a) Foreman | 35.123/135.866 | 35.128/136.116 | 35.117/135.960 | 35.097/137.724
QCIF(a) Football | 34.927/615.439 | 34.926/615.962 | 34.932/619.421 | 34.928/626.822
CIF(b) Foreman | 36.319/478.591 | 36.328/479.040 | 36.296/486.566 | 36.282/501.583
CIF(b) Football | 36.307/1609.934 | 36.302/1610.755 | 36.293/1622.789 | 36.300/1644.005
(a) 16 × 16 block, [−16, +16] search range, 100 frames. (b) 16 × 16 block, [−32, +32] search range, 100 frames.

Fig. 4 Average SADmin/SEA ratio distribution with 16 × 16 block and 100 frames. (a) QCIF, [−16, +16] search range. (b) CIF, [−32, +32] search range.

3.3 Lossy Strict MSEA (SMSEA) Algorithm

Because the SEA algorithm is level 0 of the MSEA algorithm, the proposed idea can also be applied to MSEA; we call the result the strict MSEA (SMSEA) algorithm. The original msea values of the MSEA algorithm are multiplied by associated λ factors to obtain the strict msea (smsea) values. Using the same method, for CIF and QCIF images with a 16 × 16 block size, the λ factors of MSEA levels 0–3 are fixed to 4.0, 2.5, 2.0 and 1.5, respectively. The proposed lossy SMSEA algorithm is shown in Fig. 5 and works as follows. For each search point, the msea value at level 0 (msea_0) is calculated first and multiplied by λ_0. If the result is larger than the current SADmin, the search point is skipped; otherwise, the msea value at level 1 (msea_1) is calculated, multiplied by λ_1 and compared in the same way, and so on. A point that survives all these levels has its full SAD calculated and compared with SADmin. The proposed SMSEA is therefore almost the same as MSEA, except that the smsea values replace the msea values. Because larger smsea values allow more search points to be eliminated, the proposed SMSEA algorithm can greatly reduce the computation complexity.

Fig. 5 Proposed lossy SMSEA algorithm workflow.

4. Computation and Performance Analysis

4.1 Computation Analysis

In the original MSEA algorithm, besides the SAD calculation, the calculation of the msea values introduces extra overhead, which can be divided into two parts. The first part is the calculation of the sum norms of the current MB and the candidate block. For an N×N block, the sum norms of the current MB need to be calculated only once. The sum norms of the smaller sub-blocks are calculated first, and the sum norms of the larger sub-blocks are then obtained by merging the smaller ones. Therefore, the computation cost is small:

$$\mu_{cur} = 3 \times \left(\frac{N}{2}\right)^2 + 3 \times \sum_{l=0}^{L-2} 4^l. \tag{7}$$

The sum norms of the candidate blocks can be calculated quickly at the frame level [10], with a computation complexity of about

$$\mu_{ref} \approx 2(L-1)N^2. \tag{8}$$

The second part of the computation overhead is the calculation of the msea values. Once the sum norms of the current MB and the candidate block are available, the msea value is calculated as the elimination criterion. Because there are 4^l sub-blocks at MSEA level l, calculating msea_l for one search point costs as much as 4^l/N^2 of an FSBMA search point. The total computation complexity of the MSEA algorithm consists of this overhead and the SAD calculation of the unskipped search points. For one search point, the FSBMA computation is 3N^2 − 1 operations: N^2 subtractions, N^2 absolute-value operations and N^2 − 1 additions. Therefore, the total computation complexity ξ_msea of the MSEA algorithm is

$$\begin{aligned}\xi_{msea} &= \mu_{cur} + \mu_{ref} + \mu_{msea} + \mu_{unskip}\\ &\approx \underbrace{3 \times \left(\frac{N}{2}\right)^2 + 3 \times \sum_{l=0}^{L-2} 4^l + 2(L-1)N^2}_{\text{overhead}} + \sum_{l=0}^{L-2} SP_l \times \frac{4^l}{N^2} \times (3N^2-1) + SP_{unskip} \times (3N^2-1),\end{aligned} \tag{9}$$

where SP_l is the number of search points processed at MSEA level l, and SP_unskip is the number of unskipped search points. Therefore, the equivalent average search points per MB (EASP/MB) for MSEA is

$$\mathrm{EASP/MB}_{msea} = \xi_{msea} / (3N^2 - 1). \tag{10}$$

For the proposed SMSEA algorithm, the workflow is almost the same as MSEA, except that smsea_l values are used, which are obtained by multiplying msea_l by the associated factor λ_l. Because all the λ factors are chosen so that they can be realized by shift and addition operations, this extra cost is negligible. Therefore, for the same processed search points, the computation complexity of SMSEA is almost the same as that of MSEA; that is, ξ_smsea is given by Eq. (9) with the search-point counts SP_l and SP_unskip measured for SMSEA, and

$$\mathrm{EASP/MB}_{smsea} \approx \xi_{smsea} / (3N^2 - 1). \tag{11}$$

This EASP/MB represents the real computation complexity of the MSEA and SMSEA algorithms, expressed as an equivalent number of search points per MB so that the performance comparison is uniform.

4.2 Performance Comparison

With the λ factors fixed, the proposed SMSEA algorithm was tested on 12 video sequences to verify its stable image quality. The experiments were conducted on the H.264/AVC reference software JM8.1a. The test conditions are I-P-P-P..., CAVLC, Hadamard transform, 1 reference frame, and 1/4-pixel accurate motion vectors (MVs). The R-D cost of H.264/AVC replaces the SAD value in the SEA algorithm. The block size is fixed to 16 × 16 because the SEA algorithm can only support a fixed block mode, and the QP is set to 28, 32, 36, and 40. For QCIF and CIF sequences, the search range is [−16, +16] and [−32, +32], respectively. FSBMA is used as the reference for image quality. The average PSNR and bitrate differences are shown in Table 3, where "+" denotes an increase and "−" a decrease; the calculation method can be found in [13]. The image quality of the TSS [1], DS [3], GSEA [12] and proposed SMSEA algorithms is listed in Table 3. The original MSEA algorithm provides the same image quality as FSBMA and thus is not shown. TSS does not perform well for sequences with large motion, such as Table and Football. This is because TSS searches only 25 points in the whole search area and is easily trapped in local minima for images with fast movement.
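The TSS weakness described above can be illustrated on a synthetic cost surface. The Python sketch below assumes the common three steps of sizes 4, 2 and 1 (so TSS can move at most ±7 from the origin); with this configuration TSS evaluates 9 + 8 + 8 = 25 distinct points, the same count as the TSS column of Table 4, whereas a full search over [−16, +16] evaluates 33 × 33 = 1089 points, the QCIF FS value in that table. The cost function `bowl_with_narrow_minimum` is hypothetical: a smooth bowl centered at (0, 0) plus an isolated true minimum at (3, −5) that the coarse pattern never visits.

```python
def full_search(cost, r):
    """Evaluate every integer displacement in [-r, +r] x [-r, +r]."""
    points = [(dx, dy) for dx in range(-r, r + 1) for dy in range(-r, r + 1)]
    best = min(points, key=lambda p: cost(*p))
    return best, cost(*best), len(points)


def three_step_search(cost, steps=(4, 2, 1)):
    """Three-step search: test the 3 x 3 pattern at the current step size around
    the current center, move to the cheapest point, then shrink the step.
    Returns (best point, its cost, number of distinct points evaluated)."""
    visited = {(0, 0): cost(0, 0)}
    center = (0, 0)
    for s in steps:
        best = center
        for dx in (-s, 0, s):
            for dy in (-s, 0, s):
                p = (center[0] + dx, center[1] + dy)
                if p not in visited:
                    visited[p] = cost(*p)
                if visited[p] < visited[best]:  # strict <: a tie never replaces the best
                    best = p
        center = best
    return center, visited[center], len(visited)


def bowl_with_narrow_minimum(dx, dy):
    """Hypothetical cost surface: smooth bowl around (0, 0) plus an isolated
    true minimum at (3, -5)."""
    if (dx, dy) == (3, -5):
        return 0
    return abs(dx) + abs(dy) + 10


print(full_search(bowl_with_narrow_minimum, 16))   # ((3, -5), 0, 1089)
print(three_step_search(bowl_with_narrow_minimum))  # ((0, 0), 10, 25)
```

On a smooth surface TSS does reach the minimum; it is the isolated minimum, typical of fast or irregular motion, that the coarse first step skips over.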

Table 3 Image quality comparisons (PSNR / bitrate differences relative to FSBMA).
Sequence | TSS [1] | DS [3] | GSEA [12] | Proposed SMSEA
QCIF Container | −0.00 dB / +0.06% | −0.00 dB / +0.02% | −0.02 dB / +0.49% | −0.01 dB / +0.14%
QCIF Foreman | −0.23 dB / +5.70% | −0.14 dB / +3.37% | −0.11 dB / +2.81% | −0.08 dB / +1.95%
QCIF Mobile | −0.00 dB / +0.03% | −0.00 dB / +0.09% | +0.01 dB / −0.20% | −0.00 dB / +0.00%
QCIF Table | −0.79 dB / +21.94% | −0.20 dB / +5.27% | −0.16 dB / +4.98% | −0.10 dB / +2.55%
QCIF Football | −0.20 dB / +4.24% | −0.09 dB / +1.84% | −0.08 dB / +1.58% | −0.06 dB / +1.29%
QCIF Tempete | +0.03 dB / −0.07% | +0.01 dB / −0.29% | −0.04 dB / +1.00% | −0.03 dB / −0.62%
CIF Container | −0.02 dB / +0.60% | −0.02 dB / +0.54% | −0.00 dB / +0.08% | −0.02 dB / +0.50%
CIF Foreman | −0.57 dB / +14.59% | −0.18 dB / +4.35% | −0.05 dB / +1.30% | −0.21 dB / +5.17%
CIF Mobile | −0.03 dB / +0.67% | +0.07 dB / −1.63% | +0.01 dB / −0.23% | −0.00 dB / +0.12%
CIF Table | −1.60 dB / +55.24% | −0.28 dB / +8.37% | −0.09 dB / +2.41% | −0.09 dB / +2.61%
CIF Football | −0.59 dB / +12.79% | −0.19 dB / +4.04% | −0.04 dB / +0.81% | −0.10 dB / +2.21%
CIF Tempete | +0.00 dB / −0.07% | −0.00 dB / +0.01% | −0.02 dB / +0.44% | +0.01 dB / −0.31%
Average | −0.33 dB / +9.64% | −0.09 dB / +2.17% | −0.05 dB / +1.29% | −0.06 dB / +1.30%

Table 4 Computation complexity comparisons (ASP/MB and EASP/MB).
Sequence | FS ASP/MB | TSS [1] ASP/MB | DS [3] ASP/MB | MSEA [10] ASP/MB | MSEA [10] EASP/MB | GSEA [12] EASP/MB | Proposed SMSEA ASP/MB | Proposed SMSEA EASP/MB
QCIF Container | 1089.0 | 25.0 | 13.1 | 70.0 | 113.7 | 86.7 | 7.8 | 21.4
QCIF Foreman | 1089.0 | 25.0 | 14.5 | 15.5 | 36.6 | 86.7 | 2.8 | 12.6
QCIF Mobile | 1089.0 | 25.0 | 13.0 | 2.2 | 26.6 | 86.7 | 1.1 | 10.9
QCIF Table | 1089.0 | 25.0 | 15.1 | 140.8 | 260.2 | 86.7 | 5.1 | 29.0
QCIF Football | 1089.0 | 25.0 | 19.4 | 72.6 | 163.5 | 86.7 | 3.1 | 20.9
QCIF Tempete | 1089.0 | 25.0 | 13.2 | 4.0 | 22.8 | 86.7 | 1.1 | 10.4
CIF Container | 4225.0 | 25.0 | 13.1 | 387.5 | 641.6 | 311.2 | 15.1 | 73.8
CIF Foreman | 4225.0 | 25.0 | 15.6 | 130.5 | 234.6 | 311.2 | 29.3 | 64.3
CIF Mobile | 4225.0 | 25.0 | 14.0 | 25.4 | 126.7 | 311.2 | 4.0 | 34.3
CIF Table | 4225.0 | 25.0 | 15.4 | 231.5 | 641.7 | 311.2 | 4.8 | 65.7
CIF Football | 4225.0 | 25.0 | 22.5 | 125.3 | 335.2 | 311.2 | 3.2 | 41.7
CIF Tempete | 4225.0 | 25.0 | 14.1 | 38.8 | 104.2 | 311.2 | 4.3 | 30.0
Average | 2657.0 | 25.0 | 15.3 | 103.7 | 225.6 | 199.0 | 6.8 | 34.6

In contrast to TSS, the DS algorithm uses diamond patterns to trace the motion direction and thus achieves much better image quality. For the proposed SMSEA, no search point is skipped without being evaluated, so stable image quality is obtained, which is better than that of TSS and DS. Because GSEA [12] is also a lossy algorithm based on MSEA, its performance is evaluated and listed in Table 3. GSEA provides the best image quality, which means the msea_2 values can effectively represent the real SAD values. However, the proposed SMSEA also provides very competitive image quality: compared with GSEA, its average PSNR drop is only 0.01 dB larger and its average bitrate increase only 0.01% larger. Moreover, the proposed SMSEA algorithm requires much less computation than GSEA, as discussed below. To make the performance comparison uniform, the average search points per MB (ASP/MB) and equivalent average search points per MB (EASP/MB) of the different algorithms are listed in Table 4. ASP/MB is the actual average number of search points per MB, whereas EASP/MB includes the total computation cost, as defined in Eqs. (10) and (11). Compared with TSS and DS, the proposed SMSEA algorithm requires 1.38 and 2.26 times the computation, respectively; given its better image quality, we consider this increase acceptable.
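The complexity ratios quoted in this section follow directly from the average row of Table 4. This small Python check is an illustration, not part of the original evaluation: it recomputes the SMSEA average EASP/MB from the 12 per-sequence values and divides the published averages, using ASP/MB for TSS and DS because Table 4 gives no EASP/MB for them.

```python
# Per-sequence EASP/MB of the proposed SMSEA (Table 4: 6 QCIF then 6 CIF sequences).
smsea_easp = [21.4, 12.6, 10.9, 29.0, 20.9, 10.4, 73.8, 64.3, 34.3, 65.7, 41.7, 30.0]

# Published averages from Table 4 (ASP/MB for TSS and DS, EASP/MB otherwise).
average = {"TSS": 25.0, "DS": 15.3, "MSEA": 225.6, "GSEA": 199.0, "SMSEA": 34.6}


def ratio(a, b):
    """Cost of algorithm a relative to algorithm b, rounded as in the text."""
    return round(average[a] / average[b], 2)


print(round(sum(smsea_easp) / len(smsea_easp), 1))     # 34.6
print(ratio("SMSEA", "TSS"), ratio("SMSEA", "DS"))     # 1.38 2.26
print(ratio("MSEA", "SMSEA"), ratio("GSEA", "SMSEA"))  # 6.52 5.75
```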

Compared with the original MSEA, the proposed SMSEA algorithm provides a 6.52-times speedup, which mainly comes from its much stricter elimination criteria. Compared with the lossy GSEA algorithm, the proposed SMSEA provides comparable image quality with a 5.75-times speedup. This is because GSEA must calculate the msea_2 value of every search point, which requires a lot of computation, whereas in the proposed SMSEA the msea values of each search point are calculated sequentially from level 0 to level 3, so many points are eliminated at the early levels.

5. Conclusion

In this paper, a lossy strict multilevel SEA (SMSEA) algorithm is proposed. For most sequences, the sea values of the best matching points are much smaller than the current minimum SAD, so the calculated sea values can simply be enlarged to eliminate the non-best matching points. Based on this observation, in the proposed SMSEA algorithm with a 16 × 16 block size, the msea values of the original MSEA levels 0–3 are multiplied by 4.0, 2.5, 2.0, and 1.5 to increase the elimination ratio. Because these factors can be easily realized by shift and addition operations, the introduced computation overhead is negligible. Compared with the original lossless MSEA algorithm, experiments show that the proposed lossy SMSEA algorithm provides a speedup of about 6.52 times. Compared with other lossy fast ME algorithms such as TSS [1] and DS [3], the proposed algorithm provides more stable image quality. The proposed technique can also be used in the FGSEA [11] algorithm. For instance, FGSEA with a 16 × 16 block has 85 levels; each level can have its own λ factor to increase the elimination ratio, and the calculation process follows by analogy.

Acknowledgments

This research is supported by CREST, JST.

References

[1] T. Koga, K. Iinuma, A. Hirano, Y. Iijima, and T. Ishiguro, “Motion-compensated interframe coding for video conferencing,” Nat. Telecommunications Conf., pp.G5.3.1–G5.3.5, 1981.
[2] L.M. Po and W.C. Ma, “A novel four-step search algorithm for fast block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol.6, no.3, pp.313–317, June 1996.
[3] S. Zhu and K.K. Ma, “A new diamond search algorithm for fast block-matching motion estimation,” IEEE Trans. Image Process., vol.9, no.2, pp.287–290, Feb. 2000.
[4] C. Zhu, X. Lin, and L.P. Chau, “Hexagon-based search pattern for fast block motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol.12, no.5, pp.349–355, May 2002.
[5] C.H. Cheung and L.M. Po, “Novel cross-diamond-hexagonal search algorithms for fast block motion estimation,” IEEE Trans. Multimed., vol.7, no.1, pp.16–22, Feb. 2005.
[6] S.Y. Huang, C.Y. Cho, and J.S. Wang, “Adaptive fast block-matching algorithm by switching search patterns for sequences with wide-range motion content,” IEEE Trans. Circuits Syst. Video Technol., vol.15, no.11, pp.1373–1384, Nov. 2005.
[7] B. Liu and A. Zaccarin, “New fast algorithms for the estimation of block motion vectors,” IEEE Trans. Circuits Syst. Video Technol., vol.3, no.2, pp.148–157, April 1993.
[8] Z.L. He, K.K. Chan, and M.L. Liou, “Low-power VLSI design for motion estimation using adaptive pixel truncation,” IEEE Trans. Circuits Syst. Video Technol., vol.10, no.5, pp.669–678, Aug. 2000.
[9] W. Li and E. Salari, “Successive elimination algorithm for motion estimation,” IEEE Trans. Image Process., vol.4, no.1, pp.105–107, Jan. 1995.
[10] X.Q. Gao, C.J. Duanmu, and C.R. Zou, “A multilevel successive elimination algorithm for block matching motion estimation,” IEEE Trans. Image Process., vol.9, no.3, pp.501–504, March 2000.
[11] C. Zhu, W.S. Qi, and W. Ser, “Predictive fine granularity successive elimination for fast optimal block-matching motion estimation,” IEEE Trans. Image Process., vol.14, no.2, pp.213–221, Feb. 2005.
[12] Y.W. Huang, S.Y. Chien, B.Y. Hsieh, and L.G. Chen, “Global elimination algorithm and architecture design for fast block matching motion estimation,” IEEE Trans. Circuits Syst. Video Technol., vol.14, no.6, pp.898–907, June 2004.
[13] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” 13th VCEG-M33 Meeting, pp.1–4, April 2001.

Yang Song received the B.E. degree in Computer Science from Xi’an Jiaotong University, China, in 2001 and the M.E. degree in Computer Science from Tsinghua University, China, in 2004. He is currently a Ph.D. candidate in the Graduate School of Information, Production and Systems, Waseda University, Japan. His research interests include motion estimation, video coding technology and the associated VLSI architecture.

Zhenyu Liu received his B.E., M.E. and Ph.D. degrees in electronics engineering from Beijing Institute of Technology in 1996, 1999 and 2002, respectively. His doctoral research focused on real-time signal processing and related ASIC design. From 2002 to 2004, he worked as a postdoctoral researcher at Tsinghua University, China, where his research mainly concentrated on embedded CPU architecture. He is currently a researcher at the Kitakyushu Foundation for the Advancement of Industry Science and Technology. His research interests include real-time H.264 encoding algorithms and the associated VLSI architecture.

Takeshi Ikenaga received his B.E. and M.E. degrees in electrical engineering and the Ph.D. degree in information & computer science from Waseda University, Tokyo, Japan, in 1988, 1990, and 2002, respectively. He joined LSI Laboratories, Nippon Telegraph and Telephone Corporation (NTT) in 1990, where he undertook research on the design and test methodologies for high-performance ASICs, a real-time MPEG2 encoder chip set, and a highly parallel LSI & system design for image-understanding processing. He is presently an associate professor in the system LSI field of the Graduate School of Information, Production and Systems, Waseda University. His current interests are application SoCs for image, security and network processing. Dr. Ikenaga is a member of the IPSJ and the IEEE. He received the IEICE Research Encouragement Award in 1992.

Satoshi Goto was born on January 3, 1945 in Hiroshima, Japan. He received the B.E. and M.E. degrees in Electronics and Communication Engineering from Waseda University in 1968 and 1970, respectively. He also received the Dr. of Engineering degree from the same university in 1981. He is an IEEE Fellow, a member of the Academy Engineering Society of Japan and a professor at Waseda University. His research interests include LSI systems and multimedia systems.