Impact of Similarity Threshold on Arbitrary Shaped Pattern Selection Very Low Bit-rate Video Coding Algorithm
Manoranjan Paul and Manzur Murshed
Gippsland School of Computing and Information Technology, Monash University, Churchill Vic 3842, Australia
E-mail: {manoranjan.paul, manzur.murshed}@infotech.monash.edu.au
Abstract. Very low bit-rate video coding that uses arbitrary shaped patterns to represent moving regions in macroblocks has very good potential for improved coding efficiency. In any pattern based coding scheme, a similarity threshold is used as the matching criterion between a moving region and a pattern. This threshold, together with quantization, can control the coding efficiency curve. Unlike the quantization step size, the similarity threshold has the benefit that no information about it needs to be transmitted to the decoder. Finer changes to the coding efficiency curve are possible by changing the similarity threshold instead of the quantization level, and as a result a number of bits are saved. In this paper, we investigate the coding efficiency curves obtained with different similarity thresholds.
1 Introduction
Reducing the transmission bit-rate while concomitantly retaining image quality continues to be a challenge for efficient video compression standards such as H.263 [5] and MPEG-2 [3]. These standards are, however, inefficient when coding at very low bit-rate (VLBR) (≤ 64 kbps) because they cannot encode moving objects within a 16 × 16 pixel macroblock (MB) during motion estimation (ME), so all 256 residual error values are transmitted for motion compensation (MC) regardless of whether there are moving objects. The H.264/AVC standard [6] extended this block based motion compensated coding idea by introducing variable block sizes (from 16×16 down to 4×4) to approximate the shape of the moving objects within the MB more accurately. It requires a separate motion vector for each partition, and the choice of partition size and the number of partition types has a significant impact on coding efficiency. The possibility of choosing smaller partition sizes diminishes as the target bit-rate is lowered. Consequently, the coding efficiency improvement due to MB partitioning can no longer be realized for a VLBR target, as larger partition sizes have to be chosen in most cases to keep the bit-rate in check at the expense of inferior shape approximation. To address this problem, Fukuhara et al. [1] first proposed pattern based coding using four MB-partitioning patterns of 128 pixels each. Treating each MB identically, irrespective of its motion content, also resulted in a higher bit-rate being incurred for those MBs which contained only static background or had moving object(s) but little static background. In such cases, the motion vectors for both partitions were almost the same, and so only one needed to be represented.
Fig. 1. The pattern codebook of 32 regular shaped, 64-pixel patterns, defined in 16×16 blocks, where the white region represents 1 (motion) and black region represents 0 (no motion).
The MPEG-4 [4] video standard first introduced the concept of content-based coding by dividing video frames into separate segments comprising a background and one or more moving objects. To address the limitations of [1], Wong et al. [14] exploited the idea of partitioning the MBs via a simplified segmentation process that again avoided handling the exact shape of moving objects, so that popular MB-based motion estimation techniques could be applied. Wong et al. classified each MB into three distinct categories: 1) Static MB (SMB): MBs that contain little or no motion; 2) Active MB (AMB): MBs that contain moving object(s) with little static background; and 3) Active-Region MB (RMB): MBs that contain both static background and part(s) of moving object(s). SMBs and AMBs are treated in exactly the same way as in H.26X. For RMB coding, Wong et al. assumed that the moving parts of an object may be represented by one of the eight predefined patterns P1 to P8 in Figure 1. An MB is classified as an RMB if, by some similarity measure, the moving part of the MB is well covered by a particular pattern. The RMB can then be coded using the 64 pixels of that pattern, with the remaining 192 pixels skipped as static background. Successful pattern matching can therefore theoretically achieve a maximum compression ratio of 4:1 for any MB. The actual achievable compression ratio will be lower due to the overheads of handling an additional MB type, the pattern identification numbering, and pattern matching errors.
Other pattern matching algorithms have been reported [8]-[12]. Figure 1 shows the complete 32-pattern codebook. The performance of the real-time pattern selection (RTPS) algorithm [10] has been shown to be superior to all existing pattern matching algorithms. RTPS(4), for example, improved the peak signal to noise ratio (PSNR) by up to 0.81 dB compared with the Fixed-8 [14] algorithm and by up to 1.52 dB compared with H.263. Paul et al. proposed an efficient Pattern Excluded Similarity Metric [12] and a content based Arbitrary Shaped Pattern Selection (ASPS) algorithm [11], which first extracts patterns from the actual video content without assuming any pre-defined shape and then uses these extracted patterns to represent the RMBs using a similarity measure, as in all other pattern matching algorithms. Figure 2 shows the patterns generated by the ASPS algorithm from some standard video sequences.
Fig. 2. Patterns extracted from video sequences by ASPS algorithm.
The ASPS algorithm, like any other pattern based coding algorithm, uses a similarity threshold to match a moving region with a pattern. With a larger similarity threshold, the ASPS algorithm captures more RMBs, and as a consequence both the bit-rate and the image quality are lower. With a smaller similarity threshold, it captures fewer RMBs, and as a result both the bit-rate and the image quality are higher. In this paper we investigate the performance of the ASPS algorithm for various similarity thresholds.
This paper is organized as follows. The video coding strategy using the ASPS algorithm is described in Section 2, while some simulation results are analysed in Section 3. The importance of the similarity threshold is discussed in Section 4. Future work and conclusions are provided in Section 5.
2 Pattern based VLBR Coding
Prior to video coding, a pattern codebook (PC) has to be constructed. The ASPS algorithm therefore works in two phases. In the first phase, the PC is generated from the actual video content, while in the second phase, the coding is undertaken using this content-dependent PC.
Fig. 3. The ASPG algorithm.
2.1 PC Generation
Let $C_k(x, y)$ and $R_k(x, y)$ denote the $k$-th MB of the current and reference frames, respectively, of a video sequence, where each frame is $W$ pixels × $H$ lines, $0 \le x, y \le 15$, and $0 \le k < W/16 \times H/16$. The moving region $M_k(x, y)$ of the $k$-th MB in the current frame is obtained as follows:

$$M_k(x, y) = T\left(\left|\, C_k(x, y) \bullet B - R_k(x, y) \bullet B \,\right|\right) \qquad (1)$$

where $B$ is a 3 × 3 unit matrix for the morphological closing operation $\bullet$ [2] [7], which is applied to reduce noise, and the thresholding function $T(v) = 1$ if $v > 2$ and 0 otherwise. If $8 \le \sum_{x=0}^{15}\sum_{y=0}^{15} M_k(x, y) < T_S + 64$, where $T_S$ is a similarity threshold, then the $k$-th MB is defined as a candidate RMB (CRMB). The Arbitrary Shaped Pattern Generation (ASPG) algorithm detailed in Figure 3 then generates the PC of $P_1, \ldots, P_\lambda$ using all CRMBs and the user-defined codebook size $\lambda$. Any clustering method, such as Fuzzy C-Means (FCM), can be used in the ASPG algorithm. The clustering method classifies all CRMBs into classes using the gravitational centre (GC), which is defined as follows. Let $G(A)$ be the GC of a 16 × 16 binary matrix $A$, such that

$$G(A) = \left( \frac{\sum_{x=0}^{15}\sum_{y=0}^{15} x\,A(x, y)}{\sum_{x=0}^{15}\sum_{y=0}^{15} A(x, y)},\ \frac{\sum_{x=0}^{15}\sum_{y=0}^{15} y\,A(x, y)}{\sum_{x=0}^{15}\sum_{y=0}^{15} A(x, y)} \right) \qquad (2)$$
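As a rough illustration of Equations (1) and (2), the following Python sketch computes a moving-region mask and its gravitational centre for one block. It is a minimal sketch, not the authors' implementation: the helper names (`close3x3`, `moving_region`, `gravitational_centre`), the row-major `block[y][x]` layout, the clipping of the 3 × 3 window at block borders, and the per-block (rather than per-frame) closing are all assumptions made here.

```python
def _filter3x3(img, pick):
    """Apply `pick` (max or min) over each pixel's 3x3 neighbourhood.

    Assumption of this sketch: the window is clipped at the block border.
    """
    h, w = len(img), len(img[0])
    return [[pick(img[j][i]
                  for j in range(max(0, y - 1), min(h, y + 2))
                  for i in range(max(0, x - 1), min(w, x + 2)))
             for x in range(w)]
            for y in range(h)]


def close3x3(img):
    """Grey-scale closing with a flat 3x3 element: dilation, then erosion."""
    return _filter3x3(_filter3x3(img, max), min)


def moving_region(cur, ref, t=2):
    """Equation (1): 1 where the closed blocks differ by more than t, else 0."""
    c, r = close3x3(cur), close3x3(ref)
    return [[1 if abs(cv - rv) > t else 0 for cv, rv in zip(crow, rrow)]
            for crow, rrow in zip(c, r)]


def gravitational_centre(a):
    """Equation (2): mean (x, y) position of the 1s in binary matrix a[y][x]."""
    total = sum(map(sum, a))
    if total == 0:
        return None  # undefined for an empty mask (a choice made for this sketch)
    gx = sum(x * v for row in a for x, v in enumerate(row)) / total
    gy = sum(y * v for y, row in enumerate(a) for v in row) / total
    return gx, gy


# Toy example: a bright 4x4 square appears in an otherwise static block.
ref = [[0] * 16 for _ in range(16)]
cur = [[100 if 2 <= y <= 5 and 4 <= x <= 7 else 0 for x in range(16)]
       for y in range(16)]
mask = moving_region(cur, ref)
ones = sum(map(sum, mask))
print(ones, gravitational_centre(mask))  # 16 (5.5, 3.5)
```

In this toy case the mask contains 16 ones, so the block satisfies the CRMB condition for any non-negative similarity threshold.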
By using FCM, those CRMBs with smaller inter-GC distances are placed in the same class. The ASPS algorithm then adds all the corresponding '1's of the CRMBs in the same class to obtain the most populated moving region. To create a 64-pixel pattern from this region, only the 64 most populated pixel positions are assigned '1', with all the rest assigned '0'.

Algorithm PBC(PC)
Parameters: PC is the given pattern codebook.
Return: Coded bitstream.
For each frame to be coded with motion compensation
  For each k-th MB in the current frame
    If $|M_k|_1 < 8$ then
      classify the block as an SMB and skip it from coding.
    Else if $8 \le |M_k|_1 < T_S + 64$ and (4) is satisfied then
      classify the block as an RMB and code the index of pattern $P_i$ and the moving region covered by this pattern using ME and MC, while the static region is skipped.
    Else
      classify the block as an AMB and code it using full ME and MC as is done in H.264.
Fig. 4. The general Pattern based video coding (PBC) algorithm.
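The pattern-forming step of Section 2.1 (summing the masks of one class and keeping the 64 most populated positions) might look as follows in Python. This is only a sketch under assumptions: the clustering itself is omitted and the class membership is taken as given, the function name `build_pattern` is hypothetical, and ties between equally populated positions are broken in raster order, which the text does not specify.

```python
def build_pattern(class_masks, size=64):
    """Form one binary pattern from the CRMB masks assigned to a single class.

    class_masks: non-empty list of 16x16 binary matrices indexed mask[y][x].
    """
    h, w = len(class_masks[0]), len(class_masks[0][0])
    # Add the corresponding 1s of all masks to get per-pixel population counts.
    counts = [[sum(m[y][x] for m in class_masks) for x in range(w)]
              for y in range(h)]
    # Rank positions by count, highest first. sorted() is stable, so ties keep
    # raster order (an assumption of this sketch). If fewer than `size`
    # positions are populated, zero-count positions fill the remainder.
    positions = [(y, x) for y in range(h) for x in range(w)]
    ranked = sorted(positions, key=lambda p: -counts[p[0]][p[1]])
    pattern = [[0] * w for _ in range(h)]
    for y, x in ranked[:size]:
        pattern[y][x] = 1
    return pattern


def square(top, left, side):
    """Hypothetical helper: 16x16 mask with a filled square of 1s."""
    return [[1 if top <= y < top + side and left <= x < left + side else 0
             for x in range(16)] for y in range(16)]


# Toy class: two overlapping 8x8 squares plus a larger 10x10 square.
masks = [square(0, 0, 8), square(2, 2, 8), square(0, 0, 10)]
pattern = build_pattern(masks)
print(sum(map(sum, pattern)))  # 64
```

Positions covered by all three toy masks are always selected first; the remaining slots go to positions covered by two masks, in raster order.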
2.2 Actual Coding
Let $|Q|_\ell$ be the total number of $\ell$'s in the matrix $Q$. The similarity of a pattern $P_n \in PC$ with the moving region in the $k$-th MB can be defined efficiently [12] as

$$S_{k,n} = |M_k|_1 - |M_k \wedge P_n|_1 \qquad (3)$$

Clearly, the higher the similarity, the lower the value of $S_{k,n}$. The CRMB is classified as an RMB and its moving region is represented by a pattern $P_i$ such that

$$P_i = \arg\min_{\forall P_n \in PC} \left( S_{k,n} \mid S_{k,n} < T_S \right) \qquad (4)$$
where $T_S$ is the predefined similarity threshold; otherwise the CRMB is classified as an AMB. For a given PC, an image sequence is coded using the general pattern based coding (PBC) algorithm in Figure 4. So that each RMB requires only one 8 × 8 DCT block for its 64 residual error values, these values are rearranged into an 8 × 8 block, which avoids unnecessary DCT block transmission. The inverse rearrangement is performed during decoding.
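Equations (3) and (4) and the classification rule of Figure 4 can be sketched in Python as below. The function names, the `mask[y][x]` layout, and the raster-scan order used to pack the 64 covered residuals into an 8 × 8 block are assumptions of this sketch; the text does not state the packing order.

```python
def count_ones(q):
    """|Q|_1: number of 1s in binary matrix q."""
    return sum(map(sum, q))


def similarity(mask, pattern):
    """Equation (3): moving pixels NOT covered by the pattern (lower is better)."""
    covered = sum(m & p for mrow, prow in zip(mask, pattern)
                  for m, p in zip(mrow, prow))
    return count_ones(mask) - covered


def classify_mb(mask, codebook, ts):
    """Return (MB type, pattern index or None) following Figure 4."""
    n = count_ones(mask)
    if n < 8:
        return "SMB", None
    if n < ts + 64 and codebook:
        # Equation (4): best-matching pattern, accepted only if S < ts.
        scores = [similarity(mask, p) for p in codebook]
        best = min(range(len(codebook)), key=scores.__getitem__)
        if scores[best] < ts:
            return "RMB", best
    return "AMB", None


def pack_residuals(residual, pattern):
    """Gather the residuals under the pattern's 64 ones into an 8x8 block.

    Raster-scan packing is an assumption of this sketch.
    """
    vals = [r for rrow, prow in zip(residual, pattern)
            for r, p in zip(rrow, prow) if p]
    if len(vals) != 64:
        raise ValueError("pattern must contain exactly 64 ones")
    return [vals[i:i + 8] for i in range(0, 64, 8)]


def square(top, left, side):
    """Hypothetical helper: 16x16 mask with a filled square of 1s."""
    return [[1 if top <= y < top + side and left <= x < left + side else 0
             for x in range(16)] for y in range(16)]


codebook = [square(0, 0, 8), square(8, 8, 8)]  # two toy 64-pixel patterns
mask = square(6, 6, 6)                         # 36 moving pixels, S = 32 and 20
print(classify_mb(mask, codebook, ts=16))      # ('AMB', None)
print(classify_mb(mask, codebook, ts=24))      # ('RMB', 1)
```

The toy example reflects the trade-off studied in this paper: the same block is rejected as an AMB at a threshold of 16 but captured as an RMB at 24.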
3 Simulation Results
To compare the performance of the ASPS algorithm at different similarity thresholds with that of the H.264 standard, we tested a large number of standard and non-standard video sequences in QCIF digital video format [13]. For the purposes of this paper, experimental results are presented using the first 100 frames of four standard video test sequences. Full-search, half-pel, and variable block-size ME and MC were employed to obtain the encoding results for both the ASPS approach and the H.264 standard. The ASPS algorithm used λ = 8.
[Figure 5 plots omitted: (a) Increase in RMBs (%) versus similarity threshold change (8 to 16, 16 to 24, 24 to 32, 32 to 40) for Miss America, Carphone, Salesman, and Claire; (b) MB-type percentages for the video sequences Miss America, Carphone, Salesman, and Claire.]
Fig. 5. (a) Percentage increase in RMBs captured by the ASPS algorithm when the similarity threshold changes from 8 to 16, 16 to 24, 24 to 32, and 32 to 40; (b) percentage of SMBs, RMBs, and AMBs produced by the ASPS algorithm with a similarity threshold of 16.
Figure 5(a) shows the percentage increase in RMBs captured by the ASPS algorithm when the similarity threshold changes from 8 to 16, 16 to 24, 24 to 32, and 32 to 40. We observed that the increase in RMBs diminishes when the similarity threshold is already large. The percentages of the different MB types generated by the ASPS algorithm with a similarity threshold of 16 are shown in Figure 5(b). Note that a rough idea of the amount of motion in a particular video sequence can be obtained from the relative numbers of the MB types. For example, Carphone involves much more motion than Salesman, as Carphone has larger proportions of AMBs and RMBs. The coding performance of the ASPS algorithm, like that of any other pattern based video coding algorithm, depends on the value of the similarity threshold. With a larger similarity threshold, the ASPS algorithm captures more RMBs, and as a consequence both the bit-rate and the image quality are lower. With a smaller similarity threshold, it captures fewer RMBs, and as a result both the bit-rate and the image quality are higher. Figure 6 shows the coding efficiency curves of the ASPS algorithm for various similarity thresholds, with the threshold given in parentheses, together with those of the H.264 standard.
[Figure 6 plots omitted: four panels (Miss America, Carphone, Salesman, Claire), each showing PSNR (dB) versus bit rate (kbps) for H.264, ASPS(8), ASPS(16), ASPS(24), ASPS(32), and ASPS(40).]
Fig. 6. Coding performance comparisons for four standard test video sequences between the H.264 standard and the ASPS algorithm with various similarity thresholds, where the value in parentheses is the similarity threshold.
4 Importance of Similarity Threshold
The similarity threshold has an important role in controlling the bit-rate over a limited-bandwidth channel. Normally a video coding algorithm controls the bit-rate by increasing or decreasing the quantization level, and the quantization changes are coded together with the video data. In Figure 6 we observed that different similarity thresholds provide different bit-rates, yet no bits need to be sent to the decoder about changes of the similarity threshold. Thus, an adaptive ASPS algorithm can control the bit-rate by changing the similarity threshold instead of the quantization level, and as a result the bits otherwise spent on signalling quantization changes are saved. However, a different quantization level is still needed where a large change of bit-rate is required.
5 Future Works and Conclusions
Video coding using arbitrary shaped patterns to represent the moving region in macroblocks performed better than the H.264 standard, especially for very low bit-rate video coding. The former represents an MB by a smaller moving region covered by the best available pattern, which approximates the shape of the region more closely and hence requires no extra motion vector, which is not the case with the latter. For any pattern based coding, including the ASPS algorithm, a similarity threshold is used as the matching criterion between a moving region and a pattern. This threshold, together with the quantization level, can control the coding efficiency curve. Unlike the quantization step size, the similarity threshold has the benefit that no information about it needs to be transmitted to the decoder. Finer changes to the coding efficiency curve are possible by changing the similarity threshold instead of the quantization level, and as a result a number of bits are saved. However, the existing ASPS algorithm cannot select a suitable similarity threshold for the channel requirements. We are investigating the design of an adaptive ASPS algorithm that can use various similarity thresholds, rather than the quantization level, to control the bit-rate in finer steps.
References
1. Fukuhara, T., K. Asai, and T. Murakami: Very low bit-rate video coding with block partitioning and adaptive selection of two time-differential frame memories. IEEE Trans. Circuits Syst. Video Technol., Vol. 7, pp. 212-220, 1997.
2. Gonzalez, R.C. and R.E. Woods: Digital Image Processing. Addison-Wesley, 1992.
3. ISO/IEC 13818, MPEG-2 International Standard, 1995.
4. ISO/IEC N4030, MPEG-4 International Standard, 2001.
5. ITU-T Recommendation H.263: Video coding for low bit-rate communication. Version 2, 1998.
6. ITU-T Rec. H.264/ISO/IEC 14496-10 AVC. Joint Video Team (JVT) of ISO MPEG and ITU-T VCEG, JVT-G050, 2003.
7. Maragos, P.: Tutorial on advances in morphological image processing and analysis. Opt. Eng., Vol. 26, no. 7, pp. 623-632, 1987.
8. Paul, M., M. Murshed, and L. Dooley: A Low Bit-Rate Video-Coding Algorithm Based Upon Variable Pattern Selection. Proc. of 6th Int. Conf. on Signal Processing (ICSP-02), Beijing, Vol. 2, pp. 933-936, 2002.
9. Paul, M., M. Murshed, and L. Dooley: A new real-time pattern selection algorithm for very low bit-rate video coding focusing on moving regions. Proc. of IEEE Int. Conference of Acoustics, Speech, and Signal Processing (ICASSP-03), Hong Kong, Vol. 3, pp. 397-400, 2003.
10. Paul, M., M. Murshed, and L. Dooley: A Real-Time Pattern Selection Algorithm for Very Low Bit-Rate Video Coding Using Relevance and Similarity Metrics. To appear in IEEE Trans. on Circuits and Systems for Video Technology.
11. Paul, M., M. Murshed, and L. Dooley: An Arbitrary Shaped Pattern Selection Algorithm for VLBR Video Coding Focusing on Moving Regions. Proc. of 4th IEEE Pacific-Rim Int. Con. on Multimedia (PCM-03), Vol. 1, pp. 100-104, 2003.
12. Paul, M., M. Murshed, and L. Dooley: A New Efficient Similarity Metric and Generic Computation Strategy for Pattern-based VLBR Video Coding. Proc. of the IEEE Int. Con. of Acoustics, Speech, and Signal Proc. (ICASSP-04), 2004.
13. Shi, Y.Q. and H. Sun: Image and Video Compression for Multimedia Engineering: Fundamentals, Algorithms, and Standards. CRC Press, 1999.
14. Wong, K.-W., K.-M. Lam, and W.-C. Siu: An Efficient Low Bit-Rate Video-Coding Algorithm Focusing on Moving Regions. IEEE Trans. on Circuits and Systems for Video Technology, Vol. 11, no. 10, pp. 1128-1134, 2001.