Mixed-Resolution Wyner-Ziv Video Coding Based on Selective Data Pruning

Tuan Tai Phan #1, Yuichi Tanaka ∗2, Madoka Hasegawa 3, and Shigeo Kato 4

# Department of International Development Engineering, Tokyo Institute of Technology
2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8552 Japan

∗ Department of Information Science, Utsunomiya University
7-1-2 Yoto, Utsunomiya, Tochigi, 321-8585 Japan
2–4 {tanaka, madoka, kato}@is.utsunomiya-u.ac.jp
Abstract—In current distributed video coding (DVC), interpolation is performed at the decoder, and the interpolated pixels are corrected by using error-correcting codes such as Turbo codes and LDPC codes. There are two ways to downsample video sequences at the encoder: temporally or spatially. Traditionally, temporal downsampling, i.e., frame dropping, is used for DVC, and DVC schemes with spatial downsampling (scaling) have also been investigated. Unfortunately, most of them are based on uniform downsampling, so details in the video sequences are often discarded. For example, edges and textured regions are difficult to interpolate and thus require many parity bits to restore the interpolated portions in spatial-domain DVC. In this paper, we propose a new spatial-domain DVC based on adaptive line dropping, the so-called selective data pruning (SDP). SDP is a simple nonuniform downsampling method in which the pruned lines are chosen so as to avoid cutting across edges and textures. Experimental results show that the proposed method outperforms a conventional DVC for sequences with a large amount of motion.
I. INTRODUCTION

Distributed source coding (DSC) is based on two information-theoretic results: the theorems by Slepian and Wolf [1] and by Wyner and Ziv [2] for lossless and lossy coding of correlated sources, respectively. Recently, practical DSC schemes have received a great deal of attention in efforts such as distributed video coding (DVC) with reversed complexity [3], [4], improved error resilience [5], efficient multi-view coding for distributed cameras [6], and flexible decoding [7]. Although DVC algorithms do not yet outperform conventional video coding schemes in rate-distortion performance, DVC is a promising tool for building reversed-complexity codecs for power-constrained devices. In conventional digital video coding standards, the encoder typically has high complexity, mainly due to the motion estimation used to find the best inter-picture prediction. DVC, on the other hand, enables reversed-complexity codecs, i.e., a simple encoder and a complex decoder. A particular realization of DVC is the transform domain Wyner-Ziv (TDWZ) codec [3], [4], in which key frames are intra coded and Wyner-Ziv (WZ) frames (non-key frames) are located between key frames. The decoder uses the key frames for motion estimation to interpolate the WZ frames.
The WZ frames are then corrected by transform domain Wyner-Ziv coding. Moreover, a mixed-resolution (MR) framework for the TDWZ codec (MR-DVC) has been proposed to improve DVC performance by exploiting the spatial relationship between original frames and scaled ones [8], [9]. Fig. 1 shows the relation between key frames and non-key frames in MR-DVC. It generally works as follows. WZ frames are downsampled uniformly, and the resulting reduced-size frames are encoded by a conventional intra/inter-frame encoder, while key frames are compressed at full resolution. The compressed key frames are used to estimate the missing information in the WZ frames, and the TDWZ decoder further corrects the WZ frames with error-correcting codes. Because MR-DVC is based on a super-resolution framework, it requires a large number of parity bits to correct the high-frequency regions of WZ frames.

From the image coding standpoint, on the other hand, several methods following the "decimation-then-compression" approach exist [10]–[12]. These methods usually proceed as follows. First, insignificant regions of the input image are discarded, and the remaining image is compressed by a standard image codec. The transmitted bitstream consists of the compressed small image and side information identifying the pixel positions discarded from the original image. At the receiver, the small image and the side information are used to synthesize an image with the same size and structure as the original. This approach is illustrated in Fig. 2. As one of the decimation-then-compression methods for image coding, selective data pruning (SDP) with high-order edge-directed interpolation was proposed by Võ et al. [12]. SDP reduces the image size with a very simple architecture because it is based on line-based downsampling: rows and/or columns of an image are simply pruned, and the reduced-size image is interpolated on the synthesis side.
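The line-based decimation and synthesis described above can be illustrated with a minimal Python sketch. It is not the method of [12]; as stated assumptions, the pruning cost of a column is taken to be the MSE between that column and the average of its two neighbors, pruned columns are never adjacent, and the synthesis side restores each pruned column with the same two-tap average instead of high-order edge-directed interpolation. The helper names (`column_cost`, `select_columns`, `prune`, `reinsert`) are hypothetical.

```python
def column_cost(img, j):
    """Assumed pruning cost: MSE if column j is re-estimated from columns j - 1 and j + 1."""
    return sum((row[j] - (row[j - 1] + row[j + 1]) / 2.0) ** 2 for row in img) / len(img)


def select_columns(img, n_prune):
    """Greedily choose the n_prune cheapest interior columns, never two adjacent ones."""
    width = len(img[0])
    chosen = []
    for j in sorted(range(1, width - 1), key=lambda k: column_cost(img, k)):
        if len(chosen) == n_prune:
            break
        if all(abs(j - k) > 1 for k in chosen):
            chosen.append(j)
    return sorted(chosen)


def prune(img, pruned):
    """Decimation: drop the pruned columns (their indices travel as side information)."""
    drop = set(pruned)
    return [[v for j, v in enumerate(row) if j not in drop] for row in img]


def reinsert(small, pruned, width):
    """Synthesis: rebuild a width-column frame, filling each pruned column by a two-tap average."""
    drop = set(pruned)
    out = []
    for row in small:
        kept = iter(row)
        full = [None if j in drop else next(kept) for j in range(width)]
        for j in pruned:
            # Neighbours are always kept because pruned columns are interior and non-adjacent.
            full[j] = (full[j - 1] + full[j + 1]) / 2.0
        out.append(full)
    return out
```

On a frame with a sharp vertical edge, the greedy selection avoids the two columns next to the edge, so flat regions are pruned and restored exactly, whereas pruning a column at the edge leaves a large error. This mirrors the motivation for pruning mainly across low-frequency regions.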
The pruned lines lie predominantly in low-frequency regions of the image, so the interpolated image has few artifacts. Compared with image coding without data pruning, SDP-based image compression improves PSNR at low bitrates. Moreover, it has a straightforward application to video coding. However, at high bitrates SDP is inferior to normal encoding, since the interpolated pixels then have larger errors than normally encoded pixels. In this paper, we improve the performance of SDP-based video coding by using the error-correcting system of DVC. Specifically, we apply TDWZ coding to the residual signals of the lines pruned by SDP. Furthermore, this can be regarded as a first attempt at DVC with content-aware retargeting (CAR) [13]–[16], since SDP is a very simple form of CAR for images and video sequences.

The rest of this paper is organized as follows. Section II reviews SDP. Section III presents our MR-DVC framework based on SDP. Experimental results are shown in Section IV. Finally, Section V concludes the paper.

Fig. 1. Illustration of the relation between key frames and non-key frames in the scheme used in [8], [9].

II. SELECTIVE DATA PRUNING AND HIGH-ORDER EDGE-DIRECTED INTERPOLATION

In [12], the SDP-based image compression method was proposed together with high-order new edge-directed interpolation (NEDI) [17]. Here, we briefly review the method.

A. Selective Data Pruning

SDP is a simple line-based image downsampling method. The columns to be pruned are first selected in the image, and they are then removed to reduce the image width. The original paper [12] uses a metric based on the mean square error (MSE) to determine which columns to prune.

B. High-Order Edge-Directed Interpolation

The NEDI method is based on geometric duality: the covariance of the high-resolution image is estimated from its low-resolution counterpart. NEDI is used to interpolate an image to twice its width and height. However, an image after SDP (with some columns pruned) still has high resolution in the vertical direction but low resolution in the horizontal direction. Thus, high-order NEDI was proposed to exploit geometric duality after SDP. Let I(i, j) be the pixel value at the i-th row and j-th column of image I. NEDI-6 [12] uses six neighboring pixels around the pruned pixel instead of the four neighbors used in conventional NEDI. Its 2-D interpolation filter is given by

$$\mathbf{h}_{\mathrm{opt}} = \left(\mathbf{C}^{T}\mathbf{C}\right)^{-1}\mathbf{C}^{T}\mathbf{y} \tag{1}$$

where y is a K × 1 vector of low-resolution pixels around I(i, j), and C is a K × 6 matrix whose k-th row contains the six surrounding pixels of the k-th element of y. Finally, the target pixel I(i, j + 1/2) is interpolated with h_opt as

$$\bar{I}\left(i, j+\tfrac{1}{2}\right) = \sum_{l_0=-1}^{1}\sum_{l_1=0}^{1} h_{\mathrm{opt}}(3l_1+l_0+1)\, I(i+l_0, j+l_1). \tag{2}$$
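Equations (1) and (2) can be exercised with a small least-squares sketch in plain Python. How the training pairs (y, C) are gathered is our assumption, not necessarily the construction in [12]: each pixel of the reduced image in a small window around (i, j) is treated as a target, and its six neighbors in the columns immediately to its left and right (rows p − 1 to p + 1) form the corresponding row of C, mirroring the six-tap support of (2) at the coarser horizontal scale. The fallback to a two-tap average when C^T C is singular, the window radius, and the function names are likewise hypothetical.

```python
def solve_normal_equations(C, y):
    """Return h = (C^T C)^{-1} C^T y, Eq. (1), via Gaussian elimination with pivoting."""
    n = len(C[0])
    # Augmented matrix [C^T C | C^T y] of the normal equations.
    A = [[sum(row[r] * row[c] for row in C) for c in range(n)]
         + [sum(row[r] * yk for row, yk in zip(C, y))]
         for r in range(n)]
    scale = max(abs(v) for row in A for v in row[:n]) or 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) <= 1e-10 * scale:
            raise ValueError("C^T C is numerically singular")
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution on the upper-triangular system.
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (A[r][n] - sum(A[r][c] * h[c] for c in range(r + 1, n))) / A[r][r]
    return h


def nedi6_interpolate(img, i, j, radius=2):
    """Estimate the pruned pixel I(i, j + 1/2) between kept columns j and j + 1 (interior only)."""
    rows, cols = len(img), len(img[0])
    C, y = [], []
    for p in range(i - radius, i + radius + 1):
        for q in range(j - radius + 1, j + radius + 1):
            if 1 <= p < rows - 1 and 1 <= q < cols - 1:
                # Tap index 3*l1 + l0 + 1: l1 = 0 is the left column, l1 = 1 the right one.
                C.append([img[p + l0][q - 1] for l0 in (-1, 0, 1)]
                         + [img[p + l0][q + 1] for l0 in (-1, 0, 1)])
                y.append(img[p][q])
    try:
        if len(y) < 6:
            raise ValueError("not enough training samples")
        h = solve_normal_equations(C, y)
    except ValueError:
        # Hypothetical fallback: plain average of the two horizontal neighbours.
        return (img[i][j] + img[i][j + 1]) / 2.0
    # Eq. (2): six-tap filter over rows i-1..i+1 of columns j and j+1.
    return sum(h[3 * l1 + l0 + 1] * img[i + l0][j + l1]
               for l0 in (-1, 0, 1) for l1 in (0, 1))
```

On a perfectly flat region C^T C is rank-deficient, which is why some regularization or fallback is needed in practice; the two-tap average used here is only one simple choice.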
The filter h_opt in (1) is optimal in the sense that it minimizes the MSE over the available low-resolution pixels. NEDI-6 outperforms conventional NEDI when interpolating images after SDP, and it has an extension for video sequences called NEDI-9.

III. SDP-DVC

In this section, we present an improvement of SDP-based video coding using the TDWZ coding framework. It is based on the original MR-DVC system [8], [9], but it does not perform motion estimation even at the decoder. Hereafter, SDP-DVC refers to our proposed WZ video coding with SDP, whereas MR-DVC refers to the original system.

A. SDP-DVC Encoder

The encoder architecture is shown in Fig. 3. At the encoder, all frames are resized by SDP (*a in Fig. 3), and all of the reduced-resolution frames are encoded with H.264/AVC [18], as in the normal MR-DVC (*b). At the same time, we also encode the full-resolution residue between the interpolated key frames and the corresponding original frames (*c); hereafter, this is called the spatially scalable residue. The positions pruned by SDP are the same for all frames within a GOP. For WZ frames, the pruned lines are extracted using the WZ frames and the pruned indices (*d). Residual pruned lines (RPLs) are constructed by taking the difference between the pruned lines of the WZ frames and those of the preceding key frame (*e). The RPLs are then TDWZ encoded so that the interpolated lines can be corrected at the decoder. We use LDPC Accumulate (LDPCA) codes [19] as the error-correcting codes since they are well suited to DVC. The number of WZ frames and the number of pruned lines may be varied dynamically according to the complexity reduction target.

For RPLs, every 16 samples of the one-dimensional signals are rearranged into a 4 × 4 matrix by zigzag scanning so that a two-dimensional discrete cosine transform (DCT) can be applied. We also apply nonlinear quantization [20] to the DCT coefficients of the RPLs as follows:

$$y = \beta\,\mathrm{sign}(x)\left(1 - e^{-c|x|/\beta}\right) \tag{3}$$

$$\hat{x} = -\frac{\beta}{c}\,\mathrm{sign}(y)\,\ln\!\left(1 - \frac{|y|}{\beta}\right) \tag{4}$$

where x, y, and x̂ represent the input, quantized, and reconstructed signals, respectively. The parameters c and β are constants that control the degree of nonlinearity.
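The companding pair (3)–(4) can be checked numerically with the following sketch. It implements only the two mappings; the subsequent assignment of quantization levels (the quantization matrices of [23]) is not reproduced, and the function names `nl_compress` and `nl_expand` are hypothetical. Because (4) is defined only for |y| < β, the inverse rejects saturated values.

```python
import math


def nl_compress(x, c, beta):
    """Eq. (3): y = beta * sign(x) * (1 - exp(-c|x| / beta))."""
    return math.copysign(beta * (1.0 - math.exp(-c * abs(x) / beta)), x)


def nl_expand(y, c, beta):
    """Eq. (4): x_hat = -(beta / c) * sign(y) * ln(1 - |y| / beta), defined for |y| < beta."""
    if abs(y) >= beta:
        raise ValueError("Eq. (4) requires |y| < beta")
    magnitude = -(beta / c) * math.log(1.0 - abs(y) / beta)  # non-negative
    return math.copysign(magnitude, y)
```

For example, with hypothetical values c = 2 and β = 100, `nl_expand(nl_compress(10.0, 2.0, 100.0), 2.0, 100.0)` returns 10.0 up to rounding error. The slope of (3) at the origin is c and decreases with |x|, so small DCT coefficients are represented with finer resolution than large ones.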
[Fig. 2 diagram: decimation, image encoder, image decoder, and interpolation for the reduced-size image; the discarded pixel positions are sent as side information.]
Fig. 2. Image coding with the "decimation-then-compression" approach.

[Fig. 3 diagram: SDP (*a), H.264 encoder (*b), HOED and SSR (*c), pruned lines (*d), RPL (*e), TDWZ encoding (DCT, nonlinear quantization, LDPCA encoder), feedback channel; outputs: low resolution bitstream, pruned indices, Wyner-Ziv bitstream. K: key frame; NK: non-key frame; K′: reduced-size key frame; NK′: reduced-size non-key frame; HOED: high-order edge-directed interpolation; SSR: spatially scalable residue; RPL: residual pruned lines.]
Fig. 3. Encoder architecture of SDP-DVC.

[Fig. 4 diagram: H.264 decoder, HOED (*a), SSR addition (*b), TDWZ decoding (*c: DCT, nonlinear quantization, LDPCA decoder, nonlinear dequantization, IDCT), residue addition (*d), feedback channel. k: reconstructed key frame; nk: reconstructed non-key frame; k′: reduced-size decoded key frame; nk′: reduced-size decoded non-key frame.]
Fig. 4. Decoder architecture of SDP-DVC.
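The encoder-side RPL processing of Section III-A (*d, *e, and the DCT stage in Fig. 3) can be sketched as follows. Several details are assumptions not given in the text: only pruned columns are handled (pruned rows would be treated symmetrically), each pruned line is split into consecutive groups of 16 samples with any incomplete tail group ignored, samples are placed along a JPEG-style zigzag path, and an orthonormal DCT-II is used. Frames are lists of rows, and all function names are hypothetical.

```python
import math


def extract_lines(frame, pruned_cols):
    """(*d) Collect each pruned column of a frame (a list of rows) as a 1-D signal."""
    return [[row[j] for row in frame] for j in pruned_cols]


def residual_pruned_lines(wz_frame, key_frame, pruned_cols):
    """(*e) RPLs: pruned lines of the WZ frame minus those of the preceding key frame."""
    return [[a - b for a, b in zip(wz, key)]
            for wz, key in zip(extract_lines(wz_frame, pruned_cols),
                               extract_lines(key_frame, pruned_cols))]


def zigzag_order(n=4):
    """(row, col) positions of an n x n block in JPEG-style zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Odd anti-diagonals run top-right to bottom-left, even ones the other way.
        order.extend(diag if s % 2 else diag[::-1])
    return order


def to_block(samples):
    """Place 16 consecutive 1-D samples into a 4 x 4 block along the zigzag path."""
    block = [[0.0] * 4 for _ in range(4)]
    for value, (r, c) in zip(samples, zigzag_order(4)):
        block[r][c] = value
    return block


def dct2(block):
    """Orthonormal 2-D DCT-II of an n x n block."""
    n = len(block)
    a = [math.sqrt(1.0 / n)] + [math.sqrt(2.0 / n)] * (n - 1)
    return [[a[u] * a[v] * sum(block[x][y]
                               * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                               * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                               for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


def rpl_to_dct_blocks(wz_frame, key_frame, pruned_cols):
    """RPLs -> groups of 16 samples -> zigzag 4 x 4 blocks -> DCT coefficients."""
    blocks = []
    for line in residual_pruned_lines(wz_frame, key_frame, pruned_cols):
        usable = len(line) - len(line) % 16  # assumption: an incomplete tail group is skipped
        for k in range(0, usable, 16):
            blocks.append(dct2(to_block(line[k:k + 16])))
    return blocks
```

For QCIF frames, a pruned column has 144 samples and a pruned row has 176, both multiples of 16, so no tail group would be dropped in that setting. The resulting coefficients are what the nonlinear quantizer of (3) and the LDPCA encoder would then operate on.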
B. SDP-DVC Decoder

The decoder essentially reverses the encoder operations; Fig. 4 illustrates the SDP-DVC decoder. All decoded low-resolution frames are interpolated to the original size by high-order edge-directed interpolation [12] (*a in Fig. 4). The interpolated key frames are added to the spatially scalable residue to reconstruct the full-resolution key frames (*b). For WZ frames, a temporal residue is computed to generate the side information for the TDWZ decoder (*c). After TDWZ decoding, the reconstructed frames are obtained by adding the corrected residue to the interpolated WZ frames (*d).

IV. EXPERIMENTAL RESULTS

In this section, we evaluate SDP-DVC from several aspects. We used 150 frames of the popular
video sequences Foreman, Soccer, and Hall Monitor in QCIF format at 15 fps. As the core codec at the encoder, we used H.264/AVC with an IPPP structure. Strictly speaking, the system is therefore not fully "distributed", since interframe coding is performed at the encoder. Nevertheless, it still reduces encoder complexity because only low-resolution frames are encoded.

A. SDP Performance

Fig. 5 compares two video resizing methods, bicubic scaling and SDP; the resized frames are shown for comparison. SDP yields better resizing results than scaling: as reported in [12], frames interpolated back to the original resolution after SDP have less error than those after scaling. In other words, SDP works as a very simple CAR. Since more powerful CAR methods have been presented [13]–[16], [21], incorporating them into the proposed framework is the main future work of this paper. Nevertheless, SDP is very attractive in terms of encoder computational complexity owing to its simple line-wise downsampling.

Fig. 5. Comparison of video resizing methods between bicubic scaling and SDP. The 95th frame of Foreman is used. Top left: original. Top right: bicubic scaling. Bottom left: lines pruned by SDP. Bottom right: frame resized by SDP.
[Fig. 6 plots: Y PSNR (dB) versus bitrate (kbps) for H.264 intra, SDP-DVC, and DISCOVER; Foreman, Soccer, and Hall Monitor, QCIF, 15 fps.]
Fig. 6. Performance comparison of various coding methods. From top to bottom: Foreman, Soccer, and Hall Monitor.

B. Video Coding Performance

The performance of SDP-DVC is compared with that of the DISCOVER DVC codec [22] with a GOP size of 4 and with H.264/AVC intra coding. In SDP-DVC, the numbers of pruned rows and pruned columns are both fixed to 48, which means that about half of the pixels in each frame are removed (for QCIF, each 176 × 144 frame is reduced to 128 × 96). This number can be varied dynamically according to the complexity of the video sequence, similar to [12]. The pruned indices require around 0.7 kbps for the whole GOP structure. For the nonlinear quantization in (3) and (4), c and β are experimentally set to 511 and 1.5, respectively. The quantization matrices are implemented as reported in [23], similar to the technique for transform domain residual coding. Fig. 6 shows the R-D curves of the coding methods. The PSNR and bitrates are calculated for the luminance component of all frames and averaged over each sequence. The experimental results show that SDP-DVC can outperform DISCOVER for fast-moving sequences such as Foreman and Soccer. Moreover, it also performs better than H.264/AVC intra coding for the Hall Monitor sequence. In brief, the performance of SDP-DVC lies between that of H.264/AVC intra coding and that of DISCOVER. When there is a large amount of motion, conventional DVC usually has difficulty with frame interpolation, resulting in lower performance than H.264/AVC, as can be observed for the Foreman and Soccer sequences. In our framework, the pruned lines mainly lie in smooth regions, which are easily interpolated,
and thus the proposed method requires fewer parity bits for WZ frames even in fast-moving sequences. Fig. 7 shows decoded frames of Foreman by DISCOVER (305 kbps, 34.8 dB) and SDP-DVC (272 kbps, 35.2 dB). SDP-DVC produces better picture quality than DISCOVER at a lower bitrate. Moreover, Fig. 8 compares SDP-DVC with SDP followed by high-order edge-directed interpolation alone (SDP+HOED) for Foreman. From middle to high bitrates, SDP-DVC achieves significant PSNR improvements. Note that SDP+HOED is available as an option within the SDP-DVC system, since it can be realized with only the low-resolution bitstream and the pruned indices. Naturally, the bitrate at which the two curves intersect varies with the characteristics of the video sequence.

Fig. 7. Performance comparison between DISCOVER and SDP-DVC at the 47th frame of Foreman. Left: reconstructed frame. Right: enlarged portion. From top to bottom: DISCOVER and SDP-DVC.

[Fig. 8 plot: Y PSNR (dB) versus bitrate (kbps) for SDP-DVC and SDP+HOED.]
Fig. 8. Performance comparison of SDP-based methods for Foreman.

V. CONCLUSIONS

In this paper, an MR-DVC based on SDP has been shown to improve the performance of SDP-based video coding. Since this method does not use motion estimation and compensation at the decoder, the decoder complexity can be significantly reduced compared with conventional DVC, while the encoding complexity remains low. The R-D performance of the proposed codec is better than that of DISCOVER for videos with relatively large motion. There is still room for improvement, such as the development of better interpolation and of fast CAR methods suitable for MR-DVC.

ACKNOWLEDGMENT

This work was supported in part by KAKENHI 22760263.

REFERENCES
[1] D. Slepian and J. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. Inf. Theory, vol. 19, no. 4, pp. 471–480, 1973.
[2] A. Wyner and J. Ziv, “The rate-distortion function for source coding with side information at the decoder,” IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1–10, 1976.
[3] A. Aaron, S. Rane, E. Setton, and B. Girod, “Transform-domain Wyner-Ziv codec for video,” in Proc. SPIE VCIP, 2004, pp. 520–528.
[4] C. Brites, J. Ascenso, and F. Pereira, “Improving transform domain Wyner-Ziv video coding performance,” in Proc. ICASSP’06, 2006.
[5] S. Rane, A. Aaron, and B. Girod, “Error-resilient video transmission using multiple embedded Wyner-Ziv descriptions,” in Proc. ICIP’05, 2005.
[6] X. Guo, Y. Lu, F. Wu, D. Zhao, and W. Gao, “Wyner–Ziv-based multiview video coding,” vol. 18, no. 6, pp. 713–724, 2008.
[7] N. M. Cheung and A. Ortega, “Compression algorithms for flexible video decoding,” in Proc. SPIE VCIP, 2008.
[8] B. Macchiavello, F. Brandi, E. Peixoto, R. L. de Queiroz, and D. Mukherjee, “Side-information generation for temporally and spatially scalable Wyner-Ziv codecs,” EURASIP Journal on Image and Video Processing, vol. 2009, 2009.
[9] B. Macchiavello, D. Mukherjee, and R. L. de Queiroz, “Iterative side-information generation in a mixed resolution Wyner-Ziv framework,” vol. 19, no. 10, pp. 1409–1423, 2009.
[10] S. D. Rane, G. Sapiro, and M. Bertalmio, “Structure and texture filling-in of missing image blocks in wireless transmission and compression applications,” IEEE Trans. Image Process., vol. 12, no. 3, pp. 296–303, 2003.
[11] D. Liu, X. Sun, F. Wu, S. Li, and Y. Q. Zhang, “Image compression with edge-based inpainting,” IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 10, pp. 1273–1287, 2007.
[12] D. T. Võ, J. Solé, P. Yin, C. Gomila, and T. Q. Nguyen, “Selective data pruning-based compression using high-order edge-directed interpolation,” IEEE Trans. Image Process., vol. 19, no. 2, pp. 399–409, 2010.
[13] S. Avidan and A. Shamir, “Seam carving for content-aware image resizing,” ACM Trans. Graph., vol. 26, no. 3, 2007.
[14] M. Rubinstein, A. Shamir, and S. Avidan, “Improved seam carving for video retargeting,” ACM Trans. Graph., vol. 27, no. 3, 2008.
[15] L. Wolf, M. Guttmann, and D. Cohen-Or, “Non-homogeneous content-driven video-retargeting,” in Proc. ICCV’07, 2007.
[16] D. Domingues, A. Alahi, and P. Vandergheynst, “Stream carving: An adaptive seam carving algorithm,” in Proc. ICIP’10, 2010.
[17] X. Li and M. T. Orchard, “New edge-directed interpolation,” IEEE Trans. Image Process., vol. 10, no. 10, pp. 1521–1527, 2001.
[18] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, “Overview of the H.264/AVC video coding standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 7, pp. 560–576, 2003.
[19] D. Varodayan, A. Aaron, and B. Girod, “Rate-adaptive codes for distributed source coding,” Signal Processing, vol. 86, no. 11, pp. 3123–3130, 2006.
[20] M. B. Badem, R. Weerakkody, A. Fernando, and A. M. Kondoz, “Design of a non-linear quantizer for transform domain DVC,” IEICE Trans. on Fundamentals, vol. 92, no. 3, pp. 847–852, 2009.
[21] Y. Tanaka, M. Hasegawa, and S. Kato, “Seam carving with rate-dependent seam path information,” in Proc. ICASSP’11, 2011, to be presented.
[22] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, “The DISCOVER codec: architecture, techniques and evaluation,” in Proc. 26th Picture Coding Symposium, 2007.
[23] M. B. Badem, H. K. Arachchi, S. T. Worrall, and A. M. Kondoz, “Transform domain residual coding technique for distributed video coding,” in Proc. 26th Picture Coding Symposium, 2007.