
Rate-Distortion based Video Watermarking for storing Privacy Information using H263 Codec
Jithendra K. Paruchuri, University of Kentucky
Master's Thesis Proposal

August 2006

Abstract

To protect the privacy of individuals in surveillance video, the images of selected individuals need to be erased, blurred or re-rendered. Such video modifications, however, destroy the authenticity of the surveillance video. In this proposal, we describe a new rate-distortion based compression-domain video watermarking algorithm for the purpose of storing privacy information. Using this algorithm, we can safeguard the original video so that the modification process can be reversed if proper authorization is established. Existing privacy data preservation schemes, such as those described in [1] and [2], mandate a specific type of video modification, namely scrambling. Watermarking focuses solely on the data preservation problem and is thus far more flexible, as any modification algorithm can be used. The watermarking algorithm proposed in previous work [3] provides excellent watermarked video quality at the expense of the output bit rate, because it does not take into consideration the compression technique used for storing the video. Our proposed rate-distortion algorithm minimizes both the output distortion and the output bit rate by exploiting the features of the compression technique used. Moreover, by analyzing the Lagrangian cost of embedding the watermark in various locations of the image, the algorithm provides a control to trade off rate against distortion as required. Our initial experiments have demonstrated encouraging results.

1 Introduction

Since the September 11 attack in 2001, video surveillance systems have been widely deployed to monitor improper activities. At the same time, they expose the privacy of innocent people. There is therefore a need for technology that protects the privacy of individuals in video surveillance systems without compromising the benefits brought by modern video surveillance. In [4], Wickramasuriya et al. proposed a privacy-protecting video surveillance system that uses RFID sensors to identify the authority of each incoming individual, combines this information with an XML-based access control framework to determine violations within the space, and finally uses a video masking technique to selectively display the unauthorized people in the video. The security flaw overlooked in this system is that once the video has been modified for privacy protection, the original video can no longer be retrieved. It is important to preserve the original video so that it can be used under special circumstances, such as being presented as evidence in a court of law. In addition, the ownership of the original video can be used to authenticate the modification performed by the owner. On the other hand, the privacy information must be stored securely so that it does not compromise the privacy protection. In this paper, we propose a video watermarking scheme that stores the privacy information within the modified video itself.

2 Related Works

In this section, we review various works on visual privacy protection technologies. There is a recent surge of interest in selective protection of visual objects in video surveillance. The PrivacyCam surveillance system developed at IBM protects privacy by revealing only the relevant information, such as object tracks or suspicious activities [5]. Such a system is limited by the types of events it can detect and may have problems balancing privacy protection with the particular needs of a security officer. Alternatively, one can modify the video to obfuscate the appearance of individuals. There is a large variety of such video modification techniques, ranging from the use of black boxes or large pixels in [6, 7] to complete object removal in [3, 4]. New techniques have also been proposed recently to replace a particular face with a generic face [8] or a body with a stick figure [9]. All of the above works address only the modification of the video and not the feasibility of securely recovering the original video.

To securely preserve the original imagery, the authors of [1] and [2] scramble the pixels of specific image objects for privacy protection. With the appropriate private key, the scrambling can be undone to retrieve the original. The drawback of these techniques is that they cannot be used with any video modification technique other than scrambling. The use of watermarking for privacy data protection was first proposed in [3]. Using watermarking for privacy data preservation is more flexible, as any video modification scheme can be used. The scheme in [3] embeds the watermark bits in the high-frequency DCT coefficients. This method works very well in terms of maintaining the output video quality, but at the expense of a much higher output bit rate. Surveillance videos typically have a small, slowly varying foreground and a large static background. As a result, most of the inter-coded frames contain a large number of zero high-frequency coefficients, a fact that is often exploited to obtain a high compression ratio. Embedding watermark bits violates this assumption and, as a result, the output bit rate increases significantly. In Section 3, we present an improved algorithm that searches for the best placement of watermark bits by taking into account both the visual quality and the output bit rate.
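To illustrate why embedding in high-frequency coefficients inflates the bit rate, the following minimal sketch counts the (run, level) events of a block before and after a single watermark bit is forced into a high-frequency position. The block values, the embedding position and the helper name `run_level_events` are hypothetical, and the block is assumed to be already in zigzag scan order. An actual H.263 encoder maps such events to variable-length codes, which this sketch does not model.

```python
def run_level_events(scan):
    """Return the (run, level) pairs of the non-zero entries of a scanned block.

    'run' is the number of zeros preceding each non-zero coefficient.
    """
    events, run = [], 0
    for value in scan:
        if value == 0:
            run += 1
        else:
            events.append((run, value))
            run = 0
    return events


# Hypothetical inter-coded block in zigzag order:
# three low-frequency residuals followed by zeros.
block = [5, -2, 1] + [0] * 61

# A watermark bit of 1 forced into a high-frequency position (assumed index 60).
marked = list(block)
marked[60] = 1

before = run_level_events(block)
after = run_level_events(marked)
print(len(before), len(after), after[-1])
```

In this toy block, one embedded bit adds a fourth event with a run of 57 zeros to a block that previously needed only three events.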

3 Proposed Rate-Distortion Optimized Watermarking Scheme

In our system, the watermark is the compressed bitstream of the portion of the video that is affected by a video modification process. To embed the watermark in the compressed bitstream, we follow the approach in [3] by changing the parity of the quantized Discrete Cosine Transform (DCT) coefficients of the motion difference. Let c(i, j, k) be the quantized (i, j)-th coefficient of the k-th DCT block. To embed a bit x into c(i, j, k), we use the following embedding method:

$$\tilde{c}(i, j, k) = \left\lfloor \frac{c(i, j, k)}{2} \right\rfloor \cdot 2 + x \tag{1}$$

Each coefficient can embed at most one bit. Equation (1) is simple to decode, since the embedded bit is the parity of the coefficient, and it is compatible with most video compression algorithms. We use DCT coefficients rather than other fields in the compressed bitstream because DCT coefficients account for most of the bandwidth. This embedding, however, is not invertible, so the reconstructed video will differ from the originally compressed version. At a fine quantization level, the visual difference is minimal, but the reconstructed video may not pass a stringent authentication test. We are also testing invertible embedding methods, and the results will be presented in the final report.

Let L be the set of all DCT coefficients. Suppose we insert the watermark bits into a subset of DCT coefficients Γ ⊂ L. We use a distortion function D(Γ) to measure the distortion caused by embedding the watermark bits, and R(Γ) to measure the increase in bit rate caused by the embedding process. Choosing the embedding locations of the watermark can then be formulated as the following entropy-constrained rate-distortion problem:

$$\min_{\Gamma \subset L} D(\Gamma) \quad \text{subject to} \quad R(\Gamma) \le C \tag{2}$$

where C is the target output bit rate. The standard approach to solving this problem is to convert it into an unconstrained optimization problem:

$$\min_{\Gamma \subset L,\ \lambda} \Theta(\Gamma, \lambda) \tag{3}$$

where

$$\Theta(\Gamma, \lambda) = [D(\Gamma) + \lambda \cdot (R(\Gamma) - C)] \tag{4}$$

This cost function is typically referred to as the Lagrangian cost. While R(Γ) can be easily computed by actually compressing the watermarked DCT coefficients, a common distortion measure such as mean square error does not work here: given the number of bits to be embedded, the mean square distortion will always be the same regardless of which DCT coefficients we use. This is because the DCT is an orthogonal transform, and a parity change in an equal number of coefficients results in the same mean square distortion. Based on the psychovisual model proposed by Watson, as described in [10], we define the distortion of a DCT coefficient based on its (in)visibility after inserting a watermark bit:

$$D = \sum_{i,j,k} D(i, j, k) \tag{5}$$

where

$$D(i, j, k) = P_{\max} - P(i, j, k) \tag{6}$$

P(i, j, k) is an (in)visibility measure of the (i, j)-th coefficient of the k-th DCT block, defined as

$$P(i, j, k) = t(i, j) \left( \frac{C[0, 0, k]}{C_{0,0}} \right)^{\alpha_T} \tag{7}$$

where t(i, j) is the frequency sensitivity threshold, C[0, 0, k] is the DC term of block k, $\alpha_T = 0.649$ is a constant, and $C_{0,0}$ is the average luminance of the image. $P_{\max}$ is the maximum invisibility for any coefficient. In the final project report, we will also explore other distortion measures.

The optimization problem in Equation (3) is difficult to solve exactly because there are $2^M$ configurations that need to be tested, where $M = \binom{|L|}{N}$ and N is the number of watermark bits. To obtain a first-order sub-optimal solution, we adopt a greedy algorithm that finds the best embedding locations one at a time. We first compute the Lagrangian cost of embedding one bit in each of the 64 coefficients in a DCT block. Then we embed the bit in the minimum-cost position and recompute the Lagrangian costs of embedding the second bit in the remaining 63 coefficients. The process repeats until we exhaust all the watermark bits for that block. This greedy algorithm is suboptimal because the Lagrangian cost in Equation (3) is not additive: the rate function R(Γ) for most video compression algorithms is based on run-length coding of non-zero DCT coefficients and is clearly not additive [11]. A detailed explanation and the expected loss of optimality will be discussed in the final report.

Besides being sub-optimal, this approach has two other serious problems at the decoder. First, the embedding positions are decided based on the values of the cover coefficients, which are not available at the decoder. Second, since the bit stream to be embedded is not known at the decoder, it is not possible to calculate the expected rate in the cost function. One solution to these problems is to include the embedding locations in the watermark data, but this has the adverse effect of increasing the size of the watermark. As a compromise, we calculate the expected bit rate using a predicted DCT block. Exploiting the temporal correlation between frames, we use the DCT block at the same position in the previous frame as the predicted block for the current one. This solves the first problem, as the DCT block in the previous frame is available to both encoder and decoder. A possible solution to the second problem is to average the bit rates obtained for embedding a bit 0 and a bit 1. The loss in performance caused by these steps will be characterized in the final report.

In our initial implementation, we assign an equal number of watermark bits to each DCT block. The next step is to extend this greedy rate-distortion algorithm to the frame level to obtain higher coding efficiency. However, searching for the optimal embedding positions at the frame level requires very high processing time.
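The block-level greedy search described above can be sketched as follows. This is a minimal encoder-side illustration, not the implementation used in our experiments: `rate_proxy` is a hypothetical stand-in for the bit count that an actual H.263 encoder would produce, and `invis` is an assumed table of per-position invisibility values P(i, j, k) supplied by the caller. The constant C in Equation (4) is dropped because it does not change which position has the minimum cost.

```python
def embed_bit(c, x):
    """Equation (1): force the parity of quantized coefficient c to bit x."""
    return (c // 2) * 2 + x


def extract_bit(c):
    """Recover the embedded bit as the parity of the coefficient."""
    return c % 2


def rate_proxy(scan):
    """Hypothetical rate model (an assumption, not the H.263 VLC tables).

    Each (run, level) event costs more for longer runs and larger levels.
    """
    bits, run = 0, 0
    for v in scan:
        if v == 0:
            run += 1
        else:
            bits += 3 + run.bit_length() + abs(v).bit_length()
            run = 0
    return bits


def greedy_embed(scan, bits, invis, lam):
    """Embed bits one at a time at the position of minimum Lagrangian cost.

    scan  : quantized coefficients of one block in scan order
    bits  : watermark bits to embed in this block
    invis : assumed invisibility value P for each position
    lam   : Lagrange multiplier trading rate against distortion
    """
    p_max = max(invis)
    marked = list(scan)
    free = set(range(len(scan)))
    positions = []
    for x in bits:
        def cost(p):
            # Distortion of earlier choices is the same for every candidate,
            # so only the new position's distortion and the new rate matter.
            trial = list(marked)
            trial[p] = embed_bit(trial[p], x)
            return (p_max - invis[p]) + lam * rate_proxy(trial)

        best = min(sorted(free), key=cost)
        marked[best] = embed_bit(marked[best], x)
        free.remove(best)
        positions.append(best)
    return marked, positions


# Hypothetical example: invisibility assumed to grow with frequency index.
scan = [5, -2, 1] + [0] * 61
invis = [p + 1 for p in range(64)]
marked, positions = greedy_embed(scan, [1, 0], invis, lam=10)
print(positions, [extract_bit(marked[p]) for p in positions])
```

With λ = 0 the search simply picks the least visible positions, while a larger λ pulls the bits toward positions that do not lengthen the run-length code. A decoder-compatible variant along the lines of the compromise above would pass the co-located block of the previous frame instead of the current block and average the rate over both bit values.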
One possible approach to significantly reduce the computation cost is the principle of equal slope: we note that even though the rate function is not additive within a DCT block, it is additive across different blocks. Thus, we can rewrite the cost function as follows:

$$\Theta(\Gamma, \lambda) = \sum_{k} [D_k(\Gamma_k) + \lambda \cdot (R_k(\Gamma_k) - C)] \tag{8}$$

where $\Gamma_k$ denotes the coefficients of Γ that lie in block k. If one can provide good differentiable approximations to both $D_k$ and $R_k$, one can easily show that the optimal solution occurs when

$$\frac{\partial D_k}{\partial R_k} = -\lambda \tag{9}$$

for all k. This reduces the problem back to a block-level solution. Detailed results using this approach will be provided in the final report.
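The step from Equation (8) to Equation (9) can be made explicit. The following is a sketch under the assumption that the distortion of each block can be treated as a differentiable, convex function of its rate $R_k$; the per-block cost $J_k$ is a name introduced here for illustration only.

```latex
\begin{align*}
\Theta(\Gamma, \lambda) &= \sum_k J_k,
  \qquad J_k = D_k(\Gamma_k) + \lambda \cdot \bigl(R_k(\Gamma_k) - C\bigr) \\
\frac{\partial \Theta}{\partial R_k}
  &= \frac{\partial J_k}{\partial R_k}
   = \frac{\partial D_k}{\partial R_k} + \lambda = 0 \\
\Rightarrow \quad \frac{\partial D_k}{\partial R_k} &= -\lambda
  \quad \text{for all } k
\end{align*}
```

Because the same λ appears in every block, all blocks operate at the same slope of their rate-distortion curves, and each block can be optimized independently once λ is fixed.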

4 Experiments

We tested the new rate-distortion watermarking algorithm on the “hall monitor” sequence. This test video sequence has 299 frames in CIF format (352x288). One of the two persons walking in the foreground is completely removed for privacy protection, and the hole is filled in using the static background. The removed foreground video is then compressed and encrypted, and the resulting bit sequence is embedded as a watermark in the modified video during its compression. We need to embed around 2700 bits per frame, which amounts to 2 bits per block in the block-level implementation. The original video, the modified video and the privacy information for a particular frame are shown in Fig. 1. Videos are compressed using the standard H.263 codec with a quantization parameter of 10. The corresponding watermarked frames produced by the proposed algorithm at the block level at different rates are shown in Fig. 2.

Figure 1: Experimental results. Left: Original video; Center: Modified video; Right: Privacy information.

Figure 2: Block level - Trade-off between bit rate and perceptual quality. Left: 565 kbps; Center: 644 kbps; Right: 892 kbps.

As we move from right to left in Fig. 2, the distortion increases slightly but the bit rate drops from 892 kbps to 565 kbps. If we implement the algorithm at the frame level, the bit rates are even lower, as shown in Fig. 3, but the processing time for each frame is on the order of minutes. The processing speed must be increased substantially to approach real time. From left to right in Fig. 3, the distortion decreases and the bit rate increases from 412 kbps to 690 kbps. The results in Fig. 3 are not final because the implementation is still in progress.

Figure 3: Frame level - Trade-off between bit rate and perceptual quality. Left: 412 kbps; Center: 502 kbps; Right: 690 kbps.
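The per-block watermark budget quoted in this section can be checked with simple arithmetic. The sketch below assumes that only the 8x8 luminance blocks of a CIF frame carry watermark bits, which is an assumption made here for illustration. The variable names are hypothetical.

```python
import math

# CIF frame size and DCT block size.
width, height, block = 352, 288, 8

# Assumption: watermark bits go into luminance blocks only.
luma_blocks = (width // block) * (height // block)

# Approximate payload per frame, as stated in the experiments.
bits_per_frame = 2700

# Equal allocation per block, rounded up to a whole number of bits.
bits_per_block = math.ceil(bits_per_frame / luma_blocks)

print(luma_blocks, bits_per_block, luma_blocks * bits_per_block)
```

Under this assumption a frame has 1584 luminance blocks, so an equal allocation of 2 bits per block gives a capacity of 3168 bits, enough for the payload of roughly 2700 bits.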

5 Conclusions

In this document, we have proposed a new rate-distortion based compression-domain video watermarking algorithm for hiding privacy information. Initial results support the argument that we can obtain lower output bit rates at the same level of distortion by exploiting the properties of the compression technique used. Despite the significant gain in coding efficiency, the algorithm is computationally complex and therefore difficult to run in real time. Fast R-D techniques need to be investigated so that the video processing can keep up with real time.

References

[1] T. E. Boult, “Pico: Privacy through invertible cryptographic obscuration,” in Computer Vision for Interactive and Intelligent Environments - the Dr. Bradley D. Carter Workshop Series, 2005.
[2] Frédéric Dufaux and Touradj Ebrahimi, “Scrambling for video surveillance with privacy,” cvprw, vol. 0, pp. 160, 2006.
[3] W. Zhang, S.-C. Cheung, and M. Chen, “Hiding privacy information in video surveillance system,” in Proceedings of the 12th IEEE International Conference on Image Processing, Genova, Italy, Sept. 2005, pp. 868–871.
[4] J. Wickramasuriya, M. Datt, S. Mehrotra, and N. Venkatasubramanian, “Privacy protecting data collection in media spaces,” in ACM International Conference on Multimedia, New York, NY, Oct. 2004, pp. 48–55.
[5] A. Senior et al., “Blinkering surveillance: Enabling video privacy through computer vision,” Tech. Rep., IBM Research Report, August 2003.
[6] A. M. Berger, Privacy mode for acquisition cameras and camcorders, Sony Corporation, US Patent 6,067,399, May 23, 2000.
[7] J. Wada, K. Kaiyama, K. Ikoma, and H. Kogane, Monitor camera system and method of displaying picture from monitor camera thereof, Matsushita Electric Industrial Co. Ltd., European Patent EP 1 081 955 A2, April 2001.
[8] E. N. Newton, Latanya Sweeney, and B. Main, “Preserving privacy by de-identifying face images,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 2, pp. 232–243, February 2005.
[9] H. Wactlar, S. Stevens, and T. Ng, Enabling Personal Privacy Protection Preferences in Collaborative Video Observation, NSF Award Abstract 0534625, http://www.nsf.gov/awardsearch/showAward.do?awardNumber=0534625.
[10] I.J. Cox, M.L. Miller, and J.A. Bloom, Digital Watermarking, Morgan Kaufmann Publishers, 2002.
[11] B. Korte, L. Lovasz, and R. Schrader, Greedoids, Springer-Verlag, New York/Berlin, 1991.
