On Locating Steganographic Payload using Residuals
Tu-Thach Quach
Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA

ABSTRACT
Locating steganographic payload using Weighted Stego-image (WS) residuals has been proven successful provided a large number of stego images are available. In this paper, we revisit this topic with two goals. First, we argue that it is a promising approach to locate payload by showing that in the ideal scenario where the cover images are available, the expected number of stego images needed to perfectly locate all load-carrying pixels is the logarithm of the payload size. Second, we generalize cover estimation to a maximum likelihood decoding problem and demonstrate that a second-order statistical cover model can be used to compute residuals to locate payload embedded by both LSB replacement and LSB matching steganography.

Keywords: Steganalysis, Payload Location, Maximum Likelihood Decoding
1. INTRODUCTION
Digital image steganography embeds hidden messages into cover images to produce stego images, which appear innocuous to an unintended observer. A widely used method in digital image steganography is least-significant bit (LSB) replacement, where a portion of the LSBs of the cover image are replaced with the message bits. Because only the LSBs are manipulated, the stego image looks similar to the cover image, which makes it difficult for steganalysis detectors to detect. Due to embedding, however, the stego image is clearly different from the cover image. These differences may reveal the presence of a hidden message. Once these differences are found, they can be used as features to train classifiers to detect stego images. Several detectors can be extended to estimate the size of the payload. This can be achieved by training classifiers using stego images carrying different payload sizes. The output of such a classifier is no longer binary, but consists of an estimate of the payload size. More interesting are the Weighted Stego-image (WS) detectors.1, 2 These detectors involve estimating the cover image given the stego image. The estimate is then used in combination with the stego image to estimate the payload size. These payload-size estimators are highly accurate, with a mean absolute error of the order of $10^{-4}$.

The next logical step in steganalysis is to extract the hidden message from the stego image. There are two approaches to this problem. The first technique involves searching for the correct key used in the embedding process.3 This method is applicable when the key space is small. The advantage is that once the key is found, the hidden message can be extracted readily. The alternative is to locate the payload using a number of stego images where each stego image has the payload at the same locations.4, 5 This could happen if the steganographer reuses the key and the stego images are the same size.
This method only locates the payload; it cannot extract the hidden message since no logical orderings can be inferred from the payload locations. However, it is a crucial step in extracting the hidden message. Our goal is similar to the latter: to locate the payload using a number of stego images. We first show that in the best case where the cover images are available, the expected number of stego images needed to perfectly locate all load-carrying pixels is the logarithm of the payload size. For payloads of 45000 bits, on average, only 17 stego images are needed. An essential step in payload location is estimating the cover image given the stego image. We would like the estimate to be as close to the cover image as possible since the fidelity of the estimate directly influences the accuracy of payload location. Our approach is to find the most likely estimate, in a statistical sense, given the stego image. Once the most likely estimate is found, the typical method using residuals can be utilized to locate the payload.

Further author information: T. Quach: E-mail: [email protected]

In Section 2, we briefly review WS residuals, provide some bounds on the number of stego images needed to locate payload, and present our approach for estimating cover images. We demonstrate that our method can locate load-carrying pixels embedded by both LSB replacement and LSB matching steganography in Section 3. Further research directions are suggested in Section 4.
2. LOCATING PAYLOAD
We denote a stego image as a vector $s = (s_1, \ldots, s_n)$ and the corresponding cover image as $c = (c_1, \ldots, c_n)$. A stego image is generated by embedding a payload of $q$ bits per pixel (bpp) into the cover image using LSB replacement steganography (we use $q$ instead of $p$ to avoid confusion with probability, which is used extensively here). The total number of bits embedded is therefore $nq$. The WS method first estimates the cover image using the stego image.4 We use the notation $\hat{c}$ for the cover estimate. Once $\hat{c}$ is obtained, the residuals

$$r_i = (s_i - \tilde{s}_i)(s_i - \hat{c}_i) \tag{1}$$
are computed, where $\tilde{s}_i$ indicates $s_i$ with the LSB flipped. The residuals quantify the difference between the stego image and the cover estimate. If $\hat{c}_i$ is an unbiased estimator for $c_i$, the estimation error is independent of the parity of $c_i$, and the payload is independent of the cover, the expected values of the residuals $r_i$ satisfy

$$E[r_i] = \begin{cases} 0, & \text{if } s_i = c_i, \\ 1, & \text{if } s_i = \tilde{c}_i. \end{cases} \tag{2}$$

Given $N$ stego images of the same size, the residual of pixel $i$ in image $j$ is

$$r_{ij} = (s_{ij} - \tilde{s}_{ij})(s_{ij} - \hat{c}_{ij}). \tag{3}$$

The proportion of the number of images in which pixel $i$ is flipped is

$$r_{i\cdot} = \frac{1}{N} \sum_{j=1}^{N} r_{ij}. \tag{4}$$
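Equations (1), (3), and (4) translate almost directly into array code. The following is a minimal NumPy sketch rather than the implementation used in this paper; the function names `ws_residuals` and `mean_residuals` and the array layout (one image per leading index) are assumptions made for illustration.

```python
import numpy as np


def ws_residuals(stego, cover_est):
    """Residuals r = (s - s_flipped) * (s - c_hat), as in equations (1) and (3)."""
    s = np.asarray(stego, dtype=np.int64)
    c_hat = np.asarray(cover_est, dtype=np.float64)
    s_flipped = s ^ 1  # stego value with its least-significant bit flipped
    return (s - s_flipped) * (s - c_hat)


def mean_residuals(stego_stack, cover_est_stack):
    """Mean residual per pixel over N images, as in equation (4).

    Both inputs are assumed to have shape (N, height, width).
    """
    return ws_residuals(stego_stack, cover_est_stack).mean(axis=0)


if __name__ == "__main__":
    # Toy data: two 1x3 "images" whose cover estimates are exact.
    covers = np.array([[[2, 3, 4]], [[2, 3, 4]]])
    stegos = np.array([[[2, 2, 4]], [[3, 3, 4]]])  # pixels 0 and 1 are each flipped in one image
    print(mean_residuals(stegos, covers))
```

With an exact cover estimate, a pixel's residual is 1 in every image where its LSB was flipped and 0 elsewhere, so the mean is the fraction of images in which the pixel changed.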
Using (2), if pixel $i$ is not a load-carrying pixel, then $E[r_{i\cdot}] = 0$. Otherwise, if pixel $i$ is a load-carrying pixel, then $E[r_{i\cdot}] = 0.5$. In practice, these expectations are observed for large $N$.

What is the best we can do in terms of the number of required stego images to locate some fraction of the payload? Without the embedding key, the best scenario is $\hat{c}_i = c_i$ for all $i$. In other words, the cover images are available. For simplicity, we assume that the total number of bits is $M = nq = 2^k$ for some non-negative integer $k$. Since the payload is independent of the cover, with one stego image, we can locate about $M/2$ bits. With a second stego image, the expected number of unlocatable bits reduces to $M/4$. In general, given $N$ stego images, the expected number of unlocatable bits is $M/2^N$. Therefore, on average, we need $\log_2(M) + 2$ stego images to locate all bits. This is a promising result as we need only relatively few (logarithmic in $nq$) stego images to locate the payload.

To validate this result, we use a set of 1000 uncompressed grayscale TIFF images. We crop each image to 300x300. A fixed payload of 0.5 bpp is used to embed random bits using LSB replacement with the same key to generate 1000 stego images. For 10 iterations, as $N$ increases from 1 to 18 (we expect to locate all payload pixels with $\log_2(45000) + 2 \approx 17$ stego images), we randomly choose $N$ stego images and compute the mean residuals between the stego images and their corresponding cover images. The payload pixels are those with non-zero mean residuals. We average the number of payload pixels as a function of $N$ over 10 iterations and plot the result in Figure 1. For comparison, we also plot the expected curve. It is clear that the experimental result matches our finding.

In practice, we are unlikely to have access to the cover images. Instead, we estimate the cover images using several heuristics. A simple method estimates a cover pixel by averaging its four neighboring pixels.1 More
[Figure 1 plot: located payload (×10^4, vertical axis) versus number of stego images (horizontal axis, 0 to 18); curves: Experiment and Ideal.]
Figure 1. Correctly identified payload pixels as a function of N stego images for the ideal scenario where the cover images are available. The payload is 0.5 bpp for a total of 45000 bits. The experimental result matches our expectation.
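The ideal-case argument (each payload pixel is flipped independently with probability 1/2 in every stego image, so $M/2^N$ pixels are expected to remain unlocated after $N$ images) can be checked with a small Monte Carlo simulation. This is a hedged sketch that does not use real images: the function names, the number of trials, and the seed are all assumptions, and it only models the counting argument above.

```python
import math

import numpy as np


def expected_unlocated(M, N):
    """Expected number of payload pixels never flipped in N stego images: M / 2^N."""
    return M / 2.0 ** N


def images_until_all_located(M, rng):
    """Draw stego images until each of M payload pixels has been flipped at least once."""
    remaining = M
    n_images = 0
    while remaining > 0:
        n_images += 1
        # Each still-unlocated pixel stays unflipped in the new image with probability 1/2.
        remaining = int(rng.binomial(remaining, 0.5))
    return n_images


def average_images_needed(M, trials=200, seed=0):
    """Average, over independent trials, of the number of images needed to locate all M pixels."""
    rng = np.random.default_rng(seed)
    return sum(images_until_all_located(M, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    M = 45000  # 0.5 bpp on a 300x300 image
    print("log2(M) + 2 =", math.log2(M) + 2)
    print("simulated average =", average_images_needed(M))
```

The simulated average can be compared against the $\log_2(M) + 2$ figure quoted in the text; both grow logarithmically in $M$.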
sophisticated methods involving weighted averages can also be used.2 In a statistical sense, the best estimate is the most likely given the stego image:

$$\begin{aligned} \hat{c} &= \arg\max_{c}\, p(c \mid s) && \text{(5)} \\ &= \arg\max_{c}\, p(s \mid c)\, p(c). && \text{(6)} \end{aligned}$$
We have slightly abused the notation $c$, which now indicates a candidate cover image, not the original cover image. For practical reasons, we make the following assumptions:

1. $p(s \mid c) = \prod_i p(s_i \mid c_i)$,
2. $p(c) = \prod_i p(c_i \mid c_{i-1}, c_{i-2})$.

The first assumption states that a stego pixel depends only on the current cover pixel. This is a valid assumption for many existing steganographic algorithms. The second assumption is a second-order Markov approximation of $p(c)$, which is needed for computation. The likelihood probabilities $p(s_i \mid c_i)$ are approximated from the estimated payload $q$:

$$p(s_i \mid c_i) = \begin{cases} 1 - \frac{q}{2}, & \text{if } s_i = c_i, \\ \frac{q}{2}, & \text{if } s_i = \tilde{c}_i, \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$
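As an illustration of how the two assumptions and equation (7) combine, the sketch below scores one candidate cover sequence by summing log-likelihood and log-prior terms. It is not the paper's code: `lsbr_likelihood`, `log_posterior_score`, and the `prior` callable (with `None` standing in for a missing context at the start of the sequence) are hypothetical names introduced here.

```python
import math


def lsbr_likelihood(s, c, q):
    """p(s_i | c_i) for LSB replacement with payload q bpp, as in equation (7)."""
    if s == c:
        return 1.0 - q / 2.0
    if s == (c ^ 1):  # c with its LSB flipped
        return q / 2.0
    return 0.0


def log_posterior_score(stego, cover, q, prior):
    """Unnormalized log of p(s|c) p(c) under assumptions 1 and 2.

    `prior(c_i, c_prev, c_prev2)` is a hypothetical callable returning
    p(c_i | c_{i-1}, c_{i-2}); None marks a missing context.
    """
    total = 0.0
    for i, (s, c) in enumerate(zip(stego, cover)):
        c_prev = cover[i - 1] if i >= 1 else None
        c_prev2 = cover[i - 2] if i >= 2 else None
        p = lsbr_likelihood(s, c, q) * prior(c, c_prev, c_prev2)
        if p == 0.0:
            return -math.inf  # impossible candidate under the model
        total += math.log(p)
    return total
```

For example, with `q = 0.5` the likelihood of an unchanged pixel is 0.75 and that of a flipped pixel is 0.25, which are the factors used in the worked example of Figure 2.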
The prior probabilities $p(c_i \mid c_{i-1}, c_{i-2})$ are obtained by learning from a set of known cover images. For each cover image, we scan in four different directions: →, ←, ↓, ↑. In each direction, we collect counts of pixel triples $c_i$, $c_j$, and $c_k$. The counts are collected row by row and column by column for the horizontal and vertical directions, respectively. The totals of the counts from all four directions of all images are subsequently used to compute the prior probabilities.

Once the probabilities are obtained, we proceed to finding $\hat{c}$ using a search algorithm. The algorithm constructs a tree of candidate pixels. A node at depth $i$ corresponds to a candidate cover pixel $c_i$ given the observed $s_i$. Each leaf node has a score based on its probability. The leaf node with the best score is used to expand the tree to level $i + 1$. This process is repeated until a node with the best score at depth $n$ is reached. The path from the root of the tree to this node forms the most likely cover image.

A trivial example will illustrate the steps of the algorithm. The prior probabilities are shown in Table 1. The stego sequence is $s = (2, 2, 4)$ and $q = 0.5$ bpp. Since $s_1 = 2$, the algorithm first constructs a tree with one level
Table 1. Prior probabilities used in example.

c_i    p(c_i)
2      3/8
3      3/8
4      1/4

c_{i-1}    c_i    p(c_i | c_{i-1})
2          3      2/3
2          4      1/3
3          2      1/2
3          4      1/2
4          2      1

c_{i-2}    c_{i-1}    c_i    p(c_i | c_{i-1}, c_{i-2})
2          3          4      1
2          4          2      1
3          2          4      1
3          4          2      1
4          2          3      1

[Figure 2 panels (a)-(d): search trees grown from the root; nodes are candidate cover pixel values (2, 3, 4, 5) and the leaf scores shown include 0.2813, 0.0938, 0.0469, 0.0352, 0.0264, and 0.]
Figure 2. Tree construction example. Given stego sequence $s = (2, 2, 4)$, $q = 0.5$ bpp, and prior probabilities shown in Table 1, the algorithm forms tree sequences (a), (b), (c) and (d) by expanding the highest score leaf node highlighted in gray color. The value in each node corresponds to $\hat{c}_i$ and the value next to each node is the current cover estimate probability score according to (6) and (7). The score for node 2 in (a) is (3/8 * 0.75) = 0.2813.
as shown in Figure 2 (a). The highest score leaf node is highlighted and is chosen to expand to level 2 with $s_2 = 2$ to form the tree shown in Figure 2 (b). The highest score leaf node is now in level 1. Expanding it forms the tree shown in Figure 2 (c). The process continues until the final tree is constructed as shown in Figure 2 (d). The sequence from the root node to the highest score leaf node forms $\hat{c} = (2, 3, 4)$.
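The worked example can be reproduced with a small best-first search over a priority queue. This is a sketch under stated assumptions, not the actual implementation: it uses plain probabilities rather than log-probabilities, performs no pruning, and treats any transition not listed in Table 1 as having probability 0. The names `P1`, `P2`, `P3`, `prior`, `likelihood`, and `decode` are introduced here for illustration.

```python
import heapq

# Priors from Table 1; any transition not listed is taken to have probability 0.
P1 = {2: 3 / 8, 3: 3 / 8, 4: 1 / 4}
P2 = {(2, 3): 2 / 3, (2, 4): 1 / 3, (3, 2): 1 / 2, (3, 4): 1 / 2, (4, 2): 1.0}
P3 = {(2, 3, 4): 1.0, (2, 4, 2): 1.0, (3, 2, 4): 1.0, (3, 4, 2): 1.0, (4, 2, 3): 1.0}


def prior(path, c):
    """Prior of candidate c given the (up to two) preceding pixels on the path."""
    if len(path) == 0:
        return P1.get(c, 0.0)
    if len(path) == 1:
        return P2.get((path[-1], c), 0.0)
    return P3.get((path[-2], path[-1], c), 0.0)


def likelihood(s, c, q):
    """Equation (7): LSB replacement likelihood."""
    if s == c:
        return 1.0 - q / 2.0
    return q / 2.0 if s == (c ^ 1) else 0.0


def decode(stego, q):
    """Best-first search: repeatedly expand the highest-scoring leaf."""
    heap = [(-1.0, ())]  # (negated score, path from the root)
    while heap:
        neg_score, path = heapq.heappop(heap)
        if len(path) == len(stego):
            return list(path), -neg_score
        s = stego[len(path)]
        for c in (s, s ^ 1):  # the two cover values that LSB replacement can map to s
            score = -neg_score * prior(path, c) * likelihood(s, c, q)
            heapq.heappush(heap, (-score, path + (c,)))
    raise ValueError("empty search tree")


if __name__ == "__main__":
    print(decode((2, 2, 4), 0.5))
```

Because every factor is at most 1, scores never increase with depth, so the first full-length path popped from the queue has the highest score among all candidates. For the example input the search returns the sequence (2, 3, 4) with a score of about 0.0352, in agreement with Figure 2.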
In our actual implementation, we use log-probabilities instead of probabilities to avoid numerical underflow. We also make sure all prior probabilities have a minimum value greater than zero. Specifically, if $p_{\min}$ is the minimum non-zero prior probability, then all zero probabilities are assigned $p_{\min}/2$. This avoids early elimination of unlikely sequences that may end up being optimal. Finally, to prevent the tree from growing too big, we prune it by removing low-score nodes when the tree becomes large. This pruning step is the primary cause of sub-optimal decoding.

The residuals in (1) apply only to LSB replacement steganography due to the asymmetry of LSB replacement. In our current scheme, the residual $r_i$ indicates whether the cover estimate and the stego image are different at location $i$. To extend it to LSB matching steganography, we adapt $r_i$ to

$$r_i = I(s_i, \hat{c}_i), \tag{8}$$

where $I(x, y)$ is an indicator function

$$I(x, y) = \begin{cases} 1, & \text{if } x \neq y, \\ 0, & \text{otherwise.} \end{cases} \tag{9}$$
This new form of $r_i$ is more general as it can be used in place of (1) for both LSB replacement and LSB matching steganography. In terms of adapting the decoding algorithm, the prior probabilities remain the same; only the likelihood probabilities need to be changed:

$$p(s_i \mid c_i) = \begin{cases} 1 - \frac{q}{2}, & \text{if } s_i = c_i, \\ \frac{q}{4}, & \text{if } s_i = c_i \pm 1 \text{ and } 1 \le c_i \le 254, \\ \frac{q}{2}, & \text{if } (c_i = 0 \text{ and } s_i = 1) \text{ or } (c_i = 255 \text{ and } s_i = 254), \\ 0, & \text{otherwise.} \end{cases} \tag{10}$$
The same tree construction algorithm can now be used to find the most likely cover estimate given a stego image generated by LSB matching.
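The LSB matching likelihood in (10) and the indicator residual in (8) and (9) can be sketched as follows. The function names are hypothetical, and `candidate_covers_lsbm` is an added helper (not described in the text) that lists the 8-bit cover values with non-zero likelihood for a given stego value, which is what a tree search would branch over.

```python
import numpy as np


def lsbm_likelihood(s, c, q):
    """p(s_i | c_i) for LSB matching with payload q bpp, as in equation (10)."""
    if s == c:
        return 1.0 - q / 2.0
    if 1 <= c <= 254 and abs(s - c) == 1:
        return q / 4.0
    if (c == 0 and s == 1) or (c == 255 and s == 254):
        return q / 2.0  # at the range limits only one direction is possible
    return 0.0


def candidate_covers_lsbm(s):
    """Cover values that LSB matching could have mapped to stego value s."""
    return [c for c in (s - 1, s, s + 1) if 0 <= c <= 255]


def indicator_residuals(stego, cover_est):
    """r_i = I(s_i, c_hat_i): 1 where the stego and estimated cover differ, else 0."""
    return (np.asarray(stego) != np.asarray(cover_est)).astype(np.int64)
```

For any fixed cover value the likelihoods over all stego values sum to one, which is a quick sanity check on the boundary cases.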
[Figure 3 plot: histogram of the mean residuals $r_{i\cdot}$; horizontal axis from −2 to 2.5.]
Figure 3. Histogram of the mean residuals with payload q = 0.5 bpp for N = 1000 stego images using a four-neighbor estimator.
[Figure 4 plots: two histograms of the mean residuals $r_{i\cdot}$ (left and right panels); horizontal axes from 0 to 0.1.]
Figure 4. Histogram of the mean residuals with payload q = 0.5 bpp for N = 500 (left) and N = 1000 (right) stego images using the maximum likelihood estimator.
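The prior-learning step described in Section 2 (counting pixel triples along the four scan directions, assigning $p_{\min}/2$ to unseen triples, and working in log-probabilities) might be sketched as below. The function names and the dictionary representation are assumptions; in particular, returning a single floor value for unseen triples, without renormalizing, is one simple reading of the smoothing rule and not necessarily what the actual implementation does.

```python
import math
from collections import Counter

import numpy as np


def count_triples(image):
    """Count (c_{i-2}, c_{i-1}, c_i) triples along the four scan directions."""
    img = np.asarray(image)
    counts = Counter()
    # Rows left-to-right, rows right-to-left, columns top-to-bottom, columns bottom-to-top.
    for view in (img, img[:, ::-1], img.T, img.T[:, ::-1]):
        for line in view:
            vals = line.tolist()
            counts.update(zip(vals, vals[1:], vals[2:]))
    return counts


def learn_log_priors(images):
    """Return (log_probs, floor): log p(c_i | c_{i-1}, c_{i-2}) for seen triples,
    and the log of p_min / 2 to be used for every unseen triple."""
    counts = Counter()
    for img in images:
        counts.update(count_triples(img))
    context_totals = Counter()
    for (a, b, _), n in counts.items():
        context_totals[(a, b)] += n
    probs = {t: n / context_totals[t[:2]] for t, n in counts.items()}
    p_min = min(probs.values())
    log_probs = {t: math.log(p) for t, p in probs.items()}
    return log_probs, math.log(p_min / 2.0)
```

A decoder would then look up a triple with `log_probs.get(triple, floor)`, so that no candidate sequence is eliminated outright by a zero prior.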
3. EXPERIMENTAL RESULTS We now apply our cover estimation technique to locate payload using residuals. We use two sets of uncompressed grayscale TIFF images. Each set has 1000 images. One set is used for learning the prior probabilities and the other set is used for testing. We randomly crop each image to 300x300. A fixed payload of 0.5 bpp is used to embed random bits using LSB replacement with the same key. For comparison, we first use a simple estimator based on averaging four neighboring pixels.1, 4 Using 1000 stego images, we compute the mean residuals ri· and display their histogram in Figure 3. Even with 1000 stego images, we cannot observe the two peaks separating payload pixels from non-payload pixels. With 1000 stego images, the number of true positives is 33583 (74.63%) and the number of false positives is 12586 (27.97%), where we have used a threshold ri· > 0.25 to identify load-carrying pixels due to symmetry of the residuals. We repeat the same experiment using our maximum likelihood estimator. For each stego image, we obtain four estimates from four directions: →, ←, ↓, ↑. All four estimates are used to compute the residuals. The histograms of the mean residuals are shown in Figure 4 for two different values of N : 500 and 1000. The formation of the second peak can be seen even with N = 500 stego images. The location of the second peak, however, is not at 0.5. This is because our estimator is not unbiased; it is model and image dependent. Assuming the payload q is known, the payload pixels correspond to the nq locations with the largest mean residuals. A
Figure 5. A 20x20 region of the mean residuals with payload locations marked by a white dot using N = 1000 stego images and payload q = 0.5 bpp. Locations with large mean residuals correspond with the load-carrying pixels.

Table 2. Correctly identified load-carrying pixels as a function of the number of stego images, N, for LSB replacement (column 2) and LSB matching (column 3) steganography with a fixed payload q = 0.5 bpp (45000 locations) using the maximum likelihood estimator.

N       LSB Replacement     LSB Matching
1       24248 (53.88%)      22922 (50.94%)
10      28888 (64.20%)      26022 (57.83%)
100     38562 (85.69%)      34424 (76.50%)
200     40977 (91.06%)      36811 (81.80%)
300     42449 (94.33%)      38948 (86.55%)
400     43141 (95.87%)      41066 (91.26%)
500     43637 (96.97%)      41846 (92.99%)
1000    44644 (99.21%)      43488 (96.64%)

simple heat map in Figure 5 confirms the locations with the largest mean residuals are indeed the load-carrying pixels. The accuracy of payload location is sensitive to the number of available stego images. In column 2 of Table 2, we show the number of correctly identified load-carrying pixels as a function of the number of stego images, N . It is apparent that as more stego images are available, better accuracy is obtained: 90% accuracy of payload location is achieved with 200 stego images, 99% accuracy with 1000 stego images. This is clearly better than using the four-neighbor average estimator, but is far from the ideal number 17. As noted earlier, with likelihood probabilities in (10), the same method can be used to locate payload embedded by LSB matching steganography. To this end, we perform the same experiment using LSB matching steganography. The results are shown in column 3 of Table 2. It is more difficult to locate payload embedded by LSB matching than LSB replacement. This coincides with the fact that LSB matching is known to be more difficult to detect than LSB replacement. Perhaps the difficulty lies in the symmetry of LSB matching that is not present in LSB replacement leading to more uncertainty in the decoding process. For comparison, we repeat the same experiment using the Wavelet Absolute Moments (WAM) payload locator,5 which was used to locate payload pixels subject to LSB matching steganography. The first-stage wavelet decomposition is performed using the 8-tap Daubechies filter. We do not use reflected borders and do not ignore edge pixels, which have been shown to slightly improve payload location accuracy. The results are shown in Table 3. The WAM method can locate only 72% of the payload even with 1000 stego images. In contrast, our method has an accuracy of 96%. It is notable that with WAM, using more stego images does not always improve accuracy. Increasing from 300 to 400 stego images actually decreases accuracy. 
This problem was also observed in the original work.5 An inherent problem with these linear filter estimators is the edge effect, where payload pixels near the edges
Table 3. Correctly identified load-carrying pixels as a function of the number of stego images, N, for LSB matching steganography with a fixed payload q = 0.5 bpp (45000 locations) using WAM.

N       Correct
1       22614 (50.25%)
10      26432 (58.73%)
100     30900 (68.66%)
200     31298 (69.55%)
300     31772 (70.60%)
400     31529 (70.06%)
500     31994 (71.09%)
1000    32563 (72.36%)

of an image are more difficult to locate. This is because pixels near an edge have fewer neighboring pixels, causing more errors in the estimates. As a consequence, the residuals of these near-edge pixels are not reliable. Several improvements such as reflecting borders and excluding edge pixels have been explored to reduce the edge effect. However, the outermost rows and columns remain problematic. In contrast, our maximum likelihood estimator does not suffer from this problem. In fact, the heat map in Figure 5 shows pixels near the top left edges.
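Two steps used throughout this section, the four-neighbor baseline estimator and the rule of taking the $nq$ pixels with the largest mean residuals as the payload, can be sketched in NumPy as follows. The function names are hypothetical, and the border handling in `four_neighbor_estimate` (replicating the nearest edge value) is an assumption, since the text does not specify it.

```python
import numpy as np


def four_neighbor_estimate(stego):
    """Estimate each cover pixel as the mean of its four neighbors.

    Border pixels reuse the nearest edge value (an assumption made here).
    """
    s = np.asarray(stego, dtype=np.float64)
    p = np.pad(s, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0


def locate_payload(mean_res, k):
    """Boolean mask marking the k pixels with the largest mean residuals."""
    flat = np.asarray(mean_res, dtype=np.float64).ravel()
    k = min(int(k), flat.size)
    mask = np.zeros(flat.size, dtype=bool)
    if k > 0:
        # argpartition places the k largest values in the last k positions.
        top = np.argpartition(flat, flat.size - k)[flat.size - k:]
        mask[top] = True
    return mask.reshape(np.shape(mean_res))


def count_correct(mask, truth):
    """Number of flagged pixels that are true load-carrying pixels."""
    return int(np.count_nonzero(mask & np.asarray(truth, dtype=bool)))
```

With a 300x300 image and q = 0.5 bpp, `k` would be 45000, and `count_correct` gives the kind of count reported in Tables 2 and 3.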
4. DISCUSSION
Using residuals is an interesting approach to locate steganographic payload. This method is easily defeated simply by using a different key each time. There are of course inconveniences associated with frequent key changes. At the least, the intended recipients of the stego contents must be aware of these changes. We have shown the maximum number of times a key can be reused and still defeat residual-based locators. This can be used to determine when a new key should be used. As a trade-off between inconvenience and security, it is typical to share a set of keys instead of just one. Given this situation, can we group stego images that share the same key from a set of stego images of mixed keys? This problem should be further investigated.

Perhaps the most difficult challenge with residual-based locators is estimating the cover images. The better the estimates are, the closer we get to the theoretical best. We have shown an approach to finding the most likely estimate by learning the statistics from a mixture of cover images. The estimates are clearly better than pure guessing, but are still far from perfect. This is an interesting result on its own; it implies there are some intrinsic similarities across cover images that are violated by the embedding process.

Our statistical model is a second-order Markov chain. It may be possible to get better estimates if we use a higher-order model. This does not come without a cost. Assuming that we use four bytes to store each probability, with a second-order model, we need $4(256^3) \approx 67$ MB. While this memory requirement is relatively small, with a third-order model, we would need 17 GB. This would be a challenge even for high-end workstations. We do not believe a second-order approximation is the main issue. The biggest criticism is that we are learning from a mixture of cover images, not the originating cover image of the stego image.
It would be interesting to search for cover images that are similar to the stego image and use them to estimate the cover image.
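The memory figures quoted above for the second- and third-order models follow from storing one four-byte probability per combination of 8-bit pixel values in a dense table. A short check, with the hypothetical helper `dense_table_bytes` and the dense-table layout as assumptions:

```python
def dense_table_bytes(order, levels=256, bytes_per_entry=4):
    """Bytes needed for a dense table of p(c_i | previous `order` pixels)."""
    return bytes_per_entry * levels ** (order + 1)


if __name__ == "__main__":
    for order in (2, 3):
        size = dense_table_bytes(order)
        print(f"order {order}: {size} bytes ({size / 1e9:.2f} GB)")
```

Each additional order multiplies the table size by 256, which is why the step from second to third order moves from tens of megabytes to tens of gigabytes.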
REFERENCES
[1] Fridrich, J. and Goljan, M., “On estimation of secret message length in LSB steganography in spatial domain,” in [Security, Steganography, and Watermarking of Multimedia Contents VI], 5306, 23–34, SPIE (2004).
[2] Ker, A. D. and Böhme, R., “Revisiting weighted stego-image steganalysis,” in [Security, Forensics, Steganography and Watermarking of Multimedia Contents X], 6819, SPIE (2008).
[3] Fridrich, J., Goljan, M., and Soukal, D., “Searching for the stego-key,” in [Security, Steganography, and Watermarking of Multimedia Contents VI], 5306, 70–82, SPIE (2004).
[4] Ker, A. D., “Locating steganographic payload via WS residuals,” in [10th Multimedia and Security Workshop], 27–31, ACM (2008).
[5] Ker, A. D. and Lubenko, I., “Feature reduction and payload location with WAM steganalysis,” in [Media Forensics and Security XI], 7254, 0A01–0A13, SPIE (2009).