On Locating Steganographic Payload using Residuals

Tu-Thach Quach
Electrical and Computer Engineering, University of New Mexico, Albuquerque, NM, USA

ABSTRACT
Locating steganographic payload using Weighted Stego-image (WS) residuals has proven successful provided a large number of stego images are available. In this paper, we revisit this topic with two goals. First, we argue that it is a promising approach to locate payload by showing that in the ideal scenario where the cover images are available, the expected number of stego images needed to perfectly locate all load-carrying pixels is the logarithm of the payload size. Second, we generalize cover estimation to a maximum likelihood decoding problem and demonstrate that a second-order statistical cover model can be used to compute residuals to locate payload embedded by both LSB replacement and LSB matching steganography.

Keywords: Steganalysis, Payload Location, Maximum Likelihood Decoding

1. INTRODUCTION

Digital image steganography embeds hidden messages into cover images to produce stego images, which appear innocuous to an unintended observer. A widely used method in digital image steganography is least-significant bit (LSB) replacement, where a portion of the LSBs of the cover image are replaced with the message bits. Because only the LSBs are manipulated, the stego image looks similar to the cover image, making it difficult for steganalysis detectors to detect. Due to embedding, however, the stego image is clearly different from the cover image. These differences may reveal the presence of a hidden message. Once these differences are found, they can be used as features to train classifiers to detect stego images.

Several detectors can be extended to estimate the size of the payload. This can be achieved by training classifiers using stego images carrying different payload sizes. The output of such a classifier is no longer binary, but consists of an estimate of the payload size. More interesting are the Weighted Stego-image (WS) detectors [1, 2]. These detectors involve estimating the cover image given the stego image. The estimate is then used in combination with the stego image to estimate the payload size. These payload-size estimators are highly accurate, with a mean absolute error of the order of $10^{-4}$.

The next logical step in steganalysis is to extract the hidden message from the stego image. There are two approaches to this problem. The first technique involves searching for the correct key used in the embedding process [3]. This method is applicable when the key space is small. The advantage is that once the key is found, the hidden message can be extracted readily. The alternative is to locate the payload using a number of stego images where each stego image has the payload at the same locations [4, 5]. This could happen if the steganographer reuses the key and the stego images are the same size. This method only locates the payload; it cannot extract the hidden message since no logical ordering can be inferred from the payload locations. However, it is a crucial step in extracting the hidden message.

Our goal is similar to the latter: to locate the payload using a number of stego images. We first show that in the best case where the cover images are available, the expected number of stego images needed to perfectly locate all load-carrying pixels is the logarithm of the payload size. For payloads of 45000 bits, on average, only 17 stego images are needed.

An essential step in payload location is estimating the cover image given the stego image. We would like the estimate to be as close to the cover image as possible since the fidelity of the estimate directly influences the accuracy of payload location. Our approach is to find the most likely estimate, in a statistical sense, given the stego image. Once the most likely estimate is found, the typical method using residuals can be utilized to locate the payload.

Further author information: T. Quach: E-mail: [email protected]

In Section 2, we briefly review WS residuals, provide some bounds on the number of stego images needed to locate payload, and present our approach for estimating cover images. We demonstrate that our method can locate load-carrying pixels embedded by both LSB replacement and LSB matching steganography in Section 3. Further research directions are suggested in Section 4.

2. LOCATING PAYLOAD

We denote a stego image as a vector $s = (s_1, \ldots, s_n)$ and the corresponding cover image as $c = (c_1, \ldots, c_n)$. A stego image is generated by embedding a payload of $q$ bits per pixel (bpp) into the cover image using LSB replacement steganography (we use $q$ instead of $p$ to avoid confusion with probability, which is used extensively here). The total number of bits embedded is therefore $nq$. The WS method first estimates the cover image using the stego image [4]. We use the notation $\hat{c}$ for the cover estimate. Once $\hat{c}$ is obtained, the residuals

$$r_i = (s_i - \tilde{s}_i)(s_i - \hat{c}_i) \tag{1}$$

are computed, where $\tilde{s}_i$ indicates $s_i$ with the LSB flipped. The residuals quantify the difference between the stego image and the cover estimate. If $\hat{c}_i$ is an unbiased estimator for $c_i$, the estimation error is independent of the parity of $c_i$, and the payload is independent of the cover, the expected values of the residuals $r_i$ satisfy

$$E[r_i] = \begin{cases} 0, & \text{if } s_i = c_i, \\ 1, & \text{if } s_i = \tilde{c}_i. \end{cases} \tag{2}$$

Given $N$ stego images of the same size, the residual of pixel $i$ in image $j$ is

$$r_{ij} = (s_{ij} - \tilde{s}_{ij})(s_{ij} - \hat{c}_{ij}). \tag{3}$$

The proportion of the number of images in which pixel $i$ is flipped is

$$\bar{r}_{i\cdot} = \frac{1}{N} \sum_{j=1}^{N} r_{ij}. \tag{4}$$
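As an illustration of equations (1), (3), and (4), the following is a minimal sketch in plain Python. It assumes each image is given as a flat list of pixel values and that a cover estimate is already available for every stego image; the function names `ws_residuals` and `mean_residuals` are hypothetical and not taken from the original work.

```python
def flip_lsb(value):
    """Return the pixel value with its least-significant bit flipped."""
    return value ^ 1


def ws_residuals(stego, cover_estimate):
    """Equation (1): r_i = (s_i - flipped(s_i)) * (s_i - c_hat_i)."""
    return [(s - flip_lsb(s)) * (s - c) for s, c in zip(stego, cover_estimate)]


def mean_residuals(stego_images, cover_estimates):
    """Equation (4): average the per-image residuals of equation (3)."""
    n_images = len(stego_images)
    totals = [0.0] * len(stego_images[0])
    for stego, estimate in zip(stego_images, cover_estimates):
        for i, r in enumerate(ws_residuals(stego, estimate)):
            totals[i] += r
    return [t / n_images for t in totals]


# Two tiny "images" of two pixels each; pixel 0 is flipped in the first image only.
print(mean_residuals([[3, 4], [2, 4]], [[2, 4], [2, 4]]))
```

With a perfect cover estimate, a pixel's residual is 1 exactly when its LSB was flipped and 0 otherwise, which is the behavior stated in (2).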

Using (2), if pixel $i$ is not a load-carrying pixel, then $E[\bar{r}_{i\cdot}] = 0$. Otherwise, if pixel $i$ is a load-carrying pixel, then $E[\bar{r}_{i\cdot}] = 0.5$. In practice, these expectations are observed for large $N$.

What is the best we can do in terms of the number of required stego images to locate some fraction of the payload? Without the embedding key, the best scenario is $\hat{c}_i = c_i$ for all $i$. In other words, the cover images are available. For simplicity, we assume that the total number of bits is $M = nq = 2^k$ for some non-negative integer $k$. Since the payload is independent of the cover, with one stego image, we can locate about $M/2$ bits. With a second stego image, the expected number of unlocatable bits reduces to $M/4$. In general, given $N$ stego images, the expected number of unlocatable bits is $M/2^N$. Therefore, on average, we need $\log_2(M) + 2$ stego images to locate all bits. This is a promising result as we need only relatively few (logarithmic in $nq$) stego images to locate the payload.

To validate this result, we use a set of 1000 uncompressed grayscale TIFF images. We crop each image to 300×300. A fixed payload of 0.5 bpp is used to embed random bits using LSB replacement with the same key to generate 1000 stego images. For 10 iterations, as $N$ increases from 1 to 18 (we expect to locate all payload pixels with $\log_2(45000) + 2 \approx 17$ stego images), we randomly choose $N$ stego images and compute the mean residuals between the stego images and their corresponding cover images. The payload pixels are those with non-zero mean residuals. We average the number of payload pixels as a function of $N$ over 10 iterations and plot the result in Figure 1. For comparison, we also plot the expected curve. It is clear that the experimental result matches our finding.

[Figure 1: located payload (×10^4) versus number of stego images; curves: Experiment, Ideal.]

Figure 1. Correctly identified payload pixels as a function of N stego images for the ideal scenario where the cover images are available. The payload is 0.5 bpp for a total of 45000 bits. The experimental result matches our expectation.

In practice, we are unlikely to have access to the cover images. Instead, we estimate the cover images using several heuristics. A simple method estimates a cover pixel by averaging its four neighboring pixels [1]. More sophisticated methods involving weighted averages can also be used [2]. In a statistical sense, the best estimate is the most likely given the stego image:

$$\begin{align} \hat{c} &= \arg\max_{c}\, p(c \mid s) \tag{5} \\ &= \arg\max_{c}\, p(s \mid c)\, p(c). \tag{6} \end{align}$$

We have slightly abused the notation $c$, which now indicates a candidate cover image, not the original cover image. For practical reasons, we make the following assumptions:

1. $p(s \mid c) = \prod_i p(s_i \mid c_i)$,
2. $p(c) = \prod_i p(c_i \mid c_{i-1}, c_{i-2})$.

The first assumption states that a stego pixel depends only on the current cover pixel. This is a valid assumption for many existing steganographic algorithms. The second assumption is a second-order Markov approximation of $p(c)$, which is needed for computation. The likelihood probabilities $p(s_i \mid c_i)$ are approximated from the estimated payload $q$:

$$p(s_i \mid c_i) = \begin{cases} 1 - \frac{q}{2}, & \text{if } s_i = c_i, \\ \frac{q}{2}, & \text{if } s_i = \tilde{c}_i, \\ 0, & \text{otherwise.} \end{cases} \tag{7}$$

The prior probabilities $p(c_i \mid c_{i-1}, c_{i-2})$ are obtained by learning from a set of known cover images. For each cover image, we scan in four different directions: →, ←, ↓, ↑. In each direction, we collect counts of pixel triples $(c_i, c_j, c_k)$. The counts are collected row by row and column by column for the horizontal and vertical directions, respectively. The totals of the counts from all four directions of all images are subsequently used to compute the prior probabilities.

Once the probabilities are obtained, we proceed to finding $\hat{c}$ using a search algorithm. The algorithm constructs a tree of candidate pixels. A node at depth $i$ corresponds to a candidate cover pixel $c_i$ given the observed $s_i$. Each leaf node has a score based on its probability. The leaf node with the best score is used to expand the tree to level $i + 1$. This process is repeated until a node with the best score at depth $n$ is reached. The path from the root of the tree to this node forms the most likely cover image.

A trivial example will illustrate the steps of the algorithm. The prior probabilities are shown in Table 1. The stego sequence is $s = (2, 2, 4)$ and $q = 0.5$ bpp. Since $s_1 = 2$, the algorithm first constructs a tree with one level as shown in Figure 2 (a). The highest score leaf node is highlighted and is chosen to expand to level 2 with $s_2 = 2$ to form the tree shown in Figure 2 (b). The highest score leaf node is now in level 1. Expanding it forms the tree shown in Figure 2 (c). The process continues until the final tree is constructed as shown in Figure 2 (d). The sequence from the root node to the highest score leaf node forms $\hat{c} = (2, 3, 4)$.

Table 1. Prior probabilities used in example.

c_i    p(c_i)
2      3/8
3      3/8
4      1/4

c_{i-1}    c_i    p(c_i | c_{i-1})
2          3      2/3
2          4      1/3
3          2      1/2
3          4      1/2
4          2      1

c_{i-2}    c_{i-1}    c_i    p(c_i | c_{i-1}, c_{i-2})
2          3          4      1
2          4          2      1
3          2          4      1
3          4          2      1
4          2          3      1

[Figure 2: four panels (a), (b), (c), (d) showing the search tree after each expansion, with the score next to each node.]

Figure 2. Tree construction example. Given stego sequence $s = (2, 2, 4)$, $q = 0.5$ bpp, and prior probabilities shown in Table 1, the algorithm forms tree sequences (a), (b), (c) and (d) by expanding the highest score leaf node highlighted in gray color. The value in each node corresponds to $\hat{c}_i$ and the value next to each node is the current cover estimate probability score according to (6) and (7). The score for node 2 in (a) is (3/8 * 0.75) = 0.2813.
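The worked example can be reproduced with a short best-first search. This is a minimal sketch under stated assumptions, not the author's implementation: it uses the priors of Table 1 (treating unlisted combinations as probability 0), the likelihood (7), exact fractions instead of log-probabilities, and no pruning. The names `prior`, `likelihood`, and `decode` are hypothetical.

```python
import heapq
from fractions import Fraction as F

# Priors from Table 1; combinations not listed are treated as probability 0.
P1 = {2: F(3, 8), 3: F(3, 8), 4: F(1, 4)}
P2 = {(2, 3): F(2, 3), (2, 4): F(1, 3), (3, 2): F(1, 2),
      (3, 4): F(1, 2), (4, 2): F(1)}
# Keys are (c_{i-2}, c_{i-1}, c_i).
P3 = {(2, 3, 4): F(1), (2, 4, 2): F(1), (3, 2, 4): F(1),
      (3, 4, 2): F(1), (4, 2, 3): F(1)}


def prior(path, c):
    """p(c_i | c_{i-1}, c_{i-2}), using lower-order tables for the first two pixels."""
    if len(path) == 0:
        return P1.get(c, F(0))
    if len(path) == 1:
        return P2.get((path[-1], c), F(0))
    return P3.get((path[-2], path[-1], c), F(0))


def likelihood(s, c, q):
    """Equation (7): LSB replacement likelihood p(s_i | c_i)."""
    if c == s:
        return 1 - q / 2
    if c == s ^ 1:
        return q / 2
    return F(0)


def decode(stego, q):
    """Best-first search for the most likely cover sequence (no pruning)."""
    heap = [(-F(1), ())]  # entries are (negated score, partial cover path)
    while heap:
        neg_score, path = heapq.heappop(heap)
        depth = len(path)
        if depth == len(stego):
            return path, -neg_score  # best leaf is at full depth: done
        s = stego[depth]
        for c in (s, s ^ 1):  # the only candidates with non-zero likelihood
            score = -neg_score * prior(path, c) * likelihood(s, c, q)
            heapq.heappush(heap, (-score, path + (c,)))
    return None


cover, score = decode((2, 2, 4), F(1, 2))
print(cover, float(score))  # (2, 3, 4) 0.03515625
```

Because every factor is at most 1, scores never increase along a path, so the first full-depth node popped from the heap is a maximizer of (6) under these assumptions.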

In our actual implementation, we use log-probability instead of probability to avoid zero saturation. We also make sure all prior probabilities have a minimum value greater than zero. Specifically, if $p_{\min}$ is the minimum non-zero prior probability, then all zero probabilities are assigned $p_{\min}/2$. This is to avoid early elimination of unlikely sequences that may end up being optimal. Finally, to prevent the tree from growing too big, we prune it by removing low score nodes when the tree becomes large. This pruning step is the primary cause for sub-optimal decoding.

The residuals in (1) apply only to LSB replacement steganography due to the asymmetry of LSB replacement. In our current scheme, the residual $r_i$ indicates whether the cover estimate and the stego image are different at location $i$. To extend it to LSB matching steganography, we adapt $r_i$ to

$$r_i = I(s_i, \hat{c}_i), \tag{8}$$

where $I(x, y)$ is an indicator function

$$I(x, y) = \begin{cases} 1, & \text{if } x \neq y, \\ 0, & \text{otherwise.} \end{cases} \tag{9}$$

This new form of $r_i$ is more general as it can be used in place of (1) for both LSB replacement and LSB matching steganography. In terms of adapting the decoding algorithm, the prior probabilities remain the same; only the likelihood probabilities need to be changed:

$$p(s_i \mid c_i) = \begin{cases} 1 - \frac{q}{2}, & \text{if } s_i = c_i, \\ \frac{q}{4}, & \text{if } s_i = c_i \pm 1 \text{ and } 1 \le c_i \le 254, \\ \frac{q}{2}, & \text{if } (c_i = 0 \text{ and } s_i = 1) \text{ or } (c_i = 255 \text{ and } s_i = 254), \\ 0, & \text{otherwise.} \end{cases} \tag{10}$$

The same tree construction algorithm can now be used to find the most likely cover estimate given a stego image generated by LSB matching.
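The two ingredients consumed by the decoder in this section, the learned second-order prior and the embedding-specific likelihood, can be sketched as follows. This is an illustrative sketch only: it assumes 8-bit images given as lists of rows, stores priors in a dictionary keyed by $(c_{i-2}, c_{i-1}, c_i)$, and all function names are hypothetical. The $p_{\min}/2$ floor and the log-probabilities follow the description of the implementation above; recomputing the minimum on every call is kept for brevity.

```python
import math
from collections import Counter


def scan_lines(image):
    """Yield every row and column of a 2-D image in both directions."""
    rows = [list(r) for r in image]
    cols = [list(c) for c in zip(*image)]
    for line in rows + cols:
        yield line        # left-to-right or top-to-bottom
        yield line[::-1]  # right-to-left or bottom-to-top


def learn_prior(images):
    """Estimate p(c_i | c_{i-1}, c_{i-2}) from triple counts over all scans."""
    triples = Counter()
    for image in images:
        for line in scan_lines(image):
            triples.update(zip(line, line[1:], line[2:]))
    contexts = Counter()
    for (a, b, _), n in triples.items():
        contexts[(a, b)] += n
    return {t: n / contexts[t[:2]] for t, n in triples.items()}


def log_prior(prior, triple):
    """Log-probability with unseen triples floored at p_min / 2."""
    p_min = min(prior.values())
    return math.log(prior.get(triple, p_min / 2))


def likelihood_replacement(s, c, q):
    """Equation (7): LSB replacement."""
    if s == c:
        return 1 - q / 2
    if s == c ^ 1:
        return q / 2
    return 0.0


def likelihood_matching(s, c, q):
    """Equation (10): LSB matching on 8-bit pixels (values 0..255)."""
    if s == c:
        return 1 - q / 2
    if abs(s - c) == 1:
        return q / 4 if 1 <= c <= 254 else q / 2
    return 0.0


toy_prior = learn_prior([[[1, 2, 3], [1, 2, 4]]])
print(toy_prior[(1, 2, 3)], log_prior(toy_prior, (9, 9, 9)))
```

For LSB matching the decoder would consider up to three candidates per pixel ($s_i - 1$, $s_i$, $s_i + 1$) instead of two, which is one plausible reason the search is harder in that setting.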

[Figure 3: histogram; horizontal axis: mean residual $\bar{r}_{i\cdot}$, from −2 to 2.5.]

Figure 3. Histogram of the mean residuals with payload q = 0.5 bpp for N = 1000 stego images using a four-neighbor estimator.

[Figure 4: two histograms (left, right); horizontal axis: mean residual $\bar{r}_{i\cdot}$, from 0 to 0.1.]

Figure 4. Histogram of the mean residuals with payload q = 0.5 bpp for N = 500 (left) and N = 1000 (right) stego images using the maximum likelihood estimator.

3. EXPERIMENTAL RESULTS

We now apply our cover estimation technique to locate payload using residuals. We use two sets of uncompressed grayscale TIFF images. Each set has 1000 images. One set is used for learning the prior probabilities and the other set is used for testing. We randomly crop each image to 300×300. A fixed payload of 0.5 bpp is used to embed random bits using LSB replacement with the same key.

For comparison, we first use a simple estimator based on averaging four neighboring pixels [1, 4]. Using 1000 stego images, we compute the mean residuals $\bar{r}_{i\cdot}$ and display their histogram in Figure 3. Even with 1000 stego images, we cannot observe the two peaks separating payload pixels from non-payload pixels. With 1000 stego images, the number of true positives is 33583 (74.63%) and the number of false positives is 12586 (27.97%), where we have used a threshold $\bar{r}_{i\cdot} > 0.25$ to identify load-carrying pixels due to symmetry of the residuals.

We repeat the same experiment using our maximum likelihood estimator. For each stego image, we obtain four estimates from four directions: →, ←, ↓, ↑. All four estimates are used to compute the residuals. The histograms of the mean residuals are shown in Figure 4 for two different values of N: 500 and 1000. The formation of the second peak can be seen even with N = 500 stego images. The location of the second peak, however, is not at 0.5. This is because our estimator is not unbiased; it is model and image dependent. Assuming the payload $q$ is known, the payload pixels correspond to the $nq$ locations with the largest mean residuals. A simple heat map in Figure 5 confirms that the locations with the largest mean residuals are indeed the load-carrying pixels.

Figure 5. A 20x20 region of the mean residuals with payload locations marked by a white dot using N = 1000 stego images and payload q = 0.5 bpp. Locations with large mean residuals correspond with the load-carrying pixels.

Table 2. Correctly identified load-carrying pixels as a function of the number of stego images, N, for LSB replacement (column 2) and LSB matching (column 3) steganography with a fixed payload q = 0.5 bpp (45000 locations) using the maximum likelihood estimator.

N       LSB Replacement     LSB Matching
1       24248 (53.88%)      22922 (50.94%)
10      28888 (64.20%)      26022 (57.83%)
100     38562 (85.69%)      34424 (76.50%)
200     40977 (91.06%)      36811 (81.80%)
300     42449 (94.33%)      38948 (86.55%)
400     43141 (95.87%)      41066 (91.26%)
500     43637 (96.97%)      41846 (92.99%)
1000    44644 (99.21%)      43488 (96.64%)

The accuracy of payload location is sensitive to the number of available stego images. In column 2 of Table 2, we show the number of correctly identified load-carrying pixels as a function of the number of stego images, N. It is apparent that as more stego images are available, better accuracy is obtained: 90% accuracy of payload location is achieved with 200 stego images, 99% accuracy with 1000 stego images. This is clearly better than using the four-neighbor average estimator, but is far from the ideal number 17.

As noted earlier, with likelihood probabilities in (10), the same method can be used to locate payload embedded by LSB matching steganography. To this end, we perform the same experiment using LSB matching steganography. The results are shown in column 3 of Table 2. It is more difficult to locate payload embedded by LSB matching than by LSB replacement. This coincides with the fact that LSB matching is known to be more difficult to detect than LSB replacement. Perhaps the difficulty lies in the symmetry of LSB matching that is not present in LSB replacement, leading to more uncertainty in the decoding process.

For comparison, we repeat the same experiment using the Wavelet Absolute Moments (WAM) payload locator [5], which was used to locate payload pixels subject to LSB matching steganography. The first-stage wavelet decomposition is performed using the 8-tap Daubechies filter. We neither reflect borders nor ignore edge pixels, two refinements that have been shown to slightly improve payload location accuracy. The results are shown in Table 3. The WAM method can locate only 72% of the payload even with 1000 stego images. In contrast, our method has an accuracy of 96%. It is notable that with WAM, using more stego images does not always improve accuracy. Increasing from 300 to 400 stego images actually decreases accuracy.
This problem was also observed in the original work [5]. An inherent problem with these linear filter estimators is the edge effect, where payload pixels near the edges of an image are more difficult to locate. This is because pixels near an edge have fewer neighboring pixels, causing more errors in the estimates. As a consequence, the residuals of these near-edge pixels are not reliable. Several improvements such as reflecting borders and excluding edge pixels have been explored to reduce the edge effect. However, the outermost rows and columns remain problematic. In contrast, our maximum likelihood estimator does not suffer from this problem. In fact, the heat map in Figure 5 shows pixels near the top-left edges of the image.

Table 3. Correctly identified load-carrying pixels as a function of the number of stego images, N, for LSB matching steganography with a fixed payload q = 0.5 bpp (45000 locations) using WAM.

N       Correct
1       22614 (50.25%)
10      26432 (58.73%)
100     30900 (68.66%)
200     31298 (69.55%)
300     31772 (70.60%)
400     31529 (70.06%)
500     31994 (71.09%)
1000    32563 (72.36%)
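The selection rule used throughout this section, taking the $nq$ locations with the largest mean residuals as payload when $q$ is known, can be sketched as below. The function names and the toy residual values are hypothetical and serve only to illustrate the rule and the accuracy count reported in the tables.

```python
def locate_payload(mean_residuals, q):
    """Return the indices of the n*q pixels with the largest mean residuals."""
    n = len(mean_residuals)
    k = round(n * q)
    ranked = sorted(range(n), key=lambda i: mean_residuals[i], reverse=True)
    return sorted(ranked[:k])


def count_correct(located, payload_locations):
    """Number of located pixels that are true load-carrying pixels."""
    return len(set(located) & set(payload_locations))


# Toy example: four pixels, payload of 0.5 bpp, true payload at indices 1 and 3.
located = locate_payload([0.0, 0.4, 0.05, 0.3], 0.5)
print(located, count_correct(located, [1, 3]))
```

Ranking rather than thresholding at a fixed value avoids depending on where the second histogram peak falls, which matters here because the maximum likelihood estimator is not unbiased.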

4. DISCUSSION

Using residuals is an interesting approach to locate steganographic payload. This method is easily defeated simply by using a different key each time. There are of course inconveniences associated with frequent key changes. At the least, the intended recipients of the stego contents must be aware of these changes. We have shown the maximum number of times a key can be reused and still defeat residual-based locators. This can be used to determine when a new key should be used. As a trade-off between inconvenience and security, it is typical to share a set of keys instead of just one. Given this situation, can we group stego images that share the same key from a set of stego images of mixed keys? This problem should be further investigated.

Perhaps the most difficult challenge with residual-based locators is estimating the cover images. The better the estimates are, the closer we get to the theoretical best. We have shown an approach to finding the most likely estimate by learning the statistics from a mixture of cover images. The estimates are clearly better than pure guessing, but are still far from perfect. This is an interesting result on its own; it implies there are some intrinsic similarities across cover images that are violated by the embedding process.

Our statistical model is a second-order Markov chain. It may be possible to get better estimates if we use a higher-order model. This does not come without a cost. Assuming that we use four bytes to store each probability, a second-order model requires $4(256^3) \approx 67$ MB. While this memory requirement is relatively small, a third-order model would need 17 GB. This would be a challenge even for high-end workstations. We do not believe a second-order approximation is the main issue. The biggest criticism is that we are learning from a mixture of cover images, not the originating cover image of the stego image. It would be interesting to search for cover images that are similar to the stego image and use them to estimate the cover image.
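The key-reuse bound referred to in this discussion comes from the ideal-case analysis in Section 2, where the expected number of unlocatable bits after N images is $M/2^N$. A small Monte Carlo sketch can be used to check that expectation. It assumes known covers and that each payload pixel is flipped independently with probability 1/2 in every image; the function names are hypothetical and the simulation does not use real images.

```python
import random


def expected_located(m_bits, n_images):
    """Expected number of located payload bits: M * (1 - 2^-N)."""
    return m_bits * (1 - 0.5 ** n_images)


def simulate_located(m_bits, n_images, rng):
    """Count payload pixels flipped in at least one of N images.

    With the cover known, a payload pixel has a non-zero mean residual,
    and is therefore located, as soon as it is flipped in any image.
    """
    located = 0
    for _ in range(m_bits):
        if any(rng.random() < 0.5 for _ in range(n_images)):
            located += 1
    return located


rng = random.Random(0)
for n in (1, 2, 4, 8):
    print(n, simulate_located(2000, n, rng), expected_located(2000, n))
```

The simulated counts should track the expected curve closely, in the same way the experimental curve tracks the ideal one in Figure 1.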

REFERENCES

[1] Fridrich, J. and Goljan, M., “On estimation of secret message length in LSB steganography in spatial domain,” in [Security, Steganography, and Watermarking of Multimedia Contents VI], 5306, 23–34, SPIE (2004).
[2] Ker, A. D. and Böhme, R., “Revisiting weighted stego-image steganalysis,” in [Security, Forensics, Steganography and Watermarking of Multimedia Contents X], 6819, SPIE (2008).
[3] Fridrich, J., Goljan, M., and Soukal, D., “Searching for the stego-key,” in [Security, Steganography, and Watermarking of Multimedia Contents VI], 5306, 70–82, SPIE (2004).
[4] Ker, A. D., “Locating steganographic payload via WS residuals,” in [10th Multimedia and Security Workshop], 27–31, ACM (2008).
[5] Ker, A. D. and Lubenko, I., “Feature reduction and payload location with WAM steganalysis,” in [Media Forensics and Security XI], 7254, 0A01–0A13, SPIE (2009).
