Compressing Encrypted Data
ECE 559RB Cryptography
Siva Theja Maguluri

Outline
• Introduction
• Distributed Source Coding
 Lossless Compression – Slepian-Wolf
 Compression with Fidelity Criterion – Wyner-Ziv
• Information Theoretic Security
• Compression of Encrypted Data
• Computer Simulations
 Lossless compression of binary data
 Lossy compression of real-valued data
• Conclusions


Introduction
• Goal: transmit redundant data over an insecure, bandwidth-constrained channel
• Idea: reverse the usual order of encryption and compression, i.e., compress after encrypting


Introduction
• The compressor does not have access to the key
• At first glance it appears that little gain is possible, because encrypted data looks quite random
• But decompression and decryption are performed jointly, so the decrypter has access to the key
• It turns out that significant compression gains can be obtained, using ideas from distributed source coding theory
• In some cases, the gains are the same as in the traditional order, compression followed by encryption
• Application: a scenario where data is being distributed over a network


Distributed Source Coding: Lossless
• Goal: compress sources Y and K that are correlated but cannot communicate with each other
• Lossless case: discrete sources
• Special case: K is available at the decoder and is correlated with Y
• Slepian-Wolf result: the required rate is H(Y|K) in both cases
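For reference, a minimal LaTeX statement of the Slepian-Wolf bounds in their standard textbook form (the rate symbols R_Y, R_K are ours, not the slides'):

```latex
% Slepian-Wolf (1973): with side information K at the decoder only,
% lossless coding of Y is possible at any rate
R \ge H(Y \mid K)
% and for separate encoding of the correlated pair (Y, K):
R_Y \ge H(Y \mid K), \quad R_K \ge H(K \mid Y), \quad R_Y + R_K \ge H(Y, K)
```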


Lossless Source Coding Example
• K known both at the encoder and the decoder
• Y and K uniformly distributed binary strings of length 3
• Y and K differ in at most one position, i.e., Hamming distance at most 1
• Encoder transmits the index of the error e = Y ⊕ K ∈ {000, 001, 010, 100}: 2 bits
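A minimal Python sketch of this case, assuming the 3-bit words are held as integers; encoding simply indexes the four possible error patterns:

```python
# K known at both ends: send only the index of e = Y xor K (2 bits),
# since Y and K are within Hamming distance 1 of each other.
ERROR_PATTERNS = [0b000, 0b001, 0b010, 0b100]

def encode(y: int, k: int) -> int:
    """Return the 2-bit index of the error pattern e = y ^ k."""
    return ERROR_PATTERNS.index(y ^ k)

def decode(index: int, k: int) -> int:
    """Recover y from the transmitted index and the side information k."""
    return k ^ ERROR_PATTERNS[index]

assert decode(encode(0b101, 0b100), 0b100) == 0b101
```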


Example Continued…
• K known only at the decoder
• The encoder can't compute e, but that is not necessary
• Do not differentiate between words such as 000 and 111
• The cosets of the length-3 repetition code cover the entire space
• Use the index of the coset as the encoding: 2 bits again


Example Continued…
• Suppose X is a random variable taking values in {000, 001, 010, 100}, K is a one-time pad, and Y = X ⊕ K
• Then the Hamming distance between Y and K is at most 1
• This construction compresses Y to 2 bits; since the decoder has access to K, it can decode Y and recover X = Y ⊕ K
• In the general case, partition the space into cosets associated with the syndromes of the underlying channel code (the repetition code here)
• Encoding: compute the syndrome with respect to the channel code
• Channel code: chosen according to the correlation between Y and K
• Decoding: identify the codeword closest to K in the coset corresponding to the transmitted syndrome
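The following Python sketch puts the whole construction together for the repetition-code example; the specific values of x and k are illustrative only:

```python
# K known only at the decoder: the length-3 repetition code {000, 111}
# partitions {0,1}^3 into 4 cosets, and the encoder sends only the
# 2-bit coset index (the syndrome).
REP_CODE = [0b000, 0b111]

coset_of = {}                         # 3-bit word -> 2-bit coset label
for label, e in enumerate([0b000, 0b001, 0b010, 0b100]):
    for c in REP_CODE:
        coset_of[c ^ e] = label

def encode(y: int) -> int:
    """Compress Y to its coset index; the encoder never sees K."""
    return coset_of[y]

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

def decode(label: int, k: int) -> int:
    """Pick the member of the transmitted coset closest to K."""
    coset = [y for y in range(8) if coset_of[y] == label]
    return min(coset, key=lambda y: hamming(y, k))

# One-time-pad scenario from the slides: Y = X xor K.
x, k = 0b010, 0b110                   # illustrative values
y = x ^ k                             # encrypted word
assert decode(encode(y), k) == y      # decoder recovers Y ...
assert decode(encode(y), k) ^ k == x  # ... and decrypts to X
```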


Distributed Source Coding: Lossy
• Wyner-Ziv extends Slepian-Wolf to lossy coding with a distortion measure
• Applies to discrete or continuous alphabets
• We focus on the real line with mean squared error
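For reference, the Wyner-Ziv rate-distortion function in its standard form from the literature (the auxiliary variable U and reconstruction function f do not appear on the slides):

```latex
% Wyner-Ziv rate-distortion function: U is an auxiliary variable,
% f a reconstruction function, d the distortion measure.
R_{WZ}(D) = \min_{\substack{p(u \mid y),\, f:\\ \mathbb{E}\, d(Y, f(U,K)) \le D}}
            \big[\, I(Y;U) - I(K;U) \,\big]
```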


Compression with Fidelity Criterion – Example
• Y uniformly distributed on [−9δ/2, 9δ/2]
• Side information K such that |Y − K| < δ
• Encoder quantizes Y to Y′ with step size δ, so |Y − Y′| ≤ δ/2
• This can be viewed as three interleaved quantizers (cosets) with step 3δ
• Encoder transmits the label of the coset: log₂3 bits
• By the triangle inequality, |Y′ − K| ≤ |Y′ − Y| + |Y − K| < δ/2 + δ = 3δ/2, so Y′ is the unique coset member within 3δ/2 of K


Example Continued…
• The decoder finds the reconstruction level closest to K with the same label and thus decodes Y
• log₂3 bits for reconstruction to within δ/2; in the absence of K it would have been log₂9 bits
• Performance can be improved using more complex alternatives
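A hedged numerical sketch of this scalar example in Python; the step size and random draws are illustrative, and a real system would use more structured codes:

```python
# Y uniform on [-9d/2, 9d/2], side information K with |Y - K| < d.
# Nine quantizer levels split into three interleaved cosets; only the
# coset label (log2(3) bits) is sent.
import random

d = 1.0
LEVELS = [(i - 4) * d for i in range(9)]      # step d, centered at 0

def encode(y: float) -> int:
    """Quantize Y to the nearest level, send that level's coset label."""
    idx = min(range(9), key=lambda i: abs(LEVELS[i] - y))
    return idx % 3

def decode(label: int, k: float) -> float:
    """Reconstruct with the level closest to K among those sharing the label."""
    coset = [LEVELS[i] for i in range(9) if i % 3 == label]
    return min(coset, key=lambda lv: abs(lv - k))

y = random.uniform(-4.5 * d, 4.5 * d)
k = y + random.uniform(-d, d)                 # |Y - K| < d
assert abs(decode(encode(y), k) - y) <= d / 2 + 1e-9
```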


Information Theoretic Security
• General secret-key cryptosystem
• WLOG a discrete i.i.d. source
• Block length n
• Key independent of the source, uniformly distributed
• Noiseless, insecure public channel
• Rate R, in bits per source symbol


Performance Measures
• Measure of secrecy against an eavesdropper who observes the public message B:
 Shannon-sense perfect secrecy: I(X; B) = 0
 Wyner-sense perfect secrecy: (1/n)·I(X; B) → 0 as n → ∞
 Maurer-sense perfect secrecy: I(X; B) → 0 as n → ∞
• Measure of fidelity of the decoder, i.e., expected distortion
• Number of bits per source symbol, R
• Number of bits per source symbol of the secret key (cardinality of the key space)


Tradeoff between Performance Parameters
• RX(D) is the rate-distortion function of the source X
• The Shannon cryptosystem achieves these bounds
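For completeness, the rate-distortion function referenced above, in its standard form (the reconstruction X̂ and distortion measure d are generic notation, not from the slides):

```latex
% Rate-distortion function of the source X under distortion measure d:
R_X(D) = \min_{p(\hat{x} \mid x)\,:\;\mathbb{E}\, d(X, \hat{X}) \le D} I(X; \hat{X})
```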


Compression of Encrypted Data
• Define a generalized XOR on an arbitrary alphabet, requiring commutativity and cancellation:
 x ⊕ y = y ⊕ x
 x ⊕ z = y ⊕ z ⇒ x = y

• Reversed cryptosystem: encryption precedes compression (one admissible ⊕ is sketched below)
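One concrete instance satisfying these two properties is addition modulo the alphabet size; a small Python sketch (the alphabet size Q = 26 is an arbitrary illustration):

```python
# Generalized XOR as modular addition: commutative and cancellative,
# so it supports a one-time pad on any finite alphabet.
Q = 26  # alphabet size (illustrative)

def gxor(x: int, y: int) -> int:
    return (x + y) % Q

x, k = 17, 9
b = gxor(x, k)                      # encrypt with key k
assert gxor(b, (-k) % Q) == x       # cancellation lets the key be undone
assert gxor(x, k) == gxor(k, x)     # commutativity
```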


Performance Limit

• It can also be shown that this is the best possible performance for a system having this kind of structure


Performance Limit
• For finite alphabets, the stronger notion of Shannon-sense perfect secrecy can be guaranteed by sacrificing key efficiency (R′), letting K be distributed uniformly over the alphabet of X
• How much compression can be achieved if the encryption scheme is pre-specified?
• When the source must be reproduced losslessly at the decoder, the Slepian-Wolf theorem shows one can compress down to the entropy rate of the unencrypted source, without compromising security


Special Cases


Simulations – Lossless Compression of Binary Data
• Binary source with empirical entropy 0.37 bits per pixel
• Encrypted using a pseudorandom Bernoulli(1/2) string
• The encrypted data has an empirical entropy of 1 bit/pixel
• Incompressible without side information
• Compressed by computing the syndrome with respect to a rate-1/2 LDPC code (a sketch of the syndrome step follows)
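A hedged sketch of the syndrome step in Python; the dense random H below is a small stand-in for a real LDPC parity-check matrix, and all sizes are illustrative:

```python
# Compress n encrypted bits to n/2 syndrome bits (the rate-1/2 step).
import numpy as np

rng = np.random.default_rng(0)
n = 20
H = rng.integers(0, 2, size=(n // 2, n))          # toy stand-in, not an LDPC

x = (rng.random(n) < 0.07).astype(np.uint8)       # redundant source bits
key = rng.integers(0, 2, size=n, dtype=np.uint8)  # Bernoulli(1/2) pad
y = x ^ key                                       # encrypted bits look uniform

syndrome = (H @ y) % 2                            # n/2 bits sent to the decoder
print(f"sent {syndrome.size} bits for {n} encrypted bits")
```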


Simulations

• Modify the iterative decoding algorithm
• At the check nodes, take the syndrome into account
• Initialize with knowledge of the key and its correlation to the encrypted string
• Decryption is trivial after decoding
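A hedged sketch of these two modifications in log-likelihood-ratio form; p is an assumed crossover probability between the key stream and the encrypted bits, and the full message-passing schedule is omitted:

```python
import numpy as np

def init_llr(key_bits: np.ndarray, p: float) -> np.ndarray:
    """Variable-node initialization: LLR of each encrypted bit given the
    corresponding key bit, using P(y != k) = p."""
    sign = 1.0 - 2.0 * key_bits                   # +1 for key bit 0, -1 for 1
    return sign * np.log((1 - p) / p)

def check_message(incoming: np.ndarray, syndrome_bit: int) -> float:
    """Check-node update (tanh rule); the syndrome bit enters as a sign flip."""
    msg = 2.0 * np.arctanh(np.prod(np.tanh(incoming / 2.0)))
    return -msg if syndrome_bit else msg
```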


Simulation – Lossy Compression of Real-Valued Data
• i.i.d. Gaussian sequence with unit variance
• Encrypted using a stream cipher
• Key: i.i.d. Gaussian, independent of the data
• Each sample is quantized and the levels are labeled with 4 labels, giving a binary sequence of twice the original length
• Compressed by computing the syndrome with respect to a rate-1/2 trellis code: effectively 1 bit/sample
• The decoder finds the real-valued sequence closest to the key that gives the same syndrome
• It then combines this sequence with the key sequence to obtain an optimal estimate of the encrypted data
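A hedged Python sketch of the front end only: quantize each encrypted sample and assign one of 4 cyclic coset labels, i.e., 2 bits per sample. The trellis-code syndrome and the final estimation step are omitted, and all sizes and parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, step, n_levels = 8, 0.5, 16
x = rng.standard_normal(n)               # unit-variance Gaussian source
key = rng.standard_normal(n)             # i.i.d. Gaussian key stream
y = x + key                              # stream-cipher-style encryption

idx = np.clip(np.round(y / step), -n_levels // 2, n_levels // 2 - 1).astype(int)
labels = idx % 4                         # 4 interleaved cosets: 2 bits/sample
print(labels)                            # twice the length of x, in bits
```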


Conclusions
• We have seen the possibility of compressing encrypted data without knowledge of the key
• The approach is inspired by distributed source coding principles
• In some cases, the encrypted data can be compressed to the same extent as the original unencrypted data


References
• K. Ramchandran, V. Prabhakaran et al., "On Compressing Encrypted Data," IEEE Trans. Signal Processing, vol. 52, no. 10, pp. 2992–3006, Oct. 2004.
• S. S. Pradhan and K. Ramchandran, "Distributed source coding using syndromes (DISCUS): Design and construction," IEEE Trans. Inform. Theory, vol. 49, pp. 626–643, Mar. 2003.
• D. Slepian and J. K. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inform. Theory, vol. IT-19, pp. 471–480, July 1973.
• A. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inform. Theory, vol. IT-22, pp. 1–10, Jan. 1976.


Questions?


Linear Error Correcting Codes
• A code C is a linear subspace of a vector space over a finite field
• G, the generator matrix of the code, [I_k | A], of size (k, n)
• Maps any length-k vector x to the length-n vector xᵀG in C
• H, the parity-check matrix, [−Aᵀ | I_{n−k}]
• Hx = 0 for x in C
• If z = x + e, then Hz = He, called the syndrome of z
• Minimum Hamming distance d between codewords
• Syndrome decoding of linear codes is efficient
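A hedged Python illustration with a systematic (7,4) Hamming code; the particular A below is one valid choice, and over GF(2) the slide's −Aᵀ equals Aᵀ:

```python
import numpy as np

A = np.array([[1, 1, 0], [0, 1, 1], [1, 1, 1], [1, 0, 1]])
G = np.hstack([np.eye(4, dtype=int), A])       # generator [I_k | A], 4 x 7
H = np.hstack([A.T, np.eye(3, dtype=int)])     # parity check [A^T | I_{n-k}]

msg = np.array([1, 0, 1, 1])
c = (msg @ G) % 2
assert np.all((H @ c) % 2 == 0)                # codewords have zero syndrome

e = np.array([0, 0, 1, 0, 0, 0, 0])            # single-bit error
z = (c + e) % 2
assert np.array_equal((H @ z) % 2, (H @ e) % 2)  # Hz = He: the syndrome
```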

