
Universal Secure Network Coding by Non-linear Secret Key Agreement

Chung Chan

Abstract—A non-linear secure network coding scheme is proposed where the secret is hidden by a non-linear function against any linear wiretappers. It is shown to achieve stronger notions of universality and security than linear secure network coding.

I. INTRODUCTION

The problem of secure multicast over a wiretap network was first considered by Cai and Yeung [3]. The network is modeled by a directed graph with nodes representing users and edges representing channels, each of which supports a flow of one packet in the direction of the edge. A source node wants to convey some secret to a subset of sink nodes by network coding, i.e. by generating codewords at each node and sending them over the edges, but in the presence of a wiretapper who attempts to learn the secret by eavesdropping on a subset of the channels. A particular case of interest is that the wiretapper can eavesdrop on up to a given number of channels, called the wiretapping rate, with complete knowledge of the network coding scheme. The objective of secure network coding is to enable the users to communicate as much data as possible, not only reliably but also securely.

A linear secure network coding framework was proposed in [3], turning any given reliable linear network code into a secure one by an additional linear precoding step at the source node. The linear precoding was later understood in [4] as the coset coding for the wiretap channel II in [5]. Following this linear paradigm, other constructions have been proposed that not only make a reliable network code secure but also achieve additional desirable properties. [6] gave a construction based on the maximum-rank-distance code, making the linear precoding step universal to a class of linear network codes, so that it works even without complete knowledge of the network topology. [7] gave a construction that is strongly secure in the sense that different subsets of the secret elements are protected from wiretappers with different wiretapping rates simultaneously. A linear code is also given in [8] that is both universal and strongly secure. The security and universality of these schemes, however, rely on the ability to make the field size of the linear code large.
They fail under the more general network model in [9] where packets may be split or merged by some underlying network protocol. Random linear precoding was proposed as a solution, but it is under a stochastic wiretap model where the wiretapper cannot choose which channels to wiretap based on the random code.

In this work, we deviate from the previous linear approach and consider the possibility of improving the notions of universality and strong security by hiding the secret non-linearly. The wiretap network is given as a black box that allows the users to share a random string reliably but with arbitrary linear functions of the string revealed to different wiretappers. This universal framework covers all possible linear coding, splitting and merging of packets. Without changing or observing the inner details of the black box, the users want to choose a non-linear function of the shared random string as a secret key that cannot be well-approximated by the linear functions observed by the wiretappers. The secret key is used later for encryption over a separate public channel, and the security is characterized by a secrecy exponent that measures how fast the leaked information decays to zero simultaneously for different linear functions of the secret against linear wiretappers with different wiretapping rates.

In the sequel, we first review some linear secure network coding schemes in §II to explain the ideas of universality and strong security, and to point out the limitations of linear precoding. In §III, non-linear secret key agreement will be shown to achieve stronger notions of universality and security.

Chung Chan ([email protected], [email protected]) is with the Institute of Network Coding, the Chinese University of Hong Kong. This work is partially supported by a grant from the University Grants Committee of the Hong Kong Special Administrative Region, China (Project No. AoE/E-02/08). The manuscript is available online in [1] at http://goo.gl/8zQwY, with preliminary results published in [2].

II. LINEAR SECURE NETWORK CODING

A. Linear precoding

It is well known [10] that reliable multicast at the capacity of the network can be achieved by linear network coding. More precisely, a packet of length $m$ represents a symbol in the finite field $\mathbb{F}_{q^m}$ of order $q^m$, where $q$ is the size of some fixed alphabet for each element in the packet. The message is represented by a row vector $x \in \mathbb{F}_{q^m}^n$.
The source node and the intermediate nodes choose each outgoing packet as a linear combination of the symbols from the message and incoming packets respectively. The collection of all symbols transmitted over the channels in the network can be written as a vector $y = xC$ where $C \in \mathbb{F}_{q^m}^{n\times t}$ is an $n$-by-$t$ matrix called the global coding kernel and $t$ is the number of channels in the network. The symbols from the incoming edges of a node form a subvector $xA$ of $y$ where the columns of $A$ are the global coding vectors from the columns of $C$. If the set of global coding vectors in $A$ has rank $n$, then the node can recover $x$. The fundamental result for linear network coding is that the sets of global coding vectors for the sink nodes can have rank $n$ (i.e. $x$ is recoverable by the sink nodes) simultaneously if $m$ is sufficiently large and $n$ is the minimum size of an edge cut separating the source from one or more sink nodes. Linear coding is optimal in the sense that even a non-linear code cannot achieve a larger $n$, because the size of the edge cut upper bounds the number of packets that can be reliably communicated from the source node to the sink nodes separated from the source by the cut.

To make the reliable code secure, [3, 4] added a linear precoding step at the source node that combines the secret message with some junk data over an extension field $\mathbb{F}_{q^{mk}}$ for some positive integer $k$. Let $s \in \mathbb{F}_{q^{mk}}^{l}$ be the secret for $l \le n$ and $r \in \mathbb{F}_{q^{mk}}^{n-l}$ be some independent and uniformly random junk data. The source node computes the coded message

$$x = [s\ \ r]\,T = sT_1 + rT_2$$

using some invertible precoding matrix $T = \begin{bmatrix} T_1 \\ T_2 \end{bmatrix} \in \mathbb{F}_{q^{mk}}^{n\times n}$. The coded message can be multicast to the sink nodes using the reliable network code except that each packet now has length $mk$ instead of $m$ to represent a symbol in $\mathbb{F}_{q^{mk}}$, and the coding coefficients in the global coding kernel $C$ over $\mathbb{F}_{q^m}$ are now regarded as elements of $\mathbb{F}_{q^{mk}}$ to carry out the finite field arithmetic. The resulting network code remains reliable because the sink nodes can recover the coded message after observing $n$ linearly independent combinations of it, from which the secret can also be recovered by inverting $T$. The purpose of the additional transformation by $T$ is to mix the junk data with the secret well enough so that the wiretapped information is independent of the secret. Suppose the wiretapper eavesdrops on µ channels and therefore observes

$$w = xB = sT_1B + rT_2B$$

where the columns in $B \in \mathbb{F}_{q^m}^{n\times\mu}$ are linearly independent vectors from the columns of $C$. If $T_2B$ has rank $\mu$, then $rT_2B$ is uniformly distributed and so is $w$ regardless of the realization of $s$, implying that $w$ is independent of $s$ as desired. This follows the same idea as the one-time pad [11], where a uniformly random key is added to the secret so that the resulting cryptogram remains uniformly distributed independent of the secret. For the $(n-l)$-by-$\mu$ matrix $T_2B$ to have rank $\mu$, the number of rows must be at least $\mu$, i.e.

$$l + \mu \le n. \tag{1}$$
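The rank condition on $T_2B$ can be checked by brute force on a toy instance. The following is a minimal sketch, not part of the original paper: it assumes the small prime field GF(5) in place of the extension field, and the matrices `T`, `B_good` and `B_bad` are hypothetical values chosen only for illustration, with $n = 3$, $l = 1$, $\mu = 2$. It enumerates all junk vectors $r$ and compares the distribution of $w$ across secrets.

```python
from collections import Counter
from itertools import product

P = 5  # assumption: the small prime field GF(5) stands in for the extension field


def matmul(a, b, p=P):
    """Matrix product modulo p, with matrices given as lists of rows."""
    return [[sum(u * v for u, v in zip(row, col)) % p for col in zip(*b)]
            for row in a]


def wiretap_distribution(s, T, B, p=P):
    """Histogram of w = [s r] T B over all equally likely junk vectors r."""
    n, l = len(T), len(s)
    counts = Counter()
    for r in product(range(p), repeat=n - l):
        x = matmul([list(s) + list(r)], T, p)
        counts[tuple(matmul(x, B, p)[0])] += 1
    return counts


# n = 3, l = 1, mu = 2: the first row of T is T1, the last two rows are T2.
T = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 2]]                      # invertible mod 5 (determinant 3)
B_good = [[1, 0], [0, 1], [0, 0]]    # T2 B_good has rank 2
B_bad = [[3, 0], [4, 1], [1, 0]]     # first column lies in the null space of T2

for name, B in (("good", B_good), ("bad", B_bad)):
    reference = wiretap_distribution((0,), T, B)
    same = all(wiretap_distribution((s,), T, B) == reference for s in range(P))
    print(name, "B: distribution of w independent of s?", same)
```

Both choices of `B` have rank 2 and satisfy (1), yet only `B_good` keeps $w$ independent of $s$. This is why the precoding matrix has to be constructed against every admissible $B$, as the counting argument that follows does.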

Assume $l + \mu = n$ without loss of generality by regarding some junk data as part of the secret. To construct the desired $T$, find $Q \in \mathbb{F}_{q^{mk}}^{n\times(n-\mu)}$ such that $[B\ \ Q]$ is invertible. Then, make $T_2$ the parity-check matrix of $Q^\intercal$, i.e. $T_2$ has rank $\mu$ with $T_2Q = 0$, and choose $T_1$ such that $T$ is invertible. It follows that

$$T\,[B\ \ Q] = \begin{bmatrix} T_1B & T_1Q \\ T_2B & 0 \end{bmatrix}$$

has full rank and so $T_2B$ has rank $\mu$ as desired.

By the counting argument in [3], there exists a matrix $Q$ that works simultaneously for all possible $B$'s provided that $k$ is large enough. Consider the non-trivial case $\mu < n$ and suppose that the first $i \in \{0, \dots, n-\mu-1\}$ columns $Q_i \in \mathbb{F}_{q^{mk}}^{n\times i}$ of $Q$ can be successfully chosen such that $[B\ \ Q_i]$ has full rank $\mu + i$ for all possible $B$'s. Then, the $(i+1)$-th column can also be chosen if there is a vector not in the column space of $[B\ \ Q_i]$ for any $B$. The number of such vectors is at least $q^{mkn} - \binom{t}{\mu} q^{mk(\mu+i)}$ since $|\mathbb{F}_{q^{mk}}^n| = q^{mkn}$ and the column space of $[B\ \ Q_i]$ has size $q^{mk(\mu+i)}$ while there are at most $\binom{t}{\mu}$ distinct spaces for different choices of the $\mu$ columns of $B$ from the $t$ columns of $C$. Since $\mu + i < n$, the number is positive as desired if $q^{mk} > \binom{t}{\mu}$, which is possible with $k$ large enough. Thus, $l \le n - \mu$ packets can be multicast securely given wiretapping rate $\mu$.

B. Universality

It is possible to take the counting argument further. Provided that $k$ is large enough, e.g. with $k > n\mu$, there exists a matrix $Q$ that works simultaneously for all possible choices of $B$ as a matrix of rank $\mu$ from $\mathbb{F}_{q^m}^{n\times\mu}$, not necessarily a submatrix of a given global coding kernel $C$, i.e. not only can a linear precoding $T$ make a reliable network code $C$ secure, it can do so without complete knowledge of $C$. Such a secure network code is called universal. This property, first introduced in [6], is particularly useful for random network coding where the coding coefficients in $C$ are not fixed a priori but chosen randomly to adapt to an unknown network topology.

An explicit construction given by [6] for $l + \mu = n$ is to make $T_2$ the generator matrix of a maximum-rank-distance (MRD) code over $\mathbb{F}_{q^{mk}}$ with base field $\mathbb{F}_{q^m}$ and extension $k \ge n$. This choice has the desired property that the rank of $T_2B$ is $\mu$ for every matrix $B \in \mathbb{F}_{q^m}^{n\times\mu}$ of rank $\mu$. To explain this, consider some basis formed by the elements $\alpha_1, \dots, \alpha_k \in \mathbb{F}_{q^{mk}}$ linearly independent over the base field $\mathbb{F}_{q^m}$. Any element $x$ in the extension field $\mathbb{F}_{q^{mk}}$ can be expressed as $\sum_{i=1}^{k} x_i\alpha_i$ for some $x_i$'s in the base field. Similarly, a vector $x \in \mathbb{F}_{q^{mk}}^{n'}$ can be expressed as $\sum_{i=1}^{k} x_i\alpha_i$ with vectors $x_i \in \mathbb{F}_{q^m}^{n'}$. The rank metric of $x$ is defined as the maximum number of $x_i$'s linearly independent over $\mathbb{F}_{q^m}$, or equivalently, the rank of the matrix $\begin{bmatrix} x_1 \\ \vdots \\ x_k \end{bmatrix}$. Like the Hamming weight, the rank metric gives a distance measure for a code.
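To make the rank metric concrete, here is a small sketch under simplifying assumptions that are not in the paper: the base field is $\mathbb{F}_2$ (so $q = 2$, $m = 1$), each coordinate of $x \in \mathbb{F}_{2^k}^{n'}$ is stored as a $k$-bit integer whose bit $i$ is the coefficient of $\alpha_{i+1}$, and the helper names are hypothetical. The rank metric is then the GF(2) rank of the $k$-by-$n'$ bit matrix, which equals the GF(2) rank of the coordinates themselves.

```python
def gf2_rank(vectors):
    """Rank over GF(2) of integers viewed as bit vectors (Gaussian elimination)."""
    pivots = {}  # leading-bit position -> reduced vector with that leading bit
    for v in vectors:
        while v:
            high = v.bit_length() - 1
            if high not in pivots:
                pivots[high] = v
                break
            v ^= pivots[high]  # cancel the leading bit and keep reducing
    return len(pivots)


def expansion_rows(x, k):
    """Rows x_1, ..., x_k of the base-field expansion, each packed into an integer."""
    return [sum(((coord >> i) & 1) << j for j, coord in enumerate(x))
            for i in range(k)]


def rank_metric(x, k):
    """Rank of the k-by-n' binary matrix whose rows are x_1, ..., x_k."""
    return gf2_rank(expansion_rows(x, k))


x = [0b001, 0b010, 0b011]        # third coordinate is the sum of the first two
hamming_weight = sum(1 for coord in x if coord)
print(hamming_weight, rank_metric(x, 3))   # Hamming weight 3, rank metric 2
```

The example vector has three non-zero coordinates but only two of them are linearly independent over the base field, so the rank metric can be strictly smaller than the Hamming weight.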
An $[n', k', d]$-rank distance code over $\mathbb{F}_{q^{mk}}$ with base field $\mathbb{F}_{q^m}$ is a linear code with length $n'$, dimension $k'$ and the minimum rank metric of the difference between any two codewords at least $d$. For $k \ge n'$, the Singleton bound for the minimum distance is $d \le n' - k' + 1$. This can be achieved with equality by a code referred to as the MRD code for any $k' \le n' \le k$.

Suppose $T_2$ is the generator matrix of an $[l+\mu, \mu, l+1]$-MRD code. Then, $rT_2$ written as $\sum_{i=1}^{k} x_i\alpha_i$ is a non-zero codeword for any non-zero $r \in \mathbb{F}_{q^{mk}}^{\mu}$. In particular, the rank of the set of $x_i$'s is at least the minimum rank distance $l + 1$. Since the nullity of $B^\intercal$ is $n - \mu$, the number of $x_i$'s with $x_iB \ne 0$ is at least $(l+1) - (n-\mu) = 1$ by the assumption $l + \mu = n$. This implies that $rT_2B \ne 0$ for all non-zero $r$, i.e. the desired property that $T_2B$ has rank $\mu$. For the more general case (1), the desired property follows if $T_2$ contains a $\mu$-by-$n$ submatrix that generates the $[n, \mu, l+1]$-MRD code, i.e. $T_2$ spans an MRD code with dimension at least $\mu$. Thus, $l \le n - \mu$ packets can be multicast securely by linear precoding that does not depend on the coding coefficients of the given reliable network code.

C. Strong security

If $\mu > n - l$, violating (1), secure multicast may not be possible but it is still desirable to protect the secret as much as possible. In [7], a secure network code is called strongly secure if any subset of $\min\{n-\mu, l\}$ elements of the secret vector $s$ can be made independent of the wiretapped information from any $\mu \le n$ channels. An explicit construction of a universal strongly secure network code is given by [8] based on the MRD code, assuming that the secret is uniformly random. To explain the idea, let $t_i$ be the $i$-th row of $T$. For some $J \subseteq [l] := \{1, \dots, l\}$, express the wiretapped information as

$$w = \sum_{i\in J} s_i t_i B + \Bigg(\sum_{i\in[l]\setminus J} s_i t_i + \sum_{i=l+1}^{n} r_{i-l}\, t_i\Bigg) B$$

where $s_i$ and $r_i$ are the $i$-th elements of $s$ and $r$ respectively. Regarding $(s_i : i \in [l]\setminus J)$ as uniformly random junk data, it can be argued as before that $w$ is uniformly random independent of $(s_i : i \in J)$ for any $B$ of rank $\mu$ if $(t_i : i \in [n]\setminus J)$ spans an MRD code over $\mathbb{F}_{q^{mk}}$ with base field $\mathbb{F}_{q^m}$, dimension at least $\mu$, and length at least $l + \mu$. If this can be satisfied by an invertible matrix $T$ simultaneously for all possible $J$ and $\mu$ satisfying $|J| + \mu \le n$, then the code is strongly secure, reliable and universal. In particular, it suffices to have $(t_i : i \in [n]\setminus J)$ span an $[n, n-|J|, |J|+1]$-MRD code for all $J \subseteq [l]$ with $0 \le |J| \le l$.

The desired $T$ is obtained in [8] by removing the first $l$ columns of any generator matrix $G = [I_n\ \ P] \in \mathbb{F}_{q^{mk}}^{n\times(n+l)}$ of an $[n+l, n, l+1]$-MRD code with base field $\mathbb{F}_{q^m}$, where $G$ is in the standard form with the first $n$ columns being the identity matrix $I_n$, i.e.

$$T = \left[\begin{matrix} 0 \\ I_{n-l} \end{matrix}\ \ P\right].$$

Such a code exists for any $k \ge n + l$. By construction, the $i$-th row of $G$ is $g_i = [e_i\ \ t_i]$ where $e_i \in \mathbb{F}_{q^{mk}}^{l}$ is all-zero except for the $i$-th element (when $i \le l$). For some $J \subseteq [l]$ with $0 \le |J| \le l$, consider the subcode spanned by $(g_i : i \in [n]\setminus J)$. It has dimension $n - |J|$ and the rank distance is at least $l + 1$. Furthermore, for all $i \in J$, the $i$-th element of every codeword is zero and so the $i$-th column can be removed without diminishing the rank distance. Removing the remaining $l - |J|$ columns indexed by $[l]\setminus J$ reduces the minimum rank distance to at least $|J| + 1$. Indeed, this is the desired MRD code spanned by $(t_i : i \in [n]\setminus J)$. In particular, $T$ is invertible as it spans an $[n, n, 1]$-code.

D. Strong security with helper nodes

[12] considered a more general network with helper nodes. Although the strongly secure code in [12] is neither universal nor in the form of linear precoding, we will show below that the universal code in [8] can adapt directly to a universal model with helper nodes. Let $n := n_1 + n_2$ and $x = [x_1\ \ x_2] \in \mathbb{F}_{q^{mk}}^{n}$, where $x_1$ and $x_2$ are arbitrary messages that can be multicast reliably from a helper node and a source node respectively to the sink nodes. As before, a wiretapper with wiretapping rate $\mu \le n$ can observe $w = xB$ by choosing any matrix $B \in \mathbb{F}_{q^m}^{n\times\mu}$. Let $s \in \mathbb{F}_{q^{mk}}^{l}$ for some $l \le n_2$ be the uniformly random secret that the source node wants to multicast securely to the sink nodes. To do so, a helper node first generates and multicasts some uniformly random junk data $x_1 \in \mathbb{F}_{q^{mk}}^{n_1}$ to the source and sink nodes. Then, the source node generates some additional uniformly random junk data $r \in \mathbb{F}_{q^{mk}}^{n_2-l}$ independent of $(s, x_1)$ and mixes the secret linearly with the junk data into the

coded message $x_2 = [s\ \ x_1\ \ r]\,\tilde{T}$ for transmission by some precoding matrix $\tilde{T} \in \mathbb{F}_{q^{mk}}^{n\times n_2}$ to be decided. The entire message multicast to the source and sink nodes can be written as

$$x = [x_1\ \ x_2] = [s\ \ x_1\ \ r]\ \underbrace{\left[\begin{matrix} 0 \\ I_{n_1} \\ 0 \end{matrix}\ \ \tilde{T}\right]}_{=:\,T}$$

where the first and second $0$'s are the zero matrices of $l$ and $n_2 - l$ rows respectively. The last matrix $T$ is precisely in the form of the universal strongly secure precoding matrix given by [8] in §II-C for the model without the helper node but with $[x_1\ \ r]$ being the junk data generated by the source node. Set $k \ge l + n$ and choose $T$ as in §II-C, i.e. $\tilde{T} = \left[\begin{matrix} 0 \\ I_{n_2-l} \end{matrix}\ \ P\right]$. As argued before, this choice not only makes $T$ invertible, which allows the secret to be recovered from $x$ by the sink nodes, but it also satisfies the strong security in [12] that every subset of $\lambda \le l$ elements in $s$ is independent of every $\mu \le n - \lambda$ wiretapped packets $w = xB$ for any choice of $B \in \mathbb{F}_{q^m}^{n\times\mu}$. The key idea is that $T$ has the right form that allows $x_1$ to be "precoded" by a helper node who does not know $s$.

E. Random linear precoding

[9] considered a more general wiretap network where packets may be split or merged, which may be unavoidable due to some underlying network protocol. The proof of security so far relies on the assumption that each symbol in the extension field $\mathbb{F}_{q^{mk}}$ is wiretapped as a whole, but this may no longer hold when packets are split or merged over the network. To explain this, we need to describe how information is leaked through subsets of elements of a packet under linear network coding. Consider some basis formed by $\beta_1, \dots, \beta_m \in \mathbb{F}_{q^m}$ that are linearly independent over $\mathbb{F}_q$. A packet $[x^1 \dots x^m] \in \mathbb{F}_q^m$ of size $m$ can be taken to represent an element $x = \sum_{b=1}^{m} x^b\beta_b$ in $\mathbb{F}_{q^m}$. Suppose a node performs the linear coding $y = \sum_{i=1}^{n'} x_i c_i$ over $\mathbb{F}_{q^m}$ where each $x_i = \sum_{b=1}^{m} x_i^b\beta_b \in \mathbb{F}_{q^m}$ is formed by some elements $x_i^b$'s from the incoming packets or the message, $c_i = \sum_{b=1}^{m} c_i^b\beta_b \in \mathbb{F}_{q^m}$ are the coding coefficients, and $y = \sum_{b=1}^{m} y^b\beta_b \in \mathbb{F}_{q^m}$ gives the elements $y^b$'s to be sent in some outgoing packets. Splitting and merging of packets only affects how the elements are organized into packets but not the linear dependence of the $y^b$'s on the $x_i^b$'s shown below.

$$\sum_{b=1}^{m} y^b\beta_b = \sum_{i=1}^{n'} \sum_{b_1=1}^{m} x_i^{b_1}\beta_{b_1} \sum_{b_2=1}^{m} c_i^{b_2}\beta_{b_2} = \sum_{i=1}^{n'} \sum_{b_1=1}^{m} \sum_{b_2=1}^{m} x_i^{b_1} c_i^{b_2}\,\beta_{b_1}\beta_{b_2}.$$
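The linear dependence of the outgoing packet elements on the incoming ones can be checked on the smallest non-trivial case. The sketch below is an illustration under assumptions not taken from the paper: it uses $\mathbb{F}_4 = \mathbb{F}_2[\alpha]/(\alpha^2+\alpha+1)$ with basis $\beta_1 = 1$, $\beta_2 = \alpha$, stores an element as a 2-bit integer, and the function names are hypothetical. For a fixed coefficient $c$, multiplication by $c$ acts on the packet bits as a 2-by-2 binary matrix.

```python
def gf4_mul(a, b):
    """Multiply two elements of GF(4); bit 0 is the coefficient of 1, bit 1 of alpha."""
    prod = 0
    for i in range(2):
        if (b >> i) & 1:
            prod ^= a << i          # carry-less (polynomial) multiplication
    if prod & 0b100:
        prod ^= 0b111               # reduce using alpha^2 = alpha + 1
    return prod


def mult_matrix(c):
    """Binary matrix M_c whose row b1 holds the bits of c * beta_{b1}."""
    return [[(gf4_mul(c, 1 << b1) >> b) & 1 for b in range(2)] for b1 in range(2)]


def apply_bits(x, M):
    """Compute the bits of y from the bits of x as a row vector times M over F_2."""
    bits = [(x >> b1) & 1 for b1 in range(2)]
    y = 0
    for b in range(2):
        y |= (sum(bits[b1] * M[b1][b] for b1 in range(2)) % 2) << b
    return y


for c in range(4):
    M = mult_matrix(c)
    assert all(apply_bits(x, M) == gf4_mul(c, x) for x in range(4))
    print("c =", c, "acts on packet bits as", M)
```

Because each bit of $y$ is an $\mathbb{F}_2$-linear function of the bits of $x$, a wiretapper who sees only part of a split packet still observes a linear function of the message elements, which is the situation the text goes on to formalize.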

If we let $\chi_{b_1 b_2 b}$ be the indicator of $\beta_{b_1}\beta_{b_2} = \beta_b$, then $y^b = \sum_{i=1}^{n'} \sum_{b_1=1}^{m} \sum_{b_2=1}^{m} x_i^{b_1} c_i^{b_2} \chi_{b_1 b_2 b}$, which is linear in the $x_i^{b_1}$'s. Thus, if the possibly coded message $x \in \mathbb{F}_q^{nm}$ can be organized into packets and multicast reliably over the network this way, then the vector of all transmitted packet elements over all channels can be expressed as $y = xC$ for some global coding kernel $C \in \mathbb{F}_q^{nm\times tm}$. A wiretapper who can eavesdrop on $\mu$ channels may now observe $w = xB$ for some $B \in \mathbb{F}_q^{nm\times \mu m}$. Without any assumption on how data are arranged into packets, $B$ is completely arbitrary. This is different from the previous model where the elements of $x$ are from a larger field. If the secret

$s$ is now linearly precoded into $x = [s\ \ r]\,T$ as before for some precoding matrix $T \in \mathbb{F}_q^{nm\times nm}$ so that it can be linearly decoded as $s = xD$ for some decoding matrix $D \in \mathbb{F}_q^{nm\times lm}$, then $xB$ may potentially contain some elements of the secret if $B$ contains some columns from $D$. This is possible without any further restriction on the choice of $B$ and so the previous linear precoding approach is insecure.

One solution proposed in [9] is to make the wiretapper unable to select the coefficients of $B$ strategically based on the network code. This is done by random network coding, the coding coefficients of which are completely revealed only after the wiretapper chooses which channels to wiretap. More precisely, the wiretapper observes $w = xB$ and the random matrix $B$ of rank at most $\mu m$. The source node multicasts the coded message $x = [s\ \ r]\,T$ by a uniformly random invertible matrix $T$ of rank $nm$ independent of $B$, where $r \in \mathbb{F}_q^{(n-l)m}$ is uniformly random junk data and $s = [s_1 \dots s_l]$ is the uniformly random secret independent of $B$ and $T$ with the $i$-th component $s_i \in \mathbb{F}_q^m$. The wiretapper observes $B$ and $T$ eventually but cannot make $B$ depend on $T$ by selecting which channels to wiretap. It can then be argued that, for any subset $(s_i : i \in J \subseteq [l])$ of $|J| \le n - \mu$ secret components, the amount of information $I((s_i : i \in J) \wedge w, B, T)$ leaked to the wiretapper decays to zero exponentially in $m$ as $q^{-m(n-\mu)}$, where $I(\cdot \wedge \cdot)$ denotes the mutual information [10]. The argument is an extension of the privacy amplification theorem first introduced by [13, 14]. It provides strong security under the more general wiretap network at the expense of leaking a small amount of information to the wiretapper. It is not possible to find a particular realization of $T$ that works well for all possible distributions of $B$, fundamentally because linear precoding hides the secret as a linear function of the coded message, which can be leaked through some choice of $B$ as argued before.
Thus, randomization is essential unless some information or restriction on the distributions of $B$ is given. Since $T$ is required for decoding, it needs to be communicated to the sink nodes in the network, which imposes an overhead of about $(nm)^2$ packet elements since the number of $nm$-by-$nm$ invertible matrices is $\prod_{i=0}^{nm-1}(q^{nm} - q^{i}) \ge q^{(nm)^2 - nm}$. Since $m$ needs to be large for the amount of leaked secret information to be small, this $O(m^2)$ overhead can be quite significant. $m$ also needs to scale with the size of the network because there can be many wiretappers each observing $\mu$ channels independently, and so the randomly chosen $T$ needs to work well simultaneously for multiple independent $B$'s. To make this clear, consider the worst scenario where every possible matrix $B$ of rank $\mu$ is observed by at least one wiretapper. Then at least one wiretapper will have the right choice of $B$ to recover some secret elements. Thus, random linear precoding may fail in the presence of many wiretappers, in the same way that linear precoding fails against a wiretapper who can choose $B$ with complete knowledge of the code.

III. NON-LINEAR SECURE NETWORK CODING

As explained in the previous section, the weakness of linear precoding is that the secret is a linear function of the coded message and can therefore be decoded partially by a wiretapper who can observe an arbitrary linear function of the message up to any given positive rate. This motivates us to consider the alternative approach of hiding the secret by a non-linear function that cannot be well-approximated by linear functions. In particular, we will show that such a transformation exists with the desired security and universality in a stronger sense compared to linear precoding.

The wiretap network we consider is abstracted as a random vector $x \in \mathbb{F}_q^n$ that is chosen by some helper nodes, not necessarily the source node, and can be recovered by the source and sink nodes. The length $n$ can be increased by using the network for a longer time. $q \ge 2$ is the alphabet size of an element in a packet, which is the basic unit for data representation and cannot be altered for any purpose. Nothing else is assumed about the network except that a wiretapper can eavesdrop on any linear combinations of the elements of $x$ up to some wiretapping rate $\mu \le n$, i.e. the wiretapper observes

$$w = xB$$

for some matrix $B \in \mathbb{F}_q^{n\times\mu}$. As explained in §II-E, this covers any network with linear coding over any extension field of $\mathbb{F}_q$ as well as splitting and merging of packets.

Instead of generating $x$ from the secret, the users make $x$ uniformly random junk data and then extract from it a key $k = \theta(x)$ using some non-linear function $\theta : \mathbb{F}_q^n \to \mathbb{F}_q^l$. Then, the secret $s \in \mathbb{F}_q^l$ independent of $x$ is encrypted by the source node into the cryptogram $c$ using the one-time pad [11]

$$c = s + k$$

and communicated to the sink nodes over a separate authenticated public channel observable by everyone including the wiretapper. Compared to the direct approach [15] of transforming the secret non-linearly into $x$, this additional encryption step incurs an overhead that is linear in the size of the message. However, it not only saves us from the trouble of inverting a non-linear function, but it also permits secure communication even when only a helper node, not the source node, can multicast reliably over the wiretap network.

The coding scheme is reliable because every source or sink node observes $x$, from which it can extract the key $k$ and decrypt the secret $s$ by subtracting $k$ from the cryptogram $c$. If $k$ is uniformly distributed independent of the wiretapped information $w$, then $c$ will be uniformly distributed independent of $s$. We can measure how uniformly random $k$ is and therefore how secure $s$ is as follows. Define for every $\nu \in \{0, 1, \dots, n\}$,

$$\varsigma_n^\theta(\nu) := \max\,[\lambda - H(kA \mid xB = w)]. \tag{2}$$

All the logarithms are taken base $q$, including in the definition of the entropy $H(\cdot)$ [10]. $A \in \mathbb{F}_q^{l\times\lambda}$ and $B \in \mathbb{F}_q^{n\times\mu}$ are matrices of rank $\lambda \le l$ and $\mu \le n$ respectively. $w \in \mathbb{F}_q^{\mu}$ is a possible realization of the wiretapped information $xB$. The maximization is taken over all $\lambda, \mu, A, B, w$ satisfying

$$\lambda + \mu + \nu \le n. \tag{3}$$
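As a rough illustration of the quantity inside the maximization in (2), the sketch below evaluates it by brute force for $q = 2$ and $\lambda = \mu = l = 1$ (so $A$ is the scalar 1 and $B$ is a single non-zero column $b$). This is only a restricted version of (2), and the two key functions `linear_key` and `bent_key` are hypothetical examples chosen here, not constructions from the paper.

```python
import math
from itertools import product


def one_time_pad(s, x, theta):
    """c = s + k over F_2 with key k = theta(x); applying it again recovers s."""
    return s ^ theta(x)


def entropy_gap(theta, n, b, w):
    """1 - H(theta(x) | x.b = w) for x uniform over F_2^n (logarithms base q = 2)."""
    counts = [0, 0]
    for x in product((0, 1), repeat=n):
        if sum(xi * bi for xi, bi in zip(x, b)) % 2 == w:
            counts[theta(x)] += 1
    total = sum(counts)
    h = -sum(c / total * math.log2(c / total) for c in counts if c)
    return 1 - h


def restricted_index(theta, n):
    """Maximum gap over all non-zero b and both w, i.e. (2) with lambda = mu = 1."""
    return max(entropy_gap(theta, n, b, w)
               for b in product((0, 1), repeat=n) if any(b)
               for w in (0, 1))


def linear_key(x):
    return (x[0] + x[1]) % 2


def bent_key(x):
    return (x[0] * x[1] + x[2] * x[3]) % 2


print("linear key:", restricted_index(linear_key, 4))
print("non-linear key:", restricted_index(bent_key, 4))
```

The linear key is fully determined by one linear observation, so its gap reaches the maximum value 1, while the quadratic key stays strictly below 1 for every single linear observation. The one-time pad helper shows that decryption is the same operation as encryption once $x$ is known.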

Note that $H(kA \mid xB = w) \le \lambda$ with equality if and only if $kA$ is uniformly distributed given $xB = w$. Thus, $\varsigma_n^\theta(\nu) = 0$ implies that $kA$ is uniformly distributed independent of $xB$ for all possible choices of $A$ and $B$ with the gap between $n$ and the total rank $\lambda + \mu$ no smaller than $\nu$. The notion of security is stronger than [7] in the sense that $\varsigma_n^\theta$ not only measures the secrecy of subsets of the key elements but also their linear combinations. The maximization over all $B$ also gives a stronger notion of universality than [9]. The secrecy exponent $S : [0, 1] \to \mathbb{R}$ is said to be achievable if there is a sequence of $\theta$ in $n$ with

$$\liminf_{n\to\infty}\ \min_{\nu\in\{0,\dots,n\}} \left[-\frac{1}{n}\log \varsigma_n^\theta(\nu) - S\!\left(\frac{\nu}{n}\right)\right] \ge 0. \tag{4}$$

In other words, there exists $\delta_n \to 0$ such that

$$\varsigma_n^\theta(\nu) \le q^{-n(S(\nu/n) - \delta_n)} \tag{5}$$

for all non-negative gap $\nu$. Then, the secrecy of $s$ can be expressed in terms of an achievable $S$ as follows. For some choice of $A, B, w$ described above, denote the amount of information of $sA$ leaked to a wiretapper who observes $c$ and the event $xB = w$ by

$$\varsigma_n^s(A, B, w) := I(sA \wedge c \mid xB = w). \tag{6}$$

Let $E$ be some matrix such that $[A\ \ E]$ is invertible. Then,

$$\begin{aligned}
\varsigma_n^s(A, B, w) &\overset{(a)}{=} I(sA \wedge cA \mid xB = w) + I(sA \wedge cE \mid cA, xB = w) \\
&\overset{(b)}{=} H(cA \mid xB = w) - H(cA \mid sA, xB = w) + H(cE \mid cA, xB = w) - H(cE \mid sA, cA, xB = w) \\
&\overset{(c)}{\le} [\lambda - H(kA \mid xB = w)] + (l - \lambda) - H(sE \mid sA) \\
&\overset{(d)}{\le} q^{-n(S(\nu/n) - \delta_n)} + l - H(s).
\end{aligned}$$

(a) is from the chain rule after replacing $c$ by its bijection $c\,[A\ \ E]$. The mutual information terms are expanded in (b). (c) can be obtained by bounding each entropy term in (b) as follows. The first term is $H(cA \mid xB = w) \le \lambda$ because $cA$ has length $\lambda$. Similarly, the third term is $H(cE \mid cA, xB = w) \le l - \lambda$. The second term is $H(cA \mid sA, xB = w) = H(kA \mid sA, xB = w)$ because there is a bijection between $cA$ and $kA$ given the value of $sA$, since $cA = (s + k)A$. This entropy simplifies to $H(kA \mid xB = w)$ because $s$ is independent of $x$ and therefore of $k$, which is a function of $x$. The final entropy term is $H(cE \mid sA, cA, xB = w) = H(cE \mid sA, kA, xB = w)$ because $cA$ can be replaced by the bijection $kA$ given $sA$. The entropy is then lower bounded by $H(cE \mid sA, k, xB = w)$ because $kA$ is determined by $k$. This equals $H(sE \mid sA, k, xB = w)$ because there is a bijection between $cE$ and $sE$ given $k$, and it simplifies to $H(sE \mid sA)$ because $s$ is independent of $x$ and therefore of $k$. Finally, (d) is because the first bracketed term in (c) measures the uniformity of $k$, which is upper bounded by $q^{-n(S(\nu/n) - \delta_n)}$ for some $\delta_n \to 0$ by (5). The remaining term is obtained from $H(sE \mid sA) = H(s) - H(sA) \ge H(s) - \lambda$. Note that the bound in (d) depends on the choice of $A, B, w$ only through the gap $\nu$. It relates the amount of leaked information to the uniformity of the secret key and message.

More precisely, if $S(\epsilon) > 0$ for some $\epsilon > 0$, then, by making the gap $\nu \ge \epsilon n$, the first term in (d) decays to zero exponentially. If the secret is also nearly uniformly distributed with $l - H(s) \to 0$ as $n \to \infty$, then the amount of leaked information goes to zero, giving the desired strong security. In the special case when $A = I_l$ is the identity matrix, the bound can also be improved by dropping the terms involving $E$ in the derivation, i.e.

$$\varsigma_n^s(I_l, B, w) = I(s \wedge c \mid xB = w) \le q^{-n(S(\nu/n) - \delta_n)}.$$

In other words, the secret message need not be uniformly distributed if we just want to protect it from a wiretapper who observes $\mu \le n - l - \epsilon n$ messages. The following theorem shows that indeed $S(\epsilon) > 0$ for any $\epsilon > 0$, which establishes the desired notion of strong security.

Theorem 1 $S(\gamma) := \gamma$ is achievable.

PROOF Consider the non-trivial case $l \ge \lambda \ge 1$, $\mu \ge 1$, i.e. both $A$ and $B$ have at least one column. We prove the theorem using the method of types [16] and a random coding argument. Unlike the random linear precoding scheme in [9], the random coding argument here implies a deterministic choice of the key function $\theta$ and so random coding is not needed for the desired security. Let us first define some notation needed for the method of types. Let

$$L := \{x \in \mathbb{F}_q^n : xB = w\}$$

be the set of possible vectors $x$ that may give rise to the wiretapped information $w$ spanned by the columns of $B$. Then, the size of $L$ satisfies

$$q^{\lambda+\nu} \le |L| = q^{n-\mu} \le q^{n-1}. \tag{7}$$

The equality is because the nullity of $B^\intercal$ is $n - \mu$. The upper bound is because $\mu \ge 1$ while the lower bound follows from (3). We write $\phi \in T_Q$ if $\phi : L \to \mathbb{F}_q^\lambda$ and $Q : \mathbb{F}_q^\lambda \to \mathbb{Q}$ satisfy

$$Q(\alpha) = \frac{|\{x \in L : \phi(x) = \alpha\}|}{|L|} \quad \text{for } \alpha \in \mathbb{F}_q^\lambda.$$

$T_Q$ is called a type class and $Q$ is called a type. The set of possible types for $\phi$ is denoted by $\mathcal{P}$ with size

$$|\mathcal{P}| \le (|L| + 1)^{q^\lambda} \le (q^{n-1} + 1)^{q^\lambda} \le q^{nq^\lambda}. \tag{8}$$

The first inequality is because there are at most $|L| + 1$ possible choices for each $Q(\alpha)$, namely $\{\frac{0}{|L|}, \frac{1}{|L|}, \dots, \frac{|L|}{|L|}\}$. The second inequality is because of the upper bound in (7). The last inequality holds for $q \ge 2$.

For the random coding argument, pick $\theta$ completely randomly independent of everything else, with $\theta(x)$ uniformly distributed over $\mathbb{F}_q^l$ and independent over $x \in \mathbb{F}_q^n$. Define $\phi : L \to \mathbb{F}_q^\lambda$ by $\phi(x) := \theta(x)A$ for all $x \in L$. Then, by the uniformity and independence of the $\theta(x)$'s, we also have $\phi(x)$ uniformly distributed over $\mathbb{F}_q^\lambda$ and independent over $x \in L$. By the method of types [16],

$$\Pr\{\phi \in T_Q\} \le q^{-|L|\,D(Q\|q^{-\lambda})} \tag{9}$$

for any type $Q \in \mathcal{P}$, where $D(Q\|q^{-\lambda}) = \lambda - H(Q)$ is the information divergence [10] from $Q$ to the uniform distribution over $\mathbb{F}_q^\lambda$. Given $xB = w$, we have $x$ uniformly distributed over $L$. Thus, for all $\alpha \in \mathbb{F}_q^\lambda$ and any choice of $\phi \in T_Q$, $\Pr\{kA = \alpha \mid xB = w\} = Q(\alpha)$ and so we have $\lambda - H(kA \mid xB = w) = \lambda - H(Q) = D(Q\|q^{-\lambda})$ in the definition (2) of $\varsigma_n^\theta$. Now, regarding $\lambda - H(kA \mid xB = w)$ as a random variable with randomness due to the random choice of $\theta$, we have for any $\delta \ge 0$ that

$$\begin{aligned}
\Pr\{\lambda - H(kA \mid xB = w) \ge \delta\} &\overset{(e)}{=} \sum_{Q\in\mathcal{P}:\,D(Q\|q^{-\lambda})\ge\delta} \Pr\{\phi \in T_Q\} \\
&\overset{(f)}{\le} \sum_{Q\in\mathcal{P}:\,D(Q\|q^{-\lambda})\ge\delta} q^{-|L|\,D(Q\|q^{-\lambda})} \\
&\overset{(g)}{\le} |\mathcal{P}|\,q^{-|L|\delta} \\
&\overset{(h)}{\le} q^{nq^\lambda - q^{\lambda+\nu}\delta} = q^{q^\lambda(n - q^\nu\delta)}.
\end{aligned}$$

(e) is because $\lambda - H(kA \mid xB = w) = \lambda - H(Q) = D(Q\|q^{-\lambda})$ when $\phi \in T_Q$ as argued before. (f) is by (9). (g) is obtained by applying the upper bound $q^{-|L|\,D(Q\|q^{-\lambda})} \le q^{-|L|\delta}$ under the constraint $D(Q\|q^{-\lambda}) \ge \delta$. (h) is by (8) and the lower bound in (7).

To prove that $S(\gamma) = \gamma$ is achievable, we choose $\delta = q^{-n(S(\nu/n) - \delta_n)} = q^{-\nu + n\delta_n}$. Summing the bound in (h) over all possible choices of $\lambda, \mu, \nu, A, B, w$ satisfying (3), we have the union bound

$$\Pr\{\exists\nu,\ \varsigma_n^\theta(\nu) \ge q^{-n(S(\nu/n) - \delta_n)}\} \le \sum_{\lambda,\mu,\nu,A,B,w} q^{q^\lambda(n - q^\nu q^{-\nu+n\delta_n})} \le n^3 q^{2n^2+n} \max_\lambda q^{q^\lambda(n - q^{n\delta_n})}.$$

The last inequality is obtained by upper bounding the numbers of choices of $\lambda, \mu, \nu, A, B, w$ by $n, n, n, q^{n^2}, q^{n^2}, q^n$ respectively. If we can show that the upper bound is strictly smaller than 1 by some choice of $\delta_n \to 0$, then there exists a deterministic sequence of $\theta$ in $n$ satisfying (5), implying that $S$ is achievable. The desired $\delta_n$ can be constructed as follows. Make $(n - q^{n\delta_n}) < -q^{n\delta_n}/2$ by requiring $\delta_n > \frac{\log 2n}{n}$. This is possible with $\delta_n \to 0$. With such a restriction,

$$\max_\lambda q^{q^\lambda(n - q^{n\delta_n})} \le \max_\lambda q^{-q^\lambda q^{n\delta_n}/2} \le q^{-q^{n\delta_n}/2}.$$

The last inequality is obtained by setting $\lambda = 0$ since the expression in the maximization is decreasing in $\lambda$. The overall bound on the probability becomes

$$\Pr\{\exists\nu,\ \varsigma_n^\theta(\nu) \ge q^{-n(S(\nu/n) - \delta_n)}\} \le n^3 q^{2n^2 + n - q^{n\delta_n}/2}$$

which is smaller than 1 if $\delta_n > \frac{\log 2(2n^2 + n + 3\log n)}{n}$. This is again feasible with $\delta_n \to 0$, completing the proof. ∎

The following is a converse result on the achievable exponent obtained from the study of highly non-linear functions, called bent functions [17]. The result applies for the binary case $q = 2$ when the fractional gap approaches $\gamma = 1$. A complete converse is not known and so it remains open whether the exponent in Theorem 1 is the best achievable.

Theorem 2 $S(1) \le 1$ for all achievable $S$ when $q = 2$.

PROOF Fix $q = 2$, $\lambda = \mu = 1$ and rewrite $\theta(\cdot)A$ as the boolean function $\theta' : \mathbb{F}_2^n \to \mathbb{F}_2$ and $B$ as the boolean row vector $b \in \mathbb{F}_2^n$. A direct consequence of Parseval's equation [17] is that

$$\max_{b\in\mathbb{F}_2^n} \left|\sum_{x\in\mathbb{F}_2^n} (-1)^{\theta'(x) + xb^\intercal}\right| \ge 2^{n/2}, \quad \text{or equivalently} \quad \max_{b\in\mathbb{F}_2^n,\,c\in\mathbb{F}_2}\ \sum_{x\in\mathbb{F}_2^n} (-1)^{\theta'(x) + xb^\intercal + c} \ge 2^{n/2}. \tag{10}$$

The L.H.S. can be viewed as a measure of linearity of $\theta'$ and so the above bound asserts a limit on how non-linear a boolean function $\theta'$ can be. More precisely, let

$$N(b, c) := \{x \in \mathbb{F}_2^n : \theta'(x) = xb^\intercal + c\}.$$

Then, the summation in (10) can be rewritten as

$$\sum_{x\in\mathbb{F}_2^n} (-1)^{\theta'(x) + xb^\intercal + c} = \sum_{x\in N(b,c)} (-1)^0 + \sum_{x\in\mathbb{F}_2^n\setminus N(b,c)} (-1)^1 = 2|N(b, c)| - 2^n.$$

Consider the optimal $(b, c)$ to (10). It follows that $2|N(b, c)| - 2^n \ge 2^{n/2}$, or equivalently,

$$|N(b, c)| \ge 2^{n-1} + 2^{\frac{n-2}{2}}.$$

With $x$ generated uniformly at random over $\mathbb{F}_2^n$,

$$\Pr\{\theta'(x) = xb^\intercal + c\} = \frac{|N(b, c)|}{|\mathbb{F}_2^n|} \ge \frac{1}{2} + 2^{-\frac{n+2}{2}}. \tag{11}$$

The secrecy index $\varsigma$ can be bounded by

$$\begin{aligned}
\lambda - H(kA \mid xB) &= 1 - H(\theta'(x) \mid xb^\intercal + c) \\
&\overset{(a)}{\ge} 1 - h\!\left(\frac{1}{2} + 2^{-\frac{n+2}{2}}\right) \\
&\overset{(b)}{\ge} (2\log e)\left[2^{-\frac{n+2}{2}}\right]^2 = \frac{\log e}{2}\,2^{-n}
\end{aligned}$$

where $h(p) := p\log\frac{1}{p} + (1-p)\log\frac{1}{1-p}$ is the binary entropy function. This decays to zero with exponent 1 as desired. (a) is obtained by maximizing the conditional entropy $H(\theta'(x) \mid xb^\intercal + c)$ subject to the constraint (11). (b) follows from the inequality $h(\frac{1}{2} + \delta) \le 1 - (2\log e)\delta^2$ by Taylor's theorem. ∎

IV. CONCLUSION

We have described a non-linear approach to secure network coding that is universal to a very general class of linear networks involving splitting and merging of packets, and that protects not only subsets of the secret elements but also their linear functions. The idea is to use a given but possibly unknown reliable network code to generate some possibly insecure common randomness among the source and sink nodes over the wiretap network. Then, a non-linear secret key is extracted from the common randomness for further encryption of a secret message. An explicit construction may be found in further study of highly non-linear functions.
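As a small numerical illustration of the nonlinearity limit used in the proof of Theorem 2, the Python sketch below evaluates the left-hand side of the Parseval bound and the largest agreement count $|N(b,c)|$ by brute force for $n = 4$. The function names (`walsh_max`, `max_agreement`) and the two test functions are assumptions made for this illustration and are not taken from the paper; `bent4` is the standard quadratic bent function $x_1x_2 + x_3x_4$, and `linear4` is an arbitrary linear function included for contrast.

```python
from itertools import product


def dot(x, b):
    """Inner product of two bit tuples over F_2."""
    return sum(xi & bi for xi, bi in zip(x, b)) & 1


def walsh_max(theta, n):
    """max over b of |sum_x (-1)^(theta(x) + x.b)| for theta: {0,1}^n -> {0,1}."""
    best = 0
    for b in product((0, 1), repeat=n):
        total = sum((-1) ** (theta(x) ^ dot(x, b))
                    for x in product((0, 1), repeat=n))
        best = max(best, abs(total))
    return best


def max_agreement(theta, n):
    """max over (b, c) of |N(b, c)| = #{x : theta(x) = x.b + c}."""
    best = 0
    for b in product((0, 1), repeat=n):
        for c in (0, 1):
            count = sum(1 for x in product((0, 1), repeat=n)
                        if theta(x) == dot(x, b) ^ c)
            best = max(best, count)
    return best


def bent4(x):
    """Assumed example: quadratic bent function x1*x2 + x3*x4 on F_2^4."""
    return (x[0] & x[1]) ^ (x[2] & x[3])


def linear4(x):
    """Assumed example: a linear function, x1 + x3 on F_2^4."""
    return x[0] ^ x[2]


n = 4
for name, f in (("bent4", bent4), ("linear4", linear4)):
    print(name, walsh_max(f, n), max_agreement(f, n))
```

For `bent4` the maximum equals $2^{n/2} = 4$ and the best agreement count equals $2^{n-1} + 2^{(n-2)/2} = 10$, so both bounds in the proof hold with equality; for `linear4` both quantities reach $2^n = 16$, the opposite extreme. Brute force is only practical for small $n$.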


REFERENCES
[1] C. Chan, publications. http://chungc.net63.net/pub, http://goo.gl/4YZLT.
[2] C. Chan, “Universal secure network coding by non-linear secret key agreement,” in Network Coding (NetCod), 2012 International Symposium on, Jun. 2012, see [1].
[3] N. Cai and R. Yeung, “Secure network coding on a wiretap network,” Information Theory, IEEE Transactions on, vol. 57, no. 1, pp. 424–435, Jan. 2011.
[4] S. El Rouayheb, E. Soljanin, and A. Sprintson, “Secure network coding for wiretap networks of type II,” Information Theory, IEEE Transactions on, vol. 58, no. 3, pp. 1361–1371, Mar. 2012.
[5] L. H. Ozarow and A. D. Wyner, “Wire-tap channel II,” in EUROCRYPT’84, 1984, pp. 33–50.
[6] D. Silva and F. Kschischang, “Universal secure network coding via rank-metric codes,” Information Theory, IEEE Transactions on, vol. 57, no. 2, pp. 1124–1135, Feb. 2011.
[7] K. Harada and H. Yamamoto, “Strongly secure linear network coding,” IEICE Transactions on Fundamentals, vol. 91-A, no. 10, pp. 2720–2728, Oct. 2008.
[8] J. Kurihara, T. Uyematsu, and R. Matsumoto, “Explicit construction of universal strongly secure network coding via MRD codes,” in 2012 IEEE International Symposium on Information Theory Proceedings (ISIT2012), Cambridge, MA, Jul. 2012.
[9] R. Matsumoto and M. Hayashi, “Secure multiplex network coding,” in 2011 International Symposium on Network Coding (NetCod), Jul. 2011, pp. 1–6.
[10] R. W. Yeung, Information Theory and Network Coding. Springer, 2008.
[11] C. E. Shannon, “Communication theory of secrecy systems,” Bell System Technical Journal, vol. 28, no. 4, pp. 656–715, 1949.
[12] H. Yamamoto, K. Harada, and T. Kubo, “Secure network coding with helper nodes generating random numbers,” submitted to ISIT 2012.
[13] C. H. Bennett, G. Brassard, and J.-M. Robert, “Privacy amplification by public discussion,” SIAM Journal on Computing, vol. 17, no. 2, pp. 210–229, Apr. 1988.
[14] C. H. Bennett, G. Brassard, C. Crépeau, and U. M. Maurer, “Generalized privacy amplification,” IEEE Transactions on Information Theory, vol. 41, no. 6, pp. 1915–1923, Nov. 1995.
[15] C. Chan, “Universal secure network coding by non-linear precoding,” see [1].
[16] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems. Akadémiai Kiadó, Budapest, 1981.
[17] O. Rothaus, “On bent functions,” Journal of Combinatorial Theory, Series A, vol. 20, no. 3, pp. 300–305, May 1976.