KAPDA: k-Anonymous Privacy-preserving Data ...


KAPDA: k-Anonymous Privacy-preserving Data Aggregation in Wireless Sensor Networks

Michael M. Groat, Wenbo He, Stephanie Forrest
Department of Computer Science, University of New Mexico
MSC01 1130, 1 University of New Mexico, Albuquerque, NM 87131-0001
{mgroat, wenbohe, forrest}@cs.unm.edu

ABSTRACT

When wireless sensor networks accumulate sensitive and confidential measurements about human beings, preserving data privacy becomes an increasingly important concern. Since sensors are usually resource-limited and power-constrained, providing privacy without disrupting in-network data aggregation poses a challenge. Privacy-preserving data aggregation for additive and multiplicative aggregation functions has been well studied. However, non-linear aggregation functions such as MAX and MIN have not been well addressed. We present KAPDA, a privacy-preserving method for more general aggregation functions which hides sensitive measurements in plain sight among a set of camouflage values, enabling k-anonymity for data aggregation. The proposed method can be used with a wide range of aggregation functions in addition to MAX and MIN. Because the sensitive data is hidden in plain sight, data aggregation is easily and efficiently computed, and the in-network processing delay can be reduced compared to hop-by-hop encryption methods. We quantify the power efficiency of the proposed method in terms of the amount of camouflage data used and study the trade-offs between the protocol's effectiveness and its resilience against collusion.

Categories and Subject Descriptors
D.2.11 [Software Engineering]: Software Architectures - information hiding; C.2.m [Computer-Communication Networks]: Miscellaneous

General Terms
Algorithms, Security

Keywords
Wireless Sensor Networks, Privacy Preserving Data Aggregation, k-Anonymity

Submitted to the ACM Computer and Communications Security Conference, October 2010, Chicago, IL, USA.

1. INTRODUCTION

Wireless sensor networks (WSNs) are increasingly employed to collect sensitive and confidential measurements of people's everyday lives [23, 31]. Without privacy preservation techniques, other parties, or peers, can easily infer our daily activities and obtain private information. While data passes through a network, data aggregation can take advantage of the limited computational power inside the network to reduce overall bandwidth. Instead of every node sending its sensed data to the base station, information is combined along the way to conserve bandwidth. Data aggregation is more challenging when privacy and security are added to the scenario, where information may be disclosed to both outside observers and neighboring network users.

Due to limited resources such as memory, power, and computation, mainstream solutions for security and privacy such as public-key (asymmetric) encryption do not work very well in this domain. For example, previous solutions to Yao's Millionaire Problem [46, 12], where two people want to know who is richer without revealing their true wealth, leverage public-key cryptography. These solutions are computationally expensive, and therefore problematic in resource-constrained WSNs.

Privacy-preserving data aggregation (PDA) for additive aggregation, based on the algebraic properties of polynomials and addition [24], homomorphic encryption [20, 7], and perturbation techniques [30, 27], has been well addressed in WSNs. However, research efforts on PDA for more general aggregation functions, such as MAX and MIN, have been limited. The challenge arises from the non-linear characteristics of these functions. The objective of this paper is to present KAPDA, a lightweight k-anonymous solution to PDA for general aggregation functions. We focus especially on the MAX and MIN functions because they are important for many applications that report boundary values.

Current encryption-based schemes, either end-to-end or hop-by-hop, are not sufficient for the general data aggregation problem. End-to-end encryption schemes establish a secure channel between a sensor node and the base station, preventing in-network aggregation. With hop-by-hop encryption, a node receives encrypted data, decrypts it, aggregates it, re-encrypts the aggregate and then sends it to the next hop. This incurs a high overhead and does not preserve data privacy at the aggregators.

We propose a non-cryptographic method which hides the sensitive data in plain sight by adding a set of camouflage values. In our method the aggregates, which we call the real values, are left in plain text so the aggregation computation can take place efficiently. Privacy comes from hiding these real values among several other camouflage values in a message set. We define a message set for MIN/MAX aggregation as the union of the single real value with the camouflage values that a node transmits to its parent in the data aggregation tree, usually in one packet. This is a form of k-anonymity [43, 2, 5] where a single value is hidden among k − 1 other values. We show that this method consumes considerably less power than end-to-end encryption, and is more power and time efficient than hop-by-hop encryption. Our method is similar to a global symmetric key solution [32], but it differs in that each node has a random part of the global key. Only when enough nodes collude or are captured by adversaries will privacy be broken.

Several applications could take advantage of KAPDA. Intelligent or smart meters for electrical utilities are becoming more popular, as the United States and other countries try to conserve energy. These devices send usage information to the power company, which then sends real-time data back to the end user, with the idea that an informed user is a better conserver. Information from the meter is usually sent over an existing cell phone infrastructure, radio transmission, or other unsecured network. These public, third-party, unprotected networks need to protect the privacy of end users and their information, such as the maximum or minimum of certain appliances or users in a neighborhood [23].

In medicine, another potential application area is where a medical worker may not have the time or resources to monitor a large group of patients. Determining the maximum or minimum value of an indicator could show that the entire group is within normal range, or that further investigation is warranted. A similar idea could be used to triage patients at a disaster site [31].

The remainder of the paper is organized as follows: Section 2 gives the background and model assumptions of our research. Section 3 gives an overview of the protocol, while Section 4 discusses the protocol details. We show in Section 5 how much power is preserved when sending small amounts of extra bits for camouflage values. We also study the trade-off between protocol efficiency and performance on privacy preservation. In Section 6 we discuss related work in the privacy-preserving data aggregation domain. Finally, Section 7 outlines ideas for future work and concludes the paper.

2. MODEL AND ASSUMPTIONS

2.1 Data Aggregation in WSNs
We model a sensor network as a connected graph $G(V, E)$, where sensor nodes are represented as the set of vertices $V$ and wireless links as the set of edges $E$. The number of sensor nodes is defined as $|V| = N$. A data aggregation function is defined as $y(t) \triangleq f(d_1(t), d_2(t), \cdots, d_N(t))$, where $d_i(t)$ is the individual sensor reading at time $t$ for node $i$. Aggregation from the individual nodes to the base station is assumed to follow a common tree-structured route [37]. We focus in this paper on $f$ as the MAX or MIN function, but the method can be applied to more general aggregation functions.

2.2 Honest But Curious Attack Model
We use the honest but curious model to define our privacy-threat attack, where every user or sensor node attempts to break privacy but faithfully follows the protocol specification during data aggregation [21, 35]. Sensors deployed by a common authority can collaborate to fulfill a certain task and interact with each other trustfully. Hence, the honest but curious model is appropriate for studying PDA in WSNs.

There are two levels of privacy preservation. The first level is to ensure that data are not revealed to nodes outside of the network. Hop-by-hop encryption with symmetric keys [6, 8, 16, 36] is able to achieve this goal. Another PDA solution, called Order Preserving Encryption Scheme (OPES) [3], allows comparison operations directly on encrypted data. Such a scheme preserves the first level of privacy. However, if the in-network sensor nodes use the same set of mapping functions, they are able to learn each other's measurements. OPES is explained further in Section 6.

The second level of privacy preservation ensures that individual private information is not disclosed to in-network nodes. This level of privacy preservation is stricter, but closer to real-world situations. Our method aims to preserve privacy under the honest but curious attack model so that even in-network nodes cannot easily determine other nodes' sensitive data.

2.3 Requirements of PDA
The following criteria summarize the desired characteristics of a PDA scheme. Depending on the application, trade-offs should be made among these performance metrics.

1. Privacy: To preserve data privacy, the real data $d_i$ of node $i$ should not be known to any other node $j$ in the network. For a PDA scheme that provides k-anonymity, the private data is hidden among k − 1 items of camouflage data, so other users cannot determine which is the private value. PDA schemes should also be robust to collusion among several nodes, at least to some extent.

2. Efficiency: Data aggregation is able to reduce the number of messages transmitted within the sensor network, thus reducing bandwidth and power usage. However, additional overhead is introduced to protect privacy. If the energy cost of a PDA scheme cancels out the benefit of data aggregation, then it accomplishes nothing. A good PDA scheme should have low overhead. Hence, bandwidth, power consumption, and delay are three important metrics of protocol efficiency.

3. Accuracy: In perturbation-based techniques [30, 27], accuracy is sacrificed to achieve privacy. However, the accuracy of the aggregated data may affect decision making. Hence, accuracy is an important element in assessing the performance of PDA schemes.

3. OVERVIEW OF SOLUTION
As mentioned earlier, we propose to hide the individual node inputs among a set of camouflage values. First, we discuss the general terminology, which is summarized in Table 1. Let $U^i$ be the set of $n$ values in the message set for node $i$, where $|U^i| = n$, $\forall i$. The message set is composed of the real or actual value, $d_i$, and the restricted and unrestricted camouflage values. Restricted camouflage values must all be less than or all greater than $d_i$, depending on whether MAX or MIN aggregation is used, respectively. Unrestricted camouflage values can be either greater or less than $d_i$. Let $I = \{1, 2, ..., n\}$ be
the index set of $U^i$, $\forall i$. Let $I_S$, a subset of $I$, be the set of secret index values kept at the base station to determine the final aggregated results, and let $\bar{I}_S$ be its complement in $I$. $I_S$ can be regarded as global secret information, which is partially shared among the network nodes: $I_S^i$ is the partially shared secret information about $I_S$ held at node $i$. The base station defines the set $I_S^i$, for every node $i$, to include all elements from set $I_S$ and a subset of elements from $\bar{I}_S$, i.e. $\bar{I}_S \cap I_S^i \neq \emptyset$. $I_T^i$ represents the index set of the real or truth value, $d_i$, in $U^i$ for node $i$. For the MAX and MIN aggregation functions, $|I_T^i| = 1$ and $I_T^i \subset I_S^i$, $\forall i$. $I_S^i$ is the union of the index set of the restricted camouflage values and the index set of the real value, $I_T^i$. A sensor node $i$ hides its real value $d_i$ in the set of camouflage values, and sends the set of values $U^i = \{v_1^i, v_2^i, ..., v_n^i\}$ to its aggregator, such that:

$$
v_l^i \begin{cases}
= d_i, & \text{if } l \in I_T^i \\
\le d_i, & \text{if } l \in I_S^i, \text{ for the MAX function} \\
\ge d_i, & \text{if } l \in I_S^i, \text{ for the MIN function} \\
\le d_i \text{ or } v_l^i > d_i, & \text{if } l \in \bar{I}_S^i
\end{cases} \tag{1}
$$

The following properties must hold in our scheme. Property 1 ensures that the base station can correctly determine the final aggregated value. Property 2 makes sure that $I_S^i - I_T^i$ draws from both sets $I_S$ and $\bar{I}_S$, where "−" denotes set difference. This ensures that node $i$ cannot determine $I_S^j$ ($\forall i, j$, $i \neq j$) and $I_S$. Property 3 guarantees that the true max value will not be filtered out in the aggregation process.

Property 1: The index of the real value is drawn from $I_S$, so:
$$ I_T^i \subset I_S, \quad \forall i. \tag{2} $$

Property 2: $I_S^i$ contains elements from both $I_S$ and $\bar{I}_S$ to successfully hide the real value in set $U^i$. We have:
$$ \bar{I}_S \cap (I_S^i - I_T^i) \neq \emptyset, \quad \forall i. \tag{3} $$

Property 3: $I_S^i$ is a superset of $I_S$:
$$ I_T^i \subset I_S \subset I_S^i, \quad \forall i. \tag{4} $$

Table 1: KAPDA Notations

message set: Set of camouflage and real values sent to the node's parent.
restricted camouflage values: Values in the message set that are greater than the real value for the MIN function and less for the MAX function.
unrestricted camouflage values: Values in a message set that are either more or less than the real value in the set.
$U^i$: Notation for the message set of node $i$. $U^i = \{v_1^i, v_2^i, ..., v_n^i\}$.
$U^\Omega$: Last message set sent from the last node $\Omega$ to the base station.
$d_i$: Real value of node $i$. It is hidden in plain sight among $U^i$, where $v_l^i = d_i$ if $l \in I_T^i$.
$v_l^i$: Values of $U^i$ for node $i$, where $l = 1, 2, ..., n$.
$I$: Index set of $U^i$, $\forall i$; $I = \{1, 2, ..., n\}$.
$n$: Number of values in a message set; $n = |U^i|$, $\forall i$.
$I_S$: The index set kept at the base station that contains possible locations for the overall network aggregated value, where $I_S \subset I$.
$I_S^i$: Union of the index set of restricted camouflage values and the index of the real value.
$\bar{I}_S^i$: Index set of unrestricted camouflage values of node $i$.
$I_T^i$: Index set of the real value of node $i$; $|I_T^i| = 1$, $\forall i$. It is drawn from set $I_S$ for MAX/MIN functions.

We will use an example to illustrate the method. Consider a three-node case of MAX aggregation shown in Figure 1, where nodes 1, 2, and 3 have sensor readings 34, 12, and 23 respectively. Assume $I_S = \{1, 3, 5\}$, and $I_T^1 = \{5\}$, $I_T^2 = \{3\}$, and $I_T^3 = \{1\}$, which are all drawn from the set $I_S$. Based on Properties 1, 2, and 3, we have $I_S^1 = \{1, 3, 4, 5, 7\}$ and $\bar{I}_S^1 = \{2, 6\}$ stored at node 1; $I_S^2 = \{1, 2, 3, 5, 6\}$ and $\bar{I}_S^2 = \{4, 7\}$ for node 2; $I_S^3 = \{1, 2, 3, 5, 7\}$ and $\bar{I}_S^3 = \{4, 6\}$ for node 3. In sets $I_S^1$, $I_S^2$, and $I_S^3$, we draw three values from set $I_S$ and two values from set $\bar{I}_S$. After $I_T^i$, $I_S$, and $I_S^i$ have been determined, the base station sends $I_T^i$ and $I_S^i$ to node $i$ in the pre-distribution phase, where $1 \le i \le 3$.

During the data aggregation report phase, node 1 places its true value 34 in the 5th slot in $U^1$. Node $i$ determines $U^i$ according to Equation (1). Therefore, $U^1 = \{18, 47, 27, 30, 34, 9, 4\}$, $U^2 = \{6, 11, 12, 15, 1, 5, 10\}$, and $U^3 = \{23, 18, 22, 25, 15, 27, 19\}$. When node 3 intercepts the message sets $U^1$ and $U^2$, it only knows that $d_1$ and $d_2$ are in one of the positions 1, 2, 3, 5, 7. During the data aggregation process, when node 3 receives message sets $U^1$ and $U^2$ from its children, it determines the aggregated value slot by slot, so $v_l^3 = \max_i \{v_l^i\}$ for each $l = 1, 2, ..., 7$ over $i = 1, 2, 3$. Hence, the aggregated value is $U^3 = \{23, 47, 27, 30, 34, 27, 19\}$, which replaces the original $U^3$ of node 3. Node 3 sends the final aggregated message set, $U^\Omega$, to the base station. The base station extracts the elements in the positions denoted by set $I_S$ from $U^\Omega$, and then chooses the maximum of these elements as the true network aggregate. In this example, the elements at positions 1, 3, and 5 of the aggregated set $U^\Omega$ are 23, 27, and 34. Hence, the maximum of these values, and the network MAX aggregate, is 34.

As described, this method alone is prone to statistical analysis attacks. An adversary can examine the packets for statistical correlations to determine the sets $I_S^i$ and $\bar{I}_S^i$ for various nodes and ultimately the sets $I_S$ and $\bar{I}_S$ of the base station. If the theoretical network maximum and minimum values are known, then the values in the message set other than $d_i$ could be chosen such that the entire message resembles a uniform distribution. If the size of $I_S^i$ does not allow this, then consideration should be given to changing its size. Another way to defeat statistical analysis attacks would be to change or shuffle all the sets either after each network-wide aggregation, or after a period of aggregations. The base station would choose new sets $I_S$ and $\bar{I}_S$, and end-to-end encryption can be used to distribute the keys (sets $I_S^i$ and $I_T^i$ to each node $i$). These methods would also help if values of neighboring nodes are similar or correlated, or if an adversary manipulated the environment, such as putting an ice
block on top of a sensor.

Figure 1: An example KAPDA aggregation scheme of three nodes.

3.1 Achieving Accurate Aggregation Results

Proposition 1: KAPDA provides a type of k-anonymity where:
$$ k = |\bar{I}_S^i| + 1 = |I| - |I_S^i| + 1. \tag{5} $$

This is because $\forall i, j: |\bar{I}_S^i| = |\bar{I}_S^j|$. Hence, the nodes know that the real value is contained in the $|\bar{I}_S^i| + 1$ largest values in any $U^i$ for the MAX aggregation, and in the smallest $|\bar{I}_S^i| + 1$ values in any $U^i$ for the MIN aggregation. However, for an in-network node, if the node considers the content of its own $I_S^i$, k may be reduced. In the example shown in Figure 1, node one can determine the real value of $U^2$ to be hidden in slots 1, 3, 4, 5, or 7. Since one of the three ($|\bar{I}_S^i| + 1 = 3$) largest values in $U^2$ occurs at position 2, and $2 \in \bar{I}_S^1$, node one can determine that the real value is either 12 or 15, at position 3 or 4 of $U^2$.

Proposition 2: KAPDA accurately computes the MAX and MIN aggregation functions. The aggregation result can be affected only by unrestricted camouflage values. However, the unrestricted camouflage values occur only in locations in $\bar{I}_S^i$, $\forall i$. Since $I_S \subset I_S^i$, then $\bar{I}_S^i \subset \bar{I}_S$. Hence, the unrestricted camouflage values do not affect the aggregated results in positions in $I_S$.

3.2 Collusion Attacks
In Figure 1, if node one colludes with node two, they can determine that $d_3$ is in slots 1, 3, 5. They can do this by determining the intersection of $I_S^1$ and $I_S^2$, since $I_S \subseteq I_S^i \cap I_S^j$ and $\bar{I}_S^i \cup \bar{I}_S^j \subseteq \bar{I}_S$. Given the size of $I$, we need to choose the sizes of $I_S$ and $I_S^i$ carefully to achieve good performance against a node collusion attack. There are two ways for colluding nodes to infer either $I_S$ or $\bar{I}_S$. The first is to infer $I_S$ from $I_T^i$, $\forall i \in$ colluding nodes; the other is to infer $\bar{I}_S$ from $I_S^i$ or $\bar{I}_S^i$. Based on these methods, the optimal size of $I$ is discussed in Section 5.

Proposition 3: Assume $I_T^i$ is randomly selected from $I_S$, and collusive nodes try to infer $I_S$ from $I_T^i$. If there exist $x$ collusive nodes, then from $I_T^i$, $\forall i \in$ colluding nodes, the $x$ collusive nodes will disclose:
$$ \sum_{i=1}^{x} \frac{|I_S| - i + 1}{|I_S|} \tag{6} $$
elements in $I_S$. To get all $|I_S|$ elements in $I_S$, the expected number of collusive nodes is:
$$ x = |I_S| \times \sum_{i=1}^{|I_S|} \frac{1}{i}. \tag{7} $$
We observe that this is an instance of the coupon collector's problem [39].

Proposition 4: Let $x$ be the number of nodes colluding in the network. Let $g(x)$ be the number of elements known in $\bar{I}_S$ given $x$. We assume no bias when $\bar{I}_S^i$ is determined from $\bar{I}_S$. The object of the colluding nodes is to infer $\bar{I}_S$ from the information contained in $\bar{I}_S^i$. When $x$ nodes collude, the following equation:
$$ g(x) = |\bar{I}_S| - \left(|\bar{I}_S| - |\bar{I}_S^i|\right) \left( \frac{|\bar{I}_S| - |\bar{I}_S^i|}{|\bar{I}_S|} \right)^{x-1}, \tag{8} $$
computes the number of elements known in set $\bar{I}_S$, where $g(x) \le |\bar{I}_S|$.

Proof. We first give a recurrence relation for $g(x)$. When the $x$-th node colludes with the other $(x - 1)$ nodes, then
$$ g(x) = g(x-1) + |\bar{I}_S^i| \left( \frac{|\bar{I}_S| - g(x-1)}{|\bar{I}_S|} \right), \tag{9} $$
elements in $\bar{I}_S$ will be disclosed. Let $a = |\bar{I}_S|$ and $b = |\bar{I}_S^i|$. Equation (9) implies $g(x) = \frac{a-b}{a} g(x-1) + b$, where $g(1) = b$. Let $y_i = g(i) - a$. Replacing $g(i)$ with $y_i + a$ gives $y_i = \frac{a-b}{a} y_{i-1}$ with $y_1 = b - a$, so $y_i = \left(\frac{a-b}{a}\right)^{i-1} y_1$. Therefore, $g(i) = a - (a-b)\left(\frac{a-b}{a}\right)^{i-1}$.

From Proposition 4 we can infer the expected number of collusive nodes required to recover all the elements in $\bar{I}_S$. This can be obtained by converting Equation (8) into the following equation, where $g(x) = |\bar{I}_S|$:
$$ |\bar{I}_S| = \left\lceil |\bar{I}_S| - \left(|\bar{I}_S| - |\bar{I}_S^i|\right) \left( \frac{|\bar{I}_S| - |\bar{I}_S^i|}{|\bar{I}_S|} \right)^{x-1} \right\rceil. \tag{10} $$

The left-hand side is an integer, so we take the ceiling of the right-hand side without loss of generality. Isolating $x$ in Equation (10) yields the following equation:
$$ x = 2 + \left\lfloor \frac{\log\left( \frac{1}{|\bar{I}_S| - |\bar{I}_S^i|} \right)}{\log\left( \frac{|\bar{I}_S| - |\bar{I}_S^i|}{|\bar{I}_S|} \right)} \right\rfloor. \tag{11} $$
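The bounds in Propositions 3 and 4 are easy to evaluate numerically. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and `a` and `b` stand for the two set sizes used in the proof of Proposition 4 (the size of the complement of the global secret set and the size of one node's unrestricted index set), with `a > b > 0` assumed. It also checks that the closed form of Equation (8) agrees with the recurrence of Equation (9).

```python
import math


def expected_colluders_prop3(size_is: int) -> float:
    """Equation (7): |I_S| times the harmonic number H_{|I_S|}."""
    return size_is * sum(1.0 / i for i in range(1, size_is + 1))


def g_closed_form(x: int, a: int, b: int) -> float:
    """Equation (8): elements disclosed once x nodes collude (a > b > 0)."""
    return a - (a - b) * ((a - b) / a) ** (x - 1)


def g_recurrence(x: int, a: int, b: int) -> float:
    """Equation (9), iterated from the base case g(1) = b."""
    g = float(b)
    for _ in range(x - 1):
        g = g + b * (a - g) / a
    return g


def colluders_prop4(a: int, b: int) -> int:
    """Equation (11): smallest x for which ceil(g(x)) equals a (a > b > 0)."""
    ratio = (a - b) / a
    return 2 + math.floor(math.log(1.0 / (a - b)) / math.log(ratio))


# Illustrative sizes only (assumed, not prescribed by the protocol).
a, b = 11, 3
print("Equation (7) with |I_S| = 4:", expected_colluders_prop3(4))
print("Equation (11) with a = 11, b = 3:", colluders_prop4(a, b))
```

The smaller of the two results is the relevant one for a given configuration, since colluding nodes can pursue whichever inference succeeds first.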

From Proposition 3 and Proposition 4, we conclude that the number of collusive nodes needed to infer $I_S$ or $\bar{I}_S$ is the minimum of Equation (7) and Equation (11). Setting Equation (11) equal to Equation (7) solves for $|I_S|$. Figure 2 shows the expected number of colluding nodes needed to know a certain number of elements in either $I_S$ or $\bar{I}_S$. In Figure 2, $|I|$ is set to 15, and $|\bar{I}_S^i|$ varies between 2 and 4 for different values of k from Equation (5). According to Proposition 4, the expected number of colluding nodes needed to obtain $\bar{I}_S$ decreases as $|I_S|$ increases. According to Proposition 3, the expected number of colluding nodes needed to obtain $I_S$ increases with $|I_S|$. Figure 2 shows that if we set $|\bar{I}_S^i| = 3$, the optimal value of $|I_S|$ should be 4, and it will take about 8 colluding nodes before the global secret is known.

Figure 2: The optimal size of $I_S$ with $|I| = 15$ is given by the intersection of the desired $|\bar{I}_S^i|$ line with Equation (7).

4. PROTOCOL DESIGN

In this section we describe our protocols for the MAX/MIN aggregation functions. We assume the aggregation trees are built according to standard data aggregation protocols [37]. There are four phases: pre-distribution, reporting, aggregation, and base station processing.

4.1 Pre-distribution
In the pre-distribution phase, the base station chooses a set $I_S \subset I$. This is the global secret information, and each node $i$ keeps the elements of $I_S$ in its set $I_S^i$ along with some noise values, generated by either the base station or the node, that are not in $I_S$. Hence, no node can infer the exact set $I_S$ from its own point of view, $I_S^i$. Since $I_S \subset I$, and $|I| = |U^i|$ is proportional to the bandwidth and power consumption, $|I|$ cannot be too large. We discuss the proper size of $I$ in Section 5. After $|I|$ is determined, the optimal values of $|I_S|$ and $|I_S^i|$ are determined by the requirement of k-anonymity and Propositions 3 and 4. As an example, Figure 2 shows how to compute $|I_S|$ as the intersection of two curves: the curve from Proposition 3 and the curve for a given size of $\bar{I}_S^i$. When $|\bar{I}_S^i| = 2$ we see that the optimal size of $I_S$ is 5. The sizes of $I_S$ and $I_S^i$ are related to the performance against collusive attacks.

Next, the base station determines $I_T^i$ and $I_S^i$ for each node $i$. For MAX/MIN, we have $|I_T^i| = 1$, so the base station can determine $I_T^i$ by choosing an element from $I_S$. $I_S^i$ contains all the elements in set $I_S$ and $|I_S^i| - |I_S|$ elements from its complement set. One way to select $|I_S^i| - |I_S|$ elements from set $\bar{I}_S$ is to throw a die with $|I|$ faces, stopping when there are $|I_S^i| - |I_S|$ distinct numbers in $\bar{I}_S$. Many methods currently exist to securely distribute keys in WSNs, e.g. [40]. Such methods could easily be modified to distribute sets $I_S^i$ and $I_T^i$ to all nodes $i$.

4.2 Reporting
In the reporting phase, each node $i$ determines the values for the set $U^i = \{v_1^i, v_2^i, ..., v_n^i\}$. The message set $U^i$ contains the real value, the restricted camouflage values, and the unrestricted camouflage values. If the real sensed value has a range $[d_{min}, d_{max}]$, the restricted camouflage values are drawn from $[d_{min}, d_i]$ for node $i$ (in the MAX case). The unrestricted camouflage values are drawn from $[d_{min}, d_{max}]$. Different nodes can use different distributions to generate random values for the restricted or unrestricted camouflage values, so it is harder for others to infer the real value from $U^i$. Node $i$ places these values in $U^i$ according to Equation (1) and sends message set $U^i$ to its aggregator.

4.3 Aggregation
In the aggregation phase, each node $i$ takes the maximum or minimum, depending on which aggregation function is needed, for each slot $l \in \{1, 2, ..., n\}$ over the values $v_l^j$ of all children $j$, plus its own $v_l^i$ if node $i$ is also a sensing node. Note that the aggregated message set of node (or aggregator) $i$ is $U^i = \{v_1^i, v_2^i, \cdots, v_n^i\}$. For the MAX function, $v_l^i$, $\forall l = 1, 2, \cdots, n$, is:
$$ v_l^i = \max(v_l^h, v_l^i), \quad \forall h, \tag{12} $$
where $h$ ranges over all the children of $i$. The aggregator $i$ replaces the values $v_l^i$ in $U^i$ with the aggregated values, and then passes the aggregated message set, $U^i$, to its next hop.

Note that in the case of MAX aggregation the aggregates in the message sets grow larger as they approach the base station. This effect can be reduced by replacing one or more values pointed to by $\bar{I}_S^i$ for each $i$ in the aggregated message set $U^i$, so that it appears more uniformly distributed over $[d_{min}, d_{max}]$.

4.4 Base Station Processing
When the final aggregated message set $U^\Omega$ arrives, the base station determines the real aggregated result by selecting the maximum (or minimum) over the secret positions. For a MAX aggregation function, the real network aggregated result is:
$$ \max_{k \in I_S} (v_k^\Omega). \tag{13} $$
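The reporting, aggregation, and base station phases for MAX can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, readings are taken to be integers, camouflage values are drawn uniformly (the protocol allows other distributions), and slots are 1-based as in the text. The three message sets are the ones from the three-node example in Section 3.

```python
import random


def build_message_set(d, n, i_t, i_s, d_min, d_max, rng=random):
    """Reporting phase for MAX, following Equation (1).

    d: real reading; i_t: slot of the real value;
    i_s: restricted slots (includes i_t); all slots are 1-based.
    """
    u = []
    for slot in range(1, n + 1):
        if slot == i_t:
            u.append(d)                          # real value
        elif slot in i_s:
            u.append(rng.randint(d_min, d))      # restricted camouflage: <= d
        else:
            u.append(rng.randint(d_min, d_max))  # unrestricted camouflage
    return u


def aggregate_max(message_sets):
    """Aggregation phase, Equation (12): slot-wise maximum."""
    return [max(slot_values) for slot_values in zip(*message_sets)]


def base_station_max(u_final, i_s_global):
    """Base station, Equation (13): maximum over the secret slots I_S."""
    return max(u_final[k - 1] for k in i_s_global)


# Message sets from the three-node example (real values 34, 12, 23).
u1 = [18, 47, 27, 30, 34, 9, 4]
u2 = [6, 11, 12, 15, 1, 5, 10]
u3 = [23, 18, 22, 25, 15, 27, 19]

u_final = aggregate_max([u1, u2, u3])
print(u_final, base_station_max(u_final, {1, 3, 5}))
```

Because aggregation is a plain slot-wise comparison, an aggregator needs no key material and performs no decryption, which is where the delay savings over hop-by-hop encryption come from.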

4.5 Generalizing to the Additive Aggregation Function
Other types of aggregation functions can be used in place of MAX/MIN. However, care should be taken as to what should be placed in the restricted set $I_S^i$ and how. For example, if an additive aggregation is used, we will choose $|I_T^i| > 1$, and in the data reporting phase, we will set:
$$ d_i = \sum_{l \in I_T^i} v_l^i, \qquad 0 = \sum_{l \in I_S^i} v_l^i, \quad v_l^i,\ l \in I_S^i. \tag{14} $$
Here, we allow negative values in $U^i$. The aggregators sum up each slot in the message sets, and the base station can retrieve the real aggregation result from the final aggregated message set $U^\Omega = \{v_1^\Omega, v_2^\Omega, \cdots, v_n^\Omega\}$, which is $\sum_{l \in I_S} v_l^\Omega$.

5. EVALUATION

In this section, we evaluate our approach by comparing it with hop-by-hop and end-to-end encryption methods, with particular attention to power consumption and delays.

5.1 Power Analysis

5.1.1 End-to-End Encryption
End-to-end encryption without a homomorphic encryption scheme uses too much power to be practical because each value sensed in the network must be sent back to the base station. Let us assume that the data to be aggregated are 16 bits wide. In some cases with block encryption, the data sent over the network would be even larger because of padding. Table 2 shows the bandwidth, and hence energy consumption, of the network nodes per level. Here, the level is the number of hops a node is away from the base station, assuming that the branching factor of the network is 3. Nodes closer to the base station consume more bandwidth. Taking the amount of energy consumed per bit transmitted and received from Table 3 for the MICAz architecture, Table 2 shows how much energy would be consumed per node at each level. The TelosB architecture energy usage information is not shown, but it is similar to the MICAz architecture. In order to balance the amount of traffic sent, either the sink has to move around, or the nodes themselves have to migrate [7], both of which are impractical in many cases.

Average bandwidth consumption in an end-to-end encryption scenario is $O(\log N)$ per node, where there are $N$ nodes in the network and assuming no aggregation, while in our scheme the bandwidth consumption is $O(|U^i|)$ per node, the number of values in a message set. Hence, power usage grows more quickly for end-to-end encryption than for KAPDA. Additionally, because nodes near the sink have to send more information, there would be larger delays due to radio transmissions. Yet, end-to-end encryption has the best privacy protection. Neither an outsider nor the neighboring nodes themselves can determine the sensed values or the aggregated results.

Table 2: Bandwidth and energy usage of end-to-end encryption per level in a tree network with a fan-out of 3, assuming no aggregation

Level    Number of Nodes    Bits Sent per Node    MICAz Energy per Node (µJ)
1        3                  65536                 72253.44
2        9                  16384                 18063.36
3        27                 4096                  4515.84
4        81                 1024                  1128.96
5        243                256                   282.24
6        729                64                    70.56
7        2187               16                    9.6

Table 3: Energy consumption in µJ and nJ of common operations on the MICAz mote, 7.37 MHz, and the TelosB mote, 4 MHz [13]

Operation                   MICAz      TelosB
Compute for 1 clock tick    3.5 nJ     1.2 nJ
Transmit 1 bit              0.60 µJ    0.72 µJ
Receive 1 bit               0.67 µJ    0.81 µJ
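The Table 2 figures can be reproduced from the per-bit MICAz costs in Table 3. The sketch below is an assumption-laden illustration, not the authors' calculation: the function name is hypothetical, and the rule that bits sent quadruple at each level toward the base station (with three quarters of them having been received from children) is inferred from the pattern of the table rather than stated in the text.

```python
TX_UJ_PER_BIT = 0.60  # MICAz, transmit 1 bit (Table 3)
RX_UJ_PER_BIT = 0.67  # MICAz, receive 1 bit (Table 3)


def table2_row(level, depth=7, fan_out=3, value_bits=16):
    """Return (nodes, bits sent per node, energy in microjoules) for one level.

    Assumed pattern: leaves (level == depth) send one value and receive
    nothing; each level closer to the base station sends (fan_out + 1)
    times as many bits, fan_out / (fan_out + 1) of which were received.
    """
    nodes = fan_out ** level
    bits_sent = value_bits * (fan_out + 1) ** (depth - level)
    if level == depth:
        bits_received = 0
    else:
        bits_received = bits_sent * fan_out // (fan_out + 1)
    energy_uj = bits_sent * TX_UJ_PER_BIT + bits_received * RX_UJ_PER_BIT
    return nodes, bits_sent, round(energy_uj, 2)


for level in range(1, 8):
    print(level, *table2_row(level))
```

Under these assumptions a level-1 node spends roughly four orders of magnitude more energy than a leaf, which is the imbalance the text attributes to end-to-end encryption without aggregation.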

To determine power consumption of various encryption methods in WSNs, we use the results from [18] to determine the energy used in 3 encryption schemes, IDEA, RC4 and RC5 which were generalized to any generic mote architecture. We then used this generalization and applied it to two common architectures to determine the basic encryption times: the MICAz that has a bus width of 8 bits and runs at 7.37 MHz, and the TelosB that has a bus width of 16 bits and runs at a speed of 4 MHz. To determine the energy costs of encryption for these two architectures we used the cost of common operations determined by Meulenaer et al. [13] which are given in Table 3 along with the general equation from Ganesan et al. [18] given as follows: T imeENC/DEC =

a + b ∗ ⌈textlength/blocksize⌉ , processorf req ∗ buswidth

(15)

where variables a and b are given as follows respectively: a = aBASE + aM U L + aRISC

(16)

b = bBASE + aM U L + aRISC .

(17)

Parameters aBASE and bBASE are given in Table 4, and aM U L and bM U L are given in Table 5, depending on whether a multiplication instruction is native to the architecture. aRISC and bRISC are given in Table 6 depending on whether a RISC or CISC architecture is used. aBASE , bBASE , aM U L , bM U L , aRISC , and bRISC , were determined in [18] through minimizing the least square relative error in their experiments. With these information, we can compute the time, and hence the number of CPU cycles, spent on various encryption schemes. The final result, Table 7, shows the amount of time spent per node to encrypt and decrypt 10 bits of data on the MICAz and TelosB architectures with IDEA, RC4, and RC5 encryption. IDEA and RC5 are both block methods that operate on block sizes of 64 bits, RC4 is a stream method that works in segments of 8 bits. Because the plain text size in all cases was only 10 bits, padding is required. The time in microseconds was determined from Equation (15), clock ticks were

5.1.2 Hop-by-Hop Encryption We show that by sending a small number of camouflage values, less energy is consumed than using the usual methods of hop-by-hop encryption such as IDEA, RC5, and RC4. Hop-by-hop encryption methods consume less power near the sink than end-to-end encryption. Yet, a large amount of power is consumed overall throughout the network, and more delay is introduced in the decryption and re-encryption phases. 6

Table 4: Parameters aBASE and bBASE [18]
Algorithm            aBASE     bBASE    blocksize (bits)
RC5 init/encrypt     352114    40061    64
RC5 init/decrypt     352114    39981    64
IDEA encrypt         67751     80617    64
IDEA decrypt         385562    84066    64
RC4                  68540     13591    8

Table 5: Parameters aMUL and bMUL [18]
Operation              aMUL      bMUL
w/ MUL instruction     19016     -1143
w/o MUL instruction    -14330    8252

Table 7: Cost to encrypt 10 bits of data on the MICAz and TelosB architectures.
Method, Architecture    Time (µs)    Clock Ticks    Energy (µJ)
IDEA Enc, MICAz         2902.12      21388.63       74.86
IDEA Enc, TelosB        2673.58      10694.31       12.83
IDEA Dec, MICAz         8350.80      61546.13       215.41
IDEA Dec, TelosB        7693.27      30773.06       36.93
RC5 Enc, MICAz          7037.25      51864.50       181.53
RC5 Enc, TelosB         6483.06      25932.25       31.12
RC5 Dec, MICAz          7035.89      51854.50       181.49
RC5 Dec, TelosB         6481.81      25927.25       31.11
RC4, MICAz              2018.00      14872.63       52.05
RC4, TelosB             1859.08      7436.31        8.92

The number of clock ticks in Table 7 was determined by multiplying the time by the clock frequency, and energy was determined from the number of clock ticks according to the energy per tick given in Table 3. We use the information in Table 7 and Table 3 to determine the energy consumption of each node in a hop-by-hop method. The amount of energy consumed is determined by the following equation:

$$\mathit{Energy}_{HBH} = n \cdot (R(\mathit{Blk}) + \mathit{Dec} + \mathit{Agg}) + \mathit{Enc} + T(\mathit{Blk}), \qquad (18)$$

where n is the number of children, R(Blk) is the energy consumption to receive Blk bits, T(Blk) is the energy consumption to transmit Blk bits, and Dec and Enc are the energy consumption of decryption and encryption given in Table 7. Agg is the energy consumption of computing the aggregate and depends on the number of children. We estimated the time to aggregate two values as one clock tick. Table 8 gives the energy consumption of hop-by-hop encryption when the average branching factor is 5 and a value can be expressed in 10 bits. A 10-bit value can capture 1,024 distinct values, which is enough to express a sensor reading in many WSN applications.

Table 8: Energy consumption of hop-by-hop encryption per node for 10 bits of data for the MICAz and TelosB architectures
Method, Architecture    Energy (µJ)
IDEA, MICAz             1404.74
IDEA, TelosB            502.76
RC5, MICAz              1341.80
RC5, TelosB             491.97
RC4, MICAz              375.55
RC4, TelosB             129.87

The energy consumption of KAPDA is determined with the following equation:

$$\mathit{Energy}_{KAPDA} = n \cdot m \cdot (R(w) + \mathit{Agg}) + m \cdot T(w), \qquad (19)$$

where n is the branching factor, R(w) is the energy consumption to receive w bits, T(w) is the energy consumption to transmit w bits, w is the number of bits in a value, m is the number of values in a message set, and Agg is the energy consumption of aggregation.

5.2 Size of Set I

Figure 3 shows that we can accommodate 34 to 35 values before reaching the same energy consumption as IDEA and RC5 for the MICAz architecture. Note that these two encryption methods operate on a block size of 64 bits, which is why RC4, a streaming encryption method, appears so low: it operates on segments of 8 bits. KAPDA can use up to about 7 camouflage values before it uses the same amount of power as the RC4 method. Figure 4 shows the same results for the TelosB architecture, where RC4 works so efficiently that the crossover point is about 2 values, yet for RC5 and IDEA the crossover point is about 9.

To achieve a net power savings, the size of $I$ for the MICAz motes can be between 27 and 33 if IDEA or RC5 encryption is used, and between 5 and 9 if RC4 is used. For the TelosB architecture, the size of $I$ could be between 7 and 10 for IDEA and RC5 encryption. It would not be feasible to use KAPDA on the TelosB architecture when RC4 encryption is used. However, as described in the next section, KAPDA significantly reduces the delay in the network due to encryption, and would be appealing in networks that require minimal network delay.

The sizes of the rest of the sets can be determined from $|I|$. If we choose $|I|$ to be 25 based on power analysis, and we want our k-anonymity factor to be 8, then based on Equation (5), $|I_{S_i}|$ should be about 7, and the optimal size of $I_S$, based on the method discussed in Section 3.2, should be about 4.

5.3 Delay Analysis
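As a rough numerical sketch of the delay comparison in this section, the code below evaluates Equations (20) and (21) with the Table 7 timings, 10-bit values, a branching factor of 5, and a bandwidth of 0.25 bits per microsecond. It treats the Table 7 times as microseconds, an assumption that is consistent with the clock-tick column (time multiplied by clock frequency) and with the bandwidth unit. Using the single RC4 row of Table 7 for both encryption and decryption is likewise an assumption, and the function names are hypothetical.

```python
import math

# Table 7 times, taken as microseconds: (Enc, Dec) per cipher and platform.
# Table 7 has one RC4 row; using it for both directions is an assumption.
TIME_US = {
    ("IDEA", "MICAz"): (2902.12, 8350.80),
    ("IDEA", "TelosB"): (2673.58, 7693.27),
    ("RC5", "MICAz"): (7037.25, 7035.89),
    ("RC5", "TelosB"): (6483.06, 6481.81),
    ("RC4", "MICAz"): (2018.00, 2018.00),
    ("RC4", "TelosB"): (1859.08, 1859.08),
}

BLOCK_BITS = {"IDEA": 64, "RC5": 64, "RC4": 8}  # cipher block sizes (Table 4)
BANDWIDTH = 0.25  # bits per microsecond (250 kbps)


def time_hbh(cipher, arch, n=5, text_bits=10):
    """Eq. (20): n decryptions, one encryption, and (n + 1) block transfers."""
    enc, dec = TIME_US[(cipher, arch)]
    blk = BLOCK_BITS[cipher]
    padded_bits = math.ceil(text_bits / blk) * blk
    return n * dec + enc + (n + 1) * padded_bits / BANDWIDTH


def time_kapda(m, n=5, bits_per_value=10):
    """Eq. (21): (n + 1) transfers of m plaintext values, no cryptography."""
    return (n + 1) * m * bits_per_value / BANDWIDTH


def break_even_m(cipher, arch, n=5, bits=10):
    """Message-set size m at which Time_KAPDA equals Time_HBH."""
    return time_hbh(cipher, arch, n, bits) / time_kapda(1, n, bits)


if __name__ == "__main__":
    for cipher, arch in sorted(TIME_US):
        print(f"{cipher:5s} {arch:7s} HBH = {time_hbh(cipher, arch):9.2f} us, "
              f"break-even m = {break_even_m(cipher, arch):6.1f}")
```

With these inputs, the break-even message set for RC4 on TelosB is about 48 values, that is, one real reading plus 47 camouflage values, while the block ciphers tolerate far larger message sets before KAPDA becomes slower.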

The time for the hop-by-hop method is determined with the following equation:

$$\mathit{Time}_{HBH} = n \cdot \mathit{Dec} + \mathit{Enc} + (n + 1) \cdot \lceil \mathit{textsize} / \mathit{Blk} \rceil \cdot \mathit{Blk} / \mathit{Bndwh}, \qquad (20)$$

where Blk is the message block size and Bndwh is the bandwidth, which is 250 kbps (0.25 bits per microsecond) on both architectures. Since the bandwidth on both architectures is the same, the calculated times for KAPDA are the same. The equation to determine the times for KAPDA is given as follows:

$$\mathit{Time}_{KAPDA} = (n + 1) \cdot m \cdot \mathit{bpv} / \mathit{Bndwh}, \qquad (21)$$

where bpv is the number of bits per value, m is the number of values in the message set, and n is the branching factor in the network. In Figure 5 we used 10 bits per value and an average branching factor of 5. We compare the time it takes for IDEA, RC4, and RC5 on both architectures to KAPDA, which is the same on both architectures due to the same bandwidth. Since IDEA, RC5, and RC4 are not dependent on the message set size, they are shown as constants. When using block encryption methods such as IDEA and RC5, KAPDA is much faster, while for streaming methods such as RC4, 47 camouflage values would have the same delay. We conclude that it would be acceptable for our scheme to consume a little more power in transmitting camouflage values if delay is a critical issue and privacy is of concern.

Table 6: Parameters aRISC and bRISC [18]
        aRISC    bRISC
RISC    3207     1661
CISC    77175    -103593

Figure 3: MICAz power profile per node for 10 bits of sensed data and 5 children to aggregate. IDEA, RC5, and RC4 use a constant but different amount of energy to send 10 bits of data. KAPDA can send about 30 decoy values before it uses more energy than IDEA or RC5.

Figure 4: TelosB power profile per node for 10 bits of sensed data and 5 children to aggregate. IDEA, RC5, and RC4 use a constant but different amount of energy to send 10 bits of data. KAPDA can send about 8 decoy values before it uses more energy than IDEA or RC5.

Figure 5: The amount of time delay in the network using various forms of encryption on different architectures compared to our method. Even if 40 decoy values are sent per real value, our method takes considerably less time than the other methods.

6. RELATED WORK

Data aggregation improves bandwidth and energy efficiency in resource-limited wireless sensor networks [38]. Previous work [29, 28, 14, 42, 1, 10, 44, 34] addresses data aggregation in various application scenarios with the assumption that all sensors are working in trusted and friendly environments. In reality, sensor networks are likely to be deployed in an untrusted environment, where links can be eavesdropped and messages can be altered. An adversary could manipulate the sensory data in wireless sensor networks. LeMay et al. summarize the functional characteristics of wireless metering sensors and categorize attackers in [33], where both privacy and security are concerns in the given scenarios. Previous work [41, 45, 9] investigates secure data aggregation against adversaries who try to tamper with the intermediate aggregation result. The focus of this paper is on PDA in wireless sensor networks. Privacy has been studied in the data mining domain [4, 30, 27] and in peer-to-peer network applications [26].

The majority of research in secure data aggregation takes either a hop-by-hop approach or an end-to-end approach. In hop-by-hop approaches, data are decrypted before the aggregation step, aggregated, then encrypted and forwarded to their next destination. Because data are decrypted, this approach cannot provide data confidentiality at the aggregator nodes. Additionally, there is likely a latency delay due to the decryption/encryption process. To combat these problems, a set of algorithms has been developed that can work on the data without decrypting it. They employ either a symmetric or an asymmetric approach with homomorphic encryption schemes. Homomorphic encryption schemes [20, 7] adopt homomorphic stream ciphers that allow efficient aggregation of encrypted data without decryption for additive aggregation functions. Most homomorphic encryption schemes work with multiplication or addition and have the following properties:

$$x + y = \mathit{Decrypt}(\mathit{Encrypt}(x) \oplus \mathit{Encrypt}(y))$$
$$x * y = \mathit{Decrypt}(\mathit{Encrypt}(x) \otimes \mathit{Encrypt}(y)),$$

where ⊕ and ⊗ are special addition and multiplication functions that work on encrypted data. Because the data are encrypted from end to end, the problem with data confidentiality usually does not exist. However, these methods usually work only on some aggregation functions, such as sum and average, and do not work well with maximum and minimum. End-to-end encryption has the advantage of less computation, because values do not need to be decrypted and then re-encrypted, but it still leaves the question of how to distribute encryption keys. Hop-by-hop encryption has the disadvantage of more computation overhead, because values are decrypted and encrypted at each node along the routing tree to the base station. Also, the plain text is available at each node, which increases the risk of data leakage through node capture attacks.

Previous efforts on PDA focused on additive aggregation functions. Horey et al. propose a data collection scheme based on negative surveys [25], where sensor nodes transmit a sample of the data complement to a base station instead of transmitting their actual data. The base station then uses the negative samples to reconstruct a histogram of the original sensor readings. In [17], Feng et al. propose a family of secret perturbation-based schemes that can protect sensor data confidentiality without disrupting the additive data aggregation result. He et al. proposed two PDA protocols in [24] based on algebraic properties of polynomials and the addition operation. These efforts in the privacy preservation domain do not assume data manipulation/pollution attacks. In [19], Ganti et al. present architectural components for privacy guarantees on stream data from privately owned sensors to collect aggregated phenomena of mutual interest. Although the concept of camouflage has not, to the best of our knowledge, been applied to data aggregation, it has been applied to routing methods [22, 15]. In [11], the authors use a decoy sink to perturb traffic and hence protect the location of the real sink.

7. FUTURE WORK AND CONCLUSION

Future work will focus on using variable sizes for sets $I$, $I_S$, and $I_{S_i}$. As the cost of radio communication declines, more emphasis will be placed on methods that take advantage of such a decline. New keyless methods can be investigated for the pre-distribution phase, possibly by querying the base station for values that are in $I_S$ after deployment. For example, the base station could randomly deny queries that ask if a certain value is present in set $I_S$. While encryption provides a stronger level of privacy, we have shown in Section 5 that it comes at a higher cost. Future network implementors will have to weigh the level of privacy needed against the energy and time constraints.

In this paper we showed that WSNs can protect confidentiality by hiding values in plain text along with decoys. By allowing the maximum or minimum to be in plaintext, aggregation can take place efficiently, which would otherwise be difficult. Confidentiality is achieved through k-anonymity, with the aggregate hidden among k − 1 other values. Dividing a message set into different subsets, ISi, ITi, and ISi, allows us to camouflage the message set with restricted decoys and unrestricted decoys. We use a semi-shared global key, called set $I_S$, which allows some resistance to node collusion and capture. We have shown that slightly increasing the message bandwidth with decoys is more power efficient than using conventional methods of hop-by-hop encryption.

8. ACKNOWLEDGMENTS

The authors thank Gelerah Taban for her suggestions and ideas. MG acknowledges support from Motorola. SF acknowledges the partial support of NSF (grants CCF-0621900, CCR-0331580, SHF-0905236) and AFOSR MURI grant FA9550-07-1-0532. WH acknowledges support from DOE NNSA grant DE-FG52-06NA27494.

9. REFERENCES

[1] T. Abdelzaher, T. He, and J. Stankovic. Feedback Control of Data Aggregation in Sensor Networks. 43rd IEEE Conference on Decision and Control, December 2004.
[2] G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu. Approximation Algorithms for K-Anonymity. Journal of Privacy Technology, November 2005.
[3] R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu. Order Preserving Encryption for Numeric Data. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pages 563–574, New York, NY, USA, 2004. ACM.
[4] R. Agrawal and R. Srikant. Privacy Preserving Data Mining. In ACM SIGMOD Conf. Management of Data, pages 439–450, 2000.
[5] M. Atzori, F. Bonchi, F. Giannotti, and D. Pedreschi. k-Anonymous Patterns. In Proceedings of the Principles and Practice of Knowledge Discovery in Databases (PKDD), pages 10–21, 2005.
[6] S. Camtepe and B. Yener. Combinatorial Design of Key Distribution Mechanisms for Wireless Sensor Networks. In Proceedings of 9th European Symposium On Research in Computer Security (ESORICS 04), 2004.
[7] C. Castelluccia, E. Mykletun, and G. Tsudik. Efficient Aggregation of Encrypted Data in Wireless Sensor Networks. Mobiquitous, 2005.
[8] H. Chan, A. Perrig, and D. Song. Random Key Predistribution Schemes for Sensor Networks. In IEEE Symposium on Research in Security and Privacy, pages 197–213, 2003.
[9] H. Chan, A. Perrig, and D. Song. Secure Hierarchical In-Network Aggregation in Sensor Networks. In Proceedings of 13th ACM Conference on Computer and Communications Security (CCS06), October 2006.
[10] J.-Y. Chen, G. Pandurangan, and D. Xu. Robust Computation of Aggregates in Wireless Sensor Networks: Distributed Randomized Algorithms and Analysis. IPSN, 2005.
[11] W. Conner, T. F. Abdelzaher, and K. Nahrstedt. Using Data Aggregation to Prevent Traffic Analysis in

Wireless Sensor Networks. In DCOSS: International Conference on Distributed Computing in Sensor Systems, pages 202–217, 2006.
[12] I. Damgard, M. Geisler, and M. Kroigard. Homomorphic encryption and secure comparison. International Journal of Applied Cryptography, 1(1):22–31, 2008.
[13] G. de Meulenaer, F. Gosset, F.-X. Standaert, and O. Pereira. On the Energy Cost of Communication and Cryptography in Wireless Sensor Networks. IEEE International Conference on Wireless and Mobile Computing, Networking and Communication, 0:580–585, 2008.
[14] A. Deshpande, S. Nath, P. B. Gibbons, and S. Seshan. Cache-and-query for wide area sensor databases. SIGMOD, 2003.
[15] R. Dingledine, N. Mathewson, and P. Syverson. Deploying Low-Latency Anonymity: Design Challenges and Social Factors. In Proceedings of the IEEE Symposium on Security & Privacy, September 2007.
[16] W. Du, J. Deng, Y. S. Han, and P. K. Varshney. A pairwise key pre-distribution scheme for wireless sensor networks. In Proceedings of the 10th ACM Conference on Computer and Communications Security (CCS), pages 42–51, October 2003.
[17] T. Feng, C. Wang, W. Zhang, and L. Ruan. Confidentiality Protection Schemes for Data Aggregation in Sensor Networks. In IEEE INFOCOM, Phoenix, AZ, April 2008.
[18] P. Ganesan, R. Venugopalan, P. Peddabachagari, A. Dean, F. Mueller, and M. Sichitiu. Analyzing and modeling encryption overhead for sensor network nodes. In WSNA '03: Proceedings of the 2nd ACM international conference on Wireless sensor networks and applications, pages 151–159, New York, NY, USA, 2003. ACM.
[19] R. Ganti, N. Pham, Y.-E. Tsai, and T. Abdelzaher. PoolView: Stream Privacy for Grassroots Participatory Sensing. In The 6th ACM Conference on Embedded Networked Sensor Systems (Sensys), November 2008.
[20] J. Girao, D. Westhoff, and M. Schneider. CDA: Concealed Data Aggregation for Reverse Multicast Traffic in Wireless Sensor Networks. In 40th International Conference on Communications, IEEE ICC, May 2005.
[21] O. Goldreich. Foundations of Cryptography: Volume 2, Basic Applications. Cambridge University Press, New York, NY, USA, 2004.
[22] D. Goldschlag, M. Reed, and P. Syverson. Onion Routing for Anonymous and Private Internet Connections. Communications of the ACM, 42(2), February 1999.
[23] Google. Google powermeter, 2009. http://www.google.org/powermeter/.
[24] W. He, X. Liu, H. Nguyen, K. Nahrstedt, and T. Abdelzaher. PDA: Privacy-preserving Data Aggregation in Wireless Sensor Networks. In IEEE INFOCOM, 2007.
[25] J. Horey, M. M. Groat, S. Forrest, and F. Esponda. Anonymous Data Collection in Sensor Networks. In Fourth Annual International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous'07), August 2007.
[26] Q. Huang, H. J. Wang, and N. Borisov. Privacy-Preserving Friends Troubleshooting Network. In Symposium on Network and Distributed Systems Security (NDSS), San Diego, CA, February 2005.
[27] Z. Huang, W. Du, and B. Chen. Deriving Private Information from Randomized Data. In Proceedings of the ACM SIGMOD Conference, June 2005.
[28] C. Intanagonwiwat, D. Estrin, R. Govindan, and J. Heidemann. Impact of Network Density on Data Aggregation in Wireless Sensor Networks. In Proceedings of the 22nd International Conference on Distributed Computing Systems, 2002.
[29] C. Intanagonwiwat, R. Govindan, and D. Estrin. Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks. MobiCom, 2002.
[30] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar. On The Privacy Preserving Properties of Random Data Perturbation Techniques. In the IEEE International Conference on Data Mining, November 2003.
[31] H. S. N. Lab. Codeblue: Wireless sensors for medical care, 2008. http://fiji.eecs.harvard.edu/CodeBlue.
[32] B. Lai, S. Kim, and I. Verbauwhede. Scalable Session Key Construction Protocol for Wireless Sensor Networks. In IEEE Workshop on Large Scale Real-Time and Embedded Systems (LARTES), page 7, December 2002.
[33] M. LeMay, G. Gross, C. A. Gunter, and S. Garg. Unified Architecture for Large-Scale Attested Metering. In Proceedings of HICSS-40, January 2007.
[34] M. Li and Y. Liu. Underground Structure Monitoring with Wireless Sensor Networks. In 6th International Symposium on Information Processing in Sensor Networks (IPSN), Cambridge, Massachusetts, USA, April 2007.
[35] Y. Lindell and B. Pinkas. Privacy Preserving Data Mining. J. Cryptology, 15(3):177–206, 2002.
[36] D. Liu and P. Ning. Establishing pairwise keys in distributed sensor networks. In Proceedings of 10th ACM Conference on Computer and Communications Security (CCS03), pages 52–61, October 2003.
[37] S. Madden, M. J. Franklin, J. Hellerstein, and W. Hong. TAG: a Tiny Aggregation Service for Ad-Hoc Sensor Networks. In the Fifth Symposium on Operating Systems Design and Implementation (OSDI'02), 2002.
[38] S. Madden, M. J. Franklin, and J. M. Hellerstein. TAG: A Tiny AGgregation Service for Ad-Hoc Sensor Networks. OSDI, 2002.
[39] V. G. Papanicolaou, G. E. Kokolakis, and S. Boneh. Asymptotics for the random coupon collector problem. J. Comput. Appl. Math., 93(2):95–105, 1998.
[40] A. Perrig, M. Luk, and C. Kuo. Message-In-a-Bottle: User-Friendly and Secure Key Deployment for Sensor Nodes. In Proceedings of the ACM Conference on Embedded Networked Sensor System (SenSys 2007), October 2007.
[41] B. Przydatek, D. Song, and A. Perrig. SIA: Secure Information Aggregation in Sensor Networks. In Proc. of ACM SenSys, 2003.
[42] I. Solis and K. Obraczka. The Impact of Timing in Data Aggregation for Sensor Networks. ICC, 2004.
[43] L. Sweeney. Achieving k-Anonymity Privacy Protection Using Generalization and Suppression. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10:571–588, 2002.
[44] X. Tang and J. Xu. Extending Network Lifetime for Precision-Constrained Data Aggregation in Wireless Sensor Networks. INFOCOM, 2006.
[45] Y. Yang, X. Wang, S. Zhu, and G. Cao. SDAP: A Secure Hop-by-Hop Data Aggregation Protocol for Sensor Networks. ACM MobiHoc, 2006.
[46] A. C. Yao. Protocols for Secure Computations. In 23rd IEEE Symposium on the Foundations of Computer Science (FOCS), pages 160–164, 1982.
