Privacy Preserving ID3 using Gini Index over Horizontally Partitioned Data

Ali Miri
School of Information Technology and Engineering (SITE)
University of Ottawa, Canada
[email protected]

Saeed Samet
School of Information Technology and Engineering (SITE)
University of Ottawa, Canada
[email protected]

Abstract

The ID3 algorithm is a standard, popular, and simple method for data classification and decision tree creation. Because privacy must be taken into consideration in data mining, several secure multi-party computation protocols have been presented based on this technique. Entropy and the Gini Index are two measures that can be used to compute the information gain at each step when producing a decision tree. The Gini Index, however, has been less studied in privacy-preserving data mining protocols. In this paper, we show how the Gini Index can be used in privacy-preserving ID3 algorithms to create decision tree classifiers in such a way that the involved parties can jointly compute the gain value of each normal attribute without revealing their own private information to each other, while the database is horizontally partitioned over two or more parties. Three secure multi-party sub-protocols are presented to evaluate the intermediate computations. The communication overhead has been kept reasonably low to make the whole protocol efficient and practical.

978-1-4244-1968-5/08/$25.00 ©2008 IEEE

1 Introduction

Nowadays, many data mining systems deal with databases distributed among two or more parties, while each party wants to keep its own information private. This is the case in applications in various environments, such as medicine and insurance. Since privacy preservation has come to be considered a crucial issue in data mining, many protocols and tools have been presented. One popular technique in data mining for classifying information is the ID3 (Iterative Dichotomizer 3) algorithm, by which a decision tree is produced from existing data. There are two main formulas, Entropy and the Gini Index, which can be used in the ID3 algorithm to select the attribute with the best information-gain value at each step of this process. Although, according to surveys of splitting criteria such as [19, 4, 7], the results of using Entropy and the Gini Index are very similar, almost all existing protocols use Entropy to compute the information gain and find the best split at each node.

The difference between these two formulas is that with the Gini Index splits are done, preferably, in such a way that the largest class goes into one pure node while the other classes go into the other node, whereas Entropy normally tries to create a balanced tree. Thus, in distributed computation of the decision tree, where communication cost is the most important issue, we can use the one with better performance, regardless of the negligible difference in the final decision tree. Also, some users prefer to test their database with different types of existing techniques and select the best one depending on the final result and their needs for specific problems. Therefore, different protocols using various techniques need to be proposed in this field of study.

In this paper, we introduce a secure solution for the ID3 algorithm in which the Gini Index is used to compute information gains for the remaining attributes in the current node of the decision tree. We present a multi-party protocol to securely compute each expression of the formula obtained, and three secure multi-party sub-protocols for addition, multiplication and square division. The first one generates private output shares for the involved parties such that their product equals the sum of the input shares. The second one generates private output shares such that their sum equals the product of the input shares. The third one, square division, computes a sub-formula of the information gain formula.

The rest of the paper is organized as follows: Section 2 is dedicated to explaining splitting rules in data classification (decision trees). Section 3 gives a short background of existing work in this field. In Section 4, the new protocol along with the sub-protocols is presented in detail. We discuss the cost and efficiency of the proposed protocol in Section 5. Finally, conclusions and possible future work are described in Section 6.

2 Splitting rules in data classification

The main idea in decision tree creation is to find a normal attribute with the best predicting strength, such that the dataset in each branch is purer than the data in the parent node [3]. This idea is repeated at each level until a node meets the stopping criteria. To implement this concept, splitting rules in data classification are used. The main point of splitting rules is that we have to use a split in such a way that the node impurity is reduced as much as possible. Therefore, an impurity function $\varphi$, on the set of all n-tuples of numbers:

$$(p(c_1), p(c_2), \cdots, p(c_n))$$

where $p(c_i)$ is the probability that the value of the class attribute $C$ in a dataset is $c_i$, is defined to measure the impurity of the current dataset, such that:

• $\forall i \in \{1, \cdots, n\} : p(c_i) \geq 0$
• $\sum_{i=1}^{n} p(c_i) = 1$
• $\varphi$ is a non-negative function
• $\varphi$ is a symmetric function
• $\varphi(\frac{1}{n}, \frac{1}{n}, \cdots, \frac{1}{n})$ has the maximum value
• $\varphi(1, 0, 0, \cdots, 0) = \varphi(0, 1, 0, 0, \cdots, 0) = \cdots = \varphi(0, 0, \cdots, 0, 1) = 0$

The standard function for this idea is the Entropy function:

$$Entropy(T) = -\sum_{i=1}^{n} p(c_i) \log p(c_i)$$

This selection is often used because it satisfies the impurity function requirements. However, any non-negative function satisfying all these requirements can be used as the splitting rule. The other popular formula for this purpose is the Gini diversity Index:

$$Gini(T) = \sum_{j=1}^{n} \sum_{\substack{i=1 \\ i \neq j}}^{n} p(c_j)\, p(c_i)$$

which can also be written as:

$$Gini(T) = 1 - \sum_{i=1}^{n} p^2(c_i)$$

This formula can be computed quickly and easily. Note that, according to the formula, the best split has the minimum value of the Gini Index. One interpretation of the Gini Index formula is as follows: the estimated probability that a random item is in class $c_i$ is $p(c_i)$. Thus, the estimated probability of misclassification under this rule is the Gini Index:

$$Gini(T) = \sum_{j=1}^{n} \sum_{\substack{i=1 \\ i \neq j}}^{n} p(c_j)\, p(c_i) = 1 - \sum_{i=1}^{n} p^2(c_i)$$

Another interpretation of the Gini Index is according to variances [11]. In each node $t$, if we assign 1 to all records with class value $c_i$, and 0 to the others, then the sample variance of these values is:

$$p(c_i)(1 - p(c_i))$$

If we repeat this formula for all possible class values, and compute the summation, we have:

$$Gini(T) = \sum_{i=1}^{n} p(c_i)(1 - p(c_i)) = 1 - \sum_{i=1}^{n} p^2(c_i)$$

In [17], these two methods are compared and the frequency with which they differ is obtained. It is found to be no more than 2% in databases of various sizes. That is why almost all studies in this area report no significant difference between the Information Gain and Gini Index formulas.

3 Privacy-preserving data mining background

Several protocols in the literature have considered privacy-preserving data mining. (Here, privacy for each party mostly refers to the fact that no other party is able to get information about the number of transactions of that party with a specific value for a normal or class attribute.) Some of these techniques work for the case in which all transactions are horizontally partitioned among several parties, such as [12, 22], while others work for vertically partitioned databases, such as [9, 20, 18]. In a horizontally partitioned database, a subset of the database containing whole records is assigned to each party, whereas in a vertically partitioned database each party has the values of some attributes of all records. In both cases, normally, all parties know the complete structure of the database, that is, they know all the attribute names and their possible values. There are also some tools [14, 8, 6, 16] used by those protocols as building blocks, such as Efficient Oblivious Transfer, Oblivious Polynomial Evaluation, Secure Scalar Product, Secure Sum, and Secure Size of Set Intersection.

Lindell and Pinkas [12] present a protocol for running the ID3 algorithm in a secure way between two parties. They use the concept of entropy to find the attribute with the best gain value at each step. A secure protocol for computing $x \ln x$, when $x$ is distributed between two parties


($x = x_1 + x_2$), has been proposed, which helps to compute the information gain of each normal attribute. The main building block used in this technique is Oblivious Polynomial Evaluation (OPE) [13], in which one party (the server) has a polynomial $Q(x)$ and the other party (the user) has an input $a$. They run the protocol and, at the end, the user knows only the value of the polynomial $Q$ at $a$, $Q(a)$, and the server knows nothing about the input $a$. This technique, however, is limited to two parties because of the use of OPE.

Xiao et al. [22] proposed another protocol in which there is no limitation on the number of parties. They also use entropy in their technique. Some sub-protocols, such as secure two-party and multi-party addition, have been presented to implement the whole secure protocol. The main technique used in this protocol is Homomorphic Encryption. In addition to having no limitation on the number of involved parties, their protocol performs better than the previous one.

Du and Zhan [9] introduced a protocol for the vertically partitioned case in which they apply a secure scalar product protocol by using a third party as the commodity server, whose duty is only to generate random vectors and to send some values, according to those vectors, to the parties. This technique is limited to two parties because it uses a two-party secure dot product as the main building block of the whole protocol.

Another algorithm for the vertically partitioned situation has been presented by Vaidya and Clifton [20]. This protocol is applicable to two or more parties. It also works in the case that class or normal attributes are known only to some (and not all) parties. A building block to securely compute the cardinality of set intersection [21, 5, 1] is used in this protocol to determine the majority class of a node when the class is known only to one party. In the tree production process, each party creates and keeps a constraint set of the values of its normal attributes by which it can reach the current node. The Entropy formula is used to compute information gain for each normal attribute.

There is another approach in which data values are perturbed before distribution and reconstructed in the aggregate step [2]. As we can see, especially in horizontally partitioned databases, all the proposed protocols use Entropy as their base formula for computing information gain in the ID3 algorithm.
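The two impurity measures of Section 2 can be compared on plain, non-private data with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, probabilities are estimated as class frequencies, and base-2 logarithms are assumed for the entropy (the text does not fix a base).

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Estimate p(c_i) as the relative frequency of each class label."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]


def entropy(labels):
    """Entropy(T) = -sum_i p(c_i) * log2 p(c_i) (base 2 is an assumption)."""
    return -sum(p * math.log2(p) for p in class_probabilities(labels))


def gini(labels):
    """Gini(T) = 1 - sum_i p(c_i)^2."""
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def gini_pairwise(labels):
    """Gini(T) as the double sum over i != j of p(c_j) * p(c_i)."""
    probs = class_probabilities(labels)
    return sum(pj * pi
               for j, pj in enumerate(probs)
               for i, pi in enumerate(probs)
               if i != j)
```

For a node with two equally frequent classes, both forms of the Gini Index give 0.5 and the entropy is 1 bit; for a pure node, all three functions return zero, as the impurity requirements demand.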

4 Protocol for privacy-preserving ID3 using Gini Index

In this section we propose a new protocol for the ID3 algorithm that preserves the privacy of the parties involved. This protocol uses the Gini Index formula in each step to find the attribute with the best information gain. First, we describe the main protocol, and then the three sub-protocols used in the main protocol are discussed.

4.1 Main protocol

In this protocol we compute the information gain, using the Gini Index, at each step of decision tree creation for a database that is horizontally partitioned among two or more parties. Suppose there are some normal attributes and one class attribute $C$ in a database $S$. We run the gain formula for a normal attribute, $A$, using Gini:

$$Gain(S, A) = Gini(S) - \sum_{i=1}^{n} \frac{|S_{A_i}|}{|S|}\, Gini(S_{A_i})$$

$S_{A_i}$ is the set of transactions having value $A_i$ for the normal attribute $A$. (The set of all possible values for attribute $A$ is $\{A_1, A_2, \cdots, A_n\}$.)

$$Gini(S_{A_i}) = 1 - \sum_{j=1}^{m} P_{A_i C_j}^2$$

in which $P_{A_i C_j}$ is the probability that the value of the class attribute $C$ is $C_j$ among the transactions in which the value of attribute $A$ is $A_i$, that is, in $S_{A_i}$. (The set of all possible values for attribute $C$ is $\{C_1, C_2, \cdots, C_m\}$.) Therefore, we have:

$$Gini(S_{A_i}) = 1 - \sum_{j=1}^{m} \frac{|S_{A_i C_j}|^2}{|S_{A_i}|^2}$$

Because $Gini(S)$ is fixed for all normal attributes, we can eliminate it and select the attribute with the minimum value of the second part of the Gain formula, named $F(S, A)$, at each step when creating the decision tree. Thus, we have to compute $F(S, A)$:

$$\begin{aligned}
F(S, A) &= \sum_{i=1}^{n} \frac{|S_{A_i}|}{|S|}\, Gini(S_{A_i}) = \sum_{i=1}^{n} \frac{|S_{A_i}|}{|S|} \left(1 - \sum_{j=1}^{m} \frac{|S_{A_i C_j}|^2}{|S_{A_i}|^2}\right) \\
&= \frac{|S_{A_1}|}{|S|} \left(1 - \frac{|S_{A_1 C_1}|^2}{|S_{A_1}|^2} - \cdots - \frac{|S_{A_1 C_m}|^2}{|S_{A_1}|^2}\right) + \cdots + \frac{|S_{A_n}|}{|S|} \left(1 - \frac{|S_{A_n C_1}|^2}{|S_{A_n}|^2} - \cdots - \frac{|S_{A_n C_m}|^2}{|S_{A_n}|^2}\right) \\
&\;\;\vdots \\
&= \frac{1}{|S|} \left(|S| - \left(\frac{|S_{A_1 C_1}|^2}{|S_{A_1}|} + \cdots + \frac{|S_{A_1 C_m}|^2}{|S_{A_1}|}\right) - \cdots - \left(\frac{|S_{A_n C_1}|^2}{|S_{A_n}|} + \cdots + \frac{|S_{A_n C_m}|^2}{|S_{A_n}|}\right)\right) \\
&= 1 - \frac{1}{|S|} \left(\sum_{j=1}^{m} \frac{|S_{A_1 C_j}|^2}{|S_{A_1}|} + \cdots + \sum_{j=1}^{m} \frac{|S_{A_n C_j}|^2}{|S_{A_n}|}\right) \\
&= 1 - \frac{1}{|S|} \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{|S_{A_i C_j}|^2}{|S_{A_i}|}
\end{aligned}$$

Again, because $|S|$ is fixed, we only need to compute the sum. Without loss of generality, by considering the inner sum, $\sum_{j=1}^{m} \frac{|S_{A_i C_j}|^2}{|S_{A_i}|}$, we realize that this sum is of the form:

$$\frac{x_1^2 + \cdots + x_m^2}{x_1 + \cdots + x_m} \qquad (1)$$

in which $x_j$ (for $j = 1, \cdots, m$) is $|S_{A_i C_j}|$ and $x_1 + \cdots + x_m$ equals $|S_{A_i}|$. Now, if the database is horizontally partitioned between two or more parties, each expression (1) is in the form of expression (2), in which $k$ is the number of parties and $x_{ij}$ is the part of $x_j$ held by party $P_i$:

$$\frac{\left(\sum_{i=1}^{k} x_{i1}\right)^2 + \cdots + \left(\sum_{i=1}^{k} x_{im}\right)^2}{\sum_{i=1}^{k} x_{i1} + \cdots + \sum_{i=1}^{k} x_{im}} \qquad (2)$$

By using the three sub-protocols described in Sections 4.2.1, 4.2.2 and 4.2.3, expression (2) can be securely computed. These steps are executed repeatedly for all possible values of the current attribute, and the information gain of this attribute is then evaluated. At each level when producing the decision tree, the involved parties run the protocol on their private inputs and, after computing the gain values for all remaining normal attributes, they select the attribute with the best gain value and split the transactions according to the values of that attribute at the current node.

4.2 Sub-protocols

We use the Additive Homomorphic Encryption proposed in [15] for the sub-protocols Multi-party Addition and Multi-party Multiplication.

4.2.1 Multi-party Addition

In this sub-protocol, every party $P_i$, $1 \leq i \leq k$, has a private input $x_i$ and, at the end, each $P_i$ obtains a private share $l_i$, such that:

$$\sum_{i=1}^{k} x_i = \prod_{i=1}^{k} l_i \qquad (3)$$

and no party obtains any information about the others' inputs and outputs. Suppose $E$ is an additive homomorphic encryption, with public key $e$ and private key $d$. Thus, having equation (3), we have:

$$\prod_{i=1}^{k} E(x_i, e) = E\left(\sum_{i=1}^{k} x_i, e\right) = E\left(\prod_{i=1}^{k} l_i, e\right) = E(l_1, e)^{l_2 \cdots l_k} \qquad (4)$$

where the exponent $l_2 \cdots l_k$ stands for successive exponentiation by $l_2$, then $l_3$, and so on up to $l_k$. This sub-protocol contains the following steps:

1. $P_1$ selects an additive homomorphic encryption, keeps the private key $d$ and sends the public key $e$ to all other parties.
2. $P_1$ encrypts its input $x_1$, $E(x_1, e)$, and sends it to $P_2$.
3. For $i = 2$ to $k - 1$
• $P_i$ encrypts its input $x_i$, $E(x_i, e)$, multiplies it by $\prod_{j=1}^{i-1} E(x_j, e)$, and sends $\prod_{j=1}^{i} E(x_j, e)$ to $P_{i+1}$.
4. $P_k$ encrypts its input $x_k$, $E(x_k, e)$, and computes $\prod_{i=1}^{k} E(x_i, e)$.
5. $P_k$ randomly selects its nonzero output share $l_k$, computes its inverse, $l_k^{-1}$, calculates $\left(\prod_{i=1}^{k} E(x_i, e)\right)^{l_k^{-1}}$ and sends it to $P_{k-1}$.
6. For $i = k - 1$ to $2$
• $P_i$ randomly selects its nonzero output share $l_i$, computes its inverse, $l_i^{-1}$, calculates $\left(\prod_{j=1}^{k} E(x_j, e)\right)^{l_k^{-1} \cdots\, l_i^{-1}}$ by raising the received value to the power $l_i^{-1}$, and sends it to $P_{i-1}$.
7. $P_1$ decrypts the received value from $P_2$ and sets it as its output share $l_1$:

$$l_1 = D\left(\left(\prod_{i=1}^{k} E(x_i, e)\right)^{l_k^{-1} \cdots\, l_2^{-1}}, d\right) \qquad (5)$$

Correctness of the algorithm. We have

$$\begin{aligned}
\prod_{i=1}^{k} l_i &= D\left(E\left(\prod_{i=1}^{k} l_i, e\right), d\right) = D\left(E(l_1, e)^{l_2 \cdots l_k}, d\right) \\
&= D\left(\left(\left(\prod_{i=1}^{k} E(x_i, e)\right)^{l_k^{-1} \cdots\, l_2^{-1}}\right)^{l_2 \cdots l_k}, d\right) = D\left(\prod_{i=1}^{k} E(x_i, e), d\right) \\
&= D\left(E\left(\sum_{i=1}^{k} x_i, e\right), d\right) = \sum_{i=1}^{k} x_i
\end{aligned}$$

Security Analysis. Except for $P_1$, no one knows the private key $d$, and therefore only $P_1$ is able to decrypt an encrypted message. In steps (2) to (4), every party $P_2$ to $P_k$ receives an encrypted value from the previous party and, after encrypting its own input and multiplying it by the received value, sends the result to the next party (except $P_k$, which keeps it). Thus, they cannot learn any private data from each other, because they do not have the private key. In steps (5) and (6), each party $P_k$ to $P_2$ selects a random number, which is unknown to all other parties, raises the received value to the power of the inverse of this random number and sends the result to the next party, so nothing is revealed to that party. Finally, in step (7), $P_1$ receives a number which is the product of all encrypted messages raised to the power of the inverses of all random numbers selected by $P_k$ to $P_2$. Therefore, by decrypting this value, $P_1$ is not able to figure out anything about the private inputs belonging to those parties.
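The steps of the addition sub-protocol can be traced in a small single-process simulation. The following Python sketch is illustrative only and rests on several assumptions not stated in the paper: it uses a textbook Paillier scheme [15] with toy primes `P` and `Q`, it runs all k parties inside one function instead of exchanging messages, and the helper names (`encrypt`, `decrypt`, `random_share`, `multiparty_addition`) are hypothetical. Shares are drawn from the invertible residues modulo `N` so that each inverse exists, which means equation (3) is checked modulo `N`. The three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
import math
import random

# Toy Paillier parameters (assumption: far too small for any real use).
P, Q = 1009, 1013
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def random_share():
    """Pick a random nonzero value that is invertible modulo N."""
    while True:
        value = random.randrange(1, N)
        if math.gcd(value, N) == 1:
            return value


def encrypt(m):
    """E(m, e): Paillier encryption with g = N + 1."""
    r = random_share()
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def decrypt(c):
    """D(c, d): Paillier decryption, returns the plaintext modulo N."""
    u = pow(c, LAM, N2)
    return (u - 1) // N * MU % N


def multiparty_addition(inputs):
    """Simulate steps 1-7; returns shares [l_1, ..., l_k]."""
    k = len(inputs)
    # Steps 2-4: each party multiplies in its own ciphertext,
    # so the running value is an encryption of the sum of the inputs.
    acc = encrypt(inputs[0])
    for x in inputs[1:]:
        acc = acc * encrypt(x) % N2
    # Steps 5-6: P_k down to P_2 pick l_i and raise to the power l_i^{-1}.
    shares = [None] * k
    for i in range(k - 1, 0, -1):
        shares[i] = random_share()
        acc = pow(acc, pow(shares[i], -1, N), N2)
    # Step 7: P_1 decrypts and keeps the result as l_1.
    shares[0] = decrypt(acc)
    return shares
```

Calling `multiparty_addition([12, 30, 7])` returns three shares whose product modulo `N` is 49, the sum of the inputs. In this sketch no single share other than the full product depends on the inputs in a recognisable way, but the simulation makes no security claim of its own.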


4.2.2 Multi-party Multiplication

In this sub-protocol, every party $P_i$, $1 \leq i \leq k$, has a private input $x_i$, and at the end each $P_i$ obtains a private share $l_i$, such that:

$$\prod_{i=1}^{k} x_i = \sum_{i=1}^{k} l_i \qquad (6)$$

and no party obtains any information about the others' inputs and outputs. Following are the steps of the sub-protocol:

1. $P_1$ selects an additive homomorphic encryption, keeps the private key $d$ and sends the public key $e$ to all other parties.
2. $P_1$ encrypts its input $x_1$, $E(x_1, e)$, and sends it to $P_2$.
3. For $i = 2$ to $k - 1$
• $P_i$ raises the received value to the power of its input $x_i$, $E(x_1, e)^{x_2 \cdots x_i}$, and sends it to $P_{i+1}$.
4. For $i = k$ to $2$
• $P_i$ randomly selects its nonzero output share $l_i$, encrypts it, $E(l_i, e)$, computes its inverse, $E(l_i, e)^{-1}$, multiplies the received value by that, $E(x_1, e)^{x_2 \cdots x_k} * \prod_{j=i}^{k} E(l_j, e)^{-1}$, and sends it to $P_{i-1}$.
5. $P_1$ decrypts the received value from $P_2$ and sets it as its output share $l_1$:

$$l_1 = D\left(E(x_1, e)^{x_2 \cdots x_k} * \prod_{i=2}^{k} E(l_i, e)^{-1}, d\right) \qquad (7)$$

Correctness of the algorithm. We have

$$\begin{aligned}
\sum_{i=1}^{k} l_i &= D\left(E\left(\sum_{i=1}^{k} l_i, e\right), d\right) = D\left(\prod_{i=1}^{k} E(l_i, e), d\right) = D\left(E(l_1, e) * \prod_{i=2}^{k} E(l_i, e), d\right) \\
&= D\left(E(x_1, e)^{x_2 \cdots x_k} * \prod_{i=2}^{k} E(l_i, e)^{-1} * \prod_{i=2}^{k} E(l_i, e), d\right) = D\left(E(x_1, e)^{x_2 \cdots x_k}, d\right) \\
&= D\left(E\left(\prod_{i=1}^{k} x_i, e\right), d\right) = \prod_{i=1}^{k} x_i
\end{aligned}$$

The security analysis for this sub-protocol is the same as that of the previous one and is omitted here because of space limitations.

4.2.3 Secure multi-party square division

The last sub-protocol computes the value of expression (2), where for every $i$ the private inputs $x_{i1}, \cdots, x_{im}$ belong to party $P_i$; at the end of this sub-protocol, every party has the value of expression (2). To keep it simple, and without loss of generality, we assume there are two parties, $k = 2$, and that the number of possible class values is two, $m = 2$. Thus, expression (2) becomes:

$$\frac{(x_{11} + x_{21})^2 + (x_{12} + x_{22})^2}{(x_{11} + x_{21}) + (x_{12} + x_{22})} \qquad (8)$$

First we run the secure two-party multiplication sub-protocol, described in Section 4.2.2, for each pair of inputs $(x_{11}, x_{21})$ and $(x_{12}, x_{22})$, in which $x_{11}$ and $x_{12}$ belong to $P_1$ and $x_{21}$ and $x_{22}$ belong to $P_2$. At the end, $P_1$ obtains $s_{11}$ and $s_{12}$ and $P_2$ obtains $s_{21}$ and $s_{22}$, such that:

$$x_{11} * x_{21} = s_{11} + s_{21}, \qquad x_{12} * x_{22} = s_{12} + s_{22}$$

Now, each party considers its inputs as follows:

$$P_1: \; z_1 = x_{11}^2 + x_{12}^2 + 2 s_{11} + 2 s_{12}, \qquad w_1 = x_{11} + x_{12}$$
$$P_2: \; z_2 = x_{21}^2 + x_{22}^2 + 2 s_{21} + 2 s_{22}, \qquad w_2 = x_{21} + x_{22}$$

and they securely and jointly compute $\frac{z_1 + z_2}{w_1 + w_2}$. They first run the secure two-party addition sub-protocol, described in Section 4.2.1, for each of the pairs $(z_1, z_2)$ and $(w_1, w_2)$; $P_1$ obtains $m_1$ and $n_1$, and $P_2$ obtains $m_2$ and $n_2$, such that:

$$z_1 + z_2 = m_1 * m_2, \qquad w_1 + w_2 = n_1 * n_2.$$

Next, they set:

$$P_1: \; r_1 = \frac{m_1}{n_1}, \qquad P_2: \; r_2 = \frac{m_2}{n_2}.$$

Then, they send $r_1$ and $r_2$ to each other. Now, each party computes $r_1 * r_2$, which equals expression (8), as follows:

$$r_1 * r_2 = \frac{m_1 * m_2}{n_1 * n_2} = \frac{z_1 + z_2}{w_1 + w_2} = \frac{(x_{11} + x_{21})^2 + (x_{12} + x_{22})^2}{(x_{11} + x_{21}) + (x_{12} + x_{22})}$$

Security Analysis. This sub-protocol uses the two former sub-protocols to compute expression (8), and is therefore secure by the composition theorem [10]. First, secure multiplication is used for each pair of the parties' inputs. Then the private output shares, along with the private inputs, inside linear combinations, become inputs for the secure addition protocol. Also, in the last step, each party receives a number which is the quotient of two inputs of the other party, and thus is not able to find the value of each individual input.
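The algebra of the square-division sub-protocol for k = 2 and m = 2 can be checked with a short Python sketch. It is not a secure implementation: the two helper functions are hypothetical idealized stand-ins that produce shares with the same relations as the multiplication and addition sub-protocols (equations (6) and (3)) but without any encryption, and exact rational arithmetic via `fractions.Fraction` is assumed in place of the plaintext space of the encryption scheme. The total count in the denominator is assumed to be nonzero.

```python
import random
from fractions import Fraction


def ideal_mul_shares(x1, x2):
    """Stand-in for Section 4.2.2 with k = 2: s1 + s2 == x1 * x2."""
    s1 = random.randint(-1000, 1000)
    return s1, x1 * x2 - s1


def ideal_add_shares(z1, z2):
    """Stand-in for Section 4.2.1 with k = 2: m1 * m2 == z1 + z2."""
    m2 = Fraction(random.randint(1, 1000))  # nonzero share chosen by P2
    m1 = Fraction(z1 + z2) / m2             # share obtained by P1
    return m1, m2


def square_division(p1_counts, p2_counts):
    """Return r1 * r2 for P1's counts (x11, x12) and P2's counts (x21, x22)."""
    x11, x12 = p1_counts
    x21, x22 = p2_counts

    # Additive shares of the cross products x11*x21 and x12*x22.
    s11, s21 = ideal_mul_shares(x11, x21)
    s12, s22 = ideal_mul_shares(x12, x22)

    # Local inputs of each party to the addition sub-protocol.
    z1 = x11 ** 2 + x12 ** 2 + 2 * s11 + 2 * s12
    w1 = x11 + x12
    z2 = x21 ** 2 + x22 ** 2 + 2 * s21 + 2 * s22
    w2 = x21 + x22

    # Multiplicative shares of the numerator and the denominator.
    m1, m2 = ideal_add_shares(z1, z2)
    n1, n2 = ideal_add_shares(w1, w2)

    # Each party publishes only its quotient r_i.
    r1 = m1 / n1
    r2 = m2 / n2
    return r1 * r2
```

For example, with P1 holding counts (3, 1) and P2 holding (2, 4), the class totals are 5 and 5, so expression (8) is (25 + 25) / 10 = 5, and `square_division((3, 1), (2, 4))` returns `Fraction(5, 1)` regardless of the random shares drawn.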

5 Protocol cost and efficiency

Computation and communication costs of the protocol depend on the size of the database, the number of parties involved, the number of attributes and the number of possible values for attributes (on average). This dependency is common to all protocols presented. To compute the cost of the protocol, we use the following notation:

• The number of parties involved in the protocol is denoted by $k$.
• The number of remaining normal attributes at the current step and node is denoted by $a$.
• The number of possible values for those normal attributes, on average, is denoted by $v$.
• The number of possible values for the class attribute is denoted by $c$.
• The number of bits exchanged from one party to another party is denoted by $b$.
• The number of bits exchanged for one $r_i$ (for $i = 1, \cdots, p$) is denoted by $e$.

First, we review the communication and computation costs of the three sub-protocols Secure Multi-party Addition, Multiplication and Square Division. The two former sub-protocols cost the same. Thus we compute the cost of the first one. The communication cost is linear in the number of parties. Each party $P_1, \cdots, P_{k-1}$ has to send a number to the next party, and each party $P_k, \cdots, P_2$ has to send a number to the previous party. Therefore, the overall communication cost for each sub-protocol is $b * 2 * (k - 1)$. For the computation cost, $P_1$ has to do one encryption and one decryption, in the first and last steps. Each party $P_2, \cdots, P_k$ has to encrypt its input value and multiply it by the received value in steps 3 and 4. Each of them also has to randomly select its output share and compute its inverse, which can be done in advance, and compute the power of a number. Therefore, there are $k$ encryptions, one decryption, $k - 1$ multiplications and $k - 1$ power computations. We denote this cost by $CP_k$.

The last sub-protocol computes the expression

$$\sum_{j=1}^{m} \frac{|S_{A_i C_j}|^2}{|S_{A_i}|} \qquad (9)$$

For it, we need $\frac{k * (k - 1)}{2} * c$ secure two-party multiplications, one secure $k$-party addition and $k * (k - 1)$ exchanges. Therefore, the communication cost of this sub-protocol (to compute expression (9)) is

$$k * (k - 1) * \left(b * c + 1 + \frac{2 * b}{k}\right).$$

The computational overhead for computing expression (9) is

$$\left(\frac{k * (k - 1)}{2} * c * CP_2 * CP_k\right).$$

For the whole protocol, at each step of the creation of the decision tree, for each node we have to compute expression (9) $(a * v)$ times. Thus, by assuming that $c$ is dominated by $b$ and $k$, the overall cost at each node is:

$$\text{Communication Cost} = O(a * v * k^2 * b)$$
$$\text{Computational Cost} = O(a * v * k^2 * CP_2 * CP_k)$$

Normally, the number of parties involved in the protocol is not too large, in contrast to the number of transactions, which could be very large. Therefore, $k^2$ in the computational and communication cost is not a large number, and the efficiency of the protocol is not negatively affected by this parameter.

6 Future work and conclusion

In this paper a protocol is presented for a privacy-preserving ID3 algorithm using the Gini Index for databases partitioned horizontally among two or more parties. The Gini Index is one of the simplest formulas which can be used to compute the information gain of each normal attribute. A main sub-protocol is proposed by which the involved parties can compute each expression of the summation in the gain formula in such a way that no party obtains any private information from the other parties. This protocol has no limitation on the number of parties that can use it simultaneously, and requires only limited interaction. It would be interesting to extend this protocol for use with arbitrarily partitioned data (both horizontally and vertically). Also, the two sub-protocols, secure multi-party addition and secure multi-party multiplication, can be used as building blocks for different secure multi-party computation protocols in privacy-preserving data mining.

7 Acknowledgments

This work was partially supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Ontario Research Network for Electronic Commerce (ORNEC).

References

[1] R. Agrawal, A. V. Evfimievski, and R. Srikant. Information Sharing Across Private Databases. In ACM Special Interest Group on Management of Data (SIGMOD) Conference, pages 86–97, 2003.
[2] R. Agrawal and R. Srikant. Privacy-Preserving Data Mining. In ACM Special Interest Group on Management of Data (SIGMOD) Conference, pages 439–450, 2000.
[3] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Chapman & Hall, New York, 1984.
[4] L. Breiman. Technical Note: Some Properties of Splitting Criteria. Machine Learning, 24(1):41–47, 1996.
[5] C. Cachin and J. Camenisch, editors. Advances in Cryptology - EUROCRYPT 2004, International Conference on the


Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, May 2-6, 2004, Proceedings, volume 3027 of Lecture Notes in Computer Science. Springer, 2004.
[6] C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Y. Zhu. Tools for Privacy Preserving Distributed Data Mining. ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 4(2):28–34, 2003.
[7] DTREG. How Trees are Built. Software For Predictive Modeling and Forecasting. http://www.dtreg.com/treebuild.htm, 2006. (Last posted: 22/7/2006).
[8] W. Du and M. Atallah. Privacy-Preserving Cooperative Statistical Analysis. In ACSAC '01: Proceedings of the 17th Annual Computer Security Applications Conference, pages 102–110, New Orleans, Louisiana, USA, December 10-14 2001.
[9] W. Du and Z. Zhan. Building Decision Tree Classifier on Private Data. In CRPITS'14: Proceedings of the IEEE international conference on Privacy, security and data mining, pages 1–8, Darlinghurst, Australia, 2002. Australian Computer Society, Inc.
[10] O. Goldreich. Foundations of Cryptography: Volume 2, Basic Applications. Cambridge University Press, New York, NY, USA, 2004.
[11] R. J. Light and B. H. Margolin. An Analysis of Variance for Categorical Data. In Journal of The American Statistical Association, volume 66, pages 534–544, 1971.
[12] Y. Lindell and B. Pinkas. Privacy Preserving Data Mining. In CRYPTO, pages 36–54, 2000.
[13] M. Naor and B. Pinkas. Oblivious Transfer and Polynomial Evaluation. In STOC '99: Proceedings of the thirty-first annual ACM Symposium on Theory of Computing, pages 245–254, New York, NY, USA, 1999. ACM Press.
[14] M. Naor and B. Pinkas. Efficient Oblivious Transfer Protocols. In SODA '01: Proceedings of the twelfth annual ACM-SIAM symposium on Discrete algorithms, pages 448–457, Philadelphia, PA, USA, 2001. Society for Industrial and Applied Mathematics.
[15] P. Paillier. Public-Key Cryptosystems Based on Composite Degree Residuosity Classes. In EUROCRYPT, pages 223–238, 1999.
[16] B. Pinkas. Cryptographic Techniques for Privacy-Preserving Data Mining. ACM Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 4(2):12–19, 2002.
[17] L. E. Raileanu and K. Stoffel. Theoretical Comparison between the Gini Index and Information Gain Criteria. Annals of Mathematics and Artificial Intelligence, 41(1):77–93, 2004.
[18] E. Suthampan and S. Maneewongvatana. Privacy Preserving Decision Tree in Multi Party Environment. In Asia Information Retrieval Symposium (AIRS), pages 727–732, 2005.
[19] Salford Systems. Do Splitting Rules Really Matter? http://www.salford-systems.com/423.php, 2006. (Last posted: 22/7/2006).
[20] J. Vaidya and C. Clifton. Privacy-Preserving Decision Trees over Vertically Partitioned Data. In Data and Application Security (DBSec), pages 139–152, 2005.
[21] J. Vaidya and C. Clifton. Secure Set Intersection Cardinality with Application to Association Rule Mining. Journal of Computer Security, 13(4):593–622, 2005.
[22] M.-J. Xiao, L.-S. Huang, Y.-L. Luo, and H. Shen. Privacy Preserving ID3 Algorithm over Horizontally Partitioned Data. In Parallel and Distributed Computing, Applications and Technologies, pages 239–243, 2005.
