A Privacy-Protecting Architecture for Collaborative Filtering via Forgery and Suppression of Ratings Javier Parra-Arnau, David Rebollo-Monedero and Jordi Forné http://sites.google.com/site/javierparraarnau/ Department of Telematics Engineering Technical University of Catalonia (UPC) Barcelona, Spain Leuven, Belgium
September 15, 2011
Outline
Introduction
State of the Art
An Architecture for Privacy Protection in Collaborative Filtering based Recommendation Systems
Formulation of the Optimal Trade-Off between Privacy and Utility
Conclusions
Introduction
Information Overload
The amount of information on the Web has grown exponentially since the advent of the Internet
Collaborative Filtering
A recommendation system is a filtering system that suggests information items that are likely to be of interest to the user
Recommendation systems based on collaborative filtering (CF) algorithms
Examples include Amazon, Digg, Movielens and Netflix
User Profiles
Users need to communicate their preferences to the recommender in order to obtain a prediction for those items they have not yet considered
[Figure: histogram of a user profile across movie genres (Drama, Thriller, Comedy, Action, Adventure, Crime, Romance, Sci-Fi, War, Mystery, Documentary, Animation, Fantasy, Horror, Children, Musical, Western, Film-Noir, IMAX), with bar values ranging from 80 down to 3]
Privacy Risk
The privacy risks perceived by users include computers “figuring things out” about them, unsolicited marketing, court subpoenas, and government surveillance [Cranor 03]
[Figure: from the predictions of the recommendation system, an observer infers "she’s pregnant!"]
Forgery and Suppression of Ratings
Submitting false information and refusing to give private information are strategies accepted by users concerned with their privacy [Fox 00, Hoffman 99]
Our approach relies upon the forgery and suppression of ratings
[Figure: suppression example. The user has read these books; the recommendation system returns predictions]
Contribution (I)
Our architecture protects user privacy to a certain extent, at the cost of a utility loss measured as the forgery rate and the suppression rate
Contribution (II)
Mathematical formulation of the optimal trade-off among privacy, forgery rate ρ and suppression rate σ
Privacy as the Shannon entropy of the user’s apparent profile
\[
P(\rho, \sigma) = \max_{\substack{r,\, s \\ r_i \geq 0,\ \sum_i r_i = \rho \\ q_i \geq s_i \geq 0,\ \sum_i s_i = \sigma}} \mathrm{H}\!\left( \frac{q + r - s}{1 + \rho - \sigma} \right)
\]
Our proposal could be used in combination with other existing approaches
State of the Art
Privacy Protection in Recommendation Systems
The state-of-the-art approaches may be classified according to these main strategies:
perturbing the information provided by users [Polat 03, 05, Agrawal 01, Kargupta 03, Huang 05],
using cryptographic techniques [Canny 02, Ahmad 07, Zhan 10], and
distributing the information collected [Miller 04, Berkovsky 07]
[Figure: perturbation [Polat 03]. Ratings reach the recommendation system with offsets added, shown as 3.2 + 1.5, 2.9 – 0.7, 4.1, 4.4 – 2.7 and 5.6, 3.3 + 1.0, 1.1, 3.4 – 0.1]
[Figure: cryptographic techniques [Canny 02]. The encrypted profiles q1, ..., q5 are combined so that Enc(q1) + ... + Enc(q5) = Enc(q1 + ... + q5)]
[Figure: distributing the information collected [Miller 04]. Labels: ratings, central server]
An Architecture for Privacy Protection in CF-based Recommendation Systems
Overview
Profiling is accomplished on the basis of user ratings
Information items are classified as known or unknown
Users may wish to submit ratings to unknown items (forgery) and refrain from rating known items (suppression)
[Figure: the user's known items and unknown items, and the recommendation system]
User Profile Model
[Figure: two examples of user profiles. Movielens: a histogram across movie genres (Drama, Thriller, Comedy, Action, Adventure, Crime, Romance, Sci-Fi, War, Mystery, Documentary, Animation, Fantasy, Horror, Children, Musical, Western, Film-Noir, IMAX), with bar values ranging from 80 down to 3. Jinni: a tag cloud with tags such as Witty, Buddies, Clever, Fall in Love, Humorous, Couple Relations, Parents and Children, Feel Good, Best Friends, Offbeat, Emotional, Human Spirit, Slow, Teenage Life, Sincere, Human Nature, Coming of Age, Touching, Village Life]
[Toubiana 10, Fredrikson 11] suggest representing user profiles as histograms of absolute frequencies
We model the profile of a user as a probability mass function (PMF)
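The step from a histogram of absolute frequencies to a PMF is a normalization by the total count. A minimal sketch in Python follows; the function name `to_pmf` and the example counts are hypothetical and chosen only for illustration.

```python
def to_pmf(counts):
    """Normalize a histogram of absolute frequencies into a PMF.

    counts: dict mapping category name -> number of rated items.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot build a profile from an empty histogram")
    return {category: n / total for category, n in counts.items()}


# Hypothetical histogram of ratings per category (not taken from the slides).
counts = {"Drama": 8, "Comedy": 6, "Action": 4, "Horror": 2}
q = to_pmf(counts)
print(q)
```

The resulting values are relative frequencies that sum to one, which is the form of profile the rest of the formulation assumes.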
User Profile Construction
Our architecture requires estimating the actual profile of a user to help them decide which items should be rated and which should not
Histogram based on the categories provided by the recommender
Categorize items by exploring web pages and using the vector space model [Salton 75]
[Figure: an item categorized as books \ literature & fiction \ genre fiction, and an item whose category is unknown]
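When the recommender does not supply a category, the vector space model can be used to pick one from the text describing the item. The sketch below is a simplified illustration, assuming raw term-frequency vectors and cosine similarity (the usual formulation also weights terms, for instance with tf-idf). The function names and the per-category term lists are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector of a text."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def categorize(description, category_terms):
    """Return the category whose term vector is closest to the description."""
    item = tf_vector(description)
    return max(category_terms,
               key=lambda c: cosine(item, tf_vector(category_terms[c])))


# Hypothetical term lists standing in for text gathered from web pages.
categories = {
    "science": "physics universe time scientists theory",
    "sports": "soccer coach team players training",
}
label = categorize("a coach can meld kids into a team", categories)
print(label)
```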
Adversarial Model
Passive attacker capable of crawling through the items rated by a user
The attacker observes the apparent user profile t, a perturbed version of the actual user profile q
[Figure: ratings and predictions exchanged between the user and the recommender. With no protection, the recommender sees the actual profile q; with forgery and suppression of ratings, it sees the apparent profile t]
Privacy Measure
We measure privacy as the Shannon entropy of the user’s apparent profile t
\[
\mathrm{H}(t) = -\sum_{i=1}^{n} t_i \log_2 t_i
\]
where n is the number of categories
Accordingly, privacy is compromised whenever the user’s preferences are biased towards certain categories of interest
[Figure: two profiles over categories 1 to 4, one labelled minimum privacy and the other labelled maximum privacy]
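The two extremes of this measure can be checked numerically. The sketch below computes the Shannon entropy in bits for a profile concentrated on a single category and for a uniform profile over four categories; the example profiles are illustrative assumptions, and terms with zero probability are skipped under the usual convention 0 log 0 = 0.

```python
import math


def entropy(t):
    """Shannon entropy, in bits, of a profile given as a list of probabilities."""
    return sum(-p * math.log2(p) for p in t if p > 0)


peaked = [1.0, 0.0, 0.0, 0.0]   # all ratings in one category
uniform = [0.25] * 4            # ratings spread evenly over four categories

h_min = entropy(peaked)
h_max = entropy(uniform)
print(h_min, h_max)
```

The concentrated profile yields zero entropy and the uniform profile yields log2(4) = 2 bits, the largest value attainable with four categories.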
Architecture
[Figure: block diagram of the architecture, divided into a user side and a network side. Blocks: Communication Manager, Category Extractor, Known / Unknown Items Classifier, User Profile Constructor, Forgery and Suppression Generator, Forgery Alarm, Suppression Alarm, Information Provider, Recommendation System. Legend: uncategorized item, categorized item, known item, unknown item, rated item]
Architecture
Block Functionality: Communication Manager
Communication with the recommender
Retrieve information about the items explored by the user
[Figure: examples of retrieved items]
Description - Starting at the beginning, the book explores how JavaScript originated and evolved into what it is today. A detailed discussion of the components that make up a JavaScript implementation follows, with specific focus on standards such as ECMAScript and the Document Object Model (DOM).
Category - books \ computers & internet \ web development
Average Customer Review 4.5/5
Description - Stephen Hawking, one of the most brilliant theoretical physicists in history, wrote the modern classic A Brief History of Time to help nonscientists understand the questions being asked by scientists today.
Category - books \ science
Average Customer Review 4/5
Description - Written by soccer great and championship Stanford coach Bobby Clark, this book tells you how, starting at point zero, an uninitiated coach can meld kids into a team and help them enjoy one of the most rewarding experiences of their youth.
Category - books \ sports \ coaching \ soccer
Average Customer Review 4.5/5
Description - You’ve made it! Your baby has turned one! Now the real fun begins. From temper tantrums to toilet training, raising a toddler brings its own set of challenges and questions, and Toddler 411 has the answers.
Category - books \ parenting & families \ parenting
Average Customer Review 3/5
Architecture
Block Functionality: Category Extractor
Obtain categories associated with the items downloaded by the Communication Manager
Architecture
Block Functionality: Known / Unknown Items Classifier
The user classifies the items as known or unknown
[Figure: the categorized items (books \ computers & internet \ web development, books \ science, books \ sports \ coaching \ soccer, books \ parenting & families \ parenting) are split into known items and unknown items]
Architecture
Block Functionality: User Profile Constructor
Computes the actual user profile q
Architecture
Block Functionality: Forgery and Suppression Generator
Centerpiece of the architecture
The user specifies a forgery rate ρ and a suppression rate σ
[Figure: example with ρ = 5% and σ = 10%. The generator output is labelled FORGERY 5% and SUPPRESSION 8% and 2%]
Architecture
Block Functionality: Suppression Alarm
Generate an alarm when an item should be suppressed
[Figure: suppression example with the labels science, parenting, 8% and 2%]
Architecture
Block Functionality: Forgery Alarm
Generate an alarm when an item should be forged
[Figure: forgery example with the labels "population’s rating", computers, sports and 5%]
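One way to picture the two alarm blocks is as a lookup on the forgery and suppression tuples. The slides do not state the triggering rule, so the sketch below rests on an assumption: a suppression alarm is raised for a known item whose category carries a positive suppression mass, and a forgery alarm is raised for an unknown item whose category carries a positive forgery mass. The function name `alarm` and the example values are hypothetical.

```python
def alarm(category, known, r, s):
    """Return 'suppression', 'forgery' or None for one item.

    category: category of the item being explored.
    known:    True if the user classified the item as known.
    r, s:     dicts mapping category -> forgery / suppression mass.
    Assumed rule (not specified in the slides): any positive mass triggers.
    """
    if known and s.get(category, 0.0) > 0:
        return "suppression"
    if not known and r.get(category, 0.0) > 0:
        return "forgery"
    return None


# Hypothetical tuples with a forgery rate of 5% and a suppression rate of 10%.
r = {"sports": 0.05}
s = {"science": 0.08, "parenting": 0.02}

print(alarm("science", True, r, s))
print(alarm("sports", False, r, s))
print(alarm("computers", True, r, s))
```

In either case the alarm only warns; whether to follow it remains a decision of the user.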
Formulation of the Optimal Trade-Off between Privacy and Utility
Trade-Off between Privacy and Utility
The degradation in the accuracy of predictions is measured as σ and ρ
We model items as r.v.’s taking on values in a common finite alphabet of n categories
We define
q as the actual user profile
ρ ∈ [0, 1) as the forgery rate
σ ∈ [0, 1) as the suppression rate
$r = (r_1, \dots, r_n)$ as the forgery tuple, with $r_i \geq 0$ and $\sum_i r_i = \rho$
$s = (s_1, \dots, s_n)$ as the suppression tuple, with $q_i \geq s_i \geq 0$ and $\sum_i s_i = \sigma$
Accordingly, the user’s apparent profile is defined as
\[
t = \frac{q + r - s}{1 + \rho - \sigma}
\]
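The definition of the apparent profile translates directly into a few lines of code. The sketch below checks the constraints on the forgery and suppression tuples and then applies the normalization; the function name and the example tuples are illustrative assumptions.

```python
def apparent_profile(q, r, s):
    """Apparent profile t = (q + r - s) / (1 + rho - sigma).

    q: actual profile (PMF), r: forgery tuple, s: suppression tuple,
    all given as lists of equal length.
    """
    if any(ri < 0 for ri in r):
        raise ValueError("forgery components must be non-negative")
    if any(si < 0 or si > qi for qi, si in zip(q, s)):
        raise ValueError("suppression components must satisfy 0 <= s_i <= q_i")
    rho, sigma = sum(r), sum(s)
    norm = 1 + rho - sigma
    return [(qi + ri - si) / norm for qi, ri, si in zip(q, r, s)]


# Hypothetical example: forge 5% in the last category, suppress 10% in the first.
q = [0.5, 0.3, 0.2]
r = [0.0, 0.0, 0.05]
s = [0.10, 0.0, 0.0]
t = apparent_profile(q, r, s)
print(t)
```

Dividing by 1 + ρ − σ ensures that t is again a PMF, since the forged mass ρ is added to the profile and the suppressed mass σ is removed from it.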
Trade-Off between Privacy and Utility
Privacy is measured as the Shannon entropy of the user’s apparent profile
The privacy-forgery-suppression function
\[
P(\rho, \sigma) = \max_{\substack{r,\, s \\ r_i \geq 0,\ \sum_i r_i = \rho \\ q_i \geq s_i \geq 0,\ \sum_i s_i = \sigma}} \mathrm{H}\!\left( \frac{q + r - s}{1 + \rho - \sigma} \right)
\]
This formulation specifies the key functional block of our architecture, namely the ‘Forgery and Suppression Generator’
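To make the optimization concrete, the sketch below chooses r and s with a water-filling heuristic: the forged mass ρ raises the least-populated categories to a common level, and the suppressed mass σ lowers the most-populated categories to a common level. This flattens the apparent profile and therefore tends to raise its entropy. It is an illustrative assumption, not the solution derived by the authors, and all function names and example values are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits."""
    return sum(-x * math.log2(x) for x in p if x > 0)


def fill_level(q, budget):
    """Level a with sum(max(a - q_i, 0)) == budget, found by bisection."""
    lo, hi = min(q), max(q) + budget
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(max(mid - x, 0.0) for x in q) < budget:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def cut_level(q, budget):
    """Level b with sum(max(q_i - b, 0)) == budget, found by bisection."""
    lo, hi = 0.0, max(q)
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(max(x - mid, 0.0) for x in q) > budget:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def forgery_suppression(q, rho, sigma):
    """Heuristic forgery tuple r and suppression tuple s for rates rho, sigma."""
    a = fill_level(q, rho)
    b = cut_level(q, sigma)
    r = [max(a - x, 0.0) for x in q]   # forge where the profile is low
    s = [max(x - b, 0.0) for x in q]   # suppress where the profile is high
    return r, s


def apparent_profile(q, r, s):
    norm = 1 + sum(r) - sum(s)
    return [(qi + ri - si) / norm for qi, ri, si in zip(q, r, s)]


# Hypothetical profile with rho = 5% and sigma = 10%.
q = [0.6, 0.3, 0.1]
r, s = forgery_suppression(q, rho=0.05, sigma=0.10)
t = apparent_profile(q, r, s)
print(entropy(q), entropy(t))
```

In this example the forged mass goes to the smallest category and the suppressed mass is taken from the largest one, so the entropy of the apparent profile t is higher than that of the actual profile q.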
Conclusions
Conclusions
The forgery and suppression of ratings arise as two simple mechanisms in terms of infrastructure, but they come at the cost of a loss in utility, namely the degradation in the accuracy of the predictions
We propose an architecture that implements these two mechanisms in those CF-based recommendation systems that profile users exclusively from their ratings
The centerpiece of our approach is a module responsible for computing the tuples of forgery r and suppression s
This information is used to warn the user when their privacy is being compromised
It is up to the user to decide whether to forge or eliminate a rating
We present a formulation of the optimal trade-off among privacy, forgery rate and suppression rate