Achieving Optimal Privacy in Trust-Aware Collaborative Filtering Recommender Systems
1. Achieving Optimal Privacy
in Trust-Aware
Social Recommender
Systems
Nima Dokoohaki, Cihan Kaleli,
Huseyin Polat, Mihhail Matskin
The Second International Conference on Social Informatics
(SocInfo’10)
27-29 October, 2010,
Laxenburg, Austria
2. Emergence of Trust in
Social Recommender Systems
• Most successful recommenders employ well-known
collaborative filtering (CF) techniques
- Social Recommender Systems (SRS) – CF-based
recommenders that use a social network as their backbone.
• CF automates the word-of-mouth process
- Finding users similar to the active user and suggesting
items rated highly in the past by those similar-taste
users
• Shortcoming: sparsity of the user-rating matrix
- There are always numerous items, and the ratings scored
by users are sparse, so the step of finding similar
users often fails. Trust is proposed as a remedy.
3. Extending Social Recommender
Systems with Trust Metric
• Extend CF recommenders as follows:
- Utilizing a trust metric, which enables a trust-based
heuristic to propagate and find users who are
trustworthy with respect to the active user for whom we
are gathering/generating recommendations.
• Trust has been shown to improve the accuracy of
recommenders (Golbeck, Ziegler, Massa, ...)
• For a complete list of problems addressed by trust-aware
recommenders, see:
- Massa, P., & Avesani, P. Trust Metrics in Recommender
Systems. In Computing with Social Trust (pp. 259-285),
2009.
4. Problems with Existing Trust-aware
Recommenders
• Privacy and lack of decentralization ...
• Growing concern about vulnerability to shilling
attacks:
- Current implementations are centralized or not tested in
a decentralized fashion
• Current research has paid little attention to clearly
addressing the privacy issues surrounding the
architecture and components of trust recommenders.
5. Privacy issues with
Social Recommender Systems
• CF systems, including social network-based ones, have
several advantages. However, they fail to protect
users’ privacy.
• Data collected for CF can also be used for
unsolicited marketing, government surveillance,
profiling users, etc.
• Users who remain concerned about their privacy
- might decide to give false data, which hampers the
production of truthful recommendations.
• This in turn decreases the accuracy of the
recommender system.
6. Motivation and Contributions
• Emphasizing importance of dealing with privacy issues
surrounding the architecture and components of trust-
aware recommender systems.
- Extending Architecture of Trust Recommenders with
Privacy Preserving Module
- Proposing the use of data perturbation techniques to
protect users’ privacy while still providing accurate
recommendations.
• Dealing with conflict of privacy goals and trust goals
through Agent Mechanisms
- Utilizing Pareto efficiency
9. Privacy Protection Methodology:
Data Normalization with z-score
• Normalization of data is critical to increasing the
privacy level.
• For privacy protection, users employ data perturbation
techniques. We propose using a normalized version of the
actual ratings to improve the privacy level.
• As a result, z-score values are utilized.
• *The z-score of an item indicates how far, and in what
direction, the item deviates from its distribution's
mean, expressed in units of the distribution's standard
deviation.
*W. Du and H. Polat. Privacy-preserving collaborative filtering. International
Journal of Electronic Commerce, 9(4):9-36, 2005.
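The z-score transformation described above can be sketched in a few lines. This is a minimal illustration (the function name and example ratings are ours, not from the paper):

```python
import numpy as np

def z_scores(ratings):
    """Convert a user's raw ratings to z-scores (zero mean, unit variance)."""
    ratings = np.asarray(ratings, dtype=float)
    mu = ratings.mean()
    sigma = ratings.std()
    if sigma == 0:  # all ratings identical: no deviation to express
        return np.zeros_like(ratings)
    return (ratings - mu) / sigma

# Example: one user's ratings on a 5-point scale
z = z_scores([4, 2, 5, 3, 1])
print(z.mean())  # ~0: z-scores are centered on the user's own mean
```

Because every user's z-scores are centered on zero, zero-mean random noise can later be added without shifting the distribution, which is what makes this normalization useful for the perturbation step.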
10. Privacy Protection Methodology:
Random Perturbations
• To disguise the data, users add random numbers to their
z-scores. The random numbers are drawn from one of two
distributions: Gaussian or uniform.
• Since adding random numbers hides only the ratings of
rated items, users additionally insert random ratings to
hide which items are unrated.
• After disguising their private data, users compute
trust between each other.
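The disguising step above can be sketched as follows. We assume, consistent with the experiment labels later in the deck, that β parameterizes the Gaussian noise (as its standard deviation) and δ the uniform noise (as its half-width); the exact parameterization in the paper may differ:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility of the sketch

def perturb(z, dist="gaussian", beta=1.0, delta=1.0):
    """Disguise z-scores by adding zero-mean random noise.

    dist='gaussian': noise ~ N(0, beta)        (beta = standard deviation)
    dist='uniform' : noise ~ U(-delta, delta)  (delta = half-width)
    """
    z = np.asarray(z, dtype=float)
    if dist == "gaussian":
        noise = rng.normal(0.0, beta, size=z.shape)
    else:
        noise = rng.uniform(-delta, delta, size=z.shape)
    return z + noise
```

Because both noise distributions have zero mean, aggregate quantities computed over many masked values (such as the trust estimates in the next slides) remain close to the values computed on the true z-scores.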
11. Private Trust Estimation:
Trust Formalization
• Assume there are two users, ua and ub.
We formalize the trust between them as follows:
14. Mutual Effects of Trust and Privacy:
Notion of Conflict
• Privacy and accuracy are conflicting goals
• Conflict
- Trust metrics, at each step of trust estimation,
increase or maintain the accuracy of predictions.
- Increasing the amount of perturbation leads to further
information loss.
• Dealing with the conflict through optimization
- We can argue that an optimal setting can be defined
where privacy and accuracy are both maintained at
the same time
15. Optimization design space
• PCS (privacy configuration set)
• TCS (trust configuration set)
• The problem space consists of all possible configurations:
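The problem space, i.e. the Cartesian product of the privacy and trust configuration sets, can be enumerated directly. The parameter grids below are hypothetical, taken from the values that appear in the experiment figures (β, δ ∈ {1..4}; n ∈ {2, 3, 5}; t ∈ {0, 100}), not a definitive specification:

```python
from itertools import product

# Hypothetical grids: PCS covers perturbation settings (distribution, level),
# TCS covers trust-metric settings (n, t).
PCS = [("gaussian", b) for b in (1, 2, 3, 4)] + \
      [("uniform", d) for d in (1, 2, 3, 4)]
TCS = [(n, t) for n in (2, 3, 5) for t in (0, 100)]

# Every candidate configuration is one (privacy, trust) pair.
problem_space = list(product(PCS, TCS))
print(len(problem_space))  # 8 * 6 = 48 candidate configurations
```

Each element of `problem_space` is one candidate setting whose accuracy and privacy loss can then be measured experimentally.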
17. Inferring The Optimal Privacy Set
• Heuristic. To infer the OPS, the following heuristic is used:
1. Perturb the overall user data using different PCS
settings;
2. Observe the framework under variations of TCS;
3. Perturb the sparse user data with the PCS inferred
from step 2, which allows inferring the OPS and
finalizing the Pareto-optimal setting.
• (Steps 2 and 3 are interchangeable depending on the
goals at hand.)
18. Evaluating the Recommendation
Framework: Dataset
• Two sets of experiments:
- The first set demonstrates the effect of inserting
random data on the accuracy of predictions generated by
the recommendation system.
- The second set demonstrates how filling unrated items
with varying f values affects the overall accuracy of
the recommender system.
• MovieLens dataset,
http://www.grouplens.org/node/73
- 943 user rating profiles, with more than 100,000 rating
values. Rating values are on a 5-point scale.
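The accuracy metric reported in the following slides is MAE (mean absolute error). A minimal sketch, with example values of our own choosing:

```python
def mae(predicted, actual):
    """Mean Absolute Error between predicted and true ratings."""
    pairs = list(zip(predicted, actual))
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

# Example: three predictions against their true 5-point ratings
print(mae([4.2, 3.0, 1.5], [5, 3, 2]))  # ≈ 0.4333
```

Lower MAE means more accurate recommendations, so the experiments compare MAE under different perturbation and trust settings.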
19. Evaluating the Recommendation
Framework: Recommender
• We used Trust Recommender from:
- S. Fazeli, A. Zarghami, N. Dokoohaki, and M. Matskin,
"Elevating Prediction Accuracy in Trust-aware
Collaborative filtering Recommenders through T-index
Metric and TopTrustee lists," the Journal of Emerging
Technologies in Web Intelligence (JETWI), 2010.
• a decentralized trust-aware recommender
- T-index, as a trust metric for filtering trust between
users. Unlike previous approaches,
- a trust network between users can be built automatically
from existing ratings.
- a Distributed Hash Table (DHT)-like list of trustees,
TopTrusteeList (TTL) [19], that wraps around items
rated similarly to those of the current user.
20. MAE of recommendation framework,
without adding any perturbations
[Figure: MAE curves for n ∈ {2, 3, 5, 10, 20, 50} across
T ∈ {0, 25, 50, 100, 200, 500, 1000}; y-axis spans 0.85-0.92]
Zarghami, A., Fazeli, S., Dokoohaki, N., & Matskin, M. (2009). Social Trust-Aware Recommendation System:
A T-Index Approach. In Web Intelligence and Intelligent Agent Technology, IEEE/WIC/ACM International
Conference on (Vol. 3, pp. 85-90). IEEE Computer Society. doi: 10.1109/WI-IAT.2009.237.
21. MAE with added perturbations to user
data, having Gaussian distribution
[Figure: MAE for β ∈ {1, 2, 3, 4}, plotted for (N, T) ∈
{(2,0), (2,100), (3,0), (3,100), (5,0)}; y-axis spans 0.42-2.92]
22. MAE with added perturbations to user
data, having Uniform distribution
[Figure: MAE for δ ∈ {1, 2, 3, 4}, plotted for (N, T) ∈
{(2,0), (2,100), (3,0), (3,100), (5,0)}; y-axis spans 0.62-1.62]
24. Comparing MAE of the framework
under masked data, (n, t) = (3, 100)
[Figure: two MAE panels at (N, T) = (3, 100); left: β ∈ {1, 2, 3, 4},
right: δ ∈ {1, 2, 3, 4}]
• (δ, β) = (1, 1) yields the best results.
25. Filling Sparse Data with Random
Gaussian distribution with respect to f
[Figure: MAE for half, full, and double density; y-axis spans 0.74-0.80]
26. Fine-tuning the privacy
• Perturb the sparse user data with the (δ, β, n, t)
inferred from the previous step to fine-tune the privacy.
• We observe a consistent increase across intervals of f,
which finalizes the choice of n, t, δ, β.
• We finalize the results in the
ordered set n=3, t=100, δ=1, β=1 with f = [0, d],
which will be the Pareto front.
27. Inferring the Optimality Set:
Comparison with Non-Masked Results
• Optimality holds under masked data.
• Comparing the MAE results of the non-masked framework
with the framework under masked data:
- We inferred the optimum values β=1, n=3, and t=100,
for which MAE = 0.7994, while for similar parameters
without adding perturbations we achieve MAE = 0.881.
• The MAE results are still lower than the MAE results
without adding perturbations.
- Without perturbations, the best result is MAE = 0.863
at (n, t) = (50, 100), which is still greater than our
optimum value.
28. MAE results remain better than MAE
results without adding perturbations.
29. Conclusions
• A framework for addressing the problem of privacy in
trust recommenders is proposed,
• balancing the conflicting goals of privacy and accuracy.
• Through experiments we showed that we can infer a
setting that holds even when the trust recommender
is not under privacy measures.
• As a result, privacy can be introduced into trust
recommenders and optimized to avoid private data loss
while at the same time producing accurate
recommendations.
Shilling Attacks: An underhanded and cheap way to increase recommendation frequency is to manipulate or trick the system into doing so. This can be done by having a group of users (human or agent) use the recommender system and provide specially crafted "opinions" that cause it to make the desired recommendation more often. For example, it has been shown that a number of book reviews published on Amazon.com are actually written by the author of the book being reviewed. A consumer trying to decide which book to purchase could be misled by such reviews into believing that the book is better than it really is. This is known as a shilling attack, and recommender systems should protect against such attacks.
Here we briefly discuss the proposed method. Normalization of data is a critical step for increasing the privacy level; normalization is the process of isolating statistical error in repeatedly measured data.
We utilize the z-score transformation for normalizing data. Since z-score values have zero mean, we can hide them by adding random numbers from a distribution with zero mean and a predefined standard deviation. As a result, users all make computations with their z-scores instead of their actual ratings.
To protect the private data, the level of perturbation is vital. If the amount is too low, the masked data still discloses considerable amounts of information; if it is too high, accuracy will be very low.
We take into account the configurations that affect the privacy mechanism on the one hand, and the configurations affecting trust on the other,
so we can argue that an optimal setting can be defined where privacy and accuracy are both maintained at the same time.
If the goal is achieving acceptable accuracy and respectable privacy at the same time, the optimization problem becomes multi-objective. As a result, the problem of achieving a trade-off between accuracy and privacy in the current context becomes a Pareto optimization problem.
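The Pareto view can be made concrete with a small non-dominated filter. Each configuration is scored on two objectives to minimize; the (privacy loss, MAE) pairs below are hypothetical, for illustration only:

```python
def pareto_front(points):
    """Return the non-dominated points.

    Each point is (privacy_loss, mae); lower is better on both axes.
    A point is dominated if some other point is <= on both objectives.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and q != p
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (privacy loss, MAE) scores for four configurations
configs = [(0.2, 0.90), (0.5, 0.80), (0.9, 0.79), (0.4, 0.95)]
print(pareto_front(configs))  # [(0.2, 0.9), (0.5, 0.8), (0.9, 0.79)]
```

The configurations on the front are exactly the trade-offs the heuristic searches among: no front member can be improved on one objective without worsening the other.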
Related optimization approaches include the Skyline operator and the Maximal Vector Problem.
We perturb the overall user data using Gaussian and Uniform distributions (δ, β),
(δ, β) = (1,1) yields the best results, as it exhibits the minimal privacy loss.
Compare the results from MAE of framework under masked data,
The set (n, t) = (3,100), with (δ, β) fixed at (1,1), yields reasonable accuracy while privacy is maintained.
So we utilize different intervals of f, with the system fixed on the (δ, β, n, t) configuration from the previous step.
Through observation of consistent accuracy across different f intervals, we can fine-tune the configuration from the previous step and infer an optimum privacy configuration.
Taking into account the results (Fig. 4),
We observe a consistent increase across intervals of f, which finalizes the choice of n, t, δ, β and yields the ordered set n=3, t=100, δ=1, β=1 with f = [0, d], supporting both accurate and private recommendations.
We proposed a framework for enabling privacy-preserving, trust-aware recommendation generation. Considering the existing range of configurations, a Pareto set can always be found that makes a trade-off between privacy and accuracy; we presented a heuristic that experimentally infers this set.
We also showed that privacy increases under the proposed framework, while even the optimal-privacy configuration of our framework outperforms the base framework in its best configurations. As a result, privacy can be introduced into trust recommenders and optimized to avoid private data loss while at the same time producing accurate recommendations.