Towards Statistical Queries over
Distributed Private User Data
R. Chen, A. Reznichenko, P. Francis – MPI-SWS, Germany
J. Gehrke – Cornell University, USA
Serafeim Chatzopoulos
M1258
schatz@di.uoa.gr
MDE519 – Distributed Systems
Instructor: Mema Roussopoulou
May 31, 2013
User Privacy
• User data is exposed to organizations in many ways.
• Users are aware of some of this exposure:
  • Making a purchase in an online store.
  • Updating a profile on a social network.
• Users are unaware of other exposure:
  • Third-party trackers.
  • Smartphone apps.
The “user-owned and operated” principle
• Personal data should be stored on a local host or a cloud device under the user's control, and released only in a controlled, limited, or noisy fashion.
Users must have exclusive control of their own data and must be able to share data selectively or voluntarily.
Motivation and Problem
• Distributed private user data is important.
• Analysts could use such data to:
  • understand users' behavior
  • discover statistical patterns
  • evaluate proposed enhancements.
• How can statistical queries be made over such distributed private user data while still preserving privacy?
Related Work
• Anonymization
  Removes well-known personally identifiable information (PII).
• Randomization
  Adds random distortion values to user data.
• k-anonymity, l-diversity, t-closeness
• Differential privacy
Differential Privacy
• Differential privacy adds noise to the output of a computation (i.e., the answer to a query).
• Hides the presence or absence of a record in the dataset.
• Makes no assumptions about the adversary.
Some form of distributed differential privacy is required…
Prior Distributed Differential Privacy Designs
• The first design has a per-user computational load of O(U).
  Dwork et al., EUROCRYPT '06
  • Poor scalability.
• Subsequent designs reduce the per-user computational load to O(1) by using expensive secret-sharing protocols.
  Rastogi and Nath, SIGMOD '10; Shi et al., NDSS '11
  • Do not tolerate churn.
• Recent designs introduce two honest-but-curious servers that collaboratively compute the query result.
  Gotz and Nath, MSR-TR '11
  • Even a single malicious user can substantially distort the query result.
Practical Distributed Differential Privacy System (PDDP)
• Goals:
  • The differential privacy guarantee is always maintained for every honest client.
  • Places a tight bound on the extent to which a malicious user can distort query results.
    • The maximum absolute distortion in the final result is bounded by the number of malicious users.
  • Operates at large scale.
    • Millions of users.
  • Tolerates churn.
    • Churn does not prevent results from being produced.
PDDP Components
• Analyst
  • Makes queries to the system and collects answers.
• Proxy
  • Adds differentially private noise to clients' answers to preserve privacy.
• Clients
  • Locally maintain their own data and answer queries.
Security Assumptions (1/2)
• General assumptions
  • Clients have the correct public keys for the analyst and the proxy.
  • The analyst and the proxy have the correct public keys for each other.
  • The corresponding private keys are kept secure.
• The analyst is potentially malicious (tries to violate users' privacy). It may:
  • Collude with other analysts.
  • Pretend to be multiple distinct analysts.
  • Take control of clients and use the PDDP protocol to reveal information.
  • Publish its collected answers.
  • Intercept and modify all messages.
Security Assumptions (2/2)
• The proxy is honest but curious (HbC)
  • Follows the specified protocol.
  • Tries to exploit additional information that can be learned in so doing.
  • Does not collude with other components.
• Clients are potentially malicious (try to distort the statistical results learned by analysts)
  • Exhibit churn.
  • Have limited resources for computation and data transmission.
  • May generate false or illegitimate answers.
  • May act as Sybils.
PDDP Key insights – Binary answer
• How to limit query result distortion?
  • Split the answer's value range into buckets.
  • Enforce a binary answer in each bucket.
  • Goldwasser-Micali (GM) bit cryptosystem.
Example:
Query: "SELECT age FROM info WHERE gender='m'"
• 4 buckets: 0~12, 13~20, 21~59, and ≥60.
• Answers: '1' or '0' per bucket.
• Malicious clients cannot substantially distort the query result.
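The bucketing idea can be illustrated with a minimal sketch. It assumes a client's local query returns a list of ages; the helper names (`bucketize`, `aggregate`) are hypothetical and not taken from the PDDP code. Encryption is left out here; only the binary, per-bucket shape of an answer is shown.

```python
from typing import List, Optional, Sequence, Tuple

# Bucket bounds from the slide example; None means "no upper bound".
BUCKETS: List[Tuple[int, Optional[int]]] = [(0, 12), (13, 20), (21, 59), (60, None)]


def in_bucket(value: int, low: int, high: Optional[int]) -> bool:
    """True if value lies in the closed range [low, high] (high may be open)."""
    return value >= low and (high is None or value <= high)


def bucketize(values: Sequence[int], buckets=BUCKETS) -> List[int]:
    """One bit per bucket: 1 if any returned value falls in that bucket."""
    return [int(any(in_bucket(v, lo, hi) for v in values)) for lo, hi in buckets]


def aggregate(answers: Sequence[Sequence[int]]) -> List[int]:
    """Per-bucket counts over all clients' bit vectors."""
    return [sum(column) for column in zip(*answers)]


# Three honest clients: one aged 34, one aged 15, one with no matching row.
honest = [bucketize([34]), bucketize([15]), bucketize([])]
# A malicious client can at worst claim a '1' in every bucket.
malicious = [1, 1, 1, 1]

print(aggregate(honest))                # [0, 1, 1, 0]
print(aggregate(honest + [malicious]))  # [1, 2, 2, 1]
```

Because every per-bucket value is a single bit, one malicious client shifts each bucket count by at most 1, which is the bound the slide refers to.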
PDDP Key insights – Blind noise
• How to achieve differential privacy?
  • The honest-but-curious proxy generates additional binary answers in each bucket as differentially private noise.
• Problem: if the analyst publishes the final noisy result
  • the proxy knows the noise it added
  • and can subtract that noise from the published result to obtain a noise-free result.
• Solution: the proxy can only add noise blindly!
  • The proxy knows that the added noise is enough to achieve differential privacy.
  • The proxy does not know the exact noise added.
PDDP Workflow – Step 1
• Query initialization
  The analyst first issues a query to the proxy.
• The message consists of 4 items:
  • Query: SELECT age FROM info WHERE gender='m'
  • Buckets: 0~12, 13~20, 21~59, and ≥60.
  • Number of clients queried (c): 1000
  • DP parameter (ε): 1.0
    • Controls the tradeoff between the accuracy of the computation and the strength of its privacy guarantee.
PDDP Workflow – Step 2
• Query forwarding
  The proxy selects clients and sends them the query.
• The proxy:
  • rejects the query if c is too low or too high.
  • rejects the query if ε exceeds the maximum privacy level allowed.
  • selects c unique clients and sends them the query, under one of the following policies:
    • Select c clients randomly and wait for them to connect.
    • Select the first c clients that connect.
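A rough sketch of the proxy-side checks and the two selection policies follows. The `Query` type, the limits `C_MIN`, `C_MAX`, and `EPSILON_MAX`, and the function names are all hypothetical; the slides do not state the actual thresholds or data structures used by PDDP.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence, Tuple


@dataclass
class Query:
    """The four items of the analyst's message (field names are assumptions)."""
    sql: str
    buckets: List[Tuple[int, Optional[int]]]
    c: int          # number of clients to query
    epsilon: float  # differential privacy parameter


# Hypothetical limits; the real proxy's thresholds are not given in the slides.
C_MIN, C_MAX, EPSILON_MAX = 10, 1_000_000, 5.0


def validate(query: Query) -> bool:
    """Reject queries whose c is out of range or whose epsilon is too large."""
    return C_MIN <= query.c <= C_MAX and 0 < query.epsilon <= EPSILON_MAX


def select_random(known_clients: Sequence[str], c: int, rng: random.Random) -> List[str]:
    """Policy 1: pick c distinct clients at random, then wait for them to connect."""
    if c > len(known_clients):
        raise ValueError("not enough known clients")
    return rng.sample(list(known_clients), c)


def select_first(connections: Iterable[str], c: int) -> List[str]:
    """Policy 2: take the first c distinct clients in connection order."""
    selected: List[str] = []
    seen = set()
    for client_id in connections:
        if client_id not in seen:
            seen.add(client_id)
            selected.append(client_id)
        if len(selected) == c:
            break
    return selected
```

The first policy gives a more uniform sample but may wait a long time under churn; the second finishes sooner at the cost of favoring clients that connect often.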
PDDP Workflow – Step 3 (1/2)
• Client response
  Clients execute the query and send answers.
• Each client executes the query over its local data and produces an answer:
  • '1' or '0' per bucket.
  • More than one bucket may contain a '1'.
  • Each per-bucket answer value is individually encrypted with the analyst's public key (GM cryptosystem).
PDDP Workflow – Step 3 (2/2)
• Goldwasser-Micali (GM) cryptosystem
  • Single-bit cryptosystem.
  • Enforces a binary answer in each bucket.
  • Very efficient.
  • XOR-homomorphic:
    E(a) * E(b) = E(a XOR b)
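The XOR homomorphism can be checked with a toy GM implementation. This is a sketch under stated assumptions: the primes are tiny and chosen only for illustration, `random` is not a cryptographic generator, and the function names are hypothetical rather than taken from the PDDP code.

```python
import math
import random

# Toy parameters (assumption): both primes are 3 mod 4, so X = N - 1 is a
# quadratic non-residue modulo each prime. Real keys use far larger primes.
P, Q = 499, 547
N = P * Q          # public modulus
X = N - 1          # public non-residue


def encrypt(bit: int) -> int:
    """E(bit) = r^2 * X^bit mod N for a random r coprime to N."""
    while True:
        r = random.randrange(2, N)
        if math.gcd(r, N) == 1:
            break
    return (r * r * pow(X, bit, N)) % N


def decrypt(ciphertext: int) -> int:
    """0 if the ciphertext is a quadratic residue mod P (Euler's criterion), else 1."""
    return 0 if pow(ciphertext, (P - 1) // 2, P) == 1 else 1


# Multiplying two ciphertexts modulo N yields an encryption of the XOR.
for a in (0, 1):
    for b in (0, 1):
        product = (encrypt(a) * encrypt(b)) % N
        assert decrypt(product) == a ^ b
```

Since a ciphertext can only ever decrypt to 0 or 1, a client has no way to smuggle a larger value into a bucket.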
PDDP Workflow – Step 4
• Blind noise addition
• The proxy maintains a pool of additional binary answers, called coins, and adds them as noise to each bucket.
  • Coins must be unbiased.
  • Coins are encrypted with the analyst's public key.
  • n coins must be added to each bucket; n is determined by c and ε.
How to generate coins blindly?
Coin pool generation
• Straightforward approaches
  • The proxy generates the coins
    • A curious proxy could learn the noise-free result.
  • Clients generate the coins
    • Malicious clients could generate biased coins.
Collaborative coin generation
• The paper's approach
  • Each online client periodically generates an encrypted unbiased coin E(oc) and sends it to the proxy.
  • The proxy receives the coin and verifies its legitimacy.
  • The proxy blindly re-flips the coin E(oc) by multiplying it, modulo m, with a locally generated unbiased coin E(op):
    E(oc) * E(op) mod m = E(oc XOR op),
    where m is part of the analyst's public key.
  • The proxy stores the resulting unbiased coin in its locally maintained pool.
  • The proxy does not know the actual value of the resulting unbiased coin.
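The re-flip step can be sketched as below. It assumes the legitimacy check is the Jacobi symbol test mentioned in the evaluation slides (a valid GM ciphertext has Jacobi symbol +1 modulo m). Key sizes are toy values, `random` stands in for a cryptographic generator, and all function names are hypothetical.

```python
import math
import random

P, Q = 499, 547   # analyst's private primes (toy sizes, both 3 mod 4)
M = P * Q         # modulus m, part of the analyst's public key
X = M - 1         # public non-residue with Jacobi symbol +1


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd positive n; computable without knowing P and Q."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def encrypt(bit: int) -> int:
    """GM encryption under the analyst's public key (M, X)."""
    while True:
        r = random.randrange(2, M)
        if math.gcd(r, M) == 1:
            break
    return (r * r * pow(X, bit, M)) % M


def decrypt(ciphertext: int) -> int:
    """Analyst only: needs the private prime P."""
    return 0 if pow(ciphertext, (P - 1) // 2, P) == 1 else 1


def is_legitimate(ciphertext: int) -> bool:
    """Proxy-side check: valid GM ciphertexts have Jacobi symbol +1 mod M."""
    return 0 < ciphertext < M and jacobi(ciphertext, M) == 1


def reflip(client_coin: int, proxy_bit: int) -> int:
    """Multiply the client's coin E(oc) by a fresh E(op); the result encrypts oc XOR op."""
    if not is_legitimate(client_coin):
        raise ValueError("illegitimate coin")
    return (client_coin * encrypt(proxy_bit)) % M


# The proxy picks op itself but never learns oc, so it cannot know oc XOR op.
pooled_coin = reflip(encrypt(random.getrandbits(1)), random.getrandbits(1))
```

As long as the proxy's bit is uniform, the pooled coin is unbiased even if the client's bit was not, which addresses the biased-coin concern from the previous slide.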
PDDP Workflow – Step 5
• Noisy answers to analyst
• Each bucket holds the clients' answers plus the coins (noise).
• After a random delay, the proxy shuffles the c + n values in each bucket.
  • This prevents identification of a client based on the vector of '1's and '0's in its answer.
• Finally, the analyst
  • decrypts all encrypted binary values with its private key.
  • sums the plaintext values obtained.
  • obtains the noisy count of clients that fall within each bucket.
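A plaintext simulation of this step for a single bucket is sketched below. Encryption is deliberately omitted so that the arithmetic is visible: in the real protocol every value is a GM ciphertext that only the analyst can decrypt. The function names are hypothetical, and the subtraction of n/2 is the analyst's mean adjustment mentioned under practical considerations.

```python
import random
from typing import List, Sequence


def proxy_release(client_bits: Sequence[int], n_coins: int, rng: random.Random) -> List[int]:
    """Append n unbiased coins to the c client bits of one bucket and shuffle."""
    coins = [rng.getrandbits(1) for _ in range(n_coins)]
    values = list(client_bits) + coins
    rng.shuffle(values)  # hides which positions are answers and which are coins
    return values


def analyst_estimate(values: Sequence[int], n_coins: int) -> float:
    """Sum the decrypted bits and subtract the expected coin total n/2."""
    return sum(values) - n_coins / 2


rng = random.Random(2013)
client_bits = [1] * 30 + [0] * 70   # 30 of 100 clients fall in this bucket
released = proxy_release(client_bits, n_coins=16, rng=rng)
print(len(released), analyst_estimate(released, n_coins=16))
```

The estimate differs from the true count of 30 only by the centered coin sum, so its error can never exceed n/2 in absolute value.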
Practical Considerations (1/2)
• Utility of the aggregate result
  • Depends on the amount of added noise.
  • The n coins added by the proxy, after the analyst subtracts their mean n/2, follow a centered binomial distribution, which is well approximated by the normal distribution N(0, n/4).
• Example:
  c = 10^6, ε = 1.0
  Given a normal distribution in each bucket:
  • 68% probability that the noisy answer is within 15.24 of the true answer
  • 95% probability that the noisy answer is within 30.48 of the true answer
  • 99.7% probability that the noisy answer is within 45.72 of the true answer
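The figures on this slide can be reproduced with a short calculation. The formula for n used below, n = floor(64 ln(2c) / ε²) + 1, does not appear in the slide text; it is an assumption taken from the PDDP paper as recalled here, and it does yield both the 15.24 figure above and the 16 coins per bucket quoted later for the deployment query. The function names are hypothetical.

```python
import math


def coins_per_bucket(c: int, epsilon: float) -> int:
    """Assumed formula for the number of coins n, given c clients and epsilon."""
    return math.floor(64 * math.log(2 * c) / epsilon ** 2) + 1


def noise_std(n: int) -> float:
    """Standard deviation of the centered coin sum: sqrt(n / 4)."""
    return math.sqrt(n) / 2


n = coins_per_bucket(10 ** 6, 1.0)
sigma = noise_std(n)
# One, two, and three standard deviations cover about 68%, 95%, and 99.7%.
print(n, [round(k * sigma, 2) for k in (1, 2, 3)])  # 929 [15.24, 30.48, 45.72]
```

The noise grows only with the logarithm of c, so for a million clients an error of a few dozen per bucket is small relative to the counts being measured.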
Practical Considerations (2/2)
• Non-numeric queries
  • Map the query into a numeric query.
  Example:
  "Which website do you visit most often?"
  Map each website the analyst wishes to learn about to a numeric value.
  This can require a large number of buckets; the answer is limited to 5000 buckets.
• Sybils
  • The design is susceptible to Sybil attacks (a single client can masquerade as multiple clients).
  • The proxy can limit the number of clients selected from a single IP address for a given query.
Implementation and Deployment (1/2)
• Client
  • Firefox add-on
  • 9600 lines of Java code
  • Information is stored in local SQLite storage:
    • Web browsing activities
    • Certain online shopping activities
    • Certain ad interactions
    • Can be extended to capture any online activity
  • Every 5 minutes, connects to the proxy to retrieve pending queries and to return answers and periodically generated coins.
Implementation and Deployment (2/2)
• Proxy
  • Web service on Tomcat 6.0.33
  • 3600 lines of code
  • Proxy state is kept in a MySQL database.
• Analyst
  • 800 lines of code
Deployment
• Correctness verified on a set of local machines.
• 600+ real clients
Comparison: “Paillier-based” design
• Honest-but-curious proxy
• Paillier cryptosystem
  • Additive homomorphism
  • The proxy can directly sum up all clients' encrypted binary answers to get the encrypted sum of each bucket.
• A single malicious client can substantially distort the result.
  • Zero-knowledge proofs (ZKPs) are used to ensure that encrypted answers are '1' or '0'.
• The proxy knows exactly how much noise has been added.
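Why the Paillier-based design needs zero-knowledge proofs can be seen in a toy implementation. This is a sketch with tiny primes and the common simplification g = N + 1, not the comparison system's actual code; all names are hypothetical, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import random

P, Q = 1009, 1013                                   # toy primes (assumption)
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)                                # valid for generator g = N + 1


def encrypt(message: int) -> int:
    """E(m) = (1 + N)^m * r^N mod N^2, using (1 + N)^m = 1 + m*N mod N^2."""
    while True:
        r = random.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (1 + message * N) % N2 * pow(r, N, N2) % N2


def decrypt(ciphertext: int) -> int:
    """m = L(c^lambda mod N^2) * mu mod N, with L(x) = (x - 1) // N."""
    return (pow(ciphertext, LAM, N2) - 1) // N * MU % N


def add(c1: int, c2: int) -> int:
    """Additive homomorphism: the product of ciphertexts encrypts the sum."""
    return c1 * c2 % N2


# The proxy sums four honest binary answers without decrypting them.
total = encrypt(0)
for bit in (1, 0, 1, 1):
    total = add(total, encrypt(bit))
honest_sum = decrypt(total)                      # 3

# Without a ZKP, nothing stops a client from encrypting 1000 instead of a bit.
distorted_sum = decrypt(add(total, encrypt(1000)))  # 1003
print(honest_sum, distorted_sum)
```

With GM, the same attack is impossible by construction because a ciphertext can only hold one bit, which is why PDDP can replace the ZKP with a cheap Jacobi symbol check.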
Evaluation (1/5)
• Client performance
  • Clients encrypt a binary value for each bucket.
  • GM cryptosystem
  • Paillier cryptosystem
Evaluation (2/5)
• Proxy and analyst performance
• Proxy
  PDDP
  • One encryption and one homomorphic XOR per unbiased coin.
  • Jacobi symbol check on received coins and answer values (faster than a decryption).
  Paillier-based
  • One ZKP for each client answer in each bucket.
  • Homomorphically sums all clients' answers per bucket.
  • Adds noise to each per-bucket sum.
Evaluation (3/5)
• Proxy and analyst performance
• Analyst
  PDDP
  • Decrypts all encrypted values in each bucket.
  Paillier-based
  • Decrypts one encrypted value in each bucket.
Evaluation (4/5)
• Bandwidth overhead
  • In both systems, a client transmits an encrypted answer for each bucket.
  • In PDDP, a client also transmits periodically generated coins to the proxy.
  • In the Paillier-based design, a client also transmits a ZKP for each bucket.
• Storage overhead
  • In PDDP, the proxy stores all clients' answer values for each bucket plus the required number of coins.
  • In the Paillier-based design, the proxy stores only one answer value per bucket.
Evaluation (5/5)
• Querying the client deployment
  • Parameters
    c = 250 (out of 600 clients)
    ε = 5.0
    Clients are selected as they connect, until 250 unique clients are queried or 24 hours expire.
    These parameters result in 16 coins per bucket.
  • This ensures that a per-bucket aggregate answer is within ±2, ±4, and ±6 of the noise-free answer with probabilities of 68%, 95%, and 99.7%, respectively.
Future Work
• Support for statistical learning algorithms
• Scalability of non-numeric queries
  • Bloom filters: map a large number of possible answers into a small number of buckets.
• Gather statistical data in a large-scale experiment.
• Weaken proxy trust requirements.
  • Use of trusted hardware (TPM)
• General: measure the actual privacy loss under differential privacy.
Conclusion
• PDDP: Practical Distributed Differential Privacy System
  • Scales well
  • Tolerates churn
  • Places a tight bound on a malicious user's capability.
• Key insights
  • Binary answer in each bucket
  • Blind noise addition
Questions?

Más contenido relacionado

La actualidad más candente

Cis 333 Education Organization / snaptutorial.com
Cis 333   Education Organization / snaptutorial.comCis 333   Education Organization / snaptutorial.com
Cis 333 Education Organization / snaptutorial.comBaileya82
 
Comprehensive Study of Counter-acting Security Threats in Mobile Ad Hoc Networks
Comprehensive Study of Counter-acting Security Threats in Mobile Ad Hoc NetworksComprehensive Study of Counter-acting Security Threats in Mobile Ad Hoc Networks
Comprehensive Study of Counter-acting Security Threats in Mobile Ad Hoc Networksdrsrinivasanvenkataramani
 
International Journal of Wireless & Mobile Networks (IJWMN)
International Journal of Wireless & Mobile Networks (IJWMN) International Journal of Wireless & Mobile Networks (IJWMN)
International Journal of Wireless & Mobile Networks (IJWMN) ijwmn
 
A Reliable Peer-to-Peer Platform for Adding New Node Using Trust Based Model
A Reliable Peer-to-Peer Platform for Adding New Node Using Trust Based Model    A Reliable Peer-to-Peer Platform for Adding New Node Using Trust Based Model
A Reliable Peer-to-Peer Platform for Adding New Node Using Trust Based Model IJECEIAES
 
An Efficient Secured And Inspection of Malicious Node Using Double Encryption...
An Efficient Secured And Inspection of Malicious Node Using Double Encryption...An Efficient Secured And Inspection of Malicious Node Using Double Encryption...
An Efficient Secured And Inspection of Malicious Node Using Double Encryption...IRJET Journal
 
Whitepaper- User Behavior-Based Anomaly Detection for Cyber Network Security
Whitepaper- User Behavior-Based Anomaly Detection for Cyber Network SecurityWhitepaper- User Behavior-Based Anomaly Detection for Cyber Network Security
Whitepaper- User Behavior-Based Anomaly Detection for Cyber Network SecurityHappiest Minds Technologies
 
Classification of Malware Attacks Using Machine Learning In Decision Tree
Classification of Malware Attacks Using Machine Learning In Decision TreeClassification of Malware Attacks Using Machine Learning In Decision Tree
Classification of Malware Attacks Using Machine Learning In Decision TreeCSCJournals
 
Building AI with Security and Privacy in mind
Building AI with Security and Privacy in mindBuilding AI with Security and Privacy in mind
Building AI with Security and Privacy in mindgeetachauhan
 
Securing Personal Information in Data Mining
Securing Personal Information in Data MiningSecuring Personal Information in Data Mining
Securing Personal Information in Data MiningIJMER
 
Survey of Security Threats and Protection Techniques in Mobile Ad Hoc Networks
Survey of Security Threats and Protection Techniques in Mobile Ad Hoc NetworksSurvey of Security Threats and Protection Techniques in Mobile Ad Hoc Networks
Survey of Security Threats and Protection Techniques in Mobile Ad Hoc Networksdrsrinivasanvenkataramani
 
Do s and d dos attacks at osi layers
Do s and d dos attacks at osi layersDo s and d dos attacks at osi layers
Do s and d dos attacks at osi layersHadeel Sadiq Obaid
 
An efficient distributed trust model for wireless sensor networks
An efficient distributed trust model for wireless sensor networksAn efficient distributed trust model for wireless sensor networks
An efficient distributed trust model for wireless sensor networksIISTech2015
 
Individual Project - Final Report
Individual Project - Final ReportIndividual Project - Final Report
Individual Project - Final ReportSteven Hooper
 
Evaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainEvaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainIJCI JOURNAL
 
An efficient distributed trust model for wireless sensor networks
An efficient distributed trust model for wireless sensor networksAn efficient distributed trust model for wireless sensor networks
An efficient distributed trust model for wireless sensor networksPvrtechnologies Nellore
 
INFRINGEMENT PRECLUSION SYSTEM VIA SADEC: STEALTHY ATTACK DETECTION AND COUNT...
INFRINGEMENT PRECLUSION SYSTEM VIA SADEC: STEALTHY ATTACK DETECTION AND COUNT...INFRINGEMENT PRECLUSION SYSTEM VIA SADEC: STEALTHY ATTACK DETECTION AND COUNT...
INFRINGEMENT PRECLUSION SYSTEM VIA SADEC: STEALTHY ATTACK DETECTION AND COUNT...ijp2p
 

La actualidad más candente (16)

Cis 333 Education Organization / snaptutorial.com
Cis 333   Education Organization / snaptutorial.comCis 333   Education Organization / snaptutorial.com
Cis 333 Education Organization / snaptutorial.com
 
Comprehensive Study of Counter-acting Security Threats in Mobile Ad Hoc Networks
Comprehensive Study of Counter-acting Security Threats in Mobile Ad Hoc NetworksComprehensive Study of Counter-acting Security Threats in Mobile Ad Hoc Networks
Comprehensive Study of Counter-acting Security Threats in Mobile Ad Hoc Networks
 
International Journal of Wireless & Mobile Networks (IJWMN)
International Journal of Wireless & Mobile Networks (IJWMN) International Journal of Wireless & Mobile Networks (IJWMN)
International Journal of Wireless & Mobile Networks (IJWMN)
 
A Reliable Peer-to-Peer Platform for Adding New Node Using Trust Based Model
A Reliable Peer-to-Peer Platform for Adding New Node Using Trust Based Model    A Reliable Peer-to-Peer Platform for Adding New Node Using Trust Based Model
A Reliable Peer-to-Peer Platform for Adding New Node Using Trust Based Model
 
An Efficient Secured And Inspection of Malicious Node Using Double Encryption...
An Efficient Secured And Inspection of Malicious Node Using Double Encryption...An Efficient Secured And Inspection of Malicious Node Using Double Encryption...
An Efficient Secured And Inspection of Malicious Node Using Double Encryption...
 
Whitepaper- User Behavior-Based Anomaly Detection for Cyber Network Security
Whitepaper- User Behavior-Based Anomaly Detection for Cyber Network SecurityWhitepaper- User Behavior-Based Anomaly Detection for Cyber Network Security
Whitepaper- User Behavior-Based Anomaly Detection for Cyber Network Security
 
Classification of Malware Attacks Using Machine Learning In Decision Tree
Classification of Malware Attacks Using Machine Learning In Decision TreeClassification of Malware Attacks Using Machine Learning In Decision Tree
Classification of Malware Attacks Using Machine Learning In Decision Tree
 
Building AI with Security and Privacy in mind
Building AI with Security and Privacy in mindBuilding AI with Security and Privacy in mind
Building AI with Security and Privacy in mind
 
Securing Personal Information in Data Mining
Securing Personal Information in Data MiningSecuring Personal Information in Data Mining
Securing Personal Information in Data Mining
 
Survey of Security Threats and Protection Techniques in Mobile Ad Hoc Networks
Survey of Security Threats and Protection Techniques in Mobile Ad Hoc NetworksSurvey of Security Threats and Protection Techniques in Mobile Ad Hoc Networks
Survey of Security Threats and Protection Techniques in Mobile Ad Hoc Networks
 
Do s and d dos attacks at osi layers
Do s and d dos attacks at osi layersDo s and d dos attacks at osi layers
Do s and d dos attacks at osi layers
 
An efficient distributed trust model for wireless sensor networks
An efficient distributed trust model for wireless sensor networksAn efficient distributed trust model for wireless sensor networks
An efficient distributed trust model for wireless sensor networks
 
Individual Project - Final Report
Individual Project - Final ReportIndividual Project - Final Report
Individual Project - Final Report
 
Evaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainEvaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chain
 
An efficient distributed trust model for wireless sensor networks
An efficient distributed trust model for wireless sensor networksAn efficient distributed trust model for wireless sensor networks
An efficient distributed trust model for wireless sensor networks
 
INFRINGEMENT PRECLUSION SYSTEM VIA SADEC: STEALTHY ATTACK DETECTION AND COUNT...
INFRINGEMENT PRECLUSION SYSTEM VIA SADEC: STEALTHY ATTACK DETECTION AND COUNT...INFRINGEMENT PRECLUSION SYSTEM VIA SADEC: STEALTHY ATTACK DETECTION AND COUNT...
INFRINGEMENT PRECLUSION SYSTEM VIA SADEC: STEALTHY ATTACK DETECTION AND COUNT...
 

Similar a Towards Statistical Queries over Distributed Private User Data

Collusion tolerable privacy-preserving sum
Collusion tolerable privacy-preserving sumCollusion tolerable privacy-preserving sum
Collusion tolerable privacy-preserving sumnexgentech15
 
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...Nexgen Technology
 
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...nexgentechnology
 
Collusion tolerable privacy-preserving sum
Collusion tolerable privacy-preserving sumCollusion tolerable privacy-preserving sum
Collusion tolerable privacy-preserving sumNexgen Technology
 
Privacy-preserving Information Sharing: Tools and Applications
Privacy-preserving Information Sharing: Tools and ApplicationsPrivacy-preserving Information Sharing: Tools and Applications
Privacy-preserving Information Sharing: Tools and ApplicationsEmiliano De Cristofaro
 
IEEE 2014 JAVA DATA MINING PROJECTS M privacy for collaborative data publishing
IEEE 2014 JAVA DATA MINING PROJECTS M privacy for collaborative data publishingIEEE 2014 JAVA DATA MINING PROJECTS M privacy for collaborative data publishing
IEEE 2014 JAVA DATA MINING PROJECTS M privacy for collaborative data publishingIEEEFINALYEARSTUDENTPROJECTS
 
2014 IEEE JAVA DATA MINING PROJECT M privacy for collaborative data publishing
2014 IEEE JAVA DATA MINING PROJECT M privacy for collaborative data publishing2014 IEEE JAVA DATA MINING PROJECT M privacy for collaborative data publishing
2014 IEEE JAVA DATA MINING PROJECT M privacy for collaborative data publishingIEEEMEMTECHSTUDENTSPROJECTS
 
Privacy preserving computing and secure multi party computation
Privacy preserving computing and secure multi party computationPrivacy preserving computing and secure multi party computation
Privacy preserving computing and secure multi party computationUlf Mattsson
 
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...Krishnaram Kenthapadi
 
Building AI with Security Privacy in Mind
Building AI with Security Privacy in MindBuilding AI with Security Privacy in Mind
Building AI with Security Privacy in Mindgeetachauhan
 
Technologies in Support of Big Data Ethics
Technologies in Support of Big Data EthicsTechnologies in Support of Big Data Ethics
Technologies in Support of Big Data EthicsMark Underwood
 
In Processes We Trust: Privacy and Trust in Business Processes
In Processes We Trust: Privacy and Trust in Business ProcessesIn Processes We Trust: Privacy and Trust in Business Processes
In Processes We Trust: Privacy and Trust in Business ProcessesMarlon Dumas
 
IRJET- Portable Biometric E-Voting System
IRJET- Portable Biometric E-Voting SystemIRJET- Portable Biometric E-Voting System
IRJET- Portable Biometric E-Voting SystemIRJET Journal
 
PUBLIC INTEGRIYT AUDITING FOR SHARED DYNAMIC DATA STORAGE UNDER ONTIME GENERA...
PUBLIC INTEGRIYT AUDITING FOR SHARED DYNAMIC DATA STORAGE UNDER ONTIME GENERA...PUBLIC INTEGRIYT AUDITING FOR SHARED DYNAMIC DATA STORAGE UNDER ONTIME GENERA...
PUBLIC INTEGRIYT AUDITING FOR SHARED DYNAMIC DATA STORAGE UNDER ONTIME GENERA...paperpublications3
 
Privacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePrivacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePvrtechnologies Nellore
 
Anonymity based privacy-preserving data
Anonymity based privacy-preserving dataAnonymity based privacy-preserving data
Anonymity based privacy-preserving dataKamal Spring
 
Privacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposurePrivacy preserving detection of sensitive data exposure
Privacy preserving detection of sensitive data exposureredpel dot com
 
Privacy Engineering: Enabling Mobility of Mental Health Services with Data Pr...
Privacy Engineering: Enabling Mobility of Mental Health Services with Data Pr...Privacy Engineering: Enabling Mobility of Mental Health Services with Data Pr...
Privacy Engineering: Enabling Mobility of Mental Health Services with Data Pr...CREST @ University of Adelaide
 
JAVA 2013 IEEE NETWORKSECURITY PROJECT Utility privacy tradeoff in databases ...
JAVA 2013 IEEE NETWORKSECURITY PROJECT Utility privacy tradeoff in databases ...JAVA 2013 IEEE NETWORKSECURITY PROJECT Utility privacy tradeoff in databases ...
JAVA 2013 IEEE NETWORKSECURITY PROJECT Utility privacy tradeoff in databases ...IEEEGLOBALSOFTTECHNOLOGIES
 

Similar a Towards Statistical Queries over Distributed Private User Data (20)

Collusion tolerable privacy-preserving sum
Collusion tolerable privacy-preserving sumCollusion tolerable privacy-preserving sum
Collusion tolerable privacy-preserving sum
 
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
 
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
COLLUSION-TOLERABLE PRIVACY-PRESERVING SUM AND PRODUCT CALCULATION WITHOUT SE...
 
Collusion tolerable privacy-preserving sum
Collusion tolerable privacy-preserving sumCollusion tolerable privacy-preserving sum
Collusion tolerable privacy-preserving sum
 
Privacy-preserving Information Sharing: Tools and Applications
Privacy-preserving Information Sharing: Tools and ApplicationsPrivacy-preserving Information Sharing: Tools and Applications
Privacy-preserving Information Sharing: Tools and Applications
 
IEEE 2014 JAVA DATA MINING PROJECTS M privacy for collaborative data publishing
IEEE 2014 JAVA DATA MINING PROJECTS M privacy for collaborative data publishingIEEE 2014 JAVA DATA MINING PROJECTS M privacy for collaborative data publishing
IEEE 2014 JAVA DATA MINING PROJECTS M privacy for collaborative data publishing
 
2014 IEEE JAVA DATA MINING PROJECT M privacy for collaborative data publishing
2014 IEEE JAVA DATA MINING PROJECT M privacy for collaborative data publishing2014 IEEE JAVA DATA MINING PROJECT M privacy for collaborative data publishing
2014 IEEE JAVA DATA MINING PROJECT M privacy for collaborative data publishing
 
Privacy preserving computing and secure multi party computation
Privacy preserving computing and secure multi party computationPrivacy preserving computing and secure multi party computation
Privacy preserving computing and secure multi party computation
 
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...

Towards Statistical Queries over Distributed Private User Data

  • 1. Towards Statistical Queries over Distributed Private User Data
      R. Chen, A. Reznichenko, P. Francis (MPI-SWS, Germany)
      J. Gehrke (Cornell University, USA)
      Serafeim Chatzopoulos, M1258, schatz@di.uoa.gr
      MDE519, Distributed Systems. Instructor: Mema Roussopoulou
      May 31, 2013
  • 2. User Privacy
      - User data is exposed to organizations in many ways.
      - Users are aware of their data being exposed:
          - Making a purchase in an online store.
          - Updating a profile on a social network.
      - Users are unaware of their data exposure:
          - Third-party trackers.
          - Smartphone apps.
  • 3. The "user-owned and operated" principle
      - Personal data should be stored on a local host or a cloud device under the user's control and released only in a controlled, limited, or noisy fashion.
      - Users must have exclusive control of their own data and must be able to share data selectively or voluntarily.
  • 4. Motivation and Problem
      - Distributed private user data is important.
      - Analysts could use such data to:
          - understand users' behavior,
          - discover statistical patterns,
          - evaluate proposed enhancements.
      - How can statistical queries be made over such distributed private user data while still preserving privacy?
  • 5. Related Work
      - Anonymization: removes well-known personally identifiable information (PII).
      - Randomization: adds random distortion values to user data.
      - k-anonymity, l-diversity, t-closeness
      - Differential privacy
  • 6. Differential Privacy
      - Differential privacy adds noise to the output of a computation (i.e., the answer to a query).
      - Hides the presence or absence of a record in the dataset.
      - Makes no assumption about the adversary.
      - Some form of distributed differential privacy is required.
  • 7. Prior Distributed Differential Privacy Designs
      - The first design has a per-user computational load of O(U) (Dwork et al., EUROCRYPT '06).
          - Poor scalability.
      - Later designs reduce the per-user computational load to O(1) by using expensive secret-sharing protocols (Rastogi and Nath, SIGMOD '10; Shi et al., NDSS '11).
          - They do not tolerate churn.
      - Recent designs introduce two honest-but-curious servers that collaboratively compute the query result (Gotz and Nath, MSR-TR '11).
          - Even a single malicious user can substantially distort the query result.
  • 8. Practical Distributed Differential Privacy System (PDDP)
      Goals:
      - The differential privacy guarantee is always maintained for every honest client.
      - Places a tight bound on the extent to which a malicious user can distort query results.
          - The maximum absolute distortion in the final result is bounded by the number of malicious users.
      - Operates at large scale.
          - Millions of users.
      - Tolerates churn.
          - Churn does not prevent results from being produced.
  • 9. PDDP Components
      - Analyst: makes queries to the system and collects answers.
      - Proxy: adds differentially private noise to clients' answers to preserve privacy.
      - Clients: locally maintain their own data and answer queries.
  • 10. Security Assumptions (1/2)
      - General assumptions:
          - Clients have the correct public keys for the analyst and the proxy.
          - The analyst and the proxy have the correct public keys for each other.
          - The corresponding private keys are kept secure.
      - The analyst is potentially malicious (it may try to violate users' privacy). It may:
          - Collude with other analysts.
          - Pretend to be multiple distinct analysts.
          - Take control of clients and use the PDDP protocol to reveal information.
          - Publish its collected answers.
          - Intercept and modify all messages.
  • 11. Security Assumptions (2/2)
      - The proxy is honest but curious (HbC):
          - Follows the specified protocol.
          - Tries to exploit additional information that can be learned in doing so.
          - Does not collude with other components.
      - Clients are potentially malicious (they may try to distort the statistical results learned by analysts). They:
          - Exhibit churn.
          - Have limited resources for computation and data transmission.
          - May generate false or illegitimate answers.
          - May act as Sybils.
  • 12. PDDP Key Insights: Binary Answer
      - How can query result distortion be limited?
          - Split the answer's value range into buckets.
          - Enforce a binary answer in each bucket.
          - Use the Goldwasser-Micali (GM) bit cryptosystem.
      - Example query: SELECT age FROM info WHERE gender='m'
          - 4 buckets: 0-12, 13-20, 21-59, and ≥60.
          - Answers: '1' or '0' per bucket.
      - Malicious clients cannot substantially distort the query result.
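The bucketing idea can be illustrated with a short sketch. It assumes the four age buckets from the example query above; the names `BUCKETS`, `bucket_answer`, and `aggregate` are hypothetical and are not taken from the PDDP implementation. Encryption and noise are left out on purpose.

```python
# Hypothetical sketch: bucket bounds taken from the example query on the slide.
BUCKETS = [(0, 12), (13, 20), (21, 59), (60, None)]  # None means "no upper bound"


def bucket_answer(age):
    """Return one bit per bucket: 1 if the age falls inside it, else 0."""
    bits = []
    for low, high in BUCKETS:
        inside = age >= low and (high is None or age <= high)
        bits.append(1 if inside else 0)
    return bits


def aggregate(answers):
    """Sum the per-bucket bits over all clients (no noise, no encryption)."""
    return [sum(column) for column in zip(*answers)]
```

Because every client contributes at most one unit per bucket, a group of m malicious clients can shift any bucket count by at most m, which is the bound the slide refers to.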
  • 13. PDDP Key Insights: Blind Noise
      - How can differential privacy be achieved?
          - The honest-but-curious proxy generates additional binary answers in each bucket as differentially private noise.
      - Problem: if the analyst publishes the final noisy result, a proxy that knows the noise it added can subtract that noise from the published result and obtain a noise-free result.
      - Solution: the proxy can only add noise blindly.
          - The proxy knows that the added noise is enough to achieve differential privacy.
          - The proxy does not know the exact noise added.
  • 14. PDDP Workflow: Step 1
      - Query initialization: the analyst first issues a query to the proxy.
      - The message consists of 4 items:
          - Query: SELECT age FROM info WHERE gender='m'
          - Buckets: 0-12, 13-20, 21-59, and ≥60.
          - Number of clients queried (c): 1000
          - DP parameter (ε): 1.0
              - Controls the tradeoff between the accuracy of the computation and the strength of its privacy guarantee.
  • 15. PDDP Workflow: Step 2
      - Query forwarding: the proxy selects clients and sends them the query.
      - The proxy:
          - rejects the query if c is too low or too high,
          - rejects the query if ε exceeds the maximum privacy level allowed,
          - selects c unique clients and sends them the query, under one of the following policies:
              - Select c clients randomly and wait for them to connect.
              - Select the first c clients that connect.
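A minimal sketch of the proxy-side checks and the two selection policies follows. The limits `MIN_CLIENTS`, `MAX_CLIENTS`, and `MAX_EPSILON` are hypothetical placeholders (the slides give no concrete values), and the function names are illustrative only.

```python
import random

# Hypothetical policy limits; the slides do not state concrete values.
MIN_CLIENTS = 10
MAX_CLIENTS = 1_000_000
MAX_EPSILON = 5.0


def validate_query(c, epsilon):
    """Accept the query only if c and epsilon are within the proxy's limits."""
    if c < MIN_CLIENTS or c > MAX_CLIENTS:
        return False
    if epsilon <= 0 or epsilon > MAX_EPSILON:
        return False
    return True


def select_random(known_clients, c, rng):
    """Policy 1: pick c distinct clients at random, then wait for them to connect."""
    return rng.sample(sorted(known_clients), c)


def select_first(connection_order, c):
    """Policy 2: take the first c distinct clients in the order they connect."""
    chosen, seen = [], set()
    for client in connection_order:
        if client not in seen:
            seen.add(client)
            chosen.append(client)
        if len(chosen) == c:
            break
    return chosen
```

The second policy returns fewer than c clients if not enough distinct clients connect; a real proxy would presumably combine it with a timeout, as the deployment slide later suggests.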
  • 16. PDDP Workflow: Step 3 (1/2)
      - Client response: clients execute the query and send their answers.
      - Each client executes the query over its local data and produces an answer:
          - '1' or '0' per bucket.
          - More than one bucket may contain a '1'.
          - Each per-bucket answer value is individually encrypted with the analyst's public key (GM cryptosystem).
  • 17. PDDP Workflow: Step 3 (2/2)
      - Goldwasser-Micali (GM) cryptosystem:
          - Single-bit cryptosystem.
          - Enforces a binary answer in each bucket.
          - Very efficient.
          - XOR-homomorphic: E(a) * E(b) = E(a XOR b)
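The XOR homomorphism can be checked with a toy version of the textbook GM scheme. This is a sketch under stated assumptions: the primes 499 and 547 are far too small to be secure and were chosen only for the demonstration, and the function names are hypothetical rather than taken from the PDDP code.

```python
import math
import random


def keygen(p=499, q=547):
    """Toy GM key pair from two small odd primes (insecure sizes, demo only)."""
    n = p * q
    # x must be a quadratic non-residue modulo both p and q (Euler's criterion).
    x = next(a for a in range(2, n)
             if pow(a, (p - 1) // 2, p) == p - 1
             and pow(a, (q - 1) // 2, q) == q - 1)
    return (n, x), (p, q)


def encrypt(public_key, bit, rng):
    """Encrypt one bit as r^2 * x^bit mod n for a random r coprime to n."""
    n, x = public_key
    while True:
        r = rng.randrange(2, n)
        if math.gcd(r, n) == 1:
            break
    return (r * r * pow(x, bit, n)) % n


def decrypt(private_key, ciphertext):
    """The bit is 0 exactly when the ciphertext is a quadratic residue mod p."""
    p, _ = private_key
    return 0 if pow(ciphertext, (p - 1) // 2, p) == 1 else 1


def xor_combine(public_key, c1, c2):
    """Multiplying two ciphertexts yields an encryption of the XOR of the bits."""
    n, _ = public_key
    return (c1 * c2) % n
```

Multiplying two encryptions of 1 gives r1^2 * r2^2 * x^2, which is again a square and therefore decrypts to 0 = 1 XOR 1; the other three cases follow the same way.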
  • 18. PDDP Workflow: Step 4
      - Blind noise addition:
          - The proxy maintains a pool of additional binary answers, called coins, and adds them as noise to each bucket.
          - Coins must be unbiased.
          - Coins are encrypted with the analyst's public key.
          - n coins must be added to each bucket.
      - How can coins be generated blindly?
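The formula for n did not survive in this transcript. The sketch below assumes n = floor(64 ln(2c) / ε²) + 1, which is an assumption on my part: it reproduces both worked examples given elsewhere in these slides (16 coins for c = 250, ε = 5.0, and a standard deviation of 15.24 for c = 10^6, ε = 1.0), but it should be checked against the paper before being relied on. The function names are hypothetical.

```python
import math


def coins_per_bucket(c, epsilon):
    """Assumed coin count: floor(64 * ln(2c) / epsilon^2) + 1 (see caveat above)."""
    return math.floor(64 * math.log(2 * c) / epsilon ** 2) + 1


def noise_std(n):
    """Standard deviation of the sum of n unbiased coins: sqrt(n / 4)."""
    return math.sqrt(n / 4)
```

Under this assumption, n grows only logarithmically with the number of clients c but quadratically as ε shrinks, so the privacy parameter dominates the amount of noise.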
  • 19. Coin Pool Generation
      Straightforward approaches:
      - The proxy generates the coins.
          - A curious proxy could learn the noise-free result.
      - The clients generate the coins.
          - Malicious clients could generate biased coins.
  • 20. Collaborative Coin Generation
      The paper's approach:
      - Each online client periodically generates an encrypted unbiased coin E(oc) and sends it to the proxy.
      - The proxy receives the coin and verifies its legitimacy.
      - The proxy blindly re-flips the coin E(oc) by multiplying it with a locally generated unbiased coin E(op), followed by a modulo operation:
          E(oc) * E(op) mod m = E(oc XOR op), where m is part of the analyst's public key.
      - The proxy stores the unbiased coin in its locally maintained pool.
      - The proxy does not know the actual value of the generated unbiased coin.
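Why the re-flip removes any bias a malicious client might introduce can be seen at the plaintext level. The sketch below is an illustration only; `reflip_bias` is a hypothetical helper that computes the exact probability that oc XOR op equals 1 for two independent coins.

```python
from fractions import Fraction


def reflip_bias(p_client, p_proxy=Fraction(1, 2)):
    """Probability that (oc XOR op) == 1 when P(oc = 1) = p_client and
    P(op = 1) = p_proxy, with the two coins drawn independently."""
    return p_client * (1 - p_proxy) + (1 - p_client) * p_proxy
```

With an unbiased proxy coin the result is always exactly 1/2, whatever the client's bias. The proxy in turn cannot read the outcome, because it only ever sees the client's coin in encrypted form.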
  • 21. PDDP Workflow: Step 5
      - Noisy answers to the analyst:
          - Each bucket holds the clients' answers plus the coins (noise).
          - After a random delay, the proxy shuffles the c + n values.
          - This prevents identification of a client based on the vector of '1' and '0' values in its answer.
      - Finally, the analyst:
          - decrypts all encrypted binary values with its private key,
          - sums the plaintext values obtained,
          - obtains the noisy count of clients that fall within each bucket.
  • 22. Practical Considerations (1/2)
      - Utility of the aggregate result:
          - Depends on the amount of added noise.
          - The sum of the n coins added by the proxy follows a binomial distribution. After the analyst subtracts its mean, n/2, the remaining noise is approximately normal, N(0, n/4).
      - Example: c = 10^6, ε = 1.0. Assuming a normal distribution in each bucket:
          - 68% probability that the noisy answer is within 15.24 of the true answer.
          - 95% probability that the noisy answer is within 30.48 of the true answer.
          - 99.7% probability that the noisy answer is within 45.72 of the true answer.
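The analyst-side computation for one bucket can be sketched at the plaintext level as follows. Encryption is omitted, the coins are drawn locally only to simulate the proxy's pool, and `noisy_count` is a hypothetical name.

```python
import random


def noisy_count(client_bits, n, rng):
    """Simulate one bucket: add n unbiased coins to the clients' bits, shuffle,
    sum everything, and subtract the expected coin total n/2."""
    coins = [rng.randrange(2) for _ in range(n)]
    values = list(client_bits) + coins
    rng.shuffle(values)  # the order carries no information about who sent what
    return sum(values) - n / 2
```

The returned value differs from the true count by the coin sum minus n/2, which has mean 0 and variance n/4, so the error can never exceed n/2 in absolute value.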
  • 23. Practical Considerations (2/2)
      - Non-numeric queries:
          - Map the query into a numeric query.
          - Example: "Which website do you visit most often?" Map each website the analyst wishes to learn about to a numeric value.
          - This can require a large number of buckets; the answer is limited to 5000 buckets.
      - Sybils:
          - The design is susceptible to Sybil attacks (a single client can masquerade as multiple clients).
          - The proxy can limit the number of clients selected from a single IP address for a given query.
  • 24. Implementation and Deployment (1/2)
      - Client:
          - Firefox add-on.
          - 9600 lines of Java code.
          - Information is stored in local SQLite storage:
              - Web browsing activities.
              - Certain online shopping activities.
              - Certain ad interactions.
          - Can be extended to capture any online activity.
          - Every 5 minutes, the client connects to the proxy to retrieve pending queries and to return answers and periodically generated coins.
  • 25. Implementation and Deployment (2/2)
      - Proxy:
          - Web service on Tomcat 6.0.33.
          - 3600 lines of code.
          - Proxy state is kept in a MySQL database.
      - Analyst:
          - 800 lines of code.
      - Deployment:
          - Correctness verified on a set of local machines.
          - 600+ real clients.
  • 26. Comparison: "Paillier-based" Design
      - Honest-but-curious proxy.
      - Paillier cryptosystem:
          - Additively homomorphic.
          - The proxy can directly sum all clients' encrypted binary answers to get the encrypted sum of each bucket.
      - A single malicious client can substantially distort the result.
          - Zero-knowledge proofs (ZKPs) are used to ensure that encrypted answers are '1' or '0'.
      - The proxy knows exactly how much noise has been added.
  • 27. Evaluation (1/5)
      - Client performance:
          - Clients encrypt a binary value for each bucket.
          - Compared: GM cryptosystem and Paillier cryptosystem.
  • 28. Evaluation (2/5)
      - Proxy and analyst performance: proxy
      - PDDP:
          - One encryption and one homomorphic XOR for each unbiased coin.
          - Jacobi symbol checking on received coins and answer values (faster than a decryption).
      - Paillier-based:
          - One ZKP for each client answer in each bucket.
          - Homomorphically sums all clients' answers per bucket.
          - Adds noise to each per-bucket total sum.
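The Jacobi symbol check mentioned above needs only the public modulus, which is why the proxy can run it without being able to decrypt. The sketch below uses the standard binary algorithm; the toy modulus and the helper name `looks_like_gm_ciphertext` are assumptions for illustration. In the textbook GM scheme, a value coprime to the modulus is a possible ciphertext of some bit exactly when its Jacobi symbol is +1.

```python
import math


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n, computed without factoring n."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a:
        while a % 2 == 0:          # pull out factors of two
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def looks_like_gm_ciphertext(value, modulus):
    """Accept a value only if it is coprime to the modulus and has Jacobi symbol +1."""
    return math.gcd(value, modulus) == 1 and jacobi(value, modulus) == 1
```

A value with Jacobi symbol -1 cannot encrypt either bit, so rejecting it keeps clients from injecting anything other than a '0' or a '1' into a bucket.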
  • 29. Evaluation (3/5)
      - Proxy and analyst performance: analyst
      - PDDP:
          - Decrypts all encrypted values in each bucket.
      - Paillier-based:
          - Decrypts one encrypted value in each bucket.
  • 30. Evaluation (4/5)
      - Bandwidth overhead:
          - In both systems, a client transmits an encrypted answer for each bucket.
          - In PDDP, a client also transmits periodically generated coins to the proxy.
          - In the Paillier-based design, a client also transmits a ZKP for each bucket.
      - Storage overhead:
          - In PDDP, the proxy stores all clients' answer values for each bucket plus the required number of coins.
          - In the Paillier-based design, the proxy stores only one answer value per bucket.
  • 31. Evaluation (5/5)
      - Querying the client deployment.
      - Parameters:
          - c = 250 (out of 600 clients).
          - ε = 5.0.
          - Clients are selected as they connect, until 250 unique clients have been queried or 24 hours have passed.
      - These parameters result in 16 coins per bucket.
          - This ensures that a per-bucket aggregate answer is within ±2, ±4, and ±6 of the noise-free answer with probabilities of 68%, 95%, and 99.7%, respectively.
  • 32. Future Work
      - Support for statistical learning algorithms.
      - Scalability of non-numeric queries.
          - Bloom filters: map a large number of possible answers into a small number of buckets.
      - Gather statistical data in a large-scale experiment.
      - Weaken the proxy trust requirements.
          - Use of trusted hardware (TPM).
      - In general: measure the actual privacy loss under differential privacy.
  • 33. Conclusion
      - PDDP: Practical Distributed Differential Privacy system.
          - Scales well.
          - Tolerates churn.
          - Places a tight bound on a malicious user's capability.
      - Key insights:
          - Binary answer in each bucket.
          - Blind noise addition.
  • 34. Questions?