Dual Query: Practical Private Query Release for High Dimensional Data
1. Dual Query: Practical Private Query Release for High Dimensional Data
Speaker: Steven Wu
University of Pennsylvania
ICML 2014
Joint work with
Marco Gaboardi
Emilio Jesús Gallego Arias
Justin Hsu
Aaron Roth
4. Differential Privacy [DMNS06]
• An algorithm A with domain X and range R satisfies
ε-differential privacy if for every outcome r and every
pair of databases D, D’ differing in one record:
Pr[ A(D) = r ] ≤ e^ε · Pr[ A(D’) = r ]   (e^ε ≈ 1 + ε for small ε)
Useful Properties:
• Strong, worst-case notion of privacy
• Similar to stability for learning algorithms
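As a small illustration of the definition above, here is a sketch that checks the inequality for randomized response on a single private bit. Randomized response is not part of this talk; it is used here only because its output probabilities can be written down exactly. The bound is checked in its standard e^ε form (which is close to 1 + ε for small ε), and the function names are hypothetical.

```python
import math

def randomized_response_probs(true_bit: int, epsilon: float) -> dict:
    """Exact output distribution of randomized response on one private bit.

    The true bit is reported with probability e^eps / (1 + e^eps),
    and the flipped bit otherwise.
    """
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {true_bit: p_truth, 1 - true_bit: 1.0 - p_truth}

def satisfies_dp(epsilon: float) -> bool:
    """Check Pr[A(D) = r] <= e^eps * Pr[A(D') = r] for both outcomes r.

    D and D' are the two one-record databases holding bit 0 and bit 1.
    A tiny tolerance absorbs floating-point rounding, since the bound is tight.
    """
    bound = math.exp(epsilon)
    p0 = randomized_response_probs(0, epsilon)
    p1 = randomized_response_probs(1, epsilon)
    return all(
        p0[r] <= bound * p1[r] + 1e-12 and p1[r] <= bound * p0[r] + 1e-12
        for r in (0, 1)
    )

print(satisfies_dp(0.5))  # True
```

The check passes for any ε > 0 because the ratio of the two output probabilities is exactly e^ε in the worst case.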
6. Answer Exponentially Many queries
• Privately learn a distribution D’ approximating D
[Diagram: True Database → Learning Algorithm → Approximate Database; approximately the same answers on the queries]
7. Learn from Learning Theory
• [DRV08]: query release via boosting
• [HR10]: use multiplicative weights (MW) update
algorithm to learn a distribution
• [HLM12]: experimentally evaluated the MW
algorithm, performs well for ≤ 80 attributes
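To make the multiplicative weights idea concrete, here is a minimal sketch in the style of the MW-based approach evaluated in [HLM12]. It is a simplified illustration under stated assumptions, not the authors' implementation: queries are 0/1 vectors over an explicitly enumerated domain, the privacy budget is split evenly across rounds, and the last iterate is returned. The name mwem_sketch and all parameters are hypothetical.

```python
import math
import random

def mwem_sketch(data_counts, queries, epsilon, rounds, seed=0):
    """Learn a distribution over the domain that approximates the data.

    data_counts: number of records of each type (one entry per domain element).
    queries: list of 0/1 vectors; a query's answer is the count of matching records.
    """
    rng = random.Random(seed)
    n = float(sum(data_counts))
    size = len(data_counts)
    approx = [1.0 / size] * size  # start from the uniform distribution

    def answer(query, weights):
        return sum(qx * wx for qx, wx in zip(query, weights))

    true_answers = [answer(q, data_counts) for q in queries]
    # Assumption: each round spends eps_round on selection and eps_round on measurement.
    eps_round = epsilon / (2.0 * rounds)

    for _ in range(rounds):
        approx_answers = [n * answer(q, approx) for q in queries]
        # Exponential mechanism: prefer queries where the approximation is worst.
        scores = [abs(t - a) for t, a in zip(true_answers, approx_answers)]
        top = max(scores)
        weights = [math.exp(eps_round * (s - top) / 2.0) for s in scores]
        j = rng.choices(range(len(queries)), weights=weights)[0]
        # Laplace noise with scale 1 / eps_round, as a difference of two exponentials.
        noise = rng.expovariate(eps_round) - rng.expovariate(eps_round)
        measurement = true_answers[j] + noise
        # Multiplicative weights update toward the noisy measurement.
        step = (measurement - approx_answers[j]) / (2.0 * n)
        approx = [p * math.exp(qx * step) for p, qx in zip(approx, queries[j])]
        total = sum(approx)
        approx = [p / total for p in approx]
    return approx

counts = [40, 10, 30, 20]
queries = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1]]
print(mwem_sketch(counts, queries, epsilon=1.0, rounds=5))
```

Note that the list approx has one entry per possible record, so its size grows with the whole data domain.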
8. What is the bottleneck?
The algorithm operates on a distribution over all
possible data records:
Exponential in d (the number of attributes)!
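A quick back-of-the-envelope calculation shows how fast this blows up. The sketch below assumes d binary attributes and 8 bytes per stored weight; both are illustrative assumptions rather than figures from the talk.

```python
def histogram_size(num_binary_attributes: int) -> int:
    """Number of distinct records when every attribute is a single bit."""
    return 2 ** num_binary_attributes

for d in (10, 20, 40, 80):
    cells = histogram_size(d)
    gigabytes = cells * 8 / 1e9  # assumed 8 bytes per weight
    print(f"d={d}: {cells:.3e} possible records, about {gigabytes:.3e} GB")
```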
9. Impossibility Result
• No private algorithm can answer an exponentially large
collection of queries both efficiently and accurately
• Shown by a line of lower bounds:
[DNRRV09] [Ullman-Vadhan11] [Ullman13] [BUV14]
• Problem theoretically hard in the worst case
• But can we do something in practice?
(not with exponential space)
15. Best Response Problem
• Minimize error w.r.t. the query player’s distribution
• Concisely represented but NP-hard
• Can be encoded as an integer program
Send it to the CPLEX solver
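To show the shape of the best response problem on a toy instance, here is a brute-force sketch. It is not the integer program from the talk and does not use CPLEX: it simply enumerates every record, which is only feasible for very small d. It also assumes, as a hypothetical stand-in for the error objective, that queries are conjunctions over binary attributes and that the best response is the record satisfying the largest total weight of queries. All names are illustrative.

```python
from itertools import product

def satisfies(record, query):
    """A query is a tuple of (attribute_index, required_bit) pairs (a conjunction)."""
    return all(record[i] == bit for i, bit in query)

def best_response_bruteforce(num_attributes, weighted_queries):
    """Return the record with the largest total weight of satisfied queries.

    weighted_queries: list of (query, weight) pairs, with the weights playing
    the role of the query player's distribution.
    Enumerates all 2^d records, so this is only a toy for small d.
    """
    best_record, best_score = None, float("-inf")
    for record in product((0, 1), repeat=num_attributes):
        score = sum(w for q, w in weighted_queries if satisfies(record, q))
        if score > best_score:
            best_record, best_score = record, score
    return best_record, best_score

weighted_queries = [
    (((0, 1), (1, 0)), 0.5),  # x0 = 1 AND x1 = 0
    (((1, 0), (2, 1)), 0.3),  # x1 = 0 AND x2 = 1
    (((0, 0),), 0.2),         # x0 = 0
]
record, score = best_response_bruteforce(3, weighted_queries)
print(record, round(score, 3))  # (1, 0, 1) 0.8
```

The output is a single record rather than a distribution over all records, which is why the problem can be stated concisely even though solving it exactly is hard in general.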
16. Don’t Need to Optimize Exactly
If the optimization problem is too hard, stop CPLEX
and return the current solution
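The "stop early and keep the current solution" pattern can be sketched without a solver. The example below swaps in a simple bit-flip hill climb with an evaluation budget; it is not how CPLEX searches, and the function name, the budget parameter, and the toy objective are all hypothetical. The point is only that the search always holds a feasible record that can be returned when the budget runs out.

```python
import random

def anytime_search(objective, num_attributes, max_evaluations, seed=0):
    """Bit-flip hill climbing that returns the best record found within a budget.

    objective: function mapping a list of bits to a score to maximize.
    max_evaluations: stop after this many objective calls and return the incumbent.
    """
    rng = random.Random(seed)
    current = [rng.randint(0, 1) for _ in range(num_attributes)]
    current_score = objective(current)
    evaluations = 1
    while evaluations < max_evaluations:
        candidate = list(current)
        flip = rng.randrange(num_attributes)
        candidate[flip] = 1 - candidate[flip]
        score = objective(candidate)
        evaluations += 1
        # Keep the candidate only if it is at least as good as the incumbent.
        if score >= current_score:
            current, current_score = candidate, score
    return tuple(current), current_score

# Toy objective: count the attributes set to 1.
quick = anytime_search(sum, num_attributes=5, max_evaluations=1)
longer = anytime_search(sum, num_attributes=5, max_evaluations=500)
print(quick, longer)
```

With the same seed, a larger budget can never return a worse score than a smaller one, because the incumbent is only ever replaced by a record that scores at least as well.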
20. Take-Away
• Private Query Release for High Dimensional Data is Hard
• Reconfigure Existing Algorithm to Isolate the Hard Part
• Dual Query: an algorithm that performs well in practice
Editor's Notes
What is the fraction of people with a certain property?
Stability of machine learning
Generating synthetic data: a fresh, safe version of the dataset that approximates the real dataset on every statistical query of interest.
Optimal in terms of privacy and accuracy trade-off.
Both players are quite happy with their distributions