Dual Query: Practical Private Query Release for High Dimensional Data

Steven Wu


Private Query Release
Sensitive Database (Medical Records)
Queries
Release answers that preserve privacy

Differential Privacy
[Figure: database D → Algorithm; ratio bounded. Records: Alice, Bob, Chris, Donna, Ernie, Xavier]

Differential Privacy (DMNS06)
• An algorithm A with domain X and range R satisfies
ε-differential privacy if for every outcome r and every
pair of databases D, D’ differing in one record:
Pr[ A(D) = r ] ≤ e^ε · Pr[ A(D’) = r ]   (≈ (1 + ε) · Pr[ A(D’) = r ] for small ε)
Useful Properties:
• Strong, worst-case notion of privacy
• Similar to stability for learning algorithms
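As a concrete instance of the definition above, here is a minimal sketch of the Laplace mechanism applied to a single counting query. It is not taken from the slides; the function names, the toy records, and the choice of ε are illustrative assumptions. It uses the standard e^ε form of the bound, which is close to (1 + ε) for small ε.

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Counting query plus Laplace noise.

    Replacing one record changes the true count by at most 1,
    so noise with scale 1/epsilon is used (hypothetical helper).
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

def laplace_density(x, center, scale):
    """Density of Laplace(center, scale) at x, used to inspect the privacy ratio."""
    return math.exp(-abs(x - center) / scale) / (2.0 * scale)

# Toy data (made up for illustration).
records = [
    {"name": "Alice", "smoker": True},
    {"name": "Bob", "smoker": False},
    {"name": "Chris", "smoker": True},
]
rng = random.Random(0)
noisy = private_count(records, lambda r: r["smoker"], epsilon=0.5, rng=rng)
```

For two neighboring databases whose true counts are c and c + 1, the ratio of the output densities at any point is at most e^ε, which is the inequality in the definition applied to densities.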

More Formally
Release approximate answers to a
large collection of queries with
Privacy and Accuracy

Answer Exponentially Many Queries
• Privately learn a distribution D’ approximating D
[Figure: True Database → Learning Algorithm → Approximate Database; approximately the same answers on the queries]

Learn from Learning Theory
• [DRV08]: query release via boosting
• [HR10]: use multiplicative weights (MW) update
algorithm to learn a distribution
• [HLM12]: experimentally evaluated the MW
algorithm, performs well for ≤ 80 attributes

What is the bottleneck?
The algorithm operates on the distribution of all
possible data records:
Exponential in d !
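To make the bottleneck concrete, the following is a rough, non-private sketch of a multiplicative weights update over the full data universe {0,1}^d. It omits the noise and private query selection that the cited algorithms use, and all names and the step size are assumptions for illustration. The point is only that the weight vector has 2^d entries.

```python
import itertools
import math

def universe(d):
    """All possible binary records with d attributes: 2**d of them."""
    return list(itertools.product((0, 1), repeat=d))

def estimate(weights, records, query):
    """Answer of the current distribution on a query with values in [0, 1]."""
    return sum(w * query(x) for w, x in zip(weights, records))

def mw_update(weights, records, query, true_answer, eta=1.0):
    """One multiplicative weights step (non-private simplification).

    Records on which the query fires are up-weighted when the current
    estimate is too low and down-weighted when it is too high.
    """
    gap = true_answer - estimate(weights, records, query)
    scaled = [w * math.exp(eta * gap * query(x)) for w, x in zip(weights, records)]
    total = sum(scaled)
    return [w / total for w in scaled]

records = universe(3)                       # 8 records; d = 80 would need 2**80
weights = [1.0 / len(records)] * len(records)
first_bit_set = lambda x: x[0]              # hypothetical query
for _ in range(50):
    weights = mw_update(weights, records, first_bit_set, true_answer=0.75)
```

Every update touches every element of the universe, so both memory and time per step grow as 2^d.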

Impossibility Result
• No private algorithm can answer an exponentially large
collection of queries efficiently and accurately
• Shown by a line of lower bounds:
[DNRRV09] [Ullman-Vadhan11] [Ullman13] [BUV14]
• Problem theoretically hard in the worst case
• But can we do something in practice?
(not with exponential space)

Query Release as Zero-Sum Game

Query Release Game
Data Player (actions: data records): Minimize Error
Query Player (actions: queries): Maximize Error

Approximate Equilibrium Implies Accuracy
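One way to see why this holds is sketched below. The notation is assumed rather than taken from the slides: D is the true database, x is a data record, q(x) ∈ [0, 1] is the value of query q on x, q(D) is the average of q over the records of D, and the query class Q is assumed to be closed under negation, with \bar{q} = 1 - q.

```latex
% Payoff of the game: the error of record x on query q (assumed sign convention).
\[
  A_D(x, q) = q(D) - q(x)
\]
% The data player minimizes and the query player maximizes. The value of the game is 0:
% playing D itself gives payoff 0 on every query, and for any data distribution
% either q or its negation \bar{q} has non-negative payoff.
\[
  v = \min_{\hat{D}} \max_{q \in Q} \bigl( q(D) - q(\hat{D}) \bigr) = 0
\]
% If \hat{D} is an \alpha-approximate equilibrium strategy for the data player, then
\[
  q(D) - q(\hat{D}) \le \alpha \quad \text{for all } q \in Q .
\]
% Applying the same bound to the negated query \bar{q} = 1 - q gives
\[
  \bar{q}(D) - \bar{q}(\hat{D}) = q(\hat{D}) - q(D) \le \alpha ,
\]
% and combining the two inequalities yields accuracy on every query:
\[
  \bigl| q(D) - q(\hat{D}) \bigr| \le \alpha \quad \text{for all } q \in Q .
\]
```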

Computing the Equilibrium
Multiplicative Weights vs. Best Response
Data Player: Multiplicative Weights (exponential size distribution)
Query Player: Best Response
Converge to Approximate Equilibrium

Dual Approach
Multiplicative Weights vs. Best Response
Query Player: Multiplicative Weights
Data Player: Best Response (Solve an NP-Hard Problem)

Best Response Problem
• Minimize error w.r.t. the query player’s distribution
• Concisely represented but NP-Hard
• Can be encoded as an integer program
Send it to CPLEX Solver
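The best response problem can be illustrated on a tiny instance. The sketch below assumes that queries are conjunctions over binary attributes and replaces the integer program and CPLEX with brute-force enumeration, which is only feasible for very small d. The query encoding and function names are hypothetical.

```python
import itertools

def satisfies(record, query):
    """query is a tuple of (attribute_index, required_bit) pairs (assumed encoding)."""
    return all(record[i] == bit for i, bit in query)

def best_response(sampled_queries, d):
    """Return the record in {0,1}^d that satisfies the most sampled queries.

    Since q(D) does not depend on the record, maximizing the number of
    satisfied queries minimizes the average error q(D) - q(x).
    Brute force stands in for the integer program here.
    """
    best, best_score = None, -1
    for x in itertools.product((0, 1), repeat=d):
        score = sum(satisfies(x, q) for q in sampled_queries)
        if score > best_score:
            best, best_score = x, score
    return best, best_score

sampled = [
    ((0, 1), (1, 1)),   # attribute 0 is 1 AND attribute 1 is 1
    ((0, 1), (2, 0)),   # attribute 0 is 1 AND attribute 2 is 0
    ((1, 0), (2, 0)),   # attribute 1 is 0 AND attribute 2 is 0
]
record, score = best_response(sampled, d=3)
```

In this toy instance no record satisfies all three conjunctions, because the first requires attribute 1 to be 1 and the third requires it to be 0, so the best achievable score is 2.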

Don’t Need to Optimize Exactly
If the optimization problem is too hard, stop CPLEX
and return the current solution
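Putting the pieces together, the loop below is a simplified sketch of the dual approach under stated assumptions: the query player keeps multiplicative weights over the queries, a few queries are sampled each round, and the data player answers with an approximate best response. A budgeted random search plays the role of stopping the solver early and returning the current solution. Privacy accounting (how the number of steps, the number of samples, and the learning rate map to ε) is omitted, and every name and parameter value is an illustrative assumption.

```python
import math
import random

def evaluate(record, query):
    """query = (literals, negated); literals is a tuple of (attribute, bit) pairs."""
    literals, negated = query
    sat = int(all(record[i] == bit for i, bit in literals))
    return 1 - sat if negated else sat

def answer(dataset, query):
    """Average value of the query over the dataset."""
    return sum(evaluate(x, query) for x in dataset) / len(dataset)

def budgeted_best_response(sampled, d, budget, rng):
    """Try `budget` random records and return the best one seen so far."""
    best, best_score = None, -1
    for _ in range(budget):
        x = tuple(rng.randint(0, 1) for _ in range(d))
        score = sum(evaluate(x, q) for q in sampled)
        if score > best_score:
            best, best_score = x, score
    return best

def dual_query_sketch(dataset, queries, d, steps, samples, eta, budget, seed=0):
    rng = random.Random(seed)
    true_answers = [answer(dataset, q) for q in queries]
    weights = [1.0 / len(queries)] * len(queries)
    synthetic = []
    for _ in range(steps):
        # Query player: sample queries from the current weights.
        sampled = rng.choices(queries, weights=weights, k=samples)
        # Data player: approximate best response to the sampled queries.
        x = budgeted_best_response(sampled, d, budget, rng)
        synthetic.append(x)
        # Up-weight queries whose true answer exceeds their value on x.
        weights = [w * math.exp(eta * (t - evaluate(x, q)))
                   for w, t, q in zip(weights, true_answers, queries)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return synthetic

base = [((0, 1), (1, 1)), ((1, 0), (2, 1)), ((0, 0), (2, 0))]
queries = [(lits, neg) for lits in base for neg in (False, True)]  # closed under negation
data = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (1, 0, 1)]
synthetic = dual_query_sketch(data, queries, d=3, steps=50, samples=5, eta=0.2, budget=8)
```

The state kept across rounds is one weight per query plus the list of chosen records, so nothing of size 2^d is ever stored. The collected records serve as the synthetic database.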

Accuracy
Accuracy versus ε
500,000 queries; 17,770 attributes

Scalability
Accuracy versus number of attributes
100,000 queries; up to 512,000 attributes

Scalability
Runtime (secs) versus Number of Attributes
100,000 queries; up to 512,000 attributes

Take-Away
• Private Query Release for High Dimensional Data is Hard
• Reconfigure Existing Algorithm to Isolate the Hard Part
• Dual Query: an algorithm that performs well in practice

Speaker: Steven Wu, University of Pennsylvania
ICML 2014
Joint work with Marco Gaboardi, Emilio Jesús Gallego Arias, Justin Hsu, and Aaron Roth

Editor's Notes

What is the fraction of people with a certain property?
Stability of machine learning
Generating synthetic data: a fresh, safe version of the dataset that approximates the real dataset on every statistical query of interest.
Optimal in terms of privacy and accuracy trade-off.
Both players are quite happy with their distributions
Repeated play; the previous approach
Repeated play; more intuition
