Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data
Sharad Mehrotra¹, Shantanu Sharma¹, Jeffrey D. Ullman², and Anurag Mishra¹
¹ University of California, Irvine, USA
² Stanford University, USA
IEEE International Conference on Data Engineering (ICDE), 2019.
Secure Data Outsourcing
Use cryptographic mechanisms to protect sensitive data on the cloud
Can we design an outsourcing solution that is simultaneously
• Efficient – significantly better than downloading the encrypted data, and
• Secure – as secure as downloading the data and processing it locally?
Roadmap
• State of the art in secure data outsourcing
• Partitioned computing & corresponding security properties
• Binning algorithm to achieve partitioned security
• Performance results
Data/Computation Outsourcing over the Years
Keyword Search over Encrypted Documents [IEEE SP 2000, ACNS 04, Crypto 08, Crypto 09, …]
SQL over Encrypted Data [ICDE 02, SIGMOD 02, VLDB 04, Eurocrypt 03, SIGMOD 04, Crypto 11, STOC 09, SOSP 11, …]
MPC and Secret Sharing [CACM 79, Eurocrypt 14, 15, 17, VLDB 17, Tech 19]
[Figure: secure-hardware setting with the OS, Process 1, Process 2, a trusted enclave, encrypted data, the cache, the page table, and Ecall/Ocall transitions]
The adversary can observe cache-line and page-table accesses
Secure Hardware [CIDR 13, Usenix Security
15, IEEE SP 15,17, NSDI 18]
Solutions represent points in the spectrum of possibilities
– They explore tradeoffs between generality, security, and efficiency.
More secure, but orders of magnitude worse in performance than plaintext processing.
Not secure, and the software techniques that make such solutions secure are inefficient
• Coarse-grained page-fault, branch-shadowing, and cache-line attacks
Cryptographic Techniques: Security Threats &
Performance
A mark indicates that a technique is resilient to a given attack.
DSSE: Distributed Searchable Symmetric Encryption (PULSAR by Stealth)
MPC: Multi-party computation (Jana by Galois)
Opaque: SGX-based solution [Zheng et al., NSDI, 2017]
Selecting a single row from the TPC-H Customer table of 1.5M rows and 8 columns
• Cryptographic overheads:
• Searchable encryption: ~2 orders of magnitude
• Secure hardware: ~3-4 orders of magnitude
• MPC-based solution: ~5-6 orders of magnitude
Data Sensitivity & Outsourcing
• Organizational data is often only partially sensitive [refs in paper]
• Sensitivity is dictated by policies
• Sensitivity dictates which data is outsourced and in what form
• E.g., general office emails are possibly not sensitive (hence outsourced)
• Information related to a sensitive project is sensitive (hence not outsourced in plaintext)
• Can we exploit the partially sensitive nature of data to scale cryptographic solutions without compromising the security of sensitive data?
• Commercial encrypted database solutions (e.g., Jana by Galois) are beginning to explore such solutions
Key Insight: Partial Sensitivity of Data (1)
• Data about entry/exit from buildings → possibly sensitive (inference about time spent at work)
• Location within office building → possibly not sensitive
• Surveillance video → not sensitive
• Surveillance video → sensitive, if visitor prefers not to be monitored (OK to know visitor not in frame, but not if visitor in frame!)
Partial sensitivity is also true for other domains
http://cybersecurity.ieee.org/blog/2015/11/13/identify-sensitive-data-and-how-they-should-be-handled/
https://digitalguardian.com/
Can we exploit partial sensitivity to develop efficient (yet secure) solutions to scale secure computing and/or data sharing?
Key Insight: Partial Sensitivity of Data (2)
• Existing work on data classification
• Inference detection using graph-based semantic data modeling [Hinke, IEEE SP, 88]
• User-defined relationships between sensitive and non-sensitive data [Smith, IEEE SP, 90]
• Sensitive patterns hiding using sanitization matrix [Lee et al., COMPSAC, 2004]
• Common knowledge-based association rules [Li et al., DASFAA, 2007]
• Constraints-based mechanisms
• Objectives of finding data-sensitivity
• Data-sharing while keeping sensitive data at the trusted user
• Multi-level secure data accessing
• Allowing data for mining purposes while also preserving the confidentiality of the data
Partitioned Computations
Sensitive Data Ds (stored encrypted)
Tuple  Name      Department
t1     E(Adam)   E(Defense)
t2     E(John)   E(Security)
t3     E(Clark)  E(Crypto)
t4     E(Lisa)   E(Defense)
Non-sensitive Data Dns (stored in plaintext)
Tuple  Name   Department
t5     Adam   Testing
t6     John   Testing
t7     Lisa   Design
t8     Clark  Design
Query Q is split into Query Qs over Ds and Query Qns over Dns; Answer As and Answer Ans are combined into Answer A.
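The split-and-merge flow on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `E` and `D` are placeholders that only tag a value as ciphertext (no real encryption is performed), and the function names `q_s`, `q_ns`, and `q` are hypothetical.

```python
# Toy model of partitioned computation over the slide's example data.
# E/D are placeholders that merely tag values; they are NOT real encryption.

def E(value):
    return ("E", value)

def D(cipher):
    return cipher[1]

# Sensitive data Ds (stored encrypted) and non-sensitive data Dns (plaintext).
DS = {
    "t1": (E("Adam"), E("Defense")),
    "t2": (E("John"), E("Security")),
    "t3": (E("Clark"), E("Crypto")),
    "t4": (E("Lisa"), E("Defense")),
}
DNS = {
    "t5": ("Adam", "Testing"),
    "t6": ("John", "Testing"),
    "t7": ("Lisa", "Design"),
    "t8": ("Clark", "Design"),
}

def q_s(name):
    """Qs: stands in for a cryptographic search over Ds; returns ciphertext rows."""
    return [row for row in DS.values() if D(row[0]) == name]

def q_ns(name):
    """Qns: ordinary plaintext selection over Dns."""
    return [row for row in DNS.values() if row[0] == name]

def q(name):
    """Q: run Qs and Qns, decrypt As at the data owner, and merge it with Ans."""
    a_s = [tuple(D(c) for c in row) for row in q_s(name)]
    a_ns = q_ns(name)
    return a_s + a_ns

answer = q("John")
```

Here `answer` holds John's sensitive row followed by his non-sensitive row; only the `q_s` part would pay the cryptographic overhead.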
Leakage due to Partitioned Computing…
Sensitive Data Ds and Non-sensitive Data Dns are the same as on the "Partitioned Computations" slide.
Query: Retrieve John's rows
Adversarial view
Query value  Tuples retrieved from sensitive side  Tuples retrieved from non-sensitive side
John         t2                                    t6
The adversary learns that t2 is John's row.
What if we use access-pattern-hiding techniques?
Sensitive Data Ds and Non-sensitive Data Dns are the same as on the "Partitioned Computations" slide.
Query: Retrieve John's rows
Adversarial view
Query value  Tuples retrieved from sensitive side  Tuples retrieved from non-sensitive side
John         E(….)                                 t6
The output size reveals that one of John's records is sensitive.
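The two leakage scenarios can be reproduced with a small simulation. This is a sketch under assumptions: the dictionaries below only model which name each tuple holds, and `adversarial_view` and its `hide_access_patterns` flag are hypothetical names introduced for illustration.

```python
# Name stored in each tuple; on the sensitive side the cloud only sees ciphertexts.
SENSITIVE = {"t1": "Adam", "t2": "John", "t3": "Clark", "t4": "Lisa"}
NON_SENSITIVE = {"t5": "Adam", "t6": "John", "t7": "Lisa", "t8": "Clark"}

def adversarial_view(name, hide_access_patterns):
    """Return what the cloud observes when the query for `name` is executed."""
    s_ids = [t for t, n in SENSITIVE.items() if n == name]
    ns_ids = [t for t, n in NON_SENSITIVE.items() if n == name]
    # With access-pattern hiding, only the output size is visible on the sensitive side.
    sensitive_side = len(s_ids) if hide_access_patterns else s_ids
    return sensitive_side, ns_ids

naive = adversarial_view("John", hide_access_patterns=False)
hidden = adversarial_view("John", hide_access_patterns=True)
```

In the naive case the cloud sees tuple t2 returned together with plaintext tuple t6, so it links t2 to John. With access patterns hidden it still sees a non-empty sensitive output next to t6, which is the output-size leak described on the slide.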
Partitioned Data Security
• Non-Linkability
• The adversary does not learn the relationship between any encrypted and plaintext value
• Ciphertext Indistinguishability
• The adversary does not learn any relationships between encrypted values
• unless the underlying crypto allows such relationships to be learnt (e.g., OPE)
Secure Partitioned Computation (1)
• Data partitioned into bins
• Non-sensitive data partitioned into non-sensitive bins (NSB)
• Sensitive data partitioned into sensitive bins (SB)
• Query Q for value y is mapped to all values in the bin corresponding to y
• Retrieves all data in NSB(y) over non-sensitive data
• Retrieves all data in SB(y) over sensitive data
[Figure: Ds holds encrypted values E(x), E(y), E(z) grouped into sensitive bins SB(x), SB(y), SB(z); Dns holds plaintext values x, y, z grouped into non-sensitive bins NSB(x), NSB(y), NSB(z)]
Adversarial view
Query value  Tuples retrieved from sensitive side  Tuples retrieved from non-sensitive side
John         SB(y)                                 NSB(y)
Secure Partitioned Computation (2)
• Bins are created such that for each pair of sensitive and non-sensitive bins s & ns, there exists a value v such that s = SB(v) and ns = NSB(v)
[Figure: Ds with sensitive bins SB(x), SB(y), SB(z) and Dns with non-sensitive bins NSB(x), NSB(y), NSB(z)]
The adversarial view does not allow learning linkability between sensitive and non-sensitive records
Secure Partitioned Computation (3)
• Association amongst each sensitive bin and non-sensitive bin prevents
• Leakage through joint access of data
• Output size attacks
• Workload skew attacks can be prevented through (careful) addition of
(minimal) fake queries
[Figure: Ds with sensitive bins SB(x), SB(y), SB(z) and Dns with non-sensitive bins NSB(x), NSB(y), NSB(z)]
Query Binning
• Assumptions
• Equal number of sensitive and non-sensitive attribute values
• Each distinct attribute value appears in at most one tuple in sensitive and one
tuple in non-sensitive data
• The number of values is a product of approximately equal factors
*** The paper relaxes all these assumptions
The Algorithm: One Tuple Per Value
Bin Creation: Inputs: S and NS
• Permute all sensitive values
• Find approximately square factors x and y of |NS|, i.e., |NS| = x * y such that x ≥ y
• Create x sensitive bins, each containing at most y values
• Create |NS|/x non-sensitive bins, each containing at most x values
• Assign the ith sensitive value to the (i mod x)th sensitive bin
• Assigning non-sensitive values: the non-sensitive value associated with the sensitive value at the jth position of the ith sensitive bin is placed at the ith position of the jth non-sensitive bin
• NSB[j][i] ← allocateNS(SB[i][j])
• Fill the remaining positions with the remaining NS values
Example: |S| = 6, |NS| = 6, x = 3, y = 2
S = {S1, S2, S3, S4, S5, S6}
NS = {NS1, NS2, NS3, NS4, NS6, NS7}
Sensitive bins: SB1 = {S1, S4}, SB2 = {S2, S5}, SB3 = {S3, S6}
Non-sensitive bins: NSB1 = {NS1, NS2, NS3}, NSB2 = {NS4, NS7, NS6}
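The bin-creation steps above can be sketched as follows for the one-tuple-per-value case. This is a minimal sketch, not the paper's code: the function names, the `assoc` mapping (sensitive value to its associated non-sensitive value), and the optional `rng` argument are assumptions made for illustration. The permutation step is applied only when `rng` is given, so the default run is reproducible; non-empty inputs are assumed.

```python
import math

def approx_square_factors(n):
    """Return (x, y) with x * y == n and x >= y, choosing y as close to sqrt(n) as possible."""
    y = math.isqrt(n)
    while n % y != 0:
        y -= 1
    return n // y, y

def create_bins(sensitive, non_sensitive, assoc, rng=None):
    """Build sensitive bins SB and non-sensitive bins NSB (0-indexed lists)."""
    values = list(sensitive)
    if rng is not None:
        rng.shuffle(values)  # "Permute all sensitive values"
    x, _y = approx_square_factors(len(non_sensitive))
    # The i-th sensitive value goes to sensitive bin (i mod x).
    sb = [[] for _ in range(x)]
    for i, s in enumerate(values):
        sb[i % x].append(s)
    # |NS| / x non-sensitive bins with x positions each.
    nsb = [[None] * x for _ in range(len(non_sensitive) // x)]
    placed = set()
    # NSB[j][i] <- non-sensitive value associated with SB[i][j].
    for i, bin_i in enumerate(sb):
        for j, s in enumerate(bin_i):
            ns = assoc.get(s)
            if ns is not None:
                nsb[j][i] = ns
                placed.add(ns)
    # Fill the remaining positions with the remaining non-sensitive values.
    leftovers = iter(v for v in non_sensitive if v not in placed)
    for row in nsb:
        for k, slot in enumerate(row):
            if slot is None:
                row[k] = next(leftovers)
    return sb, nsb

S = ["S1", "S2", "S3", "S4", "S5", "S6"]
NS = ["NS1", "NS2", "NS3", "NS4", "NS6", "NS7"]
ASSOC = {"S1": "NS1", "S2": "NS2", "S3": "NS3", "S4": "NS4", "S6": "NS6"}
sb, nsb = create_bins(S, NS, ASSOC)
```

With the slide's example values this yields three sensitive bins of two values and two non-sensitive bins of three values. Passing `rng=random.Random(seed)` would add the permutation step.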
The Algorithm: One Tuple Per Value
• Bin Retrieval: Input: Query(w)
• If w is in a sensitive bin SB[i][j], then
• Retrieve ith sensitive bin and jth non-sensitive bin
• If w is in a non-sensitive bin NSB[i][j], then
• Retrieve ith non-sensitive bin and jth sensitive bin
Example: |S| = 6, |NS| = 6, x = 3, y = 2
S = {S1, S2, S3, S4, S5, S6}
NS = {NS1, NS2, NS3, NS4, NS6, NS7}
Query: S2 → SB2, NSB1
Query: NS7 → NSB2, SB2
Sensitive bins: SB1 = {S1, S4}, SB2 = {S2, S5}, SB3 = {S3, S6}
Non-sensitive bins: NSB1 = {NS1, NS2, NS3}, NSB2 = {NS4, NS7, NS6}
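The retrieval rule can be sketched as a lookup over the example bins. The bins are hard-coded from the slide's example, and the function name `retrieve` is a hypothetical choice; indices are 0-based in the code and 1-based on the slide.

```python
SB = [["S1", "S4"], ["S2", "S5"], ["S3", "S6"]]
NSB = [["NS1", "NS2", "NS3"], ["NS4", "NS7", "NS6"]]

def retrieve(w, sb=SB, nsb=NSB):
    """Return (sensitive bin index, non-sensitive bin index) fetched for query value w."""
    for i, bin_i in enumerate(sb):
        if w in bin_i:
            j = bin_i.index(w)
            return i, j  # w = SB[i][j]: fetch sensitive bin i and non-sensitive bin j
    for i, bin_i in enumerate(nsb):
        if w in bin_i:
            j = bin_i.index(w)
            return j, i  # w = NSB[i][j]: fetch sensitive bin j and non-sensitive bin i
    raise KeyError(w)

pair_for_s2 = retrieve("S2")
pair_for_ns2 = retrieve("NS2")
```

Both queries fetch the second sensitive bin (SB2) and the first non-sensitive bin (NSB1), so the cloud cannot tell from the retrieved bins whether the query targeted S2 or NS2.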
Query Execution Cost on Outsourced Data
Technique                                  Time
SGX                                        10500x
Query Binning + SGX (60% sensitivity)      8929x
Multi-party computations (Jana)            954363x
Query Binning + Jana (60% sensitivity)     680131x
The original table also marks, per technique, resilience to three attacks: size, workload-skew, and access-patterns.
x is the time to search a predicate in cleartext.
A mark indicates that a technique is resilient to a given attack.
Experiments are conducted over 1.5M rows.
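As a worked reading of the table, the relative time saved by adding query binning can be computed directly from the reported multipliers. The dictionary keys and the helper name `reduction` are illustrative choices; only the four numbers come from the table.

```python
# Time multipliers relative to cleartext search, as reported in the table.
COST = {
    "SGX": 10500,
    "QB + SGX": 8929,
    "Jana": 954363,
    "QB + Jana": 680131,
}

def reduction(baseline, with_qb):
    """Fraction of the baseline time saved when query binning is added."""
    return 1 - with_qb / baseline

sgx_saving = reduction(COST["SGX"], COST["QB + SGX"])
jana_saving = reduction(COST["Jana"], COST["QB + Jana"])
print(f"SGX: {sgx_saving:.1%}, Jana: {jana_saving:.1%}")
```

At 60% sensitivity this comes to roughly a 15% saving over SGX alone and roughly a 29% saving over Jana alone.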
Experimental Results (Selection Query)
• X-axis = Data sensitivity (1%, 2%, 20%, 40%, 60%)
• Y-axis = time
SGX Opaque + Partition computing vs SGX Opaque
Data set size = 6M rows
Jana MPC + Partition computing vs Jana MPC
Data set size = 1M rows
Analytical Model
• When is query binning better than a purely cryptographic approach?
• Ratio of the cost of QB to the cost of the crypto-only approach, after several rounds of simplifications (see paper)
• Under ideal assumptions, QB is better than the cryptographic-only solution if the resulting condition holds (see paper)
• Quantities in the model:
• Ratio of the computation cost of cryptographic techniques to that of plaintext processing, per tuple
• Ratio of cryptographic computation cost to communication cost, per tuple (typically much greater than 1 for strong cryptographic techniques)
• Average query selectivity
• Ratio of sensitive data
Query Binning Extensions
• What if there are no approximately square factors?
• Select the nearest square number
• What if there is no 1-to-1 mapping between sensitive and non-sensitive values, and the values differ in size?
• Bin-packing algorithm
• What about range queries?
• With the help of a modified B-tree created over non-sensitive bins
• What about join queries?
• Keep pseudo-sensitive data with sensitive data
• What about aggregation queries?
• Execute like a selection query without tuple fetching
Distinct Values are not a Product of Approximately Square Factors (1)
• What happens when the number of distinct values is not a product of approximately square factors?
• Communication cost increases
• For example, 82 non-sensitive values result in 41 sensitive bins and 2 non-sensitive bins
Sensitive bins: SB1 = {E(s1)}, SB2 = {E(s2)}, …, SB41 = {E(s41)} (at most 1 value in a sensitive bin)
Non-sensitive bins: NSB1 = {ns1, ns2, …, ns41}, NSB2 = {ns42, ns43, …, ns82} (at most 41 values in a non-sensitive bin)
Communication cost = 1 + 41 = 42
Distinct Values are not a Product of Approximately Square Factors (2)
• Reducing communication cost by finding the nearest square number
• In the case of 82 non-sensitive values, 81 is the nearest square number
• Thus, create 9 sensitive bins and 9 non-sensitive bins
41 sensitive values, 82 non-sensitive values
Sensitive bins: SB1 = {….E(x)….}, SB2 = {…E(y)…..}, …, SB9 = {….E(z)…..} (at most 5 values in a sensitive bin)
Non-sensitive bins: NSB1 = {ns1, ns2, …, ns10}, NSB2 = {ns11, ns12, …, ns19}, …, NSB9 = {ns74, ns75, …, ns82} (at most 10 values in a non-sensitive bin)
Communication cost = 5 + 10 = 15
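The two communication costs (42 with exact factors, 15 with the nearest square) can be checked with a short calculation. This is a sketch assuming that the cost of a query is the largest sensitive bin plus the largest non-sensitive bin, with values spread evenly over bins; all function names are hypothetical.

```python
import math

def approx_square_factors(n):
    """Return (x, y) with x * y == n and x >= y, choosing y as close to sqrt(n) as possible."""
    y = math.isqrt(n)
    while n % y != 0:
        y -= 1
    return n // y, y

def comm_cost(num_s, num_ns, num_sb, num_nsb):
    """Values fetched per query: one full sensitive bin plus one full non-sensitive bin."""
    return math.ceil(num_s / num_sb) + math.ceil(num_ns / num_nsb)

def cost_with_exact_factors(num_s, num_ns):
    x, y = approx_square_factors(num_ns)  # x sensitive bins, y non-sensitive bins
    return comm_cost(num_s, num_ns, x, y)

def cost_with_nearest_square(num_s, num_ns):
    r = round(math.sqrt(num_ns))  # root of the square number nearest to |NS|
    return comm_cost(num_s, num_ns, r, r)

exact = cost_with_exact_factors(41, 82)
square = cost_with_nearest_square(41, 82)
```

For 41 sensitive and 82 non-sensitive values, the exact factorization 41 * 2 gives bins of 1 and 41 values, while nine bins on each side give bins of at most 5 and 10 values.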
The Algorithm: General Case: Multiple Tuples per Value (1)
• What happens if values have different numbers of tuples?
• The sensitive bins now differ in size
• Assumption: More non-sensitive values have more sensitive associated tuples.
• From the number of tuples retrieved, the adversary learns which bin contains the sensitive value corresponding to a non-sensitive value
• E.g., retrieval of SB1 and NSB1 reveals that S1 is allocated to SB1
Example: |S| = 6, |NS| = 6, x = 3, y = 2
Tuples per sensitive value: S1 = 10, S2 = 2, S3 = 1, S4 = 15, S5 = 2, S6 = 1
Tuples per non-sensitive value: NS1 = 200, NS2 = 20, NS3 = 10, NS4 = 150, NS5 = 10, NS7 = 10
Size of sensitive bins: SB1 = {S1, S4}: 25, SB2 = {S2, S5}: 4, SB3 = {S3, S6}: 2
Size of non-sensitive bins: NSB1: 230, NSB2: 170
The Algorithm: General Case: Multiple Tuples per Value (2)
• What happens if values have different numbers of tuples?
• Solution: Simply add fake tuples to sensitive bins
• Problem: too many fake tuples, which increases communication cost
• So how do we overcome this problem?
Example: |S| = 6, |NS| = 6, x = 3, y = 2
Tuples per sensitive value: S1 = 10, S2 = 2, S3 = 1, S4 = 15, S5 = 2, S6 = 1
Tuples per non-sensitive value: NS1 = 200, NS2 = 20, NS3 = 10, NS4 = 150, NS5 = 10, NS7 = 10
Sensitive bin  Size of bin  Added fake tuples
SB1            25           0
SB2            4            21
SB3            2            23
Size of non-sensitive bins: NSB1: 230, NSB2: 170
We add 44 fake tuples to sensitive data
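The padding arithmetic on this slide can be reproduced as follows. The bin contents follow the (i mod x) assignment from the base algorithm, and the helper name `fake_tuples_needed` is hypothetical.

```python
# Number of tuples per sensitive value, as listed on the slide.
TUPLES = {"S1": 10, "S2": 2, "S3": 1, "S4": 15, "S5": 2, "S6": 1}
# Sensitive bins from the (i mod x) assignment with x = 3.
BINS = [["S1", "S4"], ["S2", "S5"], ["S3", "S6"]]

def fake_tuples_needed(bins, tuples):
    """Pad every bin up to the size of the largest bin; return the padding per bin."""
    sizes = [sum(tuples[v] for v in b) for b in bins]
    target = max(sizes)
    return [target - s for s in sizes]

padding = fake_tuples_needed(BINS, TUPLES)
total_fake = sum(padding)
```

The bins hold 25, 4, and 2 tuples, so equalizing them requires 0, 21, and 23 fake tuples, 44 in total.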
The Algorithm: General Case: Multiple Tuples per Value (3)
• What happens if values have different numbers of tuples?
• Solution: Bin-packing-based approach
• Sorting: Sort all the values in decreasing order of the number of tuples
• Allocate sensitive values
• Add fake tuples
• Allocate non-sensitive values as shown previously
Example: |S| = 6, |NS| = 6, x = 3, y = 2
Tuples per sensitive value: S1 = 10, S2 = 2, S3 = 1, S4 = 15, S5 = 2, S6 = 1
Tuples per non-sensitive value: NS1 = 200, NS2 = 20, NS3 = 10, NS4 = 150, NS5 = 10, NS7 = 10
After sorting: S4 = 15, S1 = 10, S2 = 2, S5 = 2, S3 = 1, S6 = 1
Sensitive bin  Values  Size before adding fake tuples  Added fake tuples
SB1            S4, S6  16                              0
SB2            S1, S3  11                              5
SB3            S2, S5  4                               12
Non-sensitive bins (as labeled on the slide): NSB1 = {NS1, NS2, NS7}, NSB2 = {NS3, NS5, NS6}
We add fewer fake tuples than the simple solution of adding fake tuples: 17 vs 44 fake tuples
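One way to obtain the allocation shown on this slide is a greedy pass over the sorted values. This is an assumed heuristic chosen because it reproduces the slide's numbers, not necessarily the paper's exact procedure: each value goes to the least-loaded sensitive bin that still has a free position. The function name `pack_decreasing` is hypothetical.

```python
# Number of tuples per sensitive value, as listed on the slide.
TUPLES = {"S1": 10, "S2": 2, "S3": 1, "S4": 15, "S5": 2, "S6": 1}

def pack_decreasing(tuples, num_bins, capacity):
    """Greedy packing: largest values first, each into the least-loaded bin with a free slot."""
    bins = [[] for _ in range(num_bins)]
    loads = [0] * num_bins
    # Stable sort in decreasing order of tuple count.
    for value, count in sorted(tuples.items(), key=lambda kv: -kv[1]):
        open_bins = [b for b in range(num_bins) if len(bins[b]) < capacity]
        target = min(open_bins, key=lambda b: loads[b])
        bins[target].append(value)
        loads[target] += count
    return bins, loads

bins, loads = pack_decreasing(TUPLES, num_bins=3, capacity=2)
padding = [max(loads) - load for load in loads]  # fake tuples per bin
total_fake = sum(padding)
```

The resulting bin sizes are 16, 11, and 4, which need 0, 5, and 12 fake tuples (17 in total) instead of the 44 required when values are assigned by position alone.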
Range Queries
• A full binary tree is constructed over all non-sensitive values
• Bins are created for each level of the tree, except the root node
• Bins are retrieved based on least-matching
• For example, a range query from ns8 to ns12 → bins as per nodes ns23 and ns8
[Figure: bins for each node of each level of the tree]
Conclusion
• Existing cryptographic techniques are orders of magnitude slower than cleartext processing
• Differentiating between sensitive and non-sensitive data can make cryptographic techniques faster
• By avoiding expensive cryptographic operations on non-sensitive data
• However, naïve query execution on partitioned data can lead to information leakage
• Partitioned security
• Query binning
• Implements partitioned security
• While ensuring efficiency
• Interesting side-effect of QB:
• Makes existing cryptographic techniques more secure

ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.pptROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
ROBOETHICS-CCS345 ETHICS AND ARTIFICIAL INTELLIGENCE.ppt
 
KCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitosKCD Costa Rica 2024 - Nephio para parvulitos
KCD Costa Rica 2024 - Nephio para parvulitos
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 

Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data -- ICDE 2019

  • 1. Partitioned Data Security on Outsourced Sensitive and Non-sensitive Data. Sharad Mehrotra (1), Shantanu Sharma (1), Jeffrey D. Ullman (2), and Anurag Mishra (1). (1) University of California, Irvine, USA; (2) Stanford University, USA. IEEE International Conference on Data Engineering (ICDE), 2019.
  • 2. Secure Data Outsourcing • Use cryptographic mechanisms to protect sensitive data on the cloud • Can we design an outsourcing solution that is simultaneously efficient (significantly better than downloading the encrypted data) and secure (comparable to downloading the data and processing it locally)?
  • 3. Roadmap • State-of-the-art in secure data outsourcing • Partitioned computing & corresponding security properties • Binning algorithm to achieve partitioned security • Performance results
  • 4. Data/Computation Outsourcing over the Years • Keyword search over encrypted documents [IEEE SP 2000, ACNS 04, Crypto 08, Crypto 09, …] • SQL over encrypted data [ICDE 02, SIGMOD 02, VLDB 04, Eurocrypt 03, SIGMOD 04, Crypto 11, STOC 09, SOSP 11, …] • MPC and secret sharing [CACM 79, Eurocrypt 14, 15, 17, VLDB 17, Tech 19] • Secure hardware [CIDR 13, Usenix Security 15, IEEE SP 15, 17, NSDI 18] • Figure (secure hardware): OS, Process 1, Process 2, trusted enclave, encrypted data, cache, page table, Ecall, Ocall; the adversary can observe cache-line and page-table accesses • Solutions represent points in a spectrum of possibilities: they explore trade-offs between generality, security, and efficiency • More secure, but orders of magnitude slower than plaintext processing • Not secure, and the software techniques that make such solutions secure are inefficient (coarse-grained page faults, branch shadowing, cache-line attacks)
  • 5. Cryptographic Techniques: Security Threats & Performance • A mark in the comparison table indicates that a technique is resilient to a given attack • DSSE: Distributed Searchable Symmetric Encryption (PULSAR by Stealth) • MPC: Multi-party computation (Jana by Galois) • Opaque: SGX-based solution [Zheng et al., NSDI, 2017] • Experiment: selecting a single row from the TPC-H Customer table of 1.5M rows and 8 columns • Cryptographic overheads: searchable encryption, ~2 orders of magnitude; secure hardware, ~3-4 orders of magnitude; MPC-based solution, ~5-6 orders of magnitude
  • 6. Data Sensitivity & Outsourcing • Organizational data is often only partially sensitive [refs in paper] • Sensitivity is dictated by policies • Sensitivity dictates which data is outsourced and in what form • E.g., general office emails are possibly not sensitive (hence outsourced), while information related to a sensitive project is sensitive (hence not outsourced in plaintext) • Can we exploit the partially sensitive nature of data to scale cryptographic solutions without compromising the security of sensitive data? • Commercial encrypted database solutions (e.g., Jana by Galois) are beginning to explore such solutions
  • 7. Key Insight: Partial Sensitivity of Data (1) • Data about entry/exit from buildings → possibly sensitive (inference about time spent at work) • Location within office building → possibly not sensitive • Surveillance video → not sensitive • Surveillance video → sensitive, if a visitor prefers not to be monitored (OK to know the visitor is not in frame, but not that the visitor is in frame!) • Partial sensitivity also holds in other domains: http://cybersecurity.ieee.org/blog/2015/11/13/identify-sensitive-data-and-how-they-should-be-handled/ https://digitalguardian.com/ • Can we exploit partial sensitivity to develop efficient (yet secure) solutions that scale secure computing and/or data sharing?
  • 8. Key Insight: Partial Sensitivity of Data (2) • Existing work on data classification • Inference detection using graph-based semantic data modeling [Hinke, IEEE SP, 88] • User-defined relationships between sensitive and non-sensitive data [Smith, IEEE SP, 90] • Sensitive patterns hiding using sanitization matrix [Lee et al., COMPSAC, 2004] • Common knowledge-based association rules [Li et al., DASFAA, 2007] • Constraints-based mechanisms • Objectives of finding data-sensitivity • Data-sharing while keeping sensitive data at the trusted user • Multi-level secure data accessing • Allowing data for mining purposes while also preserving the confidentiality of the data
  • 9. Partitioned Computations • Sensitive data Ds (Name, Department): t1 = (E(Adam), E(Defense)); t2 = (E(John), E(Security)); t3 = (E(Clark), E(Crypto)); t4 = (E(Lisa), E(Defense)) • Non-sensitive data Dns (Name, Department): t5 = (Adam, Testing); t6 = (John, Testing); t7 = (Lisa, Design); t8 = (Clark, Design) • Query Q is split into Qs over Ds and Qns over Dns; the answers As and Ans are combined into the answer A
  • 10. Leakage due to Partitioned Computing… • Sensitive data Ds (Name, Department): t1 = (E(Adam), E(Defense)); t2 = (E(John), E(Security)); t3 = (E(Clark), E(Crypto)); t4 = (E(Lisa), E(Defense)) • Non-sensitive data Dns (Name, Department): t5 = (Adam, Testing); t6 = (John, Testing); t7 = (Lisa, Design); t8 = (Clark, Design) • Query: retrieve John's rows • Adversarial view: query value = John; tuples retrieved from the sensitive side = t2; tuples retrieved from the non-sensitive side = t6 • The adversary learns that t2 is John's row
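The leakage on this slide can be replayed with a small simulation. This is only an illustrative sketch: the dictionaries, `naive_query`, and `adversary_links` are hypothetical names, and the sensitive side is kept in plaintext here purely so the simulation can decide which encrypted tuples match (the cloud would only see tuple ids).

```python
# Toy data modelled on the slide. In a real deployment the sensitive values
# are ciphertexts; plaintext is used here only to drive the simulation.
sensitive = {"t1": "Adam", "t2": "John", "t3": "Clark", "t4": "Lisa"}
non_sensitive = {"t5": "Adam", "t6": "John", "t7": "Lisa", "t8": "Clark"}


def naive_query(name):
    """Return the tuple ids touched on each side for one selection query."""
    from_sensitive = [t for t, v in sensitive.items() if v == name]
    from_non_sensitive = [t for t, v in non_sensitive.items() if v == name]
    return from_sensitive, from_non_sensitive


def adversary_links(view):
    """Link each encrypted tuple id to the plaintext names retrieved with it."""
    encrypted_ids, plaintext_ids = view
    names_seen = {non_sensitive[t] for t in plaintext_ids}
    return {t: names_seen for t in encrypted_ids}


links = adversary_links(naive_query("John"))
print(links)  # {'t2': {'John'}}
```

Because exactly one plaintext name is co-retrieved with `t2`, the adversary can tie the encrypted tuple to John without breaking the encryption.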
  • 11. What if we use access-pattern-hiding techniques? • Sensitive data Ds (Name, Department): t1 = (E(Adam), E(Defense)); t2 = (E(John), E(Security)); t3 = (E(Clark), E(Crypto)); t4 = (E(Lisa), E(Defense)) • Non-sensitive data Dns (Name, Department): t5 = (Adam, Testing); t6 = (John, Testing); t7 = (Lisa, Design); t8 = (Clark, Design) • Query: retrieve John's rows • Adversarial view: query value = John; tuples retrieved from the sensitive side = E(….); tuples retrieved from the non-sensitive side = t6 • The output size reveals that one of John's records is sensitive
  • 12. Partitioned Data Security • Non-linkability: the adversary does not learn the relationship between any encrypted value and any plaintext value • Ciphertext indistinguishability: the adversary does not learn any relationship between encrypted values, unless the underlying cryptographic technique allows such relationships to be learnt (e.g., OPE)
  • 13. Secure Partitioned Computation (1) • Data is partitioned into bins • Non-sensitive data is partitioned into non-sensitive bins (NSB) • Sensitive data is partitioned into sensitive bins (SB) • Figure: Ds holds E(x), E(y), E(z) in bins SB(x), SB(y), SB(z); Dns holds x, y, z in bins NSB(x), NSB(y), NSB(z) • Adversarial view: query value = John; tuples retrieved from the sensitive side = SB(y); tuples retrieved from the non-sensitive side = NSB(y) • A query Q for value y is mapped to all values in the bins corresponding to y • It retrieves all data in NSB(y) over the non-sensitive data • It retrieves all data in SB(y) over the sensitive data
  • 14. Secure Partitioned Computation (2) • Bins are created such that for each pair of sensitive and non-sensitive bins s and ns, there exists a value v such that s = SB(v) and ns = NSB(v) • Figure: Ds holds E(x), E(y), E(z) in bins SB(x), SB(y), SB(z); Dns holds x, y, z in bins NSB(x), NSB(y), NSB(z) • The adversarial view does not allow learning linkability between sensitive and non-sensitive records
  • 15. Secure Partitioned Computation (3) • The association between each sensitive bin and each non-sensitive bin prevents leakage through joint access of data and output-size attacks • Workload-skew attacks can be prevented through (careful) addition of (minimal) fake queries • Figure: Ds holds E(x), E(y), E(z) in bins SB(x), SB(y), SB(z); Dns holds x, y, z in bins NSB(x), NSB(y), NSB(z)
  • 16. Query Binning • Assumptions • Equal number of sensitive and non-sensitive attribute values • Each distinct attribute value appears in at most one tuple in the sensitive data and at most one tuple in the non-sensitive data • The number of values is a product of approximately equal factors • The paper relaxes all of these assumptions
  • 17. The Algorithm: One Tuple Per Value • Bin creation; inputs: S and NS • Permute all sensitive values • Find approximately square factors x and y of |NS|, i.e., |NS| = x * y with x ≥ y • Create x sensitive bins, each containing at most y values • Create |NS|/x non-sensitive bins • Assign the ith sensitive value to the (i mod x)th sensitive bin • Assign non-sensitive values: the non-sensitive value corresponding to the sensitive value stored at the jth position of the ith sensitive bin goes to the ith position of the jth non-sensitive bin: NSB[j][i] ← allocateNS(SB[i][j]) • Fill in the remaining NS values • Example: S = 6, NS = 6, x = 3, y = 2; S = {S1, S2, S3, S4, S5, S6}; NS = {NS1, NS2, NS3, NS6, NS7}; sensitive bins SB1, SB2, SB3 hold S1 S2 S3 S4 S5 S6; non-sensitive bins NSB1, NSB2 hold NS2 NS3 NS1 NS7 NS6 NS4
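The bin-creation steps above can be sketched in a few lines of Python. This is a minimal sketch under the slide's base assumptions (one tuple per value, |S| ≤ |NS|, |NS| factorable); `factor_pair`, `create_bins`, and the `seed` parameter are hypothetical names, and a sensitive value is taken to "correspond" to a non-sensitive value when the two plaintext values are equal.

```python
import math
import random


def factor_pair(n):
    """Return (x, y) with x * y == n and x >= y, y as close to sqrt(n) as possible."""
    y = math.isqrt(n)
    while n % y != 0:
        y -= 1
    return n // y, y


def create_bins(sensitive, non_sensitive, seed=0):
    """Build x sensitive bins of y slots and y non-sensitive bins of x slots."""
    x, y = factor_pair(len(non_sensitive))
    assert len(sensitive) <= x * y, "sketch assumes |S| <= |NS|"
    values = list(sensitive)
    random.Random(seed).shuffle(values)          # permute the sensitive values
    sb = [[None] * y for _ in range(x)]
    nsb = [[None] * x for _ in range(y)]
    remaining = set(non_sensitive)
    for k, v in enumerate(values):
        i, j = k % x, k // x                     # k-th value -> bin (k mod x), slot j
        sb[i][j] = v
        if v in remaining:                       # NSB[j][i] <- counterpart of SB[i][j]
            nsb[j][i] = v
            remaining.discard(v)
    free = [(j, i) for j in range(y) for i in range(x) if nsb[j][i] is None]
    for (j, i), v in zip(free, sorted(remaining)):
        nsb[j][i] = v                            # fill in the remaining NS values
    return sb, nsb


sb, nsb = create_bins(
    ["Adam", "John", "Clark", "Lisa", "Eve", "Bob"],
    ["Adam", "John", "Lisa", "Clark", "Zed", "Yan"],
)
print(sb)
print(nsb)
```

The transposed placement is the point of the design: a value stored in sensitive bin i at slot j has its plaintext counterpart in non-sensitive bin j at slot i, so every (sensitive bin, non-sensitive bin) pair is the retrieval target of some value.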
  • 18. The Algorithm: One Tuple Per Value • Bin retrieval; input: Query(w) • If w is in a sensitive bin, at SB[i][j], retrieve the ith sensitive bin and the jth non-sensitive bin • If w is in a non-sensitive bin, at NSB[i][j], retrieve the ith non-sensitive bin and the jth sensitive bin • Example: S = 6, NS = 6, x = 3, y = 2; S = {S1, S2, S3, S4, S5, S6}; NS = {NS1, NS2, NS3, NS6, NS7}; sensitive bins SB1, SB2, SB3 hold S1 S2 S3 S4 S5 S6; non-sensitive bins NSB1, NSB2 hold NS2 NS3 NS1 NS7 NS6 NS4 • Query S2 → SB2, NSB1 • Query NS7 → NSB1, SB2
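The retrieval rule can be illustrated on a hand-written layout. The bins below are hypothetical (they are not the slide's figure), indices are 0-based, and `bins_to_retrieve` is an illustrative name.

```python
# Hypothetical layout: three sensitive bins of two slots each,
# two non-sensitive bins of three slots each.
SB = [["s1", "s4"], ["s2", "s5"], ["s3", "s6"]]
NSB = [["n1", "n2", "n3"], ["n4", "n5", "n6"]]


def bins_to_retrieve(w, sb=SB, nsb=NSB):
    """Return (sensitive bin index, non-sensitive bin index) for query value w."""
    for i, bin_ in enumerate(sb):
        if w in bin_:
            j = bin_.index(w)
            return i, j          # w at SB[i][j]: fetch SB[i] and NSB[j]
    for i, bin_ in enumerate(nsb):
        if w in bin_:
            j = bin_.index(w)
            return j, i          # w at NSB[i][j]: fetch NSB[i] and SB[j]
    raise KeyError(w)


print(bins_to_retrieve("s2"))  # (1, 0)
print(bins_to_retrieve("n5"))  # (1, 1)
```

In this layout the six sensitive values alone already map onto all six (sensitive bin, non-sensitive bin) pairs, which is the property that keeps the adversary from linking a specific encrypted bin to a specific plaintext bin.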
  • 19. Query Execution Cost on Outsourced Data • Table columns: technique; time; resilience to attacks (size, workload-skew, access-patterns) • SGX: 10500x • Query Binning + SGX (60% sensitivity): 8929x • Multi-party computation (Jana): 954363x • Query Binning + Jana (60% sensitivity): 680131x • x is the time to search for a predicate in cleartext • A mark in a resilience column indicates that the technique is resilient to the given attack • Experiments were conducted over 1.5M rows
  • 20. Experimental Results (Selection Query) • X-axis = data sensitivity (1%, 2%, 20%, 40%, 60%) • Y-axis = time • Plot 1: SGX Opaque + partitioned computing vs. SGX Opaque; data set size = 6M rows • Plot 2: Jana MPC + partitioned computing vs. Jana MPC; data set size = 1M rows
  • 21. Analytical Model • When is query binning (QB) better than a purely cryptographic approach? • The model takes the ratio of the cost of QB to the cost of the crypto-only approach; after several rounds of simplifications (see paper) and under ideal assumptions, QB is better than the crypto-only solution if a condition on the following parameters holds (see paper) • Ratio of the computation cost of cryptographic techniques to plaintext processing, per tuple • Ratio of cryptographic computation cost to communication cost, per tuple (typically much greater than 1 for strong cryptographic techniques) • Average query selectivity • Ratio of sensitive data
  • 22. Query Binning Extensions • What if there is no approximately square factor? Select the nearest square number • What if there is no 1-to-1 mapping between sensitive and non-sensitive values, and the values differ in size? Use a bin-packing algorithm • What about range queries? Use a modified B-tree created over the non-sensitive bins • What about join queries? Keep pseudo-sensitive data with the sensitive data • What about aggregation queries? Execute them like a selection query, without fetching tuples
  • 23. Distinct Values are not a Product of Approximately Square Factors (1) • What happens when the number of distinct values is not a product of approximately square factors? • The communication cost increases • For example, 82 non-sensitive values result in 41 sensitive bins and 2 non-sensitive bins • Figure: sensitive bins SB1, SB2, …, SB41 hold E(s1), E(s2), …, E(s41); non-sensitive bin NSB1 holds ns1, ns2, …, ns41 and NSB2 holds ns42, ns43, …, ns82 • At most 1 value in a sensitive bin • At most 41 values in a non-sensitive bin • Communication cost = 42
  • 24. Distinct Values are not a Product of Approximately Square Factors (2) • Reduce the communication cost by finding the nearest square number • For 82 non-sensitive values, 81 is the nearest square number • Thus, create 9 sensitive bins and 9 non-sensitive bins • Figure: 41 sensitive values in bins SB1, SB2, …, SB9; 82 non-sensitive values in bins NSB1 (ns1, ns2, …, ns10), NSB2 (ns11, ns12, …, ns19), …, NSB9 (ns74, ns75, …, ns82) • At most 5 values in a sensitive bin • At most 10 values in a non-sensitive bin • Communication cost = 15
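The two communication costs quoted on these slides (42 and 15) can be checked with a short calculation. This sketch assumes that the cost of one query is the largest sensitive bin plus the largest non-sensitive bin, counted in values; `factor_pair` and `communication_cost` are hypothetical helper names.

```python
import math


def factor_pair(n):
    """Return (x, y) with x * y == n and x >= y, y as close to sqrt(n) as possible."""
    y = math.isqrt(n)
    while n % y != 0:
        y -= 1
    return n // y, y


def communication_cost(num_s, num_ns, s_bins, ns_bins):
    """Largest sensitive bin plus largest non-sensitive bin, in values."""
    return math.ceil(num_s / s_bins) + math.ceil(num_ns / ns_bins)


# Exact factorisation of 82: 41 sensitive bins, 2 non-sensitive bins.
x, y = factor_pair(82)
print(x, y, communication_cost(41, 82, x, y))   # 41 2 42

# Nearest square number (81): 9 sensitive bins and 9 non-sensitive bins.
r = round(math.sqrt(82))
print(r, communication_cost(41, 82, r, r))      # 9 15
```

Since 82 = 41 * 2 has no near-square factor pair, the exact factorisation yields one very large non-sensitive bin; rounding to 9 x 9 bins trades slightly uneven bins for a much smaller retrieval.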
  • 25. The Algorithm: General Case: Multiple Tuples per Value (1) • What happens if the values have different numbers of tuples? • The sensitive bins now differ in size • Assumption: More non-sensitive values have more sensitive associated tuples. • From tuple retrieval, the adversary learns which sensitive bin contains the value corresponding to a non-sensitive value • E.g., retrieval of SB1 and NSB1 reveals that S1 is allocated to SB1 • Example: S = 6, NS = 6, x = 3, y = 2; sensitive bins SB1, SB2, SB3 hold S1 S2 S3 S4 S5 S6; non-sensitive bins NSB1, NSB2 hold NS2 NS3 NS1 NS7 NS6 NS4 • Tuples per value: S1 = 10, S2 = 2, S3 = 1, S4 = 15, S5 = 2, S6 = 1; NS1 = 200, NS2 = 20, NS3 = 10, NS4 = 150, NS5 = 10, NS7 = 10 • Sizes of the sensitive bins: 25, 4, 2 • Sizes of the non-sensitive bins: 230, 170
  • 26. The Algorithm: General Case: Multiple Tuples per Value (2) • What happens if the values have different numbers of tuples? • Solution: simply add fake tuples to the sensitive bins • Problem: too many fake tuples increase the communication cost • So how can this problem be overcome? • Example: S = 6, NS = 6, x = 3, y = 2; sensitive bins SB1, SB2, SB3 hold S1 S2 S3 S4 S5 S6; non-sensitive bins NSB1, NSB2 hold NS2 NS3 NS1 NS7 NS6 NS4 • Tuples per value: S1 = 10, S2 = 2, S3 = 1, S4 = 15, S5 = 2, S6 = 1; NS1 = 200, NS2 = 20, NS3 = 10, NS4 = 150, NS5 = 10, NS7 = 10 • Sizes of the sensitive bins: 25, 4, 2 • Sizes of the non-sensitive bins: 230, 170 • Fake tuples added per sensitive bin: 0, 21, 23 • In total, 44 fake tuples are added to the sensitive data
  • 27. The Algorithm: General Case: Multiple Tuples per Value (3) • What happens if the values have different numbers of tuples? • Solution: a bin-packing-based approach • Sort all values in decreasing order of their number of tuples • Allocate the sensitive values • Add fake tuples • Allocate the non-sensitive values as shown previously • Example: S = 6, NS = 6, x = 3, y = 2; tuples per value: S1 = 10, S2 = 2, S3 = 1, S4 = 15, S5 = 2, S6 = 1; NS1 = 200, NS2 = 20, NS3 = 10, NS4 = 150, NS5 = 10, NS7 = 10 • After sorting: S4 = 15, S1 = 10, S2 = 2, S5 = 2, S3 = 1, S6 = 1 • Sensitive bins SB1, SB2, SB3 hold S4 S1 S2 S6 S3 S5; non-sensitive bins NSB1, NSB2 hold NS1 NS2 NS7 NS3 NS5 NS6 • Sizes of the sensitive bins before adding fake tuples: 16, 11, 4 • Fake tuples added per sensitive bin: 0, 5, 12 • This adds fewer fake tuples than the simple padding solution: 17 vs. 44
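The fake-tuple counts on the last two slides (44 for simple padding, 17 after sorting) can be reproduced with a small greedy packing. This is a sketch only: placing the largest value first into the least-loaded bin that still has a free slot is one heuristic that happens to match the slide's numbers, and the paper's exact procedure may differ. `fake_tuples` and `greedy_pack` are hypothetical names.

```python
# Tuples per sensitive value, taken from the slide.
sizes = {"S1": 10, "S2": 2, "S3": 1, "S4": 15, "S5": 2, "S6": 1}


def fake_tuples(bins, sizes):
    """Fake tuples needed per bin so that every bin matches the largest one."""
    loads = [sum(sizes[v] for v in b) for b in bins]
    return [max(loads) - load for load in loads]


def greedy_pack(sizes, num_bins, capacity):
    """Largest value first, into the least-loaded bin that still has a free slot."""
    bins = [[] for _ in range(num_bins)]
    loads = [0] * num_bins
    for v in sorted(sizes, key=lambda name: -sizes[name]):
        open_bins = [i for i in range(num_bins) if len(bins[i]) < capacity]
        k = min(open_bins, key=lambda i: loads[i])
        bins[k].append(v)
        loads[k] += sizes[v]
    return bins


naive = [["S1", "S4"], ["S2", "S5"], ["S3", "S6"]]   # i mod x assignment
packed = greedy_pack(sizes, num_bins=3, capacity=2)

print(fake_tuples(naive, sizes), sum(fake_tuples(naive, sizes)))    # [0, 21, 23] 44
print(fake_tuples(packed, sizes), sum(fake_tuples(packed, sizes)))  # [0, 5, 12] 17
```

Pairing the heaviest values with the lightest ones evens out the bin sizes, which is why the padding drops from 44 to 17 fake tuples.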
  • 28. Range Queries • A full binary tree is constructed over all non-sensitive values • Bins are created for each level of the tree, except the root node • Bins are retrieved based on least-matching • For example, a range query from ns8 to ns12 → bins as per nodes ns23 and ns8 • Figure: bins for each node of each level of the tree
  • 29. Conclusion • Existing cryptographic techniques are orders of magnitude slower than cleartext processing • Differentiating between sensitive and non-sensitive data can make cryptographic techniques faster by avoiding expensive cryptographic operations on non-sensitive data • However, naïve query execution on partitioned data can lead to information leakage • Partitioned security • Query binning (QB) implements partitioned security while ensuring efficiency • Interesting side-effect of QB: it makes existing cryptographic techniques more secure