SlideShare una empresa de Scribd logo
1 de 61
Copyright ©Protegrity Corp.
Protecting Data Privacy in
Analytics and Machine Learning
Ulf Mattsson
Chief Security Strategist
www.Protegrity.com
Copyright ©Protegrity Corp.
PaymentCardIndustry(PCI)
SecurityStandards
Council (SSC):
1. TokenizationTask Force
2. Encryption Task Force, Pointto Point
Encryption Task Force
3. Risk Assessment
4. eCommerce SIG
5. Cloud SIG, Virtualization SIG
6. Pre-Authorization SIG, Scoping SIG
Working Group
Ulf Mattsson
2
Dec 2019
May 2020
Cloud Security Alliance
Quantum Computing
Tokenization Management and
Security
Cloud Management and Security
ISACA JOURNAL May 2021
Privacy-Preserving Analytics and
Secure Multi-Party Computation
ISACA JOURNAL May 2020
Practical Data Security and
Privacy for GDPR and CCPA
• Chief Security
Strategist, Protegrity
• Chief Technology
Officer, Protegrity, Atlantic
BT, and Compliance
Engineering
• Head of Innovation,
TokenEx
• IT Architect, IBM
• Develops Industry Standards
• Inventor of more than 70 issued US Patents
• Products and Services:
• Data Encryption, Tokenization, and Data Discovery
• Cloud Application Security Brokers (CASB) and Web Application
Firewalls (WAF)
• Security Operation Center (SOC) and Managed Security Services
(MSSP)
• Robotics and Applications
Copyright ©Protegrity Corp.
Agenda
• Machine learning (ML) and AI (Artificial Intelligence)
• Secure Data-sharing
• Secure multi-party computation (SMPC) and uses cases
• Homomorphic encryption (HE) and use cases
• Zero trust architecture (ZTA) vs. Zero knowledge
• Trusted execution environments (TEE)
• Hybrid cloud
• Regulations and Standards in Data Privacy
• International privacy standards
• Differential Privacy (DP) and K-Anonymity
3
Copyright ©Protegrity Corp.
Unlockthe Potential of Data Security
- Data Security Governance Stakeholders
4
4
Source: Gartner
Copyright ©Protegrity Corp.
Opportunities
Controls
Regulations
Policies
RiskManagement
Breaches
Balance
Protect datainwaysthatare transparent to business processes andcomplianttoregulations
Source: Gartner
Copyright ©Protegrity Corp.
Global Hadoop Big Data
Analytics Market
(USD Billion)
Real-time data is significant in global
datasphere
Between 2018 and 2025 the size of real-time data
in the global datasphere is expected to expand
tenfold, from five zettabytes to 51 zettabytes.
Statista 2021
Increase in
information
volume of
Real-time
Analytics
Copyright ©Protegrity Corp.
The advent of big data
era due to the increase
in the information
volume of the whole
world
ResearchGate
Big
Data AI
Copyright ©Protegrity Corp.
https://www.marketresearchfuture.com/reports/machine-learning-market-2494
Global Machine Learning Market
Machine learning (ML) market for hardware, software, and services
ML has multiple uses in today’s technology market concerning safety and security such as face detection,
face recognition, image classification, speech recognition, antivirus, Google, antispam, genetic, signal
diagnosing, and weather forecast.
USD Million
8
Copyright ©Protegrity Corp.
Gartner Hype Cycle
for
Emerging
Technologies,
2020
 Time
AI & ML
Gartner
Copyright ©Protegrity Corp.
Gartner Hype Cycle for Emerging Technologies, 2020
Algorithmic
Trust
Models Can
Help
Ensure
Data
Privacy
Emerging technologies
tied to algorithmic trust
include
1. Secure access service
edge (SASE)
2. Explainable AI
3. Responsible AI
4. Bring your own
identity
5. Differential privacy
6. Authenticated
provenance
Gartner
10
Copyright ©Protegrity Corp.
11
Copyright ©Protegrity Corp.
Secure AI – Use Case
12
Copyright ©Protegrity Corp.
Use Case: Insilico Medicine
An alternative to animal testing for research and development programs in the pharmaceutical industry.
• By using artificial intelligence and deep-learning techniques, Insilico is able to analyze how a compound will affect cells
and what drugs can be used to treat the cells in addition to possible side effects
A comprehensive drug discovery engine, which utilizes millions of samples and multiple data types to discover
signatures of disease and identify the most promising targets for billions of molecules that already exist or can be
generated de novo with the desired set of parameters.
https://insilico.com/
13
Copyright ©Protegrity Corp.
Gartner MQ for Data Science and
Machine Learning Platforms
https://www.kdnuggets.com/2020/02/gartner-
mq-2020-data-science-machine-learning.html
Data and analytics pipeline,
including all the following areas:
1. Data ingestion
2. Data preparation
3. Data exploration
4. Feature engineering
5. Model creation and training
6. Model testing
7. Deployment
8. Monitoring
9. Maintenance
10.Collaboration
2020 vs 2019 changes
Copyright ©Protegrity Corp.
Digikey, techbrij
Machine Learning Model Lifecycle - Example
1. Define the model: using the Sequential or Model class and add the layers
2. Compile the model: call compile method and specify the loss, optimizer and
metrics
3. Train the model: call fit method and use training data
4. Evaluate the model: call evaluate method and use testing data to evaluate
trained model
5. Get predictions: use predict method on new data for predictions
15
Copyright ©Protegrity Corp.
wikipedia
Flux
Keras
MATLAB + Deep
Learning Toolbox
Apache
MXNet
PlaidML
PyTorch
(Facebook)
TensorFlow
(Google)
Open Source
Linux,
MacOS,
Windows
C++
Julia
Python
MATLAB
Perl, Clojure
JavaScript, Go, Scala
Android,
iOS
R
Swift
Machine Learning
Deep-learning
Platforms
Languages
Frameworks & libraries
16
Copyright ©Protegrity Corp.
*: https://www.fastcompany.com/1675910/7-ways-real-life-crime-fighting-mirrors-minority-report
AI is reality for parts of the legal industry for how lawyers
use data to predict and protect their clients’ futures.
“Predictive analytics" can mine years of incident reports and law enforcement data to "forecast
criminal 'hot spots.'" Police in Memphis have success with the $11-billion "precrime" predicting tool:
Since installing IBM Blue CRUSH*, the city has seen a 31% drop in serious crime."
Copyright ©Protegrity Corp.
Global Map Of PrivacyRights And Regulations
18
Copyright ©Protegrity Corp.
TrustArc
Legal and Regulatory Risks Are Exploding
19
Copyright ©Protegrity Corp.
Encryptionand
Tokenization
Discover Data
Assets
Security by
Design
GDPR Security Requirements –Encryption and Tokenization
20
Copyright ©Protegrity Corp.
Data flow mapping under GDPR
• If there is not already a documented workflow in place in your organization, it can be worthwhile for a
team to be sent out to identify how the data is being gathered.
• This will enable you to see how your data flow is different from reality and what needs to be done
Organizations needs to look at how the data was captured, who is accountable for it, where it is
located and who has access.
Source:
BigID
Copyright ©Protegrity Corp.
FindYourSensitive Datain Cloud and On-Premise
www.protegrity.com
22
Copyright ©Protegrity Corp.
Whichof thefollowing aspectsof dataprivacyare you particularlyconcernedabout?
FTIConsulting- CorporateData
Privacy Today,2020
23
Copyright ©Protegrity Corp.
Source: Gartner
Six Important
Privacy-Preserving
Computation
Techniques
24
Source: Gartner
Copyright ©Protegrity Corp.
Increased need for data analytics drives requirements.
Data Lake,
ETL, Files
…
• Policy Enforcement Point (PEP)
Protected data fields
U
• Encryption Key Management
U
External Data
Internal
Data
Secure Multi Party Computation
Analytics, Data Science, AI and ML
Data Pipeline
Data Collaboration
Data Pipeline
Data Privacy
On-premises
Cloud
Internal and Individual Third-Party Data Sharing
25
Copyright ©Protegrity Corp. http://homomorphicencryption.org
Use Cases for Secure Multi Party Computation &
Homomorphic Encryption (HE)
26
Copyright ©Protegrity Corp.
Use case – Retail - Data for Secondary Purposes
Large aggregator of credit card transaction data.
Open a new revenue stream
• Using its data with its business partners: retailers, banks and advertising companies.
• They could help their partners achieve better ad conversion rate, improved customer satisfaction, and more timely
offerings.
• Needed to respect user privacy and specific regulations. In this specific case, they wanted to work with a retailer.
• Allow the retailer to gain insights while protecting user privacy, and the credit card organization’s IP.
• An analyst at each organization’s office first used the software to link the data without exchanging any of the
underlying data.
Data used to train the machine learning and statistical models.
• A logistic and linear regression model was trained using secure multi-party computation (SMC).
• In the simplest form SMC splits a dataset into secret shares and enables you to train a model without needing to put
together the pieces.
• The information that is communicated between the peers is encrypted at all times and cannot be reverse engineered.
With the augmented dataset, the retailer was able to get a better picture of its customers buying habits.
27
Copyright ©Protegrity Corp.
Use case - Financial services industry
Confidential financial datasets which are vital for gaining significant insights.
• The use of this data requires navigating a minefield of private client information as well as sharing data
between independent financial institutions, to create a statistically significant dataset.
• Data privacy regulations such as CCPA, GDPR and other emerging regulations around the world
• Data residency controls as well as enable data sharing in a secure and private fashion.
Reduce and remove the legal, risk and compliance processes
• Collaboration across divisions, other organizations and across jurisdictions where data cannot be
relocated or shared
• Generating privacy respectful datasets with higher analytical value for Data Science and Analytics
applications.
28
Copyright ©Protegrity Corp.
Use case: Bank - Internal Data Usage by Other Units
A large bank wanted to broaden access to its data lake without compromising data privacy, preserving the data’s
analytical value, and at reasonable infrastructure costs.
• Current approaches to de-identify data did not fulfill the compliance requirements and business needs, which had
led to several bank projects being stopped.
• The issue with these techniques, like masking, tokenization, and aggregation, was that they did not sufficiently
protect the data without overly degrading data quality.
This approach allows creating privacy protected datasets that retain their analytical value for Data Science and business
applications.
A plug-in to the organization’s analytical pipeline to enforce the compliance policies before the data was consumed by
data science and business teams from the data lake.
• The analytical quality of the data was preserved for machine learning purposes by-using AI and leveraging privacy
models like differential privacy and k-anonymity.
Improved data access for teams increased the business’ bottom line without adding excessive infrastructure costs,
while reducing the risk of-consumer information exposure.
29
Copyright ©Protegrity Corp.
Area Timing Focus Comments
Requirements Short Internal requirements International regulations
Cloud Short Machine Learning Start with basic ML training and inference on senstivie data in cloud
Competition Short Competitive advantage ML and NLP-powered services can give banks a competitive edge
Short Encrypted data Important
Long Synthetic data Computing cost?
Medium AML / KYC What are other Large banks doing?
Short Analytics Initial focus
Short
Operation on encrypted
data
Computation on sensitive data to the cloud. Trade-offs with performance, protection and utility?
Industry Short Industry dialog Working groups in standard bodies (ANSI X9, Cloud Security Alliance, Homomorphic Encryption Org)
Model Short Encrypted model Important
Short Experimentation What are other Large banks doing?
Short Scotia Bank case study Query solution for AML / KYC
Proven Medium Fast follower What are some proven solutions?
Short
Homomorphic
Encryption post-
Lattice-based cryptography is a promising post-quantum cryptography family, both in terms of
foundational properties as well as its application to both traditional and homomorphic encryption
Medium Quantum Plan for quantum safe algorithms
Long Quantum Plan for quantum ML algorithms
Sharing Short
Secure Multi-party
Computing (SMPC)
Without revealing their own private inputs and outputs. Encrypted data and encryption keys never
comingled while computation on the encrypted data is occurring or an encryption key is split into
shares
Short Vendor positioning
Nonlinear ML regression needed? Linear Regression is one of the fundamental supervised-ML. Linear
and non-linear credit scoring by combining logistic regression and support vector machines
Short Framework integration Important
3rd party Long 3rd party integration Mining first
Long Federated learning Complicated
Long TEE Emerging
Analytics
Data
Quantum
Solutions
Training ML
Pilot
Use case: Bank
Copyright ©Protegrity Corp.
Source: Gartner
Six Important
Privacy-Preserving
Computation
Techniques
31
Copyright ©Protegrity Corp.
https://royalsociety.org
Secure Multi-Party Computation (MPC)
Private multi-party machine learning with MPC
Using MPC, different
parties send
encrypted messages
to each other, and
obtain the model
F(A,B,C) they wanted
to compute without
revealing their own
private input, and
without the need for a
trusted central
authority.
Secure Multi-Party machine learning
Central trusted authority
A B C
F(A, B,C)
F(A, B,C) F(A, B,C)
Protected data fields
U
B
A C
F(A, B,C)
U U
U
32
Copyright ©Protegrity Corp.
Medium.com
Example of Multi-
party Computation:
Average Salary #1
33
Copyright ©Protegrity Corp.
Source: Gartner
Six Important
Privacy-Preserving
Computation
Techniques
34
Copyright ©Protegrity Corp.
Case Study – HE and Securely sharing sensitive information
An example from the healthcare domain.
The recent ability to fully map the human genome has opened endless possibilities for advances in
healthcare.
1. Data from DNA analysis can test for genetic abnormalities, empower disease-risk analysis, discover family
history, and the presence of an Alzheimer’s allele.
• But these studies require very large DNA sample sizes to detect accurate patterns.
2. However, sharing personal DNA data is a particularly problematic domain.
• Many citizens hesitate to share such personal information with third-party providers, uncertain of if,
how and to whom the information might be shared downstream.
3. Moreover, legal limitations designed to protect privacy restrict providers from sharing this data as well.
4. HE techniques enable citizens to share their genome data and retain key privacy concerns without the
traditional all-or-nothing trust threshold with third-party providers.
35
Copyright ©Protegrity Corp.
https://royalsociety.org
Homomorphic encryption (HE)
HE depicted in a client-server model
• The client sends encrypted
data to a server, where a
specific analysis is performed
on the encrypted data,
without decrypting that data.
• The encrypted result is then
sent to the client, who can
decrypt it to obtain the
result of the analysis they
wished to outsource.
Encryption of x
Client
Server
Analysis
Encrypted F(x)
• Policy Enforcement Point (PEP)
Protected data fields
U
• Encryption Key Management
36
Copyright ©Protegrity Corp.
Secure
Exec Env
Zero
Trust
Open
Source
Encrypted
Query
Enc
Sort
Encr
Proxy
Quantum
Safe
AI
HomomorphicEncryption.org
Private Set
Intersection
*: 12 Smaller HE Vendors
Differential
Priv
Commercial-applications Off The Shelf
TEE (Trusted
Execution
Environment)
Lattice-based
algorithm
DP
Extended Encrypted
Operations
Extended Privacy
Features
Extended ML
Features
Extended
Protection
Features
Dynamic
Security
Controls
Standardization of Homomorphic Encryption*
200 to 50 Employees
1 2 3 4 5
49 to 20 Employees
6 7 8
19 and fewer Employees
9 10 11 12
Federated
Learning
Fuzzy
Search
Block
Chain
COTS
Examples
of some
Features
37
Copyright ©Protegrity Corp.
Source: Gartner
Important
Privacy-Preserving
Computation
Techniques
38
Copyright ©Protegrity Corp.
Trusted execution environments
Trusted Execution Environments (TEEs) provide secure computation capability through a combination of special-purpose
hardware in modern processors and software built to use those hardware features.
The special-purpose hardware provides a mechanism by which a process can run on a processor without its memory or
execution state being visible to any other process on the processor,
• not even the operating system or other privileged code.
*: Source: http://publications.officialstatistics.org
Computation in a TEE is not
performed on data while it
remains encrypted.
• Typically, the memory space
of each TEE (enclave)
application is protected from
access
• AES-encrypted when
and if it is stored off-
chip.
Usability is low and products/services are emerging in MS Azure, IBM’s cloud service Amazon AWS (late 2020)*
39
Copyright ©Protegrity Corp.
Source: Gartner
Important
Privacy-Preserving
Computation
Techniques
40
Copyright ©Protegrity Corp.
Random
differential
privacy
Probabilistic
differential
privacy
Concentrated
differential
privacy
Noise is very low.
Used in practice.
Tailored to large numbers
of computations.
Approximate
differential
privacy
More useful analysis can be performed.
Well-studied.
Can lead to unlikely outputs.
Widely used
Computational
differential privacy
Multiparty
differential
privacy
Can ensure the privacy of individual contributions.
Aggregation is performed locally.
Strong degree of protection.
High accuracy
6 Differential
Privacy
Models
A pure model provides protection even against attackers with
unlimited computational power.
In differential
privacy, the
concern is about
privacy as the
relative difference
in the result
whether a
specific individual
or entity is
included in the
input or excluded
41
Copyright ©Protegrity Corp.
Source: Gartner
Important
Privacy-Preserving
Computation
Techniques
42
Copyright ©Protegrity Corp.
Private Set
Intersection
(PSI)
Identifies the
common customers
without disclosing
any other
information
This replaces simplistic approaches such as one-way hashing functions that are susceptible to dictionary attacks.
Applications for PSI include identifying the overlap with potential data partners (i.e. “Is there a large enough client
base in common to be worthwhile to work together?”), as well as aligning datasets with data partners in
preparation for using MPC to train a machine learning model
Copyright ©Protegrity Corp.
Examples of Data De-identification
44
Copyright ©Protegrity Corp.
Data protection techniques: Deployment on-premises, and clouds
Data
Warehouse
Centralized Distributed
On-
premises
Public
Cloud
Private
Cloud
Vault-based tokenization y y
Vault-less tokenization y y y y y y
Format preserving
encryption
y y y y y
Homomorphic encryption y y
Masking y y y y y y
Hashing y y y y y y
Server model y y y y y y
Local model y y y y y y
L-diversity y y y y y y
T-closeness y y y y y y
Privacy enhancing data de-identification
terminology and classification of techniques
De-
identification
techniques
Tokenization
Cryptographic
tools
Suppression
techniques
Formal
privacy
measurement
models
Differential
Privacy
K-anonymity
model
45
Copyright ©Protegrity Corp.
2-way
HomomorphicEncryption
(HE) K-anonymity
Tokenization
Masking
Hashing
1-way
Analytics andMachine Learning(ML)
Different DataProtectionTechniques
Algorithmic
Random
Computingon
encrypteddata
Format
Preserving
Fast Slow Very slow Fast Fast
FormatPreserving
DifferentialPrivacy
(DP)
Noise
added
FormatPreserving
Encryption
(FPE)
46
Copyright ©Protegrity Corp.
IS: International Standard
TR: Technical Report
TS: Technical Specification
Guidelines to help comply
with ethical standards
20889 IS Privacy enhancing de-identification terminology and classification of
techniques
27018 IS Code of practice for protection of PII in public clouds acting as PII processors
27701 IS Security techniques - Extension to ISO/IEC 27001 and ISO/IEC 27002 for privacy
information management - Requirements and guidelines
29100 IS Privacy framework
29101 IS Privacy architecture framework
29134 IS Guidelines for Privacy impact assessment
29151 IS Code of Practice for PII Protection
29190 IS Privacy capability assessment model
29191 IS Requirements for partially anonymous, partially un-linkable authentication
Cloud
11 Published International Privacy Standards
Framework
Management
Techniques
Impact
19608 TS Guidance for developing security and privacy functional requirements based
on 15408
Requirements
27550 TR Privacy engineering for system lifecycle processes
Process
ISO
47
Copyright ©Protegrity Corp.
Risk
Reduction
Source:
INTERNATIONAL
STANDARD ISO/IEC
20889
48
Copyright ©Protegrity Corp.
Personally Identifiable Information(PII) in compliance with the
EUCross Border Data Protection Laws, specifically
• Datenschutzgesetz 2000(DSG 2000)in Austria, and
• Bundesdatenschutzgesetz inGermany.
This requiredaccess to Austrianand German customer data to
berestricted to onlyrequesters ineach respective country.
• Achieved targeted compliance with EU Cross Border Data
Security laws
• Implemented country-specificdata access restrictions
Datasources
Case Study
Amajor international bankperformed a consolidationofallEuropeanoperationaldatasources
to Italy
49
Copyright ©Protegrity Corp.
Lower Risk andHigher Productivity with More AccesstoMoreData
50
Copyright ©Protegrity Corp. 51
Copyright ©Protegrity Corp.
PaymentApplication
Payment
Network
Payment
Data
Policy, tokenization,
encryption
and keys
Gateway
Call Center
Application
PI*Data
Salesforce
Analytics
Application
DifferentialPrivacy
AndK-anonymity
PI*Data
Microsoft
ElectionGuard
Election
Data
Homomorphic Encryption
DataWarehouse
PI*Data
Vault-less tokenization
Use-Cases of Some Data Privacy Techniques
Voting
Application
Dev/testSystems
Masking
PI*Data
Vault-less tokenization
52
Copyright ©Protegrity Corp.
A DataSecurityGateway Can Protect Sensitive Datain Cloud and On-premise
53
Copyright ©Protegrity Corp.
Big DataProtectionwith GranularField Level Protectionfor GoogleCloud
54
Copyright ©Protegrity Corp.
Use Case (Financial Services) - Compliance with Cross-Border and Other
Privacy Restrictions
55
Copyright ©Protegrity Corp.
Use this shape toput
copy inside
(you can change the sizing tofit your copy needs)
Protection ofdata
in AWS S3 with Separation ofDuties
• Applications can use de-identified
data or data inthe clear based on
policies
• Protection of data inAWSS3 before
landing in a S3 bucket
Separation of Duties
• EncryptionKeyManagement
• PolicyEnforcementPoint(PEP)
56
Copyright ©Protegrity Corp.
Securosis, 2019
Consistency
• Most firmsarequite familiar with their on-premises
encryption andkeymanagement systems, so they often
prefer toleverage the same tool and skills across multiple
clouds.
• Firms often adopt a “best of breed”cloud approach.
Examples ofHybrid Cloud considerations
Trust
• Some customers simply donot trusttheir vendors.
Vendor Lock-in and Migration
• A commonconcern is vendorlock-in, andan
inabilitytomigratetoanothercloud serviceprovider.
Google Cloud AWSCloud Azure Cloud
Cloud Gateway
S3 Salesforce
Data Analytics
BigQuery
57
Copyright ©Protegrity Corp.
IS: International
Standard
TR: Technical Report
TS: Technical
Specification
Guidelines to help
comply with ethical
standards
20889 IS Privacy enhancing de-identification terminology and
classification of techniques
27018 IS Code of practice for protection of PII in public clouds acting
as PII processors
27701 IS Security techniques - Extension to ISO/IEC 27001 and
ISO/IEC 27002 for privacy information management - Requirements
and guidelines
29100 IS Privacy framework
29101 IS Privacy architecture framework
29134 IS Guidelines for Privacy impact assessment
29151 IS Code of Practice for PII Protection
29190 IS Privacy capability assessment model
29191 IS Requirements for partially anonymous, partially unlinkable
authentication
Cloud
11 Published International Privacy Standards
Framework
Management
Techniques
Impact
19608 TS Guidance for developing security and privacy functional
requirements based on 15408
Requirements
27550 TR Privacy engineering for system lifecycle processes
Process
ISO Privacy Standards
58
Copyright ©Protegrity Corp.
References A:
1. C. Gentry. “A Fully Homomorphic Encryption Scheme.” Stanford University. September 2009,
https://crypto.stanford.edu/craig/craig-thesis.pdf
2. Status Report on the Second Round of the NIST Post-Quantum Cryptography Standardization Process,
https://csrc.nist.gov/publications/detail/nistir/8309/final
3. ISO/IEC 29101:2013 (Information technology – Security techniques – Privacy architecture framework)
4. ISO/IEC 19592-1:2016 (Information technology – Security techniques – Secret sharing – Part 1: General)
5. ISO/IEC 19592-2:2017 (Information technology – Security techniques – Secret sharing – Part 2: Fundamental
mechanisms
6. Homomorphic Encryption Standardization, Academic Consortium to Advance Secure Computation,
https://homomorphicencryption.org/standards-meetings/
7. Homomorphic Encryption Standardization, https://homomorphicencryption.org/
8. NIST Post-Quantum Cryptography PQC, https://csrc.nist.gov/Projects/Post-Quantum-Cryptography
9. UN Handbook on Privacy-Preserving Computation Techniques,
http://publications.officialstatistics.org/handbooks/privacy-preserving-techniques-
handbook/UN%20Handbook%20for%20Privacy-Preserving%20Techniques.pdf
10. ISO/IEC 29101:2013 Information technology – Security techniques – Privacy architecture framework,
https://www.iso.org/standard/45124.html
11. Homomorphic encryption, https://brilliant.org/wiki/homomorphic-encryption/ 59
Copyright ©Protegrity Corp.
References B:
1. California Consumer Privacy Act, OCT 4, 2019, https://www.csoonline.com/article/3182578/california-consumer-privacy-act-what-
you-need-to-know-to-be-compliant.html
2. GDPR and Tokenizing Data, https://tdwi.org/articles/2018/06/06/biz-all-gdpr-and-tokenizing-data-3.aspx
3. GDPR VS CCPA, https://wirewheel.io/wp-content/uploads/2018/10/GDPR-vs-CCPA-Cheatsheet.pdf
4. General Data Protection Regulation, https://en.wikipedia.org/wiki/General_Data_Protection_Regulation
5. IBM Framework Helps Clients Prepare for the EU's General Data Protection Regulation, https://ibmsystemsmag.com/IBM-
Z/03/2018/ibm-framework-gdpr
6. INTERNATIONAL STANDARD ISO/IEC 20889, https://webstore.ansi.org/Standards/ISO/ISOIEC208892018?gclid=EAIaIQobChMIvI-
k3sXd5gIVw56zCh0Y0QeeEAAYASAAEgLVKfD_BwE
7. INTERNATIONAL STANDARD ISO/IEC 27018, https://webstore.ansi.org/Standards/ISO/
ISOIEC270182019?gclid=EAIaIQobChMIleWM6MLd5gIVFKSzCh3k2AxKEAAYASAAEgKbHvD_BwE
8. Machine Learning and AI in a Brave New Cloud World https://www.brighttalk.com/webcast/14723/357660/machine-learning-and-
ai-in-a-brave-new-cloud-world
9. Emerging Data Privacy and Security for Cloud https://www.brighttalk.com/webinar/emerging-data-privacy-and-security-for-cloud/
10. New Application and Data Protection Strategies https://www.brighttalk.com/webinar/new-application-and-data-protection-
strategies-2/
11. The Day When 3rd Party Security Providers Disappear into Cloud https://www.brighttalk.com/webinar/the-day-when-3rd-party-
security-providers-disappear-into-cloud/
12. Advanced PII/PI Data Discovery https://www.brighttalk.com/webinar/advanced-pii-pi-data-discovery/
13. Emerging Application and Data Protection for Cloud https://www.brighttalk.com/webinar/emerging-application-and-data-
protection-for-cloud/
14. Practical Data Security and Privacy for GDPR and CCPA, ISACA Journal, May 2020
15. Data Security: On Premise or in the Cloud, ISSA Journal, December 2019, ulf@ulfmattsson.com
16. Data Privacy: De-Identification Techniques, ISSA Journal, May 2020
60
Copyright ©Protegrity Corp.
UlfMattsson
Chief SecurityStrategist
www.Protegrity.com
Thank You!

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Business intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lakeBusiness intelligence 3.0 and the data lake
Business intelligence 3.0 and the data lake
 
Big Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case StudyBig Data Real Time Analytics - A Facebook Case Study
Big Data Real Time Analytics - A Facebook Case Study
 
Active Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with AlationActive Governance Across the Delta Lake with Alation
Active Governance Across the Delta Lake with Alation
 
How to design ai functions to the cloud native infra
How to design ai functions to the cloud native infraHow to design ai functions to the cloud native infra
How to design ai functions to the cloud native infra
 
Service generated big data and big data-as-a-service
Service generated big data and big data-as-a-serviceService generated big data and big data-as-a-service
Service generated big data and big data-as-a-service
 
Big Data & Analytics Architecture
Big Data & Analytics ArchitectureBig Data & Analytics Architecture
Big Data & Analytics Architecture
 
Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017Big Data Platform Landscape by 2017
Big Data Platform Landscape by 2017
 
Big Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture CapabilitiesBig Data: Its Characteristics And Architecture Capabilities
Big Data: Its Characteristics And Architecture Capabilities
 
Operationalizing Data Science St. Louis Big Data IDEA
Operationalizing Data Science St. Louis Big Data IDEAOperationalizing Data Science St. Louis Big Data IDEA
Operationalizing Data Science St. Louis Big Data IDEA
 
Taming Big Data With Modern Software Architecture
Taming Big Data  With Modern Software ArchitectureTaming Big Data  With Modern Software Architecture
Taming Big Data With Modern Software Architecture
 
Data Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation AnalyticsData Virtualization - Enabling Next Generation Analytics
Data Virtualization - Enabling Next Generation Analytics
 
IBM Industry Models and Data Lake
IBM Industry Models and Data Lake IBM Industry Models and Data Lake
IBM Industry Models and Data Lake
 
The Future Of Big Data
The Future Of Big DataThe Future Of Big Data
The Future Of Big Data
 
GE’s Industrial Data Lake Platform
GE’s Industrial Data Lake PlatformGE’s Industrial Data Lake Platform
GE’s Industrial Data Lake Platform
 
What is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use CasesWhat is big data - Architectures and Practical Use Cases
What is big data - Architectures and Practical Use Cases
 
[XConf Brasil 2020] Data mesh
[XConf Brasil 2020] Data mesh[XConf Brasil 2020] Data mesh
[XConf Brasil 2020] Data mesh
 
Big data case study collection
Big data   case study collectionBig data   case study collection
Big data case study collection
 
Big Data Overview
Big Data OverviewBig Data Overview
Big Data Overview
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
 
Maximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data PlatformMaximize the Value of Your Data: Neo4j Graph Data Platform
Maximize the Value of Your Data: Neo4j Graph Data Platform
 

Similar a Protecting data privacy in analytics and machine learning ISACA London UK

Safeguarding customer and financial data in analytics and machine learning
Safeguarding customer and financial data in analytics and machine learningSafeguarding customer and financial data in analytics and machine learning
Safeguarding customer and financial data in analytics and machine learning
Ulf Mattsson
 
ISSA Atlanta - Emerging application and data protection for multi cloud
ISSA Atlanta - Emerging application and data protection for multi cloudISSA Atlanta - Emerging application and data protection for multi cloud
ISSA Atlanta - Emerging application and data protection for multi cloud
Ulf Mattsson
 
the world of technology is changing at an unprecedented pace, and th.docx
the world of technology is changing at an unprecedented pace, and th.docxthe world of technology is changing at an unprecedented pace, and th.docx
the world of technology is changing at an unprecedented pace, and th.docx
pelise1
 
The Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent ApplicationsThe Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent Applications
Neo4j
 
What is tokenization in blockchain?
What is tokenization in blockchain?What is tokenization in blockchain?
What is tokenization in blockchain?
Ulf Mattsson
 

Similar a Protecting data privacy in analytics and machine learning ISACA London UK (20)

ISC2 Privacy-Preserving Analytics and Secure Multiparty Computation
ISC2 Privacy-Preserving Analytics and Secure Multiparty ComputationISC2 Privacy-Preserving Analytics and Secure Multiparty Computation
ISC2 Privacy-Preserving Analytics and Secure Multiparty Computation
 
New technologies for data protection
New technologies for data protectionNew technologies for data protection
New technologies for data protection
 
Protecting data privacy in analytics and machine learning - ISACA
Protecting data privacy in analytics and machine learning - ISACAProtecting data privacy in analytics and machine learning - ISACA
Protecting data privacy in analytics and machine learning - ISACA
 
Safeguarding customer and financial data in analytics and machine learning
Safeguarding customer and financial data in analytics and machine learningSafeguarding customer and financial data in analytics and machine learning
Safeguarding customer and financial data in analytics and machine learning
 
ISSA Atlanta - Emerging application and data protection for multi cloud
ISSA Atlanta - Emerging application and data protection for multi cloudISSA Atlanta - Emerging application and data protection for multi cloud
ISSA Atlanta - Emerging application and data protection for multi cloud
 
Jun 15 privacy in the cloud at financial institutions at the object managemen...
Jun 15 privacy in the cloud at financial institutions at the object managemen...Jun 15 privacy in the cloud at financial institutions at the object managemen...
Jun 15 privacy in the cloud at financial institutions at the object managemen...
 
ISACA Houston - Practical data privacy and de-identification techniques
ISACA Houston  - Practical data privacy and de-identification techniquesISACA Houston  - Practical data privacy and de-identification techniques
ISACA Houston - Practical data privacy and de-identification techniques
 
Evolving regulations are changing the way we think about tools and technology
Evolving regulations are changing the way we think about tools and technologyEvolving regulations are changing the way we think about tools and technology
Evolving regulations are changing the way we think about tools and technology
 
Unlock the potential of data security 2020
Unlock the potential of data security 2020Unlock the potential of data security 2020
Unlock the potential of data security 2020
 
Data monetization webinar
Data monetization webinarData monetization webinar
Data monetization webinar
 
Privacy preserving computing and secure multi party computation
Privacy preserving computing and secure multi party computationPrivacy preserving computing and secure multi party computation
Privacy preserving computing and secure multi party computation
 
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
Intelligent data summit: Self-Service Big Data and AI/ML: Reality or Myth?
 
What is tokenization in blockchain - BCS London
What is tokenization in blockchain - BCS LondonWhat is tokenization in blockchain - BCS London
What is tokenization in blockchain - BCS London
 
Future of Big Data
Future of Big DataFuture of Big Data
Future of Big Data
 
the world of technology is changing at an unprecedented pace, and th.docx
the world of technology is changing at an unprecedented pace, and th.docxthe world of technology is changing at an unprecedented pace, and th.docx
the world of technology is changing at an unprecedented pace, and th.docx
 
AI, Blockchain, IoT Convergence Use Case System Implementation Insights from ...
AI, Blockchain, IoT Convergence Use Case System Implementation Insights from ...AI, Blockchain, IoT Convergence Use Case System Implementation Insights from ...
AI, Blockchain, IoT Convergence Use Case System Implementation Insights from ...
 
Big Data LDN 2017: Applied AI for GDPR
Big Data LDN 2017: Applied AI for GDPRBig Data LDN 2017: Applied AI for GDPR
Big Data LDN 2017: Applied AI for GDPR
 
Big Data Brussels 2019 v.4.0 I 'How to Build Big Data Analytics Capabilities ...
Big Data Brussels 2019 v.4.0 I 'How to Build Big Data Analytics Capabilities ...Big Data Brussels 2019 v.4.0 I 'How to Build Big Data Analytics Capabilities ...
Big Data Brussels 2019 v.4.0 I 'How to Build Big Data Analytics Capabilities ...
 
The Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent ApplicationsThe Data Platform for Today’s Intelligent Applications
The Data Platform for Today’s Intelligent Applications
 
What is tokenization in blockchain?
What is tokenization in blockchain?What is tokenization in blockchain?
What is tokenization in blockchain?
 

Más de Ulf Mattsson

Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...
Ulf Mattsson
 
Secure analytics and machine learning in cloud use cases
Secure analytics and machine learning in cloud use casesSecure analytics and machine learning in cloud use cases
Secure analytics and machine learning in cloud use cases
Ulf Mattsson
 
Data encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeData encryption and tokenization for international unicode
Data encryption and tokenization for international unicode
Ulf Mattsson
 
New opportunities and business risks with evolving privacy regulations
New opportunities and business risks with evolving privacy regulationsNew opportunities and business risks with evolving privacy regulations
New opportunities and business risks with evolving privacy regulations
Ulf Mattsson
 
Protecting Data Privacy in Analytics and Machine Learning
Protecting Data Privacy in Analytics and Machine LearningProtecting Data Privacy in Analytics and Machine Learning
Protecting Data Privacy in Analytics and Machine Learning
Ulf Mattsson
 

Más de Ulf Mattsson (20)

Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...Jun 29 new privacy technologies for unicode and international data standards ...
Jun 29 new privacy technologies for unicode and international data standards ...
 
Book
BookBook
Book
 
May 6 evolving international privacy regulations and cross border data tran...
May 6   evolving international privacy regulations and cross border data tran...May 6   evolving international privacy regulations and cross border data tran...
May 6 evolving international privacy regulations and cross border data tran...
 
Qubit conference-new-york-2021
Qubit conference-new-york-2021Qubit conference-new-york-2021
Qubit conference-new-york-2021
 
Secure analytics and machine learning in cloud use cases
Secure analytics and machine learning in cloud use casesSecure analytics and machine learning in cloud use cases
Secure analytics and machine learning in cloud use cases
 
Evolving international privacy regulations and cross border data transfer - g...
Evolving international privacy regulations and cross border data transfer - g...Evolving international privacy regulations and cross border data transfer - g...
Evolving international privacy regulations and cross border data transfer - g...
 
Data encryption and tokenization for international unicode
Data encryption and tokenization for international unicodeData encryption and tokenization for international unicode
Data encryption and tokenization for international unicode
 
The future of data security and blockchain
The future of data security and blockchainThe future of data security and blockchain
The future of data security and blockchain
 
GDPR and evolving international privacy regulations
GDPR and evolving international privacy regulationsGDPR and evolving international privacy regulations
GDPR and evolving international privacy regulations
 
Privacy preserving computing and secure multi-party computation ISACA Atlanta
Privacy preserving computing and secure multi-party computation ISACA AtlantaPrivacy preserving computing and secure multi-party computation ISACA Atlanta
Privacy preserving computing and secure multi-party computation ISACA Atlanta
 
New opportunities and business risks with evolving privacy regulations
New opportunities and business risks with evolving privacy regulationsNew opportunities and business risks with evolving privacy regulations
New opportunities and business risks with evolving privacy regulations
 
What is tokenization in blockchain?
What is tokenization in blockchain?What is tokenization in blockchain?
What is tokenization in blockchain?
 
Nov 2 security for blockchain and analytics ulf mattsson 2020 nov 2b
Nov 2 security for blockchain and analytics   ulf mattsson 2020 nov 2bNov 2 security for blockchain and analytics   ulf mattsson 2020 nov 2b
Nov 2 security for blockchain and analytics ulf mattsson 2020 nov 2b
 
What is tokenization in blockchain?
What is tokenization in blockchain?What is tokenization in blockchain?
What is tokenization in blockchain?
 
Protecting Data Privacy in Analytics and Machine Learning
Protecting Data Privacy in Analytics and Machine LearningProtecting Data Privacy in Analytics and Machine Learning
Protecting Data Privacy in Analytics and Machine Learning
 
ISACA Houston - How to de-classify data and rethink transfer of data between ...
ISACA Houston - How to de-classify data and rethink transfer of data between ...ISACA Houston - How to de-classify data and rethink transfer of data between ...
ISACA Houston - How to de-classify data and rethink transfer of data between ...
 
Isaca atlanta - practical data security and privacy
Isaca atlanta - practical data security and privacyIsaca atlanta - practical data security and privacy
Isaca atlanta - practical data security and privacy
 
Jul 16 isaca london data protection, security and privacy risks - on premis...
Jul 16 isaca london   data protection, security and privacy risks - on premis...Jul 16 isaca london   data protection, security and privacy risks - on premis...
Jul 16 isaca london data protection, security and privacy risks - on premis...
 
New regulations and the evolving cybersecurity technology landscape
New regulations and the evolving cybersecurity technology landscapeNew regulations and the evolving cybersecurity technology landscape
New regulations and the evolving cybersecurity technology landscape
 
How to protect privacy sensitive data that is collected to control the corona...
How to protect privacy sensitive data that is collected to control the corona...How to protect privacy sensitive data that is collected to control the corona...
How to protect privacy sensitive data that is collected to control the corona...
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Último (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Protecting data privacy in analytics and machine learning ISACA London UK

  • 1. Copyright ©Protegrity Corp. Protecting Data Privacy in Analytics and Machine Learning Ulf Mattsson Chief Security Strategist www.Protegrity.com
  • 2. Copyright ©Protegrity Corp. PaymentCardIndustry(PCI) SecurityStandards Council (SSC): 1. TokenizationTask Force 2. Encryption Task Force, Pointto Point Encryption Task Force 3. Risk Assessment 4. eCommerce SIG 5. Cloud SIG, Virtualization SIG 6. Pre-Authorization SIG, Scoping SIG Working Group Ulf Mattsson 2 Dec 2019 May 2020 Cloud Security Alliance Quantum Computing Tokenization Management and Security Cloud Management and Security ISACA JOURNAL May 2021 Privacy-Preserving Analytics and Secure Multi-Party Computation ISACA JOURNAL May 2020 Practical Data Security and Privacy for GDPR and CCPA • Chief Security Strategist, Protegrity • Chief Technology Officer, Protegrity, Atlantic BT, and Compliance Engineering • Head of Innovation, TokenEx • IT Architect, IBM • Develops Industry Standards • Inventor of more than 70 issued US Patents • Products and Services: • Data Encryption, Tokenization, and Data Discovery • Cloud Application Security Brokers (CASB) and Web Application Firewalls (WAF) • Security Operation Center (SOC) and Managed Security Services (MSSP) • Robotics and Applications
  • 3. Copyright ©Protegrity Corp. Agenda • Machine learning (ML) and AI (Artificial Intelligence) • Secure Data-sharing • Secure multi-party computation (SMPC) and uses cases • Homomorphic encryption (HE) and use cases • Zero trust architecture (ZTA) vs. Zero knowledge • Trusted execution environments (TEE) • Hybrid cloud • Regulations and Standards in Data Privacy • International privacy standards • Differential Privacy (DP) and K-Anonymity 3
  • 4. Copyright ©Protegrity Corp. Unlockthe Potential of Data Security - Data Security Governance Stakeholders 4 4 Source: Gartner
  • 5. Copyright ©Protegrity Corp. Opportunities Controls Regulations Policies RiskManagement Breaches Balance Protect datainwaysthatare transparent to business processes andcomplianttoregulations Source: Gartner
  • 6. Copyright ©Protegrity Corp. Global Hadoop Big Data Analytics Market (USD Billion) Real-time data is significant in global datasphere Between 2018 and 2025 the size of real-time data in the global datasphere is expected to expand tenfold, from five zettabytes to 51 zettabytes. Statista 2021 Increase in information volume of Real-time Analytics
  • 7. Copyright ©Protegrity Corp. The advent of big data era due to the increase in the information volume of the whole world ResearchGate Big Data AI
  • 8. Copyright ©Protegrity Corp. https://www.marketresearchfuture.com/reports/machine-learning-market-2494 Global Machine Learning Market Machine learning (ML) market for hardware, software, and services ML has multiple uses in today’s technology market concerning safety and security such as face detection, face recognition, image classification, speech recognition, antivirus, Google, antispam, genetic, signal diagnosing, and weather forecast. USD Million 8
  • 9. Copyright ©Protegrity Corp. Gartner Hype Cycle for Emerging Technologies, 2020  Time AI & ML Gartner
  • 10. Copyright ©Protegrity Corp. Gartner Hype Cycle for Emerging Technologies, 2020 Algorithmic Trust Models Can Help Ensure Data Privacy Emerging technologies tied to algorithmic trust include 1. Secure access service edge (SASE) 2. Explainable AI 3. Responsible AI 4. Bring your own identity 5. Differential privacy 6. Authenticated provenance Gartner 10
  • 13. Copyright ©Protegrity Corp. Use Case: Insilico Medicine An alternative to animal testing for research and development programs in the pharmaceutical industry. • By using artificial intelligence and deep-learning techniques, Insilico is able to analyze how a compound will affect cells and what drugs can be used to treat the cells in addition to possible side effects A comprehensive drug discovery engine, which utilizes millions of samples and multiple data types to discover signatures of disease and identify the most promising targets for billions of molecules that already exist or can be generated de novo with the desired set of parameters. https://insilico.com/ 13
  • 14. Copyright ©Protegrity Corp. Gartner MQ for Data Science and Machine Learning Platforms https://www.kdnuggets.com/2020/02/gartner- mq-2020-data-science-machine-learning.html Data and analytics pipeline, including all the following areas: 1. Data ingestion 2. Data preparation 3. Data exploration 4. Feature engineering 5. Model creation and training 6. Model testing 7. Deployment 8. Monitoring 9. Maintenance 10.Collaboration 2020 vs 2019 changes
  • 15. Copyright ©Protegrity Corp. Digikey, techbrij Machine Learning Model Lifecycle - Example 1. Define the model: using the Sequential or Model class and add the layers 2. Compile the model: call compile method and specify the loss, optimizer and metrics 3. Train the model: call fit method and use training data 4. Evaluate the model: call evaluate method and use testing data to evaluate trained model 5. Get predictions: use predict method on new data for predictions 15
  • 16. Copyright ©Protegrity Corp. wikipedia Flux Keras MATLAB + Deep Learning Toolbox Apache MXNet PlaidML PyTorch (Facebook) TensorFlow (Google) Open Source Linux, MacOS, Windows C++ Julia Python MATLAB Perl, Clojure JavaScript, Go, Scala Android, iOS R Swift Machine Learning Deep-learning Platforms Languages Frameworks & libraries 16
  • 17. Copyright ©Protegrity Corp. *: https://www.fastcompany.com/1675910/7-ways-real-life-crime-fighting-mirrors-minority-report AI is reality for parts of the legal industry for how lawyers use data to predict and protect their clients’ futures. “Predictive analytics" can mine years of incident reports and law enforcement data to "forecast criminal 'hot spots.'" Police in Memphis have success with the $11-billion "precrime" predicting tool: Since installing IBM Blue CRUSH*, the city has seen a 31% drop in serious crime."
  • 18. Copyright ©Protegrity Corp. Global Map Of PrivacyRights And Regulations 18
  • 19. Copyright ©Protegrity Corp. TrustArc Legal and Regulatory Risks Are Exploding 19
  • 20. Copyright ©Protegrity Corp. Encryptionand Tokenization Discover Data Assets Security by Design GDPR Security Requirements –Encryption and Tokenization 20
  • 21. Copyright ©Protegrity Corp. Data flow mapping under GDPR • If there is not already a documented workflow in place in your organization, it can be worthwhile for a team to be sent out to identify how the data is being gathered. • This will enable you to see how your data flow is different from reality and what needs to be done Organizations needs to look at how the data was captured, who is accountable for it, where it is located and who has access. Source: BigID
  • 22. Copyright ©Protegrity Corp. FindYourSensitive Datain Cloud and On-Premise www.protegrity.com 22
  • 23. Copyright ©Protegrity Corp. Whichof thefollowing aspectsof dataprivacyare you particularlyconcernedabout? FTIConsulting- CorporateData Privacy Today,2020 23
  • 24. Copyright ©Protegrity Corp. Source: Gartner Six Important Privacy-Preserving Computation Techniques 24 Source: Gartner
  • 25. Copyright ©Protegrity Corp. Increased need for data analytics drives requirements. Data Lake, ETL, Files … • Policy Enforcement Point (PEP) Protected data fields U • Encryption Key Management U External Data Internal Data Secure Multi Party Computation Analytics, Data Science, AI and ML Data Pipeline Data Collaboration Data Pipeline Data Privacy On-premises Cloud Internal and Individual Third-Party Data Sharing 25
  • 26. Copyright ©Protegrity Corp. http://homomorphicencryption.org Use Cases for Secure Multi Party Computation & Homomorphic Encryption (HE) 26
  • 27. Copyright ©Protegrity Corp. Use case – Retail - Data for Secondary Purposes Large aggregator of credit card transaction data. Open a new revenue stream • Using its data with its business partners: retailers, banks and advertising companies. • They could help their partners achieve better ad conversion rate, improved customer satisfaction, and more timely offerings. • Needed to respect user privacy and specific regulations. In this specific case, they wanted to work with a retailer. • Allow the retailer to gain insights while protecting user privacy, and the credit card organization’s IP. • An analyst at each organization’s office first used the software to link the data without exchanging any of the underlying data. Data used to train the machine learning and statistical models. • A logistic and linear regression model was trained using secure multi-party computation (SMC). • In the simplest form SMC splits a dataset into secret shares and enables you to train a model without needing to put together the pieces. • The information that is communicated between the peers is encrypted at all times and cannot be reverse engineered. With the augmented dataset, the retailer was able to get a better picture of its customers buying habits. 27
  • 28. Copyright ©Protegrity Corp. Use case - Financial services industry Confidential financial datasets which are vital for gaining significant insights. • The use of this data requires navigating a minefield of private client information as well as sharing data between independent financial institutions, to create a statistically significant dataset. • Data privacy regulations such as CCPA, GDPR and other emerging regulations around the world • Data residency controls as well as enable data sharing in a secure and private fashion. Reduce and remove the legal, risk and compliance processes • Collaboration across divisions, other organizations and across jurisdictions where data cannot be relocated or shared • Generating privacy respectful datasets with higher analytical value for Data Science and Analytics applications. 28
  • 29. Copyright ©Protegrity Corp. Use case: Bank - Internal Data Usage by Other Units A large bank wanted to broaden access to its data lake without compromising data privacy, preserving the data’s analytical value, and at reasonable infrastructure costs. • Current approaches to de-identify data did not fulfill the compliance requirements and business needs, which had led to several bank projects being stopped. • The issue with these techniques, like masking, tokenization, and aggregation, was that they did not sufficiently protect the data without overly degrading data quality. This approach allows creating privacy protected datasets that retain their analytical value for Data Science and business applications. A plug-in to the organization’s analytical pipeline to enforce the compliance policies before the data was consumed by data science and business teams from the data lake. • The analytical quality of the data was preserved for machine learning purposes by-using AI and leveraging privacy models like differential privacy and k-anonymity. Improved data access for teams increased the business’ bottom line without adding excessive infrastructure costs, while reducing the risk of-consumer information exposure. 29
  • 30. Copyright ©Protegrity Corp. Area Timing Focus Comments Requirements Short Internal requirements International regulations Cloud Short Machine Learning Start with basic ML training and inference on senstivie data in cloud Competition Short Competitive advantage ML and NLP-powered services can give banks a competitive edge Short Encrypted data Important Long Synthetic data Computing cost? Medium AML / KYC What are other Large banks doing? Short Analytics Initial focus Short Operation on encrypted data Computation on sensitive data to the cloud. Trade-offs with performance, protection and utility? Industry Short Industry dialog Working groups in standard bodies (ANSI X9, Cloud Security Alliance, Homomorphic Encryption Org) Model Short Encrypted model Important Short Experimentation What are other Large banks doing? Short Scotia Bank case study Query solution for AML / KYC Proven Medium Fast follower What are some proven solutions? Short Homomorphic Encryption post- Lattice-based cryptography is a promising post-quantum cryptography family, both in terms of foundational properties as well as its application to both traditional and homomorphic encryption Medium Quantum Plan for quantum safe algorithms Long Quantum Plan for quantum ML algorithms Sharing Short Secure Multi-party Computing (SMPC) Without revealing their own private inputs and outputs. Encrypted data and encryption keys never comingled while computation on the encrypted data is occurring or an encryption key is split into shares Short Vendor positioning Nonlinear ML regression needed? Linear Regression is one of the fundamental supervised-ML. Linear and non-linear credit scoring by combining logistic regression and support vector machines Short Framework integration Important 3rd party Long 3rd party integration Mining first Long Federated learning Complicated Long TEE Emerging Analytics Data Quantum Solutions Training ML Pilot Use case: Bank
  • 31. Copyright ©Protegrity Corp. Source: Gartner Six Important Privacy-Preserving Computation Techniques 31
  • 32. Copyright ©Protegrity Corp. https://royalsociety.org Secure Multi-Party Computation (MPC) Private multi-party machine learning with MPC Using MPC, different parties send encrypted messages to each other, and obtain the model F(A,B,C) they wanted to compute without revealing their own private input, and without the need for a trusted central authority. Secure Multi-Party machine learning Central trusted authority A B C F(A, B,C) F(A, B,C) F(A, B,C) Protected data fields U B A C F(A, B,C) U U U 32
  • 33. Copyright ©Protegrity Corp. Medium.com Example of Multi- party Computation: Average Salary #1 33
  • 34. Copyright ©Protegrity Corp. Source: Gartner Six Important Privacy-Preserving Computation Techniques 34
  • 35. Copyright ©Protegrity Corp. Case Study – HE and Securely sharing sensitive information An example from the healthcare domain. The recent ability to fully map the human genome has opened endless possibilities for advances in healthcare. 1. Data from DNA analysis can test for genetic abnormalities, empower disease-risk analysis, discover family history, and the presence of an Alzheimer’s allele. • But these studies require very large DNA sample sizes to detect accurate patterns. 2. However, sharing personal DNA data is a particularly problematic domain. • Many citizens hesitate to share such personal information with third-party providers, uncertain of if, how and to whom the information might be shared downstream. 3. Moreover, legal limitations designed to protect privacy restrict providers from sharing this data as well. 4. HE techniques enable citizens to share their genome data and retain key privacy concerns without the traditional all-or-nothing trust threshold with third-party providers. 35
  • 36. Copyright ©Protegrity Corp. https://royalsociety.org Homomorphic encryption (HE) HE depicted in a client-server model • The client sends encrypted data to a server, where a specific analysis is performed on the encrypted data, without decrypting that data. • The encrypted result is then sent to the client, who can decrypt it to obtain the result of the analysis they wished to outsource. Encryption of x Client Server Analysis Encrypted F(x) • Policy Enforcement Point (PEP) Protected data fields U • Encryption Key Management 36
  • 37. Copyright ©Protegrity Corp. Secure Exec Env Zero Trust Open Source Encrypted Query Enc Sort Encr Proxy Quantum Safe AI HomomorphicEncryption.org Private Set Intersection *: 12 Smaller HE Vendors Differential Priv Commercial-applications Off The Shelf TEE (Trusted Execution Environment) Lattice-based algorithm DP Extended Encrypted Operations Extended Privacy Features Extended ML Features Extended Protection Features Dynamic Security Controls Standardization of Homomorphic Encryption* 200 to 50 Employees 1 2 3 4 5 49 to 20 Employees 6 7 8 19 and fewer Employees 9 10 11 12 Federated Learning Fuzzy Search Block Chain COTS Examples of some Features 37
  • 38. Copyright ©Protegrity Corp. Source: Gartner Important Privacy-Preserving Computation Techniques 38
  • 39. Copyright ©Protegrity Corp. Trusted execution environments Trusted Execution Environments (TEEs) provide secure computation capability through a combination of special-purpose hardware in modern processors and software built to use those hardware features. The special-purpose hardware provides a mechanism by which a process can run on a processor without its memory or execution state being visible to any other process on the processor, • not even the operating system or other privileged code. *: Source: http://publications.officialstatistics.org Computation in a TEE is not performed on data while it remains encrypted. • Typically, the memory space of each TEE (enclave) application is protected from access • AES-encrypted when and if it is stored off- chip. Usability is low and products/services are emerging in MS Azure, IBM’s cloud service Amazon AWS (late 2020)* 39
  • 40. Copyright ©Protegrity Corp. Source: Gartner Important Privacy-Preserving Computation Techniques 40
  • 41. Copyright ©Protegrity Corp. Random differential privacy Probabilistic differential privacy Concentrated differential privacy Noise is very low. Used in practice. Tailored to large numbers of computations. Approximate differential privacy More useful analysis can be performed. Well-studied. Can lead to unlikely outputs. Widely used Computational differential privacy Multiparty differential privacy Can ensure the privacy of individual contributions. Aggregation is performed locally. Strong degree of protection. High accuracy 6 Differential Privacy Models A pure model provides protection even against attackers with unlimited computational power. In differential privacy, the concern is about privacy as the relative difference in the result whether a specific individual or entity is included in the input or excluded 41
  • 42. Copyright ©Protegrity Corp. Source: Gartner Important Privacy-Preserving Computation Techniques 42
  • 43. Copyright ©Protegrity Corp. Private Set Intersection (PSI) Identifies the common customers without disclosing any other information This replaces simplistic approaches such as one-way hashing functions that are susceptible to dictionary attacks. Applications for PSI include identifying the overlap with potential data partners (i.e. “Is there a large enough client base in common to be worthwhile to work together?”), as well as aligning datasets with data partners in preparation for using MPC to train a machine learning model
  • 44. Copyright ©Protegrity Corp. Examples of Data De-identification 44
  • 45. Copyright ©Protegrity Corp. Data protection techniques: Deployment on-premises, and clouds Data Warehouse Centralized Distributed On- premises Public Cloud Private Cloud Vault-based tokenization y y Vault-less tokenization y y y y y y Format preserving encryption y y y y y Homomorphic encryption y y Masking y y y y y y Hashing y y y y y y Server model y y y y y y Local model y y y y y y L-diversity y y y y y y T-closeness y y y y y y Privacy enhancing data de-identification terminology and classification of techniques De- identification techniques Tokenization Cryptographic tools Suppression techniques Formal privacy measurement models Differential Privacy K-anonymity model 45
  • 46. Copyright ©Protegrity Corp. 2-way HomomorphicEncryption (HE) K-anonymity Tokenization Masking Hashing 1-way Analytics andMachine Learning(ML) Different DataProtectionTechniques Algorithmic Random Computingon encrypteddata Format Preserving Fast Slow Very slow Fast Fast FormatPreserving DifferentialPrivacy (DP) Noise added FormatPreserving Encryption (FPE) 46
  • 47. Copyright ©Protegrity Corp. IS: International Standard TR: Technical Report TS: Technical Specification Guidelines to help comply with ethical standards 20889 IS Privacy enhancing de-identification terminology and classification of techniques 27018 IS Code of practice for protection of PII in public clouds acting as PII processors 27701 IS Security techniques - Extension to ISO/IEC 27001 and ISO/IEC 27002 for privacy information management - Requirements and guidelines 29100 IS Privacy framework 29101 IS Privacy architecture framework 29134 IS Guidelines for Privacy impact assessment 29151 IS Code of Practice for PII Protection 29190 IS Privacy capability assessment model 29191 IS Requirements for partially anonymous, partially un-linkable authentication Cloud 11 Published International Privacy Standards Framework Management Techniques Impact 19608 TS Guidance for developing security and privacy functional requirements based on 15408 Requirements 27550 TR Privacy engineering for system lifecycle processes Process ISO 47
  • 49. Copyright ©Protegrity Corp. Personally Identifiable Information(PII) in compliance with the EUCross Border Data Protection Laws, specifically • Datenschutzgesetz 2000(DSG 2000)in Austria, and • Bundesdatenschutzgesetz inGermany. This requiredaccess to Austrianand German customer data to berestricted to onlyrequesters ineach respective country. • Achieved targeted compliance with EU Cross Border Data Security laws • Implemented country-specificdata access restrictions Datasources Case Study Amajor international bankperformed a consolidationofallEuropeanoperationaldatasources to Italy 49
  • 50. Copyright ©Protegrity Corp. Lower Risk andHigher Productivity with More AccesstoMoreData 50
  • 52. Copyright ©Protegrity Corp. PaymentApplication Payment Network Payment Data Policy, tokenization, encryption and keys Gateway Call Center Application PI*Data Salesforce Analytics Application DifferentialPrivacy AndK-anonymity PI*Data Microsoft ElectionGuard Election Data Homomorphic Encryption DataWarehouse PI*Data Vault-less tokenization Use-Cases of Some Data Privacy Techniques Voting Application Dev/testSystems Masking PI*Data Vault-less tokenization 52
  • 53. Copyright ©Protegrity Corp. A DataSecurityGateway Can Protect Sensitive Datain Cloud and On-premise 53
  • 54. Copyright ©Protegrity Corp. Big DataProtectionwith GranularField Level Protectionfor GoogleCloud 54
  • 55. Copyright ©Protegrity Corp. Use Case (Financial Services) - Compliance with Cross-Border and Other Privacy Restrictions 55
  • 56. Copyright ©Protegrity Corp. Use this shape toput copy inside (you can change the sizing tofit your copy needs) Protection ofdata in AWS S3 with Separation ofDuties • Applications can use de-identified data or data inthe clear based on policies • Protection of data inAWSS3 before landing in a S3 bucket Separation of Duties • EncryptionKeyManagement • PolicyEnforcementPoint(PEP) 56
  • 57. Copyright ©Protegrity Corp. Securosis, 2019 Consistency • Most firmsarequite familiar with their on-premises encryption andkeymanagement systems, so they often prefer toleverage the same tool and skills across multiple clouds. • Firms often adopt a “best of breed”cloud approach. Examples ofHybrid Cloud considerations Trust • Some customers simply donot trusttheir vendors. Vendor Lock-in and Migration • A commonconcern is vendorlock-in, andan inabilitytomigratetoanothercloud serviceprovider. Google Cloud AWSCloud Azure Cloud Cloud Gateway S3 Salesforce Data Analytics BigQuery 57
  • 58. Copyright ©Protegrity Corp. IS: International Standard TR: Technical Report TS: Technical Specification Guidelines to help comply with ethical standards 20889 IS Privacy enhancing de-identification terminology and classification of techniques 27018 IS Code of practice for protection of PII in public clouds acting as PII processors 27701 IS Security techniques - Extension to ISO/IEC 27001 and ISO/IEC 27002 for privacy information management - Requirements and guidelines 29100 IS Privacy framework 29101 IS Privacy architecture framework 29134 IS Guidelines for Privacy impact assessment 29151 IS Code of Practice for PII Protection 29190 IS Privacy capability assessment model 29191 IS Requirements for partially anonymous, partially unlinkable authentication Cloud 11 Published International Privacy Standards Framework Management Techniques Impact 19608 TS Guidance for developing security and privacy functional requirements based on 15408 Requirements 27550 TR Privacy engineering for system lifecycle processes Process ISO Privacy Standards 58
  • 59. Copyright ©Protegrity Corp. References A: 1. C. Gentry. “A Fully Homomorphic Encryption Scheme.” Stanford University. September 2009, https://crypto.stanford.edu/craig/craig-thesis.pdf 2. Status Report on the Second Round of the NIST Post-Quantum Cryptography Standardization Process, https://csrc.nist.gov/publications/detail/nistir/8309/final 3. ISO/IEC 29101:2013 (Information technology – Security techniques – Privacy architecture framework) 4. ISO/IEC 19592-1:2016 (Information technology – Security techniques – Secret sharing – Part 1: General) 5. ISO/IEC 19592-2:2017 (Information technology – Security techniques – Secret sharing – Part 2: Fundamental mechanisms 6. Homomorphic Encryption Standardization, Academic Consortium to Advance Secure Computation, https://homomorphicencryption.org/standards-meetings/ 7. Homomorphic Encryption Standardization, https://homomorphicencryption.org/ 8. NIST Post-Quantum Cryptography PQC, https://csrc.nist.gov/Projects/Post-Quantum-Cryptography 9. UN Handbook on Privacy-Preserving Computation Techniques, http://publications.officialstatistics.org/handbooks/privacy-preserving-techniques- handbook/UN%20Handbook%20for%20Privacy-Preserving%20Techniques.pdf 10. ISO/IEC 29101:2013 Information technology – Security techniques – Privacy architecture framework, https://www.iso.org/standard/45124.html 11. Homomorphic encryption, https://brilliant.org/wiki/homomorphic-encryption/ 59
  • 60. Copyright ©Protegrity Corp. References B: 1. California Consumer Privacy Act, OCT 4, 2019, https://www.csoonline.com/article/3182578/california-consumer-privacy-act-what- you-need-to-know-to-be-compliant.html 2. GDPR and Tokenizing Data, https://tdwi.org/articles/2018/06/06/biz-all-gdpr-and-tokenizing-data-3.aspx 3. GDPR VS CCPA, https://wirewheel.io/wp-content/uploads/2018/10/GDPR-vs-CCPA-Cheatsheet.pdf 4. General Data Protection Regulation, https://en.wikipedia.org/wiki/General_Data_Protection_Regulation 5. IBM Framework Helps Clients Prepare for the EU's General Data Protection Regulation, https://ibmsystemsmag.com/IBM- Z/03/2018/ibm-framework-gdpr 6. INTERNATIONAL STANDARD ISO/IEC 20889, https://webstore.ansi.org/Standards/ISO/ISOIEC208892018?gclid=EAIaIQobChMIvI- k3sXd5gIVw56zCh0Y0QeeEAAYASAAEgLVKfD_BwE 7. INTERNATIONAL STANDARD ISO/IEC 27018, https://webstore.ansi.org/Standards/ISO/ ISOIEC270182019?gclid=EAIaIQobChMIleWM6MLd5gIVFKSzCh3k2AxKEAAYASAAEgKbHvD_BwE 8. Machine Learning and AI in a Brave New Cloud World https://www.brighttalk.com/webcast/14723/357660/machine-learning-and- ai-in-a-brave-new-cloud-world 9. Emerging Data Privacy and Security for Cloud https://www.brighttalk.com/webinar/emerging-data-privacy-and-security-for-cloud/ 10. New Application and Data Protection Strategies https://www.brighttalk.com/webinar/new-application-and-data-protection- strategies-2/ 11. The Day When 3rd Party Security Providers Disappear into Cloud https://www.brighttalk.com/webinar/the-day-when-3rd-party- security-providers-disappear-into-cloud/ 12. Advanced PII/PI Data Discovery https://www.brighttalk.com/webinar/advanced-pii-pi-data-discovery/ 13. Emerging Application and Data Protection for Cloud https://www.brighttalk.com/webinar/emerging-application-and-data- protection-for-cloud/ 14. Practical Data Security and Privacy for GDPR and CCPA, ISACA Journal, May 2020 15. Data Security: On Premise or in the Cloud, ISSA Journal, December 2019, ulf@ulfmattsson.com 16. Data Privacy: De-Identification Techniques, ISSA Journal, May 2020 60
  • 61. Copyright ©Protegrity Corp. UlfMattsson Chief SecurityStrategist www.Protegrity.com Thank You!