Automated Data Governance 101 - A Guide to Proactively Addressing Your Privacy, Security, and Compliance Needs
2. © 2019 IMMUTA
What is data governance?
Automated Data Governance 101
Privacy, Compliance, Security
7. A New York City Researcher Gets Curious...
▪ Chris Wong is perusing Twitter in March 2014
▪ Sees a Taxi & Limousine Commission chart on traffic patterns
▪ Makes a freedom of information request for 12 months of data
▪ Receives 50 gigabytes of data
8. The New York Taxi & Limousine Commission
▪ Data was released containing taxi pickups, dropoffs, location, time, amount, and tip amount, among others
▪ This seems pretty harmless, right?
9. Well, Judd and Leslie May Not Think It’s Harmless...
▪ This photo was geotagged (with time), so by simply querying by medallion and time, we know how much Judd and Leslie tip!
10. This is an Example of a “Link Attack”
[Diagram: a photo (medallion & photo time) linked to New York taxi data (medallion & pickup time)]
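The linkage on this slide can be sketched as a join on two shared fields. This is a minimal illustration, not the analysis that was actually run: the records, field names, and the ten-minute matching window are all made-up assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical "anonymous" trip records: no rider names, only medallion and time.
trips = [
    {"medallion": "9Y99", "pickup": datetime(2014, 3, 1, 18, 5), "fare": 12.5, "tip": 0.0},
    {"medallion": "9Y99", "pickup": datetime(2014, 3, 1, 21, 40), "fare": 8.0, "tip": 2.0},
    {"medallion": "7T11", "pickup": datetime(2014, 3, 1, 18, 7), "fare": 20.0, "tip": 4.0},
]

# Hypothetical outside data: a timestamped photo in which the medallion is readable.
photos = [
    {"person": "Rider A", "medallion": "9Y99", "taken": datetime(2014, 3, 1, 18, 3)},
]


def link(photos, trips, window=timedelta(minutes=10)):
    """Match each photo to trips with the same medallion and a nearby pickup time."""
    matches = []
    for photo in photos:
        for trip in trips:
            same_cab = trip["medallion"] == photo["medallion"]
            close_in_time = abs(trip["pickup"] - photo["taken"]) <= window
            if same_cab and close_in_time:
                matches.append(
                    {"person": photo["person"], "fare": trip["fare"], "tip": trip["tip"]}
                )
    return matches


linked = link(photos, trips)
```

Neither data set names a rider alongside a tip, yet the join attaches a name to a fare and a tip. That is why removing direct identifiers alone may not be enough.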
14. In Fact...
“... the dates and locations of four purchases are enough to identify 90 percent of the people in a data set recording three months’ worth of credit card transactions by 1.1 million users ... someone with copies of just three of your recent receipts - or one receipt, one Instagram photo, and one tweet about the phone you just bought - would have a 94 percent chance of extracting your credit card records from those of a million other people.”
15. The End of Privacy (As We Know It)
▪ The volume of data we generate has undermined privacy as we know it
▪ Instead of focusing on how and when our data is gathered…
▪ Privacy is best served by limiting how our data is used - or how the data consumers within our organizations use this data
▪ The privacy problem begets another challenge for enterprises: how do you balance data privacy with utility?
16. “Data can either be useful or perfectly anonymous, but never both.”
Paul Ohm, Broken Promises of Privacy, 57 UCLA Law Review 1701 (2010)
17. In Practice, Privacy is a Continuum
▪ To preserve privacy, organizations have to make the data less closely resemble the raw data (or full data).
▪ Moving along this curve, data becomes more robust against certain types of privacy risks.
▪ The actual trade-off is highly coupled with analytical context.
20. Information Security
Traditionally defined as a “triad”:
● Confidentiality: only the right people can view the right data . . .
● Integrity: . . . in the right form . . .
● Availability: . . . at the right time.
21. Today’s IT Landscape
The Data We Generate (And Collect)
▪ 2.5 quintillion bytes of data created each day
▪ 90 percent of the data in the world was generated in the last 2 years
▪ Estimated 50 billion connected devices by next year - over six per person on the planet
▪ Average of 40,000 searches conducted on Google per second
▪ Web browsing, email, cell tower pings, image and video, audio, and more
The Software We Use
▪ Average business uses ~500 custom software applications, only 40 percent of which are known to IT
▪ Number of known vulnerabilities is increasing (significantly) over time
▪ Complexity of software systems and IT environments also appears to be increasing
▪ Adoption of AI tools and techniques is exacerbating these trends
26. The number and complexity of regulations on data are increasing drastically.
27. In 2019 in the U.S. alone…
▪ 150+ Privacy Laws Proposed in 25 States
▪ 250+ Information Security Laws Proposed in 45 States
▪ Could Cost Organizations Up to $122B Per Year
28. A Few Examples
GDPR
▪ EU’s General Data Protection Regulation
▪ Came into force May 2018 as the first and most stringent law in a new wave of global privacy regulations
▪ Fines up to four percent of global revenue
▪ Has driven many global companies to rethink how they collect and reuse their data
CCPA
▪ California Consumer Privacy Act
▪ Passed in 2018 and goes into effect January 2020
▪ State legislators implemented some of the strictest standards on consumer data in the nation
▪ Potentially affects any business that collects the data of California residents
Cybersecurity Law
▪ Enacted by the Chinese government in 2017
▪ Increased penalties on the misuse of data collected or stored in the world’s second largest economy
30. Compliance Takes Time
▪ Involves lots of analytical work
▪ “Meetings and memos” approach
▪ Increasing number of stakeholders and regulatory environments are now involved
▪ Not simple, and not fast!
33. Traditional Signs of Passive Data Governance
❏ Time-Consuming Meetings
❏ Long Policy Memos
❏ Custom Permissions
❏ Varying Policies Per Database
❏ Creation of New Copies of Data to Satisfy Compliance or Privacy Concerns
34. Is your approach to data governance passive? Ask yourself...
How long does it take between
1) when your organization collects data, and
2) when that data can be accessed and used?
A. Days
B. Weeks
C. Months
35. How can we move away from a passive approach to data governance?
37. What Is Automated Data Governance?
Automated data governance is the process of proactively applying rules on data to ensure compliance and drive data analytics.
39. 5 Pillars of Effective Automated Data Governance
1. Any Tool
2. Any Data
3. No Copies
4. Any Level of Expertise
5. One Policy, In One Place
41. Pillar 1: Any Tool
▪ Automated data governance must support any tool a data scientist or analyst uses, now or in the future
▪ Enables data science and analytics teams to use their tool of choice to access the data they need
▪ Avoids tool “lock in” for governance reasons
▪ Incentivizes governance for the long-term
43. Pillar 2: Any Data
▪ Must enable the use of ALL data, regardless of where it’s stored or the underlying storage technology
▪ Otherwise, leaves insights undiscovered or incentivizes non-compliance
▪ Flexibility is key to long-term governance efforts
45. Pillar 3: No Copies
▪ A passive approach frequently relies on creating new copies of data, usually with sensitive identifiers removed or obscured - this can’t scale!
▪ Automated data governance requires direct access to the same live data across the organization
▪ Data must never be copied for governance purposes
47. Pillar 4: Any Level of Expertise
▪ Requires that anyone, with any level of expertise, can understand what rules are being applied to enterprise data
▪ Must empower both those with technical skill sets and those with privacy and compliance knowledge, so all teams can play a meaningful role controlling how data is used
49. Pillar 5: One Policy, In One Place
▪ Requires that data policies live in one central location, so they can be easily tracked and monitored
▪ Allows for standardization - and updates over time
▪ Key to long-term governance efforts
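Pillars 3 and 5 together suggest a single policy that is evaluated when data is read, rather than a masked copy maintained per audience. The sketch below is one hypothetical way to picture that, not any vendor's API. The tag names, purposes, column names, and the `governed_read` helper are all illustrative assumptions, and the truncated hash is only a stand-in for a real masking technique.

```python
import hashlib

# Hypothetical central policy: one rule per data tag, defined in one place.
POLICY = {
    "pii": {"clear_for_purposes": {"fraud_investigation"}},
}

# Hypothetical tags assigned to columns; untagged columns pass through unmasked.
COLUMN_TAGS = {"name": "pii", "email": "pii"}


def mask(value):
    """Replace a value with a short one-way hash so it cannot be read directly."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def governed_read(rows, purpose):
    """Yield per-request views of the live rows; no masked data set is stored."""
    for row in rows:
        view = {}
        for column, value in row.items():
            rule = POLICY.get(COLUMN_TAGS.get(column))
            if rule is not None and purpose not in rule["clear_for_purposes"]:
                view[column] = mask(value)
            else:
                view[column] = value
        yield view


live_rows = [{"name": "Alice Example", "email": "alice@example.com", "balance": 120}]
analyst_view = list(governed_read(live_rows, purpose="marketing_analytics"))
investigator_view = list(governed_read(live_rows, purpose="fraud_investigation"))
```

Changing who may see tagged columns then means editing `POLICY` once. Every later read picks up the change, and the underlying rows are never rewritten.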
51. How You Can Automate Data Governance
1. What process governs how an analyst receives new data?
2. Where do your policies come from? What rules do you most care about adhering to?
3. Where is your data, and who’s responsible for it?
4. How is your data used, and how is it catalogued and tagged?
5. What technology stack do you rely on to share data faster, and control data more effectively?
54. Data Access and Governance
Cognoa: Digital Behavioral Health Company
▪ AI-based digital diagnostics and personalized therapies.
▪ Self-service data for data scientists for exploration, experimentation, and analytics.
▪ Run any queries they want without taking additional steps to ensure HIPAA compliance.
▪ Data privacy and security concerns are paramount.
▪ Data is stored in Amazon Aurora and analyzed in Databricks and Tableau.
▪ Create different views of the same data for different parties with varying functional responsibilities.
▪ Removing ePHI and HIPAA-sensitive information for model building was extremely time and labor intensive.
55. GDPR Compliant Analytics
Multinational Automobile Corporation
▪ Data collection from vehicles requires complex controls.
▪ Different use cases require different levels of anonymization.
▪ Differential privacy is a key enabler.
▪ A model to allow for individual-related insights and/or use cases without violating privacy protection.
▪ Re-identification required: predictive maintenance on a vehicle; they need to unmask the owner in order to provide maintenance (purpose-based views).
▪ Analyzing the most-listened-to radio stations. This does not require identifying an individual, and thus only requires aggregate questions.
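For aggregate questions such as the radio-station example, differential privacy is commonly implemented by adding calibrated noise to the answer. The sketch below uses the Laplace mechanism on a count query. It is a generic illustration, not the mechanism this company used: the records, station names, and the `epsilon` value are made up.

```python
import random


def dp_count(records, predicate, epsilon, rng=random):
    """Return a count with Laplace noise; a count query has sensitivity 1."""
    true_count = sum(1 for record in records if predicate(record))
    scale = 1.0 / epsilon  # Laplace scale b = sensitivity / epsilon
    # The difference of two independent Exp(1) draws is a standard Laplace sample.
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_count + noise


# Hypothetical listening records: (vehicle_id, radio_station)
listening = [(1, "WAAA"), (2, "WBBB"), (3, "WAAA"), (4, "WAAA"), (5, "WCCC")]

rng = random.Random(0)  # seeded only to make the sketch repeatable
noisy = dp_count(listening, lambda r: r[1] == "WAAA", epsilon=1.0, rng=rng)
```

A smaller `epsilon` adds more noise and gives stronger privacy at the cost of accuracy, which is the privacy and utility trade-off in miniature.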
56. Accelerated Time to Insight from Highly Sensitive Government Data
LMI: Consultancy Dedicated to Improving the Business of Government
▪ Built an integrated data analytics platform for the Office of the Secretary of Defense.
▪ Maintenance and Availability Data Warehouse (MADW) contains availability, cost, inventory, and transactional data on nearly every Department of Defense weapons system and readiness reportable piece of equipment.
▪ More than one billion maintenance records from 46 authoritative data systems.
▪ Integration of availability, cost, inventory, maintenance, and supply data makes numerous analyses available to leaders across the DoD enterprise.
57. Built a Self-Service Environment for Easy Access to Operational Data
Global Financial Institution
▪ Scalable, no-code self-service data access for business intelligence operations.
▪ Provide a single interface for legal teams to implement global policy enforcement on controlled metadata vs. one-by-one policy creation.
▪ Automate reporting for credit and loan decisions.
▪ Set project-, purpose-, and role-based restrictions that ensure users can only see the data they are entitled to see.
▪ Controlling access to all data in the Data Lake, providing automated reports on the purpose of all data usage (a core GDPR requirement).
▪ Exposed over 8000 data sources, abstracted policy enforcement, and instantaneously allowed non-technical users, e.g. HR and Marketing, to gain access to the data for the first time.
59. Q&A
Andrew Burt, Chief Privacy Officer, Immuta, Twitter: @AndBurt
Matt Vogt, Head of Global Solution Architecture, Immuta, Twitter: @mattvogt