SlideShare una empresa de Scribd logo
1 de 23
Suresh Yadagotti Jayaram
Sr. IT Technical Architect
Multi Tenant Security Architecture
for Big Data Systems
“Big Data refers to datasets whose size and/or structure is beyond the ability of traditional
software tools or database systems to store, process, and analyze within reasonable
timeframes”
HADOOP is a computing environment built on top of a distributed clustered file system
(HDFS) that was designed specifically for large scale data operations (e.g. MapReduce)
What is Big Data
Pre-Cursor
Reasons for securing data in Big Data systems
 Teams go from a POC to
deploying a production
cluster, and with it
petabytes of data.
 Contains sensitive
cardholder and other
customer or corporate
data that must be
protected
Compliance to PCI
DSS, FISMA, HIPAA,
federal/state laws to
protect PII
 Usage was restricted
to non-sensitive data
 Allow access to
restricted datasets
with Security
Contains Sensitive
Data
Subject to Regulatory
Compliance
Business
Enablement
Data Breaches & Hacks
Different kinds of PII, financial data, and IP breached. Healthcare, Retail, Federal Govt., Financial
Institutions, Tech companies etc.
Per capita cost – Industry Sector
Certain industries have higher data breach costs. compares 2018 year’s per capita costs for the consolidated sample by
industry classification.
As can be seen, heavily regulated industries such as healthcare and financial organizations have a per capita data breach
cost substantially higher than the overall mean.
$75
$92
$116
$120
$128
$128
$134
$140
$145
$152
$166
$167
$170
$174
$181
$206
$408
$0 $50 $100 $150 $200 $250 $300 $350 $400 $450
Public
Reatail
Transportati…
Media
Entertainme…
Education
Technology
Services
Health
Measured in US$
Root Causes
27%
25%
48%
48% Malicious
or Criminal
Attack
27%
Human Errors
25%
System glitch
Goals of an Attacker
0301
The primary goal is to
obtain sensitive data that
sits in Organization
Databases
02
This could include different
kinds of regulated data (e.g.
Payment data, Heath data)
or other personally
identifiable data (PII)
Other attacks could
include attacks
attempting to destroy or
modify data or prevent
availability of this
platform.
Threats
Host Level Data at Rest
Attacks
 Application Level
 HDFS level
 File System/Volume level
Infrastructure Security
 Automation
 SELinux
Unauthorized access
 Authentication
 Authorization
 Auditing
Network Based Attacks
 Transport Layer
Security
 SASL Encryption
Types of
Threats
Attacker attempts to gain privileges to access data
Security Objectives
For securing data
technologies
Best Practices
Standards alignment
Alliance
State of Organization
Contractual Obligations
With regulatory mandate
requirements
Compliance
Evidence of controls
SOC2/ Type 2 Audit
Successful implementation of Data Lakes in
organizations will demonstrate confidentiality,
integrity, and availability across the enterprise.
“It’s all about the data.”
Achieve Secure Data Enablement
By understanding the key criteria:
USERS
 Who is using the
data?
 Who needs what
kind of access?
LIFECYCLE
 How does
information connect
across systems?
 What are retention
requirements for the
data?
CONTROLS
 Engage early to
understand controls
complexity
 Know the value & risk
factors indicated by the
data & solutions.
GOVERNANCE
 Knowing what the
information is
 What is the function
of the data?
Enterpriseis the highest level and any
data stored at this level is visible /
available for all the tenants
(geographical data, code sets, etc.)
To minimize the impact to the
existing legacy systems and home-
grown services, we will use the
additional attributes like “Tenant ID”
and “Data Delimiters” to identify
which records belong to which
tenant. Members can have multiple
records in the same system with
different Tenant ID’s in case s/he
purchased products from more than
one tenant.
Application Layer/Domains to
control access and/or capabilities
(such as LOB, group, segment, or
other data restrictions or
classifications) within the tenants they
use. Application layer to control what
the constituent experiences, what
data they can access, and how.
Every data set will include audit attributes
such as:
• Who is providing the data? ,
• What data is being collected ?,
• When the data is collected?,
• Where the data is collected from?
• Why is the data collected ?
Enterprise Level
Tenant Level
Domain Level
Database/Table
Data level hierarchy & OBJECTIVES
 Be visible & available to ALL tenants
 Data Classified, labeled, or segregated in a manner that indicates it has been approved for
enterprise wide use (classification is TBD) which may include Geographical data, code sets,
etc.
 Data Classified as Public
 Support both internal and external users depending on classification
 Internal users get access through an application Id or directly with User Id
Enterprise level objectives
Enterprise Level Data will…
Enterprise
Tenant Level Data will…
Tenant level OBJECTIVES
 Support multiple tenants
 Be segregated logically (tagged, labeled, or container segregated based on tenant ID or data delimiters, not
physically where possible based on controls objectives for organizations
 Be co-mingled; all applications are storing data together with the following defaults:
 Logical separation when applicable (controlled by Ranger Policies and data object implementation)
 Default = Applications (Different Log Locations). Services (Ex; Ranger. Same Log locations).
 Use an additional fields: Tenant ID and Data Delimiters
 This minimizes impact to existing legacy systems and home-grown services
 Tenant IDs and Data Delimiters will be used in tables to identify which records belong to which tenant and
Enterprise Line of Business.
 Use applications to enforce 100% usage of Tenant IDs and Data Delimiters verified through exceptions, audit &
recon
 Adhere to the original idea of Individuation—each individual should be identified as one individual in the
Individuation database, regardless of whether s/he has bought products from more than one tenant.
 S/he can have multiple records in the same system with different Tenant ID’s in case s/he purchased
products from more than one tenant.
Enterprise
 Control access and/or capabilities (such as LOB,
group, segment, etc.) within the tenants they use
 Include application layer that controls what the
constituent experiences or what data they may
access
 Also controls how the constituent accesses the
data
Tenant Level
Domain Level Data will…
Domain Level OBJECTIVES
Enterprise
 Retain data classifications as they exist today
 For employee/state/federal employee, etc.
 ePHI attribute classification and inventory
 User Permissions/Authorizations
 Include audit attributes that answer the following
questions for every dataset:
 Who provided the data?
 What data was collected?
 When was the data collected?
 From where is the data collected?
 Why is the data collected ?
 Data activity monitoring - Who accessed, when
accessed, where accessed
Domain
Database/Application Level Data will…
Tenant Level Data
Database Level OBJECTIVES
DataHandling–Tenant,Domain,
Application,Database,Table(Row&Column) Level
 Create an AD
group that includes
all users
 Resources common
across org will be
shared across users
Table - RowTable - Column
Domain/
Application
Tenant
Enterprise
 Create separate AD
groups based on tenant
ID; add appropriate
users respectively
 Data gets comingled
from different Tenants;
Ranger policies control
access
 Create separate AD
groups for read,
read/write, &
appropriate users
from respectively
 There could be
multiple applications
as part of domain
• Data in tables could be categorized based on roles, such as
accessing data based on column or row level.
• Policies are created for Read and Read/write
• Policies are created at Row and Column level
• Policies are created to mask sensitive data
1
2
3
4
5
Administration
Central Management & Consistent Security
Authentication
Authenticate Users and System
Authorization
Provision Access to Data
Audit
Maintain a record of Data Access
Data Protection
Protect Data at Rest & in Motion
Five Pillars of Security
Ranger – Centralized Administration
Single pane of glass for security administration across multiple Hadoop Components for Creating,
implement, Manage and Monitor Security Policies
Central Management & Consistent security
Ranger – Authorization Policies
Consistent authorization policy structure across Hadoop components
Ranger – Row-filter, Column-masking
Ranger – Access Audit Logs
Apache Ranger generates detailed logs of access to protected resources
Audit logs to multiple destinations like HDFS, Solr and Log4j appender
Interactive view of audit logs in Admin console
Ranger – Architecture
Questions

Más contenido relacionado

La actualidad más candente

Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentDataWorks Summit
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...DataWorks Summit
 
Building a future-proof cyber security platform with Apache Metron
Building a future-proof cyber security platform with Apache MetronBuilding a future-proof cyber security platform with Apache Metron
Building a future-proof cyber security platform with Apache MetronDataWorks Summit
 
Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...DataWorks Summit
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsDataWorks Summit
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryDataWorks Summit
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataScott Clinton
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsDataWorks Summit
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseDataWorks Summit
 
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...DataWorks Summit
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsSnapLogic
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSDataWorks Summit
 
Data Virtualization and ETL
Data Virtualization and ETLData Virtualization and ETL
Data Virtualization and ETLLily Luo
 
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...DataWorks Summit
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationDataWorks Summit
 
Enabling Self-Service Analytics with Logical Data Warehouse (APAC)
Enabling Self-Service Analytics with Logical Data Warehouse (APAC)Enabling Self-Service Analytics with Logical Data Warehouse (APAC)
Enabling Self-Service Analytics with Logical Data Warehouse (APAC)Denodo
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboardDataWorks Summit
 

La actualidad más candente (20)

Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short Time
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Operating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environmentOperating a secure big data platform in a multi-cloud environment
Operating a secure big data platform in a multi-cloud environment
 
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
Integrating and Analyzing Data from Multiple Manufacturing Sites using Apache...
 
Building a future-proof cyber security platform with Apache Metron
Building a future-proof cyber security platform with Apache MetronBuilding a future-proof cyber security platform with Apache Metron
Building a future-proof cyber security platform with Apache Metron
 
Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...
 
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for AnalyticsVerizon Centralizes Data into a Data Lake in Real Time for Analytics
Verizon Centralizes Data into a Data Lake in Real Time for Analytics
 
Pouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy IndustryPouring the Foundation: Data Management in the Energy Industry
Pouring the Foundation: Data Management in the Energy Industry
 
Hortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your dataHortonworks Hybrid Cloud - Putting you back in control of your data
Hortonworks Hybrid Cloud - Putting you back in control of your data
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real Problems
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the Enterprise
 
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
Not Just a necessary evil, it’s good for business: implementing PCI DSS contr...
 
Data Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management RequirementsData Lakes: 8 Enterprise Data Management Requirements
Data Lakes: 8 Enterprise Data Management Requirements
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
HDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFSHDFS tiered storage: mounting object stores in HDFS
HDFS tiered storage: mounting object stores in HDFS
 
Data Virtualization and ETL
Data Virtualization and ETLData Virtualization and ETL
Data Virtualization and ETL
 
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...Open Source in the Energy Industry - Creating a New Operational Model for Dat...
Open Source in the Energy Industry - Creating a New Operational Model for Dat...
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
 
Enabling Self-Service Analytics with Logical Data Warehouse (APAC)
Enabling Self-Service Analytics with Logical Data Warehouse (APAC)Enabling Self-Service Analytics with Logical Data Warehouse (APAC)
Enabling Self-Service Analytics with Logical Data Warehouse (APAC)
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboard
 

Similar a Security Framework for Multitenant Architecture

Gdpr ccpa automated compliance - spark java application features and functi...
Gdpr   ccpa automated compliance - spark java application features and functi...Gdpr   ccpa automated compliance - spark java application features and functi...
Gdpr ccpa automated compliance - spark java application features and functi...Steven Meister
 
IAPP PSR 2022: How do you engineer DSAR for Complexity?
IAPP PSR 2022: How do you engineer DSAR for Complexity?IAPP PSR 2022: How do you engineer DSAR for Complexity?
IAPP PSR 2022: How do you engineer DSAR for Complexity?Cillian Kieran
 
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
Eu gdpr technical workflow and productionalization   neccessary w privacy ass...Eu gdpr technical workflow and productionalization   neccessary w privacy ass...
Eu gdpr technical workflow and productionalization neccessary w privacy ass...Steven Meister
 
Data Classification Presentation
Data Classification PresentationData Classification Presentation
Data Classification PresentationDerroylo
 
GDPR BigDataRevealed Readiness Requirements and Evaluation
GDPR BigDataRevealed Readiness Requirements and EvaluationGDPR BigDataRevealed Readiness Requirements and Evaluation
GDPR BigDataRevealed Readiness Requirements and EvaluationSteven Meister
 
Gdpr ccpa steps to near as close to compliancy as possible with low risk of f...
Gdpr ccpa steps to near as close to compliancy as possible with low risk of f...Gdpr ccpa steps to near as close to compliancy as possible with low risk of f...
Gdpr ccpa steps to near as close to compliancy as possible with low risk of f...Steven Meister
 
Unified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge GraphUnified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge GraphVaticle
 
[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your Information
[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your Information[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your Information
[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your InformationAIIM International
 
Ethyca CodeDriven - Data Privacy Compliance for Engineers & Data Teams
Ethyca CodeDriven - Data Privacy Compliance for Engineers & Data TeamsEthyca CodeDriven - Data Privacy Compliance for Engineers & Data Teams
Ethyca CodeDriven - Data Privacy Compliance for Engineers & Data TeamsCillian Kieran
 
The EU General Protection Regulation and how Oracle can help
The EU General Protection Regulation and how Oracle can help The EU General Protection Regulation and how Oracle can help
The EU General Protection Regulation and how Oracle can help Niklas Hjorthen
 
Michael Josephs
Michael JosephsMichael Josephs
Michael JosephsdaveGBE
 
eBook: 5 Steps to Secure Cloud Data Governance
eBook: 5 Steps to Secure Cloud Data GovernanceeBook: 5 Steps to Secure Cloud Data Governance
eBook: 5 Steps to Secure Cloud Data GovernanceKim Cook
 
Cedar Day 2018 - Is Your PeopleSoft Ready for the GDPR - Sarah Hurley
Cedar Day 2018 - Is Your PeopleSoft Ready for the GDPR - Sarah HurleyCedar Day 2018 - Is Your PeopleSoft Ready for the GDPR - Sarah Hurley
Cedar Day 2018 - Is Your PeopleSoft Ready for the GDPR - Sarah HurleyCedar Consulting
 
A Study on Big Data Privacy Protection Models using Data Masking Methods
A Study on Big Data Privacy Protection Models using Data Masking Methods A Study on Big Data Privacy Protection Models using Data Masking Methods
A Study on Big Data Privacy Protection Models using Data Masking Methods IJECEIAES
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfPridesys IT Ltd.
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfPridesys IT Ltd.
 

Similar a Security Framework for Multitenant Architecture (20)

Gdpr ccpa automated compliance - spark java application features and functi...
Gdpr   ccpa automated compliance - spark java application features and functi...Gdpr   ccpa automated compliance - spark java application features and functi...
Gdpr ccpa automated compliance - spark java application features and functi...
 
IAPP PSR 2022: How do you engineer DSAR for Complexity?
IAPP PSR 2022: How do you engineer DSAR for Complexity?IAPP PSR 2022: How do you engineer DSAR for Complexity?
IAPP PSR 2022: How do you engineer DSAR for Complexity?
 
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
Eu gdpr technical workflow and productionalization   neccessary w privacy ass...Eu gdpr technical workflow and productionalization   neccessary w privacy ass...
Eu gdpr technical workflow and productionalization neccessary w privacy ass...
 
Data Classification Presentation
Data Classification PresentationData Classification Presentation
Data Classification Presentation
 
GDPR BigDataRevealed Readiness Requirements and Evaluation
GDPR BigDataRevealed Readiness Requirements and EvaluationGDPR BigDataRevealed Readiness Requirements and Evaluation
GDPR BigDataRevealed Readiness Requirements and Evaluation
 
Gdpr ccpa steps to near as close to compliancy as possible with low risk of f...
Gdpr ccpa steps to near as close to compliancy as possible with low risk of f...Gdpr ccpa steps to near as close to compliancy as possible with low risk of f...
Gdpr ccpa steps to near as close to compliancy as possible with low risk of f...
 
Unified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge GraphUnified Information Governance, Powered by Knowledge Graph
Unified Information Governance, Powered by Knowledge Graph
 
[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your Information
[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your Information[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your Information
[Webinar Slides] Data Privacy – Learn What It Takes to Protect Your Information
 
Ethyca CodeDriven - Data Privacy Compliance for Engineers & Data Teams
Ethyca CodeDriven - Data Privacy Compliance for Engineers & Data TeamsEthyca CodeDriven - Data Privacy Compliance for Engineers & Data Teams
Ethyca CodeDriven - Data Privacy Compliance for Engineers & Data Teams
 
The EU General Protection Regulation and how Oracle can help
The EU General Protection Regulation and how Oracle can help The EU General Protection Regulation and how Oracle can help
The EU General Protection Regulation and how Oracle can help
 
Michael Josephs
Michael JosephsMichael Josephs
Michael Josephs
 
Bigdata
Bigdata Bigdata
Bigdata
 
eBook: 5 Steps to Secure Cloud Data Governance
eBook: 5 Steps to Secure Cloud Data GovernanceeBook: 5 Steps to Secure Cloud Data Governance
eBook: 5 Steps to Secure Cloud Data Governance
 
Cedar Day 2018 - Is Your PeopleSoft Ready for the GDPR - Sarah Hurley
Cedar Day 2018 - Is Your PeopleSoft Ready for the GDPR - Sarah HurleyCedar Day 2018 - Is Your PeopleSoft Ready for the GDPR - Sarah Hurley
Cedar Day 2018 - Is Your PeopleSoft Ready for the GDPR - Sarah Hurley
 
A Study on Big Data Privacy Protection Models using Data Masking Methods
A Study on Big Data Privacy Protection Models using Data Masking Methods A Study on Big Data Privacy Protection Models using Data Masking Methods
A Study on Big Data Privacy Protection Models using Data Masking Methods
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 
Sensitive Data Assesment
Sensitive Data AssesmentSensitive Data Assesment
Sensitive Data Assesment
 
Security for Big Data
Security for Big DataSecurity for Big Data
Security for Big Data
 
InsiderAttack_p3.ppt
InsiderAttack_p3.pptInsiderAttack_p3.ppt
InsiderAttack_p3.ppt
 
What Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdfWhat Is Big Data How Big Data Works.pdf
What Is Big Data How Big Data Works.pdf
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 
Open Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesOpen Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 
Open Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesOpen Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart Cities
 

Último

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Último (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Security Framework for Multitenant Architecture

  • 1. Suresh Yadagotti Jayaram Sr. IT Technical Architect Multi Tenant Security Architecture for Big Data Systems
  • 2. “Big Data refers to datasets whose size and/or structure is beyond the ability of traditional software tools or database systems to store, process, and analyze within reasonable timeframes” HADOOP is a computing environment built on top of a distributed clustered file system (HDFS) that was designed specifically for large scale data operations (e.g. MapReduce) What is Big Data
  • 3. Pre-Cursor Reasons for securing data in Big Data systems  Teams go from a POC to deploying a production cluster, and with it petabytes of data.  Contains sensitive cardholder and other customer or corporate data that must be protected Compliance to PCI DSS, FISMA, HIPAA, federal/state laws to protect PII  Usage was restricted to non-sensitive data  Allow access to restricted datasets with Security Contains Sensitive Data Subject to Regulatory Compliance Business Enablement
  • 4. Data Breaches & Hacks Different kinds of PII, financial data, and IP breached. Healthcare, Retail, Federal Govt., Financial Institutions, Tech companies etc.
  • 5. Per capita cost – Industry Sector Certain industries have higher data breach costs. compares 2018 year’s per capita costs for the consolidated sample by industry classification. As can be seen, heavily regulated industries such as healthcare and financial organizations have a per capita data breach cost substantially higher than the overall mean. $75 $92 $116 $120 $128 $128 $134 $140 $145 $152 $166 $167 $170 $174 $181 $206 $408 $0 $50 $100 $150 $200 $250 $300 $350 $400 $450 Public Reatail Transportati… Media Entertainme… Education Technology Services Health Measured in US$
  • 6. Root Causes 27% 25% 48% 48% Malicious or Criminal Attack 27% Human Errors 25% System glitch
  • 7. Goals of an Attacker 0301 The primary goal is to obtain sensitive data that sits in Organization Databases 02 This could include different kinds of regulated data (e.g. Payment data, Heath data) or other personally identifiable data (PII) Other attacks could include attacks attempting to destroy or modify data or prevent availability of this platform.
  • 8. Threats Host Level Data at Rest Attacks  Application Level  HDFS level  File System/Volume level Infrastructure Security  Automation  SELinux Unauthorized access  Authentication  Authorization  Auditing Network Based Attacks  Transport Layer Security  SASL Encryption Types of Threats Attacker attempts to gain privileges to access data
  • 9. Security Objectives For securing data technologies Best Practices Standards alignment Alliance State of Organization Contractual Obligations With regulatory mandate requirements Compliance Evidence of controls SOC2/ Type 2 Audit Successful implementation of Data Lakes in organizations will demonstrate confidentiality, integrity, and availability across the enterprise. “It’s all about the data.”
  • 10. Achieve Secure Data Enablement By understanding the key criteria: USERS  Who is using the data?  Who needs what kind of access? LIFECYCLE  How does information connect across systems?  What are retention requirements for the data? CONTROLS  Engage early to understand controls complexity  Know the value & risk factors indicated by the data & solutions. GOVERNANCE  Knowing what the information is  What is the function of the data?
  • 11. Enterpriseis the highest level and any data stored at this level is visible / available for all the tenants (geographical data, code sets, etc.) To minimize the impact to the existing legacy systems and home- grown services, we will use the additional attributes like “Tenant ID” and “Data Delimiters” to identify which records belong to which tenant. Members can have multiple records in the same system with different Tenant ID’s in case s/he purchased products from more than one tenant. Application Layer/Domains to control access and/or capabilities (such as LOB, group, segment, or other data restrictions or classifications) within the tenants they use. Application layer to control what the constituent experiences, what data they can access, and how. Every data set will include audit attributes such as: • Who is providing the data? , • What data is being collected ?, • When the data is collected?, • Where the data is collected from? • Why is the data collected ? Enterprise Level Tenant Level Domain Level Database/Table Data level hierarchy & OBJECTIVES
  • 12.  Be visible & available to ALL tenants  Data Classified, labeled, or segregated in a manner that indicates it has been approved for enterprise wide use (classification is TBD) which may include Geographical data, code sets, etc.  Data Classified as Public  Support both internal and external users depending on classification  Internal users get access through an application Id or directly with User Id Enterprise level objectives Enterprise Level Data will…
  • 13. Enterprise Tenant Level Data will… Tenant level OBJECTIVES  Support multiple tenants  Be segregated logically (tagged, labeled, or container segregated based on tenant ID or data delimiters, not physically where possible based on controls objectives for organizations  Be co-mingled; all applications are storing data together with the following defaults:  Logical separation when applicable (controlled by Ranger Policies and data object implementation)  Default = Applications (Different Log Locations). Services (Ex; Ranger. Same Log locations).  Use an additional fields: Tenant ID and Data Delimiters  This minimizes impact to existing legacy systems and home-grown services  Tenant IDs and Data Delimiters will be used in tables to identify which records belong to which tenant and Enterprise Line of Business.  Use applications to enforce 100% usage of Tenant IDs and Data Delimiters verified through exceptions, audit & recon  Adhere to the original idea of Individuation—each individual should be identified as one individual in the Individuation database, regardless of whether s/he has bought products from more than one tenant.  S/he can have multiple records in the same system with different Tenant ID’s in case s/he purchased products from more than one tenant.
  • 14. Enterprise  Control access and/or capabilities (such as LOB, group, segment, etc.) within the tenants they use  Include application layer that controls what the constituent experiences or what data they may access  Also controls how the constituent accesses the data Tenant Level Domain Level Data will… Domain Level OBJECTIVES
  • 15. Enterprise  Retain data classifications as they exist today  For employee/state/federal employee, etc.  ePHI attribute classification and inventory  User Permissions/Authorizations  Include audit attributes that answer the following questions for every dataset:  Who provided the data?  What data was collected?  When was the data collected?  From where is the data collected?  Why is the data collected ?  Data activity monitoring - Who accessed, when accessed, where accessed Domain Database/Application Level Data will… Tenant Level Data Database Level OBJECTIVES
  • 16. DataHandling–Tenant,Domain, Application,Database,Table(Row&Column) Level  Create an AD group that includes all users  Resources common across org will be shared across users Table - RowTable - Column Domain/ Application Tenant Enterprise  Create separate AD groups based on tenant ID; add appropriate users respectively  Data gets comingled from different Tenants; Ranger policies control access  Create separate AD groups for read, read/write, & appropriate users from respectively  There could be multiple applications as part of domain • Data in tables could be categorized based on roles, such as accessing data based on column or row level. • Policies are created for Read and Read/write • Policies are created at Row and Column level • Policies are created to mask sensitive data
  • 17. 1 2 3 4 5 Administration Central Management & Consistent Security Authentication Authenticate Users and System Authorization Provision Access to Data Audit Maintain a record of Data Access Data Protection Protect Data at Rest & in Motion Five Pillars of Security
  • 18. Ranger – Centralized Administration Single pane of glass for security administration across multiple Hadoop Components for Creating, implement, Manage and Monitor Security Policies Central Management & Consistent security
  • 19. Ranger – Authorization Policies Consistent authorization policy structure across Hadoop components
  • 20. Ranger – Row-filter, Column-masking
  • 21. Ranger – Access Audit Logs Apache Ranger generates detailed logs of access to protected resources Audit logs to multiple destinations like HDFS, Solr and Log4j appender Interactive view of audit logs in Admin console