SlideShare una empresa de Scribd logo
1 de 23
©2014 LinkedIn Corporation. All Rights Reserved.
Taking Hadoop to Enterprise Security
Standards
Access Control
How many of you need or have
access control in Hadoop?
©2014 LinkedIn Corporation. All Rights Reserved.
Users First Internal Threat
Keeping Data Secure
External Threat
More granular the access controls are
more people can have access to the data
©2014 LinkedIn Corporation. All Rights Reserved.
Hadoop – Status Quo
Multiple Query
Execution
Engines
Custom Code
Execution
Auditing
©2014 LinkedIn Corporation. All Rights Reserved.
User ID Email Address IP address Billing address
Security Customer Service Data Scientist
Adding & Removing group membership can take up to few hours
HDFS file permissions are very coarse (at file level)
HDFS File Permissions
©2014 LinkedIn Corporation. All Rights Reserved.
Other Access Control Solutions
©2014 LinkedIn Corporation. All Rights Reserved.
Mixed Data Multiple Data Processing
Systems
Data for Everyone
Challenges
©2014 LinkedIn Corporation. All Rights Reserved.
Extensible
Authorization
Fine Grain
Control
Fast Changes to
Authorization
Rules
What do we need?
©2014 LinkedIn Corporation. All Rights Reserved.
Our Solution: Access Control via Encryption
Apache Kafka
HDFS
Key Server
Parquet
ETLEncrypted
Events
©2014 LinkedIn Corporation. All Rights Reserved.
User A’s Job
User B’s Job
User C’s Job
Producer
Job
ETL User
Parquet File
User Columns
A 5
B 2, 5
Key Server
Access Control via Encryption
©2014 LinkedIn Corporation. All Rights Reserved.
Columnar Storage
Page 0
Page 1
Page 2
Column a Column b
Rowgroup
Parquet Format
Brief Overview of Parquet
©2014 LinkedIn Corporation. All Rights Reserved.
*Yet to be integrated into open source Parquet
Field mode
Page
Column
| Page Mode | Hybrid Mode
Encryption Support in Parquet*
©2014 LinkedIn Corporation. All Rights Reserved.
Examples
 Emails – Analysts need it to join with other tables but may not require
access to individual emails
N Values
(Page)
Encrypt each value at
a time
karthik@gmail.com
harsh@gmail.com
harsh@gmail.com
arvind@gmail.com
xxxxxxx
yyyyyyy
yyyyyyy
zzzzzzz
Field Mode
©2014 LinkedIn Corporation. All Rights Reserved.
Field Mode
Joins Counts
Distribution Analysis
No/Low compression
©2014 LinkedIn Corporation. All Rights Reserved.
Page Mode
 No information is leaked except entropy of the data
 Better performance than other modes
N Values
(Page)
Encode Compress Encrypt
©2014 LinkedIn Corporation. All Rights Reserved.
Hybrid Mode
 More fine grain control of information
 Increase in overhead due to double encryption/decryption
N Values
(Page)
Encrypt each
value
Encrypt
©2014 LinkedIn Corporation. All Rights Reserved.
Plain Text | Encrypted Value | No Access
Field Mode Page Mode
Hybrid Mode
©2014 LinkedIn Corporation. All Rights Reserved.
Key Versioning
 Each key is versioned and specific for a source (File/Event name)
 Reduces the exposure incase of key leakage
 Time based access control
– All users by default can access only last 30 days of data
– Give users access to data in specific time period
 Authentication of producers can be done separately
©2014 LinkedIn Corporation. All Rights Reserved.
Better Auditing
Coverage
Retention
Enforcement
Key Server Features
Multifactor
Authentication
©2014 LinkedIn Corporation. All Rights Reserved.
PIG Usage
Thank you!

Más contenido relacionado

Similar a Taking Hadoop to Enterprise Security Standards

Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Datameer
 
DDS Security for the Industrial Internet - London Connext DDS Conference
DDS Security for the Industrial Internet - London Connext DDS ConferenceDDS Security for the Industrial Internet - London Connext DDS Conference
DDS Security for the Industrial Internet - London Connext DDS ConferenceGerardo Pardo-Castellote
 
EfficientIP webinar mitigate dns zero day vulnerability
EfficientIP webinar mitigate dns zero day vulnerabilityEfficientIP webinar mitigate dns zero day vulnerability
EfficientIP webinar mitigate dns zero day vulnerabilityEfficientIP
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifyHortonworks
 
Application Whitelisting and DPI in ICS (English)
Application Whitelisting and DPI in ICS (English)Application Whitelisting and DPI in ICS (English)
Application Whitelisting and DPI in ICS (English)Digital Bond
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachDataWorks Summit
 
Big Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsBig Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsMarco Casassa Mont
 
Hive at LinkedIn
Hive at LinkedIn Hive at LinkedIn
Hive at LinkedIn mislam77
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoTEric Kavanagh
 
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET Journal
 
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...IRJET Journal
 
Online Focus Groups Privacy and Security Considerations
Online Focus Groups Privacy and Security ConsiderationsOnline Focus Groups Privacy and Security Considerations
Online Focus Groups Privacy and Security ConsiderationsAlfonso Sintjago
 
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...Edureka!
 
Four Factors for Evaluating SD-WAN Solutions
Four Factors for Evaluating SD-WAN SolutionsFour Factors for Evaluating SD-WAN Solutions
Four Factors for Evaluating SD-WAN SolutionsRiverbed Technology
 
Taking Control of SharePoint in the Cloud
Taking Control of SharePoint in the CloudTaking Control of SharePoint in the Cloud
Taking Control of SharePoint in the CloudSherWeb
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...DataWorks Summit/Hadoop Summit
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks
 
Big Data Use Cases | Hadoop Tutorial for Beginners | Hadoop Training | Edureka
Big Data Use Cases | Hadoop Tutorial for Beginners | Hadoop Training | EdurekaBig Data Use Cases | Hadoop Tutorial for Beginners | Hadoop Training | Edureka
Big Data Use Cases | Hadoop Tutorial for Beginners | Hadoop Training | EdurekaEdureka!
 
CIS14: Network-Aware IAM
CIS14: Network-Aware IAMCIS14: Network-Aware IAM
CIS14: Network-Aware IAMCloudIDSummit
 

Similar a Taking Hadoop to Enterprise Security Standards (20)

Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?Is Your Hadoop Environment Secure?
Is Your Hadoop Environment Secure?
 
DDS Security for the Industrial Internet - London Connext DDS Conference
DDS Security for the Industrial Internet - London Connext DDS ConferenceDDS Security for the Industrial Internet - London Connext DDS Conference
DDS Security for the Industrial Internet - London Connext DDS Conference
 
EfficientIP webinar mitigate dns zero day vulnerability
EfficientIP webinar mitigate dns zero day vulnerabilityEfficientIP webinar mitigate dns zero day vulnerability
EfficientIP webinar mitigate dns zero day vulnerability
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and CentrifySimplify and Secure your Hadoop Environment with Hortonworks and Centrify
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
 
Application Whitelisting and DPI in ICS (English)
Application Whitelisting and DPI in ICS (English)Application Whitelisting and DPI in ICS (English)
Application Whitelisting and DPI in ICS (English)
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Big Data for Security - DNS Analytics
Big Data for Security - DNS AnalyticsBig Data for Security - DNS Analytics
Big Data for Security - DNS Analytics
 
Hive at LinkedIn
Hive at LinkedIn Hive at LinkedIn
Hive at LinkedIn
 
DDS Security
DDS SecurityDDS Security
DDS Security
 
Solving the Really Big Tech Problems with IoT
 Solving the Really Big Tech Problems with IoT Solving the Really Big Tech Problems with IoT
Solving the Really Big Tech Problems with IoT
 
IRJET- Secured Hadoop Environment
IRJET- Secured Hadoop EnvironmentIRJET- Secured Hadoop Environment
IRJET- Secured Hadoop Environment
 
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
Privacy Preserving Data Analytics using Cryptographic Technique for Large Dat...
 
Online Focus Groups Privacy and Security Considerations
Online Focus Groups Privacy and Security ConsiderationsOnline Focus Groups Privacy and Security Considerations
Online Focus Groups Privacy and Security Considerations
 
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
Hadoop vs Spark | Which One to Choose? | Hadoop Training | Spark Training | E...
 
Four Factors for Evaluating SD-WAN Solutions
Four Factors for Evaluating SD-WAN SolutionsFour Factors for Evaluating SD-WAN Solutions
Four Factors for Evaluating SD-WAN Solutions
 
Taking Control of SharePoint in the Cloud
Taking Control of SharePoint in the CloudTaking Control of SharePoint in the Cloud
Taking Control of SharePoint in the Cloud
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
Hortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinarHortonworks and Voltage Security webinar
Hortonworks and Voltage Security webinar
 
Big Data Use Cases | Hadoop Tutorial for Beginners | Hadoop Training | Edureka
Big Data Use Cases | Hadoop Tutorial for Beginners | Hadoop Training | EdurekaBig Data Use Cases | Hadoop Tutorial for Beginners | Hadoop Training | Edureka
Big Data Use Cases | Hadoop Tutorial for Beginners | Hadoop Training | Edureka
 
CIS14: Network-Aware IAM
CIS14: Network-Aware IAMCIS14: Network-Aware IAM
CIS14: Network-Aware IAM
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 

Último (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 

Taking Hadoop to Enterprise Security Standards

  • 1. ©2014 LinkedIn Corporation. All Rights Reserved. Taking Hadoop to Enterprise Security Standards
  • 3. How many of you need or have access control in Hadoop?
  • 4. ©2014 LinkedIn Corporation. All Rights Reserved. Users First Internal Threat Keeping Data Secure External Threat
  • 5. More granular the access controls are more people can have access to the data
  • 6. ©2014 LinkedIn Corporation. All Rights Reserved. Hadoop – Status Quo Multiple Query Execution Engines Custom Code Execution Auditing
  • 7. ©2014 LinkedIn Corporation. All Rights Reserved. User ID Email Address IP address Billing address Security Customer Service Data Scientist Adding & Removing group membership can take up to few hours HDFS file permissions are very coarse (at file level) HDFS File Permissions
  • 8. ©2014 LinkedIn Corporation. All Rights Reserved. Other Access Control Solutions
  • 9. ©2014 LinkedIn Corporation. All Rights Reserved. Mixed Data Multiple Data Processing Systems Data for Everyone Challenges
  • 10. ©2014 LinkedIn Corporation. All Rights Reserved. Extensible Authorization Fine Grain Control Fast Changes to Authorization Rules What do we need?
  • 11. ©2014 LinkedIn Corporation. All Rights Reserved. Our Solution: Access Control via Encryption Apache Kafka HDFS Key Server Parquet ETLEncrypted Events
  • 12. ©2014 LinkedIn Corporation. All Rights Reserved. User A’s Job User B’s Job User C’s Job Producer Job ETL User Parquet File User Columns A 5 B 2, 5 Key Server Access Control via Encryption
  • 13. ©2014 LinkedIn Corporation. All Rights Reserved. Columnar Storage Page 0 Page 1 Page 2 Column a Column b Rowgroup Parquet Format Brief Overview of Parquet
  • 14. ©2014 LinkedIn Corporation. All Rights Reserved. *Yet to be integrated into open source Parquet Field mode Page Column | Page Mode | Hybrid Mode Encryption Support in Parquet*
  • 15. ©2014 LinkedIn Corporation. All Rights Reserved. Examples  Emails – Analysts need it to join with other tables but may not require access to individual emails N Values (Page) Encrypt each value at a time karthik@gmail.com harsh@gmail.com harsh@gmail.com arvind@gmail.com xxxxxxx yyyyyyy yyyyyyy zzzzzzz Field Mode
  • 16. ©2014 LinkedIn Corporation. All Rights Reserved. Field Mode Joins Counts Distribution Analysis No/Low compression
  • 17. ©2014 LinkedIn Corporation. All Rights Reserved. Page Mode  No information is leaked except entropy of the data  Better performance than other modes N Values (Page) Encode Compress Encrypt
  • 18. ©2014 LinkedIn Corporation. All Rights Reserved. Hybrid Mode  More fine grain control of information  Increase in overhead due to double encryption/decryption N Values (Page) Encrypt each value Encrypt
  • 19. ©2014 LinkedIn Corporation. All Rights Reserved. Plain Text | Encrypted Value | No Access Field Mode Page Mode Hybrid Mode
  • 20. ©2014 LinkedIn Corporation. All Rights Reserved. Key Versioning  Each key is versioned and specific for a source (File/Event name)  Reduces the exposure incase of key leakage  Time based access control – All users by default can access only last 30 days of data – Give users access to data in specific time period  Authentication of producers can be done separately
  • 21. ©2014 LinkedIn Corporation. All Rights Reserved. Better Auditing Coverage Retention Enforcement Key Server Features Multifactor Authentication
  • 22. ©2014 LinkedIn Corporation. All Rights Reserved. PIG Usage