SlideShare una empresa de Scribd logo
1 de 32
Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
HDP.Next: Governance
Andrew Ahn
Summit San Jose
Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas
Architecture
• Overview
• Type System
• Screens / Demo
Overview
• Vision
• Features
• Roadmap
Governance
• Enterprise Goals
• Atlas positioning
Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Speakers
Venkatesh Seetharam, Hortonworks
Development Architect
Principle Architect for Apache Atlas and leads development. He was part of the
Hadoop team at Yahoo where he built data management solutions. He has
been involved in Hadoop and contributed to many open source projects in the
ecosystem over the last 7 years. He is a committer on numerous Apache
projects including Falcon and Atlas.
Andrew Ahn, Hortonworks
Director of Governance Product Management
Responsible for development roadmap of Apache Falcon and Apache Atlas, co-
development relationships and GTM strategies at Hortonworks for governance
projects. He has over 6 years of big data domain experience in the financial
sector.
Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Enterprise Data Governance Goals
GOAL: Provide a common approach to
data governance across all systems
and data within the organization
• Transparent
Governance standards & protocols must be
clearly defined and available to all
• Reproducible
Recreate the relevant data landscape at a
point in time
• Auditable
All relevant events and assets but be
traceable with appropriate historical lineage
• Consistent
Compliance practices must be consistent
ETL/DQ
BPM
Business
Analytics
Visualization
& Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Governance
Framework
Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Governance Initiative for Hadoop
ETL/DQ
BPM
Business
Analytics
Visualization
& Dashboards
ERP
CRM
SCM
MDM
ARCHIVE
Data Governance Initiative
Common
Governance
Framework
1 ° ° ° ° ° ° °
° ° ° ° ° ° ° °
° °
° °
°
°
ApachePig
ApacheHive
ApacheHBase
ApacheAccumulo
ApacheSolr
ApacheSpark
ApacheStorm
TWO Requirements
1. Hadoop must snap in to
the existing frameworks
and be a good citizen
2. Hadoop must also provide
governance within its own
stack of technologies
A group of companies dedicated to meeting
these requirements in the open
Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Overview
We Do Hadoop
Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Vision
Metadata Services
• Flexible Knowledge Store
• Business Catalog / Operational Data
• Search & Proscriptive Lineage
• Centralized location for all metadata within HDP
• Interface point for Metadata Exchange with platforms
outside of HDP.
Metadata will enrich every component
• Hive – Complete lineage, every HiveQL tracked
• Ranger – Tag or Attribute security ABAC
• Falcon – Business Taxonomy
Apache Atlas
Hive
Ranger
Falcon
Kafka
Storm
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Push model: Component Instrumentation Request
• Model of component Type
• Model of Component
Execution
• Lineage / Search – Minimum of
Coarse Grain
• Relevant Operational Metadata
HDP components will make audit entries into Atlas metadata store, proscriptively
tracking lineage. As this true for the HDP stack it will also apply to workflow outside of
Hadoop.
Apache Atlas
Hive
Ranger
Falcon
Kafka
Storm
Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Capabilities: Overview
Data Classification
• Import or define taxonomy business-oriented annotations for data
• Define, annotate, and automate capture of relationships between data sets and underlying
elements including source, target, and derivation processes
• Export metadata to third-party systems
Centralized Auditing
• Capture security access information for every application, process, and interaction with data
• Capture the operational information for execution, steps, and activities
Search & Lineage (Browse)
• Pre-defined navigation paths to explore the data classification and audit information
• Text-based search features locates relevant data and audit event across Data Lake quickly
and accurately
• Browse visualization of data set lineage allowing users to drill-down into operational, security,
and provenance related information
Security & Policy Engine
• Rationalize compliance policy at runtime based on data classification schemes
• Advanced definition of policies for preventing data derivation based on classification (i.e. re-
identification)
Apache Atlas
Knowledge Store
Audit Store
ModelsType-System
Policy RulesTaxonomies
Policy Engine
Data Lifecycle
Management
Security
REST API
Services
Search Lineage Exchange
Healthcare
HIPAA
HL7
Financial
SOX
Dodd-Frank
Custom
CWM
Retail
PCI
PII
Other
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Load Wrapper
Sample Use Case: ETL Offload
RDMS
Business
Catalog
Metadata
ETL:
CTAS
SQL
Hive
Traditional
EDW
New ETL
Hadoop
Atlas
Sqoop
Reporter
via REST
API
Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Governance Ready Certification Program
Curated group of vendor partners to provide
rich & complete features
Customers choose features that they want to
deploy – a la carte.
Low switching costs !
HDP at core to provide stability and
interoperability
Discovery
Tagging
Prep /
Cleanse
ETL
Governance
BPM
Self
Service
Visual-
ization
Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
• ASF MVP (5/15) – Preview Core Metadata Services: Type
system, API’s, basic UI, Hive connecter
• HDP 2.3 (6/30) - GA Core Metadata Services. Preview
Metadata Business Glossary
• M10 – (7/20) – Preview ABAC with Ranger integration, Aetna
Automated tests, Sqoop component for DB2
• M20 – Preview Kafka, Storm connectors, Gov Ready
Certification program, Column lineage, Preview row level
masking.
• HDP 2.4 (Q4’15) GA all preview features
12
High Level Roadmap
Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Architecture
Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
High Level Architecture
Type System
Repository
Search DSL
Bridge
Hive Storm OthersSqoop
REST API
Titan / HBase
Solr/Elastic
Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
Technology Stack
• Knowledge Store
o Titan Graph DB
• Pluggable Search Backend
o Elastic search
o Solr
• Rules Engine
o TBD
• Audit Store
o YARN ATS - Time series DB
• Java 1.7
• Dashboard
o TBD
Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
Admin
GET: /admin/stack
GET: /admin/version
Entity
GET: /entities/definition/{guid}
POST: /entities/submit/{typeName}
GET: /entities/list/{entityType}
Metadata Discovery
GET: /discovery/search/gremlin/{gremlinQuery}
GET: /discovery/search/relationships/{guid}
GET: /discovery/search/fullText?text=<query>
GET: /discovery/getIndexedFields
Rexster
GET: /graph/vertices/{id}
GET: /graph/vertices/properties/{id}
GET: /graph/vertices
GET: /graph/vertices/{id}/{direction}
GET: /graph/edges/{id}
Types
POST: /types/submit/{typeName}
GET: /types/definition/{typeName}
GET: /types/list
Hive Lineage
GET: /bridge/hive/{id}
GET: /bridge/hive
POST: /bridge/hive
16
Tech Preview APIs: Jan 2015
Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Type System – Overview of Types
• Class
• Struct
• Trait
• Primitives
• Collections
• Map
• Array
• Instances (Entity)
• Referenceable
Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Type System – Data Types
Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
_class("Column") {
"name" ~ (string, required)
"dataType" ~ (string, required)
"sd" ~ ("StorageDesc", required)
}
_class("Table", List()) {
"name" ~ (string, required, indexed)
"db" ~ ("DB", required)
"sd" ~ ("StorageDesc", required)
}
_trait("Dimension") {}
_trait("PII") {}
_trait("Metric") {}
_trait("ETL") {}
_trait("JdbcAccess") {}
_class("DB") {
"name" ~ (string, required, indexed, unique)
"owner" ~ (string)
"createTime" ~ (int)
}
_class("StorageDesc") {
"inputFormat" ~ (string, required)
"outputFormat" ~ (string, required)
}
Page20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Repository
• Graph Database
• Titan with storage backed by HBase
• Types and instances are mapped to the Graph DB
• Classes, Structs and Traits map to a vertex
• Relationships are mapped as edges
• Search - plugin enabled
• Indexing based on type annotations
• Solr
• Elastic search
Page21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Search
• DSL with SQL Like Syntax
• from $type is $trait where $clause select|has $attributes loop $loopExpression withPath, repeat
• Examples
• from DB
• DB where name="Reporting" select name, owner
• DB has name
• DB is JdbcAccess
• Column where Column isa PII
• Table where name="sales_fact", columns
• Table where name="sales_fact", columns as column select column.name, column.dataType,
column.comment
• Full-text search
Page22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Lineage
• Uses Search DSL Loop expression
• Everything results in search
• Named Queries
• inputs
• Table where (name = "sales_fact_monthly_mv") as src loop (LoadProcess->outputTable
inputTables) as dest select src.name as src_name, dest.name as dest_name withPath
• outputs
• Table where (name = "sales_fact") as src loop (LoadProcess->inputTables outputTables) as dest
select src.name as src_name, dest.name as dest_name withPath
• schema
• Table where name="sales_fact", columns
Page23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Hive Integration
Apache Atlas
Hive Bridge
(Client)
Hive Hook
(Post-execution)
REST API
Page24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas Screens
Page25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
© Hortonworks Inc. 2012
25
Page26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Page30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Demo – Class Diagram
Page31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Demo – Instance Graph
Page32 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Questions and Answers

Más contenido relacionado

La actualidad más candente

Tag based policies using Apache Atlas and Ranger
Tag based policies using Apache Atlas and RangerTag based policies using Apache Atlas and Ranger
Tag based policies using Apache Atlas and RangerVimal Sharma
 
Atlas ApacheCon 2017
Atlas ApacheCon 2017Atlas ApacheCon 2017
Atlas ApacheCon 2017Vimal Sharma
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasDataWorks Summit/Hadoop Summit
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentDataWorks Summit/Hadoop Summit
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4Nigel Jones
 
Apache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop componentsApache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop componentsDataWorks Summit/Hadoop Summit
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasDataWorks Summit
 
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasGDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetupAlex Zeltov
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies DataWorks Summit/Hadoop Summit
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success DataWorks Summit/Hadoop Summit
 
GDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache AtlasGDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...DataWorks Summit
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopDataWorks Summit
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark Summit
 

La actualidad más candente (20)

Tag based policies using Apache Atlas and Ranger
Tag based policies using Apache Atlas and RangerTag based policies using Apache Atlas and Ranger
Tag based policies using Apache Atlas and Ranger
 
Atlas ApacheCon 2017
Atlas ApacheCon 2017Atlas ApacheCon 2017
Atlas ApacheCon 2017
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Security and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache AtlasSecurity and Data Governance using Apache Ranger and Apache Atlas
Security and Data Governance using Apache Ranger and Apache Atlas
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 
Best Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop EnvironmentBest Practices for Enterprise User Management in Hadoop Environment
Best Practices for Enterprise User Management in Hadoop Environment
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4
 
Apache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop componentsApache Atlas: Tracking dataset lineage across Hadoop components
Apache Atlas: Tracking dataset lineage across Hadoop components
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
 
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasGDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
 
Atlas and ranger epam meetup
Atlas and ranger epam meetupAtlas and ranger epam meetup
Atlas and ranger epam meetup
 
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
Apache Atlas: Why Big Data Management Requires Hierarchical Taxonomies
 
Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success Swimming Across the Data Lake, Lessons learned and keys to success
Swimming Across the Data Lake, Lessons learned and keys to success
 
Apache Atlas: Governance for your Data
Apache Atlas: Governance for your DataApache Atlas: Governance for your Data
Apache Atlas: Governance for your Data
 
GDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache AtlasGDPR Community Showcase for Apache Ranger and Apache Atlas
GDPR Community Showcase for Apache Ranger and Apache Atlas
 
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
Understanding Your Crown Jewels: Finding, Organizing, and Profiling Sensitive...
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
Data Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise HadoopData Discovery & Lineage in Enterprise Hadoop
Data Discovery & Lineage in Enterprise Hadoop
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
Spark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun MurthySpark and Hadoop Perfect Togeher by Arun Murthy
Spark and Hadoop Perfect Togeher by Arun Murthy
 

Similar a HDP Next: Governance

Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?DataWorks Summit/Hadoop Summit
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in HadoopMadhan Neethiraj
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceHortonworks
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...DataWorks Summit/Hadoop Summit
 
Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoopCraig Jordan
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudDataWorks Summit/Hadoop Summit
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it DataWorks Summit/Hadoop Summit
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseJeffrey T. Pollock
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseDataWorks Summit
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017alanfgates
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Hortonworks
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoDataWorks Summit
 
One Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and GovernanceOne Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and GovernanceJeffrey T. Pollock
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Innovative Management Services
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?DataWorks Summit
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019alanfgates
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyInside Analysis
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?DataWorks Summit
 

Similar a HDP Next: Governance (20)

Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?Is your Enterprise Data lake Metadata Driven AND Secure?
Is your Enterprise Data lake Metadata Driven AND Secure?
 
Classification based security in Hadoop
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
 
Implementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data GovernanceImplementing a Data Lake with Enterprise Grade Data Governance
Implementing a Data Lake with Enterprise Grade Data Governance
 
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
Implementing the Business Catalog in the Modern Enterprise: Bridging Traditio...
 
Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoop
 
Moving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloudMoving towards enterprise ready Hadoop clusters on the cloud
Moving towards enterprise ready Hadoop clusters on the cloud
 
What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it What the #$* is a Business Catalog and why you need it
What the #$* is a Business Catalog and why you need it
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Big Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San JoseBig Data at Oracle - Strata 2015 San Jose
Big Data at Oracle - Strata 2015 San Jose
 
An Apache Hive Based Data Warehouse
An Apache Hive Based Data WarehouseAn Apache Hive Based Data Warehouse
An Apache Hive Based Data Warehouse
 
Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017Hive edw-dataworks summit-eu-april-2017
Hive edw-dataworks summit-eu-april-2017
 
Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014Teradata - Presentation at Hortonworks Booth - Strata 2014
Teradata - Presentation at Hortonworks Booth - Strata 2014
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
What's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - TokyoWhat's New in Apache Hive 3.0 - Tokyo
What's New in Apache Hive 3.0 - Tokyo
 
One Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and GovernanceOne Slide Overview: ORCL Big Data Integration and Governance
One Slide Overview: ORCL Big Data Integration and Governance
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?What is New in Apache Hive 3.0?
What is New in Apache Hive 3.0?
 
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019Hive 3 New Horizons DataWorks Summit Melbourne February 2019
Hive 3 New Horizons DataWorks Summit Melbourne February 2019
 
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution StrategyEnterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
Enterprise Hadoop is Here to Stay: Plan Your Evolution Strategy
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Último (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

HDP Next: Governance

  • 1. Page1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved HDP.Next: Governance Andrew Ahn Summit San Jose
  • 2. Page2 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Architecture • Overview • Type System • Screens / Demo Overview • Vision • Features • Roadmap Governance • Enterprise Goals • Atlas positioning
  • 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Speakers Venkatesh Seetharam, Hortonworks Development Architect Principle Architect for Apache Atlas and leads development. He was part of the Hadoop team at Yahoo where he built data management solutions. He has been involved in Hadoop and contributed to many open source projects in the ecosystem over the last 7 years. He is a committer on numerous Apache projects including Falcon and Atlas. Andrew Ahn, Hortonworks Director of Governance Product Management Responsible for development roadmap of Apache Falcon and Apache Atlas, co- development relationships and GTM strategies at Hortonworks for governance projects. He has over 6 years of big data domain experience in the financial sector.
  • 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Enterprise Data Governance Goals GOAL: Provide a common approach to data governance across all systems and data within the organization • Transparent Governance standards & protocols must be clearly defined and available to all • Reproducible Recreate the relevant data landscape at a point in time • Auditable All relevant events and assets but be traceable with appropriate historical lineage • Consistent Compliance practices must be consistent ETL/DQ BPM Business Analytics Visualization & Dashboards ERP CRM SCM MDM ARCHIVE Governance Framework
  • 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Governance Initiative for Hadoop ETL/DQ BPM Business Analytics Visualization & Dashboards ERP CRM SCM MDM ARCHIVE Data Governance Initiative Common Governance Framework 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ApachePig ApacheHive ApacheHBase ApacheAccumulo ApacheSolr ApacheSpark ApacheStorm TWO Requirements 1. Hadoop must snap in to the existing frameworks and be a good citizen 2. Hadoop must also provide governance within its own stack of technologies A group of companies dedicated to meeting these requirements in the open
  • 6. Page6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Overview We Do Hadoop
  • 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Vision Metadata Services • Flexible Knowledge Store • Business Catalog / Operational Data • Search & Proscriptive Lineage • Centralized location for all metadata within HDP • Interface point for Metadata Exchange with platforms outside of HDP. Metadata will enrich every component • Hive – Complete lineage, every HiveQL tracked • Ranger – Tag or Attribute security ABAC • Falcon – Business Taxonomy Apache Atlas Hive Ranger Falcon Kafka Storm
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Push model: Component Instrumentation Request • Model of component Type • Model of Component Execution • Lineage / Search – Minimum of Coarse Grain • Relevant Operational Metadata HDP components will make audit entries into Atlas metadata store, proscriptively tracking lineage. As this true for the HDP stack it will also apply to workflow outside of Hadoop. Apache Atlas Hive Ranger Falcon Kafka Storm
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Capabilities: Overview Data Classification • Import or define taxonomy business-oriented annotations for data • Define, annotate, and automate capture of relationships between data sets and underlying elements including source, target, and derivation processes • Export metadata to third-party systems Centralized Auditing • Capture security access information for every application, process, and interaction with data • Capture the operational information for execution, steps, and activities Search & Lineage (Browse) • Pre-defined navigation paths to explore the data classification and audit information • Text-based search features locates relevant data and audit event across Data Lake quickly and accurately • Browse visualization of data set lineage allowing users to drill-down into operational, security, and provenance related information Security & Policy Engine • Rationalize compliance policy at runtime based on data classification schemes • Advanced definition of policies for preventing data derivation based on classification (i.e. re- identification) Apache Atlas Knowledge Store Audit Store ModelsType-System Policy RulesTaxonomies Policy Engine Data Lifecycle Management Security REST API Services Search Lineage Exchange Healthcare HIPAA HL7 Financial SOX Dodd-Frank Custom CWM Retail PCI PII Other
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Load Wrapper Sample Use Case: ETL Offload RDMS Business Catalog Metadata ETL: CTAS SQL Hive Traditional EDW New ETL Hadoop Atlas Sqoop Reporter via REST API
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Governance Ready Certification Program Curated group of vendor partners to provide rich & complete features Customers choose features that they want to deploy – a la carte. Low switching costs ! HDP at core to provide stability and interoperability Discovery Tagging Prep / Cleanse ETL Governance BPM Self Service Visual- ization
  • 12. Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 • ASF MVP (5/15) – Preview Core Metadata Services: Type system, API’s, basic UI, Hive connecter • HDP 2.3 (6/30) - GA Core Metadata Services. Preview Metadata Business Glossary • M10 – (7/20) – Preview ABAC with Ranger integration, Aetna Automated tests, Sqoop component for DB2 • M20 – Preview Kafka, Storm connectors, Gov Ready Certification program, Column lineage, Preview row level masking. • HDP 2.4 (Q4’15) GA all preview features 12 High Level Roadmap
  • 13. Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Architecture
  • 14. Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved High Level Architecture Type System Repository Search DSL Bridge Hive Storm OthersSqoop REST API Titan / HBase Solr/Elastic
  • 15. Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 Technology Stack • Knowledge Store o Titan Graph DB • Pluggable Search Backend o Elastic search o Solr • Rules Engine o TBD • Audit Store o YARN ATS - Time series DB • Java 1.7 • Dashboard o TBD
  • 16. Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 Admin GET: /admin/stack GET: /admin/version Entity GET: /entities/definition/{guid} POST: /entities/submit/{typeName} GET: /entities/list/{entityType} Metadata Discovery GET: /discovery/search/gremlin/{gremlinQuery} GET: /discovery/search/relationships/{guid} GET: /discovery/search/fullText?text=<query> GET: /discovery/getIndexedFields Rexster GET: /graph/vertices/{id} GET: /graph/vertices/properties/{id} GET: /graph/vertices GET: /graph/vertices/{id}/{direction} GET: /graph/edges/{id} Types POST: /types/submit/{typeName} GET: /types/definition/{typeName} GET: /types/list Hive Lineage GET: /bridge/hive/{id} GET: /bridge/hive POST: /bridge/hive 16 Tech Preview APIs: Jan 2015
  • 17. Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Type System – Overview of Types • Class • Struct • Trait • Primitives • Collections • Map • Array • Instances (Entity) • Referenceable
  • 18. Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Type System – Data Types
  • 19. Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved _class("Column") { "name" ~ (string, required) "dataType" ~ (string, required) "sd" ~ ("StorageDesc", required) } _class("Table", List()) { "name" ~ (string, required, indexed) "db" ~ ("DB", required) "sd" ~ ("StorageDesc", required) } _trait("Dimension") {} _trait("PII") {} _trait("Metric") {} _trait("ETL") {} _trait("JdbcAccess") {} _class("DB") { "name" ~ (string, required, indexed, unique) "owner" ~ (string) "createTime" ~ (int) } _class("StorageDesc") { "inputFormat" ~ (string, required) "outputFormat" ~ (string, required) }
  • 20. Page20 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Repository • Graph Database • Titan with storage backed by HBase • Types and instances are mapped to the Graph DB • Classes, Structs and Traits map to a vertex • Relationships are mapped as edges • Search - plugin enabled • Indexing based on type annotations • Solr • Elastic search
  • 21. Page21 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Search • DSL with SQL Like Syntax • from $type is $trait where $clause select|has $attributes loop $loopExpression withPath, repeat • Examples • from DB • DB where name="Reporting" select name, owner • DB has name • DB is JdbcAccess • Column where Column isa PII • Table where name="sales_fact", columns • Table where name="sales_fact", columns as column select column.name, column.dataType, column.comment • Full-text search
  • 22. Page22 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Lineage • Uses Search DSL Loop expression • Everything results in search • Named Queries • inputs • Table where (name = "sales_fact_monthly_mv") as src loop (LoadProcess->outputTable inputTables) as dest select src.name as src_name, dest.name as dest_name withPath • outputs • Table where (name = "sales_fact") as src loop (LoadProcess->inputTables outputTables) as dest select src.name as src_name, dest.name as dest_name withPath • schema • Table where name="sales_fact", columns
  • 23. Page23 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Hive Integration Apache Atlas Hive Bridge (Client) Hive Hook (Post-execution) REST API
  • 24. Page24 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas Screens
  • 25. Page25 © Hortonworks Inc. 2011 – 2015. All Rights Reserved © Hortonworks Inc. 2012 25
  • 26. Page26 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 27. Page27 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 28. Page28 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 29. Page29 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 30. Page30 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Demo – Class Diagram
  • 31. Page31 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Demo – Instance Graph
  • 32. Page32 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Questions and Answers

Notas del editor

  1. We have a lot to cover, want to apologize in advance
  2. We have a lot to cover, want to apologize in advance
  3. A data governance framework in any organization comprises a combination of people, process and technology that are in place to establish decision rights and accountabilities for information. A governance policy defines who can take what actions with what information, and when, under what circumstances, using what methods.   The technology goals for a data governance framework are to provide a platform for a common approach across all systems and data within the organization, Explicitly they need to be: - Transparent: Governance standards & protocols must be clearly defined and available to all - Reproducible: Recreate the relevant data landscape at a point in time - Auditable: All relevant events and assets but be traceable with appropriate historical lineage - Consistent: Compliance practices must be consistent    
  4. Consistent with our approach, Hortonworks recently led the development Data Governance Initiative which brings together industry experts from some of the largest enterprise, traditional vendors and Hadoop experts to address governance within Hadoop.   The members have already defined a comprehensive solution for data governance in Hadoop that will provide the base functions of metadata services, deep audit store and an advanced policy rules engine, but also and more importantly, the solution will interoperate with and extend existing third-party data governance and management tools by shedding light on the access of data within Hadoop.   This approach allows organizations to apply comprehensive governance policy across their data.   In order to provide these functions for Hadoop, much work will need to be accomplished throughout all of the Hadoop projects. The initiative is unique in that it is not a spot solution, but will build on top of existing projects such as Apache Falcon for data lifecycle management and Apache Ranger for global security policies. The founding members of this initiative include, Aetna, Target, Merck, SAS and Hortonworks and many more are lined up to contribute. This is a broad community based initiative set to deliver the right features for the right requirements and is perfectly representative of the Hortonworks open community approach.  
  5. One of the most important and challenging goals is consistency and when we inspect implementation of data governance for a Hadoop implementation this consistency seems almost impossible.   While there are constructs within the various Hadoop related projects to aid governance, there is no single, consistent approach to implementation/architecture of this important framework. Disparate tools, such as HCatalog, Ranger and Falcon provide pieces of the overall solution, but are not holistic across the stack in approach.   Further, with Hadoop gaining traction in the enterprise, it has only recently been asked to integrate with existing frameworks such as operations, security and now governance. It is now a foundational concern.    
  6. Roll tax into data disco
  7. Show – clearly identify customer metadata. Change Add customer classification example – Aetna – make the use case story have continuity. Use DX procedures to diagonsis ** bring meta from external systems into hadoop – keep it together
  8. Which Vendors would you be interested in ?