Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Apache Atlas: Data Governance
July 2015
Partner Solutions
Agenda
Overview
•  Enterprise Goals
•  Data Governance Initiative
Demo
•  Example: Sqoop
•  Walk through step
•  Search Tables / Tags
Atlas
•  Feature tour
•  Roadmap
•  UI Tour
Enterprise Data Governance Goals
GOAL: Provide a common approach to
data governance across all systems
and data within the organization
•  Transparent
Governance standards & protocols must be
clearly defined and available to all
•  Reproducible
Recreate the relevant data landscape at a
point in time
•  Auditable
All relevant events and assets must be
traceable with appropriate historical lineage
•  Consistent
Compliance practices must be consistent
[Diagram: a Governance Framework spanning ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, and Archive]
Data Governance Initiative for Hadoop
[Diagram: the Data Governance Initiative's Common Governance Framework, shown alongside enterprise systems (ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, Archive) and Hadoop components (Apache Pig, Apache Hive, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm)]
TWO Requirements
1.  Hadoop must snap in to
the existing frameworks
and be a good citizen
2.  Hadoop must also provide
governance within its own
stack of technologies
A group of companies dedicated to meeting
these requirements in the open
Major
Bank
Apache Atlas Overview
We Do Hadoop
Apache Atlas Vision
Metadata Services
•  Flexible Knowledge Store
•  Business Catalog / Operational Data
•  Search & Prescriptive Lineage
•  Centralized location for all metadata within HDP
•  Interface point for Metadata Exchange with platforms
outside of HDP.
Metadata will enrich every component
•  Hive – Complete lineage, every HiveQL tracked
•  Ranger – Tag- or attribute-based access control (ABAC)
•  Falcon – Business Taxonomy
[Diagram: Apache Atlas connected to Hive, Ranger, Falcon, Kafka, and Storm]
Apache Atlas Capabilities: Overview
Data Classification
•  Import or define taxonomy-based, business-oriented annotations for data
•  Define, annotate, and automate capture of relationships between data sets and underlying
elements including source, target, and derivation processes
•  Export metadata to third-party systems
Centralized Auditing
•  Capture security access information for every application, process, and interaction with data
•  Capture the operational information for execution, steps, and activities
Search & Lineage (Browse)
•  Pre-defined navigation paths to explore the data classification and audit information
•  Text-based search locates relevant data and audit events across the Data Lake quickly
and accurately
•  Browse visualization of data set lineage allowing users to drill-down into operational, security,
and provenance related information
Security & Policy Engine
•  Rationalize compliance policy at runtime based on data classification schemes
•  Advanced definition of policies for preventing data derivation based on classification
(i.e. re-identification)
[Diagram: Apache Atlas building blocks: Knowledge Store (Type System, Models, Taxonomies, Policy Rules), Audit Store, Policy Engine (Data Lifecycle Management, Security), REST API, Services (Search, Lineage, Exchange). Industry models: Healthcare (HIPAA, HL7), Financial (SOX, Dodd-Frank), Retail (PCI, PII), Custom (CWM), Other]
Sample Use Case: ETL Offload
[Diagram labels: RDBMS, Sqoop, Load Wrapper, Hive: Landing, Hive: CTAS, Traditional EDW, New ETL (Hadoop), Atlas, Business Catalog, Metadata, Reporter via REST API]
Hive Integration
[Diagram: Hive Bridge (client) and Hive Hook (post-execution) connect to Apache Atlas through the REST API]
Governance Ready Certification Program
Curated group of vendor partners to provide
rich & complete features
Customers choose features that they want to
deploy – a la carte.
Low switching costs!
HDP at core to provide stability and
interoperability
[Diagram: partner categories: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization]
High Level Roadmap
•  ASF MVP (May) – Preview Core Metadata Services: Type system, APIs, basic UI, Hive connector
•  HDP 2.3 (July) – GA Core Metadata Services. Preview Metadata Business Glossary
•  M10 (Sept) – Preview ABAC with Ranger integration and Preview Sqoop component connector
•  M20 – Preview Kafka and Storm connectors, Gov Ready Certification program, Preview row-level & column masking
•  HDP 2.4 (Q4’15) – GA all preview features
Architecture
High Level Architecture
[Diagram: REST API; core modules Type System, Repository, Search DSL, and Bridge (Hive, Sqoop, Storm, others); back ends Titan / HBase and Solr / Elasticsearch]
Technology Stack
•  Knowledge Store
o  Titan Graph DB
•  Pluggable Search Backend
o  Elasticsearch
o  Solr
•  Rules Engine
o  TBD
•  Audit Store
o  YARN ATS - Time series DB
•  Java 1.7
•  Dashboard
o  TBD
APIs: Examples
Admin
GET: /admin/stack
GET: /admin/version
Entity
GET: /entities/definition/{guid}
POST: /entities/submit/{typeName}
GET: /entities/list/{entityType}
Metadata Discovery
GET: /discovery/search/gremlin/{gremlinQuery}
GET: /discovery/search/relationships/{guid}
GET: /discovery/search/fullText?text=<query>
GET: /discovery/getIndexedFields
Rexster
GET: /graph/vertices/{id}
GET: /graph/vertices/properties/{id}
GET: /graph/vertices
GET: /graph/vertices/{id}/{direction}
GET: /graph/edges/{id}
Types
POST: /types/submit/{typeName}
GET: /types/definition/{typeName}
GET: /types/list
Hive Lineage
GET: /bridge/hive/{id}
GET: /bridge/hive
POST: /bridge/hive
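The endpoint paths listed on this slide can be turned into request URLs by any HTTP client. The following is a minimal Python sketch that only builds URLs and sends nothing. The base URL `http://localhost:21000/api/atlas`, the helper name `atlas_url`, and the sample guid are assumptions for illustration; the deck does not state a host, port, or path prefix.

```python
from urllib.parse import quote, urlencode

# Assumed base URL for illustration only; the deck does not give host, port, or prefix.
BASE_URL = "http://localhost:21000/api/atlas"


def atlas_url(path_template, query=None, **path_params):
    """Fill {placeholders} in an endpoint path from the slide and append a query string."""
    # Percent-encode every path parameter so values such as Gremlin queries stay URL-safe.
    encoded = {key: quote(str(value), safe="") for key, value in path_params.items()}
    url = BASE_URL + path_template.format(**encoded)
    if query:
        url += "?" + urlencode(query)
    return url


# Paths are copied from the slide; the guid and type name are made-up sample values.
definition_url = atlas_url("/entities/definition/{guid}", guid="1234-abcd")
list_url = atlas_url("/entities/list/{entityType}", entityType="Table")
search_url = atlas_url("/discovery/search/fullText", query={"text": "sales fact"})
print(definition_url)
print(list_url)
print(search_url)
```

Keeping the path templates as plain strings makes it easy to compare them against the slide when the API changes between releases.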
Type System – Overview of Types
•  Class
•  Struct
•  Trait
•  Primitives
•  Collections
•  Map
•  Array
•  Instances (Entity)
•  Referenceable
Type System – Data Types
_class("Column") {
  "name" ~ (string, required)
  "dataType" ~ (string, required)
  "sd" ~ ("StorageDesc", required)
}

_class("Table", List()) {
  "name" ~ (string, required, indexed)
  "db" ~ ("DB", required)
  "sd" ~ ("StorageDesc", required)
}

_trait("Dimension") {}
_trait("PII") {}
_trait("Metric") {}
_trait("ETL") {}
_trait("JdbcAccess") {}

_class("DB") {
  "name" ~ (string, required, indexed, unique)
  "owner" ~ (string)
  "createTime" ~ (int)
}

_class("StorageDesc") {
  "inputFormat" ~ (string, required)
  "outputFormat" ~ (string, required)
}
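To make the role of `required` in the definitions above concrete, here is a small Python sketch that mirrors the "DB" and "Table" classes and checks an instance for missing required attributes. The `Attribute` and `ClassType` classes and the `missing_required` method are hypothetical names invented for this illustration; they are not part of the Atlas type system API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str
    required: bool = False


@dataclass
class ClassType:
    name: str
    attributes: list = field(default_factory=list)

    def missing_required(self, values):
        """Return the names of required attributes absent from an instance's values."""
        return [a.name for a in self.attributes if a.required and a.name not in values]


# Mirrors the "DB" and "Table" definitions shown on the slide.
DB = ClassType("DB", [
    Attribute("name", "string", required=True),
    Attribute("owner", "string"),
    Attribute("createTime", "int"),
])
TABLE = ClassType("Table", [
    Attribute("name", "string", required=True),
    Attribute("db", "DB", required=True),
    Attribute("sd", "StorageDesc", required=True),
])

# Traits carry no attributes in the slide; here they are modeled as plain tags.
TRAITS = {"Dimension", "PII", "Metric", "ETL", "JdbcAccess"}

# A Table instance that forgot its storage descriptor.
instance = {"name": "sales_fact", "db": "Reporting"}
missing = TABLE.missing_required(instance)
print(missing)
```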
Repository
•  Graph Database
•  Titan with storage backed by HBase
•  Types and instances are mapped to the Graph DB
•  Classes, Structs and Traits map to a vertex
•  Relationships are mapped as edges
•  Search - plugin enabled
•  Indexing based on type annotations
•  Solr
•  Elasticsearch
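The mapping described on this slide (instances and traits become vertices, relationships become edges) can be pictured with a tiny in-memory graph. This is a sketch under stated assumptions: `TinyGraph` is a hypothetical stand-in written for this example and does not use Titan or HBase, and the vertex ids and edge labels are invented.

```python
from collections import defaultdict


class TinyGraph:
    """In-memory stand-in for a graph store: vertices with properties, labeled edges."""

    def __init__(self):
        self.vertices = {}
        self.edges = defaultdict(list)  # source vertex id -> [(label, target vertex id)]

    def add_vertex(self, vertex_id, type_name, **props):
        self.vertices[vertex_id] = {"type": type_name, **props}

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbors(self, source, label):
        return [target for (edge_label, target) in self.edges[source] if edge_label == label]


graph = TinyGraph()
# Each class instance becomes a vertex ...
graph.add_vertex("db1", "DB", name="Reporting", owner="etl")
graph.add_vertex("t1", "Table", name="sales_fact")
graph.add_vertex("pii", "PII")  # traits map to vertices as well
# ... and each relationship between them becomes an edge.
graph.add_edge("t1", "db", "db1")
graph.add_edge("t1", "trait", "pii")

db_of_table = [graph.vertices[v]["name"] for v in graph.neighbors("t1", "db")]
print(db_of_table)
```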
Search
•  DSL with SQL-like syntax
•  from $type is $trait where $clause select|has $attributes loop $loopExpression withPath, repeat
•  Examples
•  from DB
•  DB where name="Reporting" select name, owner
•  DB has name
•  DB is JdbcAccess
•  Column where Column isa PII
•  Table where name="sales_fact", columns
•  Table where name="sales_fact", columns as column select column.name, column.dataType, column.comment
•  Full-text search
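The semantics of the simpler example queries (filter by type, optionally by trait, match attributes, then project) can be sketched over in-memory records. The `search` function below is a hypothetical illustration of what such queries return; it does not parse the DSL and is not how Atlas evaluates it, and the owners and column names are made-up sample data.

```python
def search(entities, type_name, trait=None, where=None, select=None):
    """Filter by type, optional trait, and attribute equality; optionally project attributes."""
    results = []
    for entity in entities:
        if entity["type"] != type_name:
            continue
        if trait is not None and trait not in entity.get("traits", ()):
            continue
        if where and any(entity.get(key) != value for key, value in where.items()):
            continue
        results.append({key: entity.get(key) for key in select} if select else entity)
    return results


entities = [
    {"type": "DB", "name": "Reporting", "owner": "jane", "traits": ["JdbcAccess"]},
    {"type": "DB", "name": "Sales", "owner": "john", "traits": []},
    {"type": "Column", "name": "customer_id", "traits": ["PII"]},
]

# DB where name="Reporting" select name, owner
reporting = search(entities, "DB", where={"name": "Reporting"}, select=["name", "owner"])
# DB is JdbcAccess
jdbc_dbs = search(entities, "DB", trait="JdbcAccess", select=["name"])
# Columns tagged with the PII trait
pii_columns = search(entities, "Column", trait="PII", select=["name"])
print(reporting, jdbc_dbs, pii_columns)
```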
Lineage
•  Uses Search DSL Loop expression
•  Everything results in search
•  Named Queries
•  inputs
•  Table where (name = "sales_fact_monthly_mv") as src loop (LoadProcess->outputTable inputTables) as dest select src.name as src_name, dest.name as dest_name withPath
•  outputs
•  Table where (name = "sales_fact") as src loop (LoadProcess->inputTables outputTables) as dest select src.name as src_name, dest.name as dest_name withPath
•  schema
•  Table where name="sales_fact", columns
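The "inputs" named query repeatedly steps from a table to the process that wrote it and on to that process's input tables. A rough Python sketch of that traversal follows; the function name `upstream_tables`, the record layout, and the intermediate table names are assumptions made for this example, and the code walks plain dictionaries rather than calling the Search DSL.

```python
def upstream_tables(processes, table):
    """Repeatedly follow outputTable -> inputTables, collecting the path to each source."""
    paths = []
    stack = [(table, [table])]
    while stack:
        current, path = stack.pop()
        for process in processes:
            if process["outputTable"] != current:
                continue
            for source in process["inputTables"]:
                if source in path:  # guard against cycles in the sample data
                    continue
                new_path = path + [source]
                paths.append(new_path)
                stack.append((source, new_path))
    return paths


# Made-up load processes; only sales_fact and sales_fact_monthly_mv appear in the slides.
processes = [
    {"name": "loadSalesDaily", "inputTables": ["sales_fact", "time_dim"],
     "outputTable": "sales_fact_daily_mv"},
    {"name": "loadSalesMonthly", "inputTables": ["sales_fact_daily_mv"],
     "outputTable": "sales_fact_monthly_mv"},
]

paths = upstream_tables(processes, "sales_fact_monthly_mv")
sources = sorted({path[-1] for path in paths})
print(sources)
```

Swapping the roles of `inputTables` and `outputTable` would give the downstream ("outputs") direction in the same way.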
Apache Atlas Screens
Demo Atlas
Atlas UI demonstration
Search DSL
•  Type – DB, Table, Column
•  Tag - PII
•  Keyword
Results
•  Details
•  Schema
•  Lineage
Coming Features
Ingestion Demo Objective
•  Show Lineage with Sqoop Ingestion of data
•  Custom process instrumentation
•  Use the Hive Hook CTAS Operation
•  Follow Lineage in Atlas
•  Metadata Model in Atlas
•  The Open Framework
•  Create Custom Types
•  Create Custom Process
•  Sample Code
Setup
•  Source System
•  MySQL Database
•  DRIVERS
•  TIMESHEET
•  Destination System
•  Single Node HDP 2.3 (Tech Preview)
•  Apache Atlas
Steps to Create Metadata
•  Create an Atlas Client Instance
•  Create Type Definitions
–  Class Types
–  Attributes
–  List the Types
•  Instantiate Entities
–  Create Entities (Class Type)
–  Search the Types
•  Create Process
•  Create DataSet Type
•  Create Process Type
•  Connect a Process Lineage
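The steps above can be walked through end to end with a toy catalog. This is only a sketch of the order of operations: `InMemoryCatalog` and its methods are hypothetical and are not the Atlas client API, the type names are invented, and DRIVERS is borrowed from the demo setup as a sample table.

```python
class InMemoryCatalog:
    """Hypothetical stand-in for a metadata client; it is not the Atlas client API."""

    def __init__(self):
        self.types = {}
        self.entities = []

    def create_type(self, name, attributes, super_type=None):
        self.types[name] = {"attributes": list(attributes), "super_type": super_type}

    def list_types(self):
        return sorted(self.types)

    def create_entity(self, type_name, **values):
        if type_name not in self.types:
            raise KeyError(f"unknown type: {type_name}")
        entity = {"id": len(self.entities) + 1, "type": type_name, **values}
        self.entities.append(entity)
        return entity["id"]

    def search(self, type_name):
        return [e for e in self.entities if e["type"] == type_name]


catalog = InMemoryCatalog()                                  # 1. client instance
catalog.create_type("DataSet", ["name"])                     # 2. type definitions
catalog.create_type("MySqlTable", ["name"], super_type="DataSet")
catalog.create_type("HiveTable", ["name"], super_type="DataSet")
catalog.create_type("Process", ["name", "inputs", "outputs"])
src = catalog.create_entity("MySqlTable", name="DRIVERS")    # 3. entities
dst = catalog.create_entity("HiveTable", name="drivers")
# 4. A process entity links source to destination; lineage follows these links.
catalog.create_entity("Process", name="sqoop_import", inputs=[src], outputs=[dst])

type_names = catalog.list_types()
process = catalog.search("Process")[0]
print(type_names, process["inputs"], process["outputs"])
```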
Attribute Definition
•  Name
•  Data Type
•  Multiplicity
•  Composite
•  isIndexable
•  ReverseAttribute
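The six properties listed on this slide fit naturally into a single record. The sketch below is illustrative only: the class name, field names, multiplicity labels, and the example attributes are assumptions chosen for this example, not the Atlas attribute-definition API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AttributeDefinition:
    name: str
    data_type: str
    multiplicity: str = "optional"   # assumed labels: "optional", "required", "collection"
    is_composite: bool = False       # assumed meaning: referenced values belong to the owner
    is_indexable: bool = False
    reverse_attribute: Optional[str] = None  # name of the attribute pointing back, if any


# A table's column list: a collection that points back to the table via "table".
columns = AttributeDefinition(
    name="columns", data_type="array<Column>", multiplicity="collection",
    is_composite=True, reverse_attribute="table",
)
# A plain, searchable name attribute.
table_name = AttributeDefinition(
    name="name", data_type="string", multiplicity="required", is_indexable=True,
)
print(columns)
print(table_name)
```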
Questions and Answers
Atlas Resources
•  HDP 2.3 Preview Sandbox VM:
–  http://hortonworks.com/hdp/whats-new/
•  Apache Atlas:
–  http://atlas.incubator.apache.org/
–  http://incubator.apache.org/projects/atlas.html
–  https://git-wip-us.apache.org/repos/asf/incubator-atlas.git
•  Partner Workshops
–  http://hortonworks.com/partners/learn/
•  More to come with official GA release of HDP 2.3
Thank you!

 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 


Data Governance - Atlas 7.12.2015

  • 1. Page 1 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Apache Atlas: Data Governance July 2015 Partner Solutions
  • 2. Agenda
      Overview
      •  Enterprise Goals
      •  Data Governance Initiative
      Demo
      •  Example: Sqoop
      •  Walk through step
      •  Search Tables / Tags
      Atlas
      •  Feature tour
      •  Roadmap
      •  UI Tour
  • 3. Enterprise Data Governance Goals
      GOAL: Provide a common approach to data governance across all systems and data within the organization
      •  Transparent: Governance standards & protocols must be clearly defined and available to all
      •  Reproducible: Recreate the relevant data landscape at a point in time
      •  Auditable: All relevant events and assets must be traceable with appropriate historical lineage
      •  Consistent: Compliance practices must be consistent
      Diagram labels: ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, ARCHIVE, Governance Framework
  • 4. Data Governance Initiative for Hadoop
      TWO Requirements
      1.  Hadoop must snap in to the existing frameworks and be a good citizen
      2.  Hadoop must also provide governance within its own stack of technologies
      A group of companies dedicated to meeting these requirements in the open
      Diagram labels: ETL/DQ, BPM, Business Analytics, Visualization & Dashboards, ERP, CRM, SCM, MDM, ARCHIVE, Data Governance Initiative, Common Governance Framework, Apache Pig, Apache Hive, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm, Major Bank
  • 5. Apache Atlas Overview. We Do Hadoop
  • 6. Apache Atlas Vision
      Metadata Services
      •  Flexible Knowledge Store
      •  Business Catalog / Operational Data
      •  Search & Prescriptive Lineage
      •  Centralized location for all metadata within HDP
      •  Interface point for Metadata Exchange with platforms outside of HDP
      Metadata will enrich every component
      •  Hive – Complete lineage, every HiveQL statement tracked
      •  Ranger – Tag- or attribute-based security (ABAC)
      •  Falcon – Business Taxonomy
      Diagram labels: Apache Atlas, Hive, Ranger, Falcon, Kafka, Storm
  • 7. Apache Atlas Capabilities: Overview
      Data Classification
      •  Import or define business-oriented taxonomy annotations for data
      •  Define, annotate, and automate capture of relationships between data sets and underlying elements, including source, target, and derivation processes
      •  Export metadata to third-party systems
      Centralized Auditing
      •  Capture security access information for every application, process, and interaction with data
      •  Capture the operational information for execution, steps, and activities
      Search & Lineage (Browse)
      •  Pre-defined navigation paths to explore the data classification and audit information
      •  Text-based search features locate relevant data and audit events across the Data Lake quickly and accurately
      •  Browse visualization of data set lineage, allowing users to drill down into operational, security, and provenance-related information
      Security & Policy Engine
      •  Rationalize compliance policy at runtime based on data classification schemes
      •  Advanced definition of policies for preventing data derivation based on classification (i.e. re-identification)
      Diagram labels: Apache Atlas, Knowledge Store, Audit Store, Models, Type-System, Policy Rules, Taxonomies, Policy Engine, Data Lifecycle Management, Security, REST API, Services, Search, Lineage, Exchange, Healthcare (HIPAA, HL7), Financial (SOX, Dodd-Frank), Custom (CWM), Retail (PCI, PII), Other
  • 8. Sample Use Case: ETL Offload
      Diagram labels: Load Wrapper, RDBMS, Business Catalog, Metadata, Hive: Landing, Hive: CTAS, Traditional EDW, New ETL, Hadoop, Atlas, Sqoop, Reporter via REST API
  • 9. Hive Integration
      Diagram labels: Apache Atlas, Hive Bridge (Client), Hive Hook (Post-execution), REST API
  • 10. Governance Ready Certification Program
      •  Curated group of vendor partners to provide rich & complete features
      •  Customers choose the features that they want to deploy – a la carte
      •  Low switching costs
      •  HDP at core to provide stability and interoperability
      Diagram labels: Discovery, Tagging, Prep / Cleanse, ETL, Governance, BPM, Self Service, Visualization
  • 11. High Level Roadmap
      •  ASF MVP (May) – Preview Core Metadata Services: type system, APIs, basic UI, Hive connector
      •  HDP 2.3 (July) – GA Core Metadata Services. Preview Metadata Business Glossary
      •  M10 (Sept) – Preview ABAC with Ranger integration and preview Sqoop component connector
      •  M20 – Preview Kafka and Storm connectors, Gov Ready Certification program, preview row-level & column masking
      •  HDP 2.4 (Q4'15) – GA of all preview features
  • 12. Architecture
  • 13. High Level Architecture
      Diagram labels: Type System, Repository, Search DSL, Bridge (Hive, Storm, Sqoop, Others), REST API, Titan / HBase, Solr / Elastic
  • 14. Technology Stack
      •  Knowledge Store
         o  Titan Graph DB
      •  Pluggable Search Backend
         o  Elasticsearch
         o  Solr
      •  Rules Engine
         o  TBD
      •  Audit Store
         o  YARN ATS - Time series DB
      •  Java 1.7
      •  Dashboard
         o  TBD
  • 15. APIs: Examples
      Admin
         GET: /admin/stack
         GET: /admin/version
      Entity
         GET: /entities/definition/{guid}
         POST: /entities/submit/{typeName}
         GET: /entities/list/{entityType}
      Metadata Discovery
         GET: /discovery/search/gremlin/{gremlinQuery}
         GET: /discovery/search/relationships/{guid}
         GET: /discovery/search/fullText?text=<query>
         GET: /discovery/getIndexedFields
      Rexster
         GET: /graph/vertices/{id}
         GET: /graph/vertices/properties/{id}
         GET: /graph/vertices
         GET: /graph/vertices/{id}/{direction}
         GET: /graph/edges/{id}
      Types
         POST: /types/submit/{typeName}
         GET: /types/definition/{typeName}
         GET: /types/list
      Hive Lineage
         GET: /bridge/hive/{id}
         GET: /bridge/hive
         POST: /bridge/hive
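The endpoint templates on slide 15 can be turned into concrete request URLs with a few lines of Python. This is only a sketch: the base URL `http://atlas.example.com/api` and the helper names are hypothetical (the slides give no host, port, or path prefix), and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode

# Hypothetical base URL, for illustration only; not taken from the slides.
ATLAS_BASE = "http://atlas.example.com/api"


def entity_definition_url(guid, base=ATLAS_BASE):
    """Build the URL for GET /entities/definition/{guid}."""
    return f"{base}/entities/definition/{quote(guid, safe='')}"


def entity_list_url(entity_type, base=ATLAS_BASE):
    """Build the URL for GET /entities/list/{entityType}."""
    return f"{base}/entities/list/{quote(entity_type, safe='')}"


def full_text_search_url(text, base=ATLAS_BASE):
    """Build the URL for GET /discovery/search/fullText?text=<query>."""
    return f"{base}/discovery/search/fullText?{urlencode({'text': text})}"


if __name__ == "__main__":
    print(entity_list_url("Table"))
    print(full_text_search_url("sales fact"))
```

Path segments are percent-encoded so that a GUID or type name containing a slash cannot change the route; the query string is encoded separately with `urlencode`.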
  • 16. Type System – Overview of Types
      •  Class
      •  Struct
      •  Trait
      •  Primitives
      •  Collections
         o  Map
         o  Array
      •  Instances (Entity)
      •  Referenceable
  • 17. Type System – Data Types
  • 18.
      _class("Column") {
        "name" ~ (string, required)
        "dataType" ~ (string, required)
        "sd" ~ ("StorageDesc", required)
      }

      _class("Table", List()) {
        "name" ~ (string, required, indexed)
        "db" ~ ("DB", required)
        "sd" ~ ("StorageDesc", required)
      }

      _trait("Dimension") {}
      _trait("PII") {}
      _trait("Metric") {}
      _trait("ETL") {}
      _trait("JdbcAccess") {}

      _class("DB") {
        "name" ~ (string, required, indexed, unique)
        "owner" ~ (string)
        "createTime" ~ (int)
      }

      _class("StorageDesc") {
        "inputFormat" ~ (string, required)
        "outputFormat" ~ (string, required)
      }
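To make the `required` markers on slide 18 concrete, the sketch below mirrors three of those class definitions and the five traits as plain Python data and checks an instance against them. It is an illustration only, not the Atlas type system API; the function names `missing_required` and `unknown_traits` are hypothetical.

```python
# Plain-Python mirror of three class definitions from slide 18.
# Each attribute maps to (data type, required?). Not the Atlas type system API.
TYPES = {
    "DB": {
        "name": ("string", True),
        "owner": ("string", False),
        "createTime": ("int", False),
    },
    "StorageDesc": {
        "inputFormat": ("string", True),
        "outputFormat": ("string", True),
    },
    "Table": {
        "name": ("string", True),
        "db": ("DB", True),
        "sd": ("StorageDesc", True),
    },
}

# Trait names as declared on the slide.
TRAITS = {"Dimension", "PII", "Metric", "ETL", "JdbcAccess"}


def missing_required(type_name, values):
    """Return, sorted, the required attributes of type_name absent from values."""
    attrs = TYPES[type_name]
    return sorted(
        name for name, (_, required) in attrs.items()
        if required and name not in values
    )


def unknown_traits(traits):
    """Return, sorted, the trait names that were never declared."""
    return sorted(set(traits) - TRAITS)


if __name__ == "__main__":
    print(missing_required("Table", {"name": "sales_fact"}))
    print(unknown_traits(["PII", "Secret"]))
```

A `Table` instance that supplies only `name` is reported as missing `db` and `sd`, which matches the three `required` attributes in the slide's definition.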
  • 19. Repository
      •  Graph Database
         o  Titan with storage backed by HBase
      •  Types and instances are mapped to the Graph DB
         o  Classes, Structs and Traits map to a vertex
         o  Relationships are mapped as edges
      •  Search - plugin enabled
         o  Indexing based on type annotations
         o  Solr
         o  Elasticsearch
  • 20. Search
      •  DSL with SQL-like syntax
         o  from $type is $trait where $clause select|has $attributes loop $loopExpression withPath, repeat
      •  Examples
         o  from DB
         o  DB where name="Reporting" select name, owner
         o  DB has name
         o  DB is JdbcAccess
         o  Column where Column isa PII
         o  Table where name="sales_fact", columns
         o  Table where name="sales_fact", columns as column select column.name, column.dataType, column.comment
      •  Full-text search
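The query shape on slide 20 (`from $type is $trait where $clause select $attributes`) can be illustrated with a small string builder. This is a sketch that only composes text in that shape; it does not parse or validate the DSL, and `dsl_query` is a hypothetical helper, not part of Atlas.

```python
def dsl_query(type_name, trait=None, where=None, select=None):
    """Compose a query string in the `$type is $trait where $clause select $attributes` shape."""
    parts = [type_name]
    if trait:
        parts += ["is", trait]
    if where:
        parts += ["where", where]
    if select:
        parts += ["select", ", ".join(select)]
    return " ".join(parts)


if __name__ == "__main__":
    # Reproduces two of the examples listed on the slide.
    print(dsl_query("DB", where='name="Reporting"', select=["name", "owner"]))
    print(dsl_query("DB", trait="JdbcAccess"))
```

The `loop` and `withPath` clauses are left out to keep the sketch short.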
  • 21. Lineage
      •  Uses Search DSL loop expression
      •  Everything results in search
      •  Named Queries
         o  inputs: Table where (name = "sales_fact_monthly_mv") as src loop (LoadProcess->outputTable inputTables) as dest select src.name as src_name, dest.name as dest_name withPath
         o  outputs: Table where (name = "sales_fact") as src loop (LoadProcess->inputTables outputTables) as dest select src.name as src_name, dest.name as dest_name withPath
         o  schema: Table where name="sales_fact", columns
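The `inputs` query on slide 21 repeatedly steps from a table to the tables that fed the process producing it. The sketch below shows that idea as a breadth-first walk over in-memory records. It is an analogy under stated assumptions, not how Atlas evaluates the loop expression: the record layout is invented for illustration, and the table `sales_fact_daily_mv` is hypothetical (only `sales_fact` and `sales_fact_monthly_mv` appear on the slide).

```python
from collections import deque

# Each record stands in for a LoadProcess entity with inputTables and outputTable.
# "sales_fact_daily_mv" is a hypothetical intermediate table added for illustration.
LOAD_PROCESSES = [
    {"inputTables": ["sales_fact"], "outputTable": "sales_fact_daily_mv"},
    {"inputTables": ["sales_fact_daily_mv"], "outputTable": "sales_fact_monthly_mv"},
]


def upstream_paths(table, processes):
    """Walk breadth-first from a table back through producing processes; return every path found."""
    paths = []
    queue = deque([[table]])
    while queue:
        path = queue.popleft()
        for proc in processes:
            if proc["outputTable"] != path[-1]:
                continue
            for src in proc["inputTables"]:
                if src in path:  # guard against cycles
                    continue
                new_path = path + [src]
                paths.append(new_path)
                queue.append(new_path)
    return paths


if __name__ == "__main__":
    for p in upstream_paths("sales_fact_monthly_mv", LOAD_PROCESSES):
        print(" <- ".join(p))
```

Returning whole paths rather than only the reachable tables loosely mirrors the `withPath` keyword in the slide's query; swapping the roles of `inputTables` and `outputTable` would give the `outputs` direction.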
  • 22. Hive Integration
      Diagram labels: Apache Atlas, Hive Bridge (Client), Hive Hook (Post-execution), REST API
  • 23. Apache Atlas Screens
  • 29. Demo Atlas
  • 30. Atlas UI demonstration
      HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION
      Search DSL
      •  Type – DB, Table, Column
      •  Tag – PII
      •  Keyword
      Results
      •  Details
      •  Schema
      •  Lineage
      Coming Features
  • 31. Ingestion Demo Objective
      •  Show Lineage with Sqoop Ingestion of data
      •  Custom process instrumentation
      •  Use the Hive Hook CTAS Operation
      •  Atlas Follow Lineage
      •  Metadata Model in Atlas
      •  The Open Framework
      •  Create Custom Types
      •  Create Custom Process
      •  Sample Code
  • 32. Setup
      •  Source System
      •  MySQL Database
      •  DRIVERS
      •  TIMESHEET
      •  Destination System
      •  Single Node HDP 2.3 (Tech Preview)
      •  Apache Atlas
  • 33. Steps to Create Metadata
      •  Create an Atlas Client Instance
      •  Create Type Definitions
         –  Class Types
         –  Attributes
         –  List the Types
      •  Instantiate Entities
         –  Create Entities (Class Type)
         –  Search the Types
      •  Create Process
      •  Create DataSet Type
      •  Create Process Type
      •  Connect a Process Lineage
  • 34. Attribute Definition
      •  Name
      •  Data Type
      •  Multiplicity
      •  Composite
      •  isIndexable
      •  ReverseAttribute
  • 35. Questions and Answers
  • 36. Atlas Resources
      •  HDP 2.3 Preview Sandbox VM:
         –  http://hortonworks.com/hdp/whats-new/
      •  Apache Atlas:
         –  http://atlas.incubator.apache.org/
         –  http://incubator.apache.org/projects/atlas.html
         –  https://git-wip-us.apache.org/repos/asf/incubator-atlas.git
      •  Partner Workshops
         –  http://hortonworks.com/partners/learn/
      •  More to come with official GA release of HDP 2.3
  • 37. Thank you!