SlideShare una empresa de Scribd logo
1 de 22
The rise of big data
governance: Insight on this
emerging trend from active
open source initiatives
April 18, 2018 – Dataworks Summit Berlin 2018
@ODPiOrg
TODAY’S SPEAKERS
2
John Mertic,
Director of Program
Management, Linux
Foundation
Maryna Strelchuk,
Information Architect
and Application
Developer at ING
@ODPiOrg
IMAGINE …
An enterprise data catalogue that lists all
of your data, where it is located, its origin
(lineage), owner, structure, meaning,
classification and quality
No matter where the data resides
Search
@ODPiOrg
New tools from any vendor connect to your data catalogue out of the box
No vendor lock-in and no expensive population of yet another proprietary, siloed
metadata repository
Search
Open Metadata Management & Governance
IMAGINE …
@ODPiOrg
Metadata is added automatically to the catalogue as new
data is created
Databases
Applications
Function
Function
Functions
Files
It’s possible if data-driven enterprises collaborate to build it
Let’s talk about how
IMAGINE …
@ODPiOrg
• The Metadata Problem
• Building an Open Ecosystem
• Benefits for Data Governance Professionals
AGENDA
@ODPiOrg
1.Use data outside the application
that created it
2.Find the right data sets
3.Automate governance processes
WHY DO WE NEED METADATA?
@ODPiOrg
• Many data platforms do not
have metadata support
• Proprietary tools support a
limited range of data sources
and governance actions
• Expensive efforts to create
an enterprise data catalogue
TODAY’S REALITY
@ODPiOrg
TODAY’S REALITY
@ODPiOrg
i. The maintenance of metadata must be automated
ii. Metadata management must become ubiquitous
iii. Metadata access must become open and remotely accessible
iv. Metadata should be used to drive the governance of data
v. Wherever possible, discovery and maintenance of metadata has to an integral
part of all tools that access, change and move information.
10
METADATA GOVERNANCE MANIFESTO
@ODPiOrg
Open and
Unified Metadata
Atlas
Metadata
repository
IBM
Metadata
repository
Custom
Metadata
repository
Open Metadata Repository Service
Open Metadata Access Service Open
and
Unified
Metadata
WHAT NEEDS TO CHANGE
@ODPiOrg
Update to Apache Atlas
12
Automation
Capture of metadata from data platforms, data
movement engines and data protection engines.
Exception management and stewardship
Business Value
Specialized services for key data roles such as CDO,
Data Scientist, Developer, DevOps Operator, Asset
Owner, Applications
Connectivity
Metadata Highway offering open metadata exchange,
linking and federation between heterogeneous
metadata repositories.
@ODPiOrg
Open and
Unified Metadata
Atlas
Metadata
repository
IBM
Metadata
repository
Microsoft
SSAS
Open Metadata Repository Service
OMAS Open and
Unified
Metadata
CURRENTLY IN DEVELOPMENT
Information
View
Asset CatalogSubject Area
Catalog
Search UI
Power BI
@ODPiOrg
OPEN SOURCE COLLABORTION
14
@ODPiOrg
Good metadata enables subject matter experts to
collaborate around the data
Locate the data they need, quickly and efficiently
Feeding back their knowledge about the data and the uses
they have made about it to help others and support
economic evaluation of data
CO-CREATION WITH PRACTITIONERS
@ODPiOrg
Your governance program if based on established
definitions
Allow a broader range of tools in your organization
Automated governance processes protect and
manage your data
Metadata-driven access control
Auditing, metering and monitoring
Quality control and exception management
Rights management
Your metadata offerings will deliver value faster as
they tap into metadata collected by other vendor’s
tools.
ODPi packages extend your metadata system’s
and tools’ capabilities
Conformance tests minimize your effort in being
compliant with key standards and regulations.
Customers have increased confidence in your
tools and services due to ODPi certification.
Data Governance Professionals Vendors
HOW THIS HELPS
@ODPiOrg
ROADMAP
March April May June July August September
Data Governance PMC meets weekly
• Focus of meetings are to develop the
open metadata usage guidelines, best
practices, connector descriptions
• Two threads every other week on the
PMC
• Thread 1 : Compliance tools and packs
• Thread 2 : Practitioner - Subject matter
experts
• Learn more at
https://lists.odpi.org/g/odpi-pmc-
datagovernance
Strata,
San Jose
Dataworks
Summit,
Berlin
IBM Think,
Las Vegas Webinar for
Offering
Managers
Webinar for
Developers
Privacy Pack
GA
Apache Atlas
1.0 GA
Releases upcoming
• Privacy pack due in June
(https://jira.odpi.org/browse/DG-3)
• Apache Atlas 1.0 GA to support
work due in late June
(https://cwiki.apache.org/confluenc
e/display/ATLAS/Open+Metadata+
and+Governance)
Future work
• Metadata tools and solutions will
integrate through the open
metadata interfaces
• Integrated solutions and products
with the open metadata interfaces
Dataworks
Summit,
San Jose
Apache Atlas
1.0 beta
Strata,
NYC
@ODPiOrg
18
ODPi – A NEUTRAL HOME FOR COLLABORATION
FOUNDATIONS ENABLE TRUSTED
INNOVATION
Successful Projects depend
on members, developers,
infrastructure to develop
technology, which is turned
into products that the
market will adopt.
Ecosystem
GET INVOLVED WITH ODPi DATA GOVERNANCE
Have your organization support ODPi
https://www.odpi.org/about/join
Visit ODPi website and join the quarterly newsletter
https://www.odpi.org/
Learn more about Data Governance PMC
https://www.odpi.org/projects/data-governance-pmc
Join the Data Governance PMC Mailing List
https://lists.odpi.org/g/odpi-pmc-datagovernance
@ODPiOrg
z
zz
z
z
z
z
Questions?
@ODPiOrg

Más contenido relacionado

La actualidad más candente

IBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionIBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionTorsten Steinbach
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...DataWorks Summit
 
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasGDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasDataWorks Summit
 
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesOracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesSteven Feuerstein
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...DataWorks Summit/Hadoop Summit
 
Building Audi’s enterprise big data platform
Building Audi’s enterprise big data platformBuilding Audi’s enterprise big data platform
Building Audi’s enterprise big data platformDataWorks Summit
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...DataWorks Summit
 
Manage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositoryManage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositorySynaltic Group
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationDataWorks Summit
 
Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...DataWorks Summit
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopDataWorks Summit
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Hortonworks
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboardDataWorks Summit
 
Highly configurable and extensible data processing framework at PubMatic
Highly configurable and extensible data processing framework at PubMaticHighly configurable and extensible data processing framework at PubMatic
Highly configurable and extensible data processing framework at PubMaticDataWorks Summit
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...DataWorks Summit
 
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...DataWorks Summit
 
Multi-tenant Hadoop - the challenge of maintaining high SLAS
Multi-tenant Hadoop - the challenge of maintaining high SLASMulti-tenant Hadoop - the challenge of maintaining high SLAS
Multi-tenant Hadoop - the challenge of maintaining high SLASDataWorks Summit
 

La actualidad más candente (20)

IBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query IntroductionIBM THINK 2018 - IBM Cloud SQL Query Introduction
IBM THINK 2018 - IBM Cloud SQL Query Introduction
 
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
Artificial Intelligence and Analytic Ops to Continuously Improve Business Out...
 
Hadoop and other animals
Hadoop and other animalsHadoop and other animals
Hadoop and other animals
 
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache AtlasGDPR-focused partner community showcase for Apache Ranger and Apache Atlas
GDPR-focused partner community showcase for Apache Ranger and Apache Atlas
 
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community SitesOracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
Oracle PL/SQL 12c and 18c New Features + RADstack + Community Sites
 
Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...Automatic Detection, Classification and Authorization of Sensitive Personal D...
Automatic Detection, Classification and Authorization of Sensitive Personal D...
 
Building Audi’s enterprise big data platform
Building Audi’s enterprise big data platformBuilding Audi’s enterprise big data platform
Building Audi’s enterprise big data platform
 
An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...An architecture for federated data discovery and lineage over on-prem datasou...
An architecture for federated data discovery and lineage over on-prem datasou...
 
Manage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositoryManage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repository
 
Security, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software IntegrationSecurity, ETL, BI & Analytics, and Software Integration
Security, ETL, BI & Analytics, and Software Integration
 
Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...Enterprise large scale graph analytics and computing base on distribute graph...
Enterprise large scale graph analytics and computing base on distribute graph...
 
The convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on HadoopThe convergence of reporting and interactive BI on Hadoop
The convergence of reporting and interactive BI on Hadoop
 
Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015Data Governance - Atlas 7.12.2015
Data Governance - Atlas 7.12.2015
 
Compute-based sizing and system dashboard
Compute-based sizing and system dashboardCompute-based sizing and system dashboard
Compute-based sizing and system dashboard
 
HDP Next: Governance
HDP Next: GovernanceHDP Next: Governance
HDP Next: Governance
 
Highly configurable and extensible data processing framework at PubMatic
Highly configurable and extensible data processing framework at PubMaticHighly configurable and extensible data processing framework at PubMatic
Highly configurable and extensible data processing framework at PubMatic
 
Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...Adding structure to your streaming pipelines: moving from Spark streaming to ...
Adding structure to your streaming pipelines: moving from Spark streaming to ...
 
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
 
IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning IOT, Streaming Analytics and Machine Learning
IOT, Streaming Analytics and Machine Learning
 
Multi-tenant Hadoop - the challenge of maintaining high SLAS
Multi-tenant Hadoop - the challenge of maintaining high SLASMulti-tenant Hadoop - the challenge of maintaining high SLAS
Multi-tenant Hadoop - the challenge of maintaining high SLAS
 

Similar a The Rise of Big Data Governance: Insight on this Emerging Trend from Active Open Source Initiatives

The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...DataWorks Summit
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4Nigel Jones
 
Technical Challenges in Open Metadata
Technical Challenges in Open MetadataTechnical Challenges in Open Metadata
Technical Challenges in Open MetadataAll Things Open
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsFredReynolds2
 
Become an data driven organization through unified metadata using ODPi Egeria
Become an data driven organization through unified metadata using ODPi EgeriaBecome an data driven organization through unified metadata using ODPi Egeria
Become an data driven organization through unified metadata using ODPi EgeriaData Con LA
 
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsEnterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsDenodo
 
Master Meta Data
Master Meta DataMaster Meta Data
Master Meta DataDigikrit
 
MarkLogic Semantic use cases
MarkLogic Semantic use cases MarkLogic Semantic use cases
MarkLogic Semantic use cases Fernando Mesa
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderDataconomy Media
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchSheetal Pratik
 
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData Inc.
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...DataWorks Summit
 
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudArtificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudJuarez Junior
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesDenodo
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Rittman Analytics
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningProvectus
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Prof.Balakrishnan S
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...Denodo
 

Similar a The Rise of Big Data Governance: Insight on this Emerging Trend from Active Open Source Initiatives (20)

The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...The rise of big data governance: insight on this emerging trend from active o...
The rise of big data governance: insight on this emerging trend from active o...
 
Apache atlas sydney 2017-v4
Apache atlas   sydney 2017-v4Apache atlas   sydney 2017-v4
Apache atlas sydney 2017-v4
 
Technical Challenges in Open Metadata
Technical Challenges in Open MetadataTechnical Challenges in Open Metadata
Technical Challenges in Open Metadata
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential Tools
 
Become an data driven organization through unified metadata using ODPi Egeria
Become an data driven organization through unified metadata using ODPi EgeriaBecome an data driven organization through unified metadata using ODPi Egeria
Become an data driven organization through unified metadata using ODPi Egeria
 
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data AssetsEnterprise Data Marketplace: A Centralized Portal for All Your Data Assets
Enterprise Data Marketplace: A Centralized Portal for All Your Data Assets
 
Master Meta Data
Master Meta DataMaster Meta Data
Master Meta Data
 
MarkLogic Semantic use cases
MarkLogic Semantic use cases MarkLogic Semantic use cases
MarkLogic Semantic use cases
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
LinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbenchLinkedInSaxoBankDataWorkbench
LinkedInSaxoBankDataWorkbench
 
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot ProgramszData BI & Advanced Analytics Platform + 8 Week Pilot Programs
zData BI & Advanced Analytics Platform + 8 Week Pilot Programs
 
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
Streamline Data Governance with Egeria: The Industry's First Open Metadata St...
 
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science CloudArtificial Intelligence and Machine Learning with the Oracle Data Science Cloud
Artificial Intelligence and Machine Learning with the Oracle Data Science Cloud
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Virtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & BénéficesVirtualisation de données : Enjeux, Usages & Bénéfices
Virtualisation de données : Enjeux, Usages & Bénéfices
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
 
Feature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine LearningFeature Store as a Data Foundation for Machine Learning
Feature Store as a Data Foundation for Machine Learning
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc...
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Último (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

The Rise of Big Data Governance: Insight on this Emerging Trend from Active Open Source Initiatives

  • 1. The rise of big data governance: Insight on this emerging trend from active open source initiatives April 18, 2018 – Dataworks Summit Berlin 2018
  • 2. @ODPiOrg TODAY’S SPEAKERS 2 John Mertic, Director of Program Management, Linux Foundation Maryna Strelchuk, Information Architect and Application Developer at ING
  • 3. @ODPiOrg IMAGINE … An enterprise data catalogue that lists all of your data, where it is located, its origin (lineage), owner, structure, meaning, classification and quality No matter where the data resides Search
  • 4. @ODPiOrg New tools from any vendor connect to your data catalogue out of the box No vendor lock-in and no expensive population of yet another proprietary, siloed metadata repository Search Open Metadata Management & Governance IMAGINE …
  • 5. @ODPiOrg Metadata is added automatically to the catalogue as new data is created Databases Applications Function Function Functions Files It’s possible if data-driven enterprises collaborate to build it Let’s talk about how IMAGINE …
  • 6. @ODPiOrg • The Metadata Problem • Building an Open Ecosystem • Benefits for Data Governance Professionals AGENDA
  • 7. @ODPiOrg 1.Use data outside the application that created it 2.Find the right data sets 3.Automate governance processes WHY DO WE NEED METADATA?
  • 8. @ODPiOrg • Many data platforms do not have metadata support • Proprietary tools support a limited range of data sources and governance actions • Expensive efforts to create an enterprise data catalogue TODAY’S REALITY
  • 10. @ODPiOrg i. The maintenance of metadata must be automated ii. Metadata management must become ubiquitous iii. Metadata access must become open and remotely accessible iv. Metadata should be used to drive the governance of data v. Wherever possible, discovery and maintenance of metadata has to an integral part of all tools that access, change and move information. 10 METADATA GOVERNANCE MANIFESTO
  • 11. @ODPiOrg Open and Unified Metadata Atlas Metadata repository IBM Metadata repository Custom Metadata repository Open Metadata Repository Service Open Metadata Access Service Open and Unified Metadata WHAT NEEDS TO CHANGE
  • 12. @ODPiOrg Update to Apache Atlas 12 Automation Capture of metadata from data platforms, data movement engines and data protection engines. Exception management and stewardship Business Value Specialized services for key data roles such as CDO, Data Scientist, Developer, DevOps Operator, Asset Owner, Applications Connectivity Metadata Highway offering open metadata exchange, linking and federation between heterogeneous metadata repositories.
  • 13. @ODPiOrg Open and Unified Metadata Atlas Metadata repository IBM Metadata repository Microsoft SSAS Open Metadata Repository Service OMAS Open and Unified Metadata CURRENTLY IN DEVELOPMENT Information View Asset CatalogSubject Area Catalog Search UI Power BI
  • 15. @ODPiOrg Good metadata enables subject matter experts to collaborate around the data Locate the data they need, quickly and efficiently Feeding back their knowledge about the data and the uses they have made about it to help others and support economic evaluation of data CO-CREATION WITH PRACTITIONERS
  • 16. @ODPiOrg Your governance program if based on established definitions Allow a broader range of tools in your organization Automated governance processes protect and manage your data Metadata-driven access control Auditing, metering and monitoring Quality control and exception management Rights management Your metadata offerings will deliver value faster as they tap into metadata collected by other vendor’s tools. ODPi packages extend your metadata system’s and tools’ capabilities Conformance tests minimize your effort in being compliant with key standards and regulations. Customers have increased confidence in your tools and services due to ODPi certification. Data Governance Professionals Vendors HOW THIS HELPS
  • 17. @ODPiOrg ROADMAP March April May June July August September Data Governance PMC meets weekly • Focus of meetings are to develop the open metadata usage guidelines, best practices, connector descriptions • Two threads every other week on the PMC • Thread 1 : Compliance tools and packs • Thread 2 : Practitioner - Subject matter experts • Learn more at https://lists.odpi.org/g/odpi-pmc- datagovernance Strata, San Jose Dataworks Summit, Berlin IBM Think, Las Vegas Webinar for Offering Managers Webinar for Developers Privacy Pack GA Apache Atlas 1.0 GA Releases upcoming • Privacy pack due in June (https://jira.odpi.org/browse/DG-3) • Apache Atlas 1.0 GA to support work due in late June (https://cwiki.apache.org/confluenc e/display/ATLAS/Open+Metadata+ and+Governance) Future work • Metadata tools and solutions will integrate through the open metadata interfaces • Integrated solutions and products with the open metadata interfaces Dataworks Summit, San Jose Apache Atlas 1.0 beta Strata, NYC
  • 18. @ODPiOrg 18 ODPi – A NEUTRAL HOME FOR COLLABORATION
  • 19. FOUNDATIONS ENABLE TRUSTED INNOVATION Successful Projects depend on members, developers, infrastructure to develop technology, which is turned into products that the market will adopt. Ecosystem
  • 20. GET INVOLVED WITH ODPi DATA GOVERNANCE Have your organization support ODPi https://www.odpi.org/about/join Visit ODPi website and join the quarterly newsletter https://www.odpi.org/ Learn more about Data Governance PMC https://www.odpi.org/projects/data-governance-pmc Join the Data Governance PMC Mailing List https://lists.odpi.org/g/odpi-pmc-datagovernance

Notas del editor

  1. Metadata enables data to be used outside of the application that created it. Analytics and decision making New business applications Reporting and compliance Metadata describes the format and content of data allowing people to judge which data set to use for a new project Structure Meaning Origin Valid values and quality Usage and ownership Regulations and classifications that apply <more> Metadata describes the business context and classification of data allowing automated governance processes to operate.
  2. Many data platforms do not have metadata support Proprietary tools support a range of data sources and governance actions No-one supports everything you need and assumes all tools come from their suite Each tool starts “empty” requiring effort to populate metadata Each tool operates as if it is the only tool No integration/interoperability of metadata repositories from different vendors Expensive efforts to create an enterprise data catalogue
  3. The maintenance of metadata must be automated to scale to the sheer volumes and variety of data involved in modern business.   Metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop so that the processing engines on these platforms can rely on its availability and build capability around it. Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata and standard interfaces for manipulating metadata. Metadata should be used to drive the governance of data and create a business friendly logical interface to the data landscape. Wherever possible, discovery and maintenance of metadata has to an integral part of all tools that access, change and move information.
  4. Code development and standards development relationship
  5. ODPi, a Linux Foundation Project, can provide the platform for industry collaboration on shared technology In pursuit of its mission to make Apache Hadoop and associated Big Data solutions ready for enterprise-wide deployment, ODPi is focused on the biggest hurdles In 2016, the largest hurdles were cross-distro harmonization Today, a key blocker to broad-based production use of Big Data is Governance
  6. Mention that individuals can get involved.