Today’s most forward-thinking enterprises have all been forced to face similar data challenges: a reliance on real-time data to better serve their customers and, as a consequence, the requirement to comply with regulations that protect that data, such as the General Data Protection Regulation (GDPR).
The solution to this emerging challenge is a tricky one. Companies like ING have met this data governance challenge with metadata, a consistent view across a large heterogeneous ecosystem, and collaboration with an active open source community.
In this joint presentation, John Mertic (Director of ODPi) and Ferd Scheepers (Global Chief Information Architect of ING) will address the benefits of a vendor-neutral approach to data governance and the need for an open metadata standard, and will share insight into how companies such as ING, IBM, Hortonworks and others are delivering solutions to this challenge as an open source initiative.
Audience Takeaways include:
Understand the role of metadata;
Understand the need for a cross technology view on metadata;
Understand the role of Apache Atlas as a reference implementation; and
Understand the role of ODPi in offering value-added services including certification.
Speaker
John Mertic, Director of Program Management for ODPi, R Consortium, and Open Mainframe Project, The Linux Foundation
3. @ODPiOrg
IMAGINE …
An enterprise data catalogue that lists all
of your data, where it is located, its origin
(lineage), owner, structure, meaning,
classification and quality
No matter where the data resides
4. @ODPiOrg
New tools from any vendor connect to your data catalogue out of the box
No vendor lock-in and no expensive population of yet another proprietary, siloed
metadata repository
Open Metadata Management & Governance
IMAGINE …
5. @ODPiOrg
Metadata is added automatically to the catalogue as new
data is created
[Diagram labels: Databases, Applications, Functions, Files]
It’s possible if data-driven enterprises collaborate to build it
Let’s talk about how
IMAGINE …
6. @ODPiOrg
• The Metadata Problem
• Building an Open Ecosystem
• Benefits for Data Governance Professionals
AGENDA
7. @ODPiOrg
1. Use data outside the application that created it
2. Find the right data sets
3. Automate governance processes
WHY DO WE NEED METADATA?
8. @ODPiOrg
• Many data platforms do not
have metadata support
• Proprietary tools support a
limited range of data sources
and governance actions
• Expensive efforts to create
an enterprise data catalogue
TODAY’S REALITY
10. @ODPiOrg
i. The maintenance of metadata must be automated
ii. Metadata management must become ubiquitous
iii. Metadata access must become open and remotely accessible
iv. Metadata should be used to drive the governance of data
v. Wherever possible, discovery and maintenance of metadata has to be an integral
part of all tools that access, change and move information.
METADATA GOVERNANCE MANIFESTO
12. @ODPiOrg
Update to Apache Atlas
Automation
Capture of metadata from data platforms, data
movement engines and data protection engines.
Exception management and stewardship
Business Value
Specialized services for key data roles such as CDO,
Data Scientist, Developer, DevOps Operator, Asset
Owner, Applications
Connectivity
Metadata Highway offering open metadata exchange,
linking and federation between heterogeneous
metadata repositories.
15. @ODPiOrg
Good metadata enables subject matter experts to
collaborate around the data
Locate the data they need, quickly and efficiently
Feeding back their knowledge about the data and the uses
they have made of it to help others and support
economic evaluation of data
CO-CREATION WITH PRACTITIONERS
16. @ODPiOrg
Data Governance Professionals
Your governance program is based on established
definitions
Allow a broader range of tools in your organization
Automated governance processes protect and
manage your data
Metadata-driven access control
Auditing, metering and monitoring
Quality control and exception management
Rights management
Vendors
Your metadata offerings will deliver value faster as
they tap into metadata collected by other vendors’
tools.
ODPi packages extend your metadata system’s
and tools’ capabilities
Conformance tests minimize your effort in being
compliant with key standards and regulations.
Customers have increased confidence in your
tools and services due to ODPi certification.
HOW THIS HELPS
17. @ODPiOrg
ROADMAP
March April May June July August September
Data Governance PMC meets weekly
• The focus of the meetings is to develop the
open metadata usage guidelines, best
practices and connector descriptions
• Two threads every other week on the
PMC
• Thread 1 : Compliance tools and packs
• Thread 2 : Practitioner - Subject matter
experts
• Learn more at
https://lists.odpi.org/g/odpi-pmc-datagovernance
Strata, San Jose
Dataworks Summit, Berlin
IBM Think, Las Vegas
Webinar for Offering Managers
Webinar for Developers
Privacy Pack GA
Apache Atlas 1.0 GA
Releases upcoming
• Privacy pack due in June
(https://jira.odpi.org/browse/DG-3)
• Apache Atlas 1.0 GA to support
work due in late June
(https://cwiki.apache.org/confluence/display/ATLAS/Open+Metadata+and+Governance)
Future work
• Metadata tools and solutions will
integrate through the open
metadata interfaces
• Integrated solutions and products
with the open metadata interfaces
Dataworks Summit, San Jose
Apache Atlas 1.0 beta
Strata, NYC
19. FOUNDATIONS ENABLE TRUSTED
INNOVATION
Successful projects depend
on members, developers and
infrastructure to develop
technology, which is turned
into products that the
market will adopt.
Ecosystem
20. GET INVOLVED WITH ODPi DATA GOVERNANCE
Have your organization support ODPi
https://www.odpi.org/about/join
Visit ODPi website and join the quarterly newsletter
https://www.odpi.org/
Learn more about Data Governance PMC
https://www.odpi.org/projects/data-governance-pmc
Join the Data Governance PMC Mailing List
https://lists.odpi.org/g/odpi-pmc-datagovernance
Metadata enables data to be used outside of the application that created it.
Analytics and decision making
New business applications
Reporting and compliance
Metadata describes the format and content of data, allowing people to judge which data set to use for a new project
Structure
Meaning
Origin
Valid values and quality
Usage and ownership
Regulations and classifications that apply
<more>
Metadata describes the business context and classification of data, allowing automated governance processes to operate.
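As a rough illustration of these three uses, the sketch below models a catalogue entry that carries structure, origin, ownership and classifications, a keyword search over the catalogue, and an access check driven by the classification instead of hard-coded rules. Every name here (`CatalogueEntry`, `find_data_sets`, `may_read`, the sample data sets and the "personal-data" label) is hypothetical and chosen for illustration; none of it is taken from ODPi or Apache Atlas.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical catalogue record; field names are illustrative only."""
    name: str
    location: str
    owner: str
    origin: str                                  # lineage: where the data came from
    columns: Tuple[str, ...]                     # structure
    classifications: FrozenSet[str] = field(default_factory=frozenset)


def find_data_sets(catalogue, keyword):
    """Return entries whose name or column names contain the keyword."""
    keyword = keyword.lower()
    return [
        entry for entry in catalogue
        if keyword in entry.name.lower()
        or any(keyword in column.lower() for column in entry.columns)
    ]


def may_read(entry, user_clearances):
    """Metadata-driven access check: every classification on the entry
    must be covered by one of the user's clearances."""
    return entry.classifications <= set(user_clearances)


catalogue = [
    CatalogueEntry(
        name="customer_accounts",
        location="warehouse.sales.customer_accounts",
        owner="retail-banking",
        origin="core banking system",
        columns=("customer_id", "email", "balance"),
        classifications=frozenset({"personal-data"}),
    ),
    CatalogueEntry(
        name="branch_opening_hours",
        location="files/branches.csv",
        owner="facilities",
        origin="branch management application",
        columns=("branch_id", "opens", "closes"),
    ),
]

matches = find_data_sets(catalogue, "customer")
print([entry.name for entry in matches])
print(may_read(catalogue[0], {"personal-data"}))
```

The point of the design is that the governance rule reads only the metadata: reclassifying a data set in the catalogue changes who may read it without touching the code that enforces the rule.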
Many data platforms do not have metadata support
Proprietary tools support a limited range of data sources and governance actions
No single vendor supports everything you need, yet each assumes all tools come from its suite
Each tool starts “empty” requiring effort to populate metadata
Each tool operates as if it is the only tool
No integration/interoperability of metadata repositories from different vendors
Expensive efforts to create an enterprise data catalogue
The maintenance of metadata must be automated to scale to the sheer volumes and variety of data involved in modern business.
Metadata management must become ubiquitous in cloud platforms and large data platforms, such as Apache Hadoop, so that the processing engines on these platforms can rely on its availability and build capability around it.
Metadata access must become open and remotely accessible so that tools from different vendors can work with metadata located on different platforms. This implies unique identifiers for metadata elements, some level of standardization in the types and formats for metadata and standard interfaces for manipulating metadata.
Metadata should be used to drive the governance of data and create a business friendly logical interface to the data landscape.
Wherever possible, discovery and maintenance of metadata has to be an integral part of all tools that access, change and move information.
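The third principle (unique identifiers, some standardization of types, and standard interfaces) can be pictured with a minimal sketch. It assumes a shared `MetadataRepository` interface that any vendor's repository could implement, with elements identified by a generated GUID, and a `FederatedView` that queries several repositories through that interface. All class and method names are hypothetical; this is not the Apache Atlas or ODPi API, only an illustration of the idea.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


@dataclass
class MetadataElement:
    """Hypothetical element: an agreed type name plus a unique identifier."""
    type_name: str                       # e.g. "Table" or "GlossaryTerm"
    properties: Dict[str, str]
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))


class MetadataRepository(Protocol):
    """Assumed common interface that each repository would expose."""
    def get(self, guid: str) -> Optional[MetadataElement]: ...
    def find_by_type(self, type_name: str) -> List[MetadataElement]: ...


class InMemoryRepository:
    """Stand-in for one vendor's or platform's metadata repository."""
    def __init__(self, name):
        self.name = name
        self._elements = {}

    def add(self, element):
        self._elements[element.guid] = element
        return element.guid

    def get(self, guid):
        return self._elements.get(guid)

    def find_by_type(self, type_name):
        return [e for e in self._elements.values() if e.type_name == type_name]


class FederatedView:
    """Answers queries by asking every repository through the shared interface."""
    def __init__(self, repositories):
        self._repositories = list(repositories)

    def get(self, guid):
        for repository in self._repositories:
            element = repository.get(guid)
            if element is not None:
                return element
        return None

    def find_by_type(self, type_name):
        found = []
        for repository in self._repositories:
            found.extend(repository.find_by_type(type_name))
        return found


hadoop_repo = InMemoryRepository("hadoop")
vendor_repo = InMemoryRepository("vendor-tool")
table_guid = hadoop_repo.add(MetadataElement("Table", {"name": "transactions"}))
vendor_repo.add(MetadataElement("Table", {"name": "customers"}))
vendor_repo.add(MetadataElement("GlossaryTerm", {"name": "Customer"}))

view = FederatedView([hadoop_repo, vendor_repo])
print(view.get(table_guid).properties["name"])
print(sorted(e.properties["name"] for e in view.find_by_type("Table")))
```

Because the caller only depends on the shared interface and on identifiers that are unique across repositories, a new tool can read metadata held on another platform without first copying it into its own store.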
Code development and standards development relationship
ODPi, a Linux Foundation Project, can provide the platform for industry collaboration on shared technology
In pursuit of its mission to make Apache Hadoop and associated Big Data solutions ready for enterprise-wide deployment, ODPi is focused on the biggest hurdles
In 2016, the largest hurdle was cross-distro harmonization
Today, a key blocker to broad-based production use of Big Data is Governance