CDAO Fall Boston
October 27th-28th, 2021
Data Catalog and Metadata Management
- Is It a Fad or Is It the Future?
Presented by Srinivasan Sankar
Disclaimer
Please note this presentation is for general informational and discussion
purposes only. The opinions expressed in this presentation are those of the
presenter and not necessarily those of their employer. The presenter does not
guarantee the accuracy or reliability of the information provided herein. No
representation or warranty, express or implied, is provided by the presenter in
relation to the fairness, accuracy, correctness, completeness or reliability of
the information, opinions or conclusions expressed herein. The information
contained in this presentation is subject to change without notice. This
presentation does not, and is not designed to, provide legal advice and
meeting participants should consult with an attorney concerning the use of
data to ensure all legal and regulatory requirements are satisfied.
AGENDA TOPICS
• Are Data Catalogs / Metadata “the new black”?
• Leverage metadata management to streamline data governance and ensure
transparency
• Improve insights by extracting value from unstructured data utilizing a machine
learning augmented data catalog
• Let the insights come to you with AI-augmentation
• Practical steps to deal with the onslaught of data and learn how to implement an
effective data catalog
• Multi-source data to increase the potential of data value
• Data Catalog – key enabler of a Data Mesh
Definition
A data catalog creates and maintains an
inventory of data assets through the
discovery, description and organization of
distributed datasets. The data catalog
provides context to enable data stewards,
data/business analysts, data engineers, data
scientists and other line of business (LOB) data
consumers to find and understand
relevant datasets for the purpose of extracting
business value.
In a nutshell, a data catalog is a place that shows what data assets you have and where they are
located. You might be asking, what is a data asset? It is any entity (e.g., reports, databases,
websites) that contains data.
Data Catalogs Are the “New Black” in Data Management and Analytics
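As a rough illustration of the definition above, a catalog can be pictured as an inventory that answers "what do we have?" and "where is it?". The sketch below is a minimal in-memory model, not any product's API; the class names, asset names and locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One catalog entry: what the asset is and where it lives (hypothetical model)."""
    name: str
    asset_type: str   # e.g. "report", "database", "website"
    location: str     # where the asset is located
    description: str = ""


@dataclass
class DataCatalog:
    """An in-memory inventory of data assets, keyed by asset name."""
    assets: dict[str, DataAsset] = field(default_factory=dict)

    def register(self, asset: DataAsset) -> None:
        """Add or update an asset in the inventory."""
        self.assets[asset.name] = asset

    def locate(self, name: str) -> str:
        """Answer 'where is it?' for a registered asset."""
        return self.assets[name].location

    def inventory(self) -> list[str]:
        """Answer 'what do we have?' as a sorted list of asset names."""
        return sorted(self.assets)


# Illustrative assets only; the names and locations are made up.
catalog = DataCatalog()
catalog.register(DataAsset("claims_db", "database", "postgres://warehouse/claims"))
catalog.register(DataAsset("loss_ratio_report", "report", "bi://finance/loss-ratio"))
print(catalog.inventory())
print(catalog.locate("claims_db"))
```

A real catalog would persist these entries and populate them through automated discovery rather than manual registration.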
METADATA / CATALOG ROLE
IN DATA MANAGEMENT
• To understand Metadata’s vital role in data
management, imagine a large library, with
hundreds of thousands of books and
magazines, but no card catalog. Without a card
catalog, readers might not even know how to
start looking for a specific book or even a
specific topic. The card catalog not only
provides the necessary information (which
books and materials the library owns and
where they are shelved), it also enables
patrons to find materials using different
starting points (subject area, author, or title).
Without the catalog, finding a specific book
would be difficult if not impossible.
An organization without Metadata is like a library without a card catalog
THE CASE FOR DATA CATALOGS
Analyze Data, not chase Data – Many data scientists spend over two-thirds of their time finding and
understanding the data. The main reason for this problem in an organization is a poor mechanism for handling
and tracking all the data. A good Catalog helps the Data Scientist or Business Analyst understand the
data and answer the questions they have.
Efficient Access Control – As an organization grows, role-based policies are needed, because not
everybody should be able to modify the data. Access Control should be implemented while building the Data Lake.
Roles are assigned to the users, and Data Access is controlled according to those roles.
Eliminate Data Redundancies – A good Catalog tool helps us find data redundancies and
eliminate them. This can save storage costs and data management costs.
To follow Laws – There are different protection laws to follow depending on the data, such as GDPR, BASEL,
GDSN, HIPAA, and many more. These laws must be followed while dealing with any data. But these laws
cover different use cases and do not apply to every data set; to understand which ones apply, we need to know about
the data set. A good Catalog helps us make sure that Data Compliance is followed by giving a view of
Data Lineage and by using Access Control.
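The role-based access control described in this section can be sketched as a lookup from user to role and from role to permitted actions. This is a minimal illustration under assumed names: the roles, users and permissions below are hypothetical, and a real deployment would source them from the catalog or an identity provider.

```python
# Hypothetical role-to-permission mapping (illustrative only).
ROLE_PERMISSIONS = {
    "data_steward": {"read", "write"},
    "analyst": {"read"},
}

# Hypothetical user-to-role assignments.
USER_ROLES = {"asha": "data_steward", "ben": "analyst"}


def is_allowed(user: str, action: str) -> bool:
    """Return True if the user's assigned role grants the requested action.

    Users without a role are denied every action.
    """
    role = USER_ROLES.get(user)
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("asha", "write"))   # steward may modify data
print(is_allowed("ben", "write"))    # analyst may only read
print(is_allowed("carol", "read"))   # unknown user has no role
```

Denying by default for unknown users keeps the check safe when a role assignment is missing.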
Phase 1: Catalog and Lineage
• Infrastructure and installation of Catalog tool
• Data Architects to initiate the collection of data assets, catalog and identify lineage
Phase 2: Data Stewardship, Business Glossary
• Appoint part-time Governance Lead role (cross-functional, business facing)
• Supporting Analyst
• Manage Governance activities
Phase 3: Operationalize Governance activities
• Accountability, Ownership of Data
• Operationalize Data Governance activities
• Report Metrics
• Iterate activities for all information / data projects
Improve / Enhance Data Governance
HOW TO ADOPT DATA CATALOGS INTO GOVERNANCE
[Diagram: Enterprise Data Asset Catalog, phased approach. Core Foundation. Stages: Plan, Organize, Define, Deploy. Tracks: Establish Data Governance; Sustain Data Governance; Manage Data Lifecycle. Activities: Communicate; Manage Return On Investment; Maintain Organization & Sponsorship; Review/Update Processes; Review/Update Scope (Quarterly Workshop); Business Change Management; Review & Approve New Projects; Maintain Data Definitions; Maintain Metrics; Identify Data Stewards; Conflict Resolution, Escalation]
Data Cataloging is a journey…
DATA CATALOG BEST PRACTICES
Assigning Ownership for the data set – Ownership of
each data set must be defined. There must be a person
whom users can contact in case of an issue. A good
Catalog must also identify the owner of every data set.
Human Touch – After a Catalog is built, users must
verify the data set entries to make them more accurate.
Searchability – The Catalog should support searchability.
Searchability enables Data Asset Discovery: data
consumers easily find assets that meet their needs.
Data Protection – Define Access policies to prevent
unauthorized data access.
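The ownership and searchability practices above can be sketched together: each entry carries an owner to contact, and a keyword search returns matching entries. This is a simplified illustration; the entry names, owners and tags are hypothetical, and production catalogs typically use a full-text index rather than substring matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A catalog record with a named owner (hypothetical fields)."""
    name: str
    owner: str                    # person to contact about this data set
    description: str
    tags: tuple[str, ...] = ()


def search(entries: list[CatalogEntry], query: str) -> list[CatalogEntry]:
    """Case-insensitive substring match over name, description and tags."""
    needle = query.lower()
    return [
        e for e in entries
        if needle in e.name.lower()
        or needle in e.description.lower()
        or any(needle in t.lower() for t in e.tags)
    ]


# Illustrative entries only.
entries = [
    CatalogEntry("policy_master", "j.doe@example.com", "Active insurance policies", ("policy",)),
    CatalogEntry("fleet_gps", "a.roe@example.com", "GPS fleet tracking pings", ("telematics",)),
]

for hit in search(entries, "gps"):
    print(hit.name, "->", hit.owner)
```

Returning the whole entry, rather than just its name, means every search result already tells the consumer whom to contact.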
HIGH ROI FOR MULTI-SOURCE DATA WITH DATA CATALOG
External data empowers teams to make better data-driven decisions, especially when it’s integrated with first-party data.
Single-source data has value in relation to other data in the organization, and the ability to search and analyze across
multiple information sources provides tremendous insight.
[Graphic. Source: CEB analysis. Labels: Weather, Highway safety, Industry; Driving Tracker, Nest Protect, GPS Fleet Tracking; Traditional DW; Enterprise Data Integration and Data Lake; DATA CATALOG]
What is it?
AI-powered process for curating, verifying, and classifying data that enhances speed and usability
Applicable to large, complex and often streaming data sets: 3rd party data, sensor data, customer
data, transactions
How does it work?
Use algorithms (advanced statistics and deep learning) to learn from large-scale data to
identify patterns, business rules, and quality issues and anomalies across massive, complex and
often streaming data sets:
• Algorithmic sampling of data to identify key patterns and business rules
• Continuous monitoring to alert Data Stewards of exceptions for timely resolution
• Correlation of data concepts across domains and data sources to track usage and establish
lineage
• Ability to ingest and apply quality rules to third party and unstructured data sources
• Establishes feedback loop that refines the machine learning models to improve data quality
over time
BUILD AN INTELLIGENT DATA CATALOG BY
INTEGRATING ARTIFICIAL INTELLIGENCE INTO IT
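The continuous monitoring idea on this slide can be illustrated with a very simple statistical check that flags values far from a learned baseline. The sketch below uses a plain z-score as a stand-in for the advanced statistics or deep learning the slide refers to; the function name, the threshold and the row counts are hypothetical.

```python
from statistics import mean, stdev


def find_exceptions(baseline: list[float], incoming: list[float],
                    threshold: float = 3.0) -> list[float]:
    """Return incoming values more than `threshold` sample standard
    deviations away from the baseline mean (a simple z-score rule)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A constant baseline: anything different counts as an exception.
        return [x for x in incoming if x != mu]
    return [x for x in incoming if abs(x - mu) / sigma > threshold]


# Illustrative daily row counts for one feed (made-up numbers).
baseline_row_counts = [1000, 1020, 980, 1010, 990]
todays_batches = [1005, 400, 995]

# Values returned here would be routed to a Data Steward for resolution.
alerts = find_exceptions(baseline_row_counts, todays_batches)
print(alerts)
```

A feedback loop of the kind the slide describes could then fold steward decisions back into the baseline or the threshold.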
DATA CATALOG
THE NUCLEI OF A DATA MESH*
• A data product must be easily discoverable,
especially through a data catalogue that records its meta
information such as owners, source of origin,
lineage, sample datasets, etc. This centralized
discoverability service allows data consumers,
engineers and scientists in an organization to find
a dataset of interest easily. Each domain data
product must register itself with this centralized
data catalogue for easy discoverability.
• Note the perspective shift here is from a single
platform extracting and owning the data for its use,
to each domain providing its data as a product in a
discoverable fashion.
• Data catalog platforms provide central
discoverability, access control and governance of
distributed domain datasets.
*Data Mesh (concept founded by Zhamak Dehghani) is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments - within or across organizations
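The registration pattern described on this slide, where each domain publishes its data product to a central catalogue while keeping ownership, might be sketched as follows. This is an assumption-laden illustration: the class names, fields, domains and product names are hypothetical and do not come from any data mesh platform.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Metadata a domain publishes about one of its data products (hypothetical)."""
    name: str
    domain: str
    owner: str
    source_of_origin: str
    lineage: list[str] = field(default_factory=list)  # upstream product keys


class CentralCatalog:
    """Central discoverability service; domains keep ownership of their data."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        """Register a product under a '<domain>.<name>' key; reject duplicates."""
        key = f"{product.domain}.{product.name}"
        if key in self._products:
            raise ValueError(f"{key} is already registered")
        self._products[key] = product

    def discover(self, domain: str) -> list[str]:
        """List the registered product keys for one domain, sorted."""
        return sorted(k for k, p in self._products.items() if p.domain == domain)


# Illustrative registrations only.
catalog = CentralCatalog()
catalog.register(DataProduct("claims", "insurance", "claims-team", "claims_db"))
catalog.register(DataProduct("fraud_scores", "insurance", "fraud-team",
                             "ml_pipeline", lineage=["insurance.claims"]))
print(catalog.discover("insurance"))
```

Only metadata is centralized here; the data itself stays with the owning domain, which matches the perspective shift the slide describes.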
QUESTIONS?
http://www.linkedin.com/in/srinisankar
https://twitter.com/srinisankar

Chief Data & Analytics Officer Fall Boston - Presentation

  • 1. CDAO Fall Boston October 27th- 28th, 2021 Data Catalog and Metadata Management - Is It a Fad or Is It the Future? Presented by Srinivasan Sankar
  • 2. Disclaimer Please note this presentation is for general informational and discussion purposes only. The opinions expressed in this presentation are those of the presenter and not necessarily those of their employer. The presenter does not guarantee the accuracy or reliability of the information provided herein. No representation or warranty, express or implied, is provided by the presenter in relation to the fairness, accuracy, correctness, completeness or reliability of the information, opinions or conclusions expressed herein. The information contained in this presentation is subject to change without notice. This presentation does not, and is not designed to, provide legal advice and meeting participants should consult with an attorney concerning the use of data to ensure all legal and regulatory requirements are satisfied.
  • 3. AGENDA TOPICS • Are Data Catalogs / Metadata “the new black”? • Leverage metadata management to streamline data governance and ensure transparency • Improve insights by extracting value from unstructured data utilizing a machine learning augmented data catalog • Let the insights come to you with AI-augmentation • Practical steps to deal with the onslaught of data and learn how to implement an effective data catalog • Multi-source data to increase the potential of data value • Data Catalog: key enabler of a Data Mesh
  • 4.
  • 5.
  • 6. Definition A data catalog creates and maintains an inventory of data assets through the discovery, description and organization of distributed datasets. The data catalog provides context to enable data stewards, data/business analysts, data engineers, data scientists and other line of business (LOB) data consumers to find and understand relevant datasets for the purpose of extracting business value. In a nutshell, a data catalog is a place that shows what data assets you have and where they are located. You might be asking, what is a data asset? It is any entity (e.g., reports, databases, websites) that contains data. Data Catalogs Are the “New Black” in Data Management and Analytics
  • 7. METADATA / CATALOG ROLE IN DATA MANAGEMENT • To understand Metadata’s vital role in data management, imagine a large library, with hundreds of thousands of books and magazines, but no card catalog. Without a card catalog, readers might not even know how to start looking for a specific book or even a specific topic. The card catalog not only provides the necessary information (which books and materials the library owns and where they are shelved) but also enables patrons to find materials using different starting points (subject area, author, or title). Without the catalog, finding a specific book would be difficult if not impossible. An organization without Metadata is like a library without a card catalog.
  • 8.
  • 9. THE CASE FOR DATA CATALOGS Analyze Data, don’t chase Data: Many data scientists spend over two-thirds of their time finding and understanding the data. The main reason for this problem is a poor mechanism for handling and tracking all of the organization’s data. A good Catalog helps the Data Scientist or Business Analyst understand the data and answer the questions they have. Efficient Access Control: As an organization grows, it needs role-based policies, because not everybody should be able to modify the data. Access Control should be implemented while building the Data Lake. Roles are assigned to the users, and Data Access is controlled according to those roles. Eliminate Data Redundancies: A good Catalog tool helps find data redundancies and eliminate them, which saves storage costs and data management costs. Follow Laws: Different laws apply depending on the data, such as GDPR, BASEL, GDSN, HIPAA, and many more, and they must be followed while dealing with any data. But these laws cover different use cases and do not apply to every data set; to know which ones apply, we need to know about the data set. A good Catalog helps make sure that Data Compliance is followed by giving a view of Data Lineage and by using Access Control.
  • 10. HOW TO ADOPT DATA CATALOGS INTO GOVERNANCE Phased approach: Data Cataloging is a journey. Phase 1, Catalog and Lineage: Infrastructure and Installation of Catalog tool; Data Architects to initiate the collection of data assets, catalog and identify lineage. Phase 2, Data Stewardship, Business Glossary: Appoint part-time Governance Lead role (cross-functional, business facing); Supporting Analyst; Manage Governance activities. Phase 3, Operationalize Governance activities: Accountability, Ownership of Data; Operationalize Data Governance activities; Report Metrics; Iterate activities for all information / data projects. [Diagram labels: Core Foundation; Enterprise Data Asset Catalog; Plan, Organize, Define, Deploy; Establish Data Governance; Sustain Data Governance; Improve / Enhance Data Governance; Manage Data Lifecycle; Communicate; Manage Return On Investment; Maintain Organization & Sponsorship; Review/Update Processes; Review/Update Scope (Quarterly Workshop); Business Change Management; Review & Approve New Projects; Maintain Data Definitions; Maintain Metrics; Identify Data Stewards; Conflict Resolution, Escalation]
  • 11. DATA CATALOG BEST PRACTICES Assigning Ownership for the data set: Ownership of each data set must be defined. There must be a person whom users can contact in case of an issue. A good Catalog must also identify the owner of every data set. Human Touch: After building a Catalog, the users must verify the data sets to make the entries more accurate. Searchability: The Catalog should support search. Searchability enables Data Asset Discovery, so data consumers can easily find assets that meet their needs. Data Protection: Define Access policies to prevent unauthorized data access.
  • 12. HIGH ROI FOR MULTI-SOURCE DATA WITH DATA CATALOG External data empowers teams to make better data-driven decisions, especially when it’s integrated with first-party data. Single-source data has value in relation to other data in the organization, and the ability to search and analyze across multiple information sources provides tremendous insight. [Diagram labels: DATA CATALOG; Traditional DW; Enterprise Data Integration and Data Lake; Weather, Highway safety; Industry; Driving Tracker; Nest Protect; GPS Fleet Tracking. Graphic Source: CEB analysis]
  • 13. BUILD AN INTELLIGENT DATA CATALOG BY INTEGRATING ARTIFICIAL INTELLIGENCE INTO IT What is it? AI powered process for curating, verifying, and classifying data that enhances speed and usability. Applicable to large, complex and often streaming data sets: 3rd party data, sensor data, customer data, transactions. How does it work? Use Algorithms (Advanced Statistics and Deep Learning) to learn from the large scale data to identify: patterns; quality issues and anomalies across massive, complex and often streaming data sets; business rules. • Algorithmic sampling of data to identify key patterns and business rules • Continuous monitoring to alert Data Stewards of exceptions for timely resolution • Correlation of data concepts across domains and data sources to track usage and establish lineage • Ability to ingest and apply quality rules to third party and unstructured data sources • Establishes feedback loop that refines the machine learning models to improve data quality over time
  • 14. DATA CATALOG: THE NUCLEUS OF A DATA MESH* • A data product must be easily discoverable, especially through a data catalogue, along with its meta information such as its owners, source of origin, lineage, sample datasets, etc. This centralized discoverability service allows data consumers, engineers and scientists in an organization to find a dataset of interest easily. Each domain data product must register itself with this centralized data catalogue for easy discoverability. • Note that the perspective shift here is from a single platform extracting and owning the data for its own use, to each domain providing its data as a product in a discoverable fashion. • Data catalog platforms provide central discoverability, access control and governance of distributed domain datasets. *Data Mesh (concept founded by Zhamak Dehghani) is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments, within or across organizations.
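
The ideas on the last few slides (each domain registers its data product with a central catalog, every data set has an owner, consumers search to discover assets, and access is controlled by role) can be sketched in a few lines of Python. This is a minimal in-memory illustration only, not the presenter's implementation or any vendor's API. The names DataProduct, DataCatalog, register, search, can_read and lineage are all hypothetical, as are the field names.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """One catalog entry. All field names are illustrative assumptions."""
    name: str
    domain: str
    owner: str                      # contact person for issues with this data set
    source: str                     # system of origin
    description: str = ""
    upstream: list = field(default_factory=list)      # names of parent data sets
    allowed_roles: set = field(default_factory=set)   # roles permitted to read


class DataCatalog:
    """Hypothetical central catalog: registration, search, access checks, lineage."""

    def __init__(self):
        self._entries = {}

    def register(self, product):
        # Ownership of each data set must be defined before it is listed.
        if not product.owner:
            raise ValueError(f"{product.name}: an owner is required")
        self._entries[product.name] = product

    def search(self, term):
        # Case-insensitive substring match over name, domain and description.
        term = term.lower()
        return sorted(
            p.name
            for p in self._entries.values()
            if term in f"{p.name} {p.domain} {p.description}".lower()
        )

    def can_read(self, name, role):
        # Role-based access: a role may read only if the entry lists it.
        return role in self._entries[name].allowed_roles

    def lineage(self, name):
        # Walk upstream references and return every ancestor once,
        # nearest first. Unregistered parents are reported but not expanded.
        seen = []
        stack = list(self._entries[name].upstream)
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue
            seen.append(parent)
            if parent in self._entries:
                stack.extend(self._entries[parent].upstream)
        return seen
```

A domain team would call register once per data product, and a consumer would call search with a keyword, then can_read with their role before requesting the data. A real catalog would persist entries and harvest metadata automatically; the dictionary here only stands in for that store.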