IDMA 2021 Fall/Winter Conference
October 13th-14th, 2021
Data Catalog as a Business Enabler
Presented by Srinivasan Sankar
Disclaimer
Please note that the views expressed by our speakers are
their own and may not necessarily reflect those of their
respective employers.
This material is for general informational purposes only and
is not legal advice. It is not designed to be comprehensive,
and it may not apply to your particular facts and
circumstances.
TOPICS
• Improve insights by extracting value from unstructured data utilizing a machine
learning augmented data catalog
• Practical steps to deal with the onslaught of data and learn how to implement an
effective data catalog
• Overcoming data silos using intelligent tools
• Let the insights come to you with AI-augmentation
• Multi-source data to increase the potential of data value
• Data Catalog – key enabler of a Data Mesh
NEW DATA, NEW INSIGHTS: MAXIMIZING THE VALUE OF YOUR STRUCTURED AND UNSTRUCTURED DATA
Definition
A data catalog creates and maintains an inventory of data assets through the
discovery, description and organization of distributed datasets. The data catalog
provides context to enable data stewards, data/business analysts, data engineers, data
scientists and other line of business (LOB) data consumers to find and understand
relevant datasets for the purpose of extracting business value.
In a nutshell, a data catalog is a place that shows what data assets you have and where they are
located. You might be asking, what is a data asset? It is any entity (e.g., reports, databases,
websites) that contains data.
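As a minimal sketch of that definition, a catalog can be modeled as an inventory that records, for each data asset, what it is and where it lives. The `DataAsset` fields, the `Catalog` class and the example locations are hypothetical names chosen for illustration, not the schema of any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """One inventory entry: what the asset is and where it is located (illustrative fields)."""
    name: str
    kind: str          # e.g. "report", "database", "website"
    location: str      # where the asset lives (hypothetical URI)
    description: str = ""


@dataclass
class Catalog:
    """A toy in-memory inventory of data assets."""
    assets: dict[str, DataAsset] = field(default_factory=dict)

    def register(self, asset: DataAsset) -> None:
        self.assets[asset.name] = asset

    def locate(self, name: str) -> str:
        """Answer 'where is it?' for a known asset."""
        return self.assets[name].location

    def inventory(self) -> list[str]:
        """Answer 'what do we have?' as a sorted list of asset names."""
        return sorted(self.assets)


catalog = Catalog()
catalog.register(DataAsset("claims_db", "database", "postgres://warehouse/claims"))
catalog.register(DataAsset("loss_report", "report", "s3://reports/loss.pdf"))
```

Calling `catalog.inventory()` answers "what do we have?", and `catalog.locate("loss_report")` answers "where is it?".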
Data Catalogs Are the New Black in Data Management and Analytics
• Leverage an ML-augmented data catalog as a first step in metadata management
• Deploy data catalogs with the capability to scale beyond narrow (or tactical) use-case
requirements (such as cataloging data only within a Hadoop distribution).
AI POWERED PROCESS FOR CURATING, VERIFYING, AND CLASSIFYING DATA THAT ENHANCES SPEED AND USABILITY
What is it?
Use algorithms (advanced statistics and deep learning) to learn from large-scale data to identify:
• Patterns
• Quality issues and anomalies across massive, complex and often streaming data sets
• Business rules
Applicable to large, complex and often streaming data sets: 3rd party data, sensor data, customer data, transactions
How does it work?
• Algorithmic sampling of data to identify key patterns and business rules
• Continuous monitoring to alert Data Stewards of exceptions for timely resolution
• Correlation of data concepts across domains and data sources to track usage and establish lineage
• Ability to ingest and apply quality rules to third party and unstructured data sources
• Establishes a feedback loop that refines the machine learning models to improve data quality over time
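The sampling and monitoring steps can be illustrated with a small sketch. It is not machine learning: a simple frequency heuristic stands in for the statistical and deep learning models the slide refers to. The function names, the "shape" encoding and the sample policy IDs are all assumptions made for illustration.

```python
import random
from collections import Counter


def shape(value: str) -> str:
    """Reduce a value to its character shape: digits -> 9, letters -> A, others kept."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)


def infer_pattern(values: list[str], sample_size: int = 100, seed: int = 0) -> str:
    """Sample the column and return the most common shape as the candidate rule."""
    rng = random.Random(seed)
    sample = values if len(values) <= sample_size else rng.sample(values, sample_size)
    return Counter(shape(v) for v in sample).most_common(1)[0][0]


def exceptions(values: list[str], pattern: str) -> list[str]:
    """Values that break the inferred rule; these would be routed to a data steward."""
    return [v for v in values if shape(v) != pattern]


# Hypothetical column of policy identifiers with one malformed value.
policy_ids = ["AB-1234", "CD-5678", "EF-9012", "12345", "GH-3456"]
rule = infer_pattern(policy_ids)
flagged = exceptions(policy_ids, rule)
```

Here the inferred rule is the shape `AA-9999`, and the value `12345` is flagged as an exception. In a feedback loop, a steward's decision on each flagged value (genuine error or valid new format) would be used to refine the rule.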
THE CASE FOR DATA CATALOGS
Analyze Data, Don't Chase Data – Many data scientists spend over two-thirds of their time finding and
understanding data. The main cause of this problem in an organization is a poor mechanism for handling
and tracking all the data. A good catalog helps the data scientist or business analyst understand the
data and answer the questions they have.
Efficient Access Control – As an organization grows, role-based policies are needed, because not
everybody should be able to modify the data. Access control should be implemented while building the data lake.
Roles are assigned to users, and data access is controlled according to those roles.
Eliminate Data Redundancies – A good catalog tool helps find redundant data and
eliminate it. This can save storage costs and data management costs.
Follow Laws – Different data protection laws, regulations and standards apply depending on the data, such as GDPR, BASEL,
GDSN, HIPAA, and many more. They must be followed when handling the data they cover. But they
address different use cases and do not apply to every data set, so we need to know about a data set
to understand which ones apply. A good catalog helps ensure data compliance by giving a view of
data lineage and by using access control.
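The role-based access control described above can be sketched as follows. The role names, user names and the two actions are hypothetical. A real deployment would load these policies from the catalog or identity management tool rather than hard-code them.

```python
from enum import Enum


class Action(Enum):
    READ = "read"
    MODIFY = "modify"


# Hypothetical role-to-permission mapping.
ROLE_PERMISSIONS: dict[str, set[Action]] = {
    "analyst": {Action.READ},
    "data_steward": {Action.READ, Action.MODIFY},
}

# Hypothetical user-to-role assignments.
USER_ROLES: dict[str, set[str]] = {
    "alice": {"analyst"},
    "bob": {"data_steward"},
}


def is_allowed(user: str, action: Action) -> bool:
    """A user may perform an action if any of their roles grants it; unknown users get nothing."""
    return any(
        action in ROLE_PERMISSIONS.get(role, set())
        for role in USER_ROLES.get(user, set())
    )
```

With these assignments, `alice` can read but not modify, `bob` can do both, and a user with no role is denied by default.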
HOW TO ADOPT DATA CATALOGS
Phase 1: Catalog and Lineage
• Infrastructure and installation of catalog tool
• Data Architects to initiate the collection of data assets, catalog and identify lineage
Phase 2: Data Stewardship, Business Glossary
• Appoint part-time Governance Lead role (cross-functional, business facing)
• Supporting Analyst
• Manage governance activities
Phase 3: Operationalize Governance activities
• Accountability, ownership of data
• Operationalize data governance activities
• Report metrics
• Iterate activities for all information / data projects
Improve / Enhance Data Governance
Phased approach: Data Cataloging is a journey...
[Diagram: data governance lifecycle. Labels recovered from the graphic:]
• Stages: Plan, Organize, Define, Deploy
• Establish Data Governance
• Sustain Data Governance
• Manage Data Lifecycle
• Communicate
• Manage Return On Investment
• Maintain Organization & Sponsorship
• Review/Update Processes
• Review/Update Scope (Quarterly Workshop)
• Business Change Management
• Review & Approve New Projects
• Maintain Data Definitions
• Maintain Metrics
• Identify Data Stewards
• Conflict Resolution, Escalation
Core Foundation: Augmented Data Catalog*
* Machine learning powered process for curating, verifying, and classifying data that enhances speed and usability
DATA CATALOG BEST PRACTICES
Assigning Ownership for the data set – Ownership of each data set must be defined. There must be a person
whom users can contact in case of an issue. A good catalog must also name the owner of each data set.
Human Touch – After a catalog is built, users must verify the data sets to make them more accurate.
Searchability – The catalog should support search. Searchability enables data asset discovery, so data
consumers can easily find assets that meet their needs.
Data Protection – Define access policies to prevent unauthorized data access.
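Three of these practices (ownership, human verification and searchability) can be combined in one small sketch. The `Entry` fields, the ranking rule that puts verified entries first, and the sample entries are assumptions for illustration. Production catalogs typically use a full-text search index rather than substring matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    owner: str               # contact person for issues
    description: str
    verified: bool = False   # set once a human has reviewed the entry


def search(entries: list[Entry], query: str) -> list[Entry]:
    """Case-insensitive keyword match on name and description; verified entries rank first."""
    terms = query.lower().split()
    hits = [
        e for e in entries
        if all(t in f"{e.name} {e.description}".lower() for t in terms)
    ]
    return sorted(hits, key=lambda e: (not e.verified, e.name))


# Hypothetical catalog entries.
entries = [
    Entry("fleet_gps", "dana@example.com", "GPS fleet tracking events", verified=True),
    Entry("fleet_claims", "lee@example.com", "Claims linked to fleet vehicles"),
    Entry("weather_daily", "sam@example.com", "Daily weather observations"),
]
results = search(entries, "fleet")
```

A search for "fleet" returns `fleet_gps` ahead of `fleet_claims` because it has been verified, and each hit carries the owner to contact.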
HIGH ROI FOR MULTI-SOURCE DATA WITH DATA CATALOG
Single source data has value in relation to other data in the organization, and the ability
to search and analyze across multiple information sources provides tremendous insight.
[Graphic (source: CEB analysis). Labels: Weather, Highway safety, Industry; Driving Tracker, Nest Protect, GPS Fleet Tracking; Traditional DW; Enterprise Data Integration and Data Lake; Data Catalog]
DATA CATALOG: THE NUCLEUS OF A DATA MESH*
• A data product must be easily discoverable,
ideally through a data catalog, along with its
metadata such as owners, source of origin,
lineage, sample datasets, etc. This centralized
discoverability service allows data consumers,
engineers and scientists in an organization to find a
dataset of interest easily. Each domain data
product must register itself with this centralized
data catalog for easy discoverability.
• Note the perspective shift here is from a single
platform extracting and owning the data for its use,
to each domain providing its data as a product in a
discoverable fashion.
• Data catalog platforms provide central
discoverability, access control and governance of
distributed domain datasets.
*Data Mesh (concept introduced by Zhamak Dehghani) is a sociotechnical approach to share, access and manage analytical data in complex and large-scale environments, within or across organizations
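The registration step can be sketched as below: each domain publishes metadata about its data product to a central catalog, while the data itself stays with the domain. The class names, fields and example products are hypothetical and do not represent any data mesh platform's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Metadata a domain publishes about its data product (fields follow the slide's list)."""
    name: str
    domain: str
    owner: str
    source_of_origin: str
    upstream: tuple[str, ...] = ()   # lineage: names of products this one is derived from


@dataclass
class CentralCatalog:
    """Toy central discoverability service; domains keep the data, the catalog keeps metadata."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> None:
        if product.name in self.products:
            raise ValueError(f"{product.name} is already registered")
        self.products[product.name] = product

    def by_domain(self, domain: str) -> list[str]:
        return sorted(p.name for p in self.products.values() if p.domain == domain)

    def lineage(self, name: str) -> list[str]:
        """Walk upstream links to list every ancestor of a product."""
        seen: list[str] = []
        stack = list(self.products[name].upstream)
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.append(parent)
                if parent in self.products:
                    stack.extend(self.products[parent].upstream)
        return seen


mesh_catalog = CentralCatalog()
mesh_catalog.register(DataProduct("policies", "underwriting", "uw-team", "policy admin system"))
mesh_catalog.register(DataProduct("claims", "claims", "claims-team", "claims system"))
mesh_catalog.register(DataProduct("loss_ratio", "finance", "fin-team", "derived",
                                  upstream=("claims", "policies")))
```

A consumer can then discover products by domain with `mesh_catalog.by_domain("finance")` or trace where `loss_ratio` comes from with `mesh_catalog.lineage("loss_ratio")`.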
QUESTIONS?
http://www.linkedin.com/in/srinisankar
https://twitter.com/srinisankar

Data Catalog as a Business Enabler

  • 1. IDMA 2021 Fall/Winter Conference October 13th-14th, 2021 Data Catalog as a Business Enabler Presented by Srinivasan Sankar
  • 2. Disclaimer Please note that the views expressed by our speakers are their own and may not necessarily reflect those of their respective employers. This material is for general informational purposes only and is not legal advice. It is not designed to be comprehensive, and it may not apply to your particular facts and circumstances.
  • 3. TOPICS • Improve insights by extracting value from unstructured data utilizing a machine learning augmented data catalog • Practical steps to deal with the onslaught of data and learn how to implement an effective data catalog • Overcoming data silos using intelligent tools • Let the insights come to you with AI-augmentation • Multi-source data to increase the potential of data value • Data Catalog – key enabler of a Data Mesh
  • 4.
  • 5. NEW DATA, NEW INSIGHTS: MAXIMIZING THE VALUE OF YOUR STRUCTURED AND UNSTRUCTURED DATA
  • 6.
  • 7. Definition: A data catalog creates and maintains an inventory of data assets through the discovery, description and organization of distributed datasets. The data catalog provides context to enable data stewards, data/business analysts, data engineers, data scientists and other line of business (LOB) data consumers to find and understand relevant datasets for the purpose of extracting business value. In a nutshell, a data catalog is a place that shows what data assets you have and where they are located. You might be asking, what is a data asset? It is any entity (e.g., reports, databases, websites) that contains data. Data Catalogs Are the New Black in Data Management and Analytics
  • 8.
  • 9. • Leverage an ML-augmented data catalog as a first step in metadata management • Deploy data catalogs with the capability to scale beyond narrow (or tactical) use-case requirements (such as cataloging data only within a Hadoop distribution),
  • 10. AI POWERED PROCESS FOR CURATING, VERIFYING, AND CLASSIFYING DATA THAT ENHANCES SPEED AND USABILITY What is it? Use algorithms (advanced statistics and deep learning) to learn from large-scale data to identify: • Patterns • Quality issues and anomalies across massive, complex and often streaming data sets • Business rules Applicable to large, complex and often streaming data sets: 3rd party data, sensor data, customer data, transactions. How does it work? • Algorithmic sampling of data to identify key patterns and business rules • Continuous monitoring to alert Data Stewards of exceptions for timely resolution • Correlation of data concepts across domains and data sources to track usage and establish lineage • Ability to ingest and apply quality rules to third party and unstructured data sources • Establishes a feedback loop that refines the machine learning models to improve data quality over time
  • 11. THE CASE FOR DATA CATALOGS Analyze Data, Don’t Chase Data – Many data scientists spend over two-thirds of their time finding and understanding data. The main reason is that the organization has poor mechanisms for handling and tracking all of its data. A good catalog helps the data scientist or business analyst understand the data and answer the questions they have. Efficient Access Control – As an organization grows, it needs role-based policies, because not everybody should be able to modify the data. Access control should be implemented while building the data lake: roles are assigned to users, and data access is controlled according to those roles. Eliminate Data Redundancies – A good catalog tool helps find data redundancies and eliminate them, which saves storage and data management costs. Follow the Law – Different laws and standards apply depending on the data, such as GDPR, Basel, GDSN, HIPAA, and many more, and they must be followed when dealing with that data. These rules cover different use cases and do not apply to every data set; to know which ones apply, we need to understand the data set. A good catalog helps ensure compliance by giving a view of data lineage and by enforcing access control.
  • 12. HOW TO ADOPT DATA CATALOGS Phased approach. Phase 1: Catalog and Lineage • Infrastructure and installation of catalog tool • Data Architects to initiate the collection of data assets, catalog and identify lineage Phase 2: Data Stewardship, Business Glossary • Appoint part-time Governance Lead role (cross-functional, business facing) • Supporting Analyst • Manage governance activities Phase 3: Operationalize Governance Activities • Accountability, ownership of data • Operationalize data governance activities • Report metrics • Iterate activities for all information / data projects. Diagram labels: Plan; Organize; Define; Deploy; Establish Data Governance; Sustain Data Governance; Improve / Enhance Data Governance; Manage Data Lifecycle; Communicate; Manage Return On Investment; Maintain Organization & Sponsorship; Review/Update Processes; Review/Update Scope (Quarterly Workshop); Business Change Management; Review & Approve New Projects; Maintain Data Definitions; Maintain Metrics; Identify Data Stewards; Conflict Resolution, Escalation; Core Foundation; Augmented Data Catalog*. * Machine learning powered process for curating, verifying, and classifying data that enhances speed and usability. Data cataloging is a journey…
  • 13. DATA CATALOG BEST PRACTICES Assigning Ownership for the Data Set – Ownership of each data set must be defined. There must be a person whom users can contact in case of an issue, and a good catalog must identify the owner of every data set. Human Touch – After building a catalog, users must review and verify the data set entries to make them more accurate. Searchability – The catalog should support search. Search enables data asset discovery, so data consumers can easily find assets that meet their needs. Data Protection – Define access policies to prevent unauthorized data access.
  • 14. HIGH ROI FOR MULTI-SOURCE DATA WITH DATA CATALOG Single source data has value in relation to other data in the organization, and the ability to search and analyze across multiple information sources provides tremendous insight. Diagram labels: Traditional DW; Enterprise Data Integration and Data Lake; Industry; Weather, Highway safety; Driving Tracker; Nest Protect; GPS Fleet Tracking; Data Catalog. Graphic source: CEB analysis
  • 15. DATA CATALOG: THE NUCLEUS OF A DATA MESH* • A data product must be easily discoverable, ideally through a data catalog that holds its meta information such as its owners, source of origin, lineage, sample datasets, etc. This centralized discoverability service allows data consumers, engineers and scientists in an organization to find a dataset of interest easily. Each domain data product must register itself with this centralized data catalog for easy discoverability. • Note the perspective shift here: from a single platform extracting and owning the data for its use, to each domain providing its data as a product in a discoverable fashion. • Data catalog platforms provide central discoverability, access control and governance of distributed domain datasets. *Data Mesh (a concept founded by Zhamak Dehghani) is a sociotechnical approach to sharing, accessing and managing analytical data in complex and large-scale environments, within or across organizations.
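
Several points in the slides (every data set has an owner, assets are searchable, access follows roles, and each domain data product registers itself with a central catalog) can be illustrated with a small sketch. This is only a toy in-memory model, not how any particular catalog product works; the class names, field names, roles and sample data sets below are all assumptions made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Metadata a domain publishes about one data set (hypothetical fields)."""
    name: str
    domain: str
    owner: str                      # contact person or team for issues
    source: str                     # source of origin
    lineage: list[str] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)
    allowed_roles: set[str] = field(default_factory=lambda: {"steward"})


class DataCatalog:
    """Toy central catalog: registration, keyword search, role-based access."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def register(self, product: DataProduct) -> None:
        # Best practice from the slides: ownership must be defined.
        if not product.owner:
            raise ValueError(f"{product.name}: every data set needs an owner")
        self._products[product.name] = product

    def search(self, keyword: str) -> list[str]:
        # Match the keyword against the product name or any of its tags.
        kw = keyword.lower()
        return sorted(
            p.name
            for p in self._products.values()
            if kw in p.name.lower() or kw in {t.lower() for t in p.tags}
        )

    def can_read(self, name: str, role: str) -> bool:
        # Access is decided by the role, not by the individual user.
        return role in self._products[name].allowed_roles


catalog = DataCatalog()
# Each domain registers its own data product with the central catalog.
catalog.register(DataProduct(
    name="fleet_gps_pings", domain="logistics", owner="fleet-data-team",
    source="GPS fleet tracking", lineage=["raw_gps_feed"],
    tags={"gps", "telematics"}, allowed_roles={"steward", "analyst"},
))
catalog.register(DataProduct(
    name="weather_daily", domain="external", owner="analytics-team",
    source="third-party weather feed", tags={"weather"},
))

print(catalog.search("gps"))
print(catalog.can_read("weather_daily", "analyst"))
```

In this sketch the catalog stores only metadata; the data itself stays with the owning domain, which mirrors the data mesh idea of central discoverability over distributed data sets.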