
RWDG Slides: How to Govern Data Lakes


Are you spending your summer down by the Data Lake? If so, you want to make certain that the lake is clean and that you pick the best place to swim. The Data Lake is the new analytical paradise that many organizations are banking on to become the answer to improved insights. And you need to prevent the lake from turning swampy.

In this month’s RWDG webinar, Bob Seiner and a special guest will focus on how to govern the data in your Data Lake. Bob’s interactions with his guests are always lively and fact-filled, and this month they will help you swim past the major barriers to providing an effective and valuable data resource.

In this webinar, Bob and his guest will discuss:
- The relationship between Data Lakes and Data Governance
- Preventing your Data Lake from becoming a Data Swamp
- Governing the Metadata associated with your Data Lake
- Leveraging governed data to provide trustworthy Analytics
- Measuring the value of a governed Data Lake

Published in: Data & Analytics

RWDG Slides: How to Govern Data Lakes

  1. Real-World Data Governance: How to Govern Data Lakes, with Special Guest Evan Terry
      - Monthly Webinar Series Hosted by DATAVERSITY
      - Robert S. Seiner – KIK Consulting / TDAN.com
      - July 18, 2019 – 11:00 a.m. PT / 2:00 p.m. ET
      - #RWDG @RSeiner
      - Copyright © 2019 Robert S. Seiner – KIK Consulting & Educational Services / TDAN.com
      - Non-Invasive Data Governance™ is a trademark of Robert S. Seiner & KIK Consulting
  2. Unified Data Orchestration
      - Madan Kumar | Solutions Engineer | Alluxio
      - madan@alluxio.com
  3. 4 big trends driving the need for a new architecture
      - Separation of Compute & Storage
      - Hybrid – Multi cloud environments
      - Self-service data across the enterprise
      - Rise of the object store
  4. Data Ecosystem - Beta; Data Ecosystem 1.0 (diagram with Compute and Storage layers)
  5. Data Orchestration Framework
      - Interfaces: Java File API, HDFS Interface, S3 Interface, REST API, FUSE Interface
      - Drivers: HDFS Driver, Swift Driver, S3 Driver, NFS Driver
  6. Alluxio’s Approach to Big Data Federation
      - Unified Access - Acts as a “virtual data lake.” Files are accessed in Alluxio’s global namespace as if they resided in a single system.
      - Performant - Provides fast local access to important and frequently used data, without maintaining a permanent copy of all data.
      - Modern, flexible architecture - Promotes separation of compute from storage.
      - Storage Cost Optimization - Transparently reads and writes data directly from the source system, and so does not need to create a permanent copy of the data.
  7. Key Innovations of the Data Orchestration Layer
      - Data Elasticity with a unified namespace: Abstract data silos & storage systems to independently scale data with compute
      - Data Accessibility for popular APIs & API translation: Run Spark, Hive, Presto, ML workloads on your data located anywhere
      - Data Locality with Intelligent Multi-tiering: Accelerate big data workloads with transparent tiered local data
  8. Use Cases Data Orchestration Enables
      - Run big data workloads in hybrid cloud environments
      - Accelerate big data frameworks on the public cloud
      - Enable big data on object stores across single or multiple clouds
      - Diagram labels: Hive, Spark, Presto, Alluxio; On premise; Same instance / container; Any Cloud / Multi Cloud; Same data center / region; Standalone
  9. Incredible Open Source Momentum with growing community
      - 900+ contributors & growing
      - 3760+ Git Stars
      - Apache 2.0 Licensed
      - Hundreds of thousands of downloads
      - Join the conversation on Slack: alluxio.org/slack
  10. How to Govern Data Lakes: Introduction
      - Real-World Data Governance – Monthly Webinar Series
        - August 15, 2019 – Data Governance versus Information Governance
        - Third Thursday each Month @ 2pm EST
        - Register at TDAN.com, KIKconsulting.com, DATAVERSITY.net
      - Non-Invasive Data Governance Book
        - ISBN 9781935504856 / Technics Publishing / Amazon.com
      - Speaking @ Dataversity Events
        - Data Architecture Summit, Chicago – October 14-17
        - Data Governance Vision, Washington, DC – December 9-12
      - Non-Invasive Data Governance Online Learning Plan; Non-Invasive Metadata Governance Online Learning Plan
        - DATAVERSITY Training Center – https://training.dataversity.net
      - The Data Administration Newsletter (TDAN.com)
        - Twice Monthly – Data Articles, Columns, Blogs and Features
        - Produced by DATAVERSITY – Subscribe for emails
        - New Non-Invasive Data Governance Framework now being published
      - KIK Consulting & Educational Services (KIKConsulting.com)
        - Home of Non-Invasive Data Governance™
        - Home of Non-Invasive Metadata Governance
  11. How to Govern Data Lakes: Special Guest Evan Terry
      - Chief Analytics Officer, Velocity Mortgage Capital
      - Evan brings over 20 years of consulting experience in IT environments, including leading software development projects, designing and implementing IT and data strategies, and working on long-term, cross-departmental projects in such diverse industries as automotive, retail, state government, and e-commerce payments.
      - Evan’s areas of expertise include designing practical analytics solutions, aligning business and IT strategies, and implementing data management and governance programs.
      - He co-authored the data modeling book Beginning Relational Data Modeling and has spoken about data and process quality and systems design.
      - Evan has a BA in Economics from McGill University and an MBA from Columbia Business School.
  12. How to Govern Data Lakes: Abstract
      - In this webinar, Bob and Evan will discuss:
        - The relationship between Data Lakes and Data Governance
        - Preventing your Data Lake from becoming a Data Swamp
        - Governing the Metadata associated with your Data Lake
        - Leveraging governed data to provide trustworthy Analytics
        - Measuring the value of a governed Data Lake
  13. How to Govern Data Lakes: The Relationship between Data Lakes and Data Governance
      - What is Data Governance?
        - The execution and enforcement of authority over the definition, production and usage of data and data-related assets. (Robert S. Seiner)
        - The management and organization of data. (Evan Terry)
        - The orchestration of people and process and data.
        - The harmonization of people and process and data.
        - The formalization of accountability for data.
        - The implementation of decision-rights for data.
  14. How to Govern Data Lakes: The Relationship between Data Lakes and Data Governance
      - What is a Data Lake?
        - A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files.
        - A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. (SAS Article, 2016)
      - When does a Data Lake become a Data Swamp?
        - A data swamp is a deteriorated and unmanaged data lake that is either inaccessible to its intended users or is providing little value. (Olavsrud, Thor. CIO 2017)
        - When the data in the lake is ungoverned.
  15. How to Govern Data Lakes: The Relationship between Data Lakes and Data Governance
      - A connection between governance (how to manage and organize) and data lakes for accurate and useful data management
      - Catalogs are critical to help you govern data, especially in data lakes
        - Find things
        - Define things
        - Curate content
      - Need to include policy-driven processes that classify and identify the information in the lake, why it’s in there, what it means, who owns it, and who is using it
      - A data lake without data governance will ultimately end up being a collection of disconnected data pools or information silos, just all in one place.
  16. How to Govern Data Lakes: Preventing your Data Lake from becoming a Data Swamp
      - What can be done to prevent the swamping of your data lake?
        - Implement data governance for the lake.
        - Implement metadata management for the lake.
        - Implement sound principles of:
          - Data Definition
          - Data Production
          - Data Usage
      - What is the appropriate level of data governance for your data lake?
  17. How to Govern Data Lakes: Preventing your Data Lake from becoming a Data Swamp
      - A “data lake” becomes a data swamp without organization
        - No organization, no curation of content, little metadata
      - Data warehouse principles are relevant:
        - Stewardship/Curation
        - Design, documentation, maintenance of the lake
        - Metadata capture
        - Governance
      - Technique - Create zones in your data lake:
        - Transition data sets from “raw data” to “clean data”
        - Apply different curation/governance principles to each zone
  18. How to Govern Data Lakes: Governing the Metadata Associated with your Data Lake
      - Governing metadata associated with:
        - Data Definition
        - Data Production
        - Data Usage
      - (Where) Is there metadata associated with your data lake?
      - Who is responsible for the metadata associated with your data lake?
      - “The metadata will not govern itself!”
  19. How to Govern Data Lakes: Governing the Metadata Associated with your Data Lake
      - Cataloging is key, but is tricky:
        - Don’t under/over catalog
        - Don’t be too loose/rigid in your governance rules
      - “Goldilocks” mentality – everything in moderation
      - Tune governance to priorities and context
        - One person’s data lake is another’s data swamp
        - Don’t turn data lake into a data warehouse – the clearest data lake
        - Cannot be all things to all people – playground, incubator, or operational data store?
  20. How to Govern Data Lakes: Leveraging Governed Data to Provide Trustworthy Analytics
      - Sample DG purpose statement
        - Use strategic data with confidence.
      - Make certain the water is clean or it may be unhealthy.
      - “Boil water alert” – Is data governance the boiling of the water?
      - “Freshwater” versus “Saltwater” determines species that will live in your lake.
  21. How to Govern Data Lakes: Leveraging Governed Data to Provide Trustworthy Analytics
      - Data catalogs solve the problems of finding, interpreting and using data
      - Data lake is a tool and the context is key – differences in required data quality
      - “Trustworthy” depends on context and accuracy needs – data lakes are defined as “less” controlled and structured
  22. How to Govern Data Lakes: Leveraging Governed Data to Provide Trustworthy Analytics
      - Provides much the same value as for a data warehouse – analytics requires:
        - Who owns the data and can answer questions about it
        - Finding the right data elements that meet your needs
        - Cleaning the data to an appropriate level of quality
        - Having the right security on the data being used
        - Monitoring the data for adherence to standards
      - Lightweight governance on adding, naming, organizing protects the shared resource from the “tragedy of the commons”
  23. How to Govern Data Lakes: Measuring the Value of a Governed Data Lake
      - Metrics are one of the 6 core components of Data Governance: data, people, process, communications, metrics and tools.
      - Measuring people’s ____________ the data in the lake.
        - confidence in
        - understanding of
        - usage of
        - decisions made using
        - knowledge of what data resides in
        - … all will depend on the effective management of metadata associated with your data lake.
  24. How to Govern Data Lakes: Measuring the Value of a Governed Data Lake
      - Considerations for providing metrics
        - Benchmark current status
        - Select metrics that mean something to someone
        - Select metrics associated with the data lake rather than data governance
        - Consider that it is not easy to measure Return on Investment on DG
        - Go jump in the lake!
  25. How to Govern Data Lakes: Measuring the Value of a Governed Data Lake
      - Unlocking the value depends on the data lake being broadly usable
      - What is the value of R&D? What is the value of avoiding a disaster?
      - The context of the data lake is key
        - What is the purpose of the data lake?
        - What is the tool the data lake will help you solve?
        - How much value does governance (lightweight or not) provide?
      - Value is measured in combination with the final use
        - AI/Machine Learning
        - Agility/Time to Market
        - Variety of end users served/capabilities enabled
  26. How to Govern Data Lakes: Abstract
      - In this webinar, Bob and Evan discussed:
        - The relationship between Data Lakes and Data Governance
        - Preventing your Data Lake from becoming a Data Swamp
        - Governing the Metadata associated with your Data Lake
        - Leveraging governed data to provide trustworthy Analytics
        - Measuring the value of a governed Data Lake
  27. Real-World Data Governance: Contact Information
      - Questions and Answers
      - Join us in the Dataversity Community to continue the conversation: https://community.dataversity.net/
  28. Real-World Data Governance: Contact Information
      - Robert S. Seiner
      - KIK Consulting & Educational Services – KIKconsulting.com
      - The Data Administration Newsletter – TDAN.com
      - Post Office Box 112571, Upper St. Clair, Pennsylvania 15241
      - 412.220.9643, 412.220.9644 (Fax)
      - rseiner@kikconsulting.com, rseiner@tdan.com
      - @RSeiner @TDAN_com #RWDG
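
Slides 15 and 17 of the transcript describe catalog entries that record why a data set is in the lake, what it means, who owns it and who is using it, along with zones that move data from “raw” to “clean”. One way to picture that is the minimal Python sketch below. It rests on assumptions: the names `CatalogEntry`, `Zone`, `promote` and `catalog_coverage` are hypothetical and do not come from any tool mentioned in the webinar, and the rule that a data set needs an owner, a meaning and a purpose before it leaves the raw zone is just one illustrative policy, not the presenters' method.

```python
from dataclasses import dataclass, field
from enum import Enum


class Zone(Enum):
    """Hypothetical lake zones; the slides only name "raw" and "clean" data."""
    RAW = "raw"
    CLEAN = "clean"


@dataclass
class CatalogEntry:
    """One data set in the lake plus the metadata slide 15 asks for."""
    name: str
    zone: Zone = Zone.RAW
    owner: str = ""      # who owns it
    meaning: str = ""    # what it means
    purpose: str = ""    # why it is in the lake
    users: set[str] = field(default_factory=set)  # who is using it

    def missing_metadata(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "owner": self.owner,
            "meaning": self.meaning,
            "purpose": self.purpose,
        }
        return [key for key, value in required.items() if not value.strip()]


def promote(entry: CatalogEntry) -> CatalogEntry:
    """Move an entry to the clean zone only if its metadata is complete.

    Assumed policy for illustration: the clean zone applies a stricter
    governance rule than the raw zone.
    """
    missing = entry.missing_metadata()
    if missing:
        raise ValueError(
            f"{entry.name}: cannot promote, missing {', '.join(missing)}"
        )
    entry.zone = Zone.CLEAN
    return entry


def catalog_coverage(entries: list[CatalogEntry]) -> float:
    """Fraction of entries with complete metadata (0.0 for an empty lake)."""
    if not entries:
        return 0.0
    documented = sum(1 for e in entries if not e.missing_metadata())
    return documented / len(entries)
```

Calling `promote` on an entry with an owner, a meaning and a purpose moves it to `Zone.CLEAN`, while an undocumented entry raises `ValueError` and stays in the raw zone. `catalog_coverage` is one possible benchmark for the "knowledge of what data resides in" the lake mentioned on slide 23. Which fields are mandatory, and how strict each zone should be, would depend on the lake's purpose and context.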
