Who changed my data? Need for data governance and provenance in a streaming world

Enterprises have dealt with data governance for years, but mostly around master data. With IoT, web, and app streams now everywhere in the ecosystem surrounding an enterprise, data-in-motion has become a force to be reckoned with. Data-in-motion passes through several levels of transformation and augmentation before it becomes data-at-rest. Along the way, it is essential to preserve the integrity of that data, or at least to track its provenance through the various changes. This matters in many verticals with strict regulatory and compliance requirements around "who changed what."

This session will go into detail on specific use cases: how data gets changed, how those changes can be tracked seamlessly, and why this matters for certain verticals. It will be presented in two parts. The first part will cover the industry perspective and the importance several regulatory bodies place on the topic. The second part will address the technology side and discuss how companies can use Apache Atlas and Ranger in conjunction with NiFi and Kafka to bring data governance and provenance to their data streams.
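The core idea of the abstract, tracking provenance as data-in-motion passes through successive transformations so that "who changed what" can be answered later, can be illustrated with a small sketch. This is plain Python under stated assumptions, not the NiFi, Kafka, or Atlas API: each transformation applied to an in-flight record appends a provenance event holding the step name, the actor, a timestamp, SHA-256 content hashes before and after, and the fields that changed. The step names, actors, and record fields are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable

_MISSING = object()  # sentinel so that a deleted key counts as a change


def content_hash(record: dict[str, Any]) -> str:
    """SHA-256 fingerprint of a record; keys are sorted so key order doesn't matter."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class ProvenanceEvent:
    step: str                  # name of the transformation (hypothetical)
    actor: str                 # user or service that ran it (hypothetical)
    timestamp: str             # UTC time the step ran, ISO 8601
    before_hash: str
    after_hash: str
    changed_fields: list[str]


@dataclass
class TrackedRecord:
    data: dict[str, Any]
    history: list[ProvenanceEvent] = field(default_factory=list)

    def apply(self, step: str, actor: str,
              transform: Callable[[dict[str, Any]], dict[str, Any]]) -> None:
        """Run a transformation on a copy of the record and log what it changed."""
        before = dict(self.data)
        after = transform(dict(before))
        changed = sorted(
            key for key in before.keys() | after.keys()
            if before.get(key, _MISSING) != after.get(key, _MISSING)
        )
        self.history.append(ProvenanceEvent(
            step=step,
            actor=actor,
            timestamp=datetime.now(timezone.utc).isoformat(),
            before_hash=content_hash(before),
            after_hash=content_hash(after),
            changed_fields=changed,
        ))
        self.data = after

    def who_changed(self, field_name: str) -> list[tuple[str, str]]:
        """Answer "who changed what": (actor, step) pairs that touched a field."""
        return [(e.actor, e.step) for e in self.history
                if field_name in e.changed_fields]


# Usage: a sensor reading is enriched by a flow, then masked by a user.
reading = TrackedRecord({"device_id": "sensor-17", "temp_c": 37.0})
reading.apply("enrich_location", "nifi-flow", lambda r: {**r, "site": "plant-a"})
reading.apply("mask_device", "alice", lambda r: {**r, "device_id": "****"})
print(reading.who_changed("device_id"))  # [('alice', 'mask_device')]
```

Because every change goes through `apply`, each event's `after_hash` should equal the next event's `before_hash`; a mismatch is a cheap signal that the record was modified without a logged step.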

Speakers
Dinesh Chandrasekhar, Director, Hortonworks
Paige Bartley, Senior Analyst - Data and Enterprise Intelligence, Ovum

Published by DataWorks Summit.

  1. Who changed my data? Need for data governance and provenance in a streaming world. Digital capability requires granular control of all data assets. Dinesh Chandrasekhar, Director, Product Marketing; Paige Bartley, Senior Analyst, Data and Enterprise Intelligence
  2. Ovum | TMT intelligence | informa. Copyright © Informa PLC
  3. Digital Capability Depends on Full Control of Data. Business challenges in achieving digital capability include: • Reproducibility of analytics results • Debugging of models and algorithms • Ensuring correct access rights to data • Consistent application of data policies • Meeting regulatory compliance requirements • Unifying data across repositories and silos • Finding the right data at the right time. Addressing these challenges requires understanding how data changes over time.
  4. Governance and Transparency of Data Assets Are More Important than Ever
  5. Factors Within the Enterprise. More Data: • The economics of storage have made keeping data cheap. • New data types, such as sensor data, need to be combined with historical data. More Users: • The self-service era means more data consumers and more frequent data access. • Different users have different access rights and privileges. • More users means more proliferation of data versions. More Complexity: • Data repositories have become more distributed, and data sources more varied. • Data resides in more locations than ever before, in the cloud and on-premises.
  6. Factors External to the Enterprise. More Regulatory Pressure: • Regulations such as GDPR have indirect requirements for tracking lineage. • Article 30 record-keeping requirements necessitate knowledge of provenance. More Competitive Pressure: • Leveraging data is increasingly a competitive differentiator. • The pace of change is accelerating, and a comprehensive understanding of data is critical. • Disruptors are emerging from unlikely industries, using data to their advantage.
  7. GDPR's Specific Requirements for Data. • Article 4: Definition of Personal Data. A person can be identified directly or indirectly; data sources can be combined to make personal data. • Article 9: Processing of Special Categories of Personal Data. Processing of biometric data is highly restricted; many types of sensors produce biometric data. • Article 30: Records of Processing Activities. The "who, what, when, where, and why" of processing requires a deep understanding of metadata and data lineage. GDPR doesn't differentiate between data-in-motion and data-at-rest! "Who changed what" is critical. Lineage and provenance, while not directly required by GDPR, are critical to meeting its requirements.
  8. "From an analytics standpoint, reaping the benefits of big data means investing in data management and governance. Without the correct people, processes, and infrastructure, more casual business users will likely struggle to see the benefits of big data technologies." (Laurent-Olivier Lioté, Analyst, Data and Enterprise Intelligence, Ovum)
  9. A Holistic View of Data Requires Both Data-in-Motion and Data-at-Rest. Diagram labels: Data at Rest; Data in Motion; Contextual Understanding of Data.
  10. How Do We Do This? Metadata Management Is Necessary for Governance. Having a common enterprise metadata framework allows data of different types and from different sources to be managed consistently. A common metadata framework allows for: • Common search and lineage for datasets • Lifecycle management from ingestion to disposition • Metadata exchange with other metadata tools • Analysis of data usage and access trends • Consistent application of access rights • Analysis of behavior and anomalies. Diagram labels: Metadata Creation; Metadata Enrichment; Metadata Analysis.
  11. The Managed Data Lake Can Support a Common Metadata Framework. The data lake, if properly managed, can support a common metadata framework that underpins enterprise data: • Data-in-motion • Data-at-rest • Structured data • Unstructured data. Common management of metadata allows for streamlined control of and visibility into data. Better control of data results in better business outcomes. All metadata, managed together.
  12. Governance Standards Need to Be Equal. The enterprise increasingly wants to analyze all data, both in-motion and at-rest, in context with each other. Governance and lineage for data-in-motion allow for: • Audit and regulatory compliance • Insight into data history and provenance • Comprehensive lifecycle management • Security and access controls • Better quality data = better analytics. Governance standards for data-in-motion need to match those for data-at-rest. Diagram labels: Common Metadata Framework; Data-in-Motion; Data-at-Rest; Data Management Platform.
  13. © Hortonworks Inc. 2011–2018. All rights reserved. Changing Face of Data: Challenges and Solutions
  14. The New Way of Business Is Fueled by Connected Data • DEVELOPMENT: Connected Customers, Vehicles, Devices; Socially crowd-sourced requirements; Digital design and analysis; Digital prototypes and tests (simulations) • MANUFACTURING: Connected Factories, Sensors, Devices; Human-robotic interaction; 3D printing on demand • DISTRIBUTION: Connected Trucks, Inventory; Location-, traffic-, and weather-aware distribution; Real-time inventory visibility; Dynamic rerouting • MARKETING/SALES: Connected Customers, Devices; Omni-channel demand sensing; Real-time recommendations • SERVICE: Connected Assets; Remote service monitoring and delivery; Predictive maintenance; OTA updates
  15. Today's Digital Enterprises • RFID TRACKERS AND NANO-DEVICES to give you visibility into the movement of your goods • MOBILE NOTIFICATIONS to inform you of a shipment delay from a supplier • BLOCKCHAINS to give complete trust and provenance in your supply chain • VIRTUAL ASSISTANTS to enhance your customer experience • AI-POWERED CHATBOTS to improve your customer support functions • ELECTRONIC B2B EXCHANGES to streamline order processing with partners
  16. © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Modern Data Architecture. Diagram labels: Data Center; Cloud; Machine Learning/Artificial Intelligence; Telemetry – Connected Devices; Time Series Databases; Stream Analytics; Deep Historical Analysis; Exception Monitoring; Legacy/Operational Data; Sensors, Control Systems; Cyber Security; Edge Analytics; Social; Mobile; IoT; Geo Location.
  17. Data Challenges • Cannot get a 360 VIEW of your customer? • DROWNING in data lakes? • TOO MUCH DATA coming in from TOO MANY SOURCES and devices? • New business initiatives leading to EXCESSIVE IT COSTS? MOST IMPORTANTLY… don't have the right data at the right time to make the right decision?
  18. Global Data Management Enables Modern Data Architecture. Diagram labels: Global Data Management; Data Sources; Data Center; Cloud; Edge; Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems; Legacy/Operational Data.
  19. Data Management Challenges • Dealing with multiple clouds • Avoiding cloud/vendor lock-in • Future-proofing your architecture • A common view of security and governance • Managing all data, regardless of type or location • Maximizing data reuse for multiple workloads. Diagram labels: Data Sources; Data Center; Cloud; Edge; Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems.
  20. Global Data Management Platform. Diagram labels: Data Sources; Data Center; Cloud; Edge; Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems; Data-in-Motion; Data-at-Rest; Manage, Secure, Govern, Consume.
  21. Global Data Management: Powering Innovation. Modern data use cases: EDW optimization; cybersecurity; data science; advanced analytics; IoT/streaming analytics. Diagram labels: Data Sources; Data Center; Cloud; Edge; Exception Monitoring; 360 View of Operations; Cyber Security; Telemetry – Connected Devices; Time Series; Sensors, Control Systems; Data-in-Motion; Data-at-Rest; Manage, Secure, Govern, Consume.
  22. Apache NiFi Overview • Guaranteed delivery • Data buffering: backpressure, pressure release • Prioritized queuing • Flow-specific QoS: latency vs. throughput, loss tolerance • Data provenance • Supports push and pull models • Recovery/recording a rolling log of fine-grained history • Visual command and control • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering
  23. Watch the real-time flow of data: Data Provenance in Apache NiFi. (Screenshot caption: Select Data Provenance.)
  24. Easily access and trace changes to a dataflow in Apache NiFi
  25. Apache Atlas • Enterprise data governance • Integration with Apache NiFi • Integration with Apache Ranger. Diagram labels: Apache Atlas Knowledge Store (Audit Store; Models/Type System; Policy Rules; Taxonomies); Tag-Based Policies; Data Lifecycle Management; Real-Time Tag-Based Access Control; REST API; Services (Search; Lineage; Exchange); Healthcare: HIPAA, HL7; Financial: SOX, Dodd-Frank; Energy: PPDM; Retail: PCI, PII; Other: CWM. Service: Data Steward Studio (DSS): Discover & Fingerprint Data; Smart Enterprise Search; Data & Metadata Security; Data Lineage & Impact Analysis; Enterprise Data Catalog; Organize & Curate Data.
  26. Thank you
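Slide 25 pairs Atlas with Ranger for tag-based policies and lineage/impact analysis. Roughly, the idea is that a classification such as PII or BIOMETRIC is attached to a source, carries over to everything derived from it along lineage, and access policies are written against the tag instead of against each table. The toy model below illustrates that idea in plain Python; it is not the Atlas or Ranger API, and the dataset names, tags, roles, and policy are hypothetical. Tag inheritance is computed here by walking upstream from the dataset being checked.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Toy dataset-level lineage: an edge means `target` is derived from `source`."""

    def __init__(self) -> None:
        self.upstream: dict[str, set[str]] = defaultdict(set)
        self.downstream: dict[str, set[str]] = defaultdict(set)
        self.tags: dict[str, set[str]] = defaultdict(set)

    def add_edge(self, source: str, target: str) -> None:
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def classify(self, dataset: str, tag: str) -> None:
        self.tags[dataset].add(tag)

    def _walk(self, start: str, edges: dict[str, set[str]]) -> set[str]:
        """Breadth-first traversal; every node reachable from `start`, excluding it."""
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in edges[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return seen

    def effective_tags(self, dataset: str) -> set[str]:
        """Tags on the dataset plus tags inherited from everything upstream of it."""
        tags = set(self.tags[dataset])
        for ancestor in self._walk(dataset, self.upstream):
            tags |= self.tags[ancestor]
        return tags

    def impacted(self, dataset: str) -> set[str]:
        """Impact analysis: every dataset derived, directly or indirectly, from this one."""
        return self._walk(dataset, self.downstream)


# Hypothetical tag-based policy: tag -> roles allowed to read data carrying that tag.
POLICY = {"PII": {"privacy-officer"}, "BIOMETRIC": {"privacy-officer"}}


def can_read(graph: LineageGraph, dataset: str, role: str,
             policy: dict[str, set[str]]) -> bool:
    """Deny if any effective tag has a policy entry that does not list the role."""
    for tag in graph.effective_tags(dataset):
        allowed = policy.get(tag)
        if allowed is not None and role not in allowed:
            return False
    return True


# Usage with hypothetical dataset names.
g = LineageGraph()
g.add_edge("kafka:wearable_readings", "hive:heart_rate_raw")
g.add_edge("hive:heart_rate_raw", "hive:daily_summary")
g.add_edge("hive:store_locations", "hive:daily_summary")
g.classify("kafka:wearable_readings", "BIOMETRIC")
print(sorted(g.effective_tags("hive:daily_summary")))          # ['BIOMETRIC']
print(can_read(g, "hive:daily_summary", "analyst", POLICY))    # False
print(can_read(g, "hive:store_locations", "analyst", POLICY))  # True
```

Writing the policy against the tag means a newly derived dataset is covered as soon as its lineage is recorded, without anyone writing a per-table rule; that is the general motivation behind tag-based access control.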

Editor's Notes

  • Let’s step away from compliance, regulation, and requirements, and look at the major trends and drivers within the enterprise. Governance and provenance are often discussed as “checkbox” requirements, rather than as enablers.

    The ICT Enterprise Insights survey identified “create digital capability” and “manage security, identity, and privacy” as the top two IT trends in the enterprise. What do these trends have in common?
  • There are three pillars to creating digital capability. The first pillar is the creation of the digital platform and infrastructure itself. The second pillar is the creation of the ability to effectively exploit and utilize data. The third pillar is the development of the enterprise's innovation process and methodology for the digital age. All three are underpinned by a clearly articulated digital strategy.
  • Article 4: Personal data is any information relating to an identified or identifiable natural person. A natural person can be identified directly or indirectly, so the enterprise needs to be cautious when combining data sources to ensure that innocuous information doesn't become personal information.
    Article 9: Processing of biometric data for the purpose of uniquely identifying a person is prohibited unless certain conditions are met. This applies to several types of data-in-motion: sensor data from wearables, medical devices, and fitness devices.
    Article 30: Controllers must document the purposes of processing, transfers of data to non-EU countries, and the envisaged time limits for erasure of the data.

  • Data policies are applied and encoded at the metadata level. Metadata, or data about data, is critical to providing a common foundation for understanding the qualities of data residing in different systems and to provide lineage and cataloging capabilities. A shared or common metadata framework, where all metadata is managed together, allows data to be centrally searched, tracked, and monitored regardless of its "home" repository.
  • To make this a reality, the same governance standards need to be applied to all enterprise data equally. There needs to be a single platform environment where data-in-motion and data-at-rest can be managed together, with a common metadata framework. All data-in-motion sources need a way to be ingested into this platform, with provenance and lineage tracked as they flow in.
  • TALK TRACK
    Hortonworks Powers the Future of Data: data-in-motion, data-at-rest, and Modern Data Applications.

    [NEXT SLIDE]
  • Data is often referred to as the fuel of today's businesses. In reality, every business has data and can perhaps access the same types of data as most of its competitors. The real differentiator is not the data but who uses it more smartly and to greater effect. That usage often relies on connecting the data dots across your organization. By connecting customers to products and to the channels through which they interact or prefer to interact, we can drive better customer experiences, resulting in greater loyalty and, hopefully, higher revenue. Every industry is being transformed through these connected use cases.
  • 1) Data is in multiple places (data centers the company owns, and clouds owned by a third party). 2) Different data lives in different places (numbers in your databases versus data from sensors in a connected product, which is not arranged in a database). 3) Data flows back and forth between the data center and the cloud.

    Talking points:
    An entirely new world is being created by combining lots of data with breakthrough tools.
    Data could be on-premises and in the cloud.
    Data is moving from sensors in real time across our data fabric, giving us precise instrumentation of what happened just before an event as well as after it. This is true for customers buying on the web as well as for products that might fail.
    We can run our machine learning and deep learning on these vast repositories of data.
    And we can push these models down to the edge to automate decisions.

    Note:
    For us as a community and as a company, we need to continue to innovate around the core technology, while thinking about how we enable 3 personas to be successful. This is the logical evolution and transformation that’s happening now.
  • You need to holistically manage all the data in all places, then begin to move our platform into place
  • HDF provides very fine-grained, high-fidelity reporting about the origins of data, how it was used, who used it, and so on.
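The last note says HDF reports on the origins of data and who used it. A rough sketch of how such a question can be answered from an event log, under clearly stated assumptions: each event names a flowfile, an event type (RECEIVE, JOIN, ATTRIBUTES_MODIFIED, and SEND follow the names of NiFi's provenance event types), the emitting component, and the IDs of the parent flowfiles it was derived from. Walking the parent links backwards yields the original flowfiles and every identity that touched the data. This is not NiFi's provenance API; the `actor` field, the flowfile IDs, and the identities are hypothetical, and processor names appear only as labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    flowfile_id: str
    event_type: str                # NiFi-style name, e.g. "RECEIVE", "JOIN", "SEND"
    component: str                 # processor name, used here only as a label
    actor: str                     # hypothetical: identity responsible for the event
    parents: tuple[str, ...] = ()  # flowfile IDs this flowfile was derived from


def trace(events: list[Event], flowfile_id: str) -> tuple[set[str], set[str]]:
    """Walk parent links backwards; return (origin flowfile IDs, actors involved)."""
    by_id: dict[str, list[Event]] = {}
    for event in events:
        by_id.setdefault(event.flowfile_id, []).append(event)

    origins: set[str] = set()
    actors: set[str] = set()
    stack, seen = [flowfile_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        parents: set[str] = set()
        for event in by_id.get(current, []):
            actors.add(event.actor)
            parents.update(event.parents)
        if parents:
            stack.extend(parents)   # keep walking toward the sources
        else:
            origins.add(current)    # no recorded parents: treat as an origin
    return origins, actors


# Usage: two inputs are merged, annotated by a user, then written out.
log = [
    Event("ff-1", "RECEIVE", "ListenHTTP", "gateway"),
    Event("ff-2", "RECEIVE", "ConsumeKafka", "kafka-svc"),
    Event("ff-3", "JOIN", "MergeContent", "nifi", parents=("ff-1", "ff-2")),
    Event("ff-3", "ATTRIBUTES_MODIFIED", "UpdateAttribute", "bob"),
    Event("ff-3", "SEND", "PutHDFS", "nifi"),
]
origins, actors = trace(log, "ff-3")
print(sorted(origins))  # ['ff-1', 'ff-2']
print(sorted(actors))   # ['bob', 'gateway', 'kafka-svc', 'nifi']
```

The `seen` set keeps the walk linear when a flowfile is split and later rejoined, so a shared ancestor is visited only once.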
