GLOBAL DATA MANAGEMENT:
GOVERNANCE, SECURITY AND
USEFULNESS IN A HYBRID WORLD
Sponsored by
By Neil Raden
Hired Brains Research
May, 2018
TABLE OF CONTENTS
GOAL OF GLOBAL DATA MANAGEMENT
A SHORT HISTORY OF SECURITY
THE SITUATION TODAY
“ALIEN” DISTANT DATA
MULTI-JURISDICTIONAL ISSUES
RISKS AND REWARDS OF A TRADE-OFF GOVERNANCE POLICY
THE FABRIC
THE GLOBAL DATA MANAGEMENT (GDM) PROGRAM
METADATA
LINEAGE
GOVERNANCE
SECURITY
LIFECYCLE
WHAT A GDM PERSON DOES (PERSONALLY AND THROUGH THE TEAM)
OTHER KEY ROLES
CONCLUSION
ABOUT THE AUTHOR
GOAL OF GLOBAL DATA MANAGEMENT
Organizations have a growing, pressing desire to capture data and draw insight from
it, for a multitude of improvements and innovations in operations, customer service,
and even in completely new businesses¹. That effort has
become more complicated with the emergence of hybrid, distributed computing and
data architectures (big data, cloud variants, multi-clouds and IoT). To succeed,
organizations need a broader data management philosophy incorporating
collaboration, standardization, reuse, retention (of data and models) and, especially,
security and governance. A short history of enterprise security and governance
illustrates this need.
A SHORT HISTORY OF SECURITY
Before the cloud, before big data, and even into the present, security was
implemented one application system at a time. If you were in the finance
department, you may be granted access to post manual ledger entries through the
accounting system. If you were in Human Resources, you may be granted access to
view and/or modify an employee’s records through an HR system. These grants were
either embedded in the application logic based on your role, or applied externally.
But the grants and restrictions were all administered through separate application
systems and their security scheme was not transferable from one application to
another. As a result, the overall picture was fractured, inconsistent and difficult to
administer. It dated from a time when people in organizations had tightly
constrained roles. Today, employees are expected to be agile, adaptable and able to
handle multiple roles in the organization simultaneously.
Again, before the cloud, before big data, before data science, analysts did devise
quantitative methods. In the early days of e-commerce for example, websites already
employed recommendation engines, dynamic decision making based on scoring and
decision trees for next-best-offer or propensity models. They did this by getting
access, usually one data source at a time, from IT. Data warehouses both aided and
hindered their work: aided by integrating data from multiple sources and collapsing
the security model to just one source, hindered by only providing aggregated data and
a rigid design that couldn’t adapt quickly (in fairness, any good data warehouse
designer could enhance a schema, but provisioning new data was a slow process). The
only thing that prevented the data warehouse from ingesting all of the data, internal
and external, that analysts craved was scarcity. The data warehouse could only scale
in terms of volume, throughput and demanding use at extreme cost.
¹ We use the term “businesses” loosely, as these innovations also apply to government, non-profits, charities, NGOs and any
other type of organization.
What organizations crave seems to shift over decades. Fifty years ago, computers
were employed for record-keeping. Reporting from these systems was limited to
copious printing of records. The demand for actual reporting generated long backlogs
for systems analysts and programmers, who created a massive hairball of unmanaged
“interfaces.” Early Business Intelligence (BI) then emerged and shifted the burden to
analysts, freeing IT to focus on new generations of application systems. Data access
and security shifted to the data warehouse.
About ten years ago, Tom Davenport published his landmark book, “Competing on
Analytics,”² which put the term “analytics” in play. Suddenly, analytics rose to the top
of enterprise computing. Predictive analytics, data science, machine learning and
Artificial Intelligence became top of mind, but they needed a place to live.
The process of analyzing data in organizations has for decades applied tools designed
for the individual. Spreadsheets, for example, proved to be the de-facto modeling and
reporting tool for thirty years or more, but they never adequately provided security,
governance, or efficient creation and maintenance of metadata. Other tools for
analysis and reporting, such as BI, provided their own solutions for metadata,
collaboration, version control, etc., but they were point solutions, useful only for the
product itself. (Unfortunately, the same can be said for some of the newer data
science workbench products.)
When Hadoop burst on the scene ten years ago, it too shared many of these gaps.
That’s not an indictment of DIY (do-it-yourself) analytics or wider analytic practices
based on self-service. Rather, it’s a cautionary tale that in an enterprise, the most
well-meaning and well-crafted analysis by individual contributors will always bog
down with redundancy without adequate data management.
THE SITUATION TODAY
With Global Data Management methodology and tools, all of your data can be
accessed and used no matter where it is or where it is from: on-premises, private
cloud, public cloud(s), hybrid cloud, open source, third-party data and any
combination of these, with security, privacy and governance applied as if they
were a single entity. Ingenious software products and the economics of computing
make this affordable. Not free, but feasible.
Large data platforms, such as Hadoop, by their nature contain many different types of
data from many different sources. In past decades, IT organizations built business-
oriented data models and massaged an often unruly collection of data in data
warehouses (frankly, an approach that still has merit), but for today’s technology,
that approach is too slow and too limiting for the hastening digital transformation
facing every industry.
² Davenport, T. H., & Harris, J. G. (2007). Competing on Analytics: The New Science of Winning. Boston, Mass.: Harvard Business
School Press.
While corporate IT designs for Security and Governance were conceived in an
environment of highly controlled data management and computing, for both
operational and analytical processes, those designs are counterproductive in a hybrid,
distributed, complex and increasingly streaming near-real-time world. Definitions of
security and governance in this environment are quite different. For example:
- Old (and still prevalent) meaning of Security: To protect against loss and against
malicious, innocent and/or inadvertent access to or distribution of data that
can cause damage. To isolate various organizational entities from each other.
To throttle activity by managing from scarcity.
- New meaning of Security: Ensuring that useful and important analysis will not
be missed as a result of overly restrictive or misapplied restrictions,
usually caused by a lack of shared understanding between data stewards
and, for example, data scientists.
- Old meaning of Governance: A framework that provides a formal structure
for organizations to produce measurable results toward achieving their
strategies and ensures that IT investments support business objectives. The
most commonly used frameworks are COBIT, ITIL, COSO, CMMI and FAIR.
- New meaning of Governance: Governance should be driven by a simple
concept (though hard to practice): trade-offs. Given the complexity of the
computing/data environment today, governance should aim toward a shared
understanding of risk-reward for what’s needed, evaluated and managed
across the enterprise by intelligent agents that augment the work of data
professionals and analytics practitioners. For example, it may be in the
organization’s interest to relax some access and use rules derived from simple
assumptions to achieve more productive analytics from data scientists. Trade-offs
are the opposite of rigidity.
“ALIEN” DISTANT DATA
The major issue is that enterprise data no longer exists solely in a data center or even
a single cloud (or more than one, or combinations of both). Edge analytics for IoT, for
example, capture, digest, curate and even pull data from other, different application
platforms and live connections to partners, previously a snail-like process using
obsolete methods like EDI or even batch ETL. Edge computing can be thought of as
decentralized from on-premises networks, cellular networks, data center networks, or
the cloud. All of these factors pose a risk of data originating in far-flung
environments, where the data structures and semantics are not well understood or
documented³. The risk of easily moving data from place to place or the complexity of
moving the logic to the data while everything is in motion is too extreme for manual
methods.
MULTI-JURISDICTIONAL ISSUES
Currently, organizations, at best, have governance programs for data and use in their
own jurisdictions. But even those organizations that primarily operate in a single
jurisdiction may have exposure to regulatory requirements in many others. The 2018
phase-in of the European Union GDPR (General Data Protection Regulation) is one
such instance. The solution is a Global Data Management scheme that operates as a
single program in all jurisdictions.
RISKS AND REWARDS OF A TRADE-OFF GOVERNANCE POLICY
The cadence of technology innovation clearly surpasses most organizations’ ability to
implement each new or improved technique before the next one arrives. Governance
and data management can never be a pure, complete process. They require trade-offs:
picking the issues that make the most sense, have the greatest centrality to the
organization’s strategy (or strategies) and provide the most protection against danger,
while ensuring the organization can be as effective as possible.
Governance and data management tools today are not designed for a trade-off
approach. They are layered with rules and restrictions with a “better safe than sorry”
mentality. Governance has to be a continuing process between IT and the rest of the
organization. Modern governance approaches cannot work with an “IT has the last
word” stance in every discussion. That only leads to dysfunction and missed opportunities. It can’t
be done with the tools and methodologies of past decades.
THE FABRIC
The best way to describe the solution is as a data management “fabric” that
metaphorically drapes over all of these environments and provides the management
and governance services needed. A short description of its functions follows.
The Fabric drapes over all the data resources. It is a completely different approach to
enterprise data management. It allows an organization to finally derive more value
from its data management initiatives than the cost of implementing them. Areas of
the organization that previously were denied the insight that could have been
provided by data the organization captured (somewhere) can leverage the latent
value in distributed data stores, enabled by the capabilities the GDM provides. You
can also think of the fabric as an underlying mechanism that orchestrates all of the
functions of the GDM and allows for plugging in new capabilities in an open and
seamless fashion.
³ A trucking company may have more than twenty separate telematics providers in the cab, each with its own protocols for
applications that the trucking company must absorb and react to in near-real-time.
THE GLOBAL DATA MANAGEMENT (GDM) PROGRAM
Metadata, lineage, governance, security and lifecycle are the components of the GDM.
Just as important are the program, the people and their skills.
The first step is to have an actual implementation of the “fabric.” Hortonworks
provides this through its DataPlane service. The common foundation includes the
ability to manage and govern data across distributed data lakes.
METADATA
Metadata has a wide variety of definitions and subclasses, but for GDM it powers
both operation and understanding. By accelerating the time to value of your data
investments, metadata democratizes accessibility and improves the understanding of
data and processes across the organization. It rapidly improves the productivity of
analysts and data scientists. While operational metadata is the bedrock for technical
and operational aspects such as uptime, performance and cost, metadata is also fundamental in
lifting the productivity of analysts by addressing these six questions:
What does the data mean (semantic)?
Where does it come from (lineage)?
Can I trust it (trust metrics)?
Does its meaning vary by context (interpretation)?
How do I find it?
Who do I ask (data stewards, SMEs)?
Metadata is the key to governance and use. Metadata has to be developed for both
consistency of use and understanding, and for flexibility as the organization
changes. The scope of the metadata catalogs is beyond the capabilities of data
stewards to develop manually. The GDM must have intelligent software to:
- Capture and catalog metadata for new or modified data assets
- Allow for data stewards to examine the machine-generated metadata and make
adjustments as necessary
- Manage metadata repositories across instances to ensure it is consistent
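
As a rough illustration of what a catalog record might hold, the sketch below maps each of the six questions to one field and flags the questions a machine-generated entry leaves open for a data steward. The class name, field names and the 0.0 to 1.0 trust scale are assumptions made for this example; they do not come from any particular catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One hypothetical catalog record; each field answers one of the six questions."""
    name: str
    meaning: str          # What does the data mean?
    sources: list[str]    # Where does it come from?
    trust_score: float    # Can I trust it? (assumed scale: 0.0 to 1.0)
    location: str         # How do I find it?
    stewards: list[str]   # Who do I ask?
    # Does its meaning vary by context? Optional notes keyed by context name.
    context_notes: dict[str, str] = field(default_factory=dict)


def unanswered(entry: CatalogEntry) -> list[str]:
    """Return the questions this entry still leaves open, for steward review."""
    gaps = []
    if not entry.meaning.strip():
        gaps.append("meaning")
    if not entry.sources:
        gaps.append("lineage")
    if not 0.0 <= entry.trust_score <= 1.0:
        gaps.append("trust")
    if not entry.location.strip():
        gaps.append("location")
    if not entry.stewards:
        gaps.append("stewards")
    return gaps


entry = CatalogEntry(
    name="crm_export",
    meaning="Daily snapshot of customer accounts",
    sources=["crm"],
    trust_score=0.8,
    location="lake/raw/crm_export",
    stewards=[],  # machine-generated entry with no steward assigned yet
)
print(unanswered(entry))
```

A steward would then fill in or correct only the flagged fields rather than writing the whole entry by hand.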
LINEAGE
Lineage records where the data originated and how it has been manipulated, along with
trust metrics (crowd-sourced). A lot of analytical data wrangling is still a manual process. One
drawback is the difficulty of keeping track of provenance, i.e., what the source of the
data is and whether it is still current. Data is rarely gathered just once. It can be
reused for multiple versions of the analysis, or even continuously updated/refreshed
as models are refreshed for continuous improvement. In addition, outcomes often
need to be tracked to the original data sources for validation.
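
Tracking an outcome back to its original data sources can be pictured as a walk up a graph of "built from" relationships. The following is a minimal sketch under that assumption; the asset names and the dictionary layout are invented for illustration, and a real lineage store would also carry timestamps and transformation details.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it was built from.
LINEAGE = {
    "churn_model_v3": ["customer_features"],
    "customer_features": ["crm_export", "web_clickstream"],
    "crm_export": [],
    "web_clickstream": [],
}


def original_sources(asset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Walk upstream from an asset and return the assets that have no parents."""
    seen = {asset}
    roots = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents:
            # Nothing upstream is recorded, so treat this as an original source.
            roots.add(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


print(sorted(original_sources("churn_model_v3", LINEAGE)))
```

The same walk, run in the opposite direction, would answer the impact question: which models and reports depend on a given source.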
GOVERNANCE
Taking security and access to a new level. Security, grants and restrictions, are driven
by context, not location. For example, as an analyst, you manage a corpus of work -
data, models, presentations, notebooks. Access to data you need is granted based on
the components you use, no matter where in the world they are. Time-consuming
requests to IT or data stewards are unnecessary as access is driven by intelligent
agents that understand your role.
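
To make "context, not location" concrete, here is a small sketch of an access check that decides purely from the requester's role and the body of work they are acting within. The policy table, role names and project names are hypothetical, and the region field is carried along only to show that it plays no part in the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    role: str            # e.g. "analyst"
    project: str         # the corpus of work the user is acting within
    dataset: str
    dataset_region: str  # where the data physically lives; ignored by the check


# Hypothetical policy: which datasets each (role, project) context may read.
POLICY = {
    ("analyst", "churn-study"): {"crm_export", "web_clickstream"},
    ("data_steward", "churn-study"): {"crm_export"},
}


def is_allowed(request: Request) -> bool:
    """Grant access from the request's context; location plays no part."""
    allowed = POLICY.get((request.role, request.project), set())
    return request.dataset in allowed


on_prem = Request("analyst", "churn-study", "web_clickstream", "on-prem-eu")
in_cloud = Request("analyst", "churn-study", "web_clickstream", "public-cloud-us")
print(is_allowed(on_prem), is_allowed(in_cloud))
```

Both requests receive the same answer because only the role and project differ between contexts, which is the behavior the paragraph above describes.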
The Hortonworks Data Steward Studio (DSS), which operates with the DataPlane Service,
provides businesses the capability to develop trust in their data and comply with
regulations by understanding data provenance, origin, lineage and impact. The GDM
by its nature is too complicated for one or more data stewards to manage with current
manual methods. The DSS provides them with the tools to secure, govern and provide
the data for today’s distributed, hybrid world.
A popular misconception about data scientists is that all of their work is one-off and
ad hoc, grabbing data and massaging it until it yields answers. In fact, their work is
much more formal than that. They have to assign business-friendly and intuitive
names to data files that they create or download and then organize those files into
directories, according to a rational naming convention. When they refresh those files,
they must version them and keep track of their differences. This is a complicated
process. Data doesn’t always reside in logical files. For example, clinical and
scientific lab equipment can generate hundreds or thousands of data files that
scientists must name and organize before running computational analyses on them.
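
The versioning chore described above can be sketched in a few lines: a refreshed file gets a new version name only when its content actually differs from the previous version. The naming pattern and the record layout here are assumptions chosen for the example, not a convention taken from any tool.

```python
import hashlib


def content_digest(data: bytes) -> str:
    """Short fingerprint of the file content, used to detect real changes."""
    return hashlib.sha256(data).hexdigest()[:12]


def next_version(name: str, data: bytes, history: list[dict]) -> dict:
    """Record a new version only if the content differs from the latest one."""
    digest = content_digest(data)
    if history and history[-1]["digest"] == digest:
        return history[-1]  # refresh produced identical content; no new version
    record = {"file": f"{name}_v{len(history) + 1}", "digest": digest}
    history.append(record)
    return record


history: list[dict] = []
first = next_version("lab_run", b"reading,1.0\n", history)
repeat = next_version("lab_run", b"reading,1.0\n", history)
second = next_version("lab_run", b"reading,1.1\n", history)
print(first["file"], repeat["file"], second["file"])
```

Comparing fingerprints rather than timestamps means an unchanged refresh does not create a spurious version to track.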
SECURITY
Previously, data management was highly driven by “silos,” collections of domains in
locations. Schemes for governance were highly localized. Access to a data warehouse
could be broad for an analyst, but deeper analysis requiring access to other data
sources was dependent on the data management in place at those sources.
Where most data warehouses disappointed practitioners of advanced modeling and
analysis, such as data scientists building machine learning models, was in providing
access to raw data not otherwise needed in the data warehouse, including detail from
source systems, sensor data streaming from the edge, and all manner of external data
sources. Existing data management and security programs typically allow access to
data sources used by an analyst and cohort of others on a “normal” basis, but
requests beyond that range fire an alert. The paradox is, a productive analyst should
spend more time working “out-of-the-box” than in it. Fractured data management
and security programs thwart their efforts.
Your organization is likely composed of a mosaic of data stores (or will be soon):
Multi-cloud, IoT, data lakes, data warehouses, on-prem, hybrid cloud, at-rest and
streaming. At-rest data can be catalogued and even updated/refreshed according to a
governance scheme, but streaming data presents a more challenging problem, not one
that can be solved manually, as the flow can change without notice. GDM should
provide tools to deal with it, but governance policy is the map; software that
implements the policy is the journey.
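
One way to picture software catching a stream that "changes without notice" is a check of each incoming record against the schema recorded in the catalog. The sketch below assumes a very simple schema of field names and Python types; the field names are invented, and a production system would more likely rely on a schema registry.

```python
def detect_drift(expected: dict[str, type], record: dict) -> dict[str, list[str]]:
    """Compare one streamed record with the catalogued schema."""
    missing = sorted(k for k in expected if k not in record)
    unexpected = sorted(k for k in record if k not in expected)
    retyped = sorted(
        k for k, expected_type in expected.items()
        if k in record and not isinstance(record[k], expected_type)
    )
    return {"missing": missing, "unexpected": unexpected, "retyped": retyped}


# Hypothetical catalogued schema for a telematics feed.
EXPECTED = {"truck_id": str, "speed_kph": float}

# The provider has started sending speed as text and added a new field.
incoming = {"truck_id": "T-17", "speed_kph": "88", "fuel_pct": 40}
print(detect_drift(EXPECTED, incoming))
```

A non-empty result could update the catalog automatically or be routed to a data steward, depending on the governance policy in force.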
LIFECYCLE
Everything discussed so far only addresses a scheme of security and governance in
place. A GDM must be able to perform as a lifecycle process. That means putting in
place a program and architecture capable of dynamically adjusting to changing
business realities as well as the rapid cadence of new technology: integrating
new data and features, adjusting governance policy and administration to changing
conditions and doing all of that on a consistent set of tools and metadata.
A robust GDM program cannot be implemented as a “project”; it continues through a
lifecycle. Hortonworks provides the tools to maintain your GDM through its Data
Lifecycle Manager.
WHAT A GDM PERSON DOES (PERSONALLY AND THROUGH THE TEAM)
One thing to keep in mind is that the fortunes of an organization do not change simply by
implementing technology. That is only the first step.
The leader of the GDM initiative in the organization (often given the title Chief Data
Officer, or CDO) needs, above all, to inspire confidence among the various
stakeholders in the organization. Above and beyond any particular previous skill and
experience in data management, it is paramount the person in this role has the vision
to motivate and encourage the organization. This requires someone with the gravitas
and communication and political skill to navigate the currents of diverse backgrounds
and requirements.
The GDM leader is the keeper of the strategy, ensuring it doesn’t flag, as the process is
not without challenges. This encompasses all aspects of GDM: architecture, data
catalogs, quality, lineage and metadata. To establish policies, measures, standards
and requirements that fit the spirit of the initiative, the leader must dismantle obsolete security
and governance methodologies that degrade the vision. Driving the selection process
for the components ensures the program can scale economically from both
implementation and TCO perspectives.
The GDM leader owns the initiative, no matter how influential various others are in
the organization. The breaking down of silos, fiefdoms and data czars is key to
delivering data democratization in support of all services, analytics and data
products. The inevitable change management requires careful and thorough
communication to business owners and their designated data managers and stewards.
The GDM leader is the point person with the C-Suite on all matters relating to data for
compliance, privacy and governance, and has responsibility for the initial creation of
the control apparatus to ensure integrity in the program. At some point, it is wise for the
GDM leader to delegate these roles and move on as the project becomes a program.
OTHER KEY ROLES
There are four key roles that you will need to establish and nurture. Many people in
your organization can step up to these roles with training, but will need to re-orient
their practices for a global, elastic, governed process:
- Data scientists and data analysts to understand cross-source lineage, apply
models across types of data and gain access to data for deeper insight into
both pre- and post-transaction analysis.
- Data stewards to investigate lineage, improve quality and eliminate
redundancies across data assets.
- Data engineers to move, back up and restore data assets across environments
and sources, while implementing an efficient data storage tiering policy.
- Data architects to define security and governance policies that are
automatically enforced to meet compliance requirements.
CONCLUSION
No organization today is immune from the push for some form of digital
transformation. The late Peter Drucker famously said, “The computer actually may
have aggravated management's degenerative tendency to focus inward…”⁴ That was
almost twenty years ago and is almost certainly not true today. However, it illustrates
how information systems have changed, and how quickly. It is no longer
sufficient to thresh through only your internal record-keeping systems for insight, and it is
very likely that you already do your analytics in multiple locations, on multiple
platforms and multiple clusters, and with very different kinds of data. In addition, more
of your staff are engaged in analytics as a result of better software tools, and more
will continue to be. It is time to jettison your old piecemeal approach to data
management from the mindset of twenty years ago. Global data management is not
optional.
⁴ Peter F. Drucker (2009). “The Effective Executive: The Definitive Guide to Getting the Right Things Done”, p. 16, Harper
Collins.
ABOUT THE AUTHOR
Neil Raden, based in Santa Fe, NM, is an active industry analyst, consultant and
widely published author and speaker and also the founder of Hired Brains Research.
Hired Brains provides thought leadership, context and advisory consulting and
implementation services in Information Management, Analytics/Data Science,
Machine Learning/AI and IoT for clients worldwide. Hired Brains also provides
consulting, market research, product marketing and advisory services to the software
industry. Neil is the co-author of Smart (Enough) Systems: How to Deliver Competitive
Advantage by Automating Hidden Decisions, Prentice-Hall. He welcomes your
comments at nraden@hiredbrains.com.

Más contenido relacionado

La actualidad más candente

CS309A Final Paper_KM_DD
CS309A Final Paper_KM_DDCS309A Final Paper_KM_DD
CS309A Final Paper_KM_DD
David Darrough
 
Big Data & Analytics Trends 2016 Vin Malhotra
Big Data & Analytics Trends 2016 Vin MalhotraBig Data & Analytics Trends 2016 Vin Malhotra
Big Data & Analytics Trends 2016 Vin Malhotra
Vin Malhotra
 
2012 iia-predictions-brief-final
2012 iia-predictions-brief-final2012 iia-predictions-brief-final
2012 iia-predictions-brief-final
camdi
 
KM - Cognitive Computing overview by Ken Martin 13Apr2016
KM - Cognitive Computing overview by Ken Martin 13Apr2016KM - Cognitive Computing overview by Ken Martin 13Apr2016
KM - Cognitive Computing overview by Ken Martin 13Apr2016
HCL Technologies
 
The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Management
mark madsen
 

La actualidad más candente (20)

CS309A Final Paper_KM_DD
CS309A Final Paper_KM_DDCS309A Final Paper_KM_DD
CS309A Final Paper_KM_DD
 
Investing in AI: Moving Along the Digital Maturity Curve
Investing in AI: Moving Along the Digital Maturity CurveInvesting in AI: Moving Along the Digital Maturity Curve
Investing in AI: Moving Along the Digital Maturity Curve
 
Big Data & Analytics Trends 2016 Vin Malhotra
Big Data & Analytics Trends 2016 Vin MalhotraBig Data & Analytics Trends 2016 Vin Malhotra
Big Data & Analytics Trends 2016 Vin Malhotra
 
EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session EDW 2015 cognitive computing panel session
EDW 2015 cognitive computing panel session
 
Smart Data Module 6 d drive the future
Smart Data Module 6 d drive the futureSmart Data Module 6 d drive the future
Smart Data Module 6 d drive the future
 
Analytics Trends 20145 - Deloitte - us-da-analytics-analytics-trends-2015
Analytics Trends 20145 -  Deloitte - us-da-analytics-analytics-trends-2015Analytics Trends 20145 -  Deloitte - us-da-analytics-analytics-trends-2015
Analytics Trends 20145 - Deloitte - us-da-analytics-analytics-trends-2015
 
2012 iia-predictions-brief-final
2012 iia-predictions-brief-final2012 iia-predictions-brief-final
2012 iia-predictions-brief-final
 
Latest trends in Business Analytics
Latest trends in Business AnalyticsLatest trends in Business Analytics
Latest trends in Business Analytics
 
Data science market insights usa
Data science market insights usaData science market insights usa
Data science market insights usa
 
Cognitive technologies with David Schatsky at Blocks + Bots
Cognitive technologies with David Schatsky at Blocks + BotsCognitive technologies with David Schatsky at Blocks + Bots
Cognitive technologies with David Schatsky at Blocks + Bots
 
Big Data; Big Potential: How to find the talent who can harness its power
Big Data; Big Potential: How to find the talent who can harness its powerBig Data; Big Potential: How to find the talent who can harness its power
Big Data; Big Potential: How to find the talent who can harness its power
 
KM - Cognitive Computing overview by Ken Martin 13Apr2016
KM - Cognitive Computing overview by Ken Martin 13Apr2016KM - Cognitive Computing overview by Ken Martin 13Apr2016
KM - Cognitive Computing overview by Ken Martin 13Apr2016
 
How Do You Improve Data Skills and Data Literacy in your Business?
How Do You Improve Data Skills and Data Literacy in your Business?How Do You Improve Data Skills and Data Literacy in your Business?
How Do You Improve Data Skills and Data Literacy in your Business?
 
Big Data Trends and Challenges Report - Whitepaper
Big Data Trends and Challenges Report - WhitepaperBig Data Trends and Challenges Report - Whitepaper
Big Data Trends and Challenges Report - Whitepaper
 
Overview of mit sloan case study on ge data and analytics initiative titled g...
Overview of mit sloan case study on ge data and analytics initiative titled g...Overview of mit sloan case study on ge data and analytics initiative titled g...
Overview of mit sloan case study on ge data and analytics initiative titled g...
 
Emerging opportunities in the age of data
Emerging opportunities in the age of dataEmerging opportunities in the age of data
Emerging opportunities in the age of data
 
Module 6 The Future of Big and Smart Data- Online
Module 6 The Future of Big and Smart Data- Online Module 6 The Future of Big and Smart Data- Online
Module 6 The Future of Big and Smart Data- Online
 
The Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data ManagementThe Black Box: Interpretability, Reproducibility, and Data Management
The Black Box: Interpretability, Reproducibility, and Data Management
 
To Become a Data-Driven Enterprise, Data Democratization is Essential
To Become a Data-Driven Enterprise, Data Democratization is EssentialTo Become a Data-Driven Enterprise, Data Democratization is Essential
To Become a Data-Driven Enterprise, Data Democratization is Essential
 
Cognitive analytics: What's coming in 2016?
Cognitive analytics: What's coming in 2016?Cognitive analytics: What's coming in 2016?
Cognitive analytics: What's coming in 2016?
 

Similar a Global Data Management: Governance, Security and Usefulness in a Hybrid World

Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...
Angie Jorgensen
 
Practical analytics john enoch white paper
Practical analytics john enoch white paperPractical analytics john enoch white paper
Practical analytics john enoch white paper
John Enoch
 
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Efficient Data Filtering Algorithm for Big Data Technology in Telecommunicati...
Onyebuchi nosiri
 
Big data is a broad term for data sets so large or complex that tr.docx
Big data is a broad term for data sets so large or complex that tr.docxBig data is a broad term for data sets so large or complex that tr.docx
Big data is a broad term for data sets so large or complex that tr.docx
hartrobert670
 
Information economics and big data
Information economics and big dataInformation economics and big data
Information economics and big data
Mark Albala
 
LS_WhitePaper_NextGenAnalyticsMay2016
LS_WhitePaper_NextGenAnalyticsMay2016LS_WhitePaper_NextGenAnalyticsMay2016
LS_WhitePaper_NextGenAnalyticsMay2016
Anjan Roy, PMP
 

Similar a Global Data Management: Governance, Security and Usefulness in a Hybrid World (20)

Big data security
Big data securityBig data security
Big data security
 
Big data security
Big data securityBig data security
Big data security
 
Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...Encrypted Data Management With Deduplication In Cloud...
Encrypted Data Management With Deduplication In Cloud...
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Global Data Management: Governance, Security and Usefulness in a Hybrid World

  • 1. GLOBAL DATA MANAGEMENT: GOVERNANCE, SECURITY AND USEFULNESS IN A HYBRID WORLD Sponsored by By Neil Raden Hired Brains Research May, 2018
  • 2. TABLE OF CONTENTS: GOAL OF GLOBAL DATA MANAGEMENT; A SHORT HISTORY OF SECURITY; THE SITUATION TODAY; “ALIEN” DISTANT DATA; MULTI-JURISDICTIONAL ISSUES; RISKS AND REWARDS OF A TRADE-OFF GOVERNANCE POLICY; THE FABRIC; THE GLOBAL DATA MANAGEMENT (GDM) PROGRAM; METADATA; LINEAGE; GOVERNANCE; SECURITY; LIFECYCLE; WHAT A GDM PERSON DOES (PERSONALLY AND THROUGH THE TEAM); OTHER KEY ROLES; CONCLUSION; ABOUT THE AUTHOR
  • 3. GOAL OF GLOBAL DATA MANAGEMENT There is no question that organizations have a greater, aching desire to capture data and draw insight from it for a multitude of improvements and innovations in operations, customer service, and even in completely new businesses¹. That effort has become more complicated with the emergence of hybrid, distributed computing and data architectures (big data, cloud variants, multi-clouds and IoT). To succeed, there is a need to address a broader data management philosophy incorporating collaboration, standardization, reuse, retention (of data and models) and, especially, security and governance. To illustrate this need, a short history of enterprise security and governance will help. A SHORT HISTORY OF SECURITY Before the cloud, before big data, and even into the present, security was implemented one application system at a time. If you were in the finance department, you might be granted access to post manual ledger entries through the accounting system. If you were in Human Resources, you might be granted access to view and/or modify an employee’s records through an HR system. These grants were either embedded in the application logic based on your role, or applied externally. But the grants and restrictions were all administered through separate application systems, and their security scheme was not transferable from one application to another. As a result, the overall picture was fractured, inconsistent and difficult to administer. It dates from a time when people in organizations had tightly constrained roles. Today, employees are expected to be agile, adaptable and able to handle multiple roles in the organization simultaneously. Again, before the cloud, before big data, before data science, analysts did devise quantitative methods. 
In the early days of e-commerce, for example, websites already employed recommendation engines, dynamic decision making based on scoring and decision trees for next-best-offer or propensity models. They did this by getting access, usually one data source at a time, from IT. Data warehouses both aided and hindered their work: aided by integrating data from multiple sources and collapsing the security model to just one source, hindered by only providing aggregated data and a rigid design that couldn’t adapt quickly (in fairness, any good data warehouse designer could enhance a schema, but provisioning new data was a slow process). The only thing that prevented the data warehouse from ingesting all of the data, internal and external, that analysts craved was scarcity. The data warehouse could scale in terms of volume, throughput and demanding use only at extreme cost. ¹ We use the term “businesses” loosely, as these innovations also apply to government, non-profits, charities and NGOs, and any type of organization.
  • 4. What organizations crave seems to shift over decades. Fifty years ago, computers were employed for record-keeping. Reporting from these systems was limited to copious printing of records. The demand for actual reporting generated long backlogs for systems analysts and programmers, creating a massive hairball of “interfaces” with no management. Early Business Intelligence (BI) emerged and shifted the burden to analysts, freeing IT to focus on new generations of application systems. Data access and security shifted to the data warehouse. About ten years ago, Tom Davenport published his landmark book, “Competing on Analytics”², which put the term “analytics” in play. Suddenly, analytics rose to the top of enterprise computing. Predictive analytics, data science, machine learning and Artificial Intelligence became top of mind, but they needed a place to live. The process of analyzing data in organizations has for decades applied tools designed for the individual. Spreadsheets, for example, proved to be the de facto modeling and reporting tool for thirty years or more, but they never adequately provided security, governance, or efficient creation and maintenance of metadata. Other tools for analysis and reporting, such as BI, provided their own solutions for metadata, collaboration, version control, etc., but they were point solutions, only useful for the product itself. (Unfortunately, the same can be said for some of the newer data science workbench products.) When Hadoop burst on the scene ten years ago, it too shared many of these gaps. That’s not an indictment of DIY (do-it-yourself) analytics or wider analytic practices based on self-service. 
Rather, it’s a cautionary tale that in an enterprise, the most well-meaning and well-crafted analysis by individual contributors will always bog down with redundancy without adequate Data Management. THE SITUATION TODAY With Global Data Management methodology and tools, all of your data can be accessed and used no matter where it is or where it is from: on-premises, private cloud, public cloud(s), hybrid cloud, open source, third-party data and any combination of these, with security, privacy and governance applied as if they were a single entity. Ingenious software products and the economics of computing make it economical to do this. Not free, but feasible. Large data platforms, such as Hadoop, by their nature contain many different types of data from many different sources. In past decades, IT organizations built business-oriented data models and massaged an often unruly collection of data in data warehouses (frankly, an approach that still has merit), but for today’s technology, ² Davenport, T. H., & Harris, J. G. (2007). Competing on analytics: The new science of winning. Boston, Mass: Harvard Business School Press.
  • 5. that approach is too slow and too limiting for the hastening digital transformation facing every industry. While corporate IT designs for Security and Governance were conceived in an environment of highly controlled data management and computing, for both operational and analytical processes, those designs are counterproductive in a hybrid, distributed, complex and increasingly streaming, near-real-time world. Definitions of security and governance in this environment are quite different. For example: - Old (and still prevalent) meaning of Security: To protect against loss and against malicious, innocent and/or inadvertent access to or distribution of data that can cause damage. To isolate various organizational entities from each other. To throttle activity by managing from scarcity. - New meaning of Security: Ensuring that useful and important analysis will not be missed as a result of overly restrictive and/or misapplied restrictions, usually the result of a lack of shared understanding between data stewards and, for example, data scientists. - Old meaning of Governance: A framework that provides a formal structure for organizations to produce measurable results toward achieving their strategies and ensures that IT investments support business objectives. The most commonly used frameworks are COBIT, ITIL, COSO, CMMI and FAIR. - New meaning of Governance: Governance should be driven by a simple concept (though hard to practice): trade-offs. Given the complexity of the computing/data environment today, governance should aim toward a shared understanding of risk and reward for what’s needed, evaluated and managed across the enterprise by intelligent agents that augment the work of data professionals and analytics practitioners. For example, it may be in the organization’s interest to relax some access and use rules derived from simple assumptions to achieve more productive analytics from data scientists. Trade-offs are the opposite of rigidity. 
“ALIEN” DISTANT DATA The major issue is that enterprise data no longer exists solely in a data center or even a single cloud (or more than one, or combinations of both). Edge analytics for IoT, for example, captures, digests, curates and even pulls data from other, different application platforms and live connections to partners, previously a snail-like process using obsolete methods like EDI or even batch ETL. Edge computing can be thought of as decentralized from on-premises networks, cellular networks, data center networks, or the cloud. All of these factors pose a risk of data originating in far-flung environments, where the data structures and semantics are not well understood or
  • 6. documented³. The risk of easily moving data from place to place, or the complexity of moving the logic to the data while everything is in motion, is too extreme for manual methods. MULTI-JURISDICTIONAL ISSUES Currently, organizations, at best, have governance programs for data and use in their own jurisdictions. But even those organizations that primarily operate in a single jurisdiction may have exposure to regulatory requirements in many others. The 2018 phase-in of the European Union GDPR (General Data Protection Regulation) is one such instance. The solution is a Global Data Management scheme that operates as a single program in all jurisdictions. RISKS AND REWARDS OF A TRADE-OFF GOVERNANCE POLICY The cadence of technology innovation clearly surpasses most organizations’ ability to implement each new or improved technique before the next one arrives. Governance and data management can never be a pure, complete process. It requires trade-offs: picking the issues that make the most sense, have the greatest centrality to the organization’s strategy (or strategies) and provide the most protection against danger, while ensuring the organization can be as effective as possible. Governance and data management tools today are not designed for a trade-off approach. They are layered with rules and restrictions with a “better safe than sorry” mentality. Governance has to be a continuing process between IT and the rest of the organization. Modern governance approaches cannot work with an “IT has the last word” stance in any discussion. It only leads to dysfunction and missed opportunities. It can’t be done with the tools and methodologies of past decades. THE FABRIC The best way to describe the solution is as a data management “fabric” that metaphorically drapes over all of these environments and provides the management and governance services needed. A short description of its functions is: The Fabric drapes over all the data resources. 
It is a completely different approach to enterprise data management. It allows an organization to finally derive more value from its data management initiatives than the cost of implementing them. Areas of the organization that previously were denied the insight that could have been provided by data the organization captured (somewhere) can leverage the latent value in distributed data stores, enabled by the capabilities the GDM provides. You can also think of the fabric as an underlying mechanism that orchestrates all of the functions of the GDM and allows for plugging in new capabilities in an open and seamless fashion. ³ A trucking company may have more than twenty separate telematics providers in the cab, each with its own protocols for applications that the trucking company must absorb and react to in near-real-time.
  • 7. THE GLOBAL DATA MANAGEMENT (GDM) PROGRAM Metadata, lineage, governance, security and lifecycle are the components of the GDM. Just as important are the program, the people and the skills. The first step is to have an actual implementation of the “fabric.” Hortonworks provides this through its DataPlane service. The common foundation includes the ability to manage and govern data across distributed data lakes. METADATA Metadata has a wide variety of definitions and sub-classes, but for GDM it powers both operation and understanding. By accelerating the time to value of your data investments, metadata democratizes accessibility and improves the understanding of data and processes across the organization. It rapidly improves the productivity of analysts and data scientists. While operational metadata is the bedrock for technical and operational aspects of uptime, performance, cost, etc., it is also fundamental in lifting the productivity of analysts by addressing these six questions: What does the data mean (semantic)? Where does it come from (lineage)? Can I trust it (trust metrics)? Does its meaning vary by context (interpretation)? How do I find it? Who do I ask (data stewards, SMEs)? Metadata is the key to governance and use. Metadata has to be developed for both consistency of use and understanding as well as flexibility as the organization
  • 8. changes. The scope of the metadata catalogs is beyond the capabilities of data stewards to develop manually. The GDM must have intelligent software to: - Capture and catalog metadata for new or modified data assets - Allow data stewards to examine the machine-generated metadata and make adjustments as necessary - Manage metadata repositories across instances to ensure they are consistent LINEAGE Where the data originated and how it has been manipulated; trust metrics (crowd-sourced). A lot of analytical data wrangling is still a manual process. One drawback is the difficulty of keeping track of provenance, i.e., what the source of the data is and whether it is still current. Data is rarely gathered just once. It can be reused for multiple versions of the analysis, or even continuously updated/refreshed as models are refreshed for continuous improvement. In addition, outcomes often need to be traced back to the original data sources for validation. GOVERNANCE Taking security and access to a new level. Security grants and restrictions are driven by context, not location. For example, as an analyst, you manage a corpus of work: data, models, presentations, notebooks. Access to the data you need is granted based on the components you use, no matter where in the world they are. Time-consuming requests to IT or data stewards are unnecessary, as access is driven by intelligent agents that understand your role. The Hortonworks Data Steward Studio (DSS), which operates with the DataPlane Service, provides businesses the capability to develop trust in their data and comply with
  • 9. regulations by understanding data provenance, origin, lineage and impact. The GDM by its nature is too complicated for one or more data stewards to manage with current manual methods. The DSS provides them with the tools to secure, govern and provide the data for today’s distributed, hybrid world. A popular misconception about data scientists is that all of their work is one-off and ad hoc, grabbing data and massaging it until it yields answers. In fact, their work is much more formal than that. They have to assign business-friendly and intuitive names to data files that they create or download and then organize those files into directories, according to a rational naming convention. When they refresh those files, they must version them and keep track of their differences. This is a complicated process. Data doesn’t always reside in logical files. For example, clinical and scientific lab equipment can generate hundreds or thousands of data files that scientists must name and organize before running computational analyses on them. SECURITY Previously, data management was highly driven by “silos,” collections of domains in locations. Schemes for governance were highly localized. Access to a data warehouse could be broad for an analyst, but deeper analysis requiring access to other data sources was dependent on the data management in place at those sources. Where most data warehouses disappointed practitioners of advanced modeling and analysis such as machine learning (data scientists, for example) was in access to raw data not otherwise needed in the data warehouse, including detail from source systems, sensor data streaming from the edge, and all manner of external data sources. Existing data management and security programs typically allow access to data sources used by an analyst and a cohort of others on a “normal” basis, but requests beyond that range fire an alert. 
The paradox is that a productive analyst should spend more time working “out-of-the-box” than in it. Fractured data management and security programs thwart their efforts. Your organization is likely composed of a mosaic of data stores (or will be soon): multi-cloud, IoT, data lakes, data warehouses, on-prem, hybrid cloud, at-rest and streaming. At-rest data can be catalogued and even updated/refreshed according to a governance scheme, but streaming data presents a more challenging problem, one that cannot be solved manually because the flow can change without notice. GDM should provide tools to deal with it, but governance policy is the map; software that implements the policy is the journey. LIFECYCLE Everything discussed so far only addresses a scheme of security and governance in place. A GDM must be able to perform as a lifecycle process. That means putting in place a program and architecture that is capable of dynamically adjusting to changing business realities as well as the rapid cadence of new technology: Integration of
  • 10. new data and features, adjusting governance policy and administration to changing conditions and doing all of that on a consistent set of tools and metadata. A robust GDM program cannot be implemented as a “project”; it continues through a lifecycle. Hortonworks provides the tools to maintain your GDM through its Data Lifecycle Manager. WHAT A GDM PERSON DOES (PERSONALLY AND THROUGH THE TEAM) One thing to keep in mind is that the fortunes of an organization do not change by implementing technology. That’s only the first step. The leader of the GDM initiative in the organization (often given the title Chief Data Officer, or CDO) needs, above all, to inspire confidence among the various stakeholders in the organization. Above and beyond any particular previous skill and experience in data management, it is paramount that the person in this role has the vision to motivate and encourage the organization. This requires someone with the gravitas and the communication and political skills to navigate the currents of diverse backgrounds and requirements. The GDM role is the keeper of the strategy, ensuring it doesn’t flag, as the process is not without challenges. This encompasses all aspects of GDM: architecture, data catalogs, quality, lineage and metadata. To establish policies, measures, standards and requirements that fit the spirit of the initiative, the GDM leader must dismantle obsolete security and governance methodologies that degrade the vision. Driving the selection process of the components ensures the program can scale economically from both implementation and TCO perspectives.
  • 11. The GDM leader owns the initiative, no matter how influential various others are in the organization. The breaking down of silos, fiefdoms and data czars is key to delivering data democratization in support of all services, analytics and data products. Inevitable change management requires careful and thorough communication to business owners and their designated data managers and stewards. The GDM leader is the point person with the C-Suite on all matters relating to data for compliance, privacy and governance, and has responsibility for the initial creation of the control apparatus to ensure integrity in the program. At some point, it is wise for the GDM leader to delegate these roles and move on as the project becomes a program. OTHER KEY ROLES There are four key roles that you will need to establish and nurture. Many people in your organization can step up to these roles with training, but will need to re-orient their practices for a global, elastic, governed process: - Data scientists and data analysts to understand cross-source lineage, apply models across types of data and gain access to data for deeper insight into both pre- and post-transaction analysis. - Data stewards to investigate lineage, improve quality and eliminate redundancies across data assets. - Data engineers to move, back up and restore data assets across environments and sources, while implementing an efficient data storage tiering policy. - Data architects to define security and governance policies that are automatically enforced to meet compliance requirements. CONCLUSION No organization today is immune from the push for some form of digital transformation. The late Peter Drucker famously said, “The computer actually may have aggravated management's degenerative tendency to focus inward…”⁴ That was almost twenty years ago and is almost certainly not true today. However, it illustrates how information systems have changed, and how quickly. 
It is no longer sufficient solely to thresh through your internal record-keeping systems for insight, and it is very likely that you already do your analytics in multiple locations, on multiple platforms and multiple clusters, and with very different kinds of data. In addition, more of your staff are engaged in analytics as a result of better software tools, and more will continue to be. It is time to jettison your old piecemeal approach to data ⁴ Peter F. Drucker (2009). “The Effective Executive: The Definitive Guide to Getting the Right Things Done”, p.16, Harper Collins.
  • 12. management from the mindset of twenty years ago. Global data management is not optional. ABOUT THE AUTHOR Neil Raden, based in Santa Fe, NM, is an active industry analyst, consultant, widely published author and speaker, and the founder of Hired Brains Research. Hired Brains provides thought leadership, context and advisory consulting and implementation services in Information Management, Analytics/Data Science, Machine Learning/AI and IoT for clients worldwide. Hired Brains also provides consulting, market research, product marketing and advisory services to the software industry. Neil is the co-author of Smart (Enough) Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions, Prentice-Hall. He welcomes your comments at nraden@hiredbrains.com.