The Briefing Room with Eric Kavanagh and the PSI-KORS Institute
Live Webcast Nov. 12, 2013
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?AT=pb&SP=EC&rID=7727087&rKey=66b1fa7d82868199
Let's face it -- most enterprise information systems are a mess. That's often due to grunt work which was overlooked months or years ago and had nothing to do with you, except that you inherited it. Some mistakes can be swept under the rug for a while, but sooner or later, garbage in results in very expensive garbage out.
Register for this episode of the Briefing Room to hear Senior Analyst Eric Kavanagh outline a roadmap from the past into the possible futures of the information economy. He'll be briefed by Dr. Geoffrey Malafsky, Founder and Data Scientist for the PSI-KORS Institute, a new organization focused on data reconciliation. Malafsky will share his institute's methodology and explain how the process of doing the dirty work can yield tremendous benefits.
Visit InsideAnalysis.com for more information
4. Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Twitter Tag: #briefr
The Briefing Room
7. § Current data is disjointed and of low quality
§ Variable use and meaning among systems even for “same” data elements
§ Undocumented definitions and data mgmt processes
§ Errors in data systems
§ Disagreement among data systems
§ Lack of existing descriptions for key readiness use cases
§ Legacy data systems have failed to overcome these problems despite several years of new marts/houses/brokers/IPTs/applications
8. “Many CIOs believe data is inexpensive because storage has become
inexpensive. But data is inherently messy – it can be wrong, it can be
duplicative, and it can be irrelevant – which means it requires handling,
which is where the real expenses come in. ‘The cost of more data is the
application and the computing power and the processes to reconcile all
these things’,”
"While there are a myriad of analytical tools that can be leveraged, a
recent study indicated that more than 70% of CMOs feel they are
underprepared to manage the explosion of data and ‘lack true insight.’"
1. Wall Street Journal, CIO’s Big Problem with Big Data, 2012-08-02
2. Forbes, The CEO/CMO Dilemma: So Much Data, So Little Impact, 2012-07-18
9. § Suffix in source A, prefix in B, neither in C for same (part number, title, …)?
§ Conflict syntactically (simplest case) and semantically (most difficult)
§ Other tools & methods never solve this because they deal with the obstacles independently or not at all: data values out-of-sync with metadata, data models
[Figure: Different Meanings (Legal and Business Activities); panels: NKY HomeSeekers, Texas]
1. Create table – title aligned to business = Garage
2. Create vocabulary: spaces.description, spaces.national, spaces.state, .
3. Define ETL logic
4. Merge in warehouse and process in virtualization layer
5. Change as needed
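The syntactic case named on this slide (a suffix in source A, a prefix in B, neither in C) can be sketched in a few lines. This is only an illustration under stated assumptions: the source names, the two-letter decoration, the regular expressions, and the sample values are all hypothetical and do not come from the presentation or from any PSI-KORS tool.

```python
import re

# Hypothetical per-source rules describing how each source decorates the
# shared part number. Patterns and sample values are illustrative only.
SOURCE_RULES = {
    "A": re.compile(r"^(?P<core>\d+)-[A-Z]{2}$"),  # suffix, e.g. "10442-TX"
    "B": re.compile(r"^[A-Z]{2}-(?P<core>\d+)$"),  # prefix, e.g. "TX-10442"
    "C": re.compile(r"^(?P<core>\d+)$"),           # bare,   e.g. "10442"
}


def canonical_part_number(source: str, raw: str) -> str:
    """Strip the source-specific prefix or suffix and return the core number."""
    match = SOURCE_RULES[source].match(raw.strip().upper())
    if match is None:
        # Surface values that break the documented rule instead of guessing.
        raise ValueError(f"{raw!r} does not fit the rule for source {source}")
    return match.group("core")


records = [("A", "10442-TX"), ("B", "tx-10442"), ("C", "10442")]
keys = {canonical_part_number(src, raw) for src, raw in records}
print(keys)  # all three records collapse to one canonical key
```

Keeping the rules in one table per source means a change to one source's convention is a one-line edit. The semantic case, where the same string means different things in different systems, cannot be handled by pattern matching like this.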
Copyright Phasic Systems Inc 2013
10. § Data Rationalization is the process of building and managing a continuously adaptive data environment that fuels current and future business needs for decision making and system operations
§ It ensures data (i.e. not just metadata) is as accurate, meaningful, and useful as possible while continuously adjusting to improve and add capability
§ It provides collaborative management of data assets, the designs governing who, why, and how of data, and the where, when, how of data use in operational systems
§ It solves the great challenge of mapping all source values to each target along the entire complex paths of enterprise data use
§ Consolidated values when possible with continuous improvement
§ Simplified and adaptive mapping with Corporate NoSQL
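The idea of mapping source values to a target, and consolidating them where possible, can be sketched as a lookup table that also reports what it cannot yet map. This is a minimal sketch, not the institute's method: the system names, the values, and the `rationalize` function are hypothetical, and a plain dictionary stands in for whatever store would be used in practice.

```python
from collections import defaultdict

# Hypothetical mapping table: (source system, source value) -> canonical value.
VALUE_MAP = {
    ("warehouse", "GAR"): "garage",
    ("listing_feed", "Garage/Carport"): "garage",
    ("listing_feed", "Pkg"): "parking_space",
}


def rationalize(rows):
    """Map each (source, value) pair to its canonical value.

    Returns the pairs grouped by canonical value, plus the pairs that have
    no mapping yet so they can be reviewed and added to the table.
    """
    consolidated = defaultdict(list)
    unmapped = []
    for source, value in rows:
        target = VALUE_MAP.get((source, value))
        if target is None:
            unmapped.append((source, value))
        else:
            consolidated[target].append((source, value))
    return dict(consolidated), unmapped


rows = [
    ("warehouse", "GAR"),
    ("listing_feed", "Garage/Carport"),
    ("listing_feed", "Pkg"),
    ("warehouse", "CPT"),  # no mapping yet
]
consolidated, unmapped = rationalize(rows)
print(consolidated)
print(unmapped)
```

Returning the unmapped pairs rather than dropping them is what lets the mapping improve over time: each review cycle adds entries to the table and reruns the same function.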
11. Design Rationalization Issues
• Multiple data models
• Conflicting definitions
• Similar, supposedly similar, operationally distinct values
• Unknown business logic
• Multiple ETL mappings
Design Rationalization
• Consolidated, adaptive data models
• Standardized definitions
• Synchronized distinct operational values
• Managed business logic
• Coordinated ETL mappings
System Rationalization Issues
• Multiple database systems
• Conflicting formats
• Redundant storage
• Unsynchronized values
• Multiple integration points
System Rationalization
• Consolidated, adaptive systems
• Common, interoperable formats
• Common storage
• Synchronized interfaces
• Coordinated integration
13. § Example from DARPA Evidence Extraction & Link Discovery
§ Today’s Situation: ~10k messages/day from multiple sources read by multiple analysts and analyzed in multiple manual non-integrated tools
§ Similar to Social Network Analysis
14. Complicated Mixture of Commercial, Custom, Legacy, Services Applications, Data Stores
20. § DOD CIO
§ Adaptively blend financial and program data from multiple sources with unclear, undocumented alignment and integration logic (i.e. this is an intelligence challenge) into BI tools (QlikView, Tableau, Pentaho, Excel Web Apps-SharePoint)
§ Export Development Canada
§ Rationalize core data distributed and undocumented to feed cross-enterprise governance and develop Enterprise Data Model with semantically adjudicated canonical entities
21. § Challenge: Complicated environment with conflicting data values, standards, business use cases, and lack of documentation. Data owned by 4 major organizations, in multiple warehouses and data stores, redundant non-reconciled sets of data
§ Requirement: Integrated, common, accurate data to enable new integrated workforce planning, training, management application (“Sailor of the Future”) for 1 million people
§ Prior Activities: 10+ years of system integration, data warehouse, data governance efforts → no improvement, poor coordination across organizations and systems
22. § Yet, there were problems with the most basic data fields, which for the Navy, include things like
§ billet (effectively a job but also includes other characteristics),
§ rank (similar to seniority but with formal rules that change over time),
§ rating (similar to vocational ability but also with changing rules),
§ and even the primary identifier of a person, the Social Security Number (SSN).
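Fields such as rank and rating are described here as governed by rules that change over time, which means a stored value can only be interpreted against the rules in force on a given date. One common way to handle this is an effective-dated lookup, sketched below. The dates, the version labels, and the `rules_in_force` function are invented for illustration; nothing here reflects actual Navy rules or the presenter's implementation.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical rule versions as (effective date, label), sorted by date.
# Dates and labels are made up for illustration.
RULE_VERSIONS = [
    (date(2005, 1, 1), "rules-v1"),
    (date(2009, 7, 1), "rules-v2"),
    (date(2012, 10, 1), "rules-v3"),
]
EFFECTIVE_DATES = [effective for effective, _ in RULE_VERSIONS]


def rules_in_force(on: date) -> str:
    """Return the label of the rule version effective on the given date."""
    # bisect_right counts the versions whose effective date is <= `on`.
    index = bisect_right(EFFECTIVE_DATES, on) - 1
    if index < 0:
        raise ValueError(f"no rule version is effective on {on}")
    return RULE_VERSIONS[index][1]


print(rules_in_force(date(2010, 3, 15)))
print(rules_in_force(date(2012, 10, 1)))
```

Storing the record date next to each value and resolving the rule version at read time avoids rewriting historical records every time the rules change.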
25. • Promulgate key technologies to help field overcome major obstacles
• Identify cause and existence of semantic conflicts
• Determine options
• Promote enterprise decision making on solution
• Implement solution into operational data
• Visible direct line from governance to data modeling to integration to database engineering to analysis and back again
• Rapid cycle time: identify, assess, decide, execute continuously in natural organizational timeline (days/weeks)
• Community version DataStar for non-commercial use
• Collaborative community communication and design of common, semantically clear Corporate NoSQL models