1. Trends in Data Modeling
Presented by Steven MacLauchlan and Peter Aiken, Ph.D.
Click to Add Presented By Text
2. Welcome: Trends in Data Modeling
2
Copyright 2014 by Data Blueprint
Businesses cannot compete without data. Every organization produces and
consumes it. Data trends are hitting the mainstream and businesses are adopting
buzzwords such as Big data, Data Vault, Data Scientist, etc., to seek solutions to
their fundamental data issues. Few realize that the importance of any solution,
regardless of platform or technology relies on the data model supporting it. Data
modeling is not an optional task for an organization’s data remediation effort.
Instead, it is a vital activity that supports the solution driving your business. !
This webinar will address emerging trends around data model application
technology, as well as trends around the practice of data modeling itself. We will
discuss abstract models and entity frameworks, as well as the general shift from
data modeling being segmented to becoming more integrated with business
practices.!
Takeaways:!
• NoSQL, data vault, etc., different and when should I apply them?!
• How Data Modeling relates to business process!
• Application development (data first, code first, object first?)
Date: October 14, 2014
Time: 2:00 PM ET
Presented by: Peter Aiken, PhD/
Steven MacLauchlan
3. Get Social With Us!
Like Us on Facebook
www.facebook.com/
datablueprint
Post questions and comments
Find industry news, insightful
content
and event updates.
Join the Group
Data Management &
Business Intelligence
Ask questions, gain insights
and collaborate with fellow
data management
professionals
3
Copyright 2014 by Data Blueprint
Live Twitter Feed
Join the conversation!
Follow us:
@datablueprint
@paiken
@SJMacLauchlan
Ask questions and submit your comments: #dataed
4. MONETIZING
DATA MANAGEMENT
Unlocking the Value in Your Organization’s
Most Important Asset.
PETER AIKEN WITH JUANITA BILLINGS
FOREWORD BY JOHN BOTTEGA
Peter Aiken, Ph.D.
4
Copyright 2014 by Data Blueprint
• 30+ years data management
• Multiple international awards &
recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS, VCU (vcu.edu)
• (Past) President, DAMA Int. (dama.org)
• 9 books and dozens of articles
• Experienced w/ 500+ data
management practices in 20
countries
• Multi-year immersions with
organizations as diverse as the
US DoD, Nokia, Deutsche Bank,
Wells Fargo, Walmart, and the
Commonwealth of Virginia
The Case for the
Chief Data Officer
Recasting the C-Suite to Leverage
Your Most Valuable Asset
Peter Aiken and
Michael Gorman
5. Steven MacLauchlan
• 10 years of experience in Application
Development and Data Modeling with a focus
on Healthcare solutions.
• Head of Marketing, and PR. Helped revamp
the game playtesting process from the
ground up with a data-centric approach
which improved confidence in the final rules
• Delivers tailored data management solutions
that provide focus on data’s business value
while enhancing clients’ overall capability to
manage data
• Certified Data Management Professional
(CDMP)
• Computer Science degree from Virginia
Commonwealth University
• Most recent focus: Understanding emerging
data modeling trends and how these can
best be leveraged for the Enterprise.
5
Copyright 2014 by Data Blueprint
6. At Data Blueprint we believe...
• Today, data is the most powerful, yet underutilized and poorly
managed organizational asset
• Data is your
Data
Financial
Real
– Sole
Assets
Assets
Estate Assets
– Non-depletable
– Non-degrading
– Durable
– Strategic
• Asset
– Data is the new oil!
– Data is the new (s)oil!
– Data is the new bacon!
• Our mission is to unlock business value by
– Strengthening your data management capabilities
– Providing tailored solutions, and
– Building lasting partnerships
Inventory
Assets
Non-depletable
Available for
subsequent use
Can be
used up
Can be
used up
Non-degrading
√ √ Can degrade
over time
Can degrade
over time
Durable Non-taxed √ √
Strategic
Asset √ √ √ √
6
Copyright 2014 by Data Blueprint
7. Trends in Data Modeling
7
Copyright 2014 by Data Blueprint
• Business to Data: the Relationship
• What is a Data Model?
• Conceptual, Logical, Physical
• What issues can poor data modeling
introduce?
• Different Models, Different Uses
• 3NF, Star Schema, Data Vault
• Key-Value/Document
• Other NoSQL Technologies
• How is it changing
• Patterns and Reuse
• Abstraction for application
• Data Sharing World (The API’s)
• Scaling Out not up
8. What is a Data Model*?
*According to ANSI.
8
Copyright 2014 by Data Blueprint
• A data model organizes data
elements and standardizes how the
data elements relate to one another.
• In “Data Modeling Made Simple” by
Steve Hoberman, he says: "A data
model is a wayfinding tool for both
business and IT professionals, which
uses a set of symbols and text to
precisely explain a subset of real
information to improve
communication within the
organization and thereby lead to a
more flexible and stable application
environment."
9. How are Data Models Expressed as Architectures?
9
Copyright 2014 by Data Blueprint
• Attributes are organized into entities/objects
– Attributes are characteristics of "things"
– Entitles/objects are "things" whose information is
managed in support of strategy
– Examples
• Entities/objects are organized into models
– Combinations of attributes and entities are structured
to represent information requirements
– Poorly structured data, constrains organizational
information delivery capabilities
– Examples
• Models are organized into architectures
– When building new systems, architectures are used
to plan development
– More often, data managers do not know what existing
architectures are and - therefore - cannot make use of
them in support of strategy implementation
– Why no examples?
More Granular
More Abstract
10. The Conceptual Data Model
10
• Represents entities and relationships
• Should Identify the domain and scope of data
• Should be easily understood by business users in order to
communicate core data concepts, and drive application
requirements
Copyright 2014 by Data Blueprint
Example:
We need to model customer
address data. A customer may have
many addresses, and many
customers may share one address.
“many to many”
12. • At least one but possibly more system USERS enter the DISPOSITION facts into the system.
• An ADMISSION is associated with one and only one DISCHARGE.
• An ADMISSION is associated with zero or more FACILITIES.
• An ADMISSION is associated with zero or more PROVIDERS.
• An ADMISSION is associated with one or more ENCOUNTERS.
• An ENCOUNTER may be recorded by a system USER.
• An ENCOUNTER may be associated with a PROVIDER.
• An ENCOUNTER may be associated with one or more DIAGNOSES.
Data map of
DISPOSITION
history related to one or more inpatient episodes
DIAGNOSIS! Contains the International Disease Classification
(IDC) of code representation and/or description of a
patient's health related to an inpatient code
12
ADMISSION!Contains information about patient admission
DISCHARGE!A table of codes describing disposition types
available for an inpatient at a FACILITY
ENCOUNTER! Tracking information related to inpatient
Copyright 2014 by Data Blueprint
episodes
FACILITY! File containing a list of all facilities in regional health
care system
PROVIDER! Full name of a member of the FACILITY team
providing services to the patient
USER! Any user with access to create, read, update, and
delete DISPOSITION data
13. A sample data entity and associated metadata
• A purpose statement describing why the organization is maintaining information about this
business concept;
• Sources of information about it;
• A partial list of the attributes or characteristics of the entity; and
• Associations with other data items; this one is read as "One room contains zero or many beds."
13
Copyright 2014 by Data Blueprint
Entity: BED
Data Asset Type: Principal Data Entity
Purpose: This is a substructure within the Room
substructure of the Facility Location. It contains
information about beds within rooms.
Source: Maintenance Manual for File and Table
Data (Software Version 3.0, Release 3.1)
Attributes: Bed.Description
Bed.Status
Bed.Sex.To.Be.Assigned
Bed.Reserve.Reason
Associations: >0-+ Room
Status: Validated
14. The Logical Data Model
14
• Should represent the Conceptual Data model more
thoroughly, but be otherwise very similar
• Will include attributes, names, relationships, and other
metadata
• Will be developed using Data Modeling notation (ex: UML)
Copyright 2014 by Data Blueprint
15. The Physical Data Model
15
• Describes the specific database implementation of the
data
• Attributes will be named according to naming conventions
• Displays data types, accurate table names, Key
information, etc
Copyright 2014 by Data Blueprint
16. Consequences of Poor Data Modeling
• Poor data modeling up front can cause Data Quality issues
“downstream”
• If the model isn’t a true representation of the business concepts, this will
impact confidence in the data
• Potential for poor DB/Application performance for reads/writes.
Example: Over-normalization
• Lack of flexibility can cause difficulty aligning with evolving business
requirements
• Difficulty integrating data in the future
• Constrains business agility
• Creates operational inefficiencies
• Limits workflow transparency
• Inhibit business insights and
innovation
• Proliferates system work-arounds,
including shadow systems developed by end users
16
Copyright 2014 by Data Blueprint
17. Trends in Data Modeling
17
Copyright 2014 by Data Blueprint
*
• Business to Data: the Relationship
• What is a Data Model?
• Conceptual, Logical, Physical
• What issues can poor data modeling
introduce?
• Different Models, Different Uses
• 3NF, Star Schema, Data Vault
• Key-Value/Document
• Other NoSQL Technologies
• How is it changing
• Patterns and Reuse
• Abstraction for application
• Data Sharing World (The API’s)
• Scaling Out not up
18. Normalization Rules Overview
18
Copyright 2014 by Data Blueprint
• 1st Normal Form - no repeating non-key
attributes for a given primary key
• 2nd Normal Form - no non-key
attributes that depend on only a
portion of the primary key
• 3rd Normal Form - no attributes
depend on something other than the
primary key
• 4th Normal Form - attributes depend
on not only key but the value of the
key
• 5th Normal Form - an entity is in 5NBF
if its dependencies on occurrences of
the same entity of entity type have
been moved into a structured entity
19. CM2 Component Evolution is technology derived but technology independent
19
As-is To-be
Copyright 2014 by Data Blueprint
Technology
Independent/
Logical
Technology
Dependent/
Physical
abstraction
20. Data Reengineering for More Shareable Data
20
As-is To-be
Copyright 2014 by Data Blueprint
Technology
Independent/
Logical
Technology
Dependent/
Physical
abstraction
Other logical as-is
data architecture
components
21. Information Architecture Component Evolution Framework
Conceptual Logical Physical
Every change can
be mapped to a
transformation in
this framework!
Goal
Validated
Not Validated
21
Copyright 2014 by Data Blueprint
22. Third Normal Form
22
• Each attribute in the relationship is a fact about a key
• Highly normalized structure
Copyright 2014 by Data Blueprint
• Use Cases:
– Transactional Systems.
– Operational Data Stores.
!
!
23. Third Normal Form: Pros and Cons
23
• Pros
– Easily understood by business and end users
– Reduced data redundancy
– Enforced referential integrity
– Indexed attributes/flexible querying
• Cons
– Joins can be expensive
– Does not scale
Copyright 2014 by Data Blueprint
Neo4j.com
24. Star Schema
24
• Comprised of “fact tables” that contain quantitative data, and any number of
adjoining “dimension” tables
• Optimized for business reporting
Copyright 2014 by Data Blueprint
!
!
• Use Cases:
– OLAP (Online Analytic Processing)
– BI
!
!
Wikipedia
25. Star Schema
Pros and Cons
25
Copyright 2014 by Data Blueprint
• Pros
– Simple Design
– Fast Queries
– Most major DBMS are
optimized for Star
Schema Designs
• Cons
– Questions must be built
into the design
– Data marts are often
centralized on one fact
table
26. Data Vault
• Designed to facilitate long-term historical storage, focusing on ease
of implementation
• Retains data lineage information (source/date)
• “All the data, all the time”. Hybrid approach of Inmon and Kimball.
• Comprised of Hubs (which contain a list of business keys that do
not change often), Links (Associations/transactions between hubs),
and Satellites (descriptive attributes associated with hubs and links)
26
Copyright 2014 by Data Blueprint
• Use Cases:
– Data Warehousing
– Complete Auditability
!
!
!
!
Bukhantsov.org
27. Data Vault Pros
and Cons
27
Copyright 2014 by Data Blueprint
• Pros
– Simple integration
– Houses immense
amounts of data with
excellent performance
– Full data lineage
captured
• Cons
– Complication is pushed
to the “back end”
– Can be difficult to setup
for many data workers
– No widespread support
for ETL tools yet
28. Gartner Five-phase Hype Cycle
Peak of Inflated Expectations: Early publicity produces a number of success
stories—often accompanied by scores of failures. Some companies take action;
many do not.
Plateau of Productivity: Mainstream adoption starts to take off.
Criteria for assessing provider viability are more clearly defined.
The technology’s broad market applicability and relevance are
clearly paying off.
Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to
crystallize and become more widely understood. Second- and third-generation products appear from
technology providers. More enterprises fund pilots; conservative companies remain cautious.
Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out
or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters.
28
Copyright 2014 by Data Blueprint
http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity.
Often no usable products exist and commercial viability is unproven.
29. Gartner Hype Cycle
29
"A focus on big data is not a substitute for the
fundamentals of information management."
Copyright 2014 by Data Blueprint
30. 2012 Big Data in Gartner’s Hype Cycle
30
Copyright 2014 by Data Blueprint
31. 2013 Big Data in Gartner’s Hype Cycle
31
Copyright 2014 by Data Blueprint
32. Document/Key Value*
32
• Scalable thanks to a Distributed Hash Table
• Flexible, schema-less design
• Supports large scale web-applications
Copyright 2014 by Data Blueprint
!
• Use Cases:
– Applications with many users/writes
– Agile development- games/apps
– Flexible Schema
!
!
!
!
!
Kirupa.com
Dougfinke.com
33. Document/Key
Value Pros and
Cons
33
Copyright 2014 by Data Blueprint
• Pros
– “Schema-less” design
empowers developers*
– Scalable
– High availability
– Economically viable (scale
out not up!)
• Cons
– Poor ad-hoc query and
analysis capabilities
– Lack of maturity
– “Eventually consistent”
34. Other NoSQL Solutions*
*not exhaustive!
34
• RDF/Triple Store
– Purpose-built to store triples (“bob likes football”)
– SPARQL is a query language specific to RDF.
– One of the pillars of “Semantic Web”
• Graph
– Structure comprised of “nodes”, “edges”, and “properties”
– Focused on the interconnection between entities
– Fast queries to find associative data
• Column Family
– Columns are stored individually (but clustered by “family” unlike
traditional columnar databases)
– By only querying specific column families, we can have nearly
unlimited numbers of columns without causing expensive queries
Copyright 2014 by Data Blueprint
35. More NoSQL Examples
35
Copyright 2014 by Data Blueprint
RDF/Triple Store
Graph (Source: Neo4J)
37. Trends in Data Modeling
37
Copyright 2014 by Data Blueprint
• Business to Data: the Relationship
• What is a Data Model?
• Conceptual, Logical, Physical
• What issues can poor data modeling
introduce?
• Different Models, Different Uses
• 3NF, Star Schema, Data Vault
• Key-Value/Document
• Other NoSQL Technologies
• How is it changing
• Patterns and Reuse
• Abstraction for application
• Data Sharing World (The API’s)
• Scaling Out not up
38. Design Patterns
38
• Why are the restrooms generally in the same place in each building?
• What about the electrical wiring?
• HVAC? Floorplans? ...
• Architecture design patterns (spoke and hub,
hub of hubs, warehouse, cloud, MDM,
changing tires, portal)
Copyright 2014 by Data Blueprint
39. Meta Data Models
Source:http://dmreview.com/article_sub.cfm?articleID=1000941 used with permission
39
Copyright 2014 by Data Blueprint
40. Marco & Jennings's Metadata Model
Source:http://dmreview.com/article_sub.cfm?articleID=1000941 used with permission
40
Copyright 2014 by Data Blueprint
41. Patterns and Reuse
41
Copyright 2014 by Data Blueprint
• Common rule of thumb:
– One third of a data model contains
fields common to all business.
– One third contains fields common
to the industry, and the
– Other third is specific to the
organization.
• Patterns should theoretically provide
an organization with a base-line to
quickly develop data infrastructure.
• Off-the-shelf solutions may require in-depth
customization or specialization.
42. Data as a Service
42
Copyright 2014 by Data Blueprint
• Based on the concept
that data can be
provided on demand to
any user regardless of
geographical or
organizational
separations.
• Can enforce a “post-schema”
on data, by
shaping how it’s offered.
• By offering centralized
data, we can eliminate
silos and increase data
quality.
43. Data Sharing World
43
• Adding structure to information allows us to obtain exactly
what we want, when we want it.
• Allows applications to serve up data to external sources in
a structured way- “Post-schema”.
Copyright 2014 by Data Blueprint
44. Scaling Out Not Up
44
Anup Shah
Copyright 2014 by Data Blueprint
• Economical. Multiple
commodity servers
rather than one beefy
machine.
• Load balancing/
auto-sharding.
• Data redundancy for
disaster recovery.
• Applications/
technologies must be
built to capitalize on
scale-out.
45. Trends in Data Modeling
45
Copyright 2014 by Data Blueprint
• Business to Data: the Relationship
• What is a Data Model?
• Conceptual, Logical, Physical
• What issues can poor data modeling
introduce?
• Different Models, Different Uses
• 3NF, Star Schema, Data Vault
• Key-Value/Document
• Other NoSQL Technologies
• How is it changing
• Patterns and Reuse
• Abstraction for application
• Data Sharing World (The API’s)
• Scaling Out not up
46. Conclusions
• Data Modeling is
important to get right.
• Getting it “right” is
hugely dependent on
the business case,
maturity of the
organization, flexibility
for future growth, and
so much more.
• There are many
technologies and
ideas available to help
solve a number of
problems.
• Don't try any of this
without considering
the various
architectures involved
46
Copyright 2014 by Data Blueprint
47. Questions?
47
Copyright 2014 by Data Blueprint
It’s your turn!
Use the chat feature or Twitter (#dataed) to submit
your questions to Peter and Steven now.
48. Upcoming Events
48
Copyright 2014 by Data Blueprint
Metadata Strategies
November 11, 2014
@ 2:00 PM ET/11:00 AM PT
!
Data Warehouse Strategies
December 9, 2014 @ 2:00 PM ET/11:00 AM PT
!
Sign up here:
• www.datablueprint.com/webinar-schedule
• or www.dataversity.net
49. Sources
49
• Data model. (2014, October 7). In Wikipedia, The Free
Encyclopedia. Retrieved October 7, 2014, from http://
en.wikipedia.org/w/index.php?
title=Data_model&oldid=628639882
• Data Modeling 101. (2006). In Agile Data. Retrieved
October 7, 2014, from http://www.agiledata.org/essays/
dataModeling101.html
Copyright 2014 by Data Blueprint