Slides from the Live Webcast on Mar. 14, 2012
Watch this Database Revolution roundtable to learn from four of the best minds in the business: Mark Madsen of Third Nature, Robin Bloor of The Bloor Group, Colin White of BI Research and Steve Dine of Datasource Consulting. Each will present their thoughts on what’s happening in the NoSQL space, followed by an extended Q&A in which you can pose your detailed questions.
For more information visit: http://www.databaserevolution.com
Watch this and the entire series at: http://www.youtube.com/playlist?list=PLE1A2D56295866394
2. Eric Kavanagh
Eric.kavanagh@bloorgroup.com
Twitter Tag: #briefr
Wednesday, March 14, 12
3. To conduct an Open Research program that invites the participation of both IT users and technology vendors
To assist IT buyers in understanding database technology and the architecture that surrounds it
To allow audience members to pose serious questions... and get answers!
To publish all findings
4. Your Host: Eric Kavanagh
Research Leader: Mark Madsen - Third Nature
Primary Collaborator: Robin Bloor - The Bloor Group
Guest Analyst 1: Colin White - BI Research
Guest Analyst 2: Steve Dine - Datasource Consulting
5. Colin White is the president of
DataBase Associates Inc. and founder
of BI Research. He is well known for
his in-depth knowledge of data
management, information integration,
and business intelligence technologies.
He has consulted for dozens of
companies throughout the world and is
a frequent speaker at leading IT
events. For ten years he was the
conference chair of the DCI and Shared
Insights Portals, Content Management,
and Collaboration conference.
6. Big Data is Bigger than NoSQL
Colin White
President, BI Research
March 2012
13. Robin Bloor is Chief Analyst at The Bloor Group.
Robin.Bloor@Bloorgroup.com
14. The Hardware Landscape
• CPUs go multicore
• Memory/Disk cost ratio falls
• Speed of random reads lags speed of serial reads
• Faster networking and fast switches
• Parallelism becomes more important
• Commodity servers
• Cloud computing cuts H/W costs
15. That MapReduce Thing
• There are two fundamental approaches to parallelism: data partitioning and process partitioning
• MapReduce implements an approach which is oriented to data partitioning
• This relates to data processing rather than to databases
• Hadoop is often used for ETL
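The data-partitioning idea on this slide can be sketched in a few lines of plain Python. This is an illustration only, not Hadoop's API: the function names (`partition`, `map_phase`, `reduce_phase`, `word_count`) and the sample lines are assumptions made for the example, and everything runs in one process. A real MapReduce framework would also shuffle intermediate key-value pairs between workers, which is skipped here.

```python
from collections import Counter
from functools import reduce


def partition(records, n_parts):
    """Split the input into n_parts slices (data partitioning)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_phase(chunk):
    """Map step: each partition counts its own words independently."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_phase(partials):
    """Reduce step: merge the per-partition counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(lines, n_parts=3):
    chunks = partition(lines, n_parts)
    # Each call below touches only its own chunk, so in a real cluster
    # the calls could run on separate workers.
    partials = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(partials)


# Sample input invented for the example.
lines = [
    "big data is bigger than nosql",
    "nosql is a distraction",
    "hadoop is often used for etl",
]
totals = word_count(lines)
print(totals["is"], totals["nosql"])  # 3 2
```

The result does not depend on how many partitions are used, which is what makes this style of work easy to spread across commodity servers.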
16. The Devil Is In The Workload
• NoSQL is a distraction
• Big Data can be Big US Data or Big S Data
• Unstructured workloads are rarely suited to traditional RDBMS-type engines
• Analytical workloads span both
* Graphic: database types (Big Table Store, Column Store, Document Store, XML, RDBMS, ODBMS) plotted by data volume, from more structured to less structured
17. If you don’t know the expected workloads, you shouldn’t be selecting a database
18. Steve Dine is the founder of Datasource
Consulting, LLC. He has extensive experience
delivering and managing successful, highly
scalable and maintainable data integration and
business intelligence solutions. Steve combines
hands-on technical experience across the
entire BI project lifecycle with strong business
acumen. He currently works as a consultant for
Fortune 500 companies. Steve is a faculty
member at TDWI and a judge for the Annual
TDWI Best Practices Awards. He teaches
courses and presents on many BI topics.
Contact info:
Twitter: @steve_dine
Email: sdine@datasourceconsulting.com
Web: http://www.datasourceconsulting.com
19. The State of NoSQL & BI
From the trenches…
“Hey Bob, seems like a no brainer. So, what’s the catch?”
* Graphic from http://schriftman.wordpress.com/category/booksbook-reviews/c-s-lewis/page/2/
Confidential, Datasource Consulting, LLC
20. Why NoSQL?
• More data
• More different types of data (semi-structured, unstructured)
• More frequent changes to the structure of the data we need to store and analyze
• More demand for long-tail analysis
• More “affordable”, commodity hardware available (blade servers, “cheap” storage, cloud)
• More buzz!
* Graphic from http://www.fredberinger.com/musings-on-nosql/
21. Why Not NoSQL?
• Relatively immature (0.x – 2.x)
• Difficult to describe to decision makers
• Not fit for purpose (low latency, update heavy, complex joins)
• In many organizations it’s a solution looking for a problem
• Lack of “BI” support
• Skills gap!
* Graphic based on http://www.fredberinger.com/musings-on-nosql/
22. BI-NoSQL Skills Gap
“SQL” Skills
• GUIs (mostly)
• Relational Data Modeling
• RDBMS
• SQL
• Stored procedures
• LDAP
• Javascript
• Batch/Shell Scripts
NoSQL Skills
• Command Line
• Key-Value / Column Family Modeling
• Distributed Data Store
• Programming (Java, Jscript, Python, etc)
• MapReduce (Hive)
• JSON
• Shell Scripts
* Graphic based on http://www.beckshome.com/index.php/2007/09/the-soa-chasm/
23. Conclusions?
• Best to evaluate your true data size, data growth, data formats, data structure and analytic requirements before deciding on a solution
• Make sure to evaluate your available skills
• Experienced NoSQL resources with BI experience are not always easy to find
• Need to plan for additional technology risk in the project plan
• Consider starting out with one part of your DW architecture (e.g. staging)
• POC POC POC
• NoSQL is maturing quickly and will likely continue to evolve into a hybrid solution
24. Mark Madsen is founder of Third Nature, a
research and consulting firm focused on
analytics, BI and decision-making. Mark
spent the past two decades working on
analysis and decision support in many
industries and countries. He is an award-
winning architect and former CTO whose
work has been featured in numerous
industry publications. Over the past ten
years Mark received awards for his work
from the American Productivity & Quality
Center, TDWI, and the Smithsonian Institution.
He is an international speaker, a contributing
editor at Intelligent Enterprise, and manages
the open source channel at the Business
Intelligence Network. For more information
or to contact Mark, visit http://ThirdNature.net.
25. One Size Doesn’t Fit All
Choosing which big data,
NoSQL or database
technology to use
March 14, 2012
Mark R. Madsen
http://ThirdNature.net
27. Big data?
Unstructured data isn’t really unstructured.
The problem is that this data is unmodeled.
The real challenge is complexity.
28. The holy grail of databases under current market hype
A key problem is that we’re talking mostly about computation over data when we talk about “big data” and analytics, a potential mismatch for both relational and NoSQL.
30. You must understand your workload: throughput and response time requirements aren’t enough.
▪ 100 simple queries accessing month-to-date data
▪ 90 simple queries accessing month-to-date data plus 10 complex queries using two years of history
▪ Hazard calculation for the entire customer master
▪ Performance problems are rarely due to a single factor.
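A rough back-of-the-envelope sketch of why the first two query mixes above differ even though both run 100 queries. The two constants are assumptions picked for illustration (they are not figures from the webcast), and "days of data scanned" is only a crude stand-in for real cost.

```python
# Assumed figures, chosen only for illustration (not from the webcast).
DAYS_MONTH_TO_DATE = 15   # average days of data a month-to-date query touches
DAYS_TWO_YEARS = 730      # days of data a two-year history query touches


def days_scanned(simple_queries, complex_queries):
    """Total days of data read by a query mix (a crude proxy for work)."""
    return (simple_queries * DAYS_MONTH_TO_DATE
            + complex_queries * DAYS_TWO_YEARS)


mix_a = days_scanned(simple_queries=100, complex_queries=0)
mix_b = days_scanned(simple_queries=90, complex_queries=10)

print(mix_a)                    # 1500
print(mix_b)                    # 8650
print(round(mix_b / mix_a, 1))  # 5.8
```

Under these assumed numbers the second mix reads almost six times as much data for the same query count, which is why a throughput figure alone says little about the workload.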
31. Workload: One big query or many small queries?
Retrieval: small return set or large?
Selectivity: large volume of data scanned or small?
33. Important workload parameters to know
• Read-intensive vs. write-intensive
• Mutable vs. immutable data
34. Important workload parameters to know
• Read-intensive vs. write-intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
35. Important workload parameters to know
• Read-intensive vs. write-intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
• Short vs. long access latency
36. Important workload parameters to know
• Read-intensive vs. write-intensive
• Mutable vs. immutable data
• Immediate vs. eventual consistency
• Short vs. long access latency
• Predictable vs. unpredictable data access patterns
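One way to make these parameters concrete is to write them down per workload before comparing engines. The sketch below is an assumption-laden illustration: the `WorkloadProfile` class, its field names, and the three sample workloads are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class WorkloadProfile:
    """One boolean per parameter pair listed on the slide."""
    name: str
    write_intensive: bool          # False means read-intensive
    mutable_data: bool             # False means immutable data
    eventual_consistency_ok: bool  # False means immediate consistency needed
    long_latency_ok: bool          # False means short access latency needed
    predictable_access: bool       # False means unpredictable access patterns


def differences(a, b):
    """Return the names of the parameters on which two workloads disagree."""
    return [
        f.name
        for f in fields(a)
        if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)
    ]


# Hypothetical workloads, invented for illustration only.
order_entry = WorkloadProfile(
    name="order entry", write_intensive=True, mutable_data=True,
    eventual_consistency_ok=False, long_latency_ok=False,
    predictable_access=True,
)
ad_hoc_analysis = WorkloadProfile(
    name="ad hoc analysis", write_intensive=False, mutable_data=False,
    eventual_consistency_ok=True, long_latency_ok=True,
    predictable_access=False,
)
nightly_load = WorkloadProfile(
    name="nightly load", write_intensive=True, mutable_data=False,
    eventual_consistency_ok=True, long_latency_ok=True,
    predictable_access=True,
)

print(differences(order_entry, ad_hoc_analysis))
print(differences(order_entry, nightly_load))
```

Two workloads that disagree on every parameter, as the first pair does here, are unlikely to be served well by the same engine configuration.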
37. Types of workloads
Write-biased:
▪ OLTP
▪ OLTP, batch
▪ OLTP, lite
▪ Object persistence
▪ Data ingest, batch
▪ Data ingest, real-time
Read-biased:
▪ Query
▪ Query, simple retrieval
▪ Query, complex
▪ Query-hierarchical / object / network
▪ Analytic
Mixed?
Inline analytic execution, operational BI
38. Matching to parameters, at assumption of data scale
Workload parameters (columns): Write-biased | Read-biased | Updateable data | Eventual consistency ok | Unpredictable query path | Compute intensive
Engines (rows): Standard RDBMS | Parallel RDBMS | NoSQL (kv, dht, obj) | Hadoop* | Streaming database
You see the problem: it’s an intersection of multiple parameters, and this
chart only includes the first tier of parameters. Plus, workload factors can
completely invert these general rules of thumb.
39. Matching to parameters, at assumption of data scale
Workload parameters (columns): Complex queries | Selective queries | Low latency queries | High concurrency | High ingest rate
Engines (rows): Standard RDBMS | Parallel RDBMS | NoSQL (kv, dht, obj) | Hadoop | Streaming database
You have to look at the combination of workload factors: data scale,
concurrency, latency & response time, then chart the parameters.
40. Always build a proof of concept!
43. March:
Vendor Research
March 14th: Second Round Table focusing on NoSQL databases and their application
DB Revolution Survey conducted
April:
Vendor Research
Publishing of Round Table Transcripts, with comments
May:
Authoring of White Paper
Publishing of White Paper
Publishing of survey activity
44. March Briefing Room: Integration
April Briefing Room: Discovery
May Briefing Room: Analytics
45. Thank You For Your Attention