48. The real dilemma: business wants near realtime, but without penalties or data loss, with endless scalability, zero latency, and 100% consistency
50. Data that’s not immediately turned into useful information, and thus value, is only of archaeological, accounting, compliance, or algorithm-training interest
51. The true market advantage of the future depends on how close to realtime you are when gaining useful information out of your live data
62. But Big Data people need to learn from them
63. Go with message/event orientation, VMs with native support for it, or similar approaches, on platforms where you probably didn’t think it was possible
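A minimal sketch of such a message/event-oriented core, assuming Go (the deck names no language): producers emit typed events onto a channel and a single dispatch loop routes them to registered handlers. The Event type and the "tick" handler are illustrative, not from the deck.

```go
// Minimal event-oriented core: producers push typed events onto a
// buffered channel; one dispatch loop routes them to handlers.
package main

import "fmt"

type Event struct {
	Kind    string
	Payload []byte
}

func main() {
	events := make(chan Event, 1024) // buffered so producers rarely block
	handlers := map[string]func(Event){
		"tick": func(e Event) { fmt.Printf("tick: %s\n", e.Payload) },
	}

	go func() { // a producer
		events <- Event{Kind: "tick", Payload: []byte("42")}
		close(events)
	}()

	for e := range events { // the dispatch loop
		if h, ok := handlers[e.Kind]; ok {
			h(e)
		}
	}
}
```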
66. result = <some optimized binary that ideally fits into a single MTU of the underlying protocol(s)>
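As one hedged illustration of such a result, a Go sketch that packs a result into a fixed binary layout of 18 bytes, far below a typical 1500-byte Ethernet MTU; the Result fields are assumptions for illustration, not the deck’s actual format.

```go
// Pack a result into a fixed binary layout small enough to fit one
// typical 1500-byte Ethernet MTU (minus IP/UDP headers).
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

type Result struct {
	ID    uint64
	Value float64
	Flags uint16
}

func encode(r Result) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.BigEndian, r); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	b, _ := encode(Result{ID: 1, Value: 3.14, Flags: 0x01})
	fmt.Println(len(b), "bytes") // 18 bytes: far below one MTU
}
```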
67. OK, you can process and queue results for whoever listens to them (semi-time-critical, lower-level queue). Now how do you store a lot of data like this really fast?
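One way such a lower-level queue can look, sketched in Go under assumed names: the producer fans encoded results out to per-listener buffered channels, so one slow listener doesn’t stall the rest.

```go
// Fan out results to per-listener buffered queues.
package main

import (
	"fmt"
	"sync"
)

func main() {
	listeners := []chan []byte{
		make(chan []byte, 256),
		make(chan []byte, 256),
	}

	var wg sync.WaitGroup
	for i, l := range listeners {
		wg.Add(1)
		go func(id int, in <-chan []byte) {
			defer wg.Done()
			for msg := range in {
				fmt.Printf("listener %d got %q\n", id, msg)
			}
		}(i, l)
	}

	for _, msg := range [][]byte{[]byte("r1"), []byte("r2")} {
		for _, l := range listeners { // fan out to every listener
			l <- msg
		}
	}
	for _, l := range listeners {
		close(l)
	}
	wg.Wait()
}
```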
68. There is no such thing as a high-performance, high-load-capable, high-scale, multi-purpose, rich-model, absolutely reliable, and 100% consistent database
70. Classic databases and even NoSQL data stores, for different reasons, sometimes lose their original intention and focus
71. The NewSQL world aims to solve the scale-up limitations of RDBMSs through distribution while still guaranteeing ACID-ish transactions
74. You can be real fast just spilling data block-wise to the disk through DMA, but beware of caches
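A Linux-only sketch of the idea, assuming Go and O_DIRECT: the flag bypasses the page cache (the "beware of caches" part) and lets the kernel move the block via DMA, but it demands buffers aligned to the device’s logical block size.

```go
// Linux-only: write block-aligned buffers with O_DIRECT so data skips
// the page cache. O_DIRECT requires buffer address, length, and file
// offset aligned to the logical block size (512 or 4096, device-dependent).
package main

import (
	"os"
	"syscall"
	"unsafe"
)

const blockSize = 4096

// alignedBlock returns a blockSize-aligned []byte of length blockSize.
func alignedBlock() []byte {
	raw := make([]byte, blockSize*2)
	off := blockSize - int(uintptr(unsafe.Pointer(&raw[0]))%blockSize)
	return raw[off : off+blockSize]
}

func main() {
	fd, err := syscall.Open("/tmp/spill.dat",
		syscall.O_WRONLY|syscall.O_CREAT|syscall.O_DIRECT, 0644)
	if err != nil {
		panic(err)
	}
	defer syscall.Close(fd)

	buf := alignedBlock()
	copy(buf, []byte("block-wise payload"))
	if _, err := syscall.Write(fd, buf); err != nil {
		panic(err)
	}
	_ = os.Remove("/tmp/spill.dat")
}
```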
75. You’ll be a bit slower with an in-memory, journaling K/V store, but beware of weak storage reliability
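A minimal sketch of such a store, assuming Go: a map serves reads, and every write is also appended to a journal so the map can be rebuilt after a crash (replay omitted here). The "weak storage reliability" trade-off lives in the fsync policy: syncing every write is safe but slow, batching is fast but can lose the tail of the journal.

```go
// In-memory K/V store with an append-only journal.
package main

import (
	"fmt"
	"os"
	"sync"
)

type Store struct {
	mu      sync.Mutex
	data    map[string]string
	journal *os.File
}

func Open(path string) (*Store, error) {
	f, err := os.OpenFile(path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		return nil, err
	}
	return &Store{data: make(map[string]string), journal: f}, nil
}

func (s *Store) Put(k, v string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, err := fmt.Fprintf(s.journal, "%s=%s\n", k, v); err != nil {
		return err
	}
	s.data[k] = v
	return s.journal.Sync() // the slow-but-safe fsync choice
}

func (s *Store) Get(k string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[k]
	return v, ok
}

func main() {
	s, _ := Open("/tmp/kv.journal")
	_ = s.Put("answer", "42")
	v, _ := s.Get("answer")
	fmt.Println(v)
}
```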
76. You will be slower, but you win reliability (and redundancy if you wish) when you go with a column-oriented or K/V store that is natively distributed and masterless, and as model-agnostic as possible
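A toy of the masterless idea behind Dynamo-style stores, sketched in Go and not modeled on any specific product: a write goes to all N replicas and succeeds once W of them acknowledge, so there is no single master to coordinate or to fail. Real systems add hinted handoff, read repair, and more.

```go
// Masterless quorum write: succeed once w of n replicas acknowledge.
package main

import "fmt"

type Replica interface {
	Put(key, value string) error
}

type memReplica map[string]string

func (m memReplica) Put(k, v string) error { m[k] = v; return nil }

func quorumPut(replicas []Replica, w int, k, v string) error {
	acks := make(chan error, len(replicas))
	for _, r := range replicas {
		go func(r Replica) { acks <- r.Put(k, v) }(r)
	}
	ok := 0
	for range replicas {
		if err := <-acks; err == nil {
			ok++
			if ok >= w {
				return nil
			}
		}
	}
	return fmt.Errorf("quorum not reached: %d/%d", ok, w)
}

func main() {
	rs := []Replica{memReplica{}, memReplica{}, memReplica{}}
	fmt.Println(quorumPut(rs, 2, "k", "v")) // <nil>
}
```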
77. But you need to be aware that to make such a store real fast, you’ll have to turn a lot of infrastructure knobs before your data even hits the store
78. OK, now it’s in the store, though you probably didn’t need to store it. But what if you run into the (slow) batch? How do you make it faster?
79. Go with native extensions, close to the machine and the system, instead of general portability
80. Keep it all in memory. The memory of a distributed system is also distributed
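A sketch of that, assuming hash-based sharding in Go: keys are hashed onto the in-memory maps of several nodes, so the cluster’s aggregate RAM behaves like one big store. The FNV-hash-modulo placement is an illustrative choice.

```go
// Shard keys by hash across the in-memory maps of several nodes.
package main

import (
	"fmt"
	"hash/fnv"
)

type node struct{ mem map[string][]byte }

func shard(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % n
}

func main() {
	nodes := []node{
		{mem: map[string][]byte{}},
		{mem: map[string][]byte{}},
		{mem: map[string][]byte{}},
	}
	put := func(k string, v []byte) { nodes[shard(k, len(nodes))].mem[k] = v }
	get := func(k string) []byte { return nodes[shard(k, len(nodes))].mem[k] }

	put("user:42", []byte("hot data"))
	fmt.Printf("%s\n", get("user:42"))
}
```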
81. Splice your pipes, or go with almost-zero-infrastructure queues if you mix technologies
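A sketch of pipe plumbing between mixed technologies, with placeholder commands: a kernel pipe connects a producer process straight to a consumer process, no broker in between. On Linux, Go’s io.Copy between pipes and sockets can also take the splice(2) zero-copy path.

```go
// Pipe one process straight into another: an almost-zero-infrastructure
// queue. The commands are placeholders for whatever tools you mix.
package main

import (
	"os"
	"os/exec"
)

func main() {
	producer := exec.Command("cat", "/etc/hostname") // placeholder producer
	consumer := exec.Command("tr", "a-z", "A-Z")     // placeholder consumer

	pipe, err := producer.StdoutPipe()
	if err != nil {
		panic(err)
	}
	consumer.Stdin = pipe
	consumer.Stdout = os.Stdout

	if err := producer.Start(); err != nil {
		panic(err)
	}
	if err := consumer.Start(); err != nil {
		panic(err)
	}
	producer.Wait()
	consumer.Wait()
}
```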
82. Have the data where you process it; don’t move it there first
87. But you’re slow if you don’t give them data as it comes
88. And what about Big Data Clouds, or Cloud in general?
89. Clouds can be fast, real fast. If you can afford it
90. And you’re slow if you don’t give them data as it comes
91. There is no single tool around that will do your Big Data
92. Everything that makes you faster is your friend: from hardware through kernel tweaks and network optimization to direct memory access and minimal-abstraction code
93. When you don’t need to retrieve or search, you win
94. It’s all about speed. Size doesn’t matter a lot