The document discusses major trends in technology that are driving changes to data architecture. It covers:
1. Exponential growth in data and computing power due to Moore's Law, which is causing databases and hardware to rapidly scale up in size and capabilities.
2. Emergence of parallel processing and new hardware like GPUs, FPGAs and fast networks to handle large volumes of data in real-time. Memory and SSDs are replacing disks.
3. These changes require new architectural approaches compared to traditional centralized and client-server models. Event-driven architectures using stream processing and taking compute to the data are discussed.
4. The need to support different data flows, such as real-time and analytical, with architectures built for each.
4. Moore’s Law and Its Consequences
Speed increases 10x every 6 years.
Moore's Law has about 10 years left (probably).
If Moore's Law stops, there will be problems.
Because of Moore's Law, expensive technology becomes fairly affordable within 6 years and inexpensive within 12 years.
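The slide's 10x-per-6-years figure can be sanity-checked with a little arithmetic; this sketch (illustrative only, with our own function name `annual_growth`) shows the implied annual rate and the 12-year compounding:

```python
# Back-of-the-envelope check of the slide's claim: 10x speed every 6 years.
# (Illustrative arithmetic only; the 6-year figure comes from the slides.)

def annual_growth(factor_per_period: float, years: int) -> float:
    """Annualized growth factor for a gain achieved over `years` years."""
    return factor_per_period ** (1 / years)

rate = annual_growth(10, 6)
print(f"10x every 6 years = {rate:.3f}x per year (~{(rate - 1) * 100:.0f}% annual growth)")
# Two 6-year periods compound multiplicatively: 10 * 10 = 100x after 12 years.
print(f"After 12 years: {10 ** (12 / 6):.0f}x")
```

The 12-year compounding is what makes today's expensive technology cheap in two product generations.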
5. The Visible “Big Data” Trend
Corporate data volumes grow at about 55% per annum, i.e. exponentially.
Data has been growing at this rate for perhaps 40 years.
There is nothing new about big data; it clings to an established exponential trend.
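Compounding the slide's 55%-per-annum figure over its 40-year span illustrates just how steep the established trend is (the arithmetic is ours; both input figures are from the slide):

```python
# Compound the slide's 55%-per-annum data-growth figure over 40 years
# (illustrative arithmetic only).
annual_growth = 1.55
years = 40
factor = annual_growth ** years
print(f"55%/yr over {years} years = ~{factor:.2e}x growth")  # ~4.1e7x
```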
6. The Invisible Trend: Moore’s Law Cubed…
The biggest databases are new databases.
They grow at the cube of Moore's Law.
Moore's Law = 10x every 6 years; VLDB = 1000x every 6 years:
– 1991/2: megabytes
– 1997/8: gigabytes
– 2003/4: terabytes
– 2009/10: petabytes
– 2015/16: exabytes
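The "cubed" claim is simple arithmetic: if Moore's Law is 10x per 6 years, its cube is 10³ = 1000x per 6 years, which is exactly one SI-prefix step per period. A small sketch (ours, using the slide's dates and units):

```python
# The slide's "Moore's Law cubed" claim as arithmetic (illustrative only):
# Moore's Law here is 10x per 6 years, so its cube is 10**3 = 1000x per
# 6 years -- one unit-prefix step (MB -> GB -> TB -> PB -> EB) per period.
steps = [("1991/2", "megabytes"), ("1997/8", "gigabytes"),
         ("2003/4", "terabytes"), ("2009/10", "petabytes"),
         ("2015/16", "exabytes")]
for i, (years, unit) in enumerate(steps):
    print(f"{years}: {unit} (~10**{3 * i} MB)")
print(f"Per 6-year step: {10 ** 3}x")
```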
7. Moore’s Law’s Cubic Consequences
Database technology is the most stressed technology in the stack.
Scale-out architecture has become a necessity.
In-database analytics will become a necessity.
In-memory database is the next iteration.
10. The Take Aways
Software architectures change: centralized, client/server, 3-tier/web, SOA, etc.
Applications migrate according to latencies.
Dominant applications and software brands can die via "the innovator's dilemma."
Wholly new applications appear because of lower latencies, e.g. VMs, CEP.
11. Disruption on Disruption
We are no longer certain that the pattern still holds.
We used to encounter new technologies that were 10x because of Moore's Law.
Now we encounter new technologies that are 100x or even 1000x.
This is not because of Moore's Law but because of parallelism.
12. Moore’s Law Does Somersault
In 2004 chips got too hot; that is when the world of parallel processing suddenly emerged.
Now CPUs miniaturize and add more cores.
This changes software forever.
13. Parallelism Will Become The Norm
True parallelism involves both data segmentation and pipeline parallelism; MapReduce is a halfway house.
This is about all software: eventually everything will execute in parallel.
Everything goes much faster.
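The slide's two kinds of parallelism can be sketched in a few lines of Python. This is a minimal illustration of ours, not anything from the deck: the function names (`segmented_total`, `pipeline`) and the toy workload are invented for the example.

```python
# Minimal sketch of the two forms of parallelism named on the slide:
# data segmentation (split the data, run the same work on each segment)
# and pipeline parallelism (different stages overlap on a stream of items).
from concurrent.futures import ProcessPoolExecutor
import queue
import threading

# --- Data segmentation: same operation on independent segments -------------
def segment_sum(segment):
    return sum(x * x for x in segment)

def segmented_total(data, workers=4):
    size = max(1, len(data) // workers)
    segments = [data[i:i + size] for i in range(0, len(data), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(segment_sum, segments))

# --- Pipeline parallelism: stages run concurrently on a stream -------------
def pipeline(data):
    q = queue.Queue()
    results = []

    def stage1():                      # stage 1: transform each item
        for x in data:
            q.put(x * x)
        q.put(None)                    # sentinel: end of stream

    def stage2():                      # stage 2: consume while stage 1 runs
        while (x := q.get()) is not None:
            results.append(x + 1)

    t1 = threading.Thread(target=stage1)
    t2 = threading.Thread(target=stage2)
    t1.start(); t2.start(); t1.join(); t2.join()
    return results

if __name__ == "__main__":
    data = list(range(1000))
    print(segmented_total(data))       # equals sum(x*x for x in data)
    print(pipeline([1, 2, 3]))         # [2, 5, 10]
```

MapReduce covers the segmentation half well; the "halfway house" remark fits because its stages are batch-synchronized rather than freely pipelined.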
15. CPUs, GPUs and FPGAs
CPUs, GPUs and FPGAs are commodities.
They can be harnessed to deliver extreme parallelism on a single server.
The use of such chips can deliver acceleration above 100x for some applications.
16. The Network Latency
In tests of DBMS queries, Cisco found that about 90% of latency was the network.
Big network switches virtualize networks.
The network can no longer be ignored.
17. The Memory Cascade
On-chip speed vs. RAM:
L1 (32KB) = 100x
L2 (256KB) = 30x
L3 (8–20MB) = 8.6x
RAM vs. SSD: RAM = 300x
SSD vs. disk: SSD = 10x
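The cascade compounds: each level's advantage multiplies the next, so chaining the slide's own ratios gives the gap between the fastest cache and spinning disk (arithmetic ours, figures from the slide):

```python
# Chain the slide's ratios (illustrative arithmetic): speed advantages at
# each level of the memory cascade compound multiplicatively.
ratios = {
    "L1 cache vs RAM": 100,   # from the slide
    "RAM vs SSD":      300,   # from the slide
    "SSD vs disk":      10,   # from the slide
}
l1_vs_disk = 1
for name, r in ratios.items():
    l1_vs_disk *= r
print(f"L1 cache vs. spinning disk: ~{l1_vs_disk:,}x")  # ~300,000x
```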
18. In-Memory Disruption
In-memory processing will become the norm.
The latency matters most for real-time applications; however, some businesses are using it for analytics.
As such, memory is an accelerator.
19. A Question
When will memory become the primary store for data? Soon, probably.
21. It’s Over for Spinning Disk
SSD is now on the Moore's Law curve; disk is not and never was (in respect of seek time).
All traditional databases were engineered for spinning disk and not for scale-out.
This explains the new DBMS products…
23. Tech Revolutions
Tech Revolution → Architecture
Computer → Batch
On-line → Centralized
PC → Client/server
Internet → Multi-tier
Mobile → Service Orientation
Internet of Things → Event Driven/Big Data
26. Some Architectural Principles
The new atom of data is the event.
SUSO: scale up before scale out.
Take the processing to the data, if you can.
Hadoop is a component, not a solution.
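"Take the processing to the data" can be shown in miniature with sqlite3 from Python's standard library standing in for any DBMS; the table and column names here are made up for illustration:

```python
# Minimal sketch of "take the processing to the data", using an in-memory
# SQLite database as a stand-in for any DBMS. Schema is invented for the
# example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("a", 1.0), ("a", 3.0), ("b", 10.0)])

# Shipping data to the code: pull every matching row out, aggregate client-side.
rows = conn.execute("SELECT value FROM events WHERE sensor = 'a'").fetchall()
client_side = sum(v for (v,) in rows) / len(rows)

# Taking the code to the data: the database computes the aggregate, so only
# a single number crosses the (network) boundary.
(in_db,) = conn.execute(
    "SELECT AVG(value) FROM events WHERE sensor = 'a'").fetchone()

print(client_side, in_db)  # same answer; the second moves far less data
```

The same principle underlies in-database analytics from the earlier slide: with 90% of query latency in the network, moving one aggregate beats moving a table.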
28. The Biological System
Our human control system works at different speeds: an almost instant reflex, a swift response, and a considered response.
Organizations will gradually implement similar control systems.
This suggests a data-flow-based architecture.
29. The Corporate Biological System
Right now this division into different data flows is already occurring.
Currently we can distinguish between real-time/business-time applications and analytical applications.
We should build specific architectures for this.
30. In Summary…
1. The Big Data Curve
2. The March of Technology
3. Upheaval in the Hardware Layer
4. Architecture?
5. The Flow of Data