B17 Eliminating the database bottleneck
- 1. Eliminating the Database Bottleneck
What makes Vectorwise so fast
Mark Van de Wiel
Director Product Management, Vectorwise
Thursday, November 01, 2012
Confidential © 2012 Actian Corporation
- 2. Agenda
Why traditional RDBMSs are slow for analytics
Why Vectorwise is fast
The I/O challenge
Efficient updates
- 3. 100x (+) Performance Difference – 2003
Custom C versus Relational Database
TPC-H 1 GB query 1 (runtime in seconds):
MySQL: 28.1
DBMS 'X': 26.2
C program: 0.2
Vectorwise: 0.6
- 5. Inefficient Storage for Analytics
Row-based storage model
Predominant in 2003, still very common today
Works well for OLTP
101 Joe 27 Black
103 Edward 21 Scissorhand
- 6. Inefficient Storage – Row-based
Pages on disk – example
101 27 Joe Black
103 21 Edward Scissorhand
(diagram: slotted pages with variable-width attribute pointers and pointers to tuples)
- 7. Issues with Row-based Storage
Always read all attributes
Poor bandwidth
Poor use of memory buffer
Complex row structure and navigation
E.g. compressing out null fields
E.g. row chaining
- 8. Efficient Storage for Analytics
Columnar storage: store attributes separately
Retrieve only attributes required by the query
Used by “traditional” column stores, e.g. Sybase IQ, Vertica
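A back-of-the-envelope sketch of why this matters for I/O. The row counts, attribute widths, and the "2 of 10 attributes" query shape are invented for illustration, not taken from the slides:

```python
# Toy comparison (all numbers invented): a query touching 2 of 10
# fixed-width attributes reads 2/10 of the data from a column store,
# but every attribute of every row from a row store.

num_rows = 1_000_000
num_attrs = 10
bytes_per_attr = 8
attrs_needed = 2  # e.g. name and salary

row_store_bytes = num_rows * num_attrs * bytes_per_attr      # all attributes read
col_store_bytes = num_rows * attrs_needed * bytes_per_attr   # only the two columns

print(row_store_bytes // col_store_bytes)  # row store reads 5x more here
```

Under these assumptions the row store moves 5x the bytes for the same answer; the gap grows with wider tables.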
- 9. Inefficient Processing
How a traditional database runs a query
Query:
SELECT
name,
salary*.19 AS tax
FROM
employee
WHERE
age > 25
- 10. Inefficient Processing
How a traditional database runs a query
Tuple-at-a-time iterator interface:
- open()
- next(): tuple
- close()
next() is called:
- for each operator
- for each tuple
Complex code repeated over and over
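The open()/next()/close() model above can be sketched as follows. The toy table, operator classes, and the next()-call counter are invented for illustration; a real engine's operators are far more involved:

```python
# Toy Volcano-style plan for: SELECT name, salary*.19 AS tax
#                             FROM employee WHERE age > 25
# Counts next() calls to show the per-operator, per-tuple overhead.

EMPLOYEES = [
    {"name": "Joe", "age": 27, "salary": 40000.0},
    {"name": "Edward", "age": 21, "salary": 30000.0},
    {"name": "Ann", "age": 35, "salary": 50000.0},
]

next_calls = 0  # total next() invocations across all operators

class Scan:
    def open(self):
        self.it = iter(EMPLOYEES)
    def next(self):
        global next_calls
        next_calls += 1
        return next(self.it, None)
    def close(self):
        pass

class Select:  # WHERE age > 25
    def __init__(self, child):
        self.child = child
    def open(self):
        self.child.open()
    def next(self):
        global next_calls
        next_calls += 1
        while True:
            t = self.child.next()
            if t is None or t["age"] > 25:
                return t
    def close(self):
        self.child.close()

class Project:  # SELECT name, salary * .19 AS tax
    def __init__(self, child):
        self.child = child
    def open(self):
        self.child.open()
    def next(self):
        global next_calls
        next_calls += 1
        t = self.child.next()
        if t is None:
            return None
        return {"name": t["name"], "tax": t["salary"] * 0.19}
    def close(self):
        self.child.close()

plan = Project(Select(Scan()))
plan.open()
rows = []
while (row := plan.next()) is not None:
    rows.append(row)
plan.close()
print(rows)        # the two qualifying tuples
print(next_calls)  # one next() per operator per tuple adds up quickly
```

Even this three-row, three-operator plan makes 10 next() calls; at millions of rows the function-call and interpretation overhead dominates the actual data work.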
- 11. Inefficient Processing
How a traditional database runs a query
Data-specific computational functionality
Called once for every operation on every tuple
Worse for complex tuple representations
- 12. Inefficient Processing (Part 1 of 2)
Lots of repeated, unnecessary code
Operator logic
Function calls
Attribute access
Most instructions interpreting a query
Very few instructions processing actual data!
Many instructions per tuple
- 13. CPU Features – Inefficient Processing Part 2
In the last 20 years…
On-chip caches, because RAM access is slow and contended
Branch-sensitive CPU pipelines
Superscalar features
SIMD instructions (SSE and AVX)
Great for multimedia processing, scientific computing…
… but NOT for traditional relational databases
Complex code: function calls, branches
Poor use of CPU cache (both data and instructions)
Processing one value at a time
- 15. Vectorwise – Vector-based Processing
Query:
SELECT
name,
salary*.19 AS tax
FROM
employee
WHERE
age > 25
- 16. Vectorwise – Vector-based Processing
A vector contains data of multiple tuples (up to 1024)
All operations consume and produce entire vectors
Effect: far fewer operator.next() and primitive calls
AND: pipelined query evaluation
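The vector-at-a-time model can be sketched for the same query. The primitive names, toy data, and chunking helper are invented; plain Python lists stand in for the tight array loops a real engine compiles to cache- and SIMD-friendly code:

```python
# Toy vectorized plan for: SELECT name, salary*.19 AS tax
#                          FROM employee WHERE age > 25
# Each primitive call processes a whole vector, not one tuple.

VECTOR_SIZE = 1024

def scan_vectors(names, ages, salaries):
    """Yield the columns one vector (chunk of up to 1024 values) at a time."""
    for i in range(0, len(names), VECTOR_SIZE):
        yield (names[i:i + VECTOR_SIZE],
               ages[i:i + VECTOR_SIZE],
               salaries[i:i + VECTOR_SIZE])

def select_gt(values, constant):
    """Selection primitive: positions where value > constant."""
    return [i for i, v in enumerate(values) if v > constant]

def mul_const(values, constant, sel):
    """Arithmetic primitive over the selected positions only."""
    return [values[i] * constant for i in sel]

def gather(values, sel):
    """Fetch the selected positions from a vector."""
    return [values[i] for i in sel]

# Toy data: 3000 employees, so the whole query runs over just 3 vectors.
n = 3000
names = [f"emp{i}" for i in range(n)]
ages = [20 + (i % 30) for i in range(n)]
salaries = [1000.0 + i for i in range(n)]

out_names, out_tax = [], []
for v_names, v_ages, v_sal in scan_vectors(names, ages, salaries):
    sel = select_gt(v_ages, 25)             # WHERE age > 25
    out_names += gather(v_names, sel)       # SELECT name
    out_tax += mul_const(v_sal, 0.19, sel)  # salary * .19 AS tax

print(len(out_names))  # qualifying rows, produced with ~3 calls per primitive
```

Note the interpretation cost (function calls, operator dispatch) is paid once per vector of up to 1024 values instead of once per tuple, while the vectors still stream through the plan, keeping evaluation pipelined.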
- 17. Why is Vectorwise so Fast?
Reduced interpretation overhead
100+ times fewer function calls
Good CPU cache use
High locality in primitives
Cache-conscious algorithms
No tuple navigation
Primitives only see arrays
Vectorization allows algorithmic optimization
CPU and compiler-friendly function bodies
Multiple work units, loop-pipelining, SIMD…
BONUS: PARALLEL QUERY
- 18. Some Numbers
Traditional RDBMS: <200 MB/s per core
Vectorwise (lab environment): >1.5 GB/s per core
- 19. Addressing the I/O Challenge
Columnar storage
Smart column buffer (memory)
Data compression
On disk: less I/O
In memory: best use of column buffer
Ultra-efficient decompression algorithms to get sufficient throughput
Large contiguous data blocks for optimum disk I/O
In-memory min-max indexes per block (i.e. per column)
Eliminate data blocks based on implicit/explicit filter criteria
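The min-max elimination idea can be sketched like this. The block size, column values, and helper names are invented for illustration:

```python
# Toy per-block min-max index: each column block keeps its (min, max)
# in memory, and a scan with a filter skips blocks whose value range
# cannot possibly satisfy the predicate.

def build_minmax(column, block_size):
    """Per-block (min, max) pairs kept in memory."""
    index = []
    for i in range(0, len(column), block_size):
        block = column[i:i + block_size]
        index.append((min(block), max(block)))
    return index

def blocks_to_read(index, predicate_low):
    """Block ids that may contain values > predicate_low."""
    return [b for b, (_, hi) in enumerate(index) if hi > predicate_low]

# A date-like column that arrives roughly in order, so blocks have
# narrow value ranges and eliminate well.
column = list(range(10_000))
idx = build_minmax(column, block_size=1000)

# Filter: value > 7500 -> only the last 3 of 10 blocks need any I/O.
print(blocks_to_read(idx, 7500))
```

Elimination works best when values correlate with load order (dates, sequence numbers), since each block then covers a narrow range.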
- 20. Efficient Updates in a Column Store
Positional Delta Trees (PDTs)
In-memory representation of small data changes
Efficiently merged with on-disk data
Periodically propagated to disk
Provide snapshot read consistency
ACID compliant
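A toy sketch of the core merge idea behind positional deltas: small in-memory changes are addressed by tuple position and applied on the fly while scanning the immutable on-disk column. This illustrates only the merge step; the delta representation and dict here are invented, and the real PDT is a tree supporting snapshots and efficient position arithmetic:

```python
# Merge in-memory positional deltas with an on-disk column during a scan.

disk_column = [10, 20, 30, 40, 50]  # immutable on-disk data

# Deltas keyed by position: ("upd", value) replaces the value,
# ("del",) drops it, ("ins", value) inserts before that position.
deltas = {1: ("upd", 21), 3: ("del",), 4: ("ins", 45)}

def merged_scan(column, deltas):
    out = []
    for pos, value in enumerate(column):
        d = deltas.get(pos)
        if d is None:
            out.append(value)           # unchanged on-disk value
        elif d[0] == "upd":
            out.append(d[1])            # updated value from memory
        elif d[0] == "ins":
            out.append(d[1])            # inserted value, then the original
            out.append(value)
        # "del": skip the on-disk value entirely
    return out

print(merged_scan(disk_column, deltas))  # deltas applied without touching disk pages
```

Readers always see the merged view, so small updates stay cheap in memory until a background propagation rewrites the compressed on-disk blocks.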
- 21. Agenda
Why traditional RDBMSs are slow for analytics
Why Vectorwise is fast
The I/O challenge
Efficient updates