Eliminating the Database Bottleneck
What makes Vectorwise so fast


Mark Van de Wiel
Director Product Management, Vectorwise

Thursday, November 01, 2012



Confidential © 2012 Actian Corporation
Agenda

- Why traditional RDBMSs are slow for analytics
- Why Vectorwise is fast
- The I/O challenge
- Efficient updates




100x (+) Performance Difference – 2003
Custom C versus Relational Database

TPC-H 1 GB query 1 (runtime in s)

MySQL         26.2
DBMS 'X'      28.1
C program      0.2
Vectorwise     0.6
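The "100x (+)" in the slide title follows directly from the chart values. A minimal sketch of the arithmetic, using only the four runtimes shown above (the variable names are illustrative):

```python
# Runtimes in seconds for TPC-H 1 GB query 1, as shown on the slide.
runtimes = {"MySQL": 26.2, "DBMS 'X'": 28.1, "C program": 0.2, "Vectorwise": 0.6}

# Express every runtime as a multiple of the hand-written C program's runtime.
baseline = runtimes["C program"]
slowdown = {system: seconds / baseline for system, seconds in runtimes.items()}

for system, factor in slowdown.items():
    print(f"{system}: {factor:.1f}x the C program's runtime")
```

Both relational systems land well above 100 times the C program's runtime, while Vectorwise stays within a small constant factor of it.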



Traditional Relational Database for Analytics
Inefficiencies
- Inefficient storage
- Inefficient processing




Inefficient Storage for Analytics

- Row-based storage model
  - Predominant in 2003, still very common today
  - Works well for OLTP


      101      Joe                           27      Black

      103      Edward                        21      Scissorhand




Inefficient Storage – Row-based

- Pages on disk – example

[Figure: disk page layout. Tuples: 101, 27, Joe Black; 103, 21, Edward Scissorhand. Labels: var-width attribute pointers; pointers to tuples.]




Issues with Row-based Storage

- Always read all attributes
  - Poor bandwidth
  - Poor use of memory buffer

- Complex row structure and navigation
  - E.g. compressing out null fields
  - E.g. row chaining




Efficient Storage for Analytics

- Columnar storage: store attributes separately
- Retrieve only attributes required by the query
- Used by “traditional” column stores, e.g. Sybase IQ, Vertica




Inefficient Processing

How a traditional database runs a query


                                                   Query:

                                                   SELECT
                                                       name,
                                                       salary*.19 AS tax
                                                   FROM
                                                       employee
                                                   WHERE
                                                       age > 25




Inefficient Processing

How a traditional database runs a query

                                                   Tuple-at-a-time iterator interface:
                                                   - open()
                                                   - next(): tuple
                                                   - close()


                                                   next() is called:
                                                   - for each operator
                                                   - for each tuple


                                                   Complex code repeated over and over
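The iterator interface above can be sketched in a few lines. This is an illustrative toy, not any particular database's executor: the class names, the employee rows, and the salary values are made up, and the `calls` counter exists only to show how often next() runs.

```python
class Operator:
    """Base class; `calls` counts every next() invocation (illustrative only)."""
    calls = 0

    def __init__(self, child=None):
        self.child = child

    def open(self):
        if self.child is not None:
            self.child.open()

    def close(self):
        if self.child is not None:
            self.child.close()


class Scan(Operator):
    def __init__(self, rows):
        super().__init__()
        self.rows = rows

    def open(self):
        self.pos = 0

    def next(self):
        Operator.calls += 1
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Select(Operator):
    def __init__(self, child, predicate):
        super().__init__(child)
        self.predicate = predicate

    def next(self):
        Operator.calls += 1
        while True:  # pull tuples until one qualifies or input ends
            row = self.child.next()
            if row is None or self.predicate(row):
                return row


class Project(Operator):
    def __init__(self, child, fn):
        super().__init__(child)
        self.fn = fn

    def next(self):
        Operator.calls += 1
        row = self.child.next()
        return None if row is None else self.fn(row)


def run(plan):
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


# Hypothetical data; the slides give the query but no salaries.
employee = [
    {"name": "Joe", "age": 27, "salary": 1000.0},
    {"name": "Edward", "age": 21, "salary": 2000.0},
]
# SELECT name, salary*.19 AS tax FROM employee WHERE age > 25
plan = Project(
    Select(Scan(employee), lambda r: r["age"] > 25),
    lambda r: (r["name"], r["salary"] * 0.19),
)
result = run(plan)
print(result, "next() calls:", Operator.calls)
```

Even for two input tuples, next() runs several times per tuple across the operator tree, which is the per-tuple overhead the slide refers to.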


Inefficient Processing

How a traditional database runs a query

                                                   Data-specific computational
                                                   functionality

                                                   Called once for every operation
                                                   on every tuple

                                                   Worse for complex tuple
                                                   representations




Inefficient Processing (Part 1 of 2)

- Lots of repeated, unnecessary code
  - Operator logic
  - Function calls
  - Attribute access

- Most instructions interpreting a query
- Very few instructions processing actual data!
- Many instructions per tuple




CPU Features – Inefficient Processing Part 2

- In the last 20 years…
  - On-chip caches, because RAM access is too slow and congested
  - Branch-sensitive CPU pipelines
  - Superscalar features
  - SIMD instructions (SSE and AVX)

- Great for multimedia processing, scientific computing…
- … but NOT for traditional relational databases
  - Complex code: function calls, branches
  - Poor use of CPU cache (both data and instructions)
  - Processing one value at a time




Inefficient Processing

Traditional RDBMS
- Many instructions per tuple
- Many cycles per instruction
- Very many cycles per tuple




Vectorwise – Vector-based Processing



                                                Query:

                                                SELECT
                                                    name,
                                                    salary*.19 AS tax
                                                FROM
                                                    employee
                                                WHERE
                                                    age > 25




Vectorwise – Vector-based Processing

                                                Vector contains data of
                                                multiple tuples (1024)

                                                All operations consume
                                                and produce entire vectors

                                                Effect: far fewer
                                                operator.next() and
                                                primitive calls.

                                                AND: pipelined query
                                                evaluation
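A minimal sketch of the same query evaluated one vector at a time. The vector size of 1024 comes from the slide; everything else (the primitive names, the use of a selection list of qualifying positions, the synthetic data) is an assumption for illustration and not Vectorwise's actual code. Plain Python lists stand in for the arrays a real engine would process in tight compiled loops.

```python
VECTOR_SIZE = 1024  # tuples per vector, as stated on the slide


def select_gt(values, threshold):
    """Primitive: positions within the vector where values[i] > threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def fetch(values, sel):
    """Primitive: gather the selected positions of one column."""
    return [values[i] for i in sel]


def map_mul(values, sel, factor):
    """Primitive: multiply the selected positions by a constant."""
    return [values[i] * factor for i in sel]


def scan_vectors(columns, n, size=VECTOR_SIZE):
    """Yield one slice of every column per call, `size` tuples at a time."""
    for start in range(0, n, size):
        yield {name: col[start:start + size] for name, col in columns.items()}


def run_query(columns, n):
    """SELECT name, salary*.19 AS tax FROM employee WHERE age > 25"""
    out = []
    for vec in scan_vectors(columns, n):
        sel = select_gt(vec["age"], 25)          # one call per vector
        names = fetch(vec["name"], sel)          # one call per vector
        tax = map_mul(vec["salary"], sel, 0.19)  # one call per vector
        out.extend(zip(names, tax))
    return out


# Synthetic employee table (hypothetical values).
n = 3000
columns = {
    "name": [f"emp{i}" for i in range(n)],
    "age": [20 + i % 10 for i in range(n)],
    "salary": [1000.0 + i for i in range(n)],
}
result = run_query(columns, n)
num_vectors = sum(1 for _ in scan_vectors(columns, n))
print(len(result), "qualifying tuples from", num_vectors, "vectors")
```

Each primitive is called once per vector rather than once per tuple, so for 3000 tuples the query makes a handful of primitive calls instead of thousands.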

Why is Vectorwise so Fast?

- Reduced interpretation overhead
  - 100+ times fewer function calls
- Good CPU cache use
  - High locality in primitives
  - Cache-conscious algorithms
- No tuple navigation
  - Primitives only see arrays
- Vectorization allows algorithmic optimization
- CPU and compiler-friendly function bodies
  - Multiple work units, loop-pipelining, SIMD…
- BONUS: PARALLEL QUERY



Some Numbers

- Traditional RDBMS: <200 MB/s per core
- Vectorwise (lab environment): >1.5 GB/s per core




Addressing the I/O Challenge

- Columnar storage
- Smart column buffer (memory)
- Data compression
  - On disk: less I/O
  - In memory: best use of column buffer
  - Ultra-efficient decompression algorithms to get sufficient throughput

- Large contiguous data blocks for optimum disk I/O
- In-memory min-max indexes per block (i.e. per column)
  - Eliminate data blocks based on implicit/explicit filter criteria
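The block elimination in the last bullet can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the block size, and the in-memory list standing in for on-disk blocks are all hypothetical.

```python
def build_minmax(column, block_size):
    """Record (start offset, min, max) for every block of one column."""
    index = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        index.append((start, min(block), max(block)))
    return index


def scan_greater_than(column, index, block_size, threshold):
    """Evaluate `value > threshold`, skipping blocks that cannot match."""
    hits, blocks_read = [], 0
    for start, lo, hi in index:
        if hi <= threshold:
            continue  # no value in this block can exceed the threshold
        blocks_read += 1  # only now would the block be read and decompressed
        block = column[start:start + block_size]
        hits.extend(v for v in block if v > threshold)
    return hits, blocks_read


# Hypothetical column of 100 values in 10 blocks of 10.
column = list(range(100))
BLOCK_SIZE = 10
index = build_minmax(column, BLOCK_SIZE)
hits, blocks_read = scan_greater_than(column, index, BLOCK_SIZE, 74)
print(len(hits), "matches,", blocks_read, "of", len(index), "blocks read")
```

How many blocks can be skipped depends on how the data is ordered; the sorted column used here is the best case.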



Efficient Updates in a Column Store

Positional Delta Trees (PDTs)
- In-memory representation of small data changes
  - Efficiently merged with on-disk data
  - Periodically propagated to disk

- Provide snapshot read consistency
- ACID compliant
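The slides do not describe the internal structure of a PDT, so the following is not a PDT implementation. It is a deliberately simplified sketch of the general idea only: changes are kept in memory keyed by position, merged with the stable data at scan time, and occasionally folded back into the stable data. Plain dictionaries and a set stand in for the tree, and all names are hypothetical.

```python
def merged_scan(stable, inserts, deletes, modifies):
    """Yield the current view: stable data with in-memory deltas applied.

    inserts:  stable position -> values inserted before that position
    deletes:  set of stable positions that have been deleted
    modifies: stable position -> replacement value
    """
    for pos, value in enumerate(stable):
        yield from inserts.get(pos, [])
        if pos in deletes:
            continue
        yield modifies.get(pos, value)
    yield from inserts.get(len(stable), [])  # appends after the last tuple


def checkpoint(stable, inserts, deletes, modifies):
    """Fold the deltas into a new stable image and start with empty deltas."""
    new_stable = list(merged_scan(stable, inserts, deletes, modifies))
    return new_stable, {}, set(), {}


# Stable ("on-disk") column and a few small changes held in memory.
stable = [10, 20, 30, 40]
inserts = {0: [5], 4: [50]}
deletes = {1}
modifies = {2: 35}

snapshot = list(merged_scan(stable, inserts, deletes, modifies))
new_stable, new_inserts, new_deletes, new_modifies = checkpoint(
    stable, inserts, deletes, modifies
)
print(snapshot, new_stable)
```

The stable data is never modified in place during a scan, which is the property that makes it cheap to give readers a consistent view while small changes accumulate.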




Agenda

- Why traditional RDBMSs are slow for analytics
- Why Vectorwise is fast
- The I/O challenge
- Efficient updates



