SlideShare a Scribd company logo
1 of 43
Download to read offline
1
GPGPU Accelerates PostgreSQL
~Unlock the power of multi-thousand cores~
PGconf.EU 2015 Wien
NEC Business Creation Division
The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
2
Today’s topic
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
Very powerful
computing
capability
Very functional
& well-used
database
PG-Strom:
Glue of the both promising
technologies
GPGPU
3
Self Introduction
▌Name: KaiGai Kohei
▌Works for:NEC
▌Role:
 Lead of the PG-Strom project
 Contribution to PostgreSQL and other OSS
projects
 Making business opportunity using this
technology
▌PG-Strom Project
 Mission: To deliver the power of semiconductor evolution
for every people who want
 Launched at Jan-2012, as a personal development project
 Project is now funded by NEC
 Fully open source project
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
4
Agenda
1. GPU characteristics and why?
2. What intermediates PostgreSQL and GPU
3. How GPU processes SQL workloads
4. Performance results and analysis
5. Further development
5
Features of GPU (Graphic Processor Unit)
▌Characteristics
 Massive parallel cores
 Much higher DRAM bandwidth
 Better price / performance ratio
 advantages for massive but
simple arithmetic operations
 Different manner to write
software algorithm
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
SOURCE: CUDA C Programming Guide
GPU CPU
Model
Nvidia GTX
TITAN X
Intel Xeon
E5-2690 v3
Architecture Maxwell Haswell
Launch Mar-2015 Sep-2014
# of transistors 8.0billion 3.84billion
# of cores
3072
(simple)
12
(functional)
Core clock 1.0GHz
2.6GHz,
up to 3.5GHz
Peak Flops
(single
precision)
6.6TFLOPS
998.4GFLOPS
(with AVX2)
DRAM size
12GB, GDDR5
(384bits bus)
768GB/socket,
DDR4
Memory band 336.5GB/s 68GB/s
Power
consumption
250W 135W
Price $999 $2,094
6
How GPU cores works
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
●item[0]
step.1 step.2 step.4step.3
Calculation of
𝑖𝑡𝑒𝑚[𝑖]
𝑖=0…𝑁−1
with GPU cores
◆
●
▲ ■ ★
● ◆
●
● ◆ ▲
●
● ◆
●
● ◆ ▲ ■
●
● ◆
●
● ◆ ▲
●
● ◆
●
item[1]
item[2]
item[3]
item[4]
item[5]
item[6]
item[7]
item[8]
item[9]
item[10]
item[11]
item[12]
item[13]
item[14]
item[15]
Sum of items[]
by log2N steps
Inter-core synchronization by HW functionality
7
Dark Silicon and Semiconductor Trend
▌Processor shrink also increases power leakage.
▌Dark Silicon: area cannot power-on due to thermal problem.
▌People thought: How to utilize these extra transistors?
Domain specific processors, like GPU
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
Core Core Core Core
Core Core Core Core
Dark SiliconCore Core
Core Core
Dark Silicon
Core
Core
Core
Core
Core
Core
Next GenNext Gen
Intel Haswell
on-chip layout
(4core system)
8
Towards heterogeneous era...
Software must be designed to intermediate CPU and GPU
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
9
Agenda
1. GPU characteristics and why?
2. What intermediates PostgreSQL and GPU
3. How GPU processes SQL workloads
4. Performance results and analysis
5. Further development
10
What is PG-Strom
▌Core ideas
① GPU native code generation on the fly
② Asynchronous massive parallel execution
▌Advantages
 Transparent acceleration with 100% query compatibility
 Commodity H/W and less system integration cost
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
Parser
Planner
Executor
Custom-
Scan/Join
Interface
Query: SELECT * FROM l_tbl JOIN r_tbl on l_tbl.lid = r_tbl.rid;
PG-Strom
CUDA
driver
nvrtc
DMA Data Transfer
CUDA
Source
code
Massive
Parallel
Execution
11
How query is processed
▌Custom-Scan/Join interface (v9.5 new!)
▌PG-Strom provides own scan/join logics using GPUs
▌It generates same results, but different way
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
Query Parser
Query Optimizer
Query Executor
SQL
(text)
Parse
Tree
Query Plan
Custom
Scan/Join
Interface
PG-Strom
Interactions
between
PostgreSQL
and
Extensions
CUDA
code
Data
Chunk
NVIDIA
Run-Time
Compiler
12
definition
at planning
time
(OT) Dive into Custom-Join
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
Built-in JOIN Custom JOIN
CustomScan
(scanrelid==0)
Logic implemented
by extension
Inner Scan Outer Scan
table-X table-Y
Join
• NestedLoop
• MergeJoin
• HashJoin
TupleDesc
of table-X
TupleDesc
of table-Y
X-tuple Y-tuple
joined-tuple
TupleDesc
of Join X-Y
table-Y Something other
data source
table-X
Data Load with extension’s
comfortable way
joined-tuple
compatible tuple format
13
SQL-to-GPU native code generation (1/3)
postgres=# EXPLAIN VERBOSE
SELECT cat, count(*), avg(x) FROM t0
WHERE x between y and y + 20.0 GROUP BY cat;
QUERY PLAN
-------------------------------------------------------------------------------------
HashAggregate (cost=167646.33..167646.36 rows=3 width=12)
Output: cat, pgstrom.count((pgstrom.nrows())),
pgstrom.avg((pgstrom.nrows((x IS NOT NULL))), (pgstrom.psum(x)))
Group Key: t0.cat
-> Custom Scan (GpuPreAgg) (cost=24150.22..163315.53 rows=258 width=48)
Output: cat, pgstrom.nrows(), pgstrom.nrows((x IS NOT NULL)), pgstrom.psum(x)
Bulkload: On (density: 100.00%)
Reduction: Local + Global
Device Filter: ((t0.x >= t0.y) AND (t0.x <= (t0.y + '20'::double precision)))
Features: format: tuple-slot, bulkload: unsupported
Kernel Source: /opt/pgsql/base/pgsql_tmp/pgsql_tmp_strom_24905.4.gpu
-> Custom Scan (BulkScan) on public.t0 (cost=21150.22..159312.94
rows=10000060 width=12)
Output: id, cat, aid, bid, cid, did, eid, x, y, z
Features: format: heap-tuple, bulkload: supported
(13 rows)
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
14
SQL-to-GPU native code generation (2/3)
$ less /opt/pgsql/base/pgsql_tmp/pgsql_tmp_strom_24905.4.gpu
:
STATIC_FUNCTION(bool)
gpupreagg_qual_eval(kern_context *kcxt,
kern_data_store *kds,
kern_data_store *ktoast,
size_t kds_index)
{
pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1); /* constant 20.0 */
pg_float8_t KVAR_8 = pg_float8_vref(kds,kcxt,7,kds_index); /* var of X */
pg_float8_t KVAR_9 = pg_float8_vref(kds,kcxt,8,kds_index); /* var of Y */
return EVAL(pgfn_boolop_and_2(kcxt,
pgfn_float8ge(kcxt, KVAR_8, KVAR_9),
pgfn_float8le(kcxt, KVAR_8,
pgfn_float8pl(kcxt, KVAR_9, KPARAM_1))));
}
:
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
 formula expression based on WHERE clause
15
SQL-to-GPU native code generation (3/3)
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
SQL
Query
Dynamically
constructed
GPU code
------
Per Query Variant
Statically
constructed
GPU code
------
Algorithm Core
NVIDIA
Run-Time
Compiler
(nvrtc)
Cached
GPU Binary
+
PG-Strom
GPU code
generator
For
Execution
For Next
Time
GpuScan,
GpuHashJoin,
GpuPreAgg,
GpuSort, ...
16
PG-Strom functionalities at Sep-2015
▌Logics
 GpuScan ... Simple loop extraction by GPU multithread
 GpuHashJoin ... GPU multithread based N-way hash-join
 GpuNestLoop ... GPU multithread based N-way nested-loop
 GpuPreAgg ... Row reduction prior to CPU aggregation
 GpuSort ... GPU bitonic + CPU merge, hybrid sorting
▌Data Types
 Numeric ... int2/4/8, float4/8, numeric
 Date and Time ... date, time(tz), timestamp(tz), interval
 Text ... simple matching only, at this moment
 others ... currency
▌Functions
 Comparison operator ... <, <=, !=, =, >=, >
 Arithmetic operators ... +, -, *, /, %, ...
 Mathematical functions ... sqrt, log, exp, ...
 Aggregate functions ... min, max, sum, avg, stddev, ...
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
17
Agenda
1. GPU characteristics and why?
2. What intermediates PostgreSQL and GPU
3. How GPU processes SQL workloads
4. Performance results and analysis
5. Further development
18
Asynchronous & Pipelining Execution (1/3)
▌Karma of discrete GPU device
 We have to move the data to be processed.
 Thus, people say data intensive tasks are not good for GPU.
 Correct, but we can reduce the impact.
 Our solution: Pipelining
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
GPU
CPU
GPU kernel
execution
time
DMA
Send
DMA
Recv
19
Asynchronous & Pipelining Execution (2/3)
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
chunk-(i+1)
chunk-(i+2)
chunk-i
chunk-(i+3)
tablescan
Move
to Next
DMA
Recv
GPU
Kernel
Exec
DMA
Send
Buffer
Read
DMA
Recv
GPU
Kernel
Exec
DMA
Send
Buffer
Read
GPU
Kernel
Exec
DMA
Send
Buffer
Read
DMA
Send
Buffer
Read
Step-(i+4) Step-(i+3) Step-(i+2) Step-(i+1) Step-i
20
Asynchronous & Pipelining Execution (3/3)
▌Karma of discrete GPU device
 We have to move the data to be processed.
 Thus, people say data intensive tasks are not good for GPU.
 Correct, but we can reduce the impact
 Our solution: Pipelining
▌Downside of the pipelining
 Too small workloads
 Pipeline hazard
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
GPU
CPU
GPU kernel
execution
time
DMA
Send
DMA
Recv
21
SQL Logic on GPU (1/3) – GpuHashJoin
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
Inner
relation
Outer
relation
Inner
relation
Outer
relation
Hash table Hash table
Next step Next step
All CPU does
is just
references
the result of
relations join
Sequential
Hash table
search by CPU
Sequential
Projection
by CPU
Parallel
Projection
by GPU
Parallel Hash-
table search
by GPU
Built-in Hash-Join GpuHashJoin of PG-Strom
22
Inner Hash table
OT: Ununiformly Distributed Hash-Table
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
GPU Core-0
GPU Core-1
GPU Core-2
GPU Core-3
GPU Core-4
GPU Core-5
GPU Core-M
Outer
relation
Item[0]
Item[1]
Item[2]
Item[3]
Item[4]
Item[5]
Item[N]
Item[M]
Item Item Item Item Item
Item Item Item Item Item
Inner
relation
Hash
Value
Hash
Function
Item[3] Item
Item[3] Item
Item[3] Item
Item[3] Item
Item[3] Item
Item[3] Item
Item[3] Item
Item[3] Item
Item[3] Item
Item[3] Item
Overflow of
Result Buffer
23
Inner Hash table
OT: Ununiformly Distributed Hash-Table
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
GPU Core-0
GPU Core-1
GPU Core-2
GPU Core-3
GPU Core-4
GPU Core-5
GPU Core-M
Outer
relation
Item[0]
Item[1]
Item[2]
Item[3]
Item[4]
Item[5]
Item[N]
Item[M]
Item Item Item Item Item
Item Item Item Item Item
Inner
relation
Hash
Value
Hash
Function
Item[3] Item Item[3] Item
Virtual Partition
with Extra Randomness
Just fit for
Result Buffer
x Retry for each
virtual partition
24
SQL Logic on GPU (2/3) – GpuNestLoop (unparametalized)
Outer-Relation
(Nx:usuallylarger)
※splittochunk-by-chunkon
demand
●
●
●
●
●
●
●
●
●
●
●●●●●●●Two
dimensional
GPU kernel
launch
blockDim.x
blockDim.y
Ny
Nx
Thread
(X=2, Y=3)
Inner-Relation
(Ny: relatively small)
Only edge thread references
DRAM to fetch values.
Nx:32 x Ny:32 = 1024
A matrix can be evaluated with
only 64 times DRAM accesses
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
25
SQL Logic on GPU (3/3) – GpuPreAgg
▌GpuPreAgg, prior to Aggregate, reduces number of rows to be processed.
▌Query rewrite to make equivalent results from intermediate aggregation.
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
GroupAggregate
Sort
SeqScan
Tbl_1
Result Set
several rows
several millions rows
several
millions
rows
Y, count(*),
avg(X)
X, Y
X, Y
X, Y, Z (table definition)
GroupAggregate
Sort
GpuScan
Tbl_1
Result Set
several rows
several hundreds rows
several millions rows
Y, sum(nrows),
avg_ex(nrows, psum)
Y, nrows, psum_X
X, Y
X, Y, Z (table definition)
GpuPreAgg Y, nrows, psum_X
several hundreds rows
SELECT count(*), AVG(X), Y FROM Tbl_1 GROUP BY Y;
reduction
of rows to be
processed
26
Agenda
1. GPU characteristics and why?
2. What intermediates PostgreSQL and GPU
3. How GPU processes SQL workloads
4. Performance results and analysis
5. Further development
27
Microbenchmark (1/2) – total response time
▌SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 [, ...] GROUP BY cat;
measurement of query response time with increasing of inner relations
 t0: 100M rows, t1~t10: 100K rows for each, all the data was preloaded.
▌Environment:
 PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64)
 CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA TESLA K20c (2496cores)
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
58.89
82.51
102.15
125.46
159.83
194.50
241.88
12.44 17.88 17.56 21.98
33.33 38.14
50.29
0
50
100
150
200
250
300
2 3 4 5 6 7 8
QueryResponseTime[sec]
# of tables involved
Query Response Time towards Numer of Tables Joined
PostgreSQL 9.5β PG-Strom
28
Microbenchmark (2/2) – Time consumption per component
▌Good acceleration results towards JOIN and AGGREGATION
▌Less acceleration ratio when N is larger than 6
By inaccurate problem size estimation, why?
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
0
50
100
150
200
250
300
PgSQL Strom PgSQL Strom PgSQL Strom PgSQL Strom PgSQL Strom PgSQL Strom PgSQL Strom
2 3 4 5 6 7 8
QueryResponseTime[sec]
# of tables involved
Time consumption per component (PostgreSQL v9.5β vs PG-Strom)
Scan Join Aggregate Others
29
OT: Why accurate problem size estimation is important
▌GpuJoin results must be smaller than GPU RAM size.
▌Virtual inner-partition saves scale of the Join results.
Better size estimation reduces number of retries.
We can determine the problem size using dynamic parallelism
downside: Supported by only Kepler or later generation.
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
innerRel
(depth-1)
innerRel
(depth-2)
innerRel
(depth-3)
innerRel
(depth-4)
outerRel
Results
× × × × =
× × × × =
GpuJoin with virtual inner-partition and iteration
Larger
than
GPU RAM
Within
GPU RAM
Size
30
TPC-DS Benchmark (1/2) – Total Results
▌What is TPC-DS?
 Successor of TPC-H, defines 103 (99+4) kind of queries
 Simulation of decision support queries in retail industry
▌About this benchmark
 Run the series of TPC-DS queries sequentially.
 We could collect 96 results of 103.
 Environment:
• PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64)
• CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA TESLA K20c (2496cores)
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
0
500
1000
1500
2000
QueryResponseTime[sec]
PostgreSQL v9.5β PG-Strom
anomaly?
31
TPC-DS Benchmark (2/2) – Total Result
▌Summary
 PG-Strom wins to PostgreSQL: 76 of 96
 PostgreSQL wins to PG-Strom: 20 of 96
 shows advantages on most of simple queries
▌Challenges
 Wise plan construction, to avoid GpuXXX node not to be here
 Functionality improvement for Aggregation and Scan
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
0
100
200
300
400
500
QueryResponseTime[sec]
PostgreSQL v9.5β PG-Strom
32
Time consumption per workloads (1/3) – PostgreSQL v9.5
▌Scan, Join and Aggregation provides a clear explanation for
the time consumption factor in TPC-DS workloads
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Time consumption per workloads (PostgreSQL v9.5beta)
Scan Join Aggregate Others
33
Time consumption per workloads (2/3) – PG-Strom
▌PG-Strom dramatically reduced time for Joins.
▌Time for Joins are: 37%  17%
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Time consumption per workloads (PostgreSQL v9.5beta)
Scan Join Aggregate Others
34
Time consumption per workloads (3/3) – Wow! GpuJoin
GpuJoin can explain 84% of performance improvement
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
y = 0.962x - 7.289
R² = 0.8442
-50.00
0.00
50.00
100.00
150.00
200.00
250.00
-50.00 0.00 50.00 100.00 150.00 200.00 250.00
ImprovementofTotalTime[sec]
(TotaltimeofPgSQL)–(TotaltimeofPG-Strom)
Improvement of Join Time [sec]
(Join time of PgSQL) – (Join time of PG-Strom)
35
Agenda
1. GPU characteristics and why?
2. What intermediates PostgreSQL and GPU
3. How GPU processes SQL workloads
4. Performance results and analysis
5. Further development
36
What we learn from TPC-DS
▌GpuJoin may be a killer feature of PG-Strom
 ....unless problem size estimation is accurate.
 Planner needs to be wiser, not to inject GpuJoin towards
the situation not suitable.
▌For Aggregation
 GpuPreAgg didn’t appear as we expected
 Anything GPU can help CPU aggregation?
▌For Scan
 Scan performance == data stream bandwidth of GPU input
 Single thread is a restriction for bandwidth of data supply
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
37
Features enhancement / improvement
▌Target-list calculation on GPU
 Pre-calculation of hash-value for HashAggregate
 Projection off-load if complicated mathematical expression
▌Custom-Scan under Gather node
 Allows multiple workers to provide data stream to GPUs
▌Partial Aggregation before Join
 Wider utilization of GpuPreAgg to reduce number of rows
to be joined by CPU
▌Utilization of dynamic parallelism
 Wise approach towards result buffer overflow problem in Join
▌... and more...
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
38
Short term development
1. Software quality improvement
2. Corner case handling
3. Documentation
4. Target-List calculation in GPU
5. Software quality improvement
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
39
WIP Project – In-place Computing
Why export entire data to run mathematical algorithm?
▌Runs complicated mathematical expression on GPUs, then
CPU just references the result
 Algorithms in SQL (even partially) automatically moves to GPUs
 WIP: Data-Clustering algorithms, like Min-Max, K-means, ...
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
mathematical
algorithm
Things
wants to
know
Things
wants to
know
result
result
mathematical
algorithm
Principle of
Near-Data
Computing
Limited
computing
capability
Data
Export
40
Future Direction of PG-Strom
Real-life workloads are the only compass for development
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
41
WE WANT YOU GET INVOLVED
▌PostgreSQL Users
 who concern about query
execution performance on
your workloads
▌Application Developers
 who are interested in trying
yet another database
acceleration methodology
▌PostgreSQL Developers
 who can bet the future of
heterogeneous era
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
42
Resources
▌Github
 https://github.com/pg-strom/devel
 https://github.com/pg-strom/devel/issues
▌Documentation
 Under construction.... sorry
▌Today’s slides
 Uploaded soon, check #pgconfeu on Tw
▌Author’s contact
 e-mail: kaigai@ak.jp.nec.com
 Tw: @kkaigai
PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
43

More Related Content

What's hot

GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)Kohei KaiGai
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storageKohei KaiGai
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...Kohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU deviceKohei KaiGai
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - EnglishKohei KaiGai
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdwKohei KaiGai
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_ProcessingKohei KaiGai
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGISKohei KaiGai
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)Kohei KaiGai
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDASavith Satheesh
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_IndexKohei KaiGai
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Ural-PDC
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multiKohei KaiGai
 
20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdwKohei KaiGai
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersKazuaki Ishizaki
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_ENKohei KaiGai
 

What's hot (20)

GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)GPGPU Accelerates PostgreSQL (English)
GPGPU Accelerates PostgreSQL (English)
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
PostgreSQL with OpenCL
PostgreSQL with OpenCLPostgreSQL with OpenCL
PostgreSQL with OpenCL
 
PG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU devicePG-Strom - A FDW module utilizing GPU device
PG-Strom - A FDW module utilizing GPU device
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw
 
20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing20201006_PGconf_Online_Large_Data_Processing
20201006_PGconf_Online_Large_Data_Processing
 
20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS20201128_OSC_Fukuoka_Online_GPUPostGIS
20201128_OSC_Fukuoka_Online_GPUPostGIS
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 
20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi20181016_pgconfeu_ssd2gpu_multi
20181016_pgconfeu_ssd2gpu_multi
 
20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw20181025_pgconfeu_lt_gstorefdw
20181025_pgconfeu_lt_gstorefdw
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java Programmers
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
 
Debugging CUDA applications
Debugging CUDA applicationsDebugging CUDA applications
Debugging CUDA applications
 

Viewers also liked

PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSTomas Vondra
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSTomas Vondra
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016Tomas Vondra
 
What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6EDB
 
20170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#820170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#8Kohei KaiGai
 
PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6Tomas Vondra
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015PostgreSQL-Consulting
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
雑なMySQLパフォーマンスチューニング
雑なMySQLパフォーマンスチューニング雑なMySQLパフォーマンスチューニング
雑なMySQLパフォーマンスチューニングyoku0825
 

Viewers also liked (10)

PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
 
What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6What's New in PostgreSQL 9.6
What's New in PostgreSQL 9.6
 
20170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#820170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#8
 
PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6PostgreSQL performance improvements in 9.5 and 9.6
PostgreSQL performance improvements in 9.5 and 9.6
 
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
How does PostgreSQL work with disks: a DBA's checklist in detail. PGConf.US 2015
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
雑なMySQLパフォーマンスチューニング
雑なMySQLパフォーマンスチューニング雑なMySQLパフォーマンスチューニング
雑なMySQLパフォーマンスチューニング
 

Similar to GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

Graphics processing unit ppt
Graphics processing unit pptGraphics processing unit ppt
Graphics processing unit pptSandeep Singh
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018NVIDIA
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGaiKohei KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...Equnix Business Solutions
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUsfcassier
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUsiguazio
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecturemohamedragabslideshare
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievVolodymyr Saviak
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Kohei KaiGai
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to AcceleratorsDilum Bandara
 
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONSA SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONScseij
 

Similar to GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~ (20)

Graphics processing unit ppt
Graphics processing unit pptGraphics processing unit ppt
Graphics processing unit ppt
 
RAPIDS Overview
RAPIDS OverviewRAPIDS Overview
RAPIDS Overview
 
SNAP MACHINE LEARNING
SNAP MACHINE LEARNINGSNAP MACHINE LEARNING
SNAP MACHINE LEARNING
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
PGConf.ASIA 2019 Bali - Full-throttle Running on Terabytes Log-data - Kohei K...
 
Monte Carlo on GPUs
Monte Carlo on GPUsMonte Carlo on GPUs
Monte Carlo on GPUs
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
 
Kindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 KievKindratenko hpc day 2011 Kiev
Kindratenko hpc day 2011 Kiev
 
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
Technology Updates of PG-Strom at Aug-2014 (PGUnconf@Tokyo)
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
 
Introduction to Accelerators
Introduction to AcceleratorsIntroduction to Accelerators
Introduction to Accelerators
 
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONSA SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
A SURVEY ON GPU SYSTEM CONSIDERING ITS PERFORMANCE ON DIFFERENT APPLICATIONS
 
GPU - Basic Working
GPU - Basic WorkingGPU - Basic Working
GPU - Basic Working
 
Pgopencl
PgopenclPgopencl
Pgopencl
 

More from Kohei KaiGai

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_HistoryKohei KaiGai
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_APIKohei KaiGai
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrowKohei KaiGai
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_countKohei KaiGai
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0Kohei KaiGai
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCacheKohei KaiGai
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGISKohei KaiGai
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_OnlineKohei KaiGai
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdwKohei KaiGai
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_FdwKohei KaiGai
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_TokyoKohei KaiGai
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.JapanKohei KaiGai
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_BetaKohei KaiGai
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStromKohei KaiGai
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStromKohei KaiGai
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdwKohei KaiGai
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_FdwKohei KaiGai
 
20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LTKohei KaiGai
 
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigDataKohei KaiGai
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA UnconferenceKohei KaiGai
 

More from Kohei KaiGai (20)

20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History20221116_DBTS_PGStrom_History
20221116_DBTS_PGStrom_History
 
20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API20221111_JPUG_CustomScan_API
20221111_JPUG_CustomScan_API
 
20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow20211112_jpugcon_gpu_and_arrow
20211112_jpugcon_gpu_and_arrow
 
20210928_pgunconf_hll_count
20210928_pgunconf_hll_count20210928_pgunconf_hll_count
20210928_pgunconf_hll_count
 
20210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.020210731_OSC_Kyoto_PGStrom3.0
20210731_OSC_Kyoto_PGStrom3.0
 
20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache20210511_PGStrom_GpuCache
20210511_PGStrom_GpuCache
 
20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS20201113_PGconf_Japan_GPU_PostGIS
20201113_PGconf_Japan_GPU_PostGIS
 
20200828_OSCKyoto_Online
20200828_OSCKyoto_Online20200828_OSCKyoto_Online
20200828_OSCKyoto_Online
 
20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw20200806_PGStrom_PostGIS_GstoreFdw
20200806_PGStrom_PostGIS_GstoreFdw
 
20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw20200424_Writable_Arrow_Fdw
20200424_Writable_Arrow_Fdw
 
20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo20191211_Apache_Arrow_Meetup_Tokyo
20191211_Apache_Arrow_Meetup_Tokyo
 
20191115-PGconf.Japan
20191115-PGconf.Japan20191115-PGconf.Japan
20191115-PGconf.Japan
 
20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta20190926_Try_RHEL8_NVMEoF_Beta
20190926_Try_RHEL8_NVMEoF_Beta
 
20190925_DBTS_PGStrom
20190925_DBTS_PGStrom20190925_DBTS_PGStrom
20190925_DBTS_PGStrom
 
20190516_DLC10_PGStrom
20190516_DLC10_PGStrom20190516_DLC10_PGStrom
20190516_DLC10_PGStrom
 
20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw20190418_PGStrom_on_ArrowFdw
20190418_PGStrom_on_ArrowFdw
 
20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw20190314 PGStrom Arrow_Fdw
20190314 PGStrom Arrow_Fdw
 
20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT20181212 - PGconf.ASIA - LT
20181212 - PGconf.ASIA - LT
 
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
20181211 - PGconf.ASIA - NVMESSD&GPU for BigData
 
20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference20181210 - PGconf.ASIA Unconference
20181210 - PGconf.ASIA Unconference
 

Recently uploaded

Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 

Recently uploaded (20)

Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 

GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~

  • 1. 1 GPGPU Accelerates PostgreSQL ~Unlock the power of multi-thousand cores~ PGconf.EU 2015 Wien NEC Business Creation Division The PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
  • 2. 2 Today’s topic PGconf.EU 2015 - GPGPU Accelerates PostgreSQL Very powerful computing capability Very functional & well-used database PG-Strom: Glue of the both promising technologies GPGPU
  • 3. 3 Self Introduction ▌Name: KaiGai Kohei ▌Works for:NEC ▌Role:  Lead of the PG-Strom project  Contribution to PostgreSQL and other OSS projects  Making business opportunity using this technology ▌PG-Strom Project  Mission: To deliver the power of semiconductor evolution for every people who want  Launched at Jan-2012, as a personal development project  Project is now funded by NEC  Fully open source project PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 4. 4 Agenda 1. GPU characteristics and why? 2. What intermediates PostgreSQL and GPU 3. How GPU processes SQL workloads 4. Performance results and analysis 5. Further development
  • 5. 5 Features of GPU (Graphic Processor Unit) ▌Characteristics  Massive parallel cores  Much higher DRAM bandwidth  Better price / performance ratio  advantages for massive but simple arithmetic operations  Different manner to write software algorithm PGconf.EU 2015 - GPGPU Accelerates PostgreSQL SOURCE: CUDA C Programming Guide GPU CPU Model Nvidia GTX TITAN X Intel Xeon E5-2690 v3 Architecture Maxwell Haswell Launch Mar-2015 Sep-2014 # of transistors 8.0billion 3.84billion # of cores 3072 (simple) 12 (functional) Core clock 1.0GHz 2.6GHz, up to 3.5GHz Peak Flops (single precision) 6.6TFLOPS 998.4GFLOPS (with AVX2) DRAM size 12GB, GDDR5 (384bits bus) 768GB/socket, DDR4 Memory band 336.5GB/s 68GB/s Power consumption 250W 135W Price $999 $2,094
  • 6. 6 How GPU cores works PGconf.EU 2015 - GPGPU Accelerates PostgreSQL ●item[0] step.1 step.2 step.4step.3 Calculation of 𝑖𝑡𝑒𝑚[𝑖] 𝑖=0…𝑁−1 with GPU cores ◆ ● ▲ ■ ★ ● ◆ ● ● ◆ ▲ ● ● ◆ ● ● ◆ ▲ ■ ● ● ◆ ● ● ◆ ▲ ● ● ◆ ● item[1] item[2] item[3] item[4] item[5] item[6] item[7] item[8] item[9] item[10] item[11] item[12] item[13] item[14] item[15] Sum of items[] by log2N steps Inter-core synchronization by HW functionality
  • 7. 7 Dark Silicon and Semiconductor Trend ▌Processor shrink also increases power leakage. ▌Dark Silicon: area cannot power-on due to thermal problem. ▌People thought: How to utilize these extra transistors? Domain specific processors, like GPU PGconf.EU 2015 - GPGPU Accelerates PostgreSQL Core Core Core Core Core Core Core Core Dark SiliconCore Core Core Core Dark Silicon Core Core Core Core Core Core Next GenNext Gen Intel Haswell on-chip layout (4core system)
  • 8. 8 Towards heterogeneous era... Software must be designed to intermediate CPU and GPU PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 9. 9 Agenda 1. GPU characteristics and why? 2. What intermediates PostgreSQL and GPU 3. How GPU processes SQL workloads 4. Performance results and analysis 5. Further development
  • 10. 10 What is PG-Strom ▌Core ideas ① GPU native code generation on the fly ② Asynchronous massive parallel execution ▌Advantages  Transparent acceleration with 100% query compatibility  Commodity H/W and less system integration cost PGconf.EU 2015 - GPGPU Accelerates PostgreSQL Parser Planner Executor Custom- Scan/Join Interface Query: SELECT * FROM l_tbl JOIN r_tbl on l_tbl.lid = r_tbl.rid; PG-Strom CUDA driver nvrtc DMA Data Transfer CUDA Source code Massive Parallel Execution
  • 11. 11 How query is processed ▌Custom-Scan/Join interface (v9.5 new!) ▌PG-Strom provides own scan/join logics using GPUs ▌It generates same results, but different way PGconf.EU 2015 - GPGPU Accelerates PostgreSQL Query Parser Query Optimizer Query Executor SQL (text) Parse Tree Query Plan Custom Scan/Join Interface PG-Strom Interactions between PostgreSQL and Extensions CUDA code Data Chunk NVIDIA Run-Time Compiler
  • 12. 12 definition at planning time (OT) Dive into Custom-Join PGconf.EU 2015 - GPGPU Accelerates PostgreSQL Built-in JOIN Custom JOIN CustomScan (scanrelid==0) Logic implemented by extension Inner Scan Outer Scan table-X table-Y Join • NestedLoop • MergeJoin • HashJoin TupleDesc of table-X TupleDesc of table-Y X-tuple Y-tuple joined-tuple TupleDesc of Join X-Y table-Y Something other data source table-X Data Load with extension’s comfortable way joined-tuple compatible tuple format
  • 13. 13 SQL-to-GPU native code generation (1/3) postgres=# EXPLAIN VERBOSE SELECT cat, count(*), avg(x) FROM t0 WHERE x between y and y + 20.0 GROUP BY cat; QUERY PLAN ------------------------------------------------------------------------------------- HashAggregate (cost=167646.33..167646.36 rows=3 width=12) Output: cat, pgstrom.count((pgstrom.nrows())), pgstrom.avg((pgstrom.nrows((x IS NOT NULL))), (pgstrom.psum(x))) Group Key: t0.cat -> Custom Scan (GpuPreAgg) (cost=24150.22..163315.53 rows=258 width=48) Output: cat, pgstrom.nrows(), pgstrom.nrows((x IS NOT NULL)), pgstrom.psum(x) Bulkload: On (density: 100.00%) Reduction: Local + Global Device Filter: ((t0.x >= t0.y) AND (t0.x <= (t0.y + '20'::double precision))) Features: format: tuple-slot, bulkload: unsupported Kernel Source: /opt/pgsql/base/pgsql_tmp/pgsql_tmp_strom_24905.4.gpu -> Custom Scan (BulkScan) on public.t0 (cost=21150.22..159312.94 rows=10000060 width=12) Output: id, cat, aid, bid, cid, did, eid, x, y, z Features: format: heap-tuple, bulkload: supported (13 rows) PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 14. 14 SQL-to-GPU native code generation (2/3) $ less /opt/pgsql/base/pgsql_tmp/pgsql_tmp_strom_24905.4.gpu : STATIC_FUNCTION(bool) gpupreagg_qual_eval(kern_context *kcxt, kern_data_store *kds, kern_data_store *ktoast, size_t kds_index) { pg_float8_t KPARAM_1 = pg_float8_param(kcxt,1); /* constant 20.0 */ pg_float8_t KVAR_8 = pg_float8_vref(kds,kcxt,7,kds_index); /* var of X */ pg_float8_t KVAR_9 = pg_float8_vref(kds,kcxt,8,kds_index); /* var of Y */ return EVAL(pgfn_boolop_and_2(kcxt, pgfn_float8ge(kcxt, KVAR_8, KVAR_9), pgfn_float8le(kcxt, KVAR_8, pgfn_float8pl(kcxt, KVAR_9, KPARAM_1)))); } : PGconf.EU 2015 - GPGPU Accelerates PostgreSQL  formula expression based on WHERE clause
  • 15. 15 SQL-to-GPU native code generation (3/3) PGconf.EU 2015 - GPGPU Accelerates PostgreSQL SQL Query Dynamically constructed GPU code ------ Per Query Variant Statically constructed GPU code ------ Algorithm Core NVIDIA Run-Time Compiler (nvrtc) Cached GPU Binary + PG-Strom GPU code generator For Execution For Next Time GpuScan, GpuHashJoin, GpuPreAgg, GpuSort, ...
  • 16. 16 PG-Strom functionalities at Sep-2015 ▌Logics  GpuScan ... Simple loop extraction by GPU multithread  GpuHashJoin ... GPU multithread based N-way hash-join  GpuNestLoop ... GPU multithread based N-way nested-loop  GpuPreAgg ... Row reduction prior to CPU aggregation  GpuSort ... GPU bitonic + CPU merge, hybrid sorting ▌Data Types  Numeric ... int2/4/8, float4/8, numeric  Date and Time ... date, time(tz), timestamp(tz), interval  Text ... simple matching only, at this moment  others ... currency ▌Functions  Comparison operator ... <, <=, !=, =, >=, >  Arithmetic operators ... +, -, *, /, %, ...  Mathematical functions ... sqrt, log, exp, ...  Aggregate functions ... min, max, sum, avg, stddev, ... PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 17. 17 Agenda 1. GPU characteristics and why? 2. What intermediates PostgreSQL and GPU 3. How GPU processes SQL workloads 4. Performance results and analysis 5. Further development
  • 18. 18 Asynchronous & Pipelining Execution (1/3) ▌Karma of discrete GPU device  We have to move the data to be processed.  Thus, people say data intensive tasks are not good for GPU.  Correct, but we can reduce the impact.  Our solution: Pipelining PGconf.EU 2015 - GPGPU Accelerates PostgreSQL GPU CPU GPU kernel execution time DMA Send DMA Recv
  • 19. 19 Asynchronous & Pipelining Execution (2/3) PGconf.EU 2015 - GPGPU Accelerates PostgreSQL chunk-(i+1) chunk-(i+2) chunk-i chunk-(i+3) tablescan Move to Next DMA Recv GPU Kernel Exec DMA Send Buffer Read DMA Recv GPU Kernel Exec DMA Send Buffer Read GPU Kernel Exec DMA Send Buffer Read DMA Send Buffer Read Step-(i+4) Step-(i+3) Step-(i+2) Step-(i+1) Step-i
  • 20. 20 Asynchronous & Pipelining Execution (3/3) ▌Karma of discrete GPU device  We have to move the data to be processed.  Thus, people say data intensive tasks are not good for GPU.  Correct, but we can reduce the impact  Our solution: Pipelining ▌Downside of the pipelining  Too small workloads  Pipeline hazard PGconf.EU 2015 - GPGPU Accelerates PostgreSQL GPU CPU GPU kernel execution time DMA Send DMA Recv
  • 21. 21 SQL Logic on GPU (1/3) – GpuHashJoin PGconf.EU 2015 - GPGPU Accelerates PostgreSQL Inner relation Outer relation Inner relation Outer relation Hash table Hash table Next step Next step All CPU does is just references the result of relations join Sequential Hash table search by CPU Sequential Projection by CPU Parallel Projection by GPU Parallel Hash- table search by GPU Built-in Hash-Join GpuHashJoin of PG-Strom
  • 22. 22 Inner Hash table OT: Ununiformly Distributed Hash-Table PGconf.EU 2015 - GPGPU Accelerates PostgreSQL GPU Core-0 GPU Core-1 GPU Core-2 GPU Core-3 GPU Core-4 GPU Core-5 GPU Core-M Outer relation Item[0] Item[1] Item[2] Item[3] Item[4] Item[5] Item[N] Item[M] Item Item Item Item Item Item Item Item Item Item Inner relation Hash Value Hash Function Item[3] Item Item[3] Item Item[3] Item Item[3] Item Item[3] Item Item[3] Item Item[3] Item Item[3] Item Item[3] Item Item[3] Item Overflow of Result Buffer
  • 23. 23 Inner Hash table OT: Ununiformly Distributed Hash-Table PGconf.EU 2015 - GPGPU Accelerates PostgreSQL GPU Core-0 GPU Core-1 GPU Core-2 GPU Core-3 GPU Core-4 GPU Core-5 GPU Core-M Outer relation Item[0] Item[1] Item[2] Item[3] Item[4] Item[5] Item[N] Item[M] Item Item Item Item Item Item Item Item Item Item Inner relation Hash Value Hash Function Item[3] Item Item[3] Item Virtual Partition with Extra Randomness Just fit for Result Buffer x Retry for each virtual partition
  • 24. 24 SQL Logic on GPU (2/3) – GpuNestLoop (unparametalized) Outer-Relation (Nx:usuallylarger) ※splittochunk-by-chunkon demand ● ● ● ● ● ● ● ● ● ● ●●●●●●●Two dimensional GPU kernel launch blockDim.x blockDim.y Ny Nx Thread (X=2, Y=3) Inner-Relation (Ny: relatively small) Only edge thread references DRAM to fetch values. Nx:32 x Ny:32 = 1024 A matrix can be evaluated with only 64 times DRAM accesses PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 25. 25 SQL Logic on GPU (3/3) – GpuPreAgg ▌GpuPreAgg, prior to Aggregate, reduces number of rows to be processed. ▌Query rewrite to make equivalent results from intermediate aggregation. PGconf.EU 2015 - GPGPU Accelerates PostgreSQL GroupAggregate Sort SeqScan Tbl_1 Result Set several rows several millions rows several millions rows Y, count(*), avg(X) X, Y X, Y X, Y, Z (table definition) GroupAggregate Sort GpuScan Tbl_1 Result Set several rows several hundreds rows several millions rows Y, sum(nrows), avg_ex(nrows, psum) Y, nrows, psum_X X, Y X, Y, Z (table definition) GpuPreAgg Y, nrows, psum_X several hundreds rows SELECT count(*), AVG(X), Y FROM Tbl_1 GROUP BY Y; reduction of rows to be processed
  • 26. 26 Agenda 1. GPU characteristics and why? 2. What intermediates PostgreSQL and GPU 3. How GPU processes SQL workloads 4. Performance results and analysis 5. Further development
  • 27. 27 Microbenchmark (1/2) – total response time ▌SELECT cat, AVG(x) FROM t0 NATURAL JOIN t1 [, ...] GROUP BY cat; measurement of query response time with increasing of inner relations  t0: 100M rows, t1~t10: 100K rows for each, all the data was preloaded. ▌Environment:  PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64)  CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA TESLA K20c (2496cores) PGconf.EU 2015 - GPGPU Accelerates PostgreSQL 58.89 82.51 102.15 125.46 159.83 194.50 241.88 12.44 17.88 17.56 21.98 33.33 38.14 50.29 0 50 100 150 200 250 300 2 3 4 5 6 7 8 QueryResponseTime[sec] # of tables involved Query Response Time towards Numer of Tables Joined PostgreSQL 9.5β PG-Strom
  • 28. 28 Microbenchmark (2/2) – Time consumption per component ▌Good acceleration results towards JOIN and AGGREGATION ▌Less acceleration ratio when N is larger than 6 By inaccurate problem size estimation, why? PGconf.EU 2015 - GPGPU Accelerates PostgreSQL 0 50 100 150 200 250 300 PgSQL Strom PgSQL Strom PgSQL Strom PgSQL Strom PgSQL Strom PgSQL Strom PgSQL Strom 2 3 4 5 6 7 8 QueryResponseTime[sec] # of tables involved Time consumption per component (PostgreSQL v9.5β vs PG-Strom) Scan Join Aggregate Others
  • 29. 29 OT: Why accurate problem size estimation is important ▌GpuJoin results must be smaller than GPU RAM size. ▌Virtual inner-partition saves scale of the Join results. Better size estimation reduces number of retries. We can determine the problem size using dynamic parallelism downside: Supported by only Kepler or later generation. PGconf.EU 2015 - GPGPU Accelerates PostgreSQL innerRel (depth-1) innerRel (depth-2) innerRel (depth-3) innerRel (depth-4) outerRel Results × × × × = × × × × = GpuJoin with virtual inner-partition and iteration Larger than GPU RAM Within GPU RAM Size
  • 30. 30 TPC-DS Benchmark (1/2) – Total Results ▌What is TPC-DS?  Successor of TPC-H, defines 103 (99+4) kind of queries  Simulation of decision support queries in retail industry ▌About this benchmark  Run the series of TPC-DS queries sequentially.  We could collect 96 results of 103.  Environment: • PostgreSQL v9.5beta1 + PG-Strom (22-Oct), CUDA 7.0 + RHEL6.6 (x86_64) • CPU: Xeon E5-2670v3, RAM: 384GB, GPU: NVIDIA TESLA K20c (2496cores) PGconf.EU 2015 - GPGPU Accelerates PostgreSQL 0 500 1000 1500 2000 QueryResponseTime[sec] PostgreSQL v9.5β PG-Strom anomaly?
  • 31. 31 TPC-DS Benchmark (2/2) – Total Result ▌Summary  PG-Strom wins to PostgreSQL: 76 of 96  PostgreSQL wins to PG-Strom: 20 of 96  shows advantages on most of simple queries ▌Challenges  Wise plan construction, to avoid GpuXXX node not to be here  Functionality improvement for Aggregation and Scan PGconf.EU 2015 - GPGPU Accelerates PostgreSQL 0 100 200 300 400 500 QueryResponseTime[sec] PostgreSQL v9.5β PG-Strom
  • 32. 32 Time consumption per workloads (1/3) – PostgreSQL v9.5 ▌Scan, Join and Aggregation provides a clear explanation for the time consumption factor in TPC-DS workloads PGconf.EU 2015 - GPGPU Accelerates PostgreSQL 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Time consumption per workloads (PostgreSQL v9.5beta) Scan Join Aggregate Others
  • 33. 33 Time consumption per workloads (2/3) – PG-Strom ▌PG-Strom dramatically reduced time for Joins. ▌Time for Joins are: 37%  17% PGconf.EU 2015 - GPGPU Accelerates PostgreSQL 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% Time consumption per workloads (PostgreSQL v9.5beta) Scan Join Aggregate Others
  • 34. 34 Time consumption per workloads (3/3) – Wow! GpuJoin GpuJoin can explain 84% of performance improvement PGconf.EU 2015 - GPGPU Accelerates PostgreSQL y = 0.962x - 7.289 R² = 0.8442 -50.00 0.00 50.00 100.00 150.00 200.00 250.00 -50.00 0.00 50.00 100.00 150.00 200.00 250.00 ImprovementofTotalTime[sec] (TotaltimeofPgSQL)–(TotaltimeofPG-Strom) Improvement of Join Time [sec] (Join time of PgSQL) – (Join time of PG-Strom)
  • 35. 35 Agenda 1. GPU characteristics and why? 2. What intermediates PostgreSQL and GPU 3. How GPU processes SQL workloads 4. Performance results and analysis 5. Further development
  • 36. 36 What we learn from TPC-DS ▌GpuJoin may be a killer feature of PG-Strom  ....unless problem size estimation is accurate.  Planner needs to be wiser, not to inject GpuJoin towards the situation not suitable. ▌For Aggregation  GpuPreAgg didn’t appear as we expected  Anything GPU can help CPU aggregation? ▌For Scan  Scan performance == data stream bandwidth of GPU input  Single thread is a restriction for bandwidth of data supply PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 37. 37 Features enhancement / improvement ▌Target-list calculation on GPU  Pre-calculation of hash-value for HashAggregate  Projection off-load if complicated mathematical expression ▌Custom-Scan under Gather node  Allows multiple workers to provide data stream to GPUs ▌Partial Aggregation before Join  Wider utilization of GpuPreAgg to reduce number of rows to be joined by CPU ▌Utilization of dynamic parallelism  Wise approach towards result buffer overflow problem in Join ▌... and more... PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 38. 38 Short term development 1. Software quality improvement 2. Corner case handling 3. Documentation 4. Target-List calculation in GPU 5. Software quality improvement PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 39. 39 WIP Project – In-place Computing Why export entire data to run mathematical algorithm? ▌Runs complicated mathematical expression on GPUs, then CPU just references the result  Algorithms in SQL (even partially) automatically moves to GPUs  WIP: Data-Clustering algorithms, like Min-Max, K-means, ... PGconf.EU 2015 - GPGPU Accelerates PostgreSQL mathematical algorithm Things wants to know Things wants to know result result mathematical algorithm Principle of Near-Data Computing Limited computing capability Data Export
  • 40. 40 Future Direction of PG-Strom Real-life workloads are the only compass for development PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 41. 41 WE WANT YOU GET INVOLVED ▌PostgreSQL Users  who concern about query execution performance on your workloads ▌Application Developers  who are interested in trying yet another database acceleration methodology ▌PostgreSQL Developers  who can bet the future of heterogeneous era PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 42. 42 Resources ▌Github  https://github.com/pg-strom/devel  https://github.com/pg-strom/devel/issues ▌Documentation  Under construction.... sorry ▌Today’s slides  Uploaded soon, check #pgconfeu on Tw ▌Author’s contact  e-mail: kaigai@ak.jp.nec.com  Tw: @kkaigai PGconf.EU 2015 - GPGPU Accelerates PostgreSQL
  • 43. 43