Introducing PgOpenCL
A New PostgreSQL Procedural Language
Unlocking the Power of the GPU!
By Tim Child
Bio

Tim Child
• 35 years of experience in software development
• Formerly
  •   VP Oracle Corporation
  •   VP BEA Systems Inc.
  •   VP Informix
  •   Leader at Illustra, Autodesk, Navteq, Intuit, …
• 30+ years of experience in 3D, CAD, GIS and DBMS
Terminology
Term                  Description
Procedural Language   Language for SQL procedures (e.g. PL/pgSQL, Perl, Tcl, Java, …)
GPU                   Graphics Processing Unit (highly specialized processor for graphics)
GPGPU                 General Purpose GPU (non-graphics programming on a GPU)
CUDA                  NVidia’s GPU programming environment
APU                   Accelerated Processing Unit (AMD’s hybrid CPU & GPU chip)
ISO C99               Modern standard version of the C language
OpenCL                Open Computing Language
OpenMP                Open Multi-Processing (parallelizing compilers)
SIMD                  Single Instruction Multiple Data (vector instructions)
SSE                   Streaming SIMD Extensions for x86, x64 (Intel, AMD)
xPU                   Any processing unit device (CPU, GPU, APU)
Kernel                Function that executes on an OpenCL device
Work Item             Instance of a Kernel
Workgroup             A group of Work Items
FLOP                  Floating Point Operation (single precision = SQL real type)
MIC                   Many Integrated Core (Intel’s 50+ x86 core chip architecture)
Some Technology Trends Impacting DBMS
• Solid State Storage
    – Reduced access time, lower power, increasing capacity
• Virtualization
    – Server consolidation, specialized VMs, lower direct costs
• Cloud Computing
    – EC2, Azure, … lower capital requirements
• Multi-Core
    – 2, 4, 6, 8, 12, … cores; lots of benefits for multi-threaded applications
• xPU (GPU/APU)
    – GPU > 1000 cores
    – > 1 TFLOP/s @ €2500
    – APU = CPU + GPU chip hybrids due in mid 2011
    – 2 TFLOP/s for $2.10 per hour (AWS EC2)
    – Intel MIC “Knights Corner” > 50 x86 cores
Compute Intensive xPU Database Applications
•   Bioinformatics
•   Signal/Audio/Image/Video Processing
•   Data Mining & Analytics
•   Searching
•   Sorting
•   Spatial Selections and Joins
•   Map/Reduce
•   Scientific Computing
•   Many Others …
GPU vs CPU
Vendor                       NVidia          ATI Radeon      Intel
Architecture                 Fermi           Evergreen       Nehalem
Cores                        448 (simple)    1600 (simple)   4 (complex)
Transistors                  3.1 B           2.15 B          731 M
Clock                        1.5 GHz         851 MHz         3 GHz
Peak Float Performance       1500 GFLOP/s    2720 GFLOP/s    96 GFLOP/s
Peak Double Performance      750 GFLOP/s     544 GFLOP/s     48 GFLOP/s
Memory Bandwidth             ~190 GB/s       ~153 GB/s       ~30 GB/s
Power Consumption            250 W           > 250 W         80 W
SIMD / Vector Instructions   Many            Many            SSE4+
Multi-Core Performance
[Figure: multi-core performance chart. Source: NVidia]
Future (Mid 2011): APU Based PC
APU (Accelerated Processing Unit)
[Diagram: an APU chip containing CPU cores, a North Bridge (~20 GB/s) and an embedded GPU. Labeled links: ~20 GB/s to System RAM, PCIE ~12 GB/s to a discrete GPU, and 150 GB/s between the discrete GPU and Graphic RAM. Annotation: "APU's: Adds an Embedded GPU". Source: AMD]
Scalar vs. SIMD
Scalar Instruction
      C = A + B                              1 + 2 = 3
SIMD Instruction
      Vector C = Vector A + Vector B         (1, 3, 5, 7) + (2, 4, 6, 8) = (3, 7, 11, 15)
OpenCL
      Vector lengths 2, 4, 8, 16 for char, short, int, float, double
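As a rough illustration of the slide's example, the sketch below contrasts one scalar addition with a four-lane vector addition. It is plain Python that only emulates the semantics: `scalar_add` and `vector_add` are hypothetical helper names, not OpenCL calls, and real SIMD hardware would add all lanes in a single instruction rather than in a loop.

```python
# Emulates the slide's scalar vs. SIMD example in plain Python.
# Hypothetical helper names; a real SIMD unit adds all lanes in one instruction.

def scalar_add(a, b):
    """One instruction, one result."""
    return a + b

def vector_add(a, b):
    """Lane-wise add of two equal-length vectors (e.g. a 4-lane OpenCL vector)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    return tuple(x + y for x, y in zip(a, b))

scalar_result = scalar_add(1, 2)
vector_result = vector_add((1, 3, 5, 7), (2, 4, 6, 8))
print(scalar_result)   # 3
print(vector_result)   # (3, 7, 11, 15)
```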
Summarizing xPU Trends
• Many more xPU Cores in our Future
• Compute Environment becoming Hybrid
  – CPUs and GPUs
  – Need CPU to give access to GPU power
• GPU Capabilities
  – Lots of cores
  – Vector/SIMD Instructions
  – Fast Memory
• GPU Futures
  – Virtual Memory
  – Multi-tasking / Pre-emption
Scaling PostgreSQL Queries on xPUs
[Diagram: a single Postgres process dispatches PgOpenCL threads to a multi-core CPU (a few threads) and to a many-core GPU (many more threads). Caption: "Using More Transistors"]
Parallel Programming Systems
Category             CUDA     OpenMP       OpenCL
Language               C      C, Fortran     C
Cross Platform         X          √           √
Standard             Vendor   OpenMP       Khronos
CPU                    X          √           √
GPU                    √          X           √
Clusters               X          √           X
Compilation / Link   Static     Static     Dynamic
What is OpenCL?
• OpenCL - Open Computing Language
  –   Subset of C99
  –   Open Specification
  –   Proposed by Apple
  –   Many Companies Collaborated on the Specification
  –   Portable, Device Agnostic
  –   Specification maintained by Khronos Group
• PgOpenCL
  – OpenCL as a PostgreSQL Procedural Language
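System Overview
[Diagram: a Web Browser talks HTTP to a Web Server and App Server; the App Server and a PostgreSQL Client each connect over TCP/IP to the DBMS Server. Inside the DBMS Server, PostgreSQL runs SQL Statements that call PgOpenCL SQL Procedures; these reach the GPGPU over a PCIe X2 Bus, and PostgreSQL reads Tables through Disk I/O.]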
OpenCL Language
• A subset of ISO C99
   – But without some C99 features such as standard C99 headers,
     function pointers, recursion, variable length arrays, and bit fields
• A superset of ISO C99 with additions for:
   – Work-items and Workgroups
   – Vector types
   – Synchronization
   – Address space qualifiers
• Also includes a large set of built-in functions
   – Image manipulation
   – Work-item manipulation
   – Specialized math routines, etc.
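To make work-items and workgroups concrete, here is a small sketch that enumerates a one-dimensional index space in plain Python. It assumes a zero global offset, in which case the global ID equals group ID × local size + local ID, and it assumes the OpenCL 1.x rule that the global size is a whole multiple of the workgroup size. `round_up` and `enumerate_work_items` are hypothetical helpers, not OpenCL API calls.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n (used to pad the global size)."""
    return ((n + multiple - 1) // multiple) * multiple

def enumerate_work_items(global_size, local_size):
    """Yield (global_id, group_id, local_id) for a 1-D index space.

    Assumes global_size is a multiple of local_size and a zero global offset.
    """
    if global_size % local_size != 0:
        raise ValueError("global size must be a multiple of the workgroup size")
    for group_id in range(global_size // local_size):
        for local_id in range(local_size):
            yield group_id * local_size + local_id, group_id, local_id

# 100 elements padded up to whole workgroups of 64 work-items.
global_size = round_up(100, 64)
items = list(enumerate_work_items(global_size, 64))
print(global_size)   # 128
print(items[65])     # (65, 1, 1)
```

With padding like this, a kernel normally also needs a guard so that work-items beyond the real element count do nothing.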
PgOpenCL Components
• New PostgreSQL Procedural Language
  – Language handler
     • Maps arguments
     • Calls function
     • Returns results
  – Language validator
     • Creates Function with parameter & syntax checking
     • Compiles Function to a Binary format
• New data types
  – cl_double4, cl_double8, ….
• System Admin Pseudo-Tables
  – Platform, Device, Run-Time, …
PgOpenCL Admin
PgOpenCL Function Declaration
CREATE OR REPLACE FUNCTION VectorAdd(IN a float[], IN b float[], OUT c float[])
AS $BODY$

#pragma PGOPENCL Platform : ATI Stream
#pragma PGOPENCL Device : CPU

// Each work-item adds one pair of elements; workgroups are fixed at 64 work-items.
__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void VectorAdd(__global const float *a, __global const float *b, __global float *c)
{
    int i = get_global_id(0);   // index of this work-item in dimension 0

    c[i] = a[i] + b[i];
}

$BODY$
LANGUAGE PgOpenCL;
PgOpenCL Execution Model
[Diagram: columns A and B of a table are selected into arrays and copied to the xPU, where VectorAdd(A, B) runs as 100’s - 1000’s of threads (kernels) computing A + B = C and returns C. C is copied back and the array is unnested into table rows of C.]
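The data flow on this slide (select table columns into arrays, run the kernel once per element, unnest the result back into rows) can be sketched as follows. This is plain Python standing in for both SQL and the xPU; the function names and the dictionary row format are illustrative assumptions, not part of PgOpenCL.

```python
# Sketch of the slide's data flow: table -> arrays -> kernel -> array -> table.
# Plain Python stands in for SQL and the xPU; all names are illustrative.

def vector_add_kernel(i, a, b, c):
    """Body of one work-item, mirroring c[i] = a[i] + b[i]."""
    c[i] = a[i] + b[i]

def run_vector_add(rows):
    # "Select table to array": gather each column into one array.
    a = [row["a"] for row in rows]
    b = [row["b"] for row in rows]
    c = [0.0] * len(rows)
    # On a device each index would be its own work-item running in parallel;
    # here they simply run one after another.
    for i in range(len(rows)):
        vector_add_kernel(i, a, b, c)
    # "Unnest array to table": one output row per element of c.
    return [{"c": value} for value in c]

table = [{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 4.0}, {"a": 5.0, "b": 6.0}]
result = run_vector_add(table)
print(result)   # [{'c': 3.0}, {'c': 7.0}, {'c': 11.0}]
```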
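Using Re-Shaped Tables
[Diagram: a table of arrays (rows each holding arrays A and B) is copied to the xPU, where VectorAdd(A, B) runs as 100’s - 1000’s of threads (kernels) computing A + B = C and returns C. The results are copied back into a table of arrays, each row holding an array C.]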
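One way to picture the re-shaping is to group a column of scalars into rows that each hold a fixed-size array, then apply the element-wise add row by row. The sketch below does this in plain Python; `reshape_to_arrays`, `add_array_rows` and the chunk size of 4 are assumptions for illustration and say nothing about how PgOpenCL stores or batches data.

```python
def reshape_to_arrays(values, chunk_size):
    """Group a column of scalars into rows of fixed-size arrays.

    The last chunk may be shorter when len(values) is not a multiple of chunk_size.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]

def add_array_rows(a_rows, b_rows):
    """Apply an element-wise add to each pair of array-valued rows."""
    return [[x + y for x, y in zip(a, b)] for a, b in zip(a_rows, b_rows)]

a_rows = reshape_to_arrays([1, 2, 3, 4, 5, 6, 7, 8], 4)
b_rows = reshape_to_arrays([10, 20, 30, 40, 50, 60, 70, 80], 4)
c_rows = add_array_rows(a_rows, b_rows)
print(a_rows)   # [[1, 2, 3, 4], [5, 6, 7, 8]]
print(c_rows)   # [[11, 22, 33, 44], [55, 66, 77, 88]]
```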
Today’s GPGPU Challenges
• No Pre-emptive Multi-Tasking
• No Virtual Memory
• Limited Bandwidth to discrete GPGPU
   – 1 – 8 GB/s over PCIe Bus
• Hard to Program
   – New Parallel Algorithms and constructs
   – “New” C language dialect
• Immature Tools
   – Compilers, IDE, Debuggers, Profilers - early years
• Data organization really matters
   – Types, Structure, and Alignment
   – SQL needs to Shape the Data
• Profiling and Debugging are not easy

Solves Well for Problem Sets with the Right Shape!
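A back-of-the-envelope estimate shows why the PCIe figure matters. The sketch below compares the time to move the data for a vector add against the time to compute it. The element count and the 1 TFLOP/s device rate are assumptions chosen for illustration, the bandwidth values are the two ends of the slide's 1 – 8 range read as GB/s, and the compute estimate is optimistic because it ignores device memory bandwidth.

```python
def transfer_seconds(num_bytes, bandwidth_gb_per_s):
    """Time to move num_bytes over a link, taking 1 GB as 10**9 bytes."""
    return num_bytes / (bandwidth_gb_per_s * 1e9)

def compute_seconds(num_flops, device_gflops):
    """Time to execute num_flops at a sustained rate of device_gflops GFLOP/s."""
    return num_flops / (device_gflops * 1e9)

n = 100_000_000                      # elements per input array (assumed)
bytes_moved = 3 * n * 4              # two float inputs in, one float output back
t_slow = transfer_seconds(bytes_moved, 1.0)   # low end of the slide's range
t_fast = transfer_seconds(bytes_moved, 8.0)   # high end of the slide's range
t_compute = compute_seconds(n, 1000.0)        # one add per element, assumed 1 TFLOP/s

print(t_slow, t_fast, t_compute)
print(t_fast > t_compute)            # True: transfer dominates a plain vector add
```

Under these assumptions even the fastest transfer takes far longer than the arithmetic, so a kernel that does only one operation per element is a poor fit for a discrete GPGPU.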
Making a Problem Work for You
        • Determine % Parallelism Possible
            for ( i = 0; i < ∞; i++ )
                for ( j = 0; j < ∞; j++ )
                    for ( k = 0; k < ∞; k++ )
        • Arrange data to fit available GPU RAM
        • Ensure calculation time >> I/O transfer overhead
        • Learn about Parallel Algorithms and the OpenCL language
        • Learn new tools
        • Carefully choose Data Types, Organization and Alignments
        • Profile and Measure at Every Stage
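One common way to turn "% parallelism possible" into an expected gain is Amdahl's law, which the slides do not name but which fits this step. The sketch below computes the upper bound on speedup for a given parallel fraction; the function name and the choice of 1000 workers are assumptions for illustration.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

for p in (0.5, 0.9, 0.99):
    print(p, round(amdahl_speedup(p, 1000), 1))
```

Even with 1000 workers, a job that is 90% parallel cannot run more than about ten times faster, which is why the serial portion deserves attention before any porting work starts.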
PgOpenCL System Requirements
• PostgreSQL 9.x
• For GPUs
   – AMD ATI OpenCL Stream SDK 2.x
   – NVidia CUDA 3.x SDK
   – Recent Macs with Mac OS X 10.6
• For CPUs (Pentium M or more recent)
   – AMD ATI OpenCL Stream SDK 2.x
   – Intel OpenCL SDK Alpha Release (x86)
   – Recent Macs with Mac OS X 10.6
PgOpenCL Status
• Today (2010): Prototype
• 1Q 2011: Beta
• Wish List
       • Beta Testers
              – Existing OpenCL App?
              – Have a GPU App?
       • Contributors
              – Code server side functions?
       • Sponsors & Supporters
              – AMD Fusion Fund?
              – Khronos?
PgOpenCL Future Plans
• Increase Platform Support
• Scatter/Gather Functions
• Additional Type Support
   – Image Types
   – Sparse Matrices
• Run-Time
   –   Asynchronous
   –   Events
   –   Profiling
   –   Debugging
Using the Whole Brain
[Diagram: an APU chip in which Postgres runs on one CPU core while PgOpenCL threads run on the other CPU cores and on the embedded GPU, joined by a North Bridge at ~20 GB/s.]
"You can’t be in a parallel universe with a single brain!"
• Heterogeneous Compute Environments
    • CPUs, GPUs, APUs
    • Expect 100s – 1000s of cores
The Future Is Parallel: What's a Programmer to Do?
Summarizing PgOpenCL
• Supports Heterogeneous Parallel Compute Environments
    • CPUs, GPUs, APUs
• OpenCL
    • Portable and high-performance framework
        – Ideal for computationally intensive algorithms
        – Access to all compute resources (CPU, APU, GPU)
        – Well-defined computation/memory model
    • Efficient parallel programming language
        – C99 with extensions for task and data parallelism
        – Rich set of built-in functions
    • Open standard for heterogeneous parallel computing
• PgOpenCL
    • Integrates PostgreSQL with OpenCL
    • Provides Easy SQL Access to xPUs
        • APU, CPU, GPGPU
    • Integrates OpenCL
        • SQL + Web Apps (PHP, Ruby, … )
More Information
• PgOpenCL
    • Twitter @3DMashUp
• OpenCL
    • www.khronos.org/opencl/
    • www.amd.com/us/products/technologies/stream-technology/opencl/
    • http://software.intel.com/en-us/articles/intel-opencl-sdk
    • http://www.nvidia.com/object/cuda_opencl_new.html
    • http://developer.apple.com/technologies/mac/snowleopard/opencl.html
Q&A

• Using Parallel Applications?
• Benefits of OpenCL / PgOpenCL?
• Want to Collaborate on PgOpenCL?

Más contenido relacionado

La actualidad más candente

20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStromKohei KaiGai
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda enKohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storageKohei KaiGai
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...Kohei KaiGai
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersKazuaki Ishizaki
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwJan Holčapek
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...Equnix Business Solutions
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseKazuaki Ishizaki
 
GPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsGPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsAkihiro Hayashi
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.J On The Beach
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Danielle Womboldt
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - EnglishKohei KaiGai
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_IndexKohei KaiGai
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaKazuaki Ishizaki
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdwKohei KaiGai
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)Kohei KaiGai
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDASavith Satheesh
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...Equnix Business Solutions
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGaiKohei KaiGai
 

La actualidad más candente (20)

20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom20150318-SFPUG-Meetup-PGStrom
20150318-SFPUG-Meetup-PGStrom
 
pgconfasia2016 plcuda en
pgconfasia2016 plcuda enpgconfasia2016 plcuda en
pgconfasia2016 plcuda en
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage20170602_OSSummit_an_intelligent_storage
20170602_OSSummit_an_intelligent_storage
 
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
GPU/SSD Accelerates PostgreSQL - challenge towards query processing throughpu...
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java Programmers
 
Let's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdwLet's turn your PostgreSQL into columnar store with cstore_fdw
Let's turn your PostgreSQL into columnar store with cstore_fdw
 
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
PGConf.ASIA 2019 Bali - AppOS: PostgreSQL Extension for Scalable File I/O - K...
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to Use
 
GPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU PlatformsGPUIterator: Bridging the Gap between Chapel and GPU Platforms
GPUIterator: Bridging the Gap between Chapel and GPU Platforms
 
Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.Using GPUs to handle Big Data with Java by Adam Roberts.
Using GPUs to handle Big Data with Java by Adam Roberts.
 
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
Ceph Day Beijing - Optimizing Ceph Performance by Leveraging Intel Optane and...
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index20210301_PGconf_Online_GPU_PostGIS_GiST_Index
20210301_PGconf_Online_GPU_PostGIS_GiST_Index
 
Transparent GPU Exploitation for Java
Transparent GPU Exploitation for JavaTransparent GPU Exploitation for Java
Transparent GPU Exploitation for Java
 
20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw20171206 PGconf.ASIA LT gstore_fdw
20171206 PGconf.ASIA LT gstore_fdw
 
PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)PG-Strom v2.0 Technical Brief (17-Apr-2018)
PG-Strom v2.0 Technical Brief (17-Apr-2018)
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
PGConf.ASIA 2019 Bali - Building PostgreSQL as a Service with Kubernetes - Ta...
 
20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai20190909_PGconf.ASIA_KaiGai
20190909_PGconf.ASIA_KaiGai
 

Destacado

TPL Dataflow – зачем и для кого?
TPL Dataflow – зачем и для кого?TPL Dataflow – зачем и для кого?
TPL Dataflow – зачем и для кого?GoSharp
 
Task Parallel Library 2014
Task Parallel Library 2014Task Parallel Library 2014
Task Parallel Library 2014Lluis Franco
 
An Intelligent Storage?
An Intelligent Storage?An Intelligent Storage?
An Intelligent Storage?Kohei KaiGai
 
20170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#820170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#8Kohei KaiGai
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsKohei KaiGai
 
Convolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoConvolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoSeongwon Hwang
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
 

Destacado (8)

Task Parallel Library (TPL)
Task Parallel Library (TPL)Task Parallel Library (TPL)
Task Parallel Library (TPL)
 
TPL Dataflow – зачем и для кого?
TPL Dataflow – зачем и для кого?TPL Dataflow – зачем и для кого?
TPL Dataflow – зачем и для кого?
 
Task Parallel Library 2014
Task Parallel Library 2014Task Parallel Library 2014
Task Parallel Library 2014
 
An Intelligent Storage?
An Intelligent Storage?An Intelligent Storage?
An Intelligent Storage?
 
20170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#820170127 JAWS HPC-UG#8
20170127 JAWS HPC-UG#8
 
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database AnalyticsPL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
PL/CUDA - Fusion of HPC Grade Power with In-Database Analytics
 
Convolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in TheanoConvolutional Neural Network (CNN) presentation from theory to code in Theano
Convolutional Neural Network (CNN) presentation from theory to code in Theano
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 

Similar a PostgreSQL with OpenCL

Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science Domino Data Lab
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAFacultad de Informática UCM
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
fpga1 - What is.pptx
fpga1 - What is.pptxfpga1 - What is.pptx
fpga1 - What is.pptxssuser0de10a
 
Machine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUsMachine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUsAmazon Web Services
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapUtilizing AMD GPUs: Tuning, programming models, and roadmap
Utilizing AMD GPUs: Tuning, programming models, and roadmapGeorge Markomanolis
 
LCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLCU13: GPGPU on ARM Experience Report
LCU13: GPGPU on ARM Experience ReportLinaro
 
Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrjRoberto Brandao
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018NVIDIA
 
Case Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded ProcessorsCase Study: Porting Qt for Embedded Linux on Embedded Processors
Case Study: Porting Qt for Embedded Linux on Embedded Processorsaccount inactive
 
Computação acelerada – a era das ap us roberto brandão, ciência
Computação acelerada – a era das ap us   roberto brandão,  ciênciaComputação acelerada – a era das ap us   roberto brandão,  ciência
Computação acelerada – a era das ap us roberto brandão, ciênciaCampus Party Brasil
 
GPU enablement for data science on OpenShift | DevNation Tech Talk
GPU enablement for data science on OpenShift | DevNation Tech TalkGPU enablement for data science on OpenShift | DevNation Tech Talk
GPU enablement for data science on OpenShift | DevNation Tech TalkRed Hat Developers
 
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the CoupledCpu-GPU ArchitectureRevisiting Co-Processing for Hash Joins on the CoupledCpu-GPU Architecture
Revisiting Co-Processing for Hash Joins on the Coupled Cpu-GPU Architecturemohamedragabslideshare
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementGanesan Narayanasamy
 
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」
PCCC23:筑波大学計算科学研究センター テーマ1「スーパーコンピュータCygnus / Pegasus」PC Cluster Consortium
 

Similar a PostgreSQL with OpenCL (20)

Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
GPU Computing for Data Science
GPU Computing for Data Science GPU Computing for Data Science
GPU Computing for Data Science
 
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGAMaking the most out of Heterogeneous Chips with CPU, GPU and FPGA
Making the most out of Heterogeneous Chips with CPU, GPU and FPGA
 
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
fpga1 - What is.pptx
fpga1 - What is.pptxfpga1 - What is.pptx
fpga1 - What is.pptx
 
Machine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUsMachine Learning Developers - Know your GPUs
Machine Learning Developers - Know your GPUs
 
Utilizing AMD GPUs: Tuning, programming models, and roadmap
PostgreSQL with OpenCL

  • 1. Introducing PgOpenCL: A New PostgreSQL Procedural Language. Unlocking the Power of the GPU! By Tim Child
  • 2. Bio: Tim Child
      • 35 years of experience in software development
      • Formerly
          – VP, Oracle Corporation
          – VP, BEA Systems Inc.
          – VP, Informix
          – Leader at Illustra, Autodesk, Navteq, Intuit, …
      • 30+ years of experience in 3D, CAD, GIS and DBMS
  • 3. Terminology
      Term                 Description
      Procedural Language  Language for SQL procedures (e.g. PL/pgSQL, Perl, Tcl, Java, …)
      GPU                  Graphics Processing Unit (highly specialized processor for graphics)
      GPGPU                General Purpose GPU (non-graphics programming on a GPU)
      CUDA                 Nvidia’s GPU programming environment
      APU                  Accelerated Processing Unit (AMD’s hybrid CPU & GPU chip)
      ISO C99              Modern standard version of the C language
      OpenCL               Open Computing Language
      OpenMP               Open Multi-Processing (parallelizing compilers)
      SIMD                 Single Instruction Multiple Data (vector instructions)
      SSE                  x86, x64 (Intel, AMD) Streaming SIMD Extensions
      xPU                  Any Processing Unit device (CPU, GPU, APU)
      Kernel               Function that executes on an OpenCL device
      Work Item            Instance of a kernel
      Workgroup            A group of work items
      FLOP                 Floating Point Operation (single = SQL real type)
      MIC                  Many Integrated Core (Intel’s 50+ x86 core chip architecture)
  • 4. Some Technology Trends Impacting DBMS
      • Solid State Storage
          – Reduced access time, lower power, increasing capacity
      • Virtualization
          – Server consolidation, specialized VMs, lower direct costs
      • Cloud Computing
          – EC2, Azure, …: lower capital requirements
      • Multi-Core
          – 2, 4, 6, 8, 12, … cores; lots of benefits to multi-threaded applications
      • xPU (GPU/APU)
          – GPU > 1000 cores
          – > 1 TFLOP/s @ €2500
          – APU = CPU + GPU chip hybrids due in mid 2011
          – 2 TFLOP/s for $2.10 per hour (AWS EC2)
          – Intel MIC “Knights Corner” > 50 x86 cores
  • 5. Compute Intensive xPU Database Applications
      • Bioinformatics
      • Signal/Audio/Image/Video Processing
      • Data Mining & Analytics
      • Searching
      • Sorting
      • Spatial Selections and Joins
      • Map/Reduce
      • Scientific Computing
      • Many Others …
  • 6. GPU vs CPU
                                  GPU             GPU             CPU
      Vendor                      NVidia          ATI Radeon      Intel
      Architecture                Fermi           Evergreen       Nehalem
      Cores                       448 (simple)    1600 (simple)   4 (complex)
      Transistors                 3.1 B           2.15 B          731 M
      Clock                       1.5 GHz         851 MHz         3 GHz
      Peak Float Performance      1500 GFLOP/s    2720 GFLOP/s    96 GFLOP/s
      Peak Double Performance     750 GFLOP/s     544 GFLOP/s     48 GFLOP/s
      Memory Bandwidth            ~190 GB/s       ~153 GB/s       ~30 GB/s
      Power Consumption           250 W           > 250 W         80 W
      SIMD / Vector Instructions  Many            Many            SSE4+
  • 8. Future (Mid 2011) APU Based PC
      [Diagram, source: AMD. Labels: APU (Accelerated Processing Unit) chip with CPU cores and an embedded GPU; North Bridge; System RAM (~20 GB/s); PCIe (~12 GB/s); discrete GPU with graphics RAM (150 GB/s). Callout: APUs add an embedded GPU.]
  • 9. Scalar vs. SIMD
      • Scalar instruction: C = A + B, e.g. 1 + 2 = 3
      • SIMD instruction: Vector C = Vector A + Vector B, e.g. (1, 3, 5, 7) + (2, 4, 6, 8) = (3, 7, 11, 15)
      • OpenCL vector lengths: 2, 4, 8, 16 for char, short, int, float, double
  • 10. Summarizing xPU Trends
      • Many more xPU cores in our future
      • Compute environment becoming hybrid
          – CPUs and GPUs
          – Need the CPU to give access to GPU power
      • GPU Capabilities
          – Lots of cores
          – Vector/SIMD instructions
          – Fast memory
      • GPU Futures
          – Virtual memory
          – Multi-tasking / pre-emption
  • 11. Scaling PostgreSQL Queries on xPU’s
      [Diagram: a single Postgres process fanning out to PgOpenCL threads on a multi-core CPU and to many more PgOpenCL threads on a many-core GPU. Caption: Using More Transistors.]
  • 12. Parallel Programming Systems
      Category            CUDA      OpenMP      OpenCL
      Language            C         C, Fortran  C
      Cross Platform      X         √           √
      Standard            Vendor    OpenMP      Khronos
      CPU                 X         √           √
      GPU                 √         X           √
      Clusters            X         √           X
      Compilation / Link  Static    Static      Dynamic
  • 13. What is OpenCL?
      • OpenCL: Open Computing Language
          – Subset of C99
          – Open specification
          – Proposed by Apple
          – Many companies collaborated on the specification
          – Portable, device agnostic
          – Specification maintained by the Khronos Group
      • PgOpenCL
          – OpenCL as a PostgreSQL Procedural Language
  • 14. System Overview
      [Diagram: a web browser talks HTTP to a web server, and app servers and PostgreSQL clients connect over TCP/IP; each sends SQL statements to the PostgreSQL DBMS server. On the server, PgOpenCL SQL procedures reach the GPGPU over the PCIe x2 bus, and tables are read through disk I/O.]
  • 15. OpenCL Language
      • A subset of ISO C99
          – But without some C99 features such as standard C99 headers, function pointers, recursion, variable length arrays, and bit fields
      • A superset of ISO C99 with additions for:
          – Work-items and workgroups
          – Vector types
          – Synchronization
          – Address space qualifiers
      • Also includes a large set of built-in functions
          – Image manipulation
          – Work-item manipulation
          – Specialized math routines, etc.
  • 16. PgOpenCL Components
      • New PostgreSQL Procedural Language
          – Language handler
              • Maps arguments
              • Calls function
              • Returns results
          – Language validator
              • Creates function with parameter & syntax checking
              • Compiles function to a binary format
      • New data types
          – cl_double4, cl_double8, …
      • System Admin Pseudo-Tables
          – Platform, Device, Run-Time, …
  • 18. PgOpenCL Function Declaration
      CREATE OR REPLACE FUNCTION VectorAdd(IN a float[], IN b float[], OUT c float[])
      AS $BODY$
      #pragma PGOPENCL Platform : ATI Stream
      #pragma PGOPENCL Device : CPU
      __kernel __attribute__((reqd_work_group_size(64, 1, 1)))
      void VectorAdd(__global const float *a, __global const float *b, __global float *c)
      {
          int i = get_global_id(0);
          c[i] = a[i] + b[i];
      }
      $BODY$
      LANGUAGE PgOpenCL;
  • 19. PgOpenCL Execution Model
      [Diagram: tables A and B are selected into arrays and copied to the xPU, where 100’s - 1000’s of threads (kernels) run VectorAdd(A, B) to compute A + B = C; the returned array C is copied back and unnested into table C.]
  • 20. Using Re-Shaped Tables
      [Diagram: a table of arrays (A, B) is copied to the xPU, where 100’s - 1000’s of threads (kernels) run VectorAdd(A, B) to compute A + B = C; the results are copied back into a table of arrays (C).]
  • 21. Today’s GPGPU Challenges
      • No pre-emptive multi-tasking
      • No virtual memory
      • Limited bandwidth to discrete GPGPU
          – 1–8 GB/s over the PCIe bus
      • Hard to program
          – New parallel algorithms and constructs
          – “New” C language dialect
      • Immature tools
          – Compilers, IDEs, debuggers, profilers are in their early years
      • Data organization really matters
          – Types, structure, and alignment
          – SQL needs to shape the data
      • Profiling and debugging are not easy
      Solves well for problem sets with the right shape!
  • 22. Making a Problem Work for You
      • Determine % parallelism possible
          for ( i = 0; i < ∞; i++ )
            for ( j = 0; j < ∞; j++ )
              for ( k = 0; k < ∞; k++ )
      • Arrange data to fit available GPU RAM
      • Ensure calculation time >> I/O transfer overhead
      • Learn about parallel algorithms and the OpenCL language
      • Learn new tools
      • Carefully choose data types, organization and alignments
      • Profile and measure at every stage
  • 23. PgOpenCL System Requirements
      • PostgreSQL 9.x
      • For GPUs
          – AMD ATI OpenCL Stream SDK 2.x
          – NVidia CUDA 3.x SDK
          – Recent Macs with Mac OS X 10.6
      • For CPUs (Pentium M or more recent)
          – AMD ATI OpenCL Stream SDK 2.x
          – Intel OpenCL SDK Alpha Release (x86)
          – Recent Macs with Mac OS X 10.6
  • 24. PgOpenCL Status
      [Timeline 2010–2011: Prototype (today), Beta (1Q 2011).]
      • Wish List
          – Beta testers
              • Existing OpenCL app?
              • Have a GPU app?
          – Contributors
              • Code server-side functions?
          – Sponsors & supporters
              • AMD Fusion Fund?
              • Khronos?
  • 25. PgOpenCL Future Plans
      • Increase platform support
      • Scatter/gather functions
      • Additional type support
          – Image types
          – Sparse matrices
      • Run-time
          – Asynchronous
          – Events
          – Profiling
          – Debugging
  • 26. Using the Whole Brain
      [Diagram: an APU chip in which Postgres runs on one CPU core while PgOpenCL runs on the other CPU cores and on the embedded GPU; North Bridge at ~20 GB/s.]
      You can’t be in a parallel universe with a single brain!
      • Heterogeneous compute environments
      • CPUs, GPUs, APUs
      • Expect 100’s – 1000’s of cores
      The Future Is Parallel: What's a Programmer to Do?
  • 27. Summarizing PgOpenCL
      • Supports heterogeneous parallel compute environments
          – CPUs, GPUs, APUs
      • OpenCL
          – Portable and high-performance framework
              • Ideal for computationally intensive algorithms
              • Access to all compute resources (CPU, APU, GPU)
              • Well-defined computation/memory model
          – Efficient parallel programming language
              • C99 with extensions for task and data parallelism
              • Rich set of built-in functions
          – Open standard for heterogeneous parallel computing
      • PgOpenCL
          – Integrates PostgreSQL with OpenCL
          – Provides easy SQL access to xPUs
              • APU, CPU, GPGPU
          – Integrates OpenCL
              • SQL + Web Apps (PHP, Ruby, …)
  • 28. More Information
      • PgOpenCL
          – Twitter @3DMashUp
      • OpenCL
          – www.khronos.org/opencl/
          – www.amd.com/us/products/technologies/stream-technology/opencl/
          – http://software.intel.com/en-us/articles/intel-opencl-sdk
          – http://www.nvidia.com/object/cuda_opencl_new.html
          – http://developer.apple.com/technologies/mac/snowleopard/opencl.html
  • 29. Q&A
      • Using parallel applications?
      • Benefits of OpenCL / PgOpenCL?
      • Want to collaborate on PgOpenCL?
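
The VectorAdd kernel on slide 18 gives each work-item one array element, selected by get_global_id(0). The sketch below emulates that model in plain C on the host, with no OpenCL runtime involved. It is an illustration only, not PgOpenCL code: the function names vector_add_work_item and vector_add_emulated are hypothetical, and the serial loop stands in for the parallel scheduling a real device would perform.

```c
#include <assert.h>
#include <stddef.h>

/* Body of one work-item. Mirrors the slide 18 kernel, except that the
   global id is passed in explicitly instead of being read with
   get_global_id(0). Hypothetical helper, not part of PgOpenCL. */
void vector_add_work_item(size_t gid, const float *a, const float *b, float *c)
{
    c[gid] = a[gid] + b[gid];
}

/* Host-side stand-in for launching the kernel over a 1-D range of n
   work-items. An OpenCL device would run these instances in parallel;
   this emulation runs them one after another. */
void vector_add_emulated(const float *a, const float *b, float *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t gid = 0; gid < n; ++gid) {
        vector_add_work_item(gid, a, b, c);
    }
}
```

Calling vector_add_emulated with the slide 9 vectors (1, 3, 5, 7) and (2, 4, 6, 8) and n = 4 fills c with (3, 7, 11, 15). Because no work-item reads another work-item's output, the iterations are independent, which is what lets a device run them concurrently.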