SlideShare una empresa de Scribd logo
1 de 15
CONFLUX: GPGPU FOR .NET Eugene Burmako, 2010
Videocards: state of the art Equipment – tenth/hundreds of ALU clocked at ~1 GHz  Peak performance – 1 SP TFLOPS, > 100 DP GFLOPS API – random memory access, data structures, pointers, subroutines API maturity – nearly four years, several generations of graphics processors
Videocards: programmer’s PoV Modern GPU programming models (CUDA, AMD Stream, OpenCL, DirectCompute): Parallel algorithm is defined by the pair: 1) kernel (loop iteration), 2) iteration bounds. Kernel is compiled by the driver. Iteration bounds are used to create grid of threads. Input data is copied to video memory. Execution gets kicked off. Result is copied to main memory.
Example: SAXPY via CUDA __global__ void Saxpy(float a, float* X, float* Y)  { inti = blockDim.x * blockIdx.x + threadIdx.x;       Y[i] = a * X[i] + Y[i];  } cudaMemcpy(X, hX, cudaMemcpyHostToDevice); cudaMemcpy(Y, hY, cudaMemcpyHostToDevice); Saxpy<<<256, (N + 255) / 256>>>(a, hX, hY); cudaMemcpy(hY, Y, cudaMemcpyDeviceToHost);
Hot question
Official answer
In fact Brahma: Data structures: data parallel array. Computations: C# expressions, LINQ combinators. Accelerator v2: Data structures: data parallel array. Computations: arithmetic operators, number of predefined functions. This does the trick for a lot of algorithms. But what if we’ve got branching or non-regular memory access?
Example: CUDA  interop saxpy = @”__global__ void Saxpy(float a, float* X, float* Y)  { inti = blockDim.x * blockIdx.x + threadIdx.x;       Y[i] = a * X[i] + Y[i];  }”; nvcuda.cuModuleLoadDataEx(saxpy); nvcuda.cuMemcpyHtoD(X, Y); nvcuda.cuParamSeti(a, X, Y); nvcuda.cuLaunchGrid(256, (N + 255) / 256); nvcuda.cuMemcpyDtoH(Y);
Conflux Kernels are written in C#: data structures, local variables, branching, loops float a; float[] x; [Result] float[] y; vari = GlobalIdx.X; y[i] = a * x[i] + y[i];
Conflux Avoids explicit interop with unmanaged code, lets programmer use native .NET data types. float[] x, y; varcfg = new CudaConfig(); var kernel = cfg.Configure<Saxpy>(); y = kernel.Execute(a, x, y);
How does it work? Front end: decompiles C#. AST transformer: inlines calls, destructures classes and arrays, maps intrinsincs. Back end:generates PTX (NVIDIA GPU assembler) and/or multicoreIL. Interop: binds to nvcuda driver that is capable of executing GPU assembler.
Current progress http://bitbucket.org/conflux/conflux Proof of concept. Capable of computing hello-world of parallel computations: matrix multiplication. If we don’t take into account [currently]high overhead incurred by JIT-compilation, the idea works finely even for naïve code generator: 1x CPU < 2x CPU << GPU. Triple license: AGPL, exception for OSS projects, commercial.
Demo
Future work GPU-specific optimizations (e.g. diagonal stripes for optimizing bandwidth utilization of matrix transposition) Polyhedral model for loop nest optimization (can be configured to fit specific levels and sizes of memory hierarchy, there exist GPU-specific linear heuristics that optimize spatial and temporal locality). Distributed execution (a new level of memory hierarchy if we use polyhedral model).
Conclusion Conflux: GPGPU for .NET http://bitbucket.org/conflux/conflux eugene.burmako@confluxhpc.net

Más contenido relacionado

La actualidad más candente

General Programming on the GPU - Confoo
General Programming on the GPU - ConfooGeneral Programming on the GPU - Confoo
General Programming on the GPU - ConfooSirKetchup
 
Efficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCLEfficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCLJonas Traub
 
Nicety of java 8 multithreading for advanced, Max Voronoy
Nicety of java 8 multithreading for advanced, Max VoronoyNicety of java 8 multithreading for advanced, Max Voronoy
Nicety of java 8 multithreading for advanced, Max VoronoySigma Software
 
GPU Programming on CPU - Using C++AMP
GPU Programming on CPU - Using C++AMPGPU Programming on CPU - Using C++AMP
GPU Programming on CPU - Using C++AMPMiller Lee
 
Multilayer Neuronal network hardware implementation
Multilayer Neuronal network hardware implementation Multilayer Neuronal network hardware implementation
Multilayer Neuronal network hardware implementation Nabil Chouba
 
C++ amp on linux
C++ amp on linuxC++ amp on linux
C++ amp on linuxMiller Lee
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexesDaniel Lemire
 
On Mining Bitcoins - Fundamentals & Outlooks
On Mining Bitcoins - Fundamentals & OutlooksOn Mining Bitcoins - Fundamentals & Outlooks
On Mining Bitcoins - Fundamentals & OutlooksFilip Maertens
 
Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010Dirkjan Bussink
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Daniel Lemire
 
Fast Wavelet Tree Construction in Practice
Fast Wavelet Tree Construction in PracticeFast Wavelet Tree Construction in Practice
Fast Wavelet Tree Construction in PracticeRakuten Group, Inc.
 
2013 0928 programming by cuda
2013 0928 programming by cuda2013 0928 programming by cuda
2013 0928 programming by cuda小明 王
 
Cocos2d Performance Tips
Cocos2d Performance TipsCocos2d Performance Tips
Cocos2d Performance TipsKeisuke Hata
 
WebAssembly向け多倍長演算の実装
WebAssembly向け多倍長演算の実装WebAssembly向け多倍長演算の実装
WebAssembly向け多倍長演算の実装MITSUNARI Shigeo
 
Fast indexes with roaring #gomtl-10
Fast indexes with roaring #gomtl-10 Fast indexes with roaring #gomtl-10
Fast indexes with roaring #gomtl-10 Daniel Lemire
 
TensorFlow Studying Part II for GPU
TensorFlow Studying Part II for GPUTensorFlow Studying Part II for GPU
TensorFlow Studying Part II for GPUTe-Yen Liu
 

La actualidad más candente (20)

General Programming on the GPU - Confoo
General Programming on the GPU - ConfooGeneral Programming on the GPU - Confoo
General Programming on the GPU - Confoo
 
C# Assignmet Help
C# Assignmet HelpC# Assignmet Help
C# Assignmet Help
 
Efficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCLEfficient SIMD Vectorization for Hashing in OpenCL
Efficient SIMD Vectorization for Hashing in OpenCL
 
Nicety of java 8 multithreading for advanced, Max Voronoy
Nicety of java 8 multithreading for advanced, Max VoronoyNicety of java 8 multithreading for advanced, Max Voronoy
Nicety of java 8 multithreading for advanced, Max Voronoy
 
GPU Programming on CPU - Using C++AMP
GPU Programming on CPU - Using C++AMPGPU Programming on CPU - Using C++AMP
GPU Programming on CPU - Using C++AMP
 
Multilayer Neuronal network hardware implementation
Multilayer Neuronal network hardware implementation Multilayer Neuronal network hardware implementation
Multilayer Neuronal network hardware implementation
 
C++ amp on linux
C++ amp on linuxC++ amp on linux
C++ amp on linux
 
Engineering fast indexes
Engineering fast indexesEngineering fast indexes
Engineering fast indexes
 
On Mining Bitcoins - Fundamentals & Outlooks
On Mining Bitcoins - Fundamentals & OutlooksOn Mining Bitcoins - Fundamentals & Outlooks
On Mining Bitcoins - Fundamentals & Outlooks
 
Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010Rubinius @ RubyAndRails2010
Rubinius @ RubyAndRails2010
 
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)Next Generation Indexes For Big Data Engineering (ODSC East 2018)
Next Generation Indexes For Big Data Engineering (ODSC East 2018)
 
Multi qubit entanglement
Multi qubit entanglementMulti qubit entanglement
Multi qubit entanglement
 
Fast Wavelet Tree Construction in Practice
Fast Wavelet Tree Construction in PracticeFast Wavelet Tree Construction in Practice
Fast Wavelet Tree Construction in Practice
 
AA-sort with SSE4.1
AA-sort with SSE4.1AA-sort with SSE4.1
AA-sort with SSE4.1
 
2013 0928 programming by cuda
2013 0928 programming by cuda2013 0928 programming by cuda
2013 0928 programming by cuda
 
Cocos2d Performance Tips
Cocos2d Performance TipsCocos2d Performance Tips
Cocos2d Performance Tips
 
My bitmap
My bitmapMy bitmap
My bitmap
 
WebAssembly向け多倍長演算の実装
WebAssembly向け多倍長演算の実装WebAssembly向け多倍長演算の実装
WebAssembly向け多倍長演算の実装
 
Fast indexes with roaring #gomtl-10
Fast indexes with roaring #gomtl-10 Fast indexes with roaring #gomtl-10
Fast indexes with roaring #gomtl-10
 
TensorFlow Studying Part II for GPU
TensorFlow Studying Part II for GPUTensorFlow Studying Part II for GPU
TensorFlow Studying Part II for GPU
 

Destacado

Sellers Presentation
Sellers Presentation Sellers Presentation
Sellers Presentation DavidOConnor
 
techKNOW leadership for JefCoEd Tech Camp 7.11.13
techKNOW leadership for JefCoEd Tech Camp 7.11.13techKNOW leadership for JefCoEd Tech Camp 7.11.13
techKNOW leadership for JefCoEd Tech Camp 7.11.13mwilson518
 
Explara publisher partner program
Explara publisher partner programExplara publisher partner program
Explara publisher partner programSantosh Panda
 

Destacado (6)

Fossils
FossilsFossils
Fossils
 
Sellers Presentation
Sellers Presentation Sellers Presentation
Sellers Presentation
 
Sattose2013 mega
Sattose2013 megaSattose2013 mega
Sattose2013 mega
 
techKNOW leadership for JefCoEd Tech Camp 7.11.13
techKNOW leadership for JefCoEd Tech Camp 7.11.13techKNOW leadership for JefCoEd Tech Camp 7.11.13
techKNOW leadership for JefCoEd Tech Camp 7.11.13
 
Cheap Tricks
Cheap TricksCheap Tricks
Cheap Tricks
 
Explara publisher partner program
Explara publisher partner programExplara publisher partner program
Explara publisher partner program
 

Similar a Conflux:gpgpu for .net (en)

Intro2 Cuda Moayad
Intro2 Cuda MoayadIntro2 Cuda Moayad
Intro2 Cuda MoayadMoayadhn
 
Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Angela Mendoza M.
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
CUDA Deep Dive
CUDA Deep DiveCUDA Deep Dive
CUDA Deep Divekrasul
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeOfer Rosenberg
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDARaymond Tay
 
Cuda introduction
Cuda introductionCuda introduction
Cuda introductionHanibei
 
An Introduction to CUDA-OpenCL - University.pptx
An Introduction to CUDA-OpenCL - University.pptxAn Introduction to CUDA-OpenCL - University.pptx
An Introduction to CUDA-OpenCL - University.pptxAnirudhGarg35
 
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaFerdinand Jamitzky
 
Lecture 6 Kernel Debugging + Ports Development
Lecture 6 Kernel Debugging + Ports DevelopmentLecture 6 Kernel Debugging + Ports Development
Lecture 6 Kernel Debugging + Ports DevelopmentMohammed Farrag
 
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...mouhouioui
 
Threaded Programming
Threaded ProgrammingThreaded Programming
Threaded ProgrammingSri Prasanna
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsnARUNACHALAM468781
 

Similar a Conflux:gpgpu for .net (en) (20)

Intro2 Cuda Moayad
Intro2 Cuda MoayadIntro2 Cuda Moayad
Intro2 Cuda Moayad
 
Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
CUDA Deep Dive
CUDA Deep DiveCUDA Deep Dive
CUDA Deep Dive
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Newbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universeNewbie’s guide to_the_gpgpu_universe
Newbie’s guide to_the_gpgpu_universe
 
Slide tesi
Slide tesiSlide tesi
Slide tesi
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
 
Cuda introduction
Cuda introductionCuda introduction
Cuda introduction
 
An Introduction to CUDA-OpenCL - University.pptx
An Introduction to CUDA-OpenCL - University.pptxAn Introduction to CUDA-OpenCL - University.pptx
An Introduction to CUDA-OpenCL - University.pptx
 
Gpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cudaGpu workshop cluster universe: scripting cuda
Gpu workshop cluster universe: scripting cuda
 
Programming languages
Programming languagesProgramming languages
Programming languages
 
There is more to C
There is more to CThere is more to C
There is more to C
 
Lecture 6 Kernel Debugging + Ports Development
Lecture 6 Kernel Debugging + Ports DevelopmentLecture 6 Kernel Debugging + Ports Development
Lecture 6 Kernel Debugging + Ports Development
 
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
Etude éducatif sur les GPUs & CPUs et les architectures paralleles -Programmi...
 
Threaded Programming
Threaded ProgrammingThreaded Programming
Threaded Programming
 
gpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsngpuprogram_lecture,architecture_designsn
gpuprogram_lecture,architecture_designsn
 
GPU: Understanding CUDA
GPU: Understanding CUDAGPU: Understanding CUDA
GPU: Understanding CUDA
 

Último

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Último (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Conflux:gpgpu for .net (en)

  • 1. CONFLUX: GPGPU FOR .NET Eugene Burmako, 2010
  • 2. Videocards: state of the art Equipment – tenth/hundreds of ALU clocked at ~1 GHz Peak performance – 1 SP TFLOPS, > 100 DP GFLOPS API – random memory access, data structures, pointers, subroutines API maturity – nearly four years, several generations of graphics processors
  • 3. Videocards: programmer’s PoV Modern GPU programming models (CUDA, AMD Stream, OpenCL, DirectCompute): Parallel algorithm is defined by the pair: 1) kernel (loop iteration), 2) iteration bounds. Kernel is compiled by the driver. Iteration bounds are used to create grid of threads. Input data is copied to video memory. Execution gets kicked off. Result is copied to main memory.
  • 4. Example: SAXPY via CUDA __global__ void Saxpy(float a, float* X, float* Y) { inti = blockDim.x * blockIdx.x + threadIdx.x; Y[i] = a * X[i] + Y[i]; } cudaMemcpy(X, hX, cudaMemcpyHostToDevice); cudaMemcpy(Y, hY, cudaMemcpyHostToDevice); Saxpy<<<256, (N + 255) / 256>>>(a, hX, hY); cudaMemcpy(hY, Y, cudaMemcpyDeviceToHost);
  • 7. In fact Brahma: Data structures: data parallel array. Computations: C# expressions, LINQ combinators. Accelerator v2: Data structures: data parallel array. Computations: arithmetic operators, number of predefined functions. This does the trick for a lot of algorithms. But what if we’ve got branching or non-regular memory access?
  • 8. Example: CUDA interop saxpy = @”__global__ void Saxpy(float a, float* X, float* Y) { inti = blockDim.x * blockIdx.x + threadIdx.x; Y[i] = a * X[i] + Y[i]; }”; nvcuda.cuModuleLoadDataEx(saxpy); nvcuda.cuMemcpyHtoD(X, Y); nvcuda.cuParamSeti(a, X, Y); nvcuda.cuLaunchGrid(256, (N + 255) / 256); nvcuda.cuMemcpyDtoH(Y);
  • 9. Conflux Kernels are written in C#: data structures, local variables, branching, loops float a; float[] x; [Result] float[] y; vari = GlobalIdx.X; y[i] = a * x[i] + y[i];
  • 10. Conflux Avoids explicit interop with unmanaged code, lets programmer use native .NET data types. float[] x, y; varcfg = new CudaConfig(); var kernel = cfg.Configure<Saxpy>(); y = kernel.Execute(a, x, y);
  • 11. How does it work? Front end: decompiles C#. AST transformer: inlines calls, destructures classes and arrays, maps intrinsincs. Back end:generates PTX (NVIDIA GPU assembler) and/or multicoreIL. Interop: binds to nvcuda driver that is capable of executing GPU assembler.
  • 12. Current progress http://bitbucket.org/conflux/conflux Proof of concept. Capable of computing hello-world of parallel computations: matrix multiplication. If we don’t take into account [currently]high overhead incurred by JIT-compilation, the idea works finely even for naïve code generator: 1x CPU < 2x CPU << GPU. Triple license: AGPL, exception for OSS projects, commercial.
  • 13. Demo
  • 14. Future work GPU-specific optimizations (e.g. diagonal stripes for optimizing bandwidth utilization of matrix transposition) Polyhedral model for loop nest optimization (can be configured to fit specific levels and sizes of memory hierarchy, there exist GPU-specific linear heuristics that optimize spatial and temporal locality). Distributed execution (a new level of memory hierarchy if we use polyhedral model).
  • 15. Conclusion Conflux: GPGPU for .NET http://bitbucket.org/conflux/conflux eugene.burmako@confluxhpc.net