GPU Compute in Medical
and Print Imaging
Amey Deosthali
Director, Embedded Imaging
Medical Imaging Trends
SYSTEM OPTIMIZATION AND MINIATURIZATION
• High-end systems of yesterday becoming portables of today
INCREASED USE OF 3D/4D IMAGING
• Advances in visualization and increased use of 3D/4D imaging for improved diagnosis
INTEGRATION OF MODALITIES & ADVANCED FEATURES
• Endoscopic ultrasound, augmented reality, robotic endoscopy
INCREASED SYSTEM COST PRESSURES
• Expanding emerging markets, regulatory pressures, increased competition
Print Imaging Trends
Traditional Multi-Function Printer Architecture
GPU Compute based Multi-Function Printer Architecture
SoC with GPU
SCALABLE SOFTWARE
SCALABLE ARCHITECTURE
SYSTEM COST SAVINGS
GPU Compute and AMD APU
GPU Compute in Imaging
• Medical and print imaging workloads are well suited to GPU compute
• AMD APUs integrate a GPU with support for Heterogeneous System Architecture (HSA)
• HSA can deliver significant benefits in imaging
GPU COMPUTE IN
MEDICAL IMAGING
Typical Ultrasound Imaging Pipeline
Transducer, Transmitter, Receiver, Beamforming, IQ Demodulation
Echo Processing: Filters (edge enhancement, speckle reduction), Log Compression, Envelope Detection, Frame Averaging, 2D Image Formation, Frequency/Time Compounding
Color Flow Processing: Color Flow Analysis, Velocity Estimation, Wall Filter, Spatial Doppler
Scan Conversion
Legend: GPU Friendly
FASTER SCANS
• Evolution in algorithm complexity with GPU
• Reconstruct whole image plane
IMPROVED IMAGE QUALITY
ACCESS TO RAW DATA
• Fast data transfer and efficient use of system memory
SIMPLIFIED ARCHITECTURE
• Scalable software-defined architecture
GPU Compute for SW Beamforming
Bridge
Convert JESD-204b
to PCIe
JESD-204b
64-256 I/O Channels
Image Formation
Plane Wave Imaging
• FK Stolts with optimized
FFT/iFFT
• IQ Demodulation and
Log Compression
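As a rough illustration of the IQ demodulation and log compression steps named above, the sketch below mixes one RF line down to baseband, low-pass filters it with a moving average, and maps the envelope onto a normalized log scale. It is plain Python for readability, not the OpenCL implementation the slides refer to. The function names, the moving-average filter, and the 60 dB dynamic range are assumptions chosen for illustration.

```python
import math


def iq_demodulate(rf, fc, fs, taps):
    """Mix an RF line to baseband, then low-pass with a moving average.

    rf: list of RF samples, fc: carrier frequency (Hz), fs: sample rate (Hz),
    taps: moving-average length (assumed filter, chosen for simplicity).
    """
    i_mix = [s * math.cos(2 * math.pi * fc * n / fs) for n, s in enumerate(rf)]
    q_mix = [-s * math.sin(2 * math.pi * fc * n / fs) for n, s in enumerate(rf)]

    def lowpass(x):
        return [sum(x[k:k + taps]) / taps for k in range(len(x) - taps + 1)]

    return lowpass(i_mix), lowpass(q_mix)


def envelope(i, q):
    """Envelope detection: magnitude of the complex baseband signal."""
    # Factor 2 restores the amplitude halved by the mixing step.
    return [2.0 * math.hypot(a, b) for a, b in zip(i, q)]


def log_compress(env, dynamic_range_db=60.0):
    """Map the envelope to [0, 1] on a dB scale relative to its peak."""
    peak = max(env)
    if peak <= 0.0:
        return [0.0 for _ in env]
    out = []
    for e in env:
        db = 20.0 * math.log10(max(e, 1e-12) / peak)
        out.append(max(0.0, 1.0 + db / dynamic_range_db))
    return out
```

Each output sample depends only on a small window of input samples, which is why these stages map naturally onto one GPU work-item per sample.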
Image Post Processing
Separable Filters
• Sobel and Box filters
Non-separable Filter
• Laplacian of Gaussian
De-speckle Filter
• Median filter
Frequency Domain Filter
• Gaussian blur and Edge
Enhancement filters
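The distinction between separable and non-separable filters matters for cost: a separable 2D kernel can be applied as a horizontal 1D pass followed by a vertical 1D pass. The sketch below shows this for an unnormalized box filter and checks it against the direct 2D sum. It is a plain-Python illustration with hypothetical function names and truncated-window border handling, not code from the presentation.

```python
def box_1d(row, radius=1):
    """Sum each sample with its neighbours within `radius` (window truncated at edges)."""
    n = len(row)
    return [sum(row[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def transpose(img):
    return [list(col) for col in zip(*img)]


def box_separable(img, radius=1):
    """Box filter as two 1D passes: rows first, then columns."""
    rows_done = [box_1d(r, radius) for r in img]
    cols_done = [box_1d(c, radius) for c in transpose(rows_done)]
    return transpose(cols_done)


def box_direct(img, radius=1):
    """Reference: direct 2D window sum, (2*radius+1)^2 reads per pixel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += img[yy][xx]
            out[y][x] = total
    return out
```

For a k x k kernel the separable form needs about 2k reads per pixel instead of k squared. The Sobel kernel factors the same way (a smoothing vector times a difference vector), whereas a median filter has no such factorization.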
Gen 3 PCIe® x16
dGMA support for 10+ GB/s
GPU
coherent compounding
GPU + CPU
post processing
SW Beamforming on AMD APU
Transpose
1D
FFT
Z Shift &
Transpose
1D
IFFT
FK interpolation
1D
IFFT
Acquisition
Device
iGPU or dGPU
Software Beamformer
Direct GMA (> 10 GB/s)
RF Data
1D
FFT
X Shift &
Transpose
Transpose
OpenCL™ implementation of FK Stolts algorithm
SW Beamformer Performance (endnote 1)
Configuration                APU       dGPU
256 channels, 2048 samples   1.95 ms   0.47 ms
128 channels, 2048 samples   1.15 ms   0.29 ms
Processed Output
5x5
Median
Filter
Speckle Noise Reduction
Down
Sample by 2
Subtract
Multiply
With
Coefficients
Up-sample
by 2
Gamma
Correction
Down-
Sample by 2
Up-Sample
by 2
Sub
Gamma
Correction
Down
Sample by 2
Sobel Diffusion
Gamma
Correction
Pixel
Correction
IQ Demodulation
Output
Speckle Reduction
Output
Speckle Noise Reduction Optimization
• Combine multiple functions
into single kernel
• Get more compute per byte of global
memory access
• Reduce kernel launch delay overheads
• Reduce use of temporary
buffers and buffer copies
• Reduce CPU bottlenecks that
require blocking calls by
moving operations to GPU
• Optimize pipeline with “in
order” enqueue of OpenCL
commands
Block
A
Block
B
Block
C
Block
E
Block
D
Block A & B
(Multiple
OpenCL
kernels)
Block C & D
(Multiple
OpenCL
kernels)
Block E
(Multiple
OpenCL
kernels)
CPU Path
(4.10 ms)
GPU Path (endnote 2)
(1.01 ms)
Downsample
+ memcpy
Downsample
+ Optimized
memcpy
Color conversion, edge detection, diffusion,
normalization, gamma correction, image enhancement
Code Migration and Optimization Process
1. Profile: identify target workloads to convert
2. Convert: move target workloads from CPU to GPU
3. Block optimization: combine multiple CPU calls into a single OpenCL kernel
4. Buffer optimization: reduce use of temporary buffers and buffer copies
5. Pipeline optimization: move low-workload CPU operations to the GPU to reduce blocking calls
6. Reduce kernel launch delay: “in order” enqueue of OpenCL commands
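Step 1 of the process above can be sketched as a small timing harness that runs each pipeline stage in turn and ranks stages by their share of total time. This is a minimal illustration, not AMD's tooling: `profile_stages`, `rank_by_share`, and the three stand-in stages are hypothetical names, and a real pipeline would substitute its own functions and a proper profiler.

```python
import time


def profile_stages(stages, data, repeats=5):
    """Return {stage name: best wall-clock seconds} for a chain of stages."""
    timings = {}
    for name, fn in stages:
        best = float("inf")
        result = data
        for _ in range(repeats):
            start = time.perf_counter()
            result = fn(data)
            best = min(best, time.perf_counter() - start)
        timings[name] = best
        data = result  # feed each stage's output into the next stage
    return timings


def rank_by_share(timings):
    """Sort stages by fraction of total time, largest first."""
    total = sum(timings.values())
    if total == 0.0:
        return [(name, 0.0) for name in timings]
    return sorted(((name, t / total) for name, t in timings.items()),
                  key=lambda item: item[1], reverse=True)


# Hypothetical stand-in stages, used only to exercise the harness.
def to_float(pixels):
    return [float(p) for p in pixels]


def smooth(pixels):
    n = len(pixels)
    return [sum(pixels[max(0, i - 2):min(n, i + 3)]) /
            (min(n, i + 3) - max(0, i - 2)) for i in range(n)]


def normalize(pixels):
    lo, hi = min(pixels), max(pixels)
    span = (hi - lo) or 1.0
    return [(p - lo) / span for p in pixels]


STAGES = [("to_float", to_float), ("smooth", smooth), ("normalize", normalize)]
```

Calling `rank_by_share(profile_stages(STAGES, list(range(10000))))` returns the stages ordered by time share. The stages at the top of that list are the natural candidates for step 2.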
Sobel Filter Optimization
8-bit Grayscale
Image
(1920x1080)
Median
Filter IPP
8 to 32-bit
Float
Sobel &
Sobel
Magnitude
Max & Min
6.51ms
19.47ms
Migrate Sobel filter to GPU
with OpenCL
A:
B:
8-bit Grayscale
Image
(1920x1080)
Median
Filter IPP
8 to 32-bit
Float
Sobel &
Sobel
Magnitude
Max & Min
CPU
Optimized
Modules
GPU
Optimized
Modules
OpenCL Optimized
2X faster computation time with
migration of a single module to GPU (endnote 3)
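For reference, the "8 to 32-bit Float", "Sobel & Sobel Magnitude", and "Max & Min" stages can be sketched as follows. This is a plain-Python illustration with assumed function names, not the OpenCL kernel that was measured. It uses the common 3x3 Sobel kernels for brevity, whereas endnote 3 states that the measured filter was 5x5, and it leaves border pixels at zero.

```python
import math

# Standard 3x3 Sobel kernels (assumption: the measured filter was 5x5).
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Convert 8-bit pixels to float and return the gradient magnitude image."""
    h, w = len(image), len(image[0])
    img = [[float(p) for p in row] for row in image]  # 8-bit to float
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for ky in range(3):
                for kx in range(3):
                    p = img[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * p
                    gy += SOBEL_Y[ky][kx] * p
            mag[y][x] = math.hypot(gx, gy)
    return mag


def min_max(mag):
    """Reduction used to normalize the magnitude image afterwards."""
    flat = [v for row in mag for v in row]
    return min(flat), max(flat)
```

The per-pixel loop body is independent across pixels, so it maps to one work-item per pixel. The min/max step is a reduction and needs a different parallel pattern.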
GPU COMPUTE IN
PRINT IMAGING
Print and Scan Image Pipeline
Accelerated RIP Pipeline
Open source Ghostscript PostScript
renderer accelerated using GPU (endnote 4)
AMD G-Series Reference Board
Ubuntu 14.04 Linux OS
KMD GFX Driver
OCL CodeGLSL Libraries
C
Libraries
OCL 2.0
Runtime
OGL 4.3
Runtime
Software Stack
PDF Files
on Disk
Bitmap
File
on
RAMdisk
PDL Interpreter
Element
Decompose
Generate Glyph
Bitmaps
Bitmap
Ghostscript App
Planarize
GPU
Raster
GPU
Color Conversion
GPU
DMA
DMA
OpenCL
GL Shader
Language
(GLSL)
CPU Operating in Host Memory GPU Operating in Device Memory
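Two of the GPU stages in the diagram, Planarize and Color Conversion, are simple per-pixel transforms. The sketch below shows what such stages typically compute. It is an assumption-laden illustration: the slides do not state the color spaces involved, so a textbook RGB to CMYK formula is used here, and the function names are hypothetical rather than taken from Ghostscript.

```python
def planarize(pixels):
    """Split interleaved (R, G, B) tuples into three separate planes."""
    r, g, b = zip(*pixels)
    return list(r), list(g), list(b)


def rgb_to_cmyk(r, g, b):
    """Textbook RGB (0..255) to CMYK (0..1) conversion, no color management."""
    c, m, y = 1 - r / 255, 1 - g / 255, 1 - b / 255
    k = min(c, m, y)
    if k == 1.0:  # pure black: avoid dividing by zero
        return 0.0, 0.0, 0.0, 1.0
    scale = 1 - k
    return (c - k) / scale, (m - k) / scale, (y - k) / scale, k
```

Both functions touch each pixel independently, which is the property that makes these stages good candidates for GPU offload. A production RIP would normally use ICC-based color management instead of this formula.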
GPU compute can deliver large increase
in PPM performance (endnote 4)
RIP Pipeline acceleration: PPM performance
PPM - Test case 2 @600 dpi
Device   Legacy code (no GPU accl)   GPU accelerated code   Speedup
GX-412   101.8                       244.3                  2.4x
GX-424   164                         370                    2.3x
PPM - Test case 2 @1200 dpi
Device   Legacy code (no GPU accl)   GPU accelerated code   Speedup
GX-412   27.6                        76.6                   2.8x
GX-424   44                          111                    2.5x
PPM: Pages per Minute performance of
Ghostscript RIP pipeline
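The speedup labels on the PPM charts can be reproduced from the bar values. The short sketch below does that arithmetic and also converts PPM into a per-page time budget. Pairing each legacy value with each GPU value is an inference from the chart legend order. It is consistent with the printed 2.4x, 2.3x, 2.8x, and 2.5x labels, but the slides do not state the pairing explicitly.

```python
# (dpi, device) -> (legacy PPM, GPU-accelerated PPM); pairing inferred from the chart.
PPM = {
    (600, "GX-412"): (101.8, 244.3),
    (600, "GX-424"): (164.0, 370.0),
    (1200, "GX-412"): (27.6, 76.6),
    (1200, "GX-424"): (44.0, 111.0),
}


def speedup(legacy_ppm, gpu_ppm):
    """Throughput ratio of the GPU-accelerated path over the legacy path."""
    return gpu_ppm / legacy_ppm


def seconds_per_page(ppm):
    """Average time budget per page at a given pages-per-minute rate."""
    return 60.0 / ppm


SPEEDUPS = {key: round(speedup(*pair), 1) for key, pair in PPM.items()}
```

`seconds_per_page` is a convenient way to compare RIP throughput against a print engine's rated speed.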
GPU compute can free up CPU
for other value added tasks (endnote 4)
CPU Load: Average load across all 4 CPU
cores of G-series devices under test
RIP Pipeline acceleration: CPU Load Reduction
[Chart: Average CPU Load - Test case 2 @ 600 DPI*. X axis: PPM (30 to 150). Y axis: % CPU load (avg), 0 to 60. Series: legacy code (no GPU accl) and GPU accelerated code, each for GX-424 and GX-412.]
[Chart: Average CPU Load - Test case 2 @ 1200 DPI*. X axis: PPM (5 to 40). Y axis: % CPU load (avg), 0 to 80. Series: legacy code (no GPU accl) and GPU accelerated code, each for GX-424 and GX-412.]
Optical Character Recognition: Tesseract Project
Accelerated using GPU
Tesseract Flow
Optical Character Recognition (OCR) Project
• Tesseract: open source Optical Character Recognition (OCR) engine
GPU Compute for OCR
• Most of the image preprocessing and character recognition is GPU friendly
• The data structures in the word recognition phase are not very GPU friendly
Expected Future Improvements
• Deep Neural Network (DNN) for character recognition
Optical Character Recognition: Demo Performance
Processing time measured for the above modules with CPU processing and GPU accelerated processing (endnote 5)
Configuration           AMD APU 95W (time in seconds)   AMD APU 35W (time in seconds)
Non OpenCL (CPU only)   23.65                           46.2
OpenCL (GPU Compute)    16.79                           36.3
Gain                    41%                             27%
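The "Gain" row is reproduced by treating gain as throughput increase (CPU time divided by GPU time, minus one), not as the fraction of time saved. The sketch below computes both so the two readings are not confused. The function names are illustrative and the interpretation is inferred from the numbers, since the slide does not define "Gain".

```python
def speedup_gain(cpu_seconds, gpu_seconds):
    """Throughput gain: how much more work per unit time the GPU path does."""
    return cpu_seconds / gpu_seconds - 1.0


def time_reduction(cpu_seconds, gpu_seconds):
    """Fraction of the original processing time that is saved."""
    return 1.0 - gpu_seconds / cpu_seconds


GAIN_95W = round(100 * speedup_gain(23.65, 16.79))   # matches the 41% on the slide
GAIN_35W = round(100 * speedup_gain(46.2, 36.3))     # matches the 27% on the slide
SAVED_95W = round(100 * time_reduction(23.65, 16.79))
SAVED_35W = round(100 * time_reduction(46.2, 36.3))
```

Read as time saved, the same measurements correspond to roughly 29% and 21%.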
Core Scan Processing Algorithms
• AMD worked with customer to accelerate partial scan pipeline using OpenCL on AMD APU
and GPU
• Scan pipeline includes several image processing algorithms such as grayscale conversion,
edge detection, rotation, color conversion etc.
• GPU compute can deliver significant improvement in processing time compared to CPU based
processing (endnote 6)
– Translates to faster scan time and higher scan ppm
Iterative algorithm optimization on AMD APU (execution time)
Algorithm            CPU Optimized*   OpenCL Optimized   OpenCL Optimized Fused Code
Grayscale            13.5 ms          4.6 ms (2.9x)      -
Median               25.6 ms          3.1 ms (8.3x)      -
Grayscale + Median   39.1 ms          7.9 ms (5.0x)      5.9 ms (6.6x)
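The "fused code" column refers to combining the grayscale and median steps into one kernel so that the intermediate grayscale image never has to be written to and re-read from memory. The sketch below illustrates the idea in plain Python. The integer BT.601 grayscale weights, the 3x3 window, the clamped borders, and the function names are assumptions, since the customer code is proprietary and not shown in the slides.

```python
def to_gray(pixel):
    """Integer approximation of BT.601 luma (assumed weights)."""
    r, g, b = pixel
    return (299 * r + 587 * g + 114 * b) // 1000


def _clamp(v, lo, hi):
    return max(lo, min(hi, v))


def _median3x3(sample, x, y, w, h):
    """Median of the 3x3 neighbourhood, reading pixels through `sample(x, y)`."""
    window = [sample(_clamp(x + dx, 0, w - 1), _clamp(y + dy, 0, h - 1))
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sorted(window)[4]


def gray_then_median(rgb):
    """Unfused: two passes with a full intermediate grayscale buffer."""
    h, w = len(rgb), len(rgb[0])
    gray = [[to_gray(p) for p in row] for row in rgb]  # temporary buffer

    def read_gray(x, y):
        return gray[y][x]

    return [[_median3x3(read_gray, x, y, w, h) for x in range(w)] for y in range(h)]


def fused_gray_median(rgb):
    """Fused: one pass, grayscale computed on the fly inside the median window."""
    h, w = len(rgb), len(rgb[0])

    def read_gray(x, y):
        return to_gray(rgb[y][x])

    return [[_median3x3(read_gray, x, y, w, h) for x in range(w)] for y in range(h)]
```

Both functions return identical results. The fused form trades some redundant arithmetic for the removal of a buffer write and read, which is usually the better trade on a GPU where global memory bandwidth is the scarcer resource.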
Color
Conversion
Partial scan pipeline acceleration
Document Detect and
Alignment correction
Quality
Improvement
7 8
CONCLUSION
The Future is bright with GPU Compute
Improve quality of human
care with improved accuracy
Empower new experiences with
next generation technology
Enhance performance while
reducing system cost
Endnotes
1. Testing by AMD performance labs. Measured performance of OpenCL™ implementation of FK Stolts algorithm on AMD APU and AMD FirePro GPU.
System Configuration: AMD Lamar development board with Windows® 10, AMD RX427BB 35W APU, 2.7/3.6 GHz, 2133 MHz DDR3, 8GB RAM. Discrete GPU: AMD FirePro™ W9100 GPU, 275W, 5.2 TFLOPS SP, 16GB GDDR5, 512-bit memory interface, Windows 10. Driver version 15.200.1045-150622a
2. Testing by AMD performance labs. Measured performance of Speckle Noise Reduction pipeline with and without GPU acceleration, multi-threaded CPU compiler option. Image size: 768 x 252, active ROI was 712 x 252.
System Configuration: AMD Lamar development board with Windows® 10, AMD RX427BB 35W APU, 2.7/3.6 GHz, 2133 MHz DDR3, 8GB RAM. Discrete GPU: AMD FirePro™ W9100 GPU, 275W, 5.2 TFLOPS SP, 16GB GDDR5, 512-bit memory interface, Windows 10. Driver version 16.20-160405a-301215E
3. Testing by AMD performance labs. Measured performance of Sobel Filter with and without GPU acceleration. 8.2 Multi Threaded Library. Image resolution: 1920x1080. Sobel filter size: 5x5
System Configuration: Advantech ComE board with Windows 7 64-bit, AMD RX425BB, 35W, 2.5/3.4 GHz, 1866 MHz DDR3, 4GB RAM, AMD driver version: 14.502.1001.1001, OpenCL 1.2
4. Testing by AMD performance labs. Measured performance of Raster Image Processing with and without GPU acceleration.
System Configuration: AMD GX-424CC: 25W, 2.4 GHz, 1866 MHz DDR3, 8GB RAM, AMD GX-412HC: 7W, 1.2 GHz, 1333 MHz DDR3, 8 GB RAM. Ubuntu 14.04 with AMD Catalyst Driver 14.301.1001
5. Testing by AMD performance labs. Measured performance of Optical Character Recognition using Tesseract open source code with and without GPU acceleration.
System Configuration: AMD APU 95W: AMD A10-7850K APU with Radeon™ HD Graphics, 3.7/4.0 GHz, AMD APU 35W: AMD A10-7400P APU with Radeon™ HD Graphics, 2.7/3.6 GHz. Windows® 8.1, OpenCL™ 1.2, version 1084.4
6. Testing by AMD performance labs. Measured scan pipeline performance using proprietary customer code with and without GPU acceleration.
System Configuration: AMD Olive Hill+ development board, AMD RX427BB: 25W, 2.7 GHz, 1600 MHz DDR3, 8GB RAM, Windows 8.1, AMD Catalyst 14.29 drivers and OpenCL™ 1.2
7. Testing by AMD performance labs. Measured performance of partial scan pipeline using proprietary customer code.
System Configuration: AMD Olive Hill+ development board with AMD RX427BB: 35W, 2.7 GHz, 1600 MHz DDR3, 8GB RAM Ubuntu 14.04 and AMD Catalyst driver 14.29
8. Testing by AMD performance labs. Measured performance of partial scan pipeline using proprietary customer code.
System Configuration: 2015 MacBook Pro with Intel Core i7-4980HQ 2.8 GHz, 16 GB DDR3L RAM. AMD Radeon™ R9 M370X Graphics, 2GB GDDR5, Mac OS X 10.10.3. AMD Catalyst 14.29
Disclaimer
The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has
been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under
no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with
respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied
warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware,
software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted
by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between
the parties or in AMD's Standard Terms and Conditions of Sale.
AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into
the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product
could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to
discontinue or make changes to its products at any time without notice.
AMD does not provide a license/sublicense to any intellectual property rights relating to any standards, including but not limited to any
audio and/or video codec technologies such as AVC/H.264/MPEG-4, AVC, VC-1, MPEG-2, and DivX/xVid.
AMD, the AMD Arrow logo, AMD Catalyst, AMD CrossFire, AMD CrossFireX, AMD Radeon, ATI Radeon, and combinations thereof are
trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be
trademarks of their respective companies.
Windows and DirectX are registered trademarks of Microsoft Corporation. ARM is a registered trademark of ARM Limited. 3DMark is a
trademark of Futuremark Corporation. DivX is a registered trademark of DivX, Inc. HDMI is a trademark of HDMI Licensing, LLC. Linux is a
registered trademark of Linus Torvalds. OpenCL is a trademark of Apple Inc. used by permission of Khronos. PCIe and PCI Express are
registered trademarks of PCI-SIG Corporation.
© 2016 Advanced Micro Devices, Inc. All rights reserved.
THANK YOU

MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

GPU Compute in Medical and Print Imaging

  • 1. GPU Compute in Medical and Print Imaging. Amey Deosthali, Director, Embedded Imaging
  • 2. Medical Imaging Trends. SYSTEM OPTIMIZATION AND MINIATURIZATION: high-end systems of yesterday becoming portables of today. INCREASED USE OF 3D/4D IMAGING: advances in visualization and increased use of 3D/4D imaging for improved diagnosis. INTEGRATION OF MODALITIES & ADVANCED FEATURES: endoscopic ultrasound, augmented reality, robotic endoscopy. INCREASED SYSTEM COST PRESSURES: expanding emerging markets, regulatory pressures, increased competition.
  • 3. Print Imaging Trends. Traditional Multi-Function Printer Architecture; GPU Compute based Multi-Function Printer Architecture: SoC with GPU. SCALABLE SOFTWARE; SCALABLE ARCHITECTURE; SYSTEM COST SAVINGS.
  • 4. GPU Compute and AMD APU. GPU Compute in Imaging: Medical and Print Imaging workloads are well suited for GPU compute. AMD APUs integrate GPU with support for Heterogeneous System Architecture (HSA). HSA architecture can deliver significant benefits in the field of Imaging.
  • 6. Typical Ultrasound Imaging Pipeline Transmitter Receiver Beamforming IQ Demodulation Filters - Edge enhancement - Speckle Reduction Log Compression Envelope Detection Frame Averaging 2D Image formation Frequency/Time Compounding Color flow analysis Velocity Estimation Wall Filter Spatial Doppler Scan Conversion Echo Processing Color Flow Processing Transducer GPU Friendly
  • 7. GPU Compute for SW Beamforming. FASTER SCANS: evolution in algorithm complexity with GPU; reconstruct whole image plane. IMPROVED IMAGE QUALITY. ACCESS TO RAW DATA: fast data transfer and efficient use of system memory. SIMPLIFIED ARCHITECTURE: scalable SW defined architecture. Bridge: convert JESD-204b to PCIe (JESD-204b, 64-256 I/O channels). Image Formation: Plane Wave Imaging (FK Stolts with optimized FFT/iFFT; IQ Demodulation and Log Compression). Image Post Processing: Separable Filters (Sobel and Box filters); Non-separable Filter (Laplacian of Gaussian); De-speckle Filter (Median filter); Frequency Domain Filter (Gaussian blur and Edge Enhancement filters). Gen 3 PCIe® x16; dGMA support for 10+ GBps; GPU coherent compounding; GPU + CPU post processing.
  • 8. SW Beamforming on AMD APU. OpenCL™ implementation of FK Stolts algorithm. Diagram labels: Acquisition Device; RF Data; Direct GMA (> 10 GB/s); Software Beamformer (iGPU or dGPU); Processed Output; 5x5 Median Filter. Beamformer blocks: Transpose, 1D FFT, Z Shift & Transpose, 1D IFFT, FK interpolation, 1D IFFT, 1D FFT, X Shift & Transpose, Transpose.
      SW Beamformer Performance (endnote 1):
      - 256 Channel, 2048 Samples: APU 1.95 ms; dGPU 0.47 ms
      - 128 Channel, 2048 Samples: APU 1.15 ms; dGPU 0.29 ms
  • 9. Speckle Noise Reduction (block diagram labels): Down Sample by 2, Subtract, Multiply With Coefficients, Up-sample by 2, Gamma Correction, Down-Sample by 2, Up-Sample by 2, Sub, Gamma Correction, Down Sample by 2, Sobel, Diffusion, Gamma Correction, Pixel Correction, IQ Demodulation Output, Speckle Reduction Output
  • 10. Speckle Noise Reduction Optimization
      - Combine multiple functions into single kernel
      - Get more compute per byte of global memory access
      - Reduce kernel launch delay overheads
      - Reduce use of temporary buffers and buffer copies
      - Reduce CPU bottlenecks that require blocking calls by moving operations to GPU
      - Optimize pipeline with “in order” enqueue of OpenCL commands
      Diagram labels: Block A, Block B, Block C, Block E, Block D; Block A & B (Multiple OpenCL kernels); Block C & D (Multiple OpenCL kernels); Block E (Multiple OpenCL kernels). CPU Path (4.10 ms); GPU Path (1.01 ms, endnote 2). Downsample + memcpy; Downsample + Optimized memcpy. Color conversion, edge detection, diffusion, normalization, gamma correction, image enhancement.
  • 11. Code Migration and Optimization Process
      1. Profile: identify target workloads to convert
      2. Convert: target workloads from CPU to GPU
      3. Block Optimization: combine multiple CPU calls to a single OpenCL kernel
      4. Buffer Optimization: reduce use of temporary buffers and buffer copies
      5. Pipeline Optimization: move low workload CPU operations to GPU to reduce blocking calls
      6. Reduce kernel launch delay: “in order” enqueue of OpenCL commands
  • 12. Sobel Filter Optimization. Migrate Sobel filter to GPU with OpenCL. Pipelines A and B both show: 8-bit Grayscale Image (1920x1080), Median Filter IPP 8 to 32-bit Float, Sobel & Sobel Magnitude, Max & Min. Timings shown: 6.51 ms, 19.47 ms. Legend: CPU Optimized Modules; GPU Optimized Modules (OpenCL Optimized). 2X faster computation time with migration of single module to GPU (endnote 3).
  • 14. Print and Scan Image Pipeline
  • 15. Accelerated RIP Pipeline. Open source Ghostscript PostScript renderer accelerated using GPU (endnote 4). Platform: AMD G-Series Reference Board, Ubuntu 14.04 Linux OS. Software Stack: KMD GFX Driver, OCL Code, GLSL Libraries, C Libraries, OCL 2.0 Runtime, OGL 4.3 Runtime. Pipeline labels: PDF Files on Disk, Bitmap File on RAMdisk, PDL Interpreter, Element Decompose, Generate Glyph Bitmaps, Bitmap, Ghostscript App, Planarize, GPU Raster, GPU Color Conversion, GPU DMA. Legend: DMA; OpenCL; GL Shader Language (GLSL); CPU Operating in Host Memory; GPU Operating in Device Memory.
  • 16. GPU compute can deliver large increase in PPM performance (endnote 4). RIP Pipeline acceleration: PPM performance. PPM: Pages per Minute performance of Ghostscript RIP pipeline.
      PPM, Test case 2 @ 600 dpi:
      - GX-412: legacy code (no GPU accl) 101.8; GPU accelerated code 244.3 (2.4x)
      - GX-424: legacy code (no GPU accl) 164; GPU accelerated code 370 (2.3x)
      PPM, Test case 2 @ 1200 dpi:
      - GX-412: legacy code (no GPU accl) 27.6; GPU accelerated code 76.6 (2.8x)
      - GX-424: legacy code (no GPU accl) 44; GPU accelerated code 111 (2.5x)
  • 17. GPU compute can free up CPU for other value added tasks (endnote 4). RIP Pipeline acceleration: CPU Load Reduction. CPU Load: average load across all 4 CPU cores of G-series devices under test. Charts: Average CPU Load, Test case 2 @ 600 DPI and @ 1200 DPI; x-axis: PPM; y-axis: % CPU Load (Avg); series: Legacy code (no GPU accl) on GX-424 and GX-412, GPU accelerated code on GX-424 and GX-412.
  • 18. Optical Character Recognition: Tesseract Project Accelerated using GPU. Tesseract Flow. Optical Character Recognition (OCR) Project: Tesseract is an open source Optical Character Recognition (OCR) engine. GPU Compute for OCR: most of the image preprocessing and character recognition is GPU friendly; the data structures in word recognition phase are not very GPU friendly. Expected Future Improvements: Deep Neural Network (DNN) for character recognition.
  • 19. Optical Character Recognition: Demo Performance. Processing time measured for above modules with CPU processing and GPU accelerated processing (endnote 5). Time in seconds:
      - Non OpenCL (CPU only): AMD APU 95W 23.65; AMD APU 35W 46.2
      - OpenCL (GPU Compute): AMD APU 95W 16.79; AMD APU 35W 36.3
      - Gain: AMD APU 95W 41%; AMD APU 35W 27%
  • 20. Core Scan Processing Algorithms
      - AMD worked with customer to accelerate partial scan pipeline using OpenCL on AMD APU and GPU
      - Scan pipeline includes several image processing algorithms such as grayscale conversion, edge detection, rotation, color conversion etc.
      - GPU compute can deliver significant improvement in processing time compared to CPU based processing (endnote 6), which translates to faster scan time and higher scan ppm
      Iterative algorithm optimization on AMD APU (execution time):
      - Grayscale: CPU Optimized* 13.5 ms; OpenCL Optimized 4.6 ms (2.9x)
      - Median: CPU Optimized* 25.6 ms; OpenCL Optimized 3.1 ms (8.3x)
      - Grayscale + Median: CPU Optimized* 39.1 ms; OpenCL Optimized 7.9 ms (5.0x); OpenCL Optimized Fused Code 5.9 ms (6.6x)
  • 21. Partial scan pipeline acceleration (endnotes 7, 8): Color Conversion; Document Detect and Alignment correction; Quality Improvement
  • 23. The Future is bright with GPU Compute. Improve quality of human care with improved accuracy. Empower new experiences with next generation technology. Enhance performance while reducing system cost.
  • 24. Endnotes
      1. Testing by AMD performance labs. Measured performance of OpenCL™ implementation of FK Stolts algorithm on AMD APU and AMD FirePro GPU. System Configuration: AMD Lamar development board with Windows® 10, AMD RX427BB 35W APU, 2.7/3.6 GHz, 2133 MHz DDR3, 8GB RAM. Discrete GPU: AMD FirePro™ W9100 GPU, 275W, 5.2 TFLOPS SP, 16GB GDDR5, 512-bit memory interface, Windows 10. Driver version 15.200.1045-150622a
      2. Testing by AMD performance labs. Measured performance of Speckle Noise Reduction pipeline with and without GPU acceleration, multi-threaded CPU compiler option. Image size: 768 x 252, active ROI was 712 x 252. System Configuration: AMD Lamar development board with Windows® 10, AMD RX427BB 35W APU, 2.7/3.6 GHz, 2133 MHz DDR3, 8GB RAM. Discrete GPU: AMD FirePro™ W9100 GPU, 275W, 5.2 TFLOPS SP, 16GB GDDR5, 512-bit memory interface, Windows 10. Driver version 16.20-160405a-301215E
      3. Testing by AMD performance labs. Measured performance of Sobel Filter with and without GPU acceleration. 8.2 Multi Threaded Library. Image resolution: 1920x1080. Sobel filter size: 5x5. System Configuration: Advantech ComE board with Windows 7 64-bit, AMD RX425BB, 35W, 2.5/3.4 GHz, 1866 MHz DDR3, 4GB RAM, AMD driver version: 14.502.1001.1001, OpenCL 1.2
      4. Testing by AMD performance labs. Measured performance of Raster Image Processing with and without GPU acceleration. System Configuration: AMD GX-424CC: 25W, 2.4 GHz, 1866 MHz DDR3, 8GB RAM, AMD GX-412HC: 7W, 1.2 GHz, 1333 MHz DDR3, 8 GB RAM. Ubuntu 14.04 with AMD Catalyst Driver 14.301.1001
  • 25. Endnotes
      5. Testing by AMD performance labs. Measured performance of Optical Character Recognition using Tesseract open source code with and without GPU acceleration. System Configuration: AMD APU 95W: AMD A10-7850K APU with Radeon™ HD Graphics, 3.7/4.0 GHz, AMD APU 35W: AMD A10-7400P APU with Radeon™ HD Graphics, 2.7/3.6 GHz. Windows® 8.1, OpenCL™ 1.2, version 1084.4
      6. Testing by AMD performance labs. Measured performance of scan pipeline performance using proprietary customer code with and without GPU acceleration. System Configuration: AMD Olive Hill+ development board, AMD RX427BB: 25W, 2.7 GHz, 1600 MHz DDR3, 8GB RAM, Windows 8.1, AMD Catalyst 14.29 drivers and OpenCL™ 1.2
  • 26. Endnotes
      7. Testing by AMD performance labs. Measured performance of partial scan pipeline using proprietary customer code. System Configuration: AMD Olive Hill+ development board with AMD RX427BB: 35W, 2.7 GHz, 1600 MHz DDR3, 8GB RAM, Ubuntu 14.04 and AMD Catalyst driver 14.29
      8. Testing by AMD performance labs. Measured performance of partial scan pipeline using proprietary customer code. System Configuration: 2015 MacBook Pro with Intel Core i7-4980HQ 2.8 GHz, 16 GB DDR3L RAM. AMD Radeon™ R9 M370X Graphics, 2GB GDDR5, Mac OS X 10.10.3. AMD Catalyst 14.29
  • 27. Disclaimer. The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale. AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice. AMD does not provide a license/sublicense to any intellectual property rights relating to any standards, including but not limited to any audio and/or video codec technologies such as AVC/H.264/MPEG-4, AVC, VC-1, MPEG-2, and DivX/xVid. AMD, the AMD Arrow logo, AMD Catalyst, AMD CrossFire, AMD CrossFireX, AMD Radeon, ATI Radeon, and combinations thereof are trademarks of Advanced Micro Devices, Inc.
Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. Windows and DirectX are registered trademarks of Microsoft Corporation. ARM is a registered trademark of ARM Limited. 3DMark is a trademark of Futuremark Corporation. DivX is a registered trademark of DivX, Inc. HDMI is a trademark of HDMI Licensing, LLC. Linux is a registered trademark of Linus Torvalds. OpenCL is a trademark of Apple Inc. used by permission of Khronos. PCIe and PCI Express are registered trademarks of PCI-SIG Corporation. © 2016 Advanced Micro Devices, Inc. All rights reserved.
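
Slides 6 and 7 name envelope detection and log compression as steps that follow IQ demodulation. A minimal sketch of those two steps on a single scan line is shown below in plain Python. The function names, the 60 dB dynamic range, and the 8-bit output scaling are assumptions chosen for illustration; the deck does not specify them, and a production version would run as an OpenCL kernel rather than a Python loop.

```python
import math

def envelope(i_samples, q_samples):
    """Envelope detection: magnitude of each complex IQ sample."""
    return [math.hypot(i, q) for i, q in zip(i_samples, q_samples)]

def log_compress(env, dynamic_range_db=60.0):
    """Map envelope values to 0..255 on a dB scale relative to the peak.

    dynamic_range_db is an assumed display range, not a value from the slides.
    """
    peak = max(env)
    out = []
    for e in env:
        if e <= 0.0 or peak <= 0.0:
            db = -dynamic_range_db          # silence maps to the floor
        else:
            db = max(20.0 * math.log10(e / peak), -dynamic_range_db)
        out.append(round(255.0 * (db + dynamic_range_db) / dynamic_range_db))
    return out

# Hypothetical IQ samples for one scan line.
i_line = [3.0, 0.0, 0.5]
q_line = [4.0, 0.0, 0.0]
env_line = envelope(i_line, q_line)
pixels = log_compress(env_line)
```

Each output pixel depends only on its own input sample plus one global peak, which is the kind of per-pixel independence that makes these stages "GPU friendly" in the sense of slide 6.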
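
Slides 10 and 11 recommend combining multiple functions into a single kernel to get more compute per byte of global memory access and to reduce temporary buffers. The sketch below illustrates the idea in plain Python, using three point-wise steps named on slide 9 (subtract, multiply with coefficients, gamma correction). The function names, the coefficient and gamma values, and the element-count "traffic" model are assumptions for illustration only; they are not taken from AMD's implementation.

```python
def unfused(a, b, coeff, gamma):
    """Three separate passes; each one reads and writes a full buffer."""
    n = len(a)
    traffic = 0
    diff = [x - y for x, y in zip(a, b)]            # pass 1: subtract
    traffic += 3 * n                                # read a, read b, write diff
    scaled = [coeff * d for d in diff]              # pass 2: multiply
    traffic += 2 * n                                # read diff, write scaled
    out = [max(s, 0.0) ** gamma for s in scaled]    # pass 3: gamma correction
    traffic += 2 * n                                # read scaled, write out
    return out, traffic

def fused(a, b, coeff, gamma):
    """One pass; intermediates stay in registers, no temporary buffers."""
    out = [max(coeff * (x - y), 0.0) ** gamma for x, y in zip(a, b)]
    traffic = 3 * len(a)                            # read a, read b, write out
    return out, traffic

# Hypothetical input data.
a = [0.9, 0.5, 0.25, 0.1]
b = [0.1, 0.5, 0.05, 0.3]
out_unfused, traffic_unfused = unfused(a, b, coeff=1.5, gamma=0.5)
out_fused, traffic_fused = fused(a, b, coeff=1.5, gamma=0.5)
```

Under this simple model the fused version moves 3 elements per pixel instead of 7 and needs one launch instead of three, while producing the same output. The same reasoning is consistent with the fused Grayscale + Median result on slide 20 being faster than the two separate OpenCL kernels.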
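
Slide 12 describes a stage that computes the Sobel gradient magnitude followed by a max/min step. A minimal reference sketch of that stage in plain Python follows. It assumes the common 3x3 Sobel kernels for brevity, whereas endnote 3 states a 5x5 filter size, and it assumes the max/min values are used to normalize the result to [0, 1]; the slides do not say how they are used. Function names are hypothetical.

```python
import math

# Standard 3x3 Sobel kernels (assumption: the deck's benchmark used 5x5).
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_magnitude(img):
    """Gradient magnitude for interior pixels; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for j in range(3):
                for i in range(3):
                    p = float(img[y + j - 1][x + i - 1])
                    gx += KX[j][i] * p
                    gy += KY[j][i] * p
            mag[y][x] = math.hypot(gx, gy)
    return mag

def normalize_min_max(mag):
    """Rescale to [0, 1] using the global min and max (assumed use of Max & Min)."""
    flat = [v for row in mag for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0] * len(row) for row in mag]
    return [[(v - lo) / (hi - lo) for v in row] for row in mag]

# Hypothetical 5x5 8-bit image with a vertical edge between columns 1 and 2.
image = [[0, 0, 255, 255, 255] for _ in range(5)]
magnitude = sobel_magnitude(image)
normalized = normalize_min_max(magnitude)
```

The per-pixel loop body is independent across pixels, so it maps directly onto one GPU work-item per pixel. The global min and max are a reduction, which is the step that typically needs a separate pass or a parallel reduction on a GPU.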
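
The speedup labels on slide 16 and the gain figures on slide 19 can be reproduced from the raw numbers on those slides. The short sketch below does the arithmetic. The throughput speedup is the ratio of accelerated to legacy PPM. The deck does not define "Gain", so the formula used here (CPU time divided by GPU time, minus one) is an assumption, chosen because it reproduces both published percentages.

```python
def throughput_speedup(legacy_ppm, gpu_ppm):
    """Speedup for a higher-is-better metric such as pages per minute."""
    return gpu_ppm / legacy_ppm

def time_gain_percent(cpu_seconds, gpu_seconds):
    """Assumed definition of 'Gain' for a lower-is-better metric (time)."""
    return (cpu_seconds / gpu_seconds - 1.0) * 100.0

# (legacy PPM, GPU accelerated PPM) from slide 16, keyed by (device, dpi).
PPM = {
    ("GX-412", 600): (101.8, 244.3),
    ("GX-424", 600): (164.0, 370.0),
    ("GX-412", 1200): (27.6, 76.6),
    ("GX-424", 1200): (44.0, 111.0),
}

# (CPU-only seconds, OpenCL seconds) from slide 19, keyed by APU power class.
OCR_SECONDS = {
    "95W": (23.65, 16.79),
    "35W": (46.2, 36.3),
}

ppm_speedups = {k: round(throughput_speedup(*v), 1) for k, v in PPM.items()}
ocr_gains = {k: round(time_gain_percent(*v)) for k, v in OCR_SECONDS.items()}

for key, value in ppm_speedups.items():
    print(key, f"{value}x")
for key, value in ocr_gains.items():
    print(key, f"{value}%")
```

The rounded results match the 2.4x, 2.3x, 2.8x and 2.5x labels on slide 16 and the 41% and 27% gains on slide 19.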