Florian Wende
Zuse-Institute Berlin
Connected Component
Labeling on Xeon Phi
Parallelization & Vectorization
wende@zib.de, Connected Component Labeling on Xeon Phi, ISC13, Leipzig
Connected Component Labeling
Suppose we are given the following image . . .
. . . and we are to assign unique labels to different connected regions!
. . . In parallel?
• Computer Vision
  Detect connected regions in images
• Computational Physics
  Cluster algorithms for the Ising model
• Percolation Theory
How to achieve the labeling? . . .
1. Labeling algorithm
2. Parallelization
a. Parallel implementation on CPU
b. Run the CPU code on the Xeon Phi
c. Adapt the code for the Xeon Phi
3. Vectorization (SIMD)
d. Leave it to the compiler (auto-vectorization)
e. SIMD intrinsic functions
Xeon Phi: 512-bit SIMD unit for 16 x 32-bit words
Connected Component Labeling - Strategy
• Breadth-first/depth-first search algorithms, multi-pass algorithms
• Hoshen-Kopelman algorithm
• Cluster self-labeling algorithm by Coddington and Baillie
1. Assign a unique label to each pixel of the image
2. For each pixel, consider its adjacent connected pixels in the positive 1-, 2-, . . .
directions and set the labels of each connected pair to their minimum value
3. If the minimum operation leaves the labels of all pixels unchanged: Finished!
Otherwise: Continue with step 2
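The three steps above can be sketched in scalar form. This is a minimal Python illustration, not the code from the talk: the function name `self_label`, the list-of-lists image layout, and the restriction to two dimensions are assumptions made for the example. Pixels count as connected here when they are adjacent and equal-valued.

```python
def self_label(image):
    """Label connected equal-valued regions by iterating pairwise minima."""
    rows, cols = len(image), len(image[0])
    # Step 1: every pixel starts with a unique label (its flattened array index).
    labels = [[r * cols + c for c in range(cols)] for r in range(rows)]
    changed = True
    while changed:
        # Step 3: stop once a full sweep leaves every label unchanged.
        changed = False
        for r in range(rows):
            for c in range(cols):
                # Step 2: neighbors in positive 1-direction and 2-direction.
                for nr, nc in ((r, c + 1), (r + 1, c)):
                    if nr < rows and nc < cols and image[nr][nc] == image[r][c]:
                        low = min(labels[r][c], labels[nr][nc])
                        if labels[r][c] != low or labels[nr][nc] != low:
                            labels[r][c] = labels[nr][nc] = low
                            changed = True
    return labels
```

At the fixed point every region carries the smallest array index of its pixels, so the labels are unique per region without any extra bookkeeping.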
CPU: Hoshen-Kopelman
Xeon Phi: Hoshen-Kopelman vs. Cluster self-labeling
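For comparison, a Hoshen-Kopelman style labeling can be sketched as a single raster scan over a union-find forest. This is a simplified variant under stated assumptions, not the implementation benchmarked in the talk: the name `hoshen_kopelman` and the list-of-lists layout are hypothetical, every pixel value is treated as a color (the classic formulation labels occupied sites only), and roots are always the smaller index so that each region ends up labeled with its smallest pixel index.

```python
def hoshen_kopelman(image):
    """Raster-scan labeling with a union-find forest over pixel indices."""
    rows, cols = len(image), len(image[0])
    parent = list(range(rows * cols))

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Attach the larger root below the smaller one.
            parent[max(ra, rb)] = min(ra, rb)

    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            # Only already-visited neighbors (left and above) are inspected.
            if c > 0 and image[r][c - 1] == image[r][c]:
                union(i, i - 1)
            if r > 0 and image[r - 1][c] == image[r][c]:
                union(i, i - cols)
    return [[find(r * cols + c) for c in range(cols)] for r in range(rows)]
```

Unlike the iterative self-labeling, the scan visits each pixel once, but the union-find lookups are data-dependent, which is one plausible reason why it maps less directly onto SIMD lanes.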
Connected Component Labeling - Algorithm
Partition the image into equal-sized sub-images, and label them
independently using multiple threads
• Unique labels across different sub-images
• Connected regions that extend over multiple sub-images are merged
  after the labeling using atomic primitives
[Figure: image partitioned into eight sub-images, one per thread (Thread 0 to Thread 7)]
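The partition-then-merge scheme can be sketched as follows. This is an illustration under assumptions, not the OpenMP code from the talk: the names `label_tile` and `label_partitioned`, the tile sizes, and the use of a Python thread pool are hypothetical, and the merge phase runs sequentially with a small union-find table where the slides use atomic primitives.

```python
from concurrent.futures import ThreadPoolExecutor


def label_tile(image, labels, r0, r1, c0, c1):
    """Self-label one sub-image in place; pixels outside the tile are not touched."""
    changed = True
    while changed:
        changed = False
        for r in range(r0, r1):
            for c in range(c0, c1):
                for nr, nc in ((r, c + 1), (r + 1, c)):
                    if nr < r1 and nc < c1 and image[nr][nc] == image[r][c]:
                        low = min(labels[r][c], labels[nr][nc])
                        if labels[r][c] != low or labels[nr][nc] != low:
                            labels[r][c] = labels[nr][nc] = low
                            changed = True


def label_partitioned(image, tile_rows, tile_cols):
    rows, cols = len(image), len(image[0])
    # Global array indices as initial labels keep labels unique across sub-images.
    labels = [[r * cols + c for c in range(cols)] for r in range(rows)]
    tiles = [(r0, min(r0 + tile_rows, rows), c0, min(c0 + tile_cols, cols))
             for r0 in range(0, rows, tile_rows)
             for c0 in range(0, cols, tile_cols)]
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda t: label_tile(image, labels, *t), tiles))

    # Merge phase (sequential stand-in for the atomic label merging).
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols and image[nr][nc] == image[r][c]:
                    # Inside a tile both labels already agree; only pairs that
                    # straddle a tile boundary can trigger a merge.
                    a, b = find(labels[r][c]), find(labels[nr][nc])
                    if a != b:
                        parent[max(a, b)] = min(a, b)
    return [[find(labels[r][c]) for c in range(cols)] for r in range(rows)]
```

For example, `label_partitioned(image, 2, 2)` labels a 3 x 3 image in four tiles and then joins the regions that cross tile boundaries.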
Connected Comp. Labeling - Parallelization
Example: Self-labeling within sub-image of thread 2
• Process multiple data elements simultaneously using SIMD instructions
Connected Comp. Labeling - Vectorization
1. Initialize labeling (array index)
2. Load row[0] into reg0, and
create a mask for adjacent
entries in positive 1-direction:
1 if equal-colored, 0 otherwise
3. Overlap each element in reg0 with its
adjacent element in positive 1-direction,
and write the result to reg1
4. Determine the pairwise
minimum of the entries in reg0
and reg1 using the mask, and
write the result to reg1
5. Write back entries in reg1 to
row[0] using the mask
6. Shift all elements in reg1 one
position in positive 1-direction, shifting
in the 0-th element, and write the result to reg1
7. Shift all bits in mask one position up, and write the pairwise minimum
entries in row[0] and reg1 to row[0] using the shifted mask
8. Did labels change?
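Steps 1 to 8 can be emulated lane by lane in plain Python. The sketch below is not the intrinsics code from the talk: Python lists stand in for the 512-bit registers, an integer stands in for the mask register, and the helper names `masked_min`, `masked_store`, `sweep_row`, and `label_row` are hypothetical. It also assumes that the last lane has no neighbor inside the vector, so its mask bit stays 0.

```python
def masked_min(src, mask, a, b):
    """Per lane: min(a, b) where the mask bit is set, else the value from src."""
    return [min(x, y) if (mask >> i) & 1 else s
            for i, (s, x, y) in enumerate(zip(src, a, b))]


def masked_store(dst, mask, src):
    """Per lane: overwrite dst with src only where the mask bit is set."""
    return [s if (mask >> i) & 1 else d
            for i, (d, s) in enumerate(zip(dst, src))]


def sweep_row(row, colors):
    """One pass of steps 2 to 8 over a single row of labels."""
    n = len(row)
    # Step 2: bit i is set if lanes i and i+1 are equal-colored.
    mask = 0
    for i in range(n - 1):
        if colors[i] == colors[i + 1]:
            mask |= 1 << i
    reg0 = list(row)
    # Step 3: lane i of reg1 receives lane i+1 of reg0 (last lane repeats itself).
    reg1 = reg0[1:] + reg0[-1:]
    # Step 4: masked pairwise minimum, result kept in reg1.
    reg1 = masked_min(reg1, mask, reg0, reg1)
    # Step 5: masked write-back; unconnected lanes keep their old label.
    out = masked_store(reg0, mask, reg1)
    # Step 6: shift reg1 one lane up, shifting in its 0-th element.
    reg1 = reg1[:1] + reg1[:-1]
    # Step 7: shift the mask one bit up and apply the minimum to the upper lanes.
    up = (mask << 1) & ((1 << n) - 1)
    out = masked_min(out, up, out, reg1)
    # Step 8: report whether any label changed.
    return out, out != reg0


def label_row(colors):
    row = list(range(len(colors)))  # Step 1: initialize labels with the array index.
    changed = True
    while changed:
        row, changed = sweep_row(row, colors)
    return row
```

With index-initialized labels, steps 3 to 5 alone change nothing in the first pass, because the lower lane of each pair already holds the minimum; steps 6 and 7 are what push that minimum to the upper lane.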
Result of the operations up to now . . .
Adjacent connected elements in row[0] are each set to their
pairwise minimum value.
[Figure: labels in row[0] before and after the update, with the 1-direction and 2-direction marked]
Repeat the procedure for the 2-direction.
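For the 2-direction, lane i of one row is adjacent to lane i of the next row, so presumably no lane shift is needed: a mask built from the two rows' colors and a masked minimum written back to both rows suffice. The following scalar sketch rests on that assumption; the name `sweep_row_pair` and the list layout are hypothetical.

```python
def sweep_row_pair(row_a, row_b, colors_a, colors_b):
    """Pairwise minimum between lane i of two rows adjacent in 2-direction."""
    new_a, new_b = list(row_a), list(row_b)
    for i in range(len(row_a)):
        # Mask bit for lane i: set if the two pixels are equal-colored.
        if colors_a[i] == colors_b[i]:
            new_a[i] = new_b[i] = min(row_a[i], row_b[i])
    # Labels changed if either row differs from what was loaded.
    changed = new_a != list(row_a) or new_b != list(row_b)
    return new_a, new_b, changed
```

For instance, with labels `[0, 0, 2]` over `[3, 4, 5]` and colors `[1, 1, 0]` over `[0, 1, 0]`, lanes 1 and 2 are connected and the lower row becomes `[3, 0, 2]`.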
Repeat the procedure for all other rows as long as labels change . . .
[Figure: labels of the sub-image before and after the procedure]
Now: Merge labels across different sub-images using atomics!
Finished!
CPU: Xeon E5-2670, 8 Cores + 2-way Hyper-Threading @ 2.6GHz
• Hoshen-Kopelman algorithm + Atomics for label merging
• Vectorization was left to the compiler: there are no masked SIMD intrinsics on this CPU!
Xeon Phi: 60 Cores + 4-way Hyper-Threading @ 1.1GHz
• Hoshen-Kopelman vs. Cluster self-labeling + Atomics for label merging
• Vectorization by means of _mm512_[mask]_XXX() intrinsics
Parallelization by means of OpenMP: #pragma omp parallel {...}
Programming effort: approx. 2-3 days for the CPU code (incl. optimization),
less than 1 day for the Xeon Phi code (based on the CPU code)
Connected Comp. Labeling - Benchmark
CPU: Intel Xeon E5-2670, 8 Cores + 2-way Hyper-Threading @ 2.6GHz
Xeon Phi: 60 Cores + 4-way Hyper-Threading @ 1.1GHz
Application: Swendsen-Wang cluster algorithm for the 2D Ising model
Work partially funded by
BMBF Grant No. 01IH11004G
Dr. Thomas Steinke
Zuse-Institute Berlin (ZIB)
Dr. Michael Klemm
Intel GmbH, Germany
Acknowledgement
[1] C. F. Baillie and P. D. Coddington. Cluster Identification Algorithms for Spin Models – Sequential and Parallel, 1991.
[2] J. Hoshen and R. Kopelman. Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm. Phys. Rev. B, 14:3438–3445, 1976.
[3] R. H. Swendsen and J.-S. Wang. Nonuniversal Critical Dynamics in Monte Carlo Simulations. Phys. Rev. Lett., 58:86–88, Jan 1987.
[4] Intel Corp. Intel Xeon Phi Coprocessor 5110P, Product Brief, 2012.
References