Implementing 3D SPHARM Surfaces
   Registration on Cell Processor

 Huian Li (huili@indiana.edu)                Mi Yan (miyan@us.ibm.com)
 Robert Henschel (rhensche@indiana.edu)      Li Shen (shenli@iupui.edu)

                             July 29, 2009
Contents
•   SPHARM registration
•   Matlab implementation
•   Cell implementation
•   Performance Analysis
•   Conclusion
SPHARM Surfaces
 • Radial and stellar surfaces
 • Simply connected, arbitrarily shaped
 • Vision, graphics, imaging, bioinformatics
SPHARM Expansion
   [Figure: area-preserving mapping between spherical parameters (θ, φ) and surface points (x, y, z)]
SHREC




   [Figure: (a) template, (b) object, (c) after ICP, (d) after
   registration of parameterization]
Calculation of coefficients
• After rotating the parameter net on the surface by the
  Euler angles (α, β, γ), the new coefficients are:

$$ c_l^m(\alpha\beta\gamma) = \sum_{n=-l}^{l} D^l_{mn}(\alpha\beta\gamma)\, c_l^n $$

   where

$$ D^l_{mn}(\alpha\beta\gamma) = e^{-i(m\alpha + n\gamma)} \sum_{t=\max(0,\,n-m)}^{\min(l+n,\,l-m)} (-1)^t\, d^l_{mnt}(\beta) $$

   and

$$ d^l_{mnt}(\beta) = \frac{\sqrt{(l+n)!\,(l-n)!\,(l+m)!\,(l-m)!}}{(l+n-t)!\,(l-m-t)!\,(t+m-n)!\,t!} \left(\cos\frac{\beta}{2}\right)^{2l+n-m-2t} \left(\sin\frac{\beta}{2}\right)^{2t+m-n} $$
RMSD
• RMSD (Root Mean Square Distance): distance
  between two SPHARM models

$$ \mathrm{RMSD} = \sqrt{\frac{1}{4\pi} \sum_{l=0}^{L_{\max}} \sum_{m=-l}^{l} \left\| c_{1,l}^m - c_{2,l}^m \right\|^2} $$

   where $c_{1,l}^m$ and $c_{2,l}^m$ are the coefficients of the two
   SPHARM models
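A short sketch of the RMSD computation, under the assumption that each coefficient is a triple of complex numbers (one per x, y, z coordinate) stored in a dictionary keyed by (l, m). The storage layout and the function name are illustrative choices, not taken from the slides.

```python
import math


def spharm_rmsd(c1, c2, l_max):
    """RMSD between two SPHARM models.

    c1, c2: dicts mapping (l, m) to a tuple of three complex coefficients
    (assumed layout: x, y, z components).
    """
    total = 0.0
    for l in range(l_max + 1):
        for m in range(-l, l + 1):
            a, b = c1[(l, m)], c2[(l, m)]
            # Squared Euclidean norm of the complex difference vector.
            total += sum(abs(p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / (4 * math.pi))


if __name__ == "__main__":
    zero = (0j, 0j, 0j)
    model_a = {(l, m): zero for l in range(3) for m in range(-l, l + 1)}
    model_b = dict(model_a)
    model_b[(1, 0)] = (1 + 0j, 0j, 0j)
    print(spharm_rmsd(model_a, model_b, 2))
```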
Matlab implementation
• A straightforward implementation in Matlab:

     for l = 0, Lmax
       for m = -l, l
          for n = -l, l
             for t = max(0, n-m), min(l+n, l-m)
              ... performing calculations ...

• One rotation for Lmax = 50 took 823 seconds on a 2 GHz
  quad-core Intel Xeon E5335
Cell B.E.
Cell implementation
• Domain decomposition:
     for l = 0, Lmax
       for m = -l, l
          for n = -l, l
             for t = max(0, n-m), min(l+n, l-m)
              ... calculations ...

• Decomposition along l leads to workload
  imbalance among SPUs

• Decomposition along m creates unnecessary data
  communication
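The imbalance from decomposing along l can be illustrated with a rough count. The sketch below assumes contiguous blocks of l values per SPU and counts only the (m, n) pairs per degree, (2l+1)^2; the slides do not state how l was distributed, and the t loop adds further l-dependent cost, so this is a lower bound on the skew.

```python
def work_per_degree(l):
    """Number of (m, n) pairs for degree l; ignores the inner t loop."""
    return (2 * l + 1) ** 2


def split_along_l(l_max, n_spus):
    """Assign contiguous blocks of l values to SPUs (assumed scheme).

    Returns the work count per SPU.
    """
    degrees = list(range(l_max + 1))
    block = -(-len(degrees) // n_spus)  # ceiling division
    return [
        sum(work_per_degree(l) for l in degrees[i * block:(i + 1) * block])
        for i in range(n_spus)
    ]


def imbalance(loads):
    """Ratio of the heaviest load to the mean load (1.0 is perfect)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    loads = split_along_l(50, 8)
    print(loads)
    print(imbalance(loads))
```

Under these assumptions the SPUs holding the high degrees carry far more work than those holding the low degrees, because the cost per degree grows quadratically with l.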
Cell implementation
• Loop fusion:
    for l = 0, Lmax
      for m = -l, l
         for n = -l, l
            for t = max(0, n-m), min(l+n, l-m)
             ... calculations ...
• Unique index for combined loop:
    f(l, m) = l^2 + m + l
• Workload for each SPE:
    (Lmax + 1)^2 / (total # of SPEs)
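A sketch of the fused index and its inverse, with an even split of the (Lmax + 1)^2 fused iterations across SPEs. The helper names and the rule for distributing the remainder are assumptions made for illustration.

```python
import math


def fused_index(l, m):
    """Map (l, m) with -l <= m <= l to a unique index k = l^2 + m + l."""
    return l * l + m + l


def unfuse(k):
    """Inverse of fused_index: recover (l, m) from k."""
    l = math.isqrt(k)  # indices for degree l span [l^2, l^2 + 2l]
    return l, k - l * l - l


def spe_range(spe_id, n_spes, l_max):
    """Half-open range [start, stop) of fused indices for one SPE.

    The first (total % n_spes) SPEs get one extra iteration.
    """
    total = (l_max + 1) ** 2
    base, extra = divmod(total, n_spes)
    start = spe_id * base + min(spe_id, extra)
    stop = start + base + (1 if spe_id < extra else 0)
    return start, stop


if __name__ == "__main__":
    for spe in range(8):
        start, stop = spe_range(spe, 8, 50)
        print(spe, unfuse(start), unfuse(stop - 1))
```

Each SPE can then loop over its own index range and call `unfuse` to recover (l, m), so no SPE needs to know about the others' assignments.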
Cell implementation
• Lookup table T for factorials, stored as logarithms: T(k) = log k!
• Transform exponentials & multiplications into
  multiplications & additions, respectively.

$$
\begin{aligned}
d^l_{mnt}(\beta) &= \frac{\sqrt{(l+n)!\,(l-n)!\,(l+m)!\,(l-m)!}}{(l+n-t)!\,(l-m-t)!\,(t+m-n)!\,t!} \left(\cos\frac{\beta}{2}\right)^{2l+n-m-2t} \left(\sin\frac{\beta}{2}\right)^{2t+m-n} \\
&= \exp\Big( \tfrac{1}{2}\big(T(l+n) + T(l-n) + T(l+m) + T(l-m)\big) \\
&\qquad - T(l+n-t) - T(l-m-t) - T(t+m-n) - T(t) \\
&\qquad + (2l+n-m-2t)\,\log\cos\tfrac{\beta}{2} + (2t+m-n)\,\log\sin\tfrac{\beta}{2} \Big)
\end{aligned}
$$
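The log-domain evaluation can be sketched as follows. It assumes T(k) = log k! and 0 < β < π, so that both cos(β/2) and sin(β/2) are positive and their logarithms exist; the function names are hypothetical, and the direct version is included only for comparison.

```python
import math


def build_log_factorial_table(k_max):
    """T[k] = log(k!) for k = 0..k_max, built with additions only."""
    table = [0.0] * (k_max + 1)
    for k in range(1, k_max + 1):
        table[k] = table[k - 1] + math.log(k)
    return table


def d_term_log(T, l, m, n, t, beta):
    """d^l_{mnt}(beta) via the lookup table; requires 0 < beta < pi."""
    log_val = (
        0.5 * (T[l + n] + T[l - n] + T[l + m] + T[l - m])
        - T[l + n - t] - T[l - m - t] - T[t + m - n] - T[t]
        + (2 * l + n - m - 2 * t) * math.log(math.cos(beta / 2))
        + (2 * t + m - n) * math.log(math.sin(beta / 2))
    )
    return math.exp(log_val)


def d_term_direct(l, m, n, t, beta):
    """Same quantity from exact factorials, for cross-checking small l."""
    f = math.factorial
    num = math.sqrt(f(l + n) * f(l - n) * f(l + m) * f(l - m))
    den = f(l + n - t) * f(l - m - t) * f(t + m - n) * f(t)
    return (num / den
            * math.cos(beta / 2) ** (2 * l + n - m - 2 * t)
            * math.sin(beta / 2) ** (2 * t + m - n))


if __name__ == "__main__":
    T = build_log_factorial_table(100)  # covers l + n up to 2 * Lmax for Lmax = 50
    print(d_term_log(T, 10, 3, -2, 4, 0.7))
    print(d_term_direct(10, 3, -2, 4, 0.7))
```

Besides replacing multiplications with additions, working with logarithms keeps intermediate values small: factorials such as 100! never have to be formed explicitly.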
Cell implementation
• Other optimizations specific to Cell:
    • Vectorization & data alignment
    • DMA data transfer between main memory &
      local store
    • SPU decrementer
Cell implementation
• Single precision vs. double precision: all data in single precision
Cell implementation
• Single precision vs. double precision: partial data in double precision
Cell implementation
• Single precision vs. double precision: all critical data in double precision
Performance analysis
   [Figure: Performance of one rotation on Cell BE. x-axis: Number of SPEs (1, 2, 4, 8, 16); y-axis: Time (seconds), 0 to 1.8]
Performance analysis
   [Figure: Performance of finding the shortest distance at Level 3 on Cell BE. x-axis: Number of SPEs (4, 8, 12, 16); y-axis: Time (seconds), 0 to 7000; series: GNU gcc, IBM xlc]
Conclusion
• Performance increases dramatically on Cell due to
  its unique architecture and algorithm optimization.
• Care must be taken with data placement because of
  the limited local store.
• Care must also be taken with data transfer
  between local store and main memory.
The End




          Questions?
