1) Phase unwrapping is used in optical quadrature microscopy to determine viability of embryos by counting cells after unwrapping. It needs to be done at near real-time speeds to analyze sample changes.
2) The paper implements minimum Lp norm phase unwrapping and affine transformations on a GPU to improve performance and latency for optical microscopy research.
3) Performance results show a 5.24x speedup for total phase unwrapping time compared to a serial CPU implementation. Further optimizations like multi-GPU support could improve speeds for higher image acquisition rates.
Accelerating Optical Quadrature Microscopy Using GPUs
1. Phase Unwrapping and Affine Transformations using CUDA
Perhaad Mistry, Sherman Braganza,
David Kaeli, Miriam Leeser
ECE Department
Northeastern University
Boston, MA
4. Motivation
Phase Unwrapping used for research in In-Vitro Fertilization (IVF) in Optical Quadrature Microscopy (OQM)
Cell counts obtained after unwrapping determine viability of embryos
Results required at close to real time to see changes in sample
Improve cost-effectiveness of OQM and quality of research
6. OQM Imaging
Pattern determined by phase difference between laser beams
Phase calculated using magnitudes captured from four cameras
Mixed (M), signal only (S), reference (R) and dark (D) images
Phase difference measured by angle(E) ranges from -π to π
$E = \frac{1}{4}\sum_{n=0}^{3} i^{n}\,\frac{M_n - S_n - R_n}{\sqrt{R_n}}$
angle(E) yields phase that is required for unwrapping
[Figure: Optical Quadrature Microscopy Layout. Labels: Laser, Mirror, PBS, NPBS, Sample, REF, SIG, Camera 0, Camera 1, Camera 2, Camera 3]
7. Affine Transformation
Pipeline: Microscope's Frame Grabber -> Affine Transforms -> Phase Unwrapping
Align images to fix sample's position in different images
2x2 and 2x1 matrices obtained from microscope
Image pixels divided among blocks of threads
Each thread calculates one pixel
Implemented using CUDA since result needed for Phase Unwrapping
Algorithm for Affine Transform [1]
(x,y) = f(BlockId, ThreadID)
Load Data = Image[x][y]
Includes - Rotation (θ), Scaling (s), Shearing (sh)
$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} s(x)\cos\theta & sh(y)\sin\theta \\ -sh(x)\sin\theta & s(y)\cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a \\ b \end{pmatrix}$
Store Image[x'][y'] = Data (Uncoalesced Writes)
[1] A. K. Jain, Fundamentals of Digital Image Processing, Prentice Hall
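The load, transform and store steps of the affine slide can be sketched serially in Python, with one loop iteration standing in for one CUDA thread. This is a minimal sketch rather than the authors' kernel: the function name `affine_scatter`, the nearest-pixel rounding, the bounds check and the zero fill for unwritten pixels are all assumptions made for illustration.

```python
def affine_scatter(image, A, t):
    """Forward-map every pixel through (x', y') = A (x, y) + t.

    image: list of rows, indexed image[x][y] as on the slide
    A:     2x2 matrix [[a00, a01], [a10, a11]]
    t:     2x1 translation (a, b)
    Rounding to the nearest pixel and zero fill are assumptions.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for x in range(rows):           # on the GPU, (x, y) comes from block and thread IDs
        for y in range(cols):
            data = image[x][y]      # load
            xp = round(A[0][0] * x + A[0][1] * y + t[0])
            yp = round(A[1][0] * x + A[1][1] * y + t[1])
            if 0 <= xp < rows and 0 <= yp < cols:
                out[xp][yp] = data  # store (scattered write)
    return out


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
identity = [[1, 0], [0, 1]]
shifted = affine_scatter(img, identity, (1, 0))  # translate by one pixel along x
```

Because each input pixel writes to a computed destination, neighbouring threads do not generally write to neighbouring addresses, which is why the slide flags the stores as uncoalesced.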
8. Noise Removal & Phase Information
Separate “noise” only image available (Dark Image)
Dark voltage image subtracted from the mixed, signal and reference images
Subtraction of signal image (Sn) and reference image (Rn) due to fixed pattern levels of individual cameras
Mn = Mn − Dn and so on for Sn and Rn
$E = \frac{1}{4}\sum_{n=0}^{3} i^{n}\,\frac{M_n - S_n - R_n}{\sqrt{R_n}}$, n: camera number
phase = angle(E)
Square root of Rn to balance intensities for detection
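The dark subtraction and the phase formula can be checked for a single pixel with a short Python sketch. The function name `oqm_phase` is hypothetical, and the synthetic test data assumes an idealized interference model, M_n = S + R + 2·sqrt(S·R)·cos(θ − nπ/2), which is not stated on the slides and is used here only to produce inputs with a known phase.

```python
import cmath
import math


def oqm_phase(M, S, R, D):
    """Return angle(E) for one pixel.

    M, S, R, D: length-4 sequences of mixed, signal-only, reference and
    dark readings, one entry per camera n = 0..3.
    """
    E = 0j
    for n in range(4):
        m = M[n] - D[n]  # dark-image subtraction
        s = S[n] - D[n]
        r = R[n] - D[n]
        E += (1j ** n) * (m - s - r) / math.sqrt(r)
    E /= 4
    return cmath.phase(E)  # value in [-pi, pi]


# Synthetic single-pixel data (assumed interference model, see lead-in).
theta = 0.7
S_true, R_true, dark = 4.0, 9.0, 1.0
M = [S_true + R_true
     + 2 * math.sqrt(S_true * R_true) * math.cos(theta - n * math.pi / 2)
     + dark
     for n in range(4)]
S = [S_true + dark] * 4
R = [R_true + dark] * 4
D = [dark] * 4
phase = oqm_phase(M, S, R, D)
```

Under that model the four terms sum to 4·sqrt(S)·e^{iθ}, so `phase` recovers the θ used to build the data.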
9. Example Images for Affine Transform
[Figure panels: Mixed Image; Signal only Image; Dark Image; Reference Image; result After Affine Transform and Noise Removal]
10. Phase Unwrapping
Pipeline: Frame Grabber -> Affine Transforms -> Phase Unwrapping
Optical phase difference from interferometric image is shown
Parameter of interest is Optical path difference, which can be more than π
Unwrapping along a line: sum until a gradient of π, then add 2π
In 2D, if noise is present: path dependencies while summing cause accumulation errors
[Figure: Wrapped Image (note range of values)]
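Unwrapping along a line can be illustrated with a minimal Python sketch: walk the samples, wrap each neighbour-to-neighbour difference back into [-π, π), and accumulate. The helper names `wrap` and `unwrap_1d` are hypothetical, and the sketch assumes a noise-free signal whose true phase changes by less than π between samples.

```python
import math


def wrap(p):
    """Map a phase value into [-pi, pi)."""
    return (p + math.pi) % (2 * math.pi) - math.pi


def unwrap_1d(wrapped):
    """Unwrap a 1D sequence by summing wrapped differences."""
    out = [wrapped[0]]
    for k in range(1, len(wrapped)):
        # A jump larger than pi is treated as a 2*pi discontinuity.
        d = wrap(wrapped[k] - wrapped[k - 1])
        out.append(out[-1] + d)
    return out


true_phase = [0.5 * k for k in range(20)]  # smooth ramp reaching 9.5 rad
wrapped = [wrap(p) for p in true_phase]
recovered = unwrap_1d(wrapped)
```

In 2D the same summation can be carried out along many different paths; with noise those paths disagree, which is the path dependence the slide refers to.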
11. MATLAB’s Unwrap vs. Minimum Lp Norm
[Figure panels: Unwrap Using MATLAB’s unwrap() function; Unwrap Using Minimum Lp Norm Algorithm [1]]
[1] Dennis C. Ghiglia, Mark D. Pritt: Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software
12. Minimum Lp Norm Algorithm
Minimize difference between gradients of wrapped and unwrapped data [1]:
$(\phi_{i+1,j} - \phi_{i,j})\,U_{i,j} + (\phi_{i,j+1} - \phi_{i,j})\,V_{i,j} - (\phi_{i,j} - \phi_{i-1,j})\,U_{i-1,j} - (\phi_{i,j} - \phi_{i,j-1})\,V_{i,j-1} = c_{i,j}$
$c_{i,j} = \Delta^{x}_{i,j}\,U_{i,j} - \Delta^{x}_{i-1,j}\,U_{i-1,j} + \Delta^{y}_{i,j}\,V_{i,j} - \Delta^{y}_{i,j-1}\,V_{i,j-1}$
where $\phi_{i,j}$ is the unwrapped phase at (i,j), and $\Delta^{x}_{i,j}$ and $\Delta^{y}_{i,j}$ denote the wrapped phase differences in the x and y direction respectively
Nonlinear PDE since U and V are data dependent weights
PDE solved iteratively using the Preconditioned Conjugate Gradient Method
Can be expressed as Qφ = c
[1] Dennis C. Ghiglia, Mark D. Pritt. Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software, Wiley New York
13. Preconditioned Conjugate Gradient
Solve un-weighted least square phase unwrapping problem
Minimize ε²:
$\varepsilon^2 = \sum_{i=0}^{M-2}\sum_{j=0}^{N-2} (\phi_{i+1,j} - \phi_{i,j} - \Delta^{x}_{i,j})^2 + \sum_{i=0}^{M-2}\sum_{j=0}^{N-2} (\phi_{i,j+1} - \phi_{i,j} - \Delta^{y}_{i,j})^2$
where $\phi_{i,j}$ is the unwrapped phase at (i,j) and $\Delta^{x}_{i,j}$, $\Delta^{y}_{i,j}$ are the wrapped phase differences in the x and y direction
Preconditioning using a DCT needed
Conjugate gradient method called after DCT preconditioning
After discretizing the equation and taking the 2D Fourier Transform:
$\Phi_{m,n} = \frac{P_{m,n}}{2\cos(\pi m / M) + 2\cos(\pi n / N) - 4}$
where Φ and Ρ are the Fourier Transformed versions of φ and ∆ respectively
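The shape of a preconditioned conjugate gradient iteration for a system of the form Qφ = c can be sketched in Python as below. This is not the solver from the slides: a small dense 3x3 matrix stands in for Q, and a simple Jacobi (diagonal) preconditioner is swapped in for the DCT-based preconditioner, purely to keep the example short. The names `pcg` and `jacobi` are hypothetical.

```python
def pcg(Q, c, apply_precond, iters=50, tol=1e-10):
    """Solve Q x = c for symmetric positive-definite Q by preconditioned CG."""
    n = len(c)

    def matvec(v):
        return [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    x = [0.0] * n
    r = list(c)                # residual c - Q x, with x = 0
    z = apply_precond(r)       # preconditioning step (DCT-based in the slides)
    p = list(z)
    rz = dot(r, z)
    for _ in range(iters):
        if dot(r, r) ** 0.5 < tol:
            break
        Qp = matvec(p)
        alpha = rz / dot(p, Qp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Qp)]
        z = apply_precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Small symmetric, diagonally dominant test system (illustrative values).
Q = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
c = [1.0, 2.0, 3.0]


def jacobi(r):
    """Assumed stand-in preconditioner: divide by the diagonal of Q."""
    return [ri / Q[i][i] for i, ri in enumerate(r)]


x = pcg(Q, c, jacobi)
```

Apart from the preconditioner call, every step is a dot product or an element-wise vector update.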
14. Better DCT Implementation
Implementing N point DCT using DFT uses 2N point DFT
Another method seen in [1]
Rearrange x(n) into v(n) such that
$v(n) = x(2n)$ for $0 \le n \le \lfloor (N-1)/2 \rfloor$
$v(n) = x(2N - 2n - 1)$ for $\lfloor (N+1)/2 \rfloor \le n \le N-1$
$\mathrm{DCT}(x)(k) = 2\,\mathrm{Re}\big[ W_{4N}^{k} \cdot \mathrm{DFT}(v)(k) \big]$, with $W_{4N} = e^{-2\pi i/(4N)}$
For 2D DCT: 4x less work since N*N instead of 2N*2N
PCG is ~90% of Lp Norm execution time
Shuffle kernel before CUFFT call
[1] Makhoul J. A fast cosine transform in one and two dimensions.
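The reordering trick can be verified numerically with a short Python sketch that compares it against a direct DCT-II sum. A naive O(N²) DFT stands in for the FFT library call, and the unnormalized convention C(k) = 2·Σ x(n)·cos(π(2n+1)k/(2N)) is assumed; the function names are hypothetical.

```python
import cmath
import math


def dft(v):
    """Naive N-point DFT (stand-in for an FFT library call)."""
    N = len(v)
    return [sum(v[n] * cmath.exp(-2j * math.pi * n * k / N) for n in range(N))
            for k in range(N)]


def dct_via_dft(x):
    """DCT-II of x using one N-point DFT of the shuffled sequence v."""
    N = len(x)
    v = [0.0] * N
    for n in range((N - 1) // 2 + 1):   # even-indexed samples, in order
        v[n] = x[2 * n]
    for n in range((N + 1) // 2, N):    # odd-indexed samples, reversed
        v[n] = x[2 * N - 2 * n - 1]
    V = dft(v)
    # Multiply by W_4N^k = exp(-2*pi*i*k / (4N)) and keep twice the real part.
    return [2 * (cmath.exp(-2j * math.pi * k / (4 * N)) * V[k]).real
            for k in range(N)]


def dct_direct(x):
    """Reference: C(k) = 2 * sum_n x(n) * cos(pi * (2n + 1) * k / (2N))."""
    N = len(x)
    return [2 * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                    for n in range(N))
            for k in range(N)]


x = [1.0, 2.0, -1.0, 3.0, 0.5]
fast = dct_via_dft(x)
ref = dct_direct(x)
```

The first loop in `dct_via_dft` and the loop after it together play the role of the shuffle kernel; the final list comprehension is the point-wise complex multiplication.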
15. Minimum Lp Norm Algorithm
for k ← 0 to LP Norm Count
    Calculate Data Derived Weights
    for j ← 0 to PCG Iteration Count                (PCG Algorithm)
        Preconditioning before CG:
            DCT To DFT Steps (Shuffle Kernel)
            Execute CUFFT Call
            Point-Wise Complex Multiplication to Get DCT Result
            Scaling Step
            Execute CUFFT Call to do iDCT
        Conjugate Gradient (Point Wise Matrix Operations)
    end for
    if Convergence exit
end for
16. Example Images – Phase Unwrapping
[Figure panels: Wrapped Phase Data; Unwrapped Phase Data. Images of Glass Bead and water]
17. Example Images – Mouse Embryo
[Figure panels: Wrapped Phase Data; Unwrapped Phase Data]
18. MATLAB External Interface (MEX)
MEX produces linked objects from C code
MATLAB present in our system due to the frame grabbers
Gateway function in C makes MATLAB data (mxArray) visible to our linked C code
[Figure: OQM Processing Control Flow. Nodes: Frame Grabber; Image Acquisition Toolbox; MATLAB Legacy Interfaces; decision "Affine?" (Yes / No); Affine Transform Wrapper (C) Using MEX; Phase Unwrapping Wrapper (C) Using MEX; Affine Transform & Noise Removal GPU Code (CUDA); Phase Unwrapping GPU code (CUDA); Save]
19. Performance (Phase Unwrapping)
                   Baseline     GPU          GPU        MEX and GPU   MEX & GPU
                   Time (sec)   Time (sec)   Speedup    Time (sec)    Speedup
Preconditioning    11.17        1.2          9.3X       1.2           9.3X
Conjugate Grad.    2.89         0.55         5.25X      0.55          5.25X
IO Activity        0.7          0.7          1X         0.02          NA
Miscellaneous      0.79         0.51         1.53X      0.41          1.92X
Total for Unwrap   15.55        2.97         5.24X      2.16          7.20X
Baseline: Serial implementation using FFTW for the Fourier Transform
21. Future Work
Study implementation of Conjugate Gradient
Study scatter operations in CUDA
Present literature shows improvements only for larger data sizes (may be better on G200)
Multi-GPU version for imaging scenarios where images may be grabbed at faster rates
Presently only one image at a time acquired.
Phase Unwrapping also required for applications like Synthetic Aperture Radar (SAR)
22. Conclusions
Implemented the computationally intensive Minimum Lp Norm Phase Unwrapping and Affine Transforms on the GPU
Not a batch-oriented process
Latency was critical, single image used at a time
Reducing latency throughout the system improves quality of research and time to discovery in Optical Science Laboratory
23. Thank You
Questions?
Perhaad Mistry
(pmistry@ece.neu.edu)