CUDA Speed Up for Side Information Generation In
Distributed Video Coding
Ping-Shang Wang
National Taiwan University
R98944043@ntu.edu.tw
Kan-Han Lu
National Taiwan Normal University
698470271@ntnu.edu.tw
Cong-Min Huang
National Taiwan University
R98548012@ntu.edu.tw
ABSTRACT
Distributed Video Coding (DVC) has become increasingly popular
among video coding researchers because of its attractive and
promising features. In contrast to conventional video codecs, DVC
shifts the complexity balance between the encoder and the decoder.
However, most reported DVC schemes have a high decoding delay,
which hinders their practical application in real-time systems. In
this work, we focus on speeding up the Side Information (SI)
generation module, which is a major function of the DVC coding
algorithm and one of the most time-consuming parts of the decoder.
We implement it with the Compute Unified Device Architecture
(CUDA) on a General-Purpose Graphics Processing Unit (GPGPU).
Experimental results show that the proposed parallelized SI
generation algorithm achieves a considerable speedup (about 9.5
times on average for the whole module).
Keywords
Distributed video coding, Wyner-Ziv video coding, Side
information, frame interpolation, CUDA.
1. INTRODUCTION
Distributed video coding (DVC) has recently attracted a great deal
of attention from the video coding community. This coding paradigm
is also known as Wyner-Ziv (WZ) video coding and is based on the
Slepian-Wolf [1] and Wyner-Ziv [2] theorems. These theorems mainly
state that two correlated sources, X and Y, can be encoded
separately and decoded jointly at the same minimum rate as with the
joint encoding and decoding used in conventional video coding. A
DVC codec departs from the traditional prediction-based video
coding scheme by exploiting the source statistics at the decoder,
which allows simpler encoders. That is, the complexities of the
encoder and decoder are reversed: the encoder becomes fairly simple
and leaves the computationally expensive processing to the decoder.
This is done by shifting motion estimation and compensation from
the encoder to the decoder. In contrast to conventional coders,
motion estimation is thus performed only at the decoder. It is used
to generate a motion-compensated prediction Y of the original frame
X, called side information (SI). The SI may be seen as a
“corrupted” version of the original information, and
error-correcting codes (LDPC or Turbo codes) are typically used to
improve its quality until a target quality for the final decoded
frame is reached.
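For reference, the lossless (Slepian-Wolf) part of the statement above can be written as a rate region. This is the standard textbook form rather than a formula taken from this paper; R_X and R_Y denote the rates of the two separate encoders and H the entropy:

```latex
R_X \ge H(X \mid Y), \qquad
R_Y \ge H(Y \mid X), \qquad
R_X + R_Y \ge H(X, Y)
```

The sum-rate bound H(X, Y) is the same as for joint encoding, which is why separate encoding need not cost extra rate when decoding is joint.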
These features are useful in several application domains, e.g.
video conferencing with mobile devices, wireless video cameras and
low-power wireless surveillance. However, most reported DVC
architectures share a common problem: high decoding complexity,
which prevents their use in real-time video applications. The
complexity arises mainly from two factors: the iterative LDPC (or
Turbo) decoding process with a feedback channel, and the motion
estimation procedure in Side Information (SI) generation. To obtain
a solution better suited to practical applications, new ideas have
been proposed to amend or optimize the decoder structure. However,
SI generation plays a key role in the performance of the codec, and
the reconstructed video quality is sensitive to the side
information. Reducing the cost of the motion search for faster
generation commonly causes a sharp decrease in PSNR. Likewise, a
sufficient number of channel decoding iterations is needed to
guarantee bit accuracy, which is also critical to video quality.
For these reasons, instead of removing computing steps, we adopt a
parallel approach that achieves a faster implementation while
keeping the complete computation. As Graphics Processing Units
(GPUs) become more powerful and widespread, they are finding
broader application in scientific and general-purpose computation.
Our proposed parallel approach uses only a low-end NVIDIA GPU to
significantly reduce the time needed for SI generation while
keeping the same SI quality.
The paper is organized as follows. Section 2 introduces the DVC
codec we used [3] and discusses its SI generation module. Section 3
presents the proposed parallel approaches to SI generation. Section
4 reports the experimental results. Finally, section 5 concludes
the paper.
2. DISCOVER Codec
Figure 1 - DISCOVER codec architecture
2.1 Introduction
The DISCOVER WZ video codec, developed by a European project
funded under the European Commission IST FP6 programme, is based
on the early Stanford WZ video coding architecture proposed in
[4, 5]; further information may be obtained from [3, 6]. Its
architecture is illustrated in Figure 1. The DISCOVER codec is
probably the most efficient WZ video codec now available. Its
performance is reported in detail, with the corresponding test
conditions, in [6]; moreover, executable code may be downloaded,
allowing researchers to compare performance for other sequences
and conditions as well.
In this work, we parallelized only the SI generation module using
GPGPU; the remaining modules are unchanged. Given the scope of this
work, we describe only the SI generation module. For details of the
other modules, see [5].
2.2 SI Generation Module in DISCOVER
Codec
The following techniques [7][8] are used to obtain high quality
side information. Fig. 2 shows the architecture proposed for the
frame interpolation scheme. First, forward motion estimation from
Xb to Xf is performed. A block matching based on a modified
MAD (mean absolute difference) criterion is used in order to
regularize the motion vector field, which favors motion vectors
closer to the origin. Then, bidirectional motion estimation is
performed in order to find symmetric motion vectors from the
current WZ frame to Xb and Xf. Spatial motion smoothing based
on a weighted vector median filter is applied afterwards to the
obtained motion field to remove outliers. Finally, motion
compensation is performed between Xb and Xf along the obtained
motion field, so as to generate the side information. A hierarchical
coarse-to-fine approach is used in the bidirectional motion
estimation: the first iteration corresponds to a large block size
(16×16) and tracks fast motion reliably, while the second iteration
achieves higher precision using a smaller block size (8×8). The
motion search is performed using the half-pixel precision method
described in [9].
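The forward motion estimation step can be sketched as a full-search block matcher whose cost grows with the length of the motion vector. This is a minimal pure-Python sketch, not the DISCOVER implementation: the penalty form MAD * (1 + k * |mv|), the value k = 0.05, and the names regularized_mad and forward_search are assumptions made for illustration, and only integer-pixel displacements are searched.

```python
import math

def regularized_mad(target, ref, top, left, dx, dy, block, k=0.05):
    """MAD between the target block and the displaced reference block,
    scaled by a penalty that favors vectors close to the origin.
    The penalty form and k are assumptions, not taken from the paper."""
    total = 0
    for r in range(block):
        for c in range(block):
            total += abs(target[r][c] - ref[top + dy + r][left + dx + c])
    mad = total / (block * block)
    return mad * (1.0 + k * math.hypot(dx, dy))

def forward_search(xb, xf, top, left, block=16, search=8):
    """Find the displacement (dx, dy) in xb that best matches the block
    of xf whose top-left corner is (top, left)."""
    target = [row[left:left + block] for row in xf[top:top + block]]
    height, width = len(xb), len(xb[0])
    best_cost, best_mv = float("inf"), (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue  # candidate falls outside the reference frame
            cost = regularized_mad(target, xb, top, left, dx, dy, block)
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost
```

Because every candidate is scored independently, the two inner loops are the part that lends itself to the block-level and thread-level parallelism described in section 3.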
Among the processes mentioned above, the most time-
consuming steps are forward motion estimation and FIR filter (for
half-pixel motion estimation), which comprise about 70% and
25% of the entire procedure, respectively. Consequently, we
focused on parallelizing these two parts using GPGPU to reduce
the processing time.
3. PARALLEL APPROACH TO SI GENERATION ON GPGPU PLATFORM
3.1 Parallelized Forward Motion Estimation
The proposed parallel algorithm for forward motion estimation is
implemented on the GPGPU platform using CUDA. We parallelize this
part at block level to incur the least thread overhead. To balance
the workload across cores, we also use the indexes of the WZ blocks
gathered before decoding and allocate almost the same number of
blocks to each core. For simplicity, we illustrate the proposed
approach with an example: the input sequence is QCIF (176x144) and
the block size is 16x16, giving 99 blocks in the target frame.
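The block count and the idea of giving each core almost the same number of blocks can be checked with a few lines of Python. This is only a sketch: the contiguous split and the worker count of 6 used in the usage note are hypothetical, since the paper does not state how blocks are assigned.

```python
def block_grid(width, height, block):
    """Return (columns, rows) of whole blocks that fit in a frame."""
    return width // block, height // block

def partition_blocks(block_ids, n_workers):
    """Split block indices into n_workers contiguous chunks whose
    sizes differ by at most one (an assumed balancing scheme)."""
    base, extra = divmod(len(block_ids), n_workers)
    chunks, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(block_ids[start:start + size])
        start += size
    return chunks
```

For QCIF with 16x16 blocks, block_grid(176, 144, 16) gives an 11 by 9 grid, i.e. 99 blocks; splitting them over 6 hypothetical workers gives chunks of 17, 17, 17, 16, 16 and 16 blocks.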
The parallel processing of the forward motion estimation is shown
in Figure 4. First, we launch a CUDA kernel to compute the motion
field between the past frame and the future frame. Each block in
the future frame maps to its own CUDA block and has 1024-4096
candidate blocks within the search range in the reference frame. We
use 512 threads per CUDA block to compute the cost (modified MAD)
of the candidate blocks in parallel. Each thread keeps its local
minimum cost (among all candidate blocks it processed) and the
corresponding motion vector in shared memory. Hence, when all
threads of a CUDA block are done, we have 512 local minimum costs,
and the next step is to pick the global minimum among them using
the reduction algorithm from [10]. When the kernel finishes, the
resulting motion field is kept in device memory. A second CUDA
kernel then finds, for each block in the interpolated frame, the
corresponding motion vector that is closest to the origin of the
block, and the result is copied back to the host.
The reference frame is transferred to the device and stored in the
GPU's texture memory, which gives faster access when the same
position is read multiple times. Each 16x16 block of the future
frame is stored in shared memory so that every thread in the CUDA
block can access it quickly. The local minimum cost and the
corresponding motion vector of each thread are stored in shared
memory for the same reason. In addition, we unroll loops, avoid
bank conflicts in shared memory, and minimize the number of global
memory accesses.
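The two-stage minimum search described above (per-thread local minima, then a pairwise reduction) can be simulated sequentially. The sketch below is Python rather than CUDA and is not the authors' kernel; the strided assignment of candidates to threads and the function names are assumptions, and the halving-stride loop only mirrors the general shape of a shared-memory reduction.

```python
def thread_local_minima(costs, n_threads):
    """Simulated thread t scans candidates t, t + n_threads, ... and
    keeps its best (cost, candidate index) pair."""
    local = []
    for t in range(n_threads):
        best = (float("inf"), -1)
        for i in range(t, len(costs), n_threads):
            if costs[i] < best[0]:
                best = (costs[i], i)
        local.append(best)
    return local

def tree_reduce_min(local):
    """Pairwise minimum with a halving stride.
    len(local) must be a power of two, e.g. 512."""
    data = list(local)
    stride = len(data) // 2
    while stride > 0:
        # On a GPU these iterations run concurrently, followed by a barrier.
        for t in range(stride):
            if data[t + stride] < data[t]:
                data[t] = data[t + stride]
        stride //= 2
    return data[0]
```

With 4096 candidates and 512 threads, each simulated thread scans 8 candidates and the reduction then needs 9 halving steps instead of a 512-element sequential scan. Comparing (cost, index) tuples breaks ties in favor of the lower candidate index.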
Figure 3 – Reduction Algorithm
3.2 Parallelized FIR Filter
The FIR filter used [9] references several neighboring pixel
locations to interpolate each output pixel and therefore has a high
complexity. We improved the filter performance by
parallelizing the upsampling process at pixel level.
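Pixel-level parallelism works here because each interpolated sample depends only on a small window of input pixels. The sketch below illustrates this for one row, using the 6-tap kernel (1, -5, 20, 20, -5, 1)/32 known from H.264 as a stand-in; the taps of the filter in [9] are not given in this paper, so the kernel, the border clamping and the function names are assumptions.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # assumed stand-in kernel; taps sum to 32

def half_pel(row, x):
    """Interpolate the sample halfway between row[x] and row[x + 1]."""
    last = len(row) - 1
    acc = 0
    for k, tap in enumerate(TAPS):
        idx = min(max(x - 2 + k, 0), last)  # clamp at the row borders
        acc += tap * row[idx]
    value = (acc + 16) >> 5  # divide by 32 with rounding
    return min(max(value, 0), 255)  # keep the result in the 8-bit range

def upsample_row(row):
    """Interleave integer-position and half-pel samples of one row."""
    out = []
    for x in range(len(row)):
        out.append(row[x])
        out.append(half_pel(row, x))  # independent of all other outputs
    return out
```

Since half_pel reads only its own six-pixel window, every output sample could be assigned to a separate GPU thread without any synchronization between threads.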
4. EXPERIMENTAL RESULT
All evaluations were run on a PC with an AMD Athlon 64 X2 5600+
CPU at 2.91GHz (1MB cache) and an NVIDIA GeForce GT220 graphics
card. The test sequences are Foreman, Soccer, Coastguard and Hall
Monitor at QCIF resolution and 15 Hz frame rate; the whole
sequences are used. The GOP size is 2 and the 8th quantization
table (Q=8) is used. To focus the evaluation on the SI generation
module, the times presented here include only the processing time
of each component of SI generation. All times are reported in
milliseconds.
The SI generation processing time for all test sequences is given
in Table 1. On average, we achieve a 14.15 times speedup for
forward motion estimation and a 6.87 times speedup for the FIR
filter. For the entire SI generation procedure, we achieve an
average speedup of 9.46 times.
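The averages quoted above can be rechecked from the CPU and GPGPU times listed in Table 1. The short script below only restates that arithmetic; the dictionary layout and function names are illustrative.

```python
# (CPU ms, GPGPU ms) per sequence, copied from Table 1.
TABLE_1 = {
    "Foreman":      {"fir": (1228, 178), "fme": (8850, 594), "total": (10384, 1083)},
    "Soccer":       {"fir": (1173, 173), "fme": (8911, 593), "total": (10350, 1031)},
    "Coastguard":   {"fir": (1267, 181), "fme": (8769, 705), "total": (10330, 1180)},
    "Hall Monitor": {"fir": (1386, 204), "fme": (9702, 682), "total": (11410, 1205)},
}

def speedup(cpu_ms, gpu_ms):
    """Ratio of CPU time to GPGPU time."""
    return cpu_ms / gpu_ms

def average_speedup(component):
    """Mean of the per-sequence speedups for one component."""
    ratios = [speedup(*row[component]) for row in TABLE_1.values()]
    return sum(ratios) / len(ratios)

for name in ("fme", "fir", "total"):
    print(name, round(average_speedup(name), 2))
```

The per-sequence ratios and their means agree with the 14.15, 6.87 and 9.46 figures to two decimals.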
We also tested the algorithm on a PC with a less powerful CPU and
an NVIDIA graphics card of the same grade, which gave an even
higher speedup (20-24 times); those results are not reported here.
The measured speedup therefore depends strongly on the relative
power of the CPU and the GPU.
5. CONCLUSIONS
In this paper, a parallel algorithm for SI generation in
distributed video coding, based on GPGPU using NVIDIA CUDA, was
proposed. To achieve load balancing and optimal runtime, we
presented a dynamic distribution scheme based on a task tool
model and a threshold searching method. Experimental results
demonstrate that our algorithm is about 10 times faster on average
(9.46 times over the four test sequences) than sequential
processing of the side information module.
6. REFERENCES
[1] D. Slepian and J. Wolf, "Noiseless coding of correlated
information sources," IEEE Trans. Inf. Theory, vol. 19, no. 4,
pp. 471-480, 1973.
[2] A. D. Wyner and J. Ziv, "The rate-distortion function for
source coding with side information at the decoder," IEEE
Trans. Inf. Theory, vol. 22, pp. 1-10, 1976.
Figure 4 - Parallel approach for forward motion estimation (grid of
176/16 x 144/16 blocks, 4096 candidates per block)
[3] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov and
M. Ouaret, "The discover codec: Architecture, techniques
and evaluation," Nov. 2007.
[4] A. A. Aaron, S. Rane, E. Setton, and B. Girod,
"Transform-Domain Wyner–Ziv Codec for Video," Visual
Communications and Image Processing, San Jose, CA,
January 2004.
[5] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero,
"Distributed Video Coding," Proceedings of the IEEE, vol. 93,
no. 1, pp. 71–83, January 2005.
[6] DISCOVER Page,
http://www.img.lx.it.pt/~discover/home.html
[7] J. Ascenso, C. Brites and F. Pereira, "Content Adaptive
Wyner-Ziv Video Coding Driven by Motion Activity," IEEE
International Conference on Image Processing, Atlanta, USA,
October 2006.
[8] J. Ascenso, C. Brites and F. Pereira, "Improving Frame
Interpolation with Spatial Motion Smoothing for Pixel
Domain Distributed Video Coding," 5th EURASIP
Conference on Speech and Image Processing, Multimedia
Communications and Services, Smolenice, Slovak Republic,
July 2005.
[9] S. Klomp, Y. Vatis and J. Ostermann, "Side Information
Interpolation with Sub-pel Motion Compensation for
Wyner-Ziv Decoder," Int. Conf. on Signal Processing and
Multimedia Applications, Setúbal, Portugal, August 2006.
[10] M. Harris, "Optimizing parallel reduction in CUDA,"
NVIDIA Developer Technology, 2007.
Table 1. SI Generation time for test sequences

SI Generation Time (ms)    CPU             GPGPU          CPU/GPGPU

Foreman (74 WZ frames, 76 key frames)
FIR filter                 1228 (11.8%)    178 (16.5%)    6.90
Forward ME                 8850 (85.2%)    594 (54.8%)    14.90
Others                     306 (3%)        311 (28.7%)    -
Total                      10384 (100%)    1083 (100%)    9.59
Average (per WZ frame)     140.32          14.64          9.59

Soccer (74 WZ frames, 76 key frames)
FIR filter                 1173 (11.3%)    173 (16.8%)    6.78
Forward ME                 8911 (86.0%)    593 (57.5%)    15.03
Others                     266 (2.7%)      265 (25.7%)    -
Total                      10350 (100%)    1031 (100%)    10.04
Average (per WZ frame)     139.86          13.93          10.04

Coastguard (74 WZ frames, 76 key frames)
FIR filter                 1267 (12.3%)    181 (15.3%)    7.00
Forward ME                 8769 (84.9%)    705 (59.7%)    12.44
Others                     294 (2.8%)      294 (25.0%)    -
Total                      10330 (100%)    1180 (100%)    8.75
Average (per WZ frame)     139.59          15.95          8.75

Hall Monitor (81 WZ frames, 83 key frames)
FIR filter                 1386 (12.2%)    204 (16.9%)    6.79
Forward ME                 9702 (85.0%)    682 (56.6%)    14.23
Others                     322 (2.8%)      319 (26.5%)    -
Total                      11410 (100%)    1205 (100%)    9.47
Average (per WZ frame)     140.86          14.88          9.47

Efficient Architecture for Variable Block Size Motion Estimation in H.264/AVC
 
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
Reconfigurable CORDIC Low-Power Implementation of Complex Signal Processing f...
 
