What is H.264?

• Video compression standard

• Official name: Advanced Video Coding (AVC) for generic
  audiovisual services
   o Also known as MPEG-4 Part 10 or MPEG-4 AVC
• It's in your iPod
   o Current generation standardized format
   o Compression efficiency: H.264 is much higher than XviD and DivX
How H.264 Compresses Video

[Figure: Frames 1 to 5 of the Foreman sequence (QCIF @ 25 fps), annotated to show spatial redundancy within a frame and temporal redundancy between consecutive frames]

    • Three redundancy reduction principles:
       1. Spatial redundancy (Intra-frame prediction)
       2. Temporal redundancy (Inter-frame prediction)
       3. Entropy coding (Mapping more common symbols to shorter codes)
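
The third principle can be illustrated with a classic Huffman code. This is only a sketch of the idea of giving common symbols shorter codes: H.264 itself uses the CAVLC and CABAC entropy coders, and the `residuals` list below is made-up sample data, not output from a real encoder.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)   # two least frequent subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical prediction residuals: mostly zeros, a few small values.
residuals = [0] * 12 + [1] * 4 + [-1] * 3 + [5]
code = huffman_code(residuals)
encoded = "".join(code[r] for r in residuals)
print(code, len(encoded))
```

With four distinct symbols a fixed-length code would need 2 bits per symbol (40 bits for these 20 values); the variable-length code spends 1 bit on the dominant symbol and needs fewer bits overall.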
Simple Video Encoder
Intra-frame Prediction
• Prediction block is formed from previously encoded blocks in
  the same frame
• Use spatial similarities to compress each frame
   o Use neighboring pixels to make a prediction on a block
   o Transmit the difference between actual and predicted
   o Tradeoff: prediction accuracy vs. # control bits
• Compression efficiency is relatively low in most areas of a
  typical scene

• Relatively low computation cost




[Figure: frame divided into 16x16 macroblocks (MBs)]
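
A minimal sketch of the "predict from neighbors, transmit the difference" idea, assuming the simplest mode (DC prediction: every pixel of the block is predicted as the mean of the pixels just above and just to the left). The function names, the 4x4 block size, and the synthetic `frame` are illustrative assumptions; a real encoder tries several directional modes and works from reconstructed, not original, neighbors.

```python
def dc_predict(frame, top, left, size=4):
    """Predict a size x size block as the mean of the pixels above it and to its left."""
    neighbors = []
    if top > 0:
        neighbors += frame[top - 1][left:left + size]
    if left > 0:
        neighbors += [frame[top + r][left - 1] for r in range(size)]
    if neighbors:
        dc = (sum(neighbors) + len(neighbors) // 2) // len(neighbors)  # rounded mean
    else:
        dc = 128  # no neighbors available: mid-gray for 8-bit video
    return [[dc] * size for _ in range(size)]

def residual(frame, top, left, pred):
    """Difference between the actual block and its prediction (what gets transmitted)."""
    size = len(pred)
    return [[frame[top + r][left + c] - pred[r][c] for c in range(size)]
            for r in range(size)]

# Synthetic 8x8 frame: a smooth horizontal gradient.
frame = [[100 + c for c in range(8)] for _ in range(8)]
pred = dc_predict(frame, 4, 4)
res = residual(frame, 4, 4, pred)
print(pred[0][0], res[0])
```

The residual values are much smaller than the raw pixel values, which is what makes them cheaper to entropy-code.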
Inter-frame Prediction

• Temporal locality
• Use previous frame as prediction for current frame
• Record movements
   o "motion vectors" (MVs)
Motion Vectors
Motion Estimation Algorithms

• Block Matching
   o 16 pixel x 16 pixel macroblocks
   o Estimate the movement of each macroblock
• Phase Correlation
   o Perform the search in the frequency domain
   o Only works well for translational motion
• Bayesian methods
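
Block matching can be sketched as an exhaustive (full) search that minimizes the sum of absolute differences (SAD). The names `sad` and `full_search`, the small 12x12 frame, and the 4x4 block are assumptions made for illustration; the motion vector is reported as the offset from the current block to its best match in the reference frame.

```python
def sad(cur, ref, cy, cx, ry, rx, n):
    """Sum of absolute differences between an n x n block of cur and one of ref."""
    return sum(abs(cur[cy + r][cx + c] - ref[ry + r][rx + c])
               for r in range(n) for c in range(n))

def full_search(cur, ref, top, left, n=16, search_range=16):
    """Try every offset within +/- search_range; return (best_sad, dy, dx)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + n > h or rx + n > w:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, top, left, ry, rx, n)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

# Synthetic reference frame: every pixel holds its raster index, so each block is unique.
ref = [[12 * y + x for x in range(12)] for y in range(12)]
# Current frame: the same content moved down 1 pixel and right 2 pixels.
cur = [[ref[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)]
       for y in range(12)]
best = full_search(cur, ref, top=4, left=4, n=4, search_range=3)
print(best)
```

Content that moved down and to the right is found up and to the left in the reference frame, hence the negative offsets.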
[Figure: Frame 1 (reference) and Frame 2 (current), with the macroblock to be coded marked. Between the two frames the tree moved down and to the right, and the people moved farther to the right than the tree.]
Big (Computational) Problem
• HD video: 1080p (1920×1080) = 120 × 68 = 8,160 macroblocks
• Search window: how far we search for the best-matching block
  o   Normally ±16 pixels; sometimes ±32 pixels
  o   (2*16+1)*(2*16+1) = 1089 candidate positions per macroblock
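
The numbers above can be reproduced with a few lines of arithmetic. The helper names are illustrative; the last two figures assume a full search that evaluates one 16x16 SAD at every candidate position of every macroblock.

```python
def macroblock_count(width, height, mb=16):
    """Macroblocks needed to cover the frame; partial blocks at the edges count as whole."""
    cols = -(-width // mb)   # ceiling division
    rows = -(-height // mb)
    return cols * rows

def search_positions(search_range):
    """Candidate positions in a square window of +/- search_range pixels."""
    return (2 * search_range + 1) ** 2

mbs = macroblock_count(1920, 1080)
positions = search_positions(16)
sad_evaluations = mbs * positions            # per frame, full search
pixel_differences = sad_evaluations * 16 * 16
print(mbs, positions, sad_evaluations, pixel_differences)
# prints: 8160 1089 8886240 2274877440
```

Under these assumptions a single 1080p frame costs roughly 2.3 billion absolute differences.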




[Figure: the ME block in the current frame and its search space in the reference frame]
Profiling Results

• Motion estimation (ME) dominates the encoding time!




[Figure: profiling results from the JM H.264 reference code]
Amdahl's Law

• Limits the overall speedup
• Eventually, the speedup is limited by the unparallelized portion of
  the code
   o An already optimized ME implementation (like x264) leaves ME a
     smaller share of the runtime, so parallelizing it generally
     yields a lower overall speedup
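
Amdahl's law in numbers, as a sketch. The ME fractions (90% and 50%) and the kernel speedup of 56 are hypothetical inputs chosen for illustration, not measurements of any particular encoder.

```python
def amdahl_speedup(parallel_fraction, kernel_speedup):
    """Overall speedup when only `parallel_fraction` of the runtime is accelerated."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / kernel_speedup)

# Unoptimized encoder where ME is assumed to take 90% of the time.
slow_me = amdahl_speedup(0.90, 56)
# Optimized encoder where ME is assumed to take only 50% of the time.
fast_me = amdahl_speedup(0.50, 56)
print(round(slow_me, 2), round(fast_me, 2))
```

Even an infinitely fast ME kernel cannot push the overall speedup past 1 / (1 - parallel_fraction): 10x in the first case, 2x in the second.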
Previous Implementations

• x264
   o CPU
   o Open source
   o C and hand-coded assembly
   o VERY optimized
       MMX, SSE2, SSE3, SSE4
   o Considered the fastest implementation of H.264
   o Multithreaded (pthread support)
   o Still slow! Slower than last-generation encoders.
In CUDA
     • Several published articles implement an H.264 encoder
       in CUDA
     • All of them target ME for parallelization
     • An example*
        o ME = 5 kernels
        o Full-search (i.e., unoptimized ME)
        o Sub-pel MV support
        o Sub-partition support




* Wei-Nien Chen; Hsueh-Ming Hang, "H.264/AVC motion estimation implementation on Compute Unified Device Architecture (CUDA)," Multimedia and Expo, 2008
IEEE International Conference on, pp. 697-700, June 23-26, 2008.
Problems with Previous Work

• Do not address inter-block dependencies
  o Sacrifice quality for parallelizability (i.e. speed)




[Figure: MVp (motion vector predictor) dependencies]
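
Where the dependency comes from, as a sketch: in the common case H.264 predicts a block's motion vector (MVp) as the component-wise median of the vectors of its left, top, and top-right neighbors, and only the difference from that prediction is coded. The function name and the sample vectors are illustrative, and the special cases of the standard (unavailable neighbors, 16x8 and 8x16 partitions) are left out.

```python
def median_mvp(mv_left, mv_top, mv_topright):
    """Component-wise median of three neighboring motion vectors (x, y)."""
    return tuple(sorted(components)[1]
                 for components in zip(mv_left, mv_top, mv_topright))

# Hypothetical neighbor vectors.
mvp = median_mvp((4, -2), (6, 0), (-8, 1))
# The vector actually chosen for the current block, and the difference that is coded.
mv = (5, 1)
mvd = (mv[0] - mvp[0], mv[1] - mvp[1])
print(mvp, mvd)
```

Because MVp needs the final vectors of three neighbors, a block cannot know its true coding cost until those neighbors are done, which is what blocks naive per-macroblock parallelism.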
Our Project

• H.264 specifies how the decoder will work
   o Flexibility in encoder
       e.g. other CUDA implementations
• Solve motion estimation problem in parallel
   1. Deal with the dependency between blocks
   2. Make the best guess of MVp (the predicted motion vector)
Direct Approach: Wavefront
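
A sketch of wavefront ordering, assuming each macroblock depends on its left, top, and top-right neighbors. Under that assumption all blocks with the same value of x + 2*y are independent and can be processed together. The function name is illustrative.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblock coordinates (x, y) into waves that can run in parallel."""
    waves = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            # Left, top and top-right neighbors all have a smaller x + 2*y.
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[k] for k in sorted(waves)]

small = wavefront_schedule(4, 3)
for step, wave in enumerate(small):
    print(step, wave)

# 1080p: 120 x 68 macroblocks.
hd = wavefront_schedule(120, 68)
print(len(hd), max(len(wave) for wave in hd))
```

For 1080p this gives 254 sequential steps with at most 60 macroblocks in flight at once, out of 8,160, so the direct approach exposes limited parallelism.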
Our Approach: Pyramid ME

• Also known as "Hierarchical" ME
• Perform ME at a number of resolutions, from coarsest to finest
   o Use the MV found at the coarser (higher) pyramid level as an
     estimate of the MVp at the next finer (lower) level
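
A coarse-to-fine search can be sketched as follows. Everything here is an illustrative assumption rather than the project's implementation: 2x2 averaging for downsampling, two extra pyramid levels, a ±2 search at every level, and a synthetic frame whose pixels hold their raster index so that every match is unique.

```python
def downsample2(img):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]

def search(cur, ref, top, left, n, center, radius):
    """Best (sad, dy, dx) within +/- radius of the offset `center`."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + n > h or rx + n > w:
                continue
            cost = sum(abs(cur[top + r][left + c] - ref[ry + r][rx + c])
                       for r in range(n) for c in range(n))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best

def pyramid_me(cur, ref, top, left, n=8, levels=2, radius=2):
    """Estimate a motion vector coarse-to-fine over `levels` half-resolution levels."""
    pyramid = [(cur, ref)]
    for _ in range(levels):
        c, r = pyramid[-1]
        pyramid.append((downsample2(c), downsample2(r)))
    mv = (0, 0)
    for level in range(levels, -1, -1):
        c, r = pyramid[level]
        scale = 2 ** level
        _, dy, dx = search(c, r, top // scale, left // scale,
                           max(n // scale, 1), mv, radius)
        # Double the vector to seed the next finer level; keep it as-is at full resolution.
        mv = (2 * dy, 2 * dx) if level > 0 else (dy, dx)
    return mv

ref = [[32 * y + x for x in range(32)] for y in range(32)]
# Current frame: content moved down 4 pixels and right 8 pixels.
cur = [[ref[y - 4][x - 8] if y >= 4 and x >= 8 else 0 for x in range(32)]
       for y in range(32)]
mv = pyramid_me(cur, ref, top=16, left=16)
print(mv)
```

Three levels of 25 candidates each (75 SADs, most on smaller blocks) recover a displacement that a single-level search could only reach with a window of at least ±8, that is 289 candidates.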
[Figure labels: Motion Vector; Sub-sampled 16x]
Using Pyramid ME to Solve MVp Problem
Our Prototyping Framework

• Originally MATLAB + nvmex
• Now pyCUDA + matplotlib
• Motivation
  o Simplicity
  o Flexibility (output images, graphs, etc.)
  o pyCUDA == awesome
  o Automatic tuning in the future
Our Prototyping Framework
Our CUDA Implementation

• CUDA + C
• One kernel per level of the hierarchy
• One block per macroblock
• One thread per search position
   o With the 512-threads-per-block limit, search window size <= 11
   o Can perform argmin reduction to find the best MV
• Texture memory for reference and current frame
   o Allows for sub-pixel interpolation
   o Handles border clamping
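
The argmin reduction can be modeled on the CPU as below. This is a Python simulation of the usual shared-memory tree reduction, not the project's kernel: each pass of the inner loop stands in for one thread, and the 9x9 window and the cost function are hypothetical stand-ins for per-position SAD values.

```python
def argmin_reduce(costs):
    """Tree reduction returning (index, value) of the smallest cost."""
    n = 1
    while n < len(costs):
        n *= 2
    # Pad to a power of two with +inf so every thread has a partner.
    vals = list(costs) + [float("inf")] * (n - len(costs))
    idx = list(range(n))
    stride = n // 2
    while stride > 0:
        for t in range(stride):  # on a GPU these iterations run in parallel
            if vals[t + stride] < vals[t]:
                vals[t], idx[t] = vals[t + stride], idx[t + stride]
        stride //= 2             # a real kernel would synchronize threads here
    return idx[0], vals[0]

# One cost per search position in a hypothetical +/-4 window (81 positions).
side = 9
costs = [abs(dy - 3) + abs(dx + 2)
         for dy in range(-4, 5) for dx in range(-4, 5)]
best_idx, best_cost = argmin_reduce(costs)
row, col = divmod(best_idx, side)
best_mv = (row - 4, col - 4)
print(best_idx, best_cost, best_mv)
```

The reduction takes log2(128) = 7 steps for the padded 81 positions instead of 80 sequential comparisons.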
Results

Implementation   Time
Gold             203.3 msec
CUDA             3.6 msec      (speedup over Gold = 56x)
x264             11.6 msec

• It is not appropriate to compare the CUDA time to the x264 time.
• x264 performs a more accurate search.
   o The CUDA implementation will be made more accurate in
     the future.
   o We implemented only a small subset of the ME features
Conclusions

• H.264 ME in CUDA is viable, but will not be easy
   o Competing against very well written CPU code
• Full encoding process of H.264 is very complicated
   o Complex control flow and data dependencies
Future Work

• Improve estimate for MVp
• Pipeline data transfers
• Downsample on GPU vs. CPU
   o Data access concerns
• Process multiple frames together
   o Improve occupancy
• More than ME in CUDA
   o More dependency constraints
CUDA as a Development Framework

• Opened up GPU
   o Took less than a month!
• Documentation is sparse
• Right way isn't always known
• Debugging is a pain
• Emulation mode is VERY slow
• CUDA servers can become locked and need rebooting
Acknowledgements

Dark_Shikari (x264 dev)
Various other people in #x264 channel @ Freenode.net
H.264 Encoder Block Diagram

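[Figure: H.264 encoder block diagram. Blocks: Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Intra/Inter Mode Decision, Motion Compensation, Intra Prediction, Picture Buffering, Deblocking Filter, Motion Estimation. The block prediction is subtracted from the input before the transform and added back after the inverse transform.]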
References

Richardson, Iain E. G. (2003). H.264 and MPEG-4 Video Compression: Video Coding for Next-generation
Multimedia. Chichester: John Wiley & Sons Ltd.

Wei-Nien Chen; Hsueh-Ming Hang, "H.264/AVC motion estimation implementation on Compute Unified
Device Architecture (CUDA)," Multimedia and Expo, 2008 IEEE International Conference on, pp. 697-700,
June 23-26, 2008.

S Ryoo, CI Rodrigues, SS Baghsorkhi, SS Stone, DB. "Optimization Principles and Application Performance
Evaluation of a Multithreaded GPU Using CUDA," 2008.

http://www.cs.cf.ac.uk/Dave/Multimedia/node256.html

http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0405/ZAMPOGLU/Hierarchicalestimation.html

H 264 in cuda presentation

  • 1. What is H.264? • Video compression standard • Official name: Advanced Video Coding (AVC) for generic audiovisual services o aka: MPEG-4/Part 10 or MPEG-4 AVC • It's in your iPod o Current generation standardized format o Compression efficiency: H.264 >> XviD and DivX
  • 2. How H.264 Compresses Video Frame 1 Frame 2 Frame 3 Frame 4 Frame 5 Spatial Temporal <Source: Foreman, QCIF @ 25 fps> Redundancy Redundancy • Three redundancy reduction principles: 1. Spatial redundancy (Intra-frame prediction) 2. Temporal redundancy (Inter-frame prediction) 3. Entropy coding (Mapping more common symbols to shorter codes)
  • 4. Intra-frame Prediction • Prediction block is formed from previously encoded blocks in the same frame • Use spatial similarities to compress each frame o Use neighboring pixels to make a prediction on a block o Transmit the difference between actual and predicted o Tradeoff: prediction accuracy vs. # control bits • Compression efficiency is relatively low in most areas of a typical scene • Relatively low computation cost Divide into 16x16 macroblocks (MBs)
  • 5. Inter-frame Prediction • Temporal locality • Use previous frame as prediction for current frame • Record movements o "motion vectors" (MVs)
  • 7. Motion Estimation Algorithms • Block Matching o 16 pixel x 16 pixel macroblocks o Estimate the movement of each macroblock • Phase Correlation o Perform the search in the frequency domain o Only works well for translational motion • Bayesian methods
  • 8. tree moved down people moved farther to and to the right the right than tree Frame 1 (reference) Frame 2 (current) Macroblock to be coded
  • 9. Big (Computational) Problem • HD video, 1080p (1920×1080) = 8,160 macroblocks • Search window: how far we search for the original block o Normally 16 pixels; sometimes 32 pixels o (2*16+1)*(2*16+1) = 1089 positions [Figure labels: ME block, Current Frame, Reference Frame, Search Space]
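The figures on this slide can be reproduced with a few lines of arithmetic. The 8,160 count matches rounding the frame height up to a whole number of macroblock rows (1080 is not a multiple of 16). The per-frame totals at the end are a back-of-envelope estimate that assumes a full search with one SAD per candidate position, and do not appear on the slides.

```python
import math

width, height, mb_size = 1920, 1080, 16

mbs_per_row = math.ceil(width / mb_size)   # 120
mb_rows = math.ceil(height / mb_size)      # 68 (height padded from 1080 to 1088)
macroblocks = mbs_per_row * mb_rows        # 8160

search_range = 16
positions = (2 * search_range + 1) ** 2    # 33 * 33 = 1089

# Assumption: full search, one SAD over 16x16 pixels per candidate position.
sad_evaluations = macroblocks * positions          # 8,886,240 per frame
pixel_differences = sad_evaluations * mb_size ** 2 # 2,274,877,440 per frame

print(macroblocks, positions, sad_evaluations, pixel_differences)
```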
  • 10. Profiling Results • Motion estimation (ME) dominates the encoding time! [Figure: profiling results from the JM H.264 reference code]
  • 11. Amdahl's Law • Limits the overall speedup • Eventually, the speedup is limited by the unparallelized portion of the code o An already-optimized ME implementation (like x264) leaves ME a smaller share of the total runtime, so parallelizing it generally yields a lower overall speedup
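Amdahl's law gives the overall speedup as 1 / ((1 - p) + p / s), where p is the fraction of runtime that is accelerated and s is the speedup of that fraction. The sketch below evaluates it for two hypothetical ME fractions. These fractions are illustrative assumptions and not the profiling results from the deck.

```python
def amdahl_speedup(parallel_fraction: float, section_speedup: float) -> float:
    """Overall speedup when `parallel_fraction` of the runtime is sped up by `section_speedup`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if section_speedup <= 0:
        raise ValueError("section_speedup must be positive")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / section_speedup)

# Hypothetical: ME is 90% of runtime (unoptimized encoder) vs. 50% (optimized encoder),
# and the GPU makes ME itself 10x faster in both cases.
unoptimized = amdahl_speedup(0.9, 10.0)   # about 5.26x overall
optimized = amdahl_speedup(0.5, 10.0)     # about 1.82x overall
print(round(unoptimized, 2), round(optimized, 2))
```

The same 10x ME speedup helps far less when ME is a smaller share of the runtime, which is the point the slide makes about x264.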
  • 12. Previous Implementations • x264 o CPU o Open source o C and hand-coded assembly o VERY optimized: MMX, SSE2, SSE3, SSE4 o Considered the fastest implementation of H.264 o Multithreaded (pthread support) o Still slow! Slower than last-generation encoders.
  • 13. In CUDA • Several published articles have implemented an H.264 encoder in CUDA. • All of them target ME for parallelization • An example* o ME = 5 kernels o Full-search (i.e., unoptimized ME) o Sub-pel MV support o Sub-partition support * Wei-Nien Chen; Hsueh-Ming Hang, "H.264/AVC motion estimation implementation on Compute Unified Device Architecture (CUDA)," Multimedia and Expo, 2008 IEEE International Conference on, pp.697-700, June 23-26, 2008.
  • 14. Problems with Previous Work • Do not address inter-block dependencies o Sacrifice quality for parallelizability (i.e. speed) [Figure: dependencies of the predicted motion vector (MVp)]
  • 15. Our Project • H.264 specifies how the decoder will work o Flexibility in encoder: e.g. other CUDA implementations • Solve motion estimation problem in parallel 1. Deal with the dependency between blocks 2. Best guess of MVp
  • 17. Our Approach: Pyramid ME • Also known as "Hierarchical" ME • Perform ME at a number of resolutions, in increasing order of resolution (coarse to fine) o Use the MV found at the higher (coarser) level as an estimate of the MVp at the lower (finer) level
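The coarse-to-fine idea can be sketched in pure Python: downsample both frames, search at the coarsest level, then double the motion vector and refine it with a small search at the next finer level. This is an illustration under assumed choices (2x2 averaging for downsampling, SAD as the cost, the names `downsample`, `search`, `pyramid_me`) and is not the authors' CUDA implementation.

```python
import random

def downsample(img):
    """Halve the resolution by averaging each 2x2 pixel group."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) // 4
             for x in range(0, len(img[0]) - 1, 2)]
            for y in range(0, len(img) - 1, 2)]

def search(cur, ref, top, left, size, center, radius):
    """Return the (dy, dx) with minimum SAD within +/-radius of the candidate `center`."""
    h, w = len(ref), len(ref[0])
    best_cost, best_mv = None, center
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                continue  # candidate block falls outside the reference frame
            cost = sum(abs(cur[top + i][left + j] - ref[ry + i][rx + j])
                       for i in range(size) for j in range(size))
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv

def pyramid_me(cur, ref, top, left, size=8, levels=2, radius=2):
    """Estimate a motion vector coarse-to-fine; each level seeds the next finer one."""
    pyramid = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyramid[-1]
        pyramid.append((downsample(c), downsample(r)))
    mv = (0, 0)
    for level in range(levels - 1, -1, -1):   # coarsest level first
        c, r = pyramid[level]
        scale = 2 ** level
        mv = search(c, r, top // scale, left // scale, max(size // scale, 1), mv, radius)
        if level > 0:
            mv = (mv[0] * 2, mv[1] * 2)       # rescale the MV for the finer level
    return mv

# Hypothetical test data: a random texture that moves down 4 and right 4.
rng = random.Random(0)
ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
cur = [[ref[max(y - 4, 0)][max(x - 4, 0)] for x in range(32)] for y in range(32)]
mv = pyramid_me(cur, ref, top=16, left=16)
print(mv)  # (-4, -4)
```

A single-level search with the same radius of 2 could not reach an offset of 4 pixels. The coarse level covers it because each coarse pixel spans two fine pixels.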
  • 19. Using Pyramid ME to Solve MVp Problem
  • 20. Our Prototyping Framework • Originally MATLAB + nvmex • Now pyCUDA + matplotlib • Motivation o Simplicity o Flexibility (output images, graphs, etc.) o pyCUDA == awesome o Automatic tuning in the future
  • 22. Our CUDA Implementation • CUDA + C • One kernel / level of hierarchy • One block per macroblock • One thread per search position o With 512 thread limit, search window size <= 11 o Can perform argmin reduction to find the best MV • Texture memory for reference and current frame o Allows for sub-pixel interpolation o Handles border clamping
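The argmin reduction mentioned on this slide can be emulated sequentially in Python. Each "thread" holds the cost of one search position, and pairs are combined with a halving stride until slot 0 holds the minimum cost and its position index. This is a sketch of the general tree-reduction pattern, not the authors' kernel. The function names and the row-major mapping from position index to motion vector are assumptions.

```python
def argmin_reduce(costs):
    """Tree-style argmin: return (index, cost) of the smallest entry."""
    if not costs:
        raise ValueError("costs must not be empty")
    n = 1
    while n < len(costs):
        n *= 2                                   # pad to a power of two
    vals = list(costs) + [float("inf")] * (n - len(costs))
    idxs = list(range(n))
    stride = n // 2
    while stride > 0:
        for t in range(stride):                  # on a GPU these run as parallel threads
            if vals[t + stride] < vals[t]:
                vals[t], idxs[t] = vals[t + stride], idxs[t + stride]
        stride //= 2                             # a GPU would synchronize threads here
    return idxs[0], vals[0]

def index_to_mv(index, search_range):
    """Map a row-major search-position index to (dy, dx); the layout is an assumption."""
    side = 2 * search_range + 1
    return index // side - search_range, index % side - search_range

# Hypothetical costs for a 3x3 search window (search_range = 1).
costs = [9, 4, 7, 1, 8, 6, 5, 3, 2]
best_index, best_cost = argmin_reduce(costs)
print(best_index, best_cost, index_to_mv(best_index, 1))  # 3 1 (0, -1)
```

With N search positions the reduction takes about log2(N) steps, as opposed to the N - 1 comparisons a single thread would need.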
  • 23. Results • Gold: 203.3 msec • CUDA: 3.6 msec (speedup over Gold = 56) • x264: 11.6 msec • Not appropriate to compare the CUDA time to the x264 time. • x264 is performing a more accurate search. o The CUDA implementation will be made more accurate in the future. o We implemented a small subset of the ME features
  • 24. Conclusions • H.264 ME in CUDA is viable, but will not be easy o Competing against very well written CPU code • Full encoding process of H.264 is very complicated o Complex control flow and data dependencies
  • 25. Future Work • Improve estimate for MVp • Pipeline data transfers • Downsample on GPU vs. CPU o Data access concerns • Process multiple frames together o Improve occupancy • More than ME in CUDA o More dependency constraints
  • 26. CUDA as a Development Framework • Opened up GPU o Took less than a month! • Documentation is sparse • Right way isn't always known • Debugging is a pain • Emulation mode is VERY slow • CUDA servers can become locked and need rebooting
  • 27. Acknowledgements Dark_Shikari (x264 dev) Various other people in #x264 channel @ Freenode.net
  • 28. H.264 Encoder Block Diagram [Figure: encoder block diagram. Blocks: Video Input, Transform & Quantization, Entropy Coding, Bitstream Output, Inverse Quantization & Inverse Transform, Intra/Inter Mode Decision, Motion Compensation, Intra Prediction, Picture Buffering, Deblocking Filter, Motion Estimation. Signal label: Block prediction]
  • 29. References • Richardson, Iain E. G. (2003). H.264 and MPEG-4 Video Compression: Video Coding for Next-generation Multimedia. Chichester: John Wiley & Sons Ltd. • Wei-Nien Chen; Hsueh-Ming Hang, "H.264/AVC motion estimation implementation on Compute Unified Device Architecture (CUDA)," Multimedia and Expo, 2008 IEEE International Conference on, pp.697-700, June 23-26, 2008. • S Ryoo, CI Rodrigues, SS Baghsorkhi, SS Stone, DB. "Optimization Principles and Application Performance Evaluation of a Multithreaded GPU Using CUDA," 2008. • http://www.cs.cf.ac.uk/Dave/Multimedia/node256.html • http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/AV0405/ZAMPOGLU/Hierarchicalestimation.html