SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
Halide
Domain Specific Language
Halide: A Language and Compiler for Optimizing
Parallelism, Locality, and Recomputation in
Image Processing Pipelines
Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris,
Frédo Durand, Saman Amarasinghe.
PLDI '13 Proceedings of the 34th ACM SIGPLAN Conference on Programming
Language Design and Implementation
Recap Halide
● Separate descriptions of ‘algorithm’per output pixel and execution ‘schedule’
Recap Halide
● Separate descriptions of ‘algorithm’per output pixel and execution ‘schedule’
● Halide is aprogramming languageembedded in C++
Recap Halide
● Separate descriptions of ‘algorithm’per output pixel and execution ‘schedule’
● Halide is aprogramming languageembedded in C++
● Compiler based on LLVM/Clang
Recap Halide
● Separate descriptions of ‘algorithm’per output pixel and execution ‘schedule’
● Halide is aprogramming languageembedded in C++
● Compiler based on LLVM/Clang
Parallelism
distribute across threads and
SIMD parallel vector
Locality
compute in tiles interleave
tiles of blurx, blury store blurx
in local cache
Compiling Scheduled Pipelines
Compiling Scheduled Pipelines
1. Lowering and Loop synthesis
2. Bounds Inference
3. Sliding Window Optimization and Storage Folding
4. Flattening
5. Vectorization and Unrolling
6. Back-end Code Generation
Lowering
● Lowering process is synthesizes a single, complete set of loop nests and
allocations, given a Halide pipeline and a fully-specified schedule.
Lowering
● Lowering process is synthesizes a single, complete set of loop nests and
allocations, given a Halide pipeline and a fully-specified schedule.
● Lowering begins from the function defining the out.
○ Lowering then proceeds recursively up the pipeline, from callers to callees(here, from out to
blurx)
Lowering
● Lowering process is synthesizes a single, complete set of loop nests and
allocations, given a Halide pipeline and a fully-specified schedule.
● Lowering begins from the function defining the out.
○ Lowering then proceeds recursively up the pipeline, from callers to callees(here, from out to
blurx)
● Each loop is labeled as being serial, parallel, unrolled, or vectorized,
according to the schedule.
Bounds Inference
● The next stage of lowering generates and injects appropriate definitions for
these variables. Like function lowering, bounds inference proceeds
recursively back from the output.
Sliding Window Optimization and Storage Folding
● If a realization of a function is stored at higher loop level than its computation,
with an intervening serial loop, then iterations of that loop can reuse values
generated by previous iterations.
Sliding Window Optimization and Storage Folding
● If a realization of a function is stored at higher loop level than its computation,
with an intervening serial loop, then iterations of that loop can reuse values
generated by previous iterations.
● This transformations that lets us trade off parallelism (because the intervening
loop must be serial) for reuse (because we avoid recomputing values already
computed by previous iterations.)
Flattenting
● The compiler flattens multi-dimensional loads, stores and allocations into their
single-dimensional equivalent.
Flattenting
● The compiler flattens multi-dimensional loads, stores and allocations into their
single-dimensional equivalent.
● By convention, we always set the stride of the innermost dimension to 1, to
ensure we can perform dense vector loads and stores in that dimension.
Vectorization and Unrolling
● Relace loops of constant size scheduled as vectorized or unrolled with
transformed versions of their loop bodies.
Vectorization and Unrolling
● Relace loops of constant size scheduled as vectorized or unrolled with
transformed versions of their loop bodies.
● Unrolling replaces a loop of size n with n sequential statements performing
each loop iteration in turn.
Vectorization and Unrolling
● Relace loops of constant size scheduled as vectorized or unrolled with
transformed versions of their loop bodies.
● Unrolling replaces a loop of size n with n sequential statements performing
each loop iteration in turn.
● Vectorization completely replaces a loop of size n with a signle statement.
Back-end Code Generation
● We first run a standard constant-folding and dead-code elimination pass on
our IR. Which also performs symbolic simplification of common patterns
produced by bounds inference.
Back-end Code Generation
● We first run a standard constant-folding and dead-code elimination pass on
our IR. Which also performs symbolic simplification of common patterns
produced by bounds inference.
● There is mostly a one-to-one mapping between our representation and
LLVM’s, but two specific patterns warrant mention.
○ First, parallel for loops are lowered to LLVM code that first builds a closure containing state
referred to in the body of a for loop.
○ Second, many vector patterns are difficult to express or generate poor code if passed directly
to LLVM. We use peephole optimization to retoute these to architecture-specific intrinsics.
Compiling Scheduled Pipelines
Autotuning
● The optimal schedule dependents on machine architecture, image
dimensions and code generation in complex ways, and exhibits global
dependencies between choices due to loop fusion and caching behavior.
Autotuning
● The optimal schedule dependents on machine architecture, image
dimensions and code generation in complex ways, and exhibits global
dependencies between choices due to loop fusion and caching behavior.
● Use genetic algorithm to seek out a plausible approximate solution.
○ fixed population size(128 individuals per generation)
Autotuning
● The optimal schedule dependents on machine architecture, image
dimensions and code generation in complex ways, and exhibits global
dependencies between choices due to loop fusion and caching behavior.
● Use genetic algorithm to seek out a plausible approximate solution.
○ fixed population size(128 individuals per generation)
● schedule patterns
○ compute and store under x, and vetorize by x
○ fully parallelized and tiled
○ parallelized over y and vectorized over x.
Results
● Platform
○ CPU - quad core Xeon W3520
○ GPU - NVIDIA Tesla C2070
● Algorithm
Result
Autotuning Performance
Q&A

Más contenido relacionado

La actualidad más candente

Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...RSIS International
 
SmB café 13 sep '12 - Compaan Design
SmB café 13 sep '12 - Compaan DesignSmB café 13 sep '12 - Compaan Design
SmB café 13 sep '12 - Compaan DesignChristiaan van Gorkum
 
GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Ma...
GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Ma...GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Ma...
GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Ma...Marcus Paradies
 
PERFORMANCE VEHICULAR AD-HOC NETWORK (VANET)
PERFORMANCE VEHICULAR AD-HOC NETWORK (VANET) PERFORMANCE VEHICULAR AD-HOC NETWORK (VANET)
PERFORMANCE VEHICULAR AD-HOC NETWORK (VANET) Limon Prince
 
Datomic rtree-pres
Datomic rtree-presDatomic rtree-pres
Datomic rtree-presjsofra
 
Compiler Construction | Lecture 15 | Memory Management
Compiler Construction | Lecture 15 | Memory ManagementCompiler Construction | Lecture 15 | Memory Management
Compiler Construction | Lecture 15 | Memory ManagementEelco Visser
 
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio servicesMPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio servicesCyril Concolato
 
Programmable array-logic-and-programmable-logic-array
Programmable array-logic-and-programmable-logic-arrayProgrammable array-logic-and-programmable-logic-array
Programmable array-logic-and-programmable-logic-arrayJher Carlson Atasan
 
Bca1040 digital logic
Bca1040  digital logicBca1040  digital logic
Bca1040 digital logicsmumbahelp
 
Ieee project reversible logic gates by_amit
Ieee project reversible logic gates  by_amitIeee project reversible logic gates  by_amit
Ieee project reversible logic gates by_amitAmith Bhonsle
 
H2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas NykodymH2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas NykodymSri Ambati
 
Design and minimization of reversible programmable logic arrays and its reali...
Design and minimization of reversible programmable logic arrays and its reali...Design and minimization of reversible programmable logic arrays and its reali...
Design and minimization of reversible programmable logic arrays and its reali...Sajib Mitra
 
Arc: An IR for Batch and Stream Programming
Arc: An IR for Batch and Stream ProgrammingArc: An IR for Batch and Stream Programming
Arc: An IR for Batch and Stream ProgrammingLars Kroll
 
ScilabTEC 2015 - KIT
ScilabTEC 2015 - KITScilabTEC 2015 - KIT
ScilabTEC 2015 - KITScilab
 
Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati...
Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati...Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati...
Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati...CSCJournals
 

La actualidad más candente (20)

Reginf pldi3
Reginf pldi3Reginf pldi3
Reginf pldi3
 
Logic Synthesis
Logic SynthesisLogic Synthesis
Logic Synthesis
 
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
Hardware Implementations of RS Decoding Algorithm for Multi-Gb/s Communicatio...
 
SmB café 13 sep '12 - Compaan Design
SmB café 13 sep '12 - Compaan DesignSmB café 13 sep '12 - Compaan Design
SmB café 13 sep '12 - Compaan Design
 
GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Ma...
GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Ma...GRAPHITE:  An Extensible Graph Traversal Framework for Relational Database Ma...
GRAPHITE: An Extensible Graph Traversal Framework for Relational Database Ma...
 
PERFORMANCE VEHICULAR AD-HOC NETWORK (VANET)
PERFORMANCE VEHICULAR AD-HOC NETWORK (VANET) PERFORMANCE VEHICULAR AD-HOC NETWORK (VANET)
PERFORMANCE VEHICULAR AD-HOC NETWORK (VANET)
 
Datomic rtree-pres
Datomic rtree-presDatomic rtree-pres
Datomic rtree-pres
 
Compiler Construction | Lecture 15 | Memory Management
Compiler Construction | Lecture 15 | Memory ManagementCompiler Construction | Lecture 15 | Memory Management
Compiler Construction | Lecture 15 | Memory Management
 
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio servicesMPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
MPEG-4 BIFS and MPEG-2 TS: Latest developments for digital radio services
 
Unit iv(simple code generator)
Unit iv(simple code generator)Unit iv(simple code generator)
Unit iv(simple code generator)
 
Programmable array-logic-and-programmable-logic-array
Programmable array-logic-and-programmable-logic-arrayProgrammable array-logic-and-programmable-logic-array
Programmable array-logic-and-programmable-logic-array
 
Bca1040 digital logic
Bca1040  digital logicBca1040  digital logic
Bca1040 digital logic
 
Ieee project reversible logic gates by_amit
Ieee project reversible logic gates  by_amitIeee project reversible logic gates  by_amit
Ieee project reversible logic gates by_amit
 
H2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas NykodymH2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas Nykodym
 
MPEG-4 BIFS Overview
MPEG-4 BIFS OverviewMPEG-4 BIFS Overview
MPEG-4 BIFS Overview
 
Design and minimization of reversible programmable logic arrays and its reali...
Design and minimization of reversible programmable logic arrays and its reali...Design and minimization of reversible programmable logic arrays and its reali...
Design and minimization of reversible programmable logic arrays and its reali...
 
Arc: An IR for Batch and Stream Programming
Arc: An IR for Batch and Stream ProgrammingArc: An IR for Batch and Stream Programming
Arc: An IR for Batch and Stream Programming
 
ScilabTEC 2015 - KIT
ScilabTEC 2015 - KITScilabTEC 2015 - KIT
ScilabTEC 2015 - KIT
 
Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati...
Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati...Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati...
Preference of Efficient Architectures for GF(p) Elliptic Curve Crypto Operati...
 
Coding style for good synthesis
Coding style for good synthesisCoding style for good synthesis
Coding style for good synthesis
 

Similar a Halide - 2

Deterministic Galois: On-demand, Portable and Parameterless
Deterministic Galois: On-demand, Portable and ParameterlessDeterministic Galois: On-demand, Portable and Parameterless
Deterministic Galois: On-demand, Portable and ParameterlessDonald Nguyen
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkTheodoros Vasiloudis
 
ReactiveX
ReactiveXReactiveX
ReactiveXBADR
 
Paper Study - Incremental Data-Flow Analysis Algorithms by Ryder et al
Paper Study - Incremental Data-Flow Analysis Algorithms by Ryder et alPaper Study - Incremental Data-Flow Analysis Algorithms by Ryder et al
Paper Study - Incremental Data-Flow Analysis Algorithms by Ryder et alMin-Yih Hsu
 
Bonn motion, traffic generation and nam
Bonn motion, traffic generation and namBonn motion, traffic generation and nam
Bonn motion, traffic generation and namManas Gaur
 
Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...waqarnabi
 
Iqra -IBM Quantum Challenge Webinar
Iqra -IBM Quantum Challenge WebinarIqra -IBM Quantum Challenge Webinar
Iqra -IBM Quantum Challenge WebinarSyedMuhammadMooazam
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemDatabricks
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanJimin Hsieh
 
How Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdfHow Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdfScyllaDB
 
PyData Los Angeles 2020 (Abhilash Majumder)
PyData Los Angeles 2020 (Abhilash Majumder)PyData Los Angeles 2020 (Abhilash Majumder)
PyData Los Angeles 2020 (Abhilash Majumder)Abhilash Majumder
 
Solve it Differently with Reactive Programming
Solve it Differently with Reactive ProgrammingSolve it Differently with Reactive Programming
Solve it Differently with Reactive ProgrammingSupun Dissanayake
 
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengGeneralized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengSpark Summit
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRDatabricks
 
FlinkML - Big data application meetup
FlinkML - Big data application meetupFlinkML - Big data application meetup
FlinkML - Big data application meetupTheodoros Vasiloudis
 
Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Frank Kienle
 

Similar a Halide - 2 (20)

Deterministic Galois: On-demand, Portable and Parameterless
Deterministic Galois: On-demand, Portable and ParameterlessDeterministic Galois: On-demand, Portable and Parameterless
Deterministic Galois: On-demand, Portable and Parameterless
 
FlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache FlinkFlinkML: Large Scale Machine Learning with Apache Flink
FlinkML: Large Scale Machine Learning with Apache Flink
 
ReactiveX
ReactiveXReactiveX
ReactiveX
 
Machine Learning @NECST
Machine Learning @NECSTMachine Learning @NECST
Machine Learning @NECST
 
Paper Study - Incremental Data-Flow Analysis Algorithms by Ryder et al
Paper Study - Incremental Data-Flow Analysis Algorithms by Ryder et alPaper Study - Incremental Data-Flow Analysis Algorithms by Ryder et al
Paper Study - Incremental Data-Flow Analysis Algorithms by Ryder et al
 
Bonn motion, traffic generation and nam
Bonn motion, traffic generation and namBonn motion, traffic generation and nam
Bonn motion, traffic generation and nam
 
Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...
 
Iqra -IBM Quantum Challenge Webinar
Iqra -IBM Quantum Challenge WebinarIqra -IBM Quantum Challenge Webinar
Iqra -IBM Quantum Challenge Webinar
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
 
Conv-TasNet.pdf
Conv-TasNet.pdfConv-TasNet.pdf
Conv-TasNet.pdf
 
Java concurrency
Java concurrencyJava concurrency
Java concurrency
 
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala TaiwanScala & Spark(1.6) in Performance Aspect for Scala Taiwan
Scala & Spark(1.6) in Performance Aspect for Scala Taiwan
 
How Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdfHow Optimizely (Safely) Maximizes Database Concurrency.pdf
How Optimizely (Safely) Maximizes Database Concurrency.pdf
 
PyData Los Angeles 2020 (Abhilash Majumder)
PyData Los Angeles 2020 (Abhilash Majumder)PyData Los Angeles 2020 (Abhilash Majumder)
PyData Los Angeles 2020 (Abhilash Majumder)
 
Solve it Differently with Reactive Programming
Solve it Differently with Reactive ProgrammingSolve it Differently with Reactive Programming
Solve it Differently with Reactive Programming
 
DaViT.pdf
DaViT.pdfDaViT.pdf
DaViT.pdf
 
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui MengGeneralized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
Generalized Linear Models in Spark MLlib and SparkR by Xiangrui Meng
 
Generalized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkRGeneralized Linear Models in Spark MLlib and SparkR
Generalized Linear Models in Spark MLlib and SparkR
 
FlinkML - Big data application meetup
FlinkML - Big data application meetupFlinkML - Big data application meetup
FlinkML - Big data application meetup
 
Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...Lecture summary: architectures for baseband signal processing of wireless com...
Lecture summary: architectures for baseband signal processing of wireless com...
 

Más de Kobe Yu

Neural Network File Format for Inference Framework
Neural Network File Format for Inference FrameworkNeural Network File Format for Inference Framework
Neural Network File Format for Inference FrameworkKobe Yu
 
Tensorflow Lite and ARM Compute Library
Tensorflow Lite and ARM Compute LibraryTensorflow Lite and ARM Compute Library
Tensorflow Lite and ARM Compute LibraryKobe Yu
 
社群與企業之開源專案合作經驗分享:以阿龜微氣候天眼通為例
社群與企業之開源專案合作經驗分享:以阿龜微氣候天眼通為例社群與企業之開源專案合作經驗分享:以阿龜微氣候天眼通為例
社群與企業之開源專案合作經驗分享:以阿龜微氣候天眼通為例Kobe Yu
 
FarmHarvestBot 開源授權建議
FarmHarvestBot 開源授權建議FarmHarvestBot 開源授權建議
FarmHarvestBot 開源授權建議Kobe Yu
 
Agrino 應用於農業感測的開源專案
Agrino  應用於農業感測的開源專案Agrino  應用於農業感測的開源專案
Agrino 應用於農業感測的開源專案Kobe Yu
 
機器學習應用於蔬果辨識
機器學習應用於蔬果辨識機器學習應用於蔬果辨識
機器學習應用於蔬果辨識Kobe Yu
 

Más de Kobe Yu (6)

Neural Network File Format for Inference Framework
Neural Network File Format for Inference FrameworkNeural Network File Format for Inference Framework
Neural Network File Format for Inference Framework
 
Tensorflow Lite and ARM Compute Library
Tensorflow Lite and ARM Compute LibraryTensorflow Lite and ARM Compute Library
Tensorflow Lite and ARM Compute Library
 
社群與企業之開源專案合作經驗分享:以阿龜微氣候天眼通為例
社群與企業之開源專案合作經驗分享:以阿龜微氣候天眼通為例社群與企業之開源專案合作經驗分享:以阿龜微氣候天眼通為例
社群與企業之開源專案合作經驗分享:以阿龜微氣候天眼通為例
 
FarmHarvestBot 開源授權建議
FarmHarvestBot 開源授權建議FarmHarvestBot 開源授權建議
FarmHarvestBot 開源授權建議
 
Agrino 應用於農業感測的開源專案
Agrino  應用於農業感測的開源專案Agrino  應用於農業感測的開源專案
Agrino 應用於農業感測的開源專案
 
機器學習應用於蔬果辨識
機器學習應用於蔬果辨識機器學習應用於蔬果辨識
機器學習應用於蔬果辨識
 

Último

Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile EnvironmentVictorSzoltysek
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 

Último (20)

Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 

Halide - 2

  • 2. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, Saman Amarasinghe. PLDI '13 Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation
  • 3. Recap Halide ● Separate descriptions of ‘algorithm’per output pixel and execution ‘schedule’
  • 4. Recap Halide ● Separate descriptions of ‘algorithm’per output pixel and execution ‘schedule’ ● Halide is aprogramming languageembedded in C++
  • 5. Recap Halide ● Separate descriptions of ‘algorithm’per output pixel and execution ‘schedule’ ● Halide is aprogramming languageembedded in C++ ● Compiler based on LLVM/Clang
  • 6. Recap Halide ● Separate descriptions of ‘algorithm’per output pixel and execution ‘schedule’ ● Halide is aprogramming languageembedded in C++ ● Compiler based on LLVM/Clang
  • 7.
  • 8.
  • 9. Parallelism distribute across threads and SIMD parallel vector
  • 10. Locality compute in tiles interleave tiles of blurx, blury store blurx in local cache
  • 11.
  • 13. Compiling Scheduled Pipelines 1. Lowering and Loop synthesis 2. Bounds Inference 3. Sliding Window Optimization and Storage Folding 4. Flattening 5. Vectorization and Unrolling 6. Back-end Code Generation
  • 14. Lowering ● Lowering process is synthesizes a single, complete set of loop nests and allocations, given a Halide pipeline and a fully-specified schedule.
  • 15. Lowering ● Lowering process is synthesizes a single, complete set of loop nests and allocations, given a Halide pipeline and a fully-specified schedule. ● Lowering begins from the function defining the out. ○ Lowering then proceeds recursively up the pipeline, from callers to callees(here, from out to blurx)
  • 16. Lowering ● Lowering process is synthesizes a single, complete set of loop nests and allocations, given a Halide pipeline and a fully-specified schedule. ● Lowering begins from the function defining the out. ○ Lowering then proceeds recursively up the pipeline, from callers to callees(here, from out to blurx) ● Each loop is labeled as being serial, parallel, unrolled, or vectorized, according to the schedule.
  • 17. Bounds Inference ● The next stage of lowering generates and injects appropriate definitions for these variables. Like function lowering, bounds inference proceeds recursively back from the output.
  • 18. Sliding Window Optimization and Storage Folding ● If a realization of a function is stored at higher loop level than its computation, with an intervening serial loop, then iterations of that loop can reuse values generated by previous iterations.
  • 19. Sliding Window Optimization and Storage Folding ● If a realization of a function is stored at higher loop level than its computation, with an intervening serial loop, then iterations of that loop can reuse values generated by previous iterations. ● This transformations that lets us trade off parallelism (because the intervening loop must be serial) for reuse (because we avoid recomputing values already computed by previous iterations.)
  • 20. Flattenting ● The compiler flattens multi-dimensional loads, stores and allocations into their single-dimensional equivalent.
  • 21. Flattenting ● The compiler flattens multi-dimensional loads, stores and allocations into their single-dimensional equivalent. ● By convention, we always set the stride of the innermost dimension to 1, to ensure we can perform dense vector loads and stores in that dimension.
  • 22. Vectorization and Unrolling ● Relace loops of constant size scheduled as vectorized or unrolled with transformed versions of their loop bodies.
  • 23. Vectorization and Unrolling ● Relace loops of constant size scheduled as vectorized or unrolled with transformed versions of their loop bodies. ● Unrolling replaces a loop of size n with n sequential statements performing each loop iteration in turn.
  • 24. Vectorization and Unrolling ● Relace loops of constant size scheduled as vectorized or unrolled with transformed versions of their loop bodies. ● Unrolling replaces a loop of size n with n sequential statements performing each loop iteration in turn. ● Vectorization completely replaces a loop of size n with a signle statement.
  • 25. Back-end Code Generation ● We first run a standard constant-folding and dead-code elimination pass on our IR. Which also performs symbolic simplification of common patterns produced by bounds inference.
  • 26. Back-end Code Generation ● We first run a standard constant-folding and dead-code elimination pass on our IR. Which also performs symbolic simplification of common patterns produced by bounds inference. ● There is mostly a one-to-one mapping between our representation and LLVM’s, but two specific patterns warrant mention. ○ First, parallel for loops are lowered to LLVM code that first builds a closure containing state referred to in the body of a for loop. ○ Second, many vector patterns are difficult to express or generate poor code if passed directly to LLVM. We use peephole optimization to retoute these to architecture-specific intrinsics.
  • 28. Autotuning ● The optimal schedule dependents on machine architecture, image dimensions and code generation in complex ways, and exhibits global dependencies between choices due to loop fusion and caching behavior.
  • 29. Autotuning ● The optimal schedule dependents on machine architecture, image dimensions and code generation in complex ways, and exhibits global dependencies between choices due to loop fusion and caching behavior. ● Use genetic algorithm to seek out a plausible approximate solution. ○ fixed population size(128 individuals per generation)
  • 30. Autotuning ● The optimal schedule dependents on machine architecture, image dimensions and code generation in complex ways, and exhibits global dependencies between choices due to loop fusion and caching behavior. ● Use genetic algorithm to seek out a plausible approximate solution. ○ fixed population size(128 individuals per generation) ● schedule patterns ○ compute and store under x, and vetorize by x ○ fully parallelized and tiled ○ parallelized over y and vectorized over x.
  • 31. Results ● Platform ○ CPU - quad core Xeon W3520 ○ GPU - NVIDIA Tesla C2070 ● Algorithm
  • 34. Q&A