CILK/CILK++
AND
REDUCERS
YUNMING ZHANG
RICE UNIVERSITY
OUTLINE
• CILK and CILK++ Language Features and
Usages
• Work stealing runtime
• CILK++ Reducers
• Conclusions
IDEALIZED SHARED
MEMORY ARCHITECTURE
• Hardware model
• Processors
• Shared global
memory
• Software model
• Threads
• Shared variables
• Communication
• Synchronization
Slide from Comp 422 Rice University Lecture 4
CILK AND CILK++
DESIGN GOALS
• Programmer friendly
• Dynamic tasking
• Parallel extension to C
• Scalable performance
• Efficient runtime system
• Minimum program overhead
CILK KEYWORDS
• cilk: declares a Cilk function
• spawn: the call can execute asynchronously
in a concurrent thread
• sync: the current thread waits for all locally
spawned functions to complete
CILK EXAMPLE
cilk int fib(int n) {
    if (n < 2)
        return n;
    else {
        int n1, n2;
        n1 = spawn fib(n-1);
        n2 = spawn fib(n-2);
        sync;
        return (n1 + n2);
    }
}
Borrowed from Comp 422 Rice University Lecture 4
CILK++ EXAMPLE
int fib(int n) {
    if (n < 2)
        return n;
    else {
        int n1, n2;
        n1 = cilk_spawn fib(n-1);
        n2 = fib(n-2);
        cilk_sync;
        return (n1 + n2);
    }
}
Borrowed from Comp 422 Rice University Lecture 4
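For readers without a Cilk++ toolchain, the same spawn/sync shape can be approximated in standard C++. This is only a sketch under stated assumptions, not Cilk++ itself: `std::async` stands in for cilk_spawn, `future::get` stands in for cilk_sync, and the names `fib_serial`, `fib_parallel` and the `depth` cutoff are hypothetical additions (the cutoff just keeps the number of OS threads small).

```cpp
#include <cassert>
#include <future>

// Serial version: what is left when the spawn/sync keywords are elided.
long fib_serial(int n) {
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

// Approximation of the Cilk++ fib. std::async plays the role of cilk_spawn
// and future::get plays the role of cilk_sync. The depth cutoff is a
// hypothetical addition that limits how many threads are created.
long fib_parallel(int n, int depth = 3) {
    if (n < 2) return n;
    if (depth == 0) return fib_serial(n);
    std::future<long> n1 =
        std::async(std::launch::async, fib_parallel, n - 1, depth - 1);
    long n2 = fib_parallel(n - 2, depth - 1);  // continuation runs in the caller
    return n1.get() + n2;                      // wait for the spawned call
}
```

As in the Cilk++ version, only the first recursive call is spawned; the second runs in the calling thread.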
CILK++ EXAMPLE
WITH DAG
Pictures from the “Reducers and Other Cilk++ Hyperobjects” talk
by Matteo Frigo (Intel), Pablo Halpern (Intel),
Charles E. Leiserson (MIT), and Stephen Lewin-Berlin (Intel).
OUTLINE
• CILK and CILK++ Language Features and
Usages
• Work stealing runtime
• CILK++ Reducers
• Conclusions
WORK FIRST
PRINCIPLE
• Work: $T_1$
• Critical path length: $T_\infty$
• Number of processors: $P$
• Expected time
• $T_P = T_1/P + O(T_\infty)$
• Parallel slackness assumption
• $T_1/P \gg C_\infty T_\infty$
WORK FIRST
PRINCIPLE
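• Minimize the scheduling overhead borne by the
work, even at the expense of a longer critical
path
• $T_P \le C_1 T_S/P + C_\infty T_\infty \approx C_1 T_S/P$
(where $T_S$ is the serial running time and $C_1 = T_1/T_S$)
Minimize $C_1$, even at the expense of a larger
$C_\infty$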
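The bound can be made concrete with a small calculation. The sketch below only evaluates the right-hand side of the inequality above; the function name `tp_bound` and the numbers in the note that follows are hypothetical, chosen to show why shrinking C1 matters more than shrinking C∞ under the parallel slackness assumption.

```cpp
#include <cassert>

// Right-hand side of the work-first bound: C1 * Ts / P + Cinf * Tinf.
// All arguments use the same (arbitrary) time unit.
double tp_bound(double c1, double ts, double p, double cinf, double tinf) {
    return c1 * ts / p + cinf * tinf;
}
```

With the hypothetical values Ts = 1000, P = 10, T∞ = 1 and both constants equal to 1, the bound is 101. Doubling C∞ raises it only to 102, while doubling C1 raises it to 201.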
WORK STEALING
DESIGN GOALS
• Minimizing contention
• Decentralized task deque
• Doubly linked deque
• Minimizing communication
• Steal work rather than push work
• Load balance across cores
• Lazy task creation
• Steal from the top of the deque
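The deque discipline in the list above can be sketched as follows. This is a deliberately simplified illustration, not the Cilk runtime: the class name `WorkDeque` and its methods are hypothetical, tasks are plain integers, and a mutex guards each deque, whereas a real runtime would avoid locking on the owner's common path.

```cpp
#include <cassert>
#include <deque>
#include <mutex>

// One deque per worker (decentralized). The owner works at the bottom,
// thieves take from the top, so they rarely touch the same end.
class WorkDeque {
public:
    // Owner: push newly created work at the bottom.
    void push_bottom(int task) {
        std::lock_guard<std::mutex> guard(mutex_);
        tasks_.push_back(task);
    }
    // Owner: pop the most recently pushed work. Returns false if empty.
    bool pop_bottom(int& task) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return false;
        task = tasks_.back();
        tasks_.pop_back();
        return true;
    }
    // Thief: steal the oldest work from the top. Returns false if empty.
    bool steal_top(int& task) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (tasks_.empty()) return false;
        task = tasks_.front();
        tasks_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::deque<int> tasks_;
};
```

Stealing from the top hands the thief the oldest entry, which in a divide-and-conquer computation tends to be the largest remaining piece of work.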
CILK WORK STEALING
SCHEDULER
[A sequence of 14 pictures stepping through the work-stealing scheduler, from the “Reducers and Other Cilk++ Hyperobjects” talk by Matteo Frigo (Intel), Pablo Halpern (Intel), Charles E. Leiserson (MIT), and Stephen Lewin-Berlin (Intel).]
TWO CLONE
STRATEGY
• Fast clone
• Identical in most respects to the C elision of the Cilk
program
• Very little execution overhead
• Sync statements compile to no-ops
• Allocates a continuation
• Program variables and instruction pointer
• Slow clone
• A spawned procedure is converted to its slow clone only
when it is stolen
• Restores program state from the activation frame, which
contains the local variables, program counter, and other
parts of the procedure instance
FAST CLONE
SLOW CLONE
slow_fib(frame *_cilk_frame) {
    /* restore the state of the program from the frame */
    switch (_cilk_frame->header.entry) {
        case 1: goto _cilk_sync1;   /* resume after fast_fib(_cilk_frame->n - 1) */
        case 2: goto _cilk_sync2;   /* resume after fast_fib(_cilk_frame->n - 2) */
        case 3: goto _cilk_sync3;   /* resume after the sync (not a no-op here) */
    }
}
EXTENDED DEQUE
WITH CALL STACKS
[Figure: an extended deque of call stacks. Legend: stack frame, full frame.]
FRAMES
• C++ Main Frame
• Local variables of the procedure instance
• Temporary variables
• Linkage information for return values
FRAMES
• CILK++ Stack Frame
• Everything in C++ Main Frame
• Continuation
• Parent pointer
• Has exactly one child
• Used by Fast Clone
• A worker can have multiple Stack Frames
FRAMES
• CILK++ Full Frame (used by slow clone)
• Everything in CILK++ Stack Frame
• Lock
• Join counter
• List of children (can have more than one
child)
• A worker has at most one Full Frame
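The relationship between the two Cilk++ frame kinds can be sketched as plain data structures. Only the list of fields comes from the slides; the type names, field names, field types, and the comments on what each field is used for are assumptions made for illustration.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical layout of a Cilk++ stack frame (used by the fast clone).
// The ordinary C++ frame contents (locals, temporaries, linkage) are omitted.
struct StackFrame {
    void* continuation = nullptr;   // where the procedure resumes
    StackFrame* parent = nullptr;   // parent pointer
};

// Hypothetical layout of a full frame (used by the slow clone):
// everything in a stack frame plus the fields needed for synchronization.
struct FullFrame : StackFrame {
    std::mutex lock;                    // assumed to guard the fields below
    int join_counter = 0;               // assumed to count outstanding children
    std::vector<StackFrame*> children;  // list of children
};
```

Keeping the lock, join counter, and child list out of `StackFrame` reflects the work-first principle: the common, unstolen path pays only for the smaller structure.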
FUNCTION CALL
[Two figures: the extended deque before and after a function call, with stack frames and full frames marked. A sidebar lists the operations covered on the following slides: function call, spawn, call return, spawn return, sync, randomly steal, provably good steal, unconditionally steal, resume full frame. The second figure marks the new stack frame added by the call.]
SPAWN
[Two figures: the extended deque before and after a spawn, with stack frames and full frames marked.]
Set continuation in last stack frame
RESUME FULL FRAME
[Figure: the extended deque when a full frame is resumed.]
Set the full frame to be the only frame in the
call stack, then resume execution at the
continuation
RANDOMLY STEAL
[Three figures: a random steal on the extended deque. The victim's call stack is annotated “Steal this call stack”, and the frames in the later figures carry the labels 1, 1, 1.]
PROVABLY GOOD
STEAL
[Figure: a provably good steal on the extended deque; the figure carries the label 0.]
UNCONDITIONALLY
STEAL
[Figure: an unconditional steal on the extended deque; the figure carries the label 2.]
FUNCTION CALL
RETURN
[Three figures: the extended deque on return from a call. Case 1 is shown before and after the return. In case 2, the worker executes an unconditional steal.]
SPAWN RETURN
[Three figures: the extended deque on return from a spawn. Case 1 is shown before and after the return. In case 2, the worker executes a provably good steal.]
SYNC
[Two figures: sync on the extended deque.]
• Case 1: do nothing if it is a stack frame (no-op)
• Case 2: pop the frame and perform a provably good steal
OUTLINE
• CILK and CILK++ Language Features and
Usages
• Work stealing runtime
• CILK++ Reducers
• Conclusions
PROBLEMS WITH
NON-LOCAL VARIABLES
bool has_property(Node *);
List<Node *> output_list;
void walk(Node *x)
{
    if (x) {
        if (has_property(x))
            output_list.push_back(x);
        cilk_spawn walk(x->left);
        walk(x->right);
        cilk_sync;
    }
}
REDUCER
DESIGN GOALS
• Support parallelization of programs
containing global variables
• Enable efficient parallel scaling by
avoiding a single point of contention
• Provide deterministic results for
associative reduce operations
• Operate independently of any control
constructs
REDUCER EXAMPLE
bool has_property(Node *);
List_append_reducer<Node *> output_list;
void walk(Node *x)
{
    if (x) {
        if (has_property(x))
            output_list.push_back(x);
        cilk_spawn walk(x->left);
        walk(x->right);
        cilk_sync;
    }
}
HYPER OBJECTS
[Picture from the “Reducers and Other Cilk++ Hyperobjects” talk.]
REDUCER
[Picture from the “Reducers and Other Cilk++ Hyperobjects” talk.]
SEMANTICS OF
REDUCERS
• The child strand owns the view owned by the
parent function before cilk_spawn
• The parent strand owns a new view,
initialized to the identity view e
• A special optimization ensures that if a
view is unchanged when combined with
the identity view
• Parent strand P owns the view from
completed child strands
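The view hand-off described above can be imitated in standard C++. This is a sketch under assumptions, not the Cilk++ runtime: `std::async` stands in for cilk_spawn, the views are passed explicitly instead of being looked up through a reducer, and the names `View`, `Node`, `reduce`, and `walk` are hypothetical.

```cpp
#include <cassert>
#include <future>
#include <list>

using View = std::list<int>;

// Reduce for list append: concatenate `right` onto the end of `left`.
void reduce(View& left, View& right) { left.splice(left.end(), right); }

struct Node { int value; Node* left; Node* right; };

// `view` is the view owned by the caller when walk() is entered.
void walk(const Node* x, View& view) {
    if (!x) return;
    view.push_back(x->value);
    // "Spawn": the child keeps the view that was owned before the spawn,
    // and the continuation gets a fresh view initialized to the identity.
    View parent_view;
    auto child = std::async(std::launch::async,
                            [&] { walk(x->left, view); });
    walk(x->right, parent_view);   // continuation updates only its own view
    child.get();                   // "sync": wait for the spawned child
    reduce(view, parent_view);     // child's updates first, then the parent's
}
```

Because each strand appends only to its own view and the views are combined in serial order, the final list matches a serial traversal without any lock on the list.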
REDUCING OVER LIST
CONCATENATION
[Two pictures from the “Reducers and Other Cilk++ Hyperobjects” talk.]
IMPLEMENTATION OF
REDUCER
• Each worker maintains a hypermap
• Hypermap
• Maps reducers to the views
• User
• The view of the current procedure
• Children
• The view of the children procedures
• Right
• The view of right sibling
• Identity
• The default value of a view
UNDERSTANDING
HYPERMAPS
bool has_property(Node *);
List_append_reducer<Node *> output_list;
void walk(Node *x)                      // proc A
{
    if (x) {
        if (has_property(x))
            output_list.push_back(x);
        cilk_spawn walk(x->left);       // proc B
        cilk_spawn walk(x->right);      // proc C
        cilk_sync;
    }
}
HYPERMAP CREATION
[A sequence of five pictures showing hypermap creation, from the “Reducers and Other Cilk++ Hyperobjects” talk.]
LOOK UP FAILURE
• Inserts a view containing an identity
element for the reducer into the
hypermap.
• Following the lazy principle
• Look up returns the newly inserted
identity view
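The lazy insertion on a failed lookup can be sketched as follows. The representation is an assumption made for illustration: the hypermap is modeled as a hash table keyed by the reducer's address, the view is a list whose identity is the empty list, and the names `Hypermap` and `lookup` are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <unordered_map>

using View = std::list<int>;
// Hypothetical representation: reducer address -> that reducer's local view.
using Hypermap = std::unordered_map<const void*, View>;

// Returns the view for `reducer`. On a lookup failure, an identity (empty)
// view is inserted first, so views are created only when actually needed.
View& lookup(Hypermap& map, const void* reducer) {
    auto it = map.find(reducer);
    if (it == map.end())
        it = map.emplace(reducer, View{}).first;  // lazy creation
    return it->second;
}
```

A strand that never touches a given reducer therefore never pays for a view of it.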
RANDOM WORK
STEALING
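A random steal operation steals a full frame
P and replaces it with a new full frame C in
the victim.
$\mathrm{USER}_C \leftarrow \mathrm{USER}_P$;
$\mathrm{USER}_P \leftarrow \emptyset$;
$\mathrm{CHILDREN}_P \leftarrow \emptyset$;
$\mathrm{RIGHT}_P \leftarrow \emptyset$.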
[Picture of a random steal, from the “Reducers and Other Cilk++ Hyperobjects” talk.]
RETURN FROM A CALL
Let C be a child frame of the parent frame P
that originally called C, and suppose that C
returns.
• If C is a stack frame, do nothing,
• If C is a full frame,
• Transfer ownership of the view
• Children and Right are empty
• $\mathrm{USER}_P \leftarrow \mathrm{USER}_C$
RETURN FROM A
SPAWN
Let C be a child frame of the parent frame P that
originally spawned C, and suppose that C returns.
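• Always do $\mathrm{USER}_C \leftarrow \mathrm{REDUCE}(\mathrm{USER}_C, \mathrm{RIGHT}_C)$
• If C is a stack frame, do nothing
• If C is a full frame
• If C has a left sibling L,
• $\mathrm{RIGHT}_L \leftarrow \mathrm{REDUCE}(\mathrm{RIGHT}_L, \mathrm{USER}_C)$
• Otherwise, C is the leftmost child
• $\mathrm{CHILDREN}_P \leftarrow \mathrm{REDUCE}(\mathrm{CHILDREN}_P, \mathrm{USER}_C)$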
SYNC
A cilk_sync statement waits until all children have completed.
When frame P executes a cilk_sync, one of the following
two cases applies:
• If P is a stack frame, do nothing.
• If P is a full frame,
• $\mathrm{USER}_P \leftarrow \mathrm{REDUCE}(\mathrm{CHILDREN}_P, \mathrm{USER}_P)$.
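The full-frame rules for returning from a spawn and for sync can be traced with a small model. This is a sketch under assumptions: each frame holds a single view per slot rather than a whole hypermap, REDUCE is list concatenation, sibling unlinking is not modeled, and the names `Frame`, `reduce_into`, `spawn_return`, and `sync_frame` are hypothetical.

```cpp
#include <cassert>
#include <list>
#include <utility>

using View = std::list<int>;

// REDUCE(a, b) for list append: a followed by b. Leaves b empty.
void reduce_into(View& a, View& b) { a.splice(a.end(), b); }

// One view per slot for a single reducer; a real frame holds hypermaps.
struct Frame {
    View user, children, right;
    Frame* parent = nullptr;
    Frame* left_sibling = nullptr;
};

// Return from a spawn, for a full frame C.
void spawn_return(Frame& c) {
    reduce_into(c.user, c.right);                    // USER_C <- REDUCE(USER_C, RIGHT_C)
    if (c.left_sibling)
        reduce_into(c.left_sibling->right, c.user);  // RIGHT_L <- REDUCE(RIGHT_L, USER_C)
    else
        reduce_into(c.parent->children, c.user);     // CHILDREN_P <- REDUCE(CHILDREN_P, USER_C)
}

// Sync, for a full frame P: USER_P <- REDUCE(CHILDREN_P, USER_P).
void sync_frame(Frame& p) {
    reduce_into(p.children, p.user);
    p.user = std::move(p.children);
    p.children.clear();
}
```

For example, if parent P has children B (left) and C (right) holding the views [1] and [2], and P itself holds [3], then C returning first parks [2] in B's right slot, B's return delivers [1, 2] to P's children slot, and the sync leaves P with [1, 2, 3], the serial order.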
BENEFITS OF
REDUCERS
OUTLINE
• CILK and CILK++ Language Features and
Usages
• Work stealing runtime
• CILK++ Reducers
• Conclusions
CONCLUSIONS
• CILK and CILK++ provide a programmer
friendly programming model
• Extension to C
• Incremental parallelism
• Scaling on future machines
• Non-compromising performance
• Work stealing runtime
• Minimizing overheads
• Reducers
FINAL NOTES
• Designed for an idealized shared memory
model
• Today’s architectures are typically NUMA
• Task creation can be lazier
• http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6012915&tag=1
• cilk_for
• Divide and conquer parallelization

WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Último (20)

How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

CILK/CILK++ and Reducers

  • 2. OUTLINE • CILK and CILK++ Language Features and Usages • Work stealing runtime • CILK++ Reducers • Conclusions
  • 3. IDEALIZED SHARED MEMORY ARCHITECTURE • Hardware model • Processors • Shared global memory • Software model • Threads • Shared variables • Communication • Synchronization (Slide from Comp 422 Rice University Lecture 4)
  • 4. CILK AND CILK++ DESIGN GOALS • Programmer friendly • Dynamic tasking • Parallel extension to C • Scalable performance • Efficient runtime system • Minimum program overhead
  • 5. CILK KEYWORDS • cilk: marks a Cilk function • spawn: the call can execute asynchronously in a concurrent thread • sync: the current thread waits for all locally spawned functions
  • 6. CILK EXAMPLE (borrowed from Comp 422 Rice University Lecture 4)
        cilk int fib(int n) {
            if (n < 2)
                return n;
            else {
                int n1, n2;
                n1 = spawn fib(n-1);
                n2 = spawn fib(n-2);
                sync;
                return (n1 + n2);
            }
        }
  • 7. CILK++ EXAMPLE (borrowed from Comp 422 Rice University Lecture 4)
        int fib(int n) {
            if (n < 2)
                return n;
            else {
                int n1, n2;
                n1 = cilk_spawn fib(n-1);
                n2 = fib(n-2);
                cilk_sync;
                return (n1 + n2);
            }
        }
  • 8. CILK++ EXAMPLE WITH DAG (figure). Pictures from “Reducers and Other Cilk++ Hyperobjects”, talk by Matteo Frigo (Intel), Pablo Halpern (Intel), Charles E. Leiserson (MIT), Stephen Lewin-Berlin (Intel).
  • 9. OUTLINE • CILK and CILK++ Language Features and Usages • Work stealing runtime • CILK++ Reducers • Conclusions
  • 10. WORK FIRST PRINCIPLE • Work: T1 • Critical path length: T∞ • Number of processors: P • Expected time: TP = T1/P + O(T∞) • Parallel slackness assumption: T1/P >> C∞T∞
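To make the quantities on this slide concrete, here is a minimal sketch that measures work and critical path for the fib example under an assumed unit-cost model (every invocation costs one step). The function names `fib_work`, `fib_span` and `estimated_time` are hypothetical, and the constant hidden in O(T∞) is taken to be 1 purely for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Assumption: every fib invocation costs exactly one time step.

// Work T1: total number of invocations executed by a serial run.
long fib_work(int n) {
    return n < 2 ? 1 : 1 + fib_work(n - 1) + fib_work(n - 2);
}

// Critical path T-infinity: longest chain of dependent invocations.
long fib_span(int n) {
    return n < 2 ? 1 : 1 + std::max(fib_span(n - 1), fib_span(n - 2));
}

// Estimate T1/P + T-infinity, with the O(.) constant assumed to be 1.
double estimated_time(int n, int p) {
    return static_cast<double>(fib_work(n)) / p + static_cast<double>(fib_span(n));
}
```

For fib(10) the work is far larger than the critical path, so the T1/P term dominates for small P, which is the situation the parallel slackness assumption describes.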
  • 11. WORK FIRST PRINCIPLE • Minimize the scheduling overhead borne by the work, even at the expense of a longer critical path • TP ≤ C1·TS/P + C∞·T∞ ≈ C1·TS/P • Minimize C1 even at the expense of a larger C∞
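A short worked example may help show why the bound favors a small C1. The numbers below are assumptions chosen only for illustration (they are not measurements from the talk), and C1 is read as the work overhead T1/TS.

```latex
\[
T_P \;\le\; \frac{C_1 T_S}{P} + C_\infty T_\infty ,
\qquad C_1 = \frac{T_1}{T_S}
\]
\[
\text{Assumed: } T_S = 10\,\mathrm{s},\quad C_1 = 1.2,\quad P = 8,\quad
T_\infty = 1\,\mathrm{ms},\quad C_\infty = 100
\]
\[
\frac{C_1 T_S}{P} = \frac{1.2 \times 10}{8} = 1.5\,\mathrm{s},
\qquad
C_\infty T_\infty = 100 \times 0.001 = 0.1\,\mathrm{s},
\qquad
T_P \le 1.6\,\mathrm{s}
\]
```

Under these assumed values, lowering C1 from 1.2 to 1.1 saves 0.125 s, while halving C∞ from 100 to 50 saves only 0.05 s.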
  • 12. WORK STEALING DESIGN GOALS • Minimizing contention • Decentralized task deque • Doubly linked deque • Minimizing communication • Steal work rather than push work • Load balance across cores • Lazy task creation • Steal from the top of the deque
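The per-worker deque described here can be sketched as follows. This is a single-threaded illustration only: the type `WorkerDeque` and its methods are hypothetical names, tasks are plain integers, and the synchronization a real runtime needs between owner and thief is deliberately left out.

```cpp
#include <cassert>
#include <deque>

// Single-threaded sketch of one worker's task deque (no locking shown).
// Assumption: tasks are ints; a real runtime would store frames.
struct WorkerDeque {
    std::deque<int> tasks;   // front = top (oldest), back = bottom (newest)

    // The owner pushes newly created work at the bottom.
    void push_bottom(int task) { tasks.push_back(task); }

    // The owner takes the most recently pushed task from the bottom.
    bool pop_bottom(int& task) {
        if (tasks.empty()) return false;
        task = tasks.back();
        tasks.pop_back();
        return true;
    }

    // A thief takes the oldest task from the top, the opposite end.
    bool steal_top(int& task) {
        if (tasks.empty()) return false;
        task = tasks.front();
        tasks.pop_front();
        return true;
    }
};
```

Because owner and thief operate on opposite ends, they only meet when the deque is nearly empty, which is one way to read the "minimizing contention" goal.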
  • 13-26. CILK WORK STEALING SCHEDULER (step-by-step figure sequence). Pictures from “Reducers and Other Cilk++ Hyperobjects”, talk by Matteo Frigo (Intel), Pablo Halpern (Intel), Charles E. Leiserson (MIT), Stephen Lewin-Berlin (Intel).
  • 27. TWO CLONE STRATEGY • Fast clone • Identical in most respects to the C elision of the Cilk program • Very little execution overhead • Sync statements compile to no-ops • Allocates a continuation (program variables and instruction pointer) • Slow clone • A spawned procedure is converted to its slow clone only when it is stolen • Restores program state from the activation frame, which contains the local variables, program counter and other parts of the procedure instance
  • 29. SLOW CLONE
        slow_fib(frame *_cilk_frame) {
            /* restore the state of the program from the frame */
            switch (_cilk_frame->header.entry) {
                case 1: goto _cilk_sync1;   /* resume after fast_fib(_cilk_frame->n - 1) */
                case 2: goto _cilk_sync2;   /* resume after fast_fib(_cilk_frame->n - 2) */
                case 3: goto _cilk_sync3;   /* resume after the sync (not a no-op here) */
            }
        }
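The idea of resuming from a saved entry point can be sketched in plain C++. This is an illustration under assumptions, not the code the Cilk compiler generates: `FibFrame`, its fields and the entry numbering are hypothetical, and `if` tests replace the `switch`/`goto` dispatch of the slide.

```cpp
#include <cassert>

// Hypothetical activation frame: saved entry point plus live variables.
struct FibFrame {
    int entry = 0;   // 0 = start, 1 = first call done, 2 = both calls done
    int n = 0;
    long x = 0;      // result of fib(n - 1)
    long y = 0;      // result of fib(n - 2)
};

// Fast clone: the plain serial function with no frame dispatch.
long fast_fib(int n) {
    return n < 2 ? n : fast_fib(n - 1) + fast_fib(n - 2);
}

// Slow clone: continues from wherever the frame says execution stopped.
long slow_fib(FibFrame& f) {
    if (f.entry == 0) {
        if (f.n < 2) return f.n;
        f.x = fast_fib(f.n - 1);
        f.entry = 1;              // record progress in the frame
    }
    if (f.entry == 1) {
        f.y = fast_fib(f.n - 2);
        f.entry = 2;
    }
    return f.x + f.y;             // the point after the sync
}
```

A frame with `entry = 1` and `x` already filled in behaves like a stolen continuation: `slow_fib` skips the first call and only computes what remains.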
  • 30. EXTENDED DEQUE WITH CALL STACKS (figure: an extended deque of call stacks, each made of stack frames and full frames)
  • 31. FRAMES • C++ Main Frame • Local variables of the procedure instance • Temporary variables • Linkage information for return values
  • 32. FRAMES • CILK++ Stack Frame • Everything in the C++ Main Frame • Continuation • Parent pointer • Has exactly one child • Used by the fast clone • A worker can have multiple stack frames
  • 33. FRAMES • CILK++ Full Frame (used by the slow clone) • Everything in the CILK++ Stack Frame • Lock • Join counter • List of children (can have more than one child) • A worker has at most one full frame
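The two frame kinds listed on these slides could be modeled roughly as below. This is a sketch under assumptions: the field types, the integer encoding of the continuation and the helper `promote` are hypothetical, and the promotion step follows the description in the speaker notes (increment the parent's join counter, append as rightmost child).

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Stack frame: what the fast clone needs beyond an ordinary C++ frame.
struct StackFrame {
    int continuation = 0;          // hypothetical encoding of the resume point
    StackFrame* parent = nullptr;  // parent pointer
};

// Full frame: everything in a stack frame plus shared bookkeeping.
struct FullFrame : StackFrame {
    std::mutex lock;                    // protects the fields below
    int join_counter = 0;               // number of outstanding children
    std::vector<FullFrame*> children;   // leftmost child first
};

// Promote a stack frame into the full frame `out` as a child of `parent`.
// Locking is omitted because this sketch is single-threaded.
void promote(const StackFrame& s, FullFrame& parent, FullFrame& out) {
    out.continuation = s.continuation;
    out.parent = &parent;
    parent.join_counter += 1;            // one more child to wait for
    parent.children.push_back(&out);     // becomes the rightmost child
}
```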
  • 34. FUNCTION CALL (figure: extended deque before the function call). Runtime actions listed on this and the following slides: function call, spawn, call return, spawn return, sync, random steal, provably good steal, unconditional steal, resume full frame
  • 35. FUNCTION CALL (figure: extended deque after the function call) • A new stack frame is added
  • 36. SPAWN (figure: extended deque before the spawn)
  • 37. SPAWN (figure: extended deque after the spawn) • Set the continuation in the last stack frame
  • 38. RESUME FULL FRAME • Set the full frame to be the only frame in the call stack and resume execution at the continuation
  • 39-41. RANDOMLY STEAL (figure sequence) • Steal this call stack (figure labels: 1, 1, 1)
  • 42. PROVABLY GOOD STEAL (figure label: 0)
  • 43. UNCONDITIONALLY STEAL (figure label: 2)
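The difference between the two non-random steals comes down to how the join counter is checked. The sketch below follows the wording of the speaker notes for these slides; the `FullFrame` fields, the `Next` enum and the function names are hypothetical, and the worker's empty-deque precondition is not modeled.

```cpp
#include <cassert>

// What the worker does after the steal attempt.
enum class Next { ResumeFullFrame, RandomSteal };

// Minimal stand-in for a full frame (assumed fields).
struct FullFrame {
    int join_counter = 0;      // outstanding children
    bool has_worker = false;   // true if some worker is executing this frame
};

// Provably good steal: resume A only if this was the last child to finish
// and nobody else is working on A; otherwise fall back to random stealing.
Next provably_good_steal(FullFrame& a) {
    a.join_counter -= 1;
    if (a.join_counter == 0 && !a.has_worker) return Next::ResumeFullFrame;
    return Next::RandomSteal;
}

// Unconditional steal: the join counter must be positive, and A is always resumed.
Next unconditional_steal(FullFrame& a) {
    assert(a.join_counter > 0);
    a.join_counter -= 1;
    return Next::ResumeFullFrame;
}
```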
  • 44. FUNCTION CALL RETURN (figure: before return from a call, case 1)
  • 45. FUNCTION CALL RETURN (figure: return from a call, case 1)
  • 46. FUNCTION CALL RETURN (figure: return from a call, case 2) • The worker executes an unconditional steal
  • 47. SPAWN RETURN (figure: before spawn return, case 1)
  • 48. SPAWN RETURN (figure: after spawn return, case 1)
  • 49. SPAWN RETURN (figure: return from a spawn, case 2) • The worker executes a provably good steal
  • 50. SYNC (case 1) • Do nothing if it is a stack frame (no-op)
  • 51. SYNC (case 2) • Pop the frame, then execute a provably good steal
  • 52. OUTLINE • CILK and CILK++ Language Features and Usages • Work stealing runtime • CILK++ Reducers • Conclusions
  • 53. PROBLEMS WITH NON-LOCAL VARIABLES
        bool has_property(Node *);
        List<Node *> output_list;
        void walk(Node *x) {
            if (x) {
                if (has_property(x))
                    output_list.push_back(x);
                cilk_spawn walk(x->left);
                walk(x->right);
                cilk_sync;
            }
        }
  • 54. REDUCER DESIGN GOALS • Support parallelization of programs containing global variables • Enable efficient parallel scaling by avoiding a single point of contention • Provide a deterministic result for associative reduce operations • Operate independently of any control constructs
  • 55. REDUCER EXAMPLE
        bool has_property(Node *);
        List_append_reducer<Node *> output_list;
        void walk(Node *x) {
            if (x) {
                if (has_property(x))
                    output_list.push_back(x);
                cilk_spawn walk(x->left);
                walk(x->right);
                cilk_sync;
            }
        }
  • 56. HYPER OBJECTS (figure). Pictures from “Reducers and Other Cilk++ Hyperobjects”, talk by Matteo Frigo (Intel), Pablo Halpern (Intel), Charles E. Leiserson (MIT), Stephen Lewin-Berlin (Intel).
  • 57. REDUCER (figure, same source)
  • 58. SEMANTICS OF REDUCERS • The child strand owns the view owned by the parent function before cilk_spawn • The parent strand owns a new view, initialized to the identity view e • A special optimization handles the case where a view is combined with the identity view, which leaves it unchanged • Parent strand P owns the combined view from completed child strands
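These ownership rules can be simulated serially for the list-append case. In the sketch below, every spawn is treated as if it had been stolen: the child keeps the current view, the continuation starts from an empty (identity) view, and the two are concatenated at the sync. `Node`, `View`, `reduce_views` and the even-value predicate are assumptions made for illustration; this is not the Cilk++ reducer library.

```cpp
#include <cassert>
#include <list>

struct Node {
    int value;
    Node* left;
    Node* right;
};

using View = std::list<int>;

// REDUCE for list append: left ++ right (associative, not commutative).
void reduce_views(View& left, View& right) {
    left.splice(left.end(), right);
}

// Hypothetical predicate standing in for has_property.
bool has_property(const Node* x) { return x->value % 2 == 0; }

// Reference: ordinary serial walk with one shared list.
void walk_serial(const Node* x, View& out) {
    if (!x) return;
    if (has_property(x)) out.push_back(x->value);
    walk_serial(x->left, out);
    walk_serial(x->right, out);
}

// Simulated parallel walk: a fresh identity view at every spawn.
void walk_with_views(const Node* x, View& view) {
    if (!x) return;
    if (has_property(x)) view.push_back(x->value);
    walk_with_views(x->left, view);     // child strand owns the current view
    View fresh;                         // continuation starts from identity
    walk_with_views(x->right, fresh);
    reduce_views(view, fresh);          // at the sync: view = view ++ fresh
}
```

Because concatenation is associative and the child's view always goes on the left, the simulated result matches the serial order even though the list was built in separate pieces.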
  • 59-60. REDUCING OVER LIST CONCATENATION (figure sequence). Pictures from “Reducers and Other Cilk++ Hyperobjects”, talk by Matteo Frigo (Intel), Pablo Halpern (Intel), Charles E. Leiserson (MIT), Stephen Lewin-Berlin (Intel).
  • 61. IMPLEMENTATION OF REDUCER • Each worker maintains a hypermap • Hypermap: maps reducers to their views • User: the view of the current procedure • Children: the view of the child procedures • Right: the view of the right sibling • Identity: the default value of a view
  • 62. UNDERSTANDING HYPERMAPS
        bool has_property(Node *);
        List_append_reducer<Node *> output_list;
        void walk(Node *x)                      // proc A
        {
            if (x) {
                if (has_property(x))
                    output_list.push_back(x);
                cilk_spawn walk(x->left);       // proc B
                cilk_spawn walk(x->right);      // proc C
                cilk_sync;
            }
        }
  • 63-67. HYPERMAP CREATION (figure sequence). Pictures from “Reducers and Other Cilk++ Hyperobjects”, talk by Matteo Frigo (Intel), Pablo Halpern (Intel), Charles E. Leiserson (MIT), Stephen Lewin-Berlin (Intel).
  • 68. LOOK UP FAILURE • Inserts a view containing an identity element for the reducer into the hypermap • This follows the lazy principle • The lookup returns the newly inserted identity view
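A lookup that creates the identity view on demand can be sketched with an ordinary map. The key type `ReducerId`, the `Hypermap` struct and the use of `std::map` are assumptions for illustration; the actual runtime's hypermap layout is not described on the slide.

```cpp
#include <cassert>
#include <list>
#include <map>

using ReducerId = int;          // hypothetical key identifying a reducer
using View = std::list<int>;    // list-append view; identity is the empty list

struct Hypermap {
    std::map<ReducerId, View> views;

    // Returns the view for reducer r. On a lookup failure, an identity
    // view is inserted lazily and that new view is returned.
    View& lookup(ReducerId r) {
        auto it = views.find(r);
        if (it == views.end()) {
            it = views.emplace(r, View{}).first;
        }
        return it->second;
    }
};
```

No view exists until a strand actually touches the reducer, so reducers that are never used on a stolen strand cost nothing there.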
  • 69. RANDOM WORK STEALING • A random steal operation steals a full frame P and replaces it with a new full frame C in the victim. • USER_C ← USER_P; USER_P ← ∅; CHILDREN_P ← ∅; RIGHT_P ← ∅
  • 70. RANDOM WORK STEALING (figure). Pictures from “Reducers and Other Cilk++ Hyperobjects”, talk by Matteo Frigo (Intel), Pablo Halpern (Intel), Charles E. Leiserson (MIT), Stephen Lewin-Berlin (Intel).
  • 71. RETURN FROM A CALL • Let C be a child frame of the parent frame P that originally called C, and suppose that C returns. • If C is a stack frame, do nothing. • If C is a full frame, transfer ownership of the view (the CHILDREN and RIGHT hypermaps of C are empty): USER_P ← USER_C
  • 72. RETURN FROM A SPAWN • Let C be a child frame of the parent frame P that originally spawned C, and suppose that C returns. • Always do USER_C ← REDUCE(USER_C, RIGHT_C) • If C is a stack frame, do nothing more • If C is a full frame: • If C has a left sibling L: RIGHT_L ← REDUCE(RIGHT_L, USER_C) • Otherwise C is the leftmost child: CHILDREN_P ← REDUCE(CHILDREN_P, USER_C)
  • 73. SYNC • A cilk_sync statement waits until all children have completed. When frame P executes a cilk_sync, one of the following two cases applies: • If P is a stack frame, do nothing. • If P is a full frame, USER_P ← REDUCE(CHILDREN_P, USER_P).
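The update rules for spawn return and sync can be traced with a small single-reducer model. The sketch assumes one list-append reducer, so each of the three hypermaps collapses to a single view; `FullFrame`, `left_sibling`, `reduce_views`, `return_from_spawn` and `sync_frame` are hypothetical names, and only the full-frame cases are modeled.

```cpp
#include <cassert>
#include <list>

using View = std::list<int>;

// REDUCE for list append: returns left ++ right.
View reduce_views(View left, View right) {
    left.splice(left.end(), right);
    return left;
}

// Full frame with one view per hypermap (single reducer assumed).
struct FullFrame {
    View user, children, right;
    FullFrame* parent = nullptr;
    FullFrame* left_sibling = nullptr;   // nullptr means leftmost child
};

// Spawn return for a full frame C.
void return_from_spawn(FullFrame& c) {
    c.user = reduce_views(c.user, c.right);
    c.right.clear();
    if (c.left_sibling) {
        FullFrame& l = *c.left_sibling;
        l.right = reduce_views(l.right, c.user);        // hand view to left sibling
    } else {
        FullFrame& p = *c.parent;
        p.children = reduce_views(p.children, c.user);  // leftmost: hand to parent
    }
    c.user.clear();
}

// Sync for a full frame P: children's views go to the left of P's own.
void sync_frame(FullFrame& p) {
    p.user = reduce_views(p.children, p.user);
    p.children.clear();
}
```

In the usage below, child C returns before its left sibling B, yet the final list still reads B's elements, then C's, then P's, matching serial order.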
  • 75. OUTLINE • CILK and CILK++ Language Features and Usages • Work stealing runtime • CILK++ Reducers • Conclusions
  • 76. CONCLUSIONS • CILK and CILK++ provide a programmer friendly programming model • Extension to C • Incremental parallelism • Scaling on future machines • Non-compromising performance • Work stealing runtime • Minimizing overheads • Reducers
  • 77. FINAL NOTES • Designed for an idealized shared memory model • Today’s architectures are typically NUMA • Task creation can be lazier • http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6012915&tag=1 • cilk_for • Divide and conquer parallelization
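The divide-and-conquer treatment of a parallel loop mentioned in the final notes can be sketched as recursive range splitting. This version runs serially and only marks where a Cilk++ runtime would spawn and sync; `dc_for` and the `grain` parameter are hypothetical names, not the actual cilk_for implementation.

```cpp
#include <cassert>
#include <functional>

// Serial sketch of divide-and-conquer loop splitting over [lo, hi).
// Assumption: grain >= 1 is the largest range executed as a plain loop.
void dc_for(int lo, int hi, int grain, const std::function<void(int)>& body) {
    assert(grain >= 1);
    if (hi - lo <= grain) {
        for (int i = lo; i < hi; ++i) body(i);   // small range: run serially
        return;
    }
    int mid = lo + (hi - lo) / 2;
    dc_for(lo, mid, grain, body);   // in Cilk++ this half would be spawned
    dc_for(mid, hi, grain, body);   // the continuation handles the other half
    // a cilk_sync would go here
}
```

Splitting the range in halves gives a balanced spawn tree, so a thief that steals near the top of the deque picks up a large chunk of iterations.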

Editor's notes

  1. CILK and CILK++ adopt the shared memory model. No uniform address, sockets, abstraction.
  2. If you have taken Comp 322, spawn is very similar to the “async” keyword in Habanero Java, and the sync keyword is similar to the “finish” scope. Cilk++ extends C++??
  3. An example of the Fibonacci sequence computation in Cilk. It spawns two threads at each invocation of the function. Notice that the cilk keyword is used to denote a Cilk function.
  4. Cilk++ took away the cilk keyword and prefixed cilk_ to spawn and sync.
  5. Directed acyclic graph. Spawn creates parallel executions, B and C; they join together and recombine to execute D.
  6. Work: the time needed to execute the program serially. Parallel slackness assumption: the number of processors is much smaller than the average degree of parallelism.
  7. To support dynamic task creation.
  8. The Cilk runtime uses a special work-stealing scheduler. There are two kinds of schedulers. In work sharing, all the workers take tasks from a unified task queue; it is less efficient for a number of reasons: there is potentially a single lock on the task queue to deal with contention, and the queue could be empty while there is still work left. The work-stealing runtime solves the problem by building an extended deque for each worker; when a worker is out of work, it steals randomly from other workers. We will demonstrate the process in the next few slides. Decentralized. Push work rather than pull work (when necessary). Loop contains a spawn, package child task, stack, single processor. Lazy task creation.
  9. Steal from the top to reduce contention. Stealing from the top gets a bigger subtree (divide and conquer), which means larger task granularity and fewer steals. Stealing from the top also increases the possible cache locality of the program.
  10. The reason
  11. All sync statements compile to no-ops: because a fast clone never has any children when it is executing, we know at compile time that all previously spawned procedures have completed. Thus, no operations are required for a sync statement. Before it recursively spawns,
  12. Looks a lot like the original fib (highlight the original sequential code); the rest is bookkeeping. A little bit of bookkeeping: sig is the signature, which includes the pointer to the slow clone routine; fibsig represents the slow clone. Entry point, instruction pointer. Comes back to the principle we described earlier.
  13. Uses fast_fib locally.
  14. Set the continuation in the original procedure's stack frame. Allocate a stack frame for B. Push B’s stack frame to the tail of the deque.
  15-17. Pick a random victim v, where v ≠ w. Repeat this step while the deque of v is empty. Remove the oldest call stack from the deque of v, and promote all stack frames to full frames. For every promoted frame, increment the join counter of the parent frame (full by Invariant 3). Make every newly created child the rightmost child of its parent. Let loot be the youngest frame that was stolen. Promote the oldest frame now in v’s extended deque to a full frame and make it the rightmost child of loot. Increment loot’s join counter. Execute a resume-full-frame action on loot.
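The first step of this procedure, choosing a victim, can be sketched as below. The function `pick_victim`, the use of integer tasks and the `-1` return value are assumptions: the note describes retrying until a non-empty deque is found, whereas this sketch first checks that at least one candidate exists so that it always terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <random>
#include <vector>

// Returns the index of a random victim v != w whose deque is non-empty,
// or -1 if no such victim exists (a termination guard added for this sketch).
int pick_victim(std::size_t w,
                const std::vector<std::deque<int>>& deques,
                std::mt19937& rng) {
    bool any = false;
    for (std::size_t v = 0; v < deques.size(); ++v) {
        if (v != w && !deques[v].empty()) any = true;
    }
    if (!any) return -1;

    std::uniform_int_distribution<std::size_t> dist(0, deques.size() - 1);
    for (;;) {
        std::size_t v = dist(rng);   // repeat while v is w or v's deque is empty
        if (v != w && !deques[v].empty()) return static_cast<int>(v);
    }
}
```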
  18. Joint counter, frames left in heap, (0)Assert that the frame A begin stolen is a full frame and the extended deque is empty. Decrement the join counter of A. If the join counter is 0 and no worker is working on A, execute a resume-full-frame action on A. Otherwise, begin random work stealing.3
  19. Assert that the frame A being stolen is a full frame, the extended deque is empty, and A’s join counter is positive. Decrement the join counter of A. Execute a resume-full- frame action on A.
  20. Set continuation in original proc’s stack frameAllocates a stack frame for BPushes B’s stack frame to the tail of deque
  21. Just removing a stack frame
  22. In this case, the full frame has finished execution.
  23. Set the continuation in the original proc’s stack frame. Allocate a stack frame for B. Push B’s stack frame onto the tail of the deque.
  24. Set the continuation in the original proc’s stack frame. Allocate a stack frame for B. Push B’s stack frame onto the tail of the deque.
  25. In this case, the full frame has finished execution.
  26. Do nothing if it is a stack frame
  27. Do nothing if it is a stack frame
  28. Little modification to the code. Deterministic output (given an associative operator).
  29. Can be used to parallelize many programs containing global (or nonlocal) variables without locking, atomic updating, or the need to logically restructure the code. The programmer can count on a deterministic result as long as the reducer operator is associative. Commutativity is not required. Reducers operate independently of any control constructs, such as parallel for, and of any data structures that contribute their values to the final result.
  30. Little modification to the code. Deterministic output (given an associative operator).
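Why associativity alone is enough can be illustrated with string concatenation, which is associative but not commutative. The sketch below is plain serial C++ that only simulates reducer views; it does not use the Cilk++ reducer library, and `concat_serial` and `concat_with_views` are hypothetical names. Each simulated strand starts from the identity (the empty string), and the views are combined in left-to-right order.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Serial reference: append every piece in order.
std::string concat_serial(const std::vector<std::string>& pieces) {
    std::string out;
    for (const std::string& p : pieces) out += p;
    return out;
}

// Simulated reducer: split the input into `strands` consecutive chunks,
// accumulate each chunk into its own view, then reduce the views from
// left to right. However the input is chunked, the result matches the
// serial one because concatenation is associative.
std::string concat_with_views(const std::vector<std::string>& pieces,
                              std::size_t strands) {
    assert(strands > 0);
    std::vector<std::string> views(strands);  // each view starts as identity ""
    const std::size_t n = pieces.size();
    for (std::size_t s = 0; s < strands; ++s) {
        const std::size_t begin = s * n / strands;
        const std::size_t end = (s + 1) * n / strands;
        for (std::size_t i = begin; i < end; ++i) views[s] += pieces[i];
    }
    std::string result;
    for (const std::string& v : views) result += v;  // ordered reduction
    return result;
}
```

Combining the views in any other order would scramble the output, which is why the runtime has to track the left-to-right order of views even though it does not need the operator to commute.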
  31. Fast clone uses identity view
  32. Example of serial execution
  33. Children of A would be {B, C}. The right sibling of B would be C. User would be the view in A.
  34. We distinguish two cases: the “fast path” when C is a stack frame, and the “slow path” when C is a full frame. On the fast path nothing needs to be done, because both P and C share the view stored in the map at the head of the deque to which both P and C belong. On the slow path, an update transfers ownership of child views to the parent. The other two hypermaps of C are guaranteed to be empty and do not participate in the update.
  35. Set the continuation in the original proc’s stack frame. Allocate a stack frame for B. Push B’s stack frame onto the tail of the deque.
  36. Just removing a stack frame
  37. We distinguish two cases: the “fast path” when C is a stack frame, and the “slow path” when C is a full frame. On the fast path nothing needs to be done, because both P and C share the view stored in the map at the head of the deque to which both P and C belong. On the slow path, an update transfers ownership of child views to the parent. The other two hypermaps of C are guaranteed to be empty and do not participate in the update.
  38. In this case, the full frame has finished execution.
  39. We distinguish two cases: the “fast path” when C is a stack frame, and the “slow path” when C is a full frame. On the fast path nothing needs to be done, because both P and C share the view stored in the map at the head of the deque to which both P and C belong. On the slow path, an update transfers ownership of child views to the parent. The other two hypermaps of C are guaranteed to be empty and do not participate in the update.
  40. Again we distinguish the “fast path” when C is a stack frame from the “slow path” when C is a full frame.
  41. If proc B finishes first,
  42. If proc B finishes first, its results go into the children of A. If C then finishes, it is the leftmost child, so the children of A become the union of the current children of A and User C. Both are leftmost cases.
  43. When C finishes first: C has a left sibling, B, so the result of C is accumulated into Right B. When B finishes, the children of A hold User B.
  44. 1. Doing nothing is correct because all children of P, if any exist, were stack frames, and thus they transferred ownership of their views to P when they completed. Thus, no outstanding child views exist that must be reduced into P’s. 2. After P passes the cilk_sync statement but before executing any client code, we perform the update. This update reduces all reducers of completed children into the parent.
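The slow-path bookkeeping for spawn returns and sync can be sketched with a single string-valued reducer per frame. Everything here is an assumption made for illustration: the `Frame` layout, the function names, and the exact update rules (a returning child first folds its right view into its user view, then hands the result to its left sibling's right view or, if it is leftmost, to the parent's children view; sync folds the children view in front of the parent's user view). It is not the runtime's code.

```cpp
#include <cassert>
#include <string>

// Hypothetical full frame holding one view in each of three slots.
struct Frame {
    std::string user, right, children;  // each starts as the identity ""
    Frame* parent = nullptr;
    Frame* left = nullptr;              // outstanding left sibling, if any
    Frame* right_sibling = nullptr;     // outstanding right sibling, if any
};

// Link `child` as the rightmost outstanding child of `parent`.
// `last` is the current rightmost child, or nullptr if there is none.
void spawn_child(Frame& parent, Frame& child, Frame* last) {
    child.parent = &parent;
    child.left = last;
    if (last) last->right_sibling = &child;
}

// Slow path for a returning spawned child that is a full frame.
void return_from_spawn(Frame& c) {
    c.user += c.right;  // fold in views of completed right siblings
    c.right.clear();
    if (c.left) c.left->right += c.user;  // not leftmost: pass to left sibling
    else c.parent->children += c.user;    // leftmost: pass to the parent
    c.user.clear();
    // Unlink c so its neighbours become adjacent siblings.
    if (c.left) c.left->right_sibling = c.right_sibling;
    if (c.right_sibling) c.right_sibling->left = c.left;
}

// Slow path for sync in a full frame: children views go before P's own view.
void sync_frame(Frame& p) {
    p.user = p.children + p.user;
    p.children.clear();
}
```

With children B, C, D spawned in that order, any completion order leaves the parent with the views concatenated as B, C, D, followed by its own view, which matches the serial order.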
  45. Comparing reducers against mutual exclusion
  46. Future scaling with dynamic parallelism. Provides a simple way to add incremental parallelism. Incremental parallelization of programs. Inspired many later works, such as Habanero Java, Habanero C, and X10.
  47. Eagerly saving all the state; gathering the state with an exception when a steal occurs.