Overview
• Introduction
  - Synchronization
  - Non-blocking Synchronization
• Is Non-blocking Synchronization performance-beneficial for Parallel Applications?
• NOBLE: A Non-blocking Synchronization Interface. How can we make non-blocking synchronization accessible to the parallel programmer?
• Lock-free Skip lists
• Conclusions, Future Work

Systems: SMP
• Cache-coherent distributed shared memory multiprocessor systems:
  - UMA
  - NUMA

Synchronization
• Barriers
• Locks, semaphores, … (mutual exclusion)
“A significant part of the work performed by today’s parallel applications is spent on synchronization.”
...
Lock-Based Synchronization:
Sequential
Non-blocking Synchronization
• Lock-Free Synchronization
  - Optimistic approach
    • Assumes it is alone and prepares an operation, which later takes effect (unless interfered with) in one atomic step, using hardware atomic primitives
    • Interference is detected via shared memory
    • Retries until no other operation interferes
    • Can cause starvation
Slide provided by Jim Anderson

Example: Shared Queue
The usual approach is to implement operations using retry loops. Here’s an example:
type Qtype = record v: valtype; next: pointer to Qtype end
shared var Tail: pointer to Qtype;
local var old, new: pointer to Qtype
procedure Enqueue (input: valtype)
    new := (input, NIL);
    repeat old := Tail
    until CAS2(&Tail, &(old->next), old, NIL, new, new)
[Figure: the queue before and after the enqueue, showing the old, Tail and new pointers]
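The retry-loop pattern can be sketched in C11. Note that this is not the queue from the slide: CAS2 operates on two separate memory words at once, which common hardware does not offer, so the sketch below swaps in a simpler structure (push at the head of a linked list) that needs only the single-word CAS from <stdatomic.h>. The names node_t, head, list_push and list_length are hypothetical and chosen for illustration only.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical node type for illustration; not taken from the slides. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Shared head pointer of the list; NULL means the list is empty. */
static _Atomic(node_t *) head = NULL;

/* Lock-free push: prepare the node privately, then publish it with one CAS.
 * Returns 0 on success, -1 if allocation fails. */
int list_push(int value)
{
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;

    node_t *old = atomic_load(&head);
    do {
        /* Optimistic step: link the new node to the head we last saw. */
        n->next = old;
        /* If another thread changed head meanwhile, the CAS fails, `old`
         * is refreshed with the current head, and the loop retries. */
    } while (!atomic_compare_exchange_weak(&head, &old, n));
    return 0;
}

/* Counts the nodes; meant for inspection while no other thread is pushing. */
size_t list_length(void)
{
    size_t count = 0;
    for (node_t *p = atomic_load(&head); p != NULL; p = p->next)
        count++;
    return count;
}
```

As on the slide, the operation is prepared without touching shared state and takes effect in a single atomic step; interference shows up only as a failed CAS followed by a retry.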
Non-blocking Synchronization
• Lock-Free Synchronization
  - Avoids problems that locks have
  - Fast
  - Starvation? (not in the Context of HPC)
• Wait-Free Synchronization
  - Always finishes in a finite number of its own steps.
    • Complex algorithms
    • Memory consuming
    • Less efficient on average than lock-free
Non-blocking Synchronisation
Synchronisation:
• An alternative approach for synchronisation introduced 25 years ago
• Many theoretical results
Evaluation:
• Micro-benchmarks show better performance than mutual exclusion in real or simulated multiprocessor systems.
Practice
• Non-blocking synchronization is still not used in practical applications
• Non-blocking solutions are often
  - complex
  - having non-standard or unclear interfaces
  - non-practical
Practice
Question:
“How is the performance of parallel scientific applications affected by the use of non-blocking synchronisation rather than lock-based synchronisation?”
Answers
How is the performance of parallel scientific applications affected by the use of non-blocking synchronisation rather than lock-based synchronisation?
• The identification of the basic locking operations that parallel programmers use in their applications.
• The efficient non-blocking implementation of these synchronisation operations.
• The architectural implications on the design of non-blocking synchronisation.
• Comparison of the lock-based and lock-free versions of the respective applications.
Applications
• Ocean: simulates eddy currents in an ocean basin.
• Radiosity: computes the equilibrium distribution of light in a scene using the radiosity method.
• Volrend: renders 3D volume data into an image using a ray-casting method.
• Water: evaluates forces and potentials that occur over time between water molecules.
• Spark98: a collection of sparse matrix kernels. Each kernel performs a sequence of sparse matrix-vector product operations using matrices that are derived from a family of three-dimensional finite element earthquake applications.
Removing Locks in Applications
• Many locks are “Simple Locks”. → CAS, FAA and LL/SC can be used to implement non-blocking versions.
• Many critical sections contain shared floating-point variables. → Floating-point synchronization primitives are needed. A Double-Fetch-and-Add primitive was designed.
• Large critical sections. → Efficient non-blocking implementations of big ADTs are used.
Experimental Results: Speedup
[Figure: speedup plots; data points labelled 58P, 32P and 24P]
SPARK98
Before:
spark_setlock(lockid);
w[col][0] += A[Anext][0][0]*v[i][0] + A[Anext][1][0]*v[i][1] + A[Anext][2][0]*v[i][2];
w[col][1] += A[Anext][0][1]*v[i][0] + A[Anext][1][1]*v[i][1] + A[Anext][2][1]*v[i][2];
w[col][2] += A[Anext][0][2]*v[i][0] + A[Anext][1][2]*v[i][1] + A[Anext][2][2]*v[i][2];
spark_unsetlock(lockid);
After:
dfad(&w[col][0], A[Anext][0][0]*v[i][0] + A[Anext][1][0]*v[i][1] + A[Anext][2][0]*v[i][2]);
dfad(&w[col][1], A[Anext][0][1]*v[i][0] + A[Anext][1][1]*v[i][1] + A[Anext][2][1]*v[i][2]);
dfad(&w[col][2], A[Anext][0][2]*v[i][0] + A[Anext][1][2]*v[i][1] + A[Anext][2][2]*v[i][2]);
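The slides do not show how dfad is implemented. One plausible construction, sketched here under assumptions, is a CAS retry loop over the 64-bit bit pattern of the double. The names dfad_sketch, atomic_double_store and atomic_double_load are hypothetical, and unlike the dfad call above, this sketch takes a pointer to an atomic 64-bit word rather than a plain double.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Assumption of this sketch: a double occupies exactly 64 bits. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Stores a double into an atomic 64-bit cell as its raw bit pattern. */
void atomic_double_store(_Atomic uint64_t *cell, double v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    atomic_store(cell, bits);
}

/* Loads the double held in an atomic 64-bit cell. */
double atomic_double_load(_Atomic uint64_t *cell)
{
    uint64_t bits = atomic_load(cell);
    double v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Hypothetical double fetch-and-add: atomically adds delta to the double
 * stored in *cell and returns the value held before the addition. */
double dfad_sketch(_Atomic uint64_t *cell, double delta)
{
    uint64_t old_bits = atomic_load(cell);
    uint64_t new_bits;
    double old_val, new_val;
    do {
        memcpy(&old_val, &old_bits, sizeof old_val);
        new_val = old_val + delta;
        memcpy(&new_bits, &new_val, sizeof new_bits);
        /* On failure old_bits is refreshed and the sum is recomputed. */
    } while (!atomic_compare_exchange_weak(cell, &old_bits, new_bits));
    return old_val;
}
```

With a primitive of this kind, each of the three accumulations into w[col] becomes an independent atomic update, which is why the lock around them can be dropped.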
NOBLE: Brings Non-blocking closer to Practice
• Create a non-blocking inter-process communication interface with the properties:
  - Attractive functionality
  - Programmer friendly
  - Easy to adapt existing solutions
  - Efficient
  - Portable
  - Adaptable for different programming languages
NOBLE Design: Portable
• Exported definitions, identical for all platforms: Noble.h (#define NBL...)
• Platform independent: QueueLF.c, StackLF.c, ... (each with #include "Platform/Primitives.h")
• Platform dependent: SunHardware.asm, IntelHardware.asm, ... (CAS, TAS, Spin-Locks, …)
Using NOBLE
• First create a global variable handling the shared data object, for example a stack:
Globals
#include <noble.h>
...
NBLStack* stack;
• Create the stack with the appropriate implementation:
Main
stack=NBLStackCreateLF(10000);
...
• When some thread wants to do some operation:
Threads
NBLStackPush(stack, item);
or
item=NBLStackPop(stack);
Using NOBLE
• When the data structure is not in use anymore:
Globals
#include <noble.h>
...
NBLStack* stack;
Main
stack=NBLStackCreateLF(10000);
...
NBLStackFree(stack);
Using NOBLE
Globals
#include <noble.h>
...
NBLStack* stack;
• To change the synchronization mechanism, only one
line of code has to be changed!

Main
stack=NBLStackCreateLB();
...
NBLStackFree(stack);

Threads
NBLStackPush(stack, item);

or
item=NBLStackPop(stack);
Design: Attractive functionality
• Data structures for multi-threaded usage
  - FIFO Queues
  - Priority Queues
  - Dictionaries
  - Stacks
  - Singly linked lists
  - Snapshots
  - MWCAS
  - ...
• Clear specifications
Status
• Multiprocessor support
  - Sun Solaris (Sparc)
  - Win32 (Intel x86)
  - SGI (Mips)
  - Linux (Intel x86)
Available for academic use: http://www.noble-library.org/
Did our Work have any Impact?
1) Industry has initiated contacts and uses a test version of NOBLE.
2) Freeware developers have shown interest.
3) Interest from research organisations.
NOBLE is freely available for research and educational purposes.
A Lock-Free Skip list
• Presented as part of: H. Sundell, Ph. Tsigas, ”Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems”. 17th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS ´03), May 2003 (TR 2002). Best Paper Award
• A very similar lock-free skip list algorithm will be presented this August at the ACM Symposium on Principles of Distributed Computing (PODC 2004): ”Lock-Free Linked Lists and Skip Lists”, Mikhail Fomitchev, Eric Ruppert
Randomized Algorithm: Skip Lists
• William Pugh: ”Skip Lists: A Probabilistic Alternative to Balanced Trees”, 1990
  - Layers of ordered lists with different densities, achieves a tree-like behavior
  - Time complexity: O(log₂ N) – probabilistic!
[Figure: skip list from Head to Tail over keys 1 to 7; the upper layers hold about 50% and 25% of the nodes]
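The layer densities in the figure (about 50%, then 25%) correspond to promoting a node to the next layer with probability 1/2. A minimal sketch of such a height generator follows; MAX_LEVEL, xorshift32 and random_level are hypothetical names, and the small xorshift generator is only there to keep the example self-contained.

```c
#include <stdint.h>

/* Hypothetical cap on the number of layers; not taken from the slides. */
#define MAX_LEVEL 16

/* Small xorshift pseudo-random generator. *state must be non-zero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Picks the height of a new node: every node is in layer 1, and each
 * further layer is added with probability 1/2, up to MAX_LEVEL. */
int random_level(uint32_t *state)
{
    int level = 1;
    while (level < MAX_LEVEL && (xorshift32(state) >> 31) != 0)
        level++;
    return level;
}
```

Because roughly half of the nodes stop at each layer, a search can skip over about half of the remaining keys per layer, which is where the logarithmic expected cost comes from.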
Our Lock-Free Concurrent Skip List
• Define node state to depend on the insertion status at lowest level as well as a deletion flag
• Insert from lowest level going upwards
• Set deletion flag. Delete from highest level going downwards
[Figure: skip list nodes 1 to 7, each with a deletion flag D; a node p spans levels 1 to 3]
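Concurrent Insert vs. Delete operations
• Problem: in the list 1 → 2 → 4, a Delete of node 2 runs concurrently with an Insert of node 3 after node 2
  - both nodes are deleted!
• Solution (Harris et al): Use bit 0 of pointer to mark deletion status
[Figure: list 1 → 2 → 4 with node 3 being inserted, steps a), b) and c); the marked node is shown as 2 *]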
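The mark-bit idea can be sketched in C11 as follows. This is an illustration under assumptions, not the code of the cited algorithms: snode_t, mark, unmark, is_marked, try_mark_deleted and try_insert_after are hypothetical names, and the sketch relies on nodes being aligned to at least 2 bytes so that bit 0 of a node address is always zero.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list node for illustration. */
typedef struct snode {
    int key;
    _Atomic(struct snode *) next;
} snode_t;

/* Bit 0 of the next pointer carries the deletion mark. */
static snode_t *mark(snode_t *p)   { return (snode_t *)((uintptr_t)p | (uintptr_t)1); }
static snode_t *unmark(snode_t *p) { return (snode_t *)((uintptr_t)p & ~(uintptr_t)1); }
static bool is_marked(snode_t *p)  { return ((uintptr_t)p & (uintptr_t)1) != 0; }

/* Logical deletion: set the mark on the node's own next pointer.
 * Returns true if this call set the mark, false if it was already set. */
bool try_mark_deleted(snode_t *node)
{
    snode_t *succ = atomic_load(&node->next);
    while (!is_marked(succ)) {
        if (atomic_compare_exchange_weak(&node->next, &succ, mark(succ)))
            return true;
    }
    return false;
}

/* One insertion attempt of `fresh` between pred and succ. The CAS expects
 * the unmarked pointer to succ, so it fails if pred has been marked as
 * deleted or if pred->next has changed since the caller read it. */
bool try_insert_after(snode_t *pred, snode_t *succ, snode_t *fresh)
{
    atomic_store(&fresh->next, succ);
    snode_t *expected = succ;
    return atomic_compare_exchange_strong(&pred->next, &expected, fresh);
}
```

Since the mark lives in the same word that an inserter must update, a node cannot gain a new successor once it has been marked, so the inserted node is not lost together with the deleted one.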
Dynamic Memory Management
• Problem: System memory allocation functionality is blocking!
• Solution (lock-free), IBM freelists:
  - Pre-allocate a number of nodes, link them into a dynamic stack structure, and allocate/reclaim using CAS
[Figure: free list Head → Mem 1 → Mem 2 → … → Mem n; Allocate takes a node from the head, Reclaim returns Used 1 to the list]
© Ph. Tsigas 2003-2004
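A pre-allocated free list of this kind can be sketched in C11 as below. Several choices are assumptions of the sketch rather than details from the slides: nodes are addressed by array index instead of by pointer, and the head word packs a version tag next to the index so that a stale CAS is rejected (until the 32-bit tag wraps around). The names pool, pool_init, pool_alloc and pool_free are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 8u          /* hypothetical number of pre-allocated nodes */
#define NIL UINT32_MAX        /* index value meaning "no node" */

typedef struct {
    _Atomic uint32_t next;    /* index of the next free node, or NIL */
    int payload;              /* user data; unused by the allocator */
} pool_node_t;

static pool_node_t pool[POOL_SIZE];
/* Head of the free stack: high 32 bits = version tag, low 32 bits = index. */
static _Atomic uint64_t free_head;

static uint64_t pack(uint32_t tag, uint32_t idx) { return ((uint64_t)tag << 32) | idx; }
static uint32_t idx_of(uint64_t h) { return (uint32_t)h; }
static uint32_t tag_of(uint64_t h) { return (uint32_t)(h >> 32); }

/* Links all nodes into one free stack: 0 -> 1 -> ... -> POOL_SIZE-1. */
void pool_init(void)
{
    for (uint32_t i = 0; i < POOL_SIZE; i++)
        atomic_store(&pool[i].next, i + 1 < POOL_SIZE ? i + 1 : NIL);
    atomic_store(&free_head, pack(0, 0));
}

/* Allocate: pop a node index with CAS. Returns NIL if the pool is empty. */
uint32_t pool_alloc(void)
{
    uint64_t old = atomic_load(&free_head);
    for (;;) {
        uint32_t idx = idx_of(old);
        if (idx == NIL)
            return NIL;
        uint32_t next = atomic_load(&pool[idx].next);
        /* A failed CAS refreshes `old`; the loop then retries. */
        if (atomic_compare_exchange_weak(&free_head, &old,
                                         pack(tag_of(old) + 1, next)))
            return idx;
    }
}

/* Reclaim: push a node index back with CAS. */
void pool_free(uint32_t idx)
{
    uint64_t old = atomic_load(&free_head);
    for (;;) {
        atomic_store(&pool[idx].next, idx_of(old));
        if (atomic_compare_exchange_weak(&free_head, &old,
                                         pack(tag_of(old) + 1, idx)))
            return;
    }
}
```

Because the nodes live in a static array, reading the next field of a node that another thread has just allocated is still a valid memory access; the changed tag then makes the CAS fail and the operation retries.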
The ABA problem
• Problem: Because of concurrency (pre-emption in particular), the same pointer value does not always mean the same node, yet the CAS still succeeds!
[Figure: Step 1 and Step 2 of an ABA scenario on a list with nodes 1, 2, 3, 4, 6 and 7]
The ABA problem
• Solution: (Valois et al) Add reference counting to each node, in order to prevent nodes that are of interest to some thread from being reclaimed until all threads have left the node
[Figure: New Step 2, with reference counts on the nodes; the CAS fails!]
Helping Scheme
• Threads need to traverse safely
• Need to remove marked-to-be-deleted nodes while traversing – Help!
• Finds previous node, finishes deletion and continues traversing from previous node
[Figure: list 1 → 2 * → 4, where node 2 is marked for deletion]
Overlapping operations on shared data
• Example: Insert operation
  - which of 2 or 3 gets inserted?
• Solution: Compare-And-Swap atomic primitive:
CAS(p:pointer to word, old:word, new:word):boolean
    atomic do
        if *p = old then
            *p := new;
            return true;
        else return false;
[Figure: Insert 2 and Insert 3 both try to link a new node between nodes 1 and 4]
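The CAS pseudocode maps directly onto C11 atomics. In the sketch below, cas_word is a hypothetical wrapper name; atomic_compare_exchange_strong is the standard <stdatomic.h> operation it relies on.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* C11 rendering of the CAS pseudocode: atomically replace *p with
 * new_value only if *p still equals old. Returns true if the swap
 * happened, false otherwise. */
bool cas_word(_Atomic uintptr_t *p, uintptr_t old, uintptr_t new_value)
{
    /* The standard call writes the observed value back into its second
     * argument on failure; here that is a local copy and is discarded. */
    return atomic_compare_exchange_strong(p, &old, new_value);
}
```

Applied to the example: if both inserters read the link of node 1 while it still points to node 4, the first CAS (4 to 2) succeeds and the second (4 to 3) fails, so the second inserter notices the interference and must retry against the updated list.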
Experiments
• 1-30 threads on platforms with different levels of real concurrency
• 10000 Insert vs. DeleteMin operations by each thread. 100 vs. 1000 initial inserts
• Compare with other implementations:
  - Lotan and Shavit, 2000
  - Hunt et al “An Efficient Algorithm for Concurrent Priority Queue Heaps”, 1996
[Figure: experimental results, Full Concurrency]
[Figure: experimental results, Medium Pre-emption]
[Figure: experimental results, High Pre-emption]
Lessons Learned
• The Non-Blocking Synchronization Paradigm can be suitable and beneficial to large scale parallel applications.
• Experimental Reproducible Work. Many results claimed by simulation are not consistent with what we observed.
• Applications gave us nice problems to look at and do theoretical work on. (IPDPS 2003 Algorithmic Best Paper Award)
• NOBLE helped programmers to trust our implementations.
Future Work
• Extend NOBLE for loosely coupled systems.
• Extend the set of data structures supported by NOBLE based on the needs of the applications.
• Reactive-Synchronisation
Questions?
• Contact Information:
  - Address: Philippas Tsigas, Computing Science, Chalmers University of Technology
  - Email: tsigas @ cs.chalmers.se
  - Web: http://www.cs.chalmers.se/~tsigas
         http://www.cs.chalmers.se/~dcs
         http://www.noble-library.org
Pointers:
• NOBLE: A Non-Blocking Inter-Process Communication Library. ACM Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers (LCR ´02).
• Evaluating The Performance of Non-Blocking Synchronization on Shared Memory Multiprocessors. ACM SIGMETRICS 2001/Performance2001 Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2001).
• Integrating Non-blocking Synchronization in Parallel Applications: Performance Advantages and Methodologies. ACM Workshop on Software and Performance (WOSP ´01).
• A Simple, Fast and Scalable Non-Blocking Concurrent FIFO queue for Shared Memory Multiprocessor Systems. ACM Symposium on Parallel Algorithms and Architectures (SPAA ´01).
• Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems. 17th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS ´03).
• Fast, Reactive and Lock-free Multi-word Compare-and-swap Algorithms. 12th IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques (PACT ´03).
• Scalable and Lock-free Concurrent Dictionaries. Proceedings of the 19th ACM Symposium on Applied Computing (SAC ’04).

Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Infrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platformsInfrared simulation and processing on Nvidia platforms
Infrared simulation and processing on Nvidia platforms
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
Transcript: New from BookNet Canada for 2024: BNC SalesData and LibraryData -...
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Landscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdfLandscape Catalogue 2024 Australia-1.pdf
Landscape Catalogue 2024 Australia-1.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 

Role of locking- cds

  • 1.
  • 2. Overview  Introduction  Synchronization  Non-blocking Synchronization  Is Non-blocking Synchronization performance-beneficial for Parallel Applications?  NOBLE: A Non-blocking Synchronization Interface. How can we make non-blocking synchronization accessible to the parallel programmer?  Lock-free Skip lists  Conclusions, Future Work
  • 3. Systems: SMP  Cache-coherent distributed shared memory multiprocessor systems:  UMA  NUMA
  • 4. Synchronization Barriers  Locks, semaphores,… (mutual exclusion)  “A significant part of the work performed by today’s parallel applications is spent on synchronization.” ...
  • 6. Non-blocking Synchronization  Lock-Free Synchronization  Optimistic approach • Assumes it’s alone and prepares operation which later takes place (unless interfered) in one atomic step, using hardware atomic primitives • Interference is detected via shared memory • Retries until not interfered by other operations • Can cause starvation
  • 7. Example: Shared Queue (slide provided by Jim Anderson). The usual approach is to implement operations using retry loops. Here’s an example: type Qtype = record v: valtype; next: pointer to Qtype end; shared var Tail: pointer to Qtype; local var old, new: pointer to Qtype; procedure Enqueue (input: valtype): new := (input, NIL); repeat old := Tail until CAS2(&Tail, &(old->next), old, NIL, new, new). [Figure: Tail, old and new pointers before and after the enqueue.]
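The retry-loop pattern can be sketched in C11 roughly as follows. This is a minimal illustration, not the slide's algorithm: it uses a single-word compare-and-swap from `<stdatomic.h>` rather than CAS2, and it pushes at the head of a list instead of enqueuing at the tail. The names `node_t`, `head` and `push` are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical node type, for illustration only. */
typedef struct node {
    int value;
    struct node *next;
} node_t;

/* Shared head pointer; NULL means the list is empty. */
static _Atomic(node_t *) head = NULL;

/* Lock-free push with a retry loop: take a snapshot of the shared
 * pointer, prepare the new node against that snapshot, then try to
 * publish it in one atomic step. Returns 0 on success, -1 if the
 * allocation fails. */
int push(int value)
{
    node_t *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;

    node_t *old = atomic_load(&head);
    do {
        n->next = old; /* prepare the operation against the snapshot */
        /* If another thread interfered, the CAS fails, `old` is
         * refreshed with the current head, and the loop retries. */
    } while (!atomic_compare_exchange_weak(&head, &old, n));
    return 0;
}
```

Calling `push(1)` and then `push(2)` leaves 2 at the head with 1 behind it. A thread that keeps losing the CAS can retry indefinitely, which is the starvation risk of lock-free designs.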
  • 8. Non-blocking Synchronization  Lock-Free Synchronization  Avoids problems that locks have  Fast  Starvation?  (not in the Context of HPC) Wait-Free Synchronization  Always finishes in a finite number of its own steps. • Complex algorithms • Memory consuming • Less efficient on average than lock-free
  • 9. Overview  Introduction  Synchronization  Non-blocking Synchronization  Is Non-blocking Synchronization performance-beneficial for Parallel Scientific Applications?  NOBLE: A Non-blocking Synchronization Interface. How can we make non-blocking synchronization accessible to the parallel programmer?  Conclusions, Future Work
  • 10. Non-blocking Synchronisation. Synchronisation: an alternative approach for synchronisation introduced 25 years ago; many theoretical results. Evaluation: micro-benchmarks show better performance than mutual exclusion in real or simulated multiprocessor systems.
  • 11. Practice. Non-blocking synchronization is still not used in practical applications. Non-blocking solutions are often: complex; burdened with non-standard or unclear interfaces; non-practical?
  • 12. Practice. Question: ”How is the performance of parallel scientific applications affected by the use of non-blocking synchronisation rather than lock-based synchronisation?”
  • 13. Answers. How is the performance of parallel scientific applications affected by the use of non-blocking synchronisation rather than lock-based synchronisation? The identification of the basic locking operations that parallel programmers use in their applications. The efficient non-blocking implementation of these synchronisation operations. The architectural implications on the design of non-blocking synchronisation. Comparison of the lock-based and lock-free versions of the respective applications.
  • 14. Applications. Ocean: simulates eddy currents in an ocean basin. Radiosity: computes the equilibrium distribution of light in a scene using the radiosity method. Volrend: renders 3D volume data into an image using a ray-casting method. Water: evaluates forces and potentials that occur over time between water molecules. Spark98: a collection of sparse matrix kernels. Each kernel performs a sequence of sparse matrix vector product operations using matrices that are derived from a family of three-dimensional finite element earthquake applications.
  • 15. Removing Locks in Applications. Many locks are “Simple Locks”. Many critical sections contain shared floating-point variables. Large critical sections. CAS, FAA and LL/SC can be used to implement non-blocking versions. Floating-point synchronization primitives are needed. A Double-Fetch-and-Add primitive was designed. Efficient non-blocking implementations of big ADTs are used.
  • 17. SPARK98 Before: spark_setlock(lockid); w[col][0] += A[Anext][0][0]*v[i][0] + A[Anext][1][0]*v[i][1] + A[Anext][2][0]*v[i][2]; w[col][1] += A[Anext][0][1]*v[i][0] + A[Anext][1][1]*v[i][1] + A[Anext][2][1]*v[i][2]; w[col][2] += A[Anext][0][2]*v[i][0] + A[Anext][1][2]*v[i][1] + A[Anext][2][2]*v[i][2]; spark_unsetlock(lockid); After: dfad(&w[col][0], A[Anext][0][0]*v[i][0] + A[Anext][1][0]*v[i][1] + A[Anext][2][0]*v[i][2]); dfad(&w[col][1], A[Anext][0][1]*v[i][0] + A[Anext][1][1]*v[i][1] + A[Anext][2][1]*v[i][2]); dfad(&w[col][2], A[Anext][0][2]*v[i][0] + A[Anext][1][2]*v[i][1] + A[Anext][2][2]*v[i][2]);
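One way a fetch-and-add on a double can be built from a 64-bit CAS is sketched below. This is an assumption-laden illustration, not the `dfad` used in the SPARK98 code: the slides do not show its implementation. The names `atomic_double_bits`, `to_bits`, `from_bits` and `dfad_sketch` are hypothetical, and the sketch assumes a 64-bit `double` stored as its bit pattern in an atomic 64-bit integer.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cell type: a double kept as its 64-bit pattern so that
 * a 64-bit compare-and-swap can update it atomically. */
typedef _Atomic uint64_t atomic_double_bits;

/* Reinterpret a double as raw bits and back, without aliasing issues. */
static uint64_t to_bits(double d)
{
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    return b;
}

static double from_bits(uint64_t b)
{
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

/* Atomically add `delta` to the cell and return the previous value.
 * The addition is recomputed on every retry, because a failed CAS
 * refreshes `old` with the value another thread just wrote. */
double dfad_sketch(atomic_double_bits *cell, double delta)
{
    uint64_t old = atomic_load(cell);
    uint64_t desired;
    do {
        desired = to_bits(from_bits(old) + delta);
    } while (!atomic_compare_exchange_weak(cell, &old, desired));
    return from_bits(old);
}
```

With a primitive of this shape, each `w[col][k] += ...` update becomes one atomic operation, so the three updates no longer need a shared lock. They are also no longer atomic as a group, which is acceptable only when each accumulation is independent.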
  • 18. Overview  Introduction  Synchronization  Non-blocking Synchronization  Is Non-blocking Synchronization beneficial for Parallel Scientific Applications?  NOBLE: A Non-blocking Synchronization Interface. How can we make non-blocking synchronization accessible to the parallel programmer?  Conclusions, Future Work
  • 19. Practice. Non-blocking synchronization is still not used in practical applications. Non-blocking solutions are often: complex; burdened with non-standard or unclear interfaces; non-practical?
  • 20. NOBLE: Brings Non-blocking closer to Practice  Create a non-blocking inter-process communication interface with the properties:  Attractive functionality  Programmer friendly  Easy to adapt existing solutions  Efficient  Portable  Adaptable for different programming languages
  • 21. NOBLE Design: Portable. Exported definitions, identical for all platforms: Noble.h (#define NBL... #define NBL... #define NBL...). Platform independent: QueueLF.c, StackLF.c, ... (each with #include “Platform/Primitives.h” …). Platform dependent: SunHardware.asm, IntelHardware.asm, ... (CAS, TAS, Spin-Locks, ...).
  • 22. Using NOBLE. First create a global variable handling the shared data object, for example a stack. Globals: #include <noble.h> ... NBLStack* stack; Create the stack with the appropriate implementation. Main: stack=NBLStackCreateLF(10000); ... When some thread wants to do some operation. Threads: NBLStackPush(stack, item); or item=NBLStackPop(stack);
  • 23. Using NOBLE. Globals: #include <noble.h> ... NBLStack* stack; Main: stack=NBLStackCreateLF(10000); ... When the data structure is not in use anymore: NBLStackFree(stack);
  • 24. Using NOBLE. To change the synchronization mechanism, only one line of code has to be changed! Globals: #include <noble.h> ... NBLStack* stack; Main: stack=NBLStackCreateLB(); ... NBLStackFree(stack); Threads: NBLStackPush(stack, item); or item=NBLStackPop(stack);
  • 25. Design: Attractive functionality  Data structures for multi-threaded usage  FIFO Queues  Priority Queues  Dictionaries  Stacks  Singly linked lists  Snapshots  MWCAS  ...  Clear specifications
  • 26. Status  Multiprocessor support  Sun Solaris (Sparc)  Win32 (Intel x86)  SGI (Mips)  Linux (Intel x86) Available for academic use: http://www.noble-library.org/
  • 27. Did our Work have any Impact? 1) Industry has initiated contacts and uses a test version of NOBLE. 2) Freeware developers have shown interest. 3) Interest from research organisations. NOBLE is freely available for research and educational purposes.
  • 28. A Lock-Free Skip list  Presented as part of the: H. Sundell, Ph. Tsigas Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems. 17th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS ´03), May 2003 (TR 2002). Best Paper Award A very similar lock-free skip list algorithm will be presented this August at the ACM Symposium on Principles of Distributed Computing (PODC 2004): ”Lock-Free Linked Lists and Skip Lists” Mikhail Fomitchev, Eric Ruppert
  • 29. Randomized Algorithm: Skip Lists. William Pugh: ”Skip Lists: A Probabilistic Alternative to Balanced Trees”, 1990. Layers of ordered lists with different densities, achieves a tree-like behavior. Time complexity: O(log2 N), probabilistic! [Figure: skip list from Head to Tail over keys 1 to 7; layer densities 50%, 25%, …]
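The layer densities of 50%, 25%, … come from choosing each node's height with repeated coin flips. A minimal sketch follows, assuming a promotion probability of one half and a hypothetical cap `MAX_LEVEL`. It uses `rand()` for brevity, which is not thread-safe and is only a stand-in for a per-thread generator.

```c
#include <stdlib.h>

/* Hypothetical cap on the node height; a real skip list would size
 * this from the expected number of elements. */
#define MAX_LEVEL 8

/* Pick a node height: every node is in level 1, and each further
 * level is reached with probability about 1/2. So about 50% of the
 * nodes reach level 2, about 25% reach level 3, and so on. */
int random_level(void)
{
    int level = 1;
    /* Compare against the midpoint so the decision uses the high
     * bits of rand(), which are better distributed than the low
     * bit on some C libraries. */
    while (level < MAX_LEVEL && rand() > RAND_MAX / 2)
        level++;
    return level;
}
```

Because the heights are geometrically distributed, a search skips about half of the remaining nodes per level on average, which gives the expected logarithmic search cost.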
  • 30. Our Lock-Free Concurrent Skip List. Define node state to depend on the insertion status at lowest level as well as a deletion flag. Insert from lowest level going upwards. Set deletion flag. Delete from highest level going downwards. [Figure: nodes 1 to 7, each with a deletion flag D; a node p spanning levels 1 to 3.]
  • 31. Concurrent Insert vs. Delete operations. Problem: a Delete concurrent with an Insert - both nodes are deleted! Solution (Harris et al): Use bit 0 of pointer to mark deletion status. [Figure: list 1, 2, 3, 4 with steps a), b), c); the deleted node is marked *.]
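Marking bit 0 of a pointer can be sketched in C as below. This assumes nodes are at least 2-byte aligned, so bit 0 of a valid node address is always zero, and that pointers survive a round trip through `uintptr_t`, which holds on common platforms. The names `mark`, `unmark`, `is_marked` and `logical_delete` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct node node_t;
struct node {
    int key;
    _Atomic(node_t *) next; /* bit 0 set means "this node is deleted" */
};

/* Helpers that pack the deletion mark into bit 0 of a pointer. */
static inline node_t *mark(node_t *p)
{
    return (node_t *)((uintptr_t)p | (uintptr_t)1);
}

static inline node_t *unmark(node_t *p)
{
    return (node_t *)((uintptr_t)p & ~(uintptr_t)1);
}

static inline bool is_marked(node_t *p)
{
    return ((uintptr_t)p & (uintptr_t)1) != 0;
}

/* Logically delete `n` by marking its own next pointer. Returns true
 * if this call set the mark, false if the node was already marked. */
bool logical_delete(node_t *n)
{
    node_t *succ = atomic_load(&n->next);
    while (!is_marked(succ)) {
        if (atomic_compare_exchange_weak(&n->next, &succ, mark(succ)))
            return true;
        /* CAS failed: `succ` now holds the current value; re-check. */
    }
    return false;
}
```

Once a node's next pointer is marked, an insert that tries to CAS that pointer from the unmarked successor to a new node fails, so a new node can no longer be attached behind a node that is being deleted.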
  • 32. Dynamic Memory Management. Problem: System memory allocation functionality is blocking! Solution (lock-free), IBM freelists: Pre-allocate a number of nodes, link them into a dynamic stack structure, and allocate/reclaim using CAS. [Figure: Head pointing to free nodes Mem 1, Mem 2, …, Mem n; Allocate and Reclaim operate at the head; Used 1.] © Ph. Tsigas 2003-2004
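A pre-allocated free list with CAS-based allocate and reclaim might look like the sketch below. Two things are assumptions rather than anything taken from the slides. Slots are addressed by index instead of by pointer. The head also carries a version tag that is bumped on every update, a swapped-in guard against slot reuse between the read and the CAS, and not a reference-counting scheme. All names (`pool`, `pool_alloc`, `pool_free`, `POOL_SIZE`, `NIL`) are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

#define POOL_SIZE 4u       /* hypothetical number of pre-allocated slots */
#define NIL UINT32_MAX     /* "no slot" marker */

typedef struct {
    int payload;
    _Atomic uint32_t next_free; /* index of the next free slot, or NIL */
} slot_t;

static slot_t pool[POOL_SIZE];

/* Head of the free stack: upper 32 bits hold a version tag, lower
 * 32 bits hold the index of the first free slot. */
static _Atomic uint64_t free_head;

static uint64_t pack(uint32_t tag, uint32_t idx)
{
    return ((uint64_t)tag << 32) | idx;
}

/* Link all slots into the free stack. Call once before any thread
 * allocates. */
void pool_init(void)
{
    for (uint32_t i = 0; i < POOL_SIZE; i++)
        atomic_store(&pool[i].next_free, (i + 1 < POOL_SIZE) ? i + 1 : NIL);
    atomic_store(&free_head, pack(0, 0));
}

/* Pop a slot index from the free stack; returns NIL when exhausted. */
uint32_t pool_alloc(void)
{
    uint64_t old = atomic_load(&free_head);
    for (;;) {
        uint32_t idx = (uint32_t)old;
        if (idx == NIL)
            return NIL;
        uint32_t tag = (uint32_t)(old >> 32);
        uint64_t desired = pack(tag + 1, atomic_load(&pool[idx].next_free));
        if (atomic_compare_exchange_weak(&free_head, &old, desired))
            return idx;
        /* CAS failed: `old` was refreshed, so retry with the new head. */
    }
}

/* Push a slot index back onto the free stack. */
void pool_free(uint32_t idx)
{
    uint64_t old = atomic_load(&free_head);
    for (;;) {
        atomic_store(&pool[idx].next_free, (uint32_t)old);
        uint64_t desired = pack((uint32_t)(old >> 32) + 1, idx);
        if (atomic_compare_exchange_weak(&free_head, &old, desired))
            return;
    }
}
```

After `pool_init()`, allocation hands out slots 0 and 1 in order, and a freed slot is the next one to be reused, since the structure is a stack.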
  • 33. The ABA problem. Problem: Because of concurrency (pre-emption in particular), the same pointer value does not always mean the same node, yet the CAS still succeeds! [Figure: Step 1 and Step 2 of a list whose node is reclaimed and reused between the two steps.]
  • 34. The ABA problem. Solution (Valois et al): Add reference counting to each node, in order to prevent nodes that are of interest to some thread from being reclaimed until all threads have left the node. [Figure: New Step 2, in which the CAS fails.]
  • 35. Helping Scheme. Threads need to traverse safely. Need to remove marked-to-be-deleted nodes while traversing: Help! Finds the previous node, finishes the deletion and continues traversing from the previous node. [Figure: list 1, 2*, 4, where node 2 is marked for deletion.]
  • 36. Overlapping operations on shared data. Example: Insert operation - which of 2 or 3 gets inserted? Solution: Compare-And-Swap atomic primitive: CAS(p:pointer to word, old:word, new:word):boolean atomic do if *p = old then *p := new; return true; else return false; [Figure: Insert 2 and Insert 3 both applied between nodes 1 and 4.]
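The CAS pseudocode maps directly onto `atomic_compare_exchange_strong` from C11 `<stdatomic.h>`. The sketch below shows how it decides between two overlapping inserts. The node layout and the helper `try_insert_after` are hypothetical, and deletion marks are ignored here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
    int key;
    _Atomic(struct node *) next;
} node_t;

/* Try to link `n` between `prev` and `succ`. The CAS succeeds only if
 * prev->next still equals `succ`, i.e. nobody else changed the link
 * since the caller read it. On failure nothing is modified in the
 * list and the caller must search again and retry. */
bool try_insert_after(node_t *prev, node_t *succ, node_t *n)
{
    atomic_store(&n->next, succ);
    node_t *expected = succ;
    return atomic_compare_exchange_strong(&prev->next, &expected, n);
}
```

If two threads both read the link 1 → 4 and then attempt to insert 2 and 3, exactly one CAS on node 1 succeeds. The loser sees a failed CAS, finds the new link, and retries at the correct position, so neither insert is lost.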
  • 37. Experiments. 1-30 threads on platforms with different levels of real concurrency. 10000 Insert vs. DeleteMin operations by each thread. 100 vs. 1000 initial inserts. Compare with other implementations: Lotan and Shavit, 2000; Hunt et al “An Efficient Algorithm for Concurrent Priority Queue Heaps”, 1996.
  • 38. Full Concurrency
  • 39. Medium Pre-emption
  • 40. High Pre-emption
  • 41. Lessons Learned. The Non-Blocking Synchronization Paradigm can be suitable and beneficial to large scale parallel applications. Experimental Reproducible Work. Many results claimed by simulation are not consistent with what we observed. Applications gave us nice problems to look at and do theoretical work on. (IPDPS 2003 Algorithmic Best Paper Award) NOBLE helped programmers to trust our implementations.
  • 42. Future Work. Extend NOBLE for loosely coupled systems. Extend the set of data structures supported by NOBLE based on the needs of the applications. Reactive-Synchronisation.
  • 43. Questions? Contact Information: Address: Philippas Tsigas, Computing Science, Chalmers University of Technology. Email: tsigas @ cs.chalmers.se. Web: http://www.cs.chalmers.se/~tsigas http://www.cs.chalmers.se/~dcs http://www.noble-library.org
  • 44. Pointers: NOBLE: A Non-Blocking Inter-Process Communication Library. ACM Workshop on Languages, Compilers, and Run-time Systems for Scalable Computers (LCR ´02); Evaluating The Performance of Non-Blocking Synchronization on Shared Memory Multiprocessors. ACM SIGMETRICS 2001/Performance2001 Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2001); Integrating Non-blocking Synchronization in Parallel Applications: Performance Advantages and Methodologies. ACM Workshop on Software and Performance (WOSP ´01); A Simple, Fast and Scalable Non-Blocking Concurrent FIFO queue for Shared Memory Multiprocessor Systems. ACM Symposium on Parallel Algorithms and Architectures (SPAA ´01); Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems. 17th IEEE/ACM International Parallel and Distributed Processing Symposium (IPDPS ´03); Fast, Reactive and Lock-free Multi-word Compare-and-swap Algorithms. 12th IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques (PACT ´03); Scalable and Lock-free Concurrent Dictionaries. Proceedings of the 19th ACM Symposium on Applied Computing (SAC ’04).