14. Ignoring cores…
2 cores won't hurt you
4 cores will hurt a little
8 cores will hurt a bit
16 will start hurting
32 cores will hurt a lot (2009)
...
1 M cores ouch (2019)
15. Java is so Java
Java Memory Model
wait - notify
Thread.yield()
Thread.sleep()
synchronized
Executor Framework -> NIO (2)
22. Issues
• Locks
– manually lock and unlock
– lock ordering is a big problem
– locks are not compositional
• How do we decide what is concurrent?
• Need to pre-design, but now we have to
retrofit concurrency via new requirements
24. A good solution
• Substantially less error prone
• Makes it much easier to identify concurrency
• Runs on today’s (and future) parallel hardware
– Works if you keep adding cores/threads
27. Transactional memory
analogous to database transactions
Hardware vs. software implementations
Idea goes as far back as 1986
First appearance in a programming language:
Concurrent Haskell 2005
29. STM Design Space
STM Algorithms / Strategies
Granularity
word vs block
Locks vs Optimistic concurrency
Conflict detection
eager vs lazy
Contention management
30. STM Problems
• Non abortable operations
– I/O
• STM Overhead
– read/write barrier elimination
• Where to place transaction boundaries?
• Still need condition variables
– ordering problems are important
33. Available Data Structures
Lists, Vectors, Maps
hash list based on Vlists
VDList - deques based on Vlists
red-black trees
Real Time Queues and Deques
deques, output-restricted deques
binary random access lists
binomial heaps
skew binary random access lists
skew binomial heaps
catenable lists
heaps with efficient merging
catenable deques
35. Actors
Invented by Carl Hewitt at MIT (1973)
Formal Model
Programming languages
Hardware
Led to continuations, Scheme
Recently revived by Erlang
Erlang’s model is not derived explicitly from
Actors
37. Example
object account extends Actor
{
private var balance = 0
def act() {
loop {
react {
case Withdraw(amount) =>
balance -= amount
sender ! Balance(balance)
case Deposit(amount) =>
balance += amount
sender ! Balance(balance)
case BalanceRequest =>
sender ! Balance(balance)
case TerminateRequest =>
}
}
}
38. Problems
DOS of the actor mail queue
Multiple actor coordination
reinvent transactions?
Actors can still deadlock and starve
Programmer defines granularity
by choosing what is an actor
41. Erlang vs JVM
Erlang
per process GC heap
tail call
distributed
JVM
per JVM heap
tail call (fixed in JSR-292?, at least in scalac)
not distributed
few kinds of actors (Scala)
42. Actor Variants
Clojure Agents
Designed for loosely coupled stuff
Code/actions sent to agents
Code is queued when it hits the agent
Agent framework guarantees serialization
State of agent is always available for read (unlike
actors which could be busy processing when you send a
read message)
not in favor of transparent distribution
Clojure agents can operate in an ‘open world’ - actors
answer a specific set of messages
43. Dataflow
Dataflow is a software architecture based on the idea
that changing the value of a variable should automatically
force recalculation of the values of variables which
depend on its value
Bill Ackerman’s PhD Thesis at MIT (1984)
Declarative Concurrency in functional languages
Research in the 1980’s and 90’s
Inherent concurrency
Turns out to be very difficult to implement
Interest in declarative concurrency is slowly returning
44. The Model
Dataflow Variables
create variable
bind value
read value or block
Threads
Dataflow Streams
List whose tail is an unbound dataflow variable
Deterministic computation!
45. Example: Variables 1
object Test5 extends App {
val x, y, z = new DataFlowVariable[Int]
val main = thread {
x << 1
if (x() > y()) { z << x } else {z << y }
}
val setY = thread {
Thread.sleep(5000)
y << 2
}
main ! 'exit
setY ! 'exit
}
46. Example: Streams (Oz)
fun {Ints N Max}
if N == Max then nil
else
{Delay 1000}
N|{Ints N+1 Max}
end
end
fun {Sum S Stream}
case Stream of nil then S
[] H|T then S|{Sum H+S T} end
end
local X Y in
thread X = {Ints 0 1000} end
thread Y = {Sum 0 X} end
{Browse Y}
end
47. Implementations
Mozart Oz
http://www.mozart-oz.org/
Akka
http://github.com/jboner/scala-dataflow
dataflow variables and streams
Ruby library
http://github.com/larrytheliquid/dataflow
dataflow variables and streams
Groovy
http://code.google.com/p/gparallelizer/
51. The Model
Space uncoupling
Time uncoupling
Readers are decoupled from Writers
Content addressable by pattern matching
Can emulate
Actor like continuations
CSP
Message Passing
Semaphores
52. Example
public class Account implements Entry {
public Integer accountNo;
public Integer value;
public Account() { ... }
public Account(int accountNo, int value {
this.accountNo = newInteger(accountNo);
this.value = newInteger(value);
}
}
try {
Account newAccount = new Account(accountNo, value);
space.write(newAccount, null, Lease.FOREVER);
}
space.read(accountNo);
53. Implementations
Jini/JavaSpaces
http://incubator.apache.org/river/RIVER/inde
x.html
BlitzSpaces
http://www.dancres.org/blitz/blitz_js.html
PyLinda
http://code.google.com/p/pylinda/
Rinda
built in to Ruby
54. Problems
Low level
High latency to the space - the space is
contention point / hot spot
Scalability
More for distribution than concurrency
and sometimes called the instruction address register (IAR)[1] or just part of the instruction sequencer,[2] is a processor register that indicates where a computer is in its programsequence.
For the concept, see Mutually exclusive events."mutex" redirects here. For the computer program object that negotiates mutual exclusion among threads, see lock (computer science).Coarse-grained systems consist of fewer, larger components than fine-grained systems; a coarse-grained description of a system regards large subcomponents while a fine-grained description regards smaller components of which the larger ones are composed.
- the process calculi (or process algebras) are a diverse family of related approaches for formally modellingconcurrent systems.Communicating Sequential Processes (limbo and Go) - process calculusThe approach taken in developing CSP into a process algebra was influenced by Robin Milner's work on theCalculus of Communicating Systems (CCS), and vice versa. CCS is useful for evaluating the qualitative correctness of properties of a system such as deadlock or livelockA Petri net (also known as a place/transition net or P/T net) is one of several mathematical modeling languages for the description of distributed systemsPi - as a continuation of work on the process calculus CCS (Calculus of Communicating Systems). The π-calculus allows channel names to be communicated along the channels themselves, and in this way it is able to describe concurrent computations whose network configuration may change during the computation.The join-calculus is a member of the -calculus family of process calculi, and can be considered, at its core, an asynchronous -calculus with several strong restrictions[3]:Scope restriction, reception, and replicated reception are syntactically merged into a single construct, the definition;Communication occurs only on defined names;For every defined name there is exactly one replicated reception.
Transactional memory attempts to simplify parallel programming by allowing a group of load and store instructions to execute in an atomic way. It is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing.A dataflow network is a network of concurrently executing processes or automata that can communicate by sending data over channels (seemessage passing.)A tuple space is an implementation of the associative memory paradigm for parallel/distributed computing. It provides a repository of tuples that can be accessed concurrently. As an illustrative example, consider that there are a group of processors that produce pieces of data and a group of processors that use the data. Producers post their data as tuples in the space, and the consumers then retrieve data from the space that match a certain pattern. This is also known as the blackboard metaphor. Tuple space may be thought as a form of distributed shared memory.Tuple spaces were the theoretical underpinning of the Linda language developed by David Gelernter and Nicholas Carriero at Yale University.Implementations of tuple spaces have also been developed for Java (JavaSpaces), Lisp, Lua, Prolog, Python, Ruby, Smalltalk, Tcl, and the.NET framework.
. A contention management process is invoked when a conflict occurs between a first transaction and a second transaction. The pre-determined commit order is used in the contention management process to aid in determining whether the first transaction or the second transaction should win the conflict and be allowed to proceed."
Dataflow concurrency is deterministic. This means that it will always behave the same. If you run it once and it yields output 5 then it will do that every time, run it 10 million times, same result. If it on the other hand deadlocks the first time you run it, then it will deadlock every single time you run it. Also, there is no difference between sequential code and concurrent code. These properties makes it very easy to reason about concurrency. The limitation is that the code needs to be side-effect free, e.g. deterministic. You can’t use exceptions, time, random etc., but need to treat the part of your program that uses dataflow concurrency as a pure function with input and output.The best way to learn how to program with dataflow variables is to read the fantastic book Concepts, Techniques, and Models of Computer Programming. By Peter Van Roy and Seif Haridi.