Are streams just collections? What's the difference between Java 8 streams and Reactive Streams? How do I implement Reactive Streams with Akka? Pub/sub, dynamic push/pull, non-blocking, non-dropping: these are some of the other concepts covered. We'll also discuss how to leverage streams in a real-world application.
4. What's an array?
• A series of elements arranged in memory
• Has a beginning and an end
5. What's a stream?
• A series of elements emitted over time
• Live data (e.g., events) or data at rest (e.g., partitions of a file)
• May not have a beginning or an end
6. Appeal of stream processing?
• Scaling business logic
• Processing real-time data (fast data)
• Batch processing of large data sets (big data)
• Monitoring, analytics, complex event processing, etc.
7. Challenges?
• Ephemeral
• Unbounded in size
• Potential "flooding" downstream
• Unfamiliar programming paradigm
You cannot step twice into the same
stream. For as you are stepping in, other
waters are ever flowing on to you.
— Heraclitus
8. Exploring two challenges of
stream processing
• An Rx-based approach for passing data across an
asynchronous boundary
• An approach for implementing back pressure
14. Flow control
• We need a way for a subscriber to signal when it is able to
process more data
• Dynamic push/pull: effectively push-based while demand is
outstanding, pull-based when the subscriber falls behind
A lack of back pressure will eventually lead to an Out of Memory
Exception (OOME), which is the worst possible outcome. Then
you lose not just the work that overloaded the system, but
everything, even the stuff that you were safely working on.
— Jim Powers, Typesafe
21. Why Reactive Streams?
• Reactive Streams is a specification and low-level API for
library developers.
• Started as an initiative in late 2013 between engineers at
Netflix, Pivotal, and Typesafe
• Streaming was complex!
• Play had “iteratees”, Akka had Akka IO
22. What is Reactive Streams?
• TCK (Technology Compatibility Kit)
• API (JVM, JavaScript)
• Specifications for library developers
• Early conversation on future spec for IO
23. 1. Flow control via back pressure
• Fast publisher responsibilities
1. Not generate elements if it can control their production rate
2. Buffer elements in a bounded manner until more
demand is signalled
3. Drop elements until more demand is signalled
4. Tear down the stream if unable to apply any of the above
strategies
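The buffering and dropping strategies above (2 and 3) can be sketched with a plain bounded buffer — a hypothetical `BoundedBuffer` class, not part of any Reactive Streams library — that drops the oldest element when full rather than growing without bound:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of fast-publisher strategies 2 and 3: buffer in a bounded manner,
// and drop (here: the oldest element) when the buffer is full, until the
// subscriber signals more demand and drains via poll().
class BoundedBuffer<T> {
    private final Deque<T> buffer = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    // Publisher side: called when producing faster than demand is signalled.
    void offer(T element) {
        if (buffer.size() == capacity) {
            buffer.removeFirst();   // strategy 3: drop the oldest element
        }
        buffer.addLast(element);    // strategy 2: bounded buffering
    }

    // Subscriber side: drained as demand is signalled; null when empty.
    T poll() { return buffer.pollFirst(); }

    int size() { return buffer.size(); }
}
```

Whether to drop oldest, drop newest, or fail is a policy choice; the non-negotiable part is that memory use stays bounded.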
24. 2. An Rx-based approach to asynchrony
public interface Processor<T, R> extends Subscriber<T>, Publisher<R> {}

public interface Publisher<T> {
  public void subscribe(Subscriber<? super T> s);
}

public interface Subscriber<T> {
  public void onSubscribe(Subscription s);
  public void onNext(T t);
  public void onError(Throwable t);
  public void onComplete();
}

public interface Subscription {
  public void request(long n);
  public void cancel();
}
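A minimal, synchronous sketch of these interfaces in action. The interfaces are inlined so the example compiles without the org.reactivestreams jar, and `RangePublisher` is a hypothetical name; a spec-compliant publisher must additionally handle asynchronous signalling, reentrancy, and the full rule set verified by the TCK.

```java
// Inlined copies of the Reactive Streams interfaces shown above,
// so this sketch is self-contained.
interface Subscription { void request(long n); void cancel(); }

interface Subscriber<T> {
    void onSubscribe(Subscription s);
    void onNext(T t);
    void onError(Throwable t);
    void onComplete();
}

interface Publisher<T> { void subscribe(Subscriber<? super T> s); }

// Hypothetical publisher: emits the integers [0, count), but only as fast
// as demand is signalled via request(n).
class RangePublisher implements Publisher<Integer> {
    private final int count;
    RangePublisher(int count) { this.count = count; }

    public void subscribe(Subscriber<? super Integer> sub) {
        sub.onSubscribe(new Subscription() {
            private int next = 0;
            private boolean done = false;

            public void request(long n) {
                while (n-- > 0 && next < count && !done) {
                    sub.onNext(next++);   // push only what was requested
                }
                if (next == count && !done) {
                    done = true;
                    sub.onComplete();     // no further elements will be signalled
                }
            }

            public void cancel() { done = true; }
        });
    }
}
```

Note that the publisher never emits more than the subscriber asked for: the demand in `request(n)` is what makes the flow non-blocking without being unbounded.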
26. Three main repositories
• Reactive Streams for the JVM
• Reactive Streams for JavaScript
• Reactive Streams IO (for network protocols such as TCP,
WebSockets and possibly HTTP/2)
• Early exploration kicked off by Netflix
• 2016 timeframe
27. Reactive Streams
Visit the Reactive Streams website for more information.
http://www.reactive-streams.org/
29. Akka Streams
Akka Streams provides a way to express and run a chain of
asynchronous processing steps acting on a sequence of
elements.
• DSL for async/non-blocking stream processing
• Default back pressure
• Conforms to the Reactive Streams spec for interop
31. • Source - A processing stage with exactly one output
• Sink - A processing stage with exactly one input
• Flow - A processing stage which has exactly one input and
output
• RunnableFlow - A Flow that has both ends "attached" to a
Source and Sink
38. Materialization
• Separate the what from the how
• Declarative Source/Flow/Sink to
create a blueprint
• FlowMaterializer turns blueprint
into actors
• Involves an extra step, but no magic
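The blueprint idea can be illustrated without Akka at all. In this hypothetical sketch, composing stages only builds a description of the pipeline; nothing executes until `run` plays the role of materialization:

```java
import java.util.function.Function;

// Sketch of "separate the what from the how" (not the Akka API):
// the pipeline description is a plain value, inert until materialized.
class Blueprint<A, B> {
    private final Function<A, B> stage;
    Blueprint(Function<A, B> stage) { this.stage = stage; }

    // Composing stages only builds a bigger description; nothing runs yet.
    <C> Blueprint<A, C> via(Function<B, C> next) {
        return new Blueprint<>(stage.andThen(next));
    }

    // "Materialization": only here is the description turned into execution.
    B run(A input) { return stage.apply(input); }
}
```

Akka's FlowMaterializer does the same conceptually, except that materialization allocates actors to run each stage asynchronously.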
39. Error handling
• The element causing division by zero will be dropped
• Result will be a Future completed with Success(228)
val decider: Supervision.Decider = exc => exc match {
  case _: ArithmeticException => Supervision.Resume
  case _                      => Supervision.Stop
}

// ActorFlowMaterializer takes the list of transformations comprising an akka.stream.scaladsl.Flow
// and materializes them in the form of org.reactivestreams.Processor
implicit val mat = ActorFlowMaterializer(
  ActorFlowMaterializerSettings(system).withSupervisionStrategy(decider))

val source = Source(0 to 5).map(100 / _)
val result = source.runWith(Sink.fold(0)(_ + _))
40. Dynamic push/pull backpressure
• Fast subscriber can issue more Request(n) even before more
data arrives
• Publisher can accumulate demand
• Conforming to "fast publisher" responsibilities
• Total demand of elements is safe to publish
• Subscriber's buffer will never overflow
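Demand accumulation can be sketched with a single atomic counter — a hypothetical `Demand` class; Akka's actual implementation differs — where several `request(n)` calls add up and the publisher decrements per element emitted:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: publisher-side demand accounting. Multiple request(n) calls from
// a fast subscriber accumulate; the publisher may emit at most demand.get()
// elements, so the subscriber's buffer can never overflow.
class Demand {
    private final AtomicLong demand = new AtomicLong();

    // Subscriber signals it can handle n more elements (possibly before
    // previous elements have even arrived).
    void request(long n) { demand.addAndGet(n); }

    // Publisher claims permission to emit one element; false means it
    // must wait for more demand.
    boolean tryEmit() {
        long d;
        do {
            d = demand.get();
            if (d == 0) return false;
        } while (!demand.compareAndSet(d, d - 1));
        return true;
    }
}
```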
42. Fan out
• Broadcast[T] (1 input, n outputs)
• Signals each output given an input signal
• Balance[T] (1 input => n outputs)
• Signals one of its output ports given an input signal
• FlexiRoute[In] (1 input, n outputs)
• Write custom fan out elements using a simple DSL
43. Fan in
• Merge[In] (n inputs, 1 output)
• Picks signals randomly from inputs
• Zip[A,B,Out] (2 inputs, 1 output)
• Zipping into an (A,B) tuple stream
• Concat[T] (2 inputs, 1 output)
• Concatenate streams (first, then second)
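The Zip and Concat semantics can be illustrated on plain collections — a hypothetical `FanIn` helper; the Akka stages apply the same rules element by element, asynchronously and with back pressure:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of fan-in semantics on in-memory lists (not the Akka API).
class FanIn {
    // Zip: pair up elements; completes when the shorter input completes.
    static <A, B> List<SimpleEntry<A, B>> zip(List<A> as, List<B> bs) {
        List<SimpleEntry<A, B>> out = new ArrayList<>();
        Iterator<A> ia = as.iterator();
        Iterator<B> ib = bs.iterator();
        while (ia.hasNext() && ib.hasNext()) {
            out.add(new SimpleEntry<>(ia.next(), ib.next()));
        }
        return out;
    }

    // Concat: drain the first input completely, then the second.
    static <T> List<T> concat(List<T> first, List<T> second) {
        List<T> out = new ArrayList<>(first);
        out.addAll(second);
        return out;
    }
}
```

Merge has no list analogue worth showing: its whole point is that the interleaving depends on which upstream has an element ready at runtime.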
44. Scala example
val g = FlowGraph.closed() { implicit builder: FlowGraph.Builder =>
  import FlowGraph.Implicits._

  val in = Source(1 to 10)
  val out = Sink.ignore

  val bcast = builder.add(Broadcast[Int](2))
  val merge = builder.add(Merge[Int](2))

  val f1, f2, f3, f4 = Flow[Int].map(_ + 10)

  in ~> f1 ~> bcast ~> f2 ~> merge ~> f3 ~> out
  bcast ~> f4 ~> merge
}
46. Advanced flow control
// return only the freshest element when the subscriber signals demand
val droppyStream: Flow[Message, Message] =
  Flow[Message].conflate(seed = identity)((lastMessage, newMessage) => newMessage)
• conflate can be thought of as a special fold operation that
collapses multiple upstream elements into one aggregate
element
• groupedWithin chunks the stream into groups of elements
received within a time window, or limited by a given number
of elements, whichever happens first
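The keep-the-freshest variant of conflate shown above behaves like a one-element cell that upstream overwrites. This is a hypothetical sketch; a general conflate applies a user-supplied aggregation function instead of simply overwriting:

```java
// Sketch of conflate(seed = identity)((last, next) => next): while the
// subscriber is busy, new offers overwrite the pending element instead of
// queueing, so a slow subscriber always sees only the latest value.
class ConflatedCell<T> {
    private T pending;   // at most one element is ever buffered

    // Fast upstream: overwrite whatever is pending.
    void offer(T element) { pending = element; }

    // Slow downstream: drain when ready; null if nothing arrived.
    T take() {
        T t = pending;
        pending = null;
        return t;
    }
}
```

This is why conflate never needs an unbounded buffer: memory use is constant regardless of how far the subscriber falls behind.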
47. Other sinks and sources - simple
streaming from/to Kafka
implicit val actorSystem = ActorSystem("ReactiveKafka")
implicit val materializer = ActorMaterializer()
val kafka = new ReactiveKafka(host = "localhost:9092", zooKeeperHost = "localhost:2181")
val publisher = kafka.consume("lowercaseStrings", "groupName", new StringDecoder())
val subscriber = kafka.publish("uppercaseStrings", "groupName", new StringEncoder())
// consume lowercase strings from kafka and publish them transformed to uppercase
Source(publisher).map(_.toUpperCase).to(Sink(subscriber)).run()
48. A quick comparison with Java 8
Streams
• Pull-based, synchronous sequences of values
• Iterators with a more parallelism-friendly interface
• Intermediate operations are lazy (e.g., filter, map)
• Terminal operations are eager (e.g., reduce)
• Only high-level control (no next/hasNext)
• Similar to Scala Collections
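The lazy-intermediate/eager-terminal distinction above can be checked directly: with a short-circuiting terminal operation such as findFirst, only as many elements flow through map as the result needs (`LazyDemo` is a hypothetical helper):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Demonstrates that intermediate operations (map) are lazy and that the
// terminal operation drives the pipeline: findFirst pulls one element, so
// the mapping function runs exactly once.
class LazyDemo {
    static int mappedCount(List<Integer> xs) {
        AtomicInteger calls = new AtomicInteger();
        xs.stream()
          .map(x -> { calls.incrementAndGet(); return x * 2; })
          .findFirst();              // short-circuiting terminal operation
        return calls.get();          // number of times map actually ran
    }
}
```

Java 8 streams are pull-based like this throughout, which is exactly the contrast with the push-based (demand-regulated) model of Reactive Streams.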
49. Java 8 Streams
String concatenatedString = listOfStrings
    .stream()
    .peek(s -> listOfStrings.add("three")) // don't do this! mutating the source mid-stream may throw ConcurrentModificationException
    .reduce((a, b) -> a + " " + b)
    .get();
50. Code review and demo
Part 4 of 4
Source code available at https://github.com/rocketpages