3. Birth of the platform
Legacy solution issues:
• Delays
• Resource utilization
• Storage for temp data
• Hard to scale
• Not fault tolerant
• Licenses
• Batch based
6. Birth of the platform
Storm
Mesos
Our Storm cluster became generic enough to be
offered as a service to other teams.
Just needed to address a few points:
• Simpler scaling – Storm-Mesos integration
• Resource isolation – cgroups
10. Birth of the platform
Storm
Mesos
Providing a stream processing platform as a service
Storm cluster infrastructure
• 600 CPU cores, 3TB RAM
• Scala common library with reusable components
• Monitoring/alerting/logging for topologies
• Normal load - 0.7M messages/s
12. Storm basics
• Tuple – a record/message/item; the unit of data
a stream consists of
• Spout – a source of a stream
• Bolt – a step in the processing chain
• Topology – a graph of connected bolts and
spouts describing the data flow
• Worker – one of many distributed JVM
processes that execute a topology
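These concepts can be sketched with a small toy model (NOT the real Storm API; all names below are illustrative):

```scala
// Toy model of Storm's vocabulary -- not the real Storm API.
case class Rec(values: List[Any])                 // a "tuple": one record in the stream

trait Spout { def nextTuple(): Option[Rec] }      // source of a stream
trait Bolt  { def execute(t: Rec): List[Rec] }    // one step in the processing chain

// A linear "topology": a spout feeding a chain of bolts.
case class Topology(spout: Spout, bolts: List[Bolt]) {
  def run(): List[Rec] = {
    // Drain the spout, then push each tuple through every bolt in order.
    def drain(acc: List[Rec]): List[Rec] = spout.nextTuple() match {
      case Some(t) => drain(acc :+ t)
      case None    => acc
    }
    drain(Nil).flatMap(t => bolts.foldLeft(List(t))((ts, b) => ts.flatMap(b.execute)))
  }
}

// Example wiring: a spout of numbers and a bolt that doubles them.
class NumberSpout(private var n: Int) extends Spout {
  def nextTuple(): Option[Rec] =
    if (n > 0) { n -= 1; Some(Rec(List(n))) } else None
}
object DoubleBolt extends Bolt {
  def execute(t: Rec): List[Rec] = List(Rec(t.values.map { case i: Int => i * 2 }))
}

val output = Topology(new NumberSpout(3), List(DoubleBolt)).run()
```

In real Storm the workers run spouts and bolts as distributed tasks; here everything is collapsed into one process only to show how the pieces relate.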
20. Storm basics – reliable processing
Spout types:
• Unreliable
• Reliable
Guarantees:
• At most once
• At least once
21. Storm basics – reliable processing
Bolts may emit tuples anchored to one or more input tuples.
Here tuple B is a descendant of tuple A
22. Storm basics – reliable processing
Multiple anchorings form a tuple tree.
23. Storm basics – reliable processing
Bolts can either
• “acknowledge” or
• “fail”
their input tuples.
24. Storm basics – reliable processing
Failing in any of the bolts in the tuple tree fails the original tuple(s).
The spout will then retry and re-emit them.
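A minimal sketch of this replay behavior, using a plain queue instead of Storm's real ack/anchor machinery:

```scala
import scala.collection.mutable

// Toy sketch of at-least-once delivery -- not Storm's real ack/anchor machinery.
// A failed tuple goes back to the spout's pending queue and is re-emitted.
def runWithReplay(ids: List[Int], failOnceOn: Int): List[Int] = {
  val pending   = mutable.Queue(ids: _*)
  val processed = mutable.ListBuffer[Int]()
  var alreadyFailed = false
  while (pending.nonEmpty) {
    val id = pending.dequeue()
    if (id == failOnceOn && !alreadyFailed) {
      alreadyFailed = true          // the bolt "fails" this tuple once...
      pending.enqueue(id)           // ...so the spout re-emits it later
    } else processed += id          // the bolt acks: tuple fully processed
  }
  processed.toList
}

// Tuple 2 fails once and is replayed after 3 -> List(1, 3, 2):
// at-least-once processing, with no ordering guarantee on retries.
val order = runWithReplay(List(1, 2, 3), failOnceOn = 2)
```

Note that if a bolt had side effects before failing, the replay would repeat them, which is exactly why this model gives at-least-once rather than exactly-once semantics.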
25. Our commons library
A tiny layer on top of the Storm API and the ScalaStorm* DSL that makes developing in
Scala more convenient
• Typed messages
• Unified exception handling
• Reusable components
* https://github.com/velvia/ScalaStorm
27. Our commons library – typed messages
override def execute(t: Tuple) = { // what if a wrong tuple comes here...
  val click = t.getValue(0).asInstanceOf[Click] // it would crash the worker with an exception
  val clickId = t.getInteger(0) // or worse – what if that's not the clickId...
}
The standard "execute" method
28. Our commons library – typed messages
case class ClickMessage(id: Int, url: String) extends BaseMessage

message: {1, "http://example.com"}
29. Our commons library – typed messages
case class ClickMessage(id: Int, url: String) extends BaseMessage
…
override def exec(t: Tuple) = t match {
  case ClickMessage(id, url) =>
    ...
    using anchor t emitMsg NextMessage(id)
}
We started to use typed Scala case classes
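The benefit can be shown with a self-contained sketch (class names are illustrative): an unexpected message falls into an explicit match branch instead of blowing up the worker with a ClassCastException:

```scala
// Toy sketch: typed messages vs. positional casts (names are illustrative).
sealed trait BaseMessage
case class ClickMessage(id: Int, url: String) extends BaseMessage
case class ViewMessage(id: Int)               extends BaseMessage

// The compiler checks the message shape; anything unexpected is
// handled explicitly rather than crashing the worker.
def describe(m: BaseMessage): String = m match {
  case ClickMessage(id, url) => s"click $id on $url"
  case other                 => s"unexpected: $other"
}

val a = describe(ClickMessage(1, "http://example.com"))
val b = describe(ViewMessage(7))
```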
30. Our commons library – typed messages
Many fine-grained bolts can lead to a high number of threads in worker processes and huge
heartbeat states stored in ZooKeeper.
override def transformer: PartialFunction[BaseMessage, BaseMessage] = {
  case m: BaseMessage => MyNewMessage()
}
Each bolt adds an overhead of at least two threads.
Message transformation as standard functionality in the base bolt helps to avoid “mapper” bolts.
31. Our commons library – exception handling
class MyBolt … with FailTupleExceptionHandler
…
class MyOtherBolt … {
  override def handleException(t: Tuple, tw: Throwable): Unit = …
}
• FailTupleExceptionHandler
• WorkerRestartExceptionHandler
• AckTupleExceptionHandler
• DeactivateTopologyExceptionHandler
• AckTupleWithLimitExceptionHandler
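These handler mix-ins can be sketched with plain Scala traits (names mirror the slide; the real library's internals may differ):

```scala
// Toy sketch of pluggable exception handling via trait mix-ins.
// Names mirror the slide; the real library's API may differ.
trait ExceptionHandler {
  def handleException(tuple: String, t: Throwable): String
}
trait FailTupleHandler extends ExceptionHandler {
  def handleException(tuple: String, t: Throwable): String = s"fail($tuple)"
}
trait AckTupleHandler extends ExceptionHandler {
  def handleException(tuple: String, t: Throwable): String = s"ack($tuple)"
}

// The base bolt delegates error handling to whichever trait is mixed in.
class BaseBolt { self: ExceptionHandler =>
  def execute(tuple: String): String =
    try {
      if (tuple.isEmpty) throw new IllegalArgumentException("empty tuple")
      s"ok($tuple)"
    } catch { case e: Exception => handleException(tuple, e) }
}

val failing = new BaseBolt with FailTupleHandler  // fails the tuple on error
val acking  = new BaseBolt with AckTupleHandler   // acks (drops) it on error
```

Picking the handler per bolt keeps the error policy (fail and replay, drop, restart the worker, deactivate the topology) separate from the processing logic.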
35. Challenge 1: Data is not perfectly ordered
• out-of-order items in both streams might cause unjoined results
36. Challenge 1: Data is not perfectly ordered
• increase join window to compensate for out-of-order items in left stream
• increase synchronization offset for out-of-order items in right stream
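The effect of the window size can be shown with a toy keyed, timestamp-windowed join (a heavy simplification of the real topology):

```scala
// Toy sketch of a keyed stream join with a time window for late
// (out-of-order) items -- a simplification of the real in-memory join.
case class Item(key: Int, ts: Long, value: String)

// Join right items against a buffered window of left items:
// a larger window tolerates more out-of-orderness in the left stream.
def join(left: List[Item], right: List[Item], windowMs: Long): List[(String, String)] =
  right.flatMap { r =>
    left.collect {
      case l if l.key == r.key && math.abs(r.ts - l.ts) <= windowMs => (l.value, r.value)
    }
  }

val lefts  = List(Item(1, 100, "L1"), Item(2, 250, "L2"))   // L2 arrives "late"
val rights = List(Item(1, 120, "R1"), Item(2, 130, "R2"))

val small = join(lefts, rights, windowMs = 50)    // misses the late pair -> unjoined result
val large = join(lefts, rights, windowMs = 150)   // the larger window joins both pairs
```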
38. Challenge 2: topic partitions not consumed evenly
• introduced PartitionAwareKafkaSpout – each item knows its source partition
trait PartitionAwareMessage extends BaseMessage {
  def partition: Int
}
• use the minimum timestamp across all partitions for window expiration and sync time
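A minimal sketch of the minimum-timestamp rule; the `ts` field is an assumption added for illustration (the slide's trait only exposes `partition`):

```scala
// Toy sketch of per-partition watermarking. The `ts` field is an
// illustrative assumption; the slide's trait only exposes `partition`.
case class Msg(partition: Int, ts: Long)

// Track the latest timestamp seen on each partition...
def observe(latest: Map[Int, Long], m: Msg): Map[Int, Long] =
  latest.updated(m.partition, m.ts)

// ...and use the minimum across all partitions as the safe expiration
// time: no partition can still deliver anything older than this.
def watermark(latest: Map[Int, Long]): Long = latest.values.min

val latest = List(Msg(0, 1700L), Msg(1, 1650L), Msg(2, 1710L), Msg(0, 1720L))
  .foldLeft(Map.empty[Int, Long])(observe)
val wm = watermark(latest)   // held back by the slowest partition (1)
```

This is why an unevenly consumed partition holds back window expiration for the whole topology: the watermark only advances as fast as the slowest partition.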
40. Challenge 3: joins with huge join windows
• there are cases when join windows need to be minutes or even hours rather
than seconds – it may be difficult to hold such huge buffers in a Storm worker's
RAM
• items are not acknowledged until they are joined and fully processed – so a
huge number of items stuck in the join buffer would not work with reliable Storm
topologies
41. Challenge 3: joins with huge join windows
Introduced another flavor of the join using external storage
• store join window items in Aerospike in-memory storage via a REST API
• allows storing and retrieving arbitrary data by key
• the API supports batching for performance
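A toy sketch of the external-storage join, with an in-process Map standing in for the Aerospike REST API (which in the real system also batches puts/gets):

```scala
import scala.collection.mutable

// Toy sketch of the external-storage join. A mutable Map stands in for
// the Aerospike key-value store behind the REST API.
val store = mutable.Map[Int, String]()

// Left-stream items are written to the store keyed by the join key...
def putLeft(key: Int, value: String): Unit = store(key) = value

// ...and each right-stream item looks up its match by the same key.
def joinRight(key: Int, value: String): Option[(String, String)] =
  store.get(key).map(l => (l, value))

putLeft(42, "click")
val joined   = joinRight(42, "purchase")   // match found -> joined pair
val unjoined = joinRight(99, "purchase")   // no left item -> stays unjoined
```

Because the buffer lives in the external store rather than worker RAM, the window can be hours long and the right stream can pause without pinning unacked tuples in the topology.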
42. Challenge 3: joins with huge join windows
Feeding the data to join window
46. Challenge 3: joins with huge join windows
• fewer nuances than with in-memory join
• more external components
• supports huge join windows
• no handling for unjoined right stream items
• supports right stream with no continuous
throughput (allows pauses)