Understand Storm in Pictures

Apache Storm explained in pictures: the basics, the Acker, and so on.

  1. Storm In Pictures. http://zqhxuyuan.github.io/ 2016-7-15
  2. Storm building blocks (What Makes Storm): DAG, Tuple, Stream, Spout, Bolt.
  3. Topology, Stream, Spout, Bolt. Topology: a network of spouts and bolts, forming a DAG.
  4. Stream: an unbounded sequence of tuples.
  5. Spout: the source of a stream.
  6. Bolt: processes input streams and produces new streams; a bolt can also be a sink.
  8. Message/Tuple transform.
  19. The lifecycle of a tuple:
      1. It is emitted by a spout.
      2. It flows through a stream.
      3. It is processed by a bolt.
      4. The bolt emits again.
      5. The new tuples re-enter the stream.
      6. This continues until the tuple is fully processed.
  20. Guaranteeing Message Processing. If processing a message fails, how does Storm make sure the message is processed again?
      1. At least once: Acker
      2. Exactly once: Trident
  21. Example topology: Sentence Spout → Split Sentence Bolt → Word Count Bolt. The spout emits ["the cow jumped over the moon"]; the split bolt emits ["the"], ["cow"], ["jumped"], ["over"], ["the"], ["moon"]; the count bolt emits ["the",1], ["cow",1], ["jumped",1], ["over",1], ["the",2], ["moon",1]. Together these tuples form the tuple tree. Storm considers a tuple coming off a spout "fully processed" when the tuple tree has been exhausted and every message in the tree has been processed.
  22. Tuple Lifecycle (API Layer): a tuple coming off a spout.
      collector.emit("split", new Values("the cow jumped over the moon"), 1)
      "split" is the stream-id: the tuple is emitted to one of the output streams. 1 is the msgId: it is used to identify the tuple later.
  23. Tuple Lifecycle (API Layer): the tuple tree is fully processed (how this is detected is discussed later).
  24. Tuple Lifecycle (API Layer): the tuple tree has failed (time-out).
  25. Tuple Lifecycle (State Machine): ack(1), where the tuple's msgId = 1. The message is taken off the queue (Kestrel/Kafka).
  26. Tuple Lifecycle (State Machine): fail(1), where the tuple's msgId = 1. The message is put back on the queue (Kestrel/Kafka).
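The ack/fail state machine in slides 25 and 26 can be sketched without Storm at all. What follows is a minimal, self-contained Java sketch under stated assumptions: the class ReplayingSource and all of its methods are hypothetical names standing in for a spout backed by a Kestrel or Kafka queue, and none of it is Storm's actual API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a queue-backed spout; not part of Storm.
class ReplayingSource {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Map<Integer, String> inFlight = new HashMap<>();
    private int nextMsgId = 1;

    void offer(String message) {
        queue.addLast(message);
    }

    /** Hands out the next message under a fresh msgId; returns -1 if the queue is empty. */
    int emitNext() {
        String message = queue.pollFirst();
        if (message == null) {
            return -1;
        }
        int msgId = nextMsgId++;
        inFlight.put(msgId, message);
        return msgId;
    }

    /** The tuple tree was fully processed: the message is dropped for good. */
    void ack(int msgId) {
        inFlight.remove(msgId);
    }

    /** The tuple tree failed or timed out: the message goes back on the queue. */
    void fail(int msgId) {
        String message = inFlight.remove(msgId);
        if (message != null) {
            queue.addLast(message);
        }
    }

    int queued() {
        return queue.size();
    }

    int inFlightCount() {
        return inFlight.size();
    }

    public static void main(String[] args) {
        ReplayingSource source = new ReplayingSource();
        source.offer("the cow jumped over the moon");
        int first = source.emitNext();
        source.fail(first);            // replay: the message is queued again
        int second = source.emitNext();
        source.ack(second);            // done: nothing queued, nothing in flight
        System.out.println(source.queued() + " queued, " + source.inFlightCount() + " in flight");
    }
}
```

Because a failed message is simply emitted again, downstream bolts may see it more than once, which is why this scheme gives at-least-once rather than exactly-once processing.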
  27. Tuple Lifecycle (Program Layer).
      You: 1. tell Storm whenever you're creating a new link in the tree of tuples; 2. tell Storm when you have finished processing an individual tuple.
      Storm: 1. can detect when the tree of tuples is fully processed; 2. can ack or fail the spout tuple appropriately.
      In the Split Sentence Bolt, the spout (sentence) tuple is the input tuple and the word tuples are the output tuples. Each word tuple is anchored to the sentence tuple.
  28. Tuple Lifecycle (Program Layer). In the Word Count Bolt, a word tuple is the input tuple and a word-count tuple is the output tuple. Each word-count tuple is anchored to its word tuple.
  29. Tuple Lifecycle (Program Layer). Ack word tuple: ["the"]
  30. Tuple Lifecycle (Program Layer). Ack word tuple: ["cow"]
  31. Tuple Lifecycle (Program Layer). Ack word tuple: ["moon"]. All six word tuples are now acked.
  32. Tuple Lifecycle (Program Layer). Ack sentence tuple: ["the cow jumped over the moon"]. The input tuple is acked after all the word tuples are emitted.
  33. Tuple Lifecycle (Program Layer). The tuple tree is fully processed: ack(msgId=1).
  35. Tuple Lifecycle (Program Layer). Since the word tuple is anchored, the spout tuple at the root of the tree will be replayed later on if the word tuple fails to be processed downstream. Calling this.collector.fail(tuple) marks the tuple tree as failed, and the spout receives fail(msgId=1).
  38. Multi-anchored tuple: tuple1 and tuple2 are input tuples, and the output tuple3 is anchored to both of them. If tuple3 fails, the tuples it is anchored to are replayed.
  39. One more thing. Every tuple you process must be acked or failed. Storm uses memory to track each tuple, so if you don't ack or fail every tuple, the task will eventually run out of memory. Many bolts follow the same pattern: read an input tuple, emit tuples based on it, and then ack the input tuple at the end of execute(). Storm can do this for you (for example with BaseBasicBolt), so you no longer need to handle anchoring and acking yourself.
  40. Acker
  41. When a spout emits a tuple, what counts as that tuple being fully processed? (Spout → Bolt1 → Bolt2 → Bolt3, starting with tuple1.)
  42. Tuple tree: SentenceSpout emits tuple1 (["the cow jumped..."]); SplitBolt turns tuple1 into tuple2, tuple3, and tuple4 (["the"], ["cow"], ["jumped"]); WordCountBolt turns those into tuple5, tuple6, and tuple7 (["the",1], ["cow",1], ["jumped",1]), which flow on to PrintBolt.
  43. When a spout emits a new source tuple, it can assign a MessageId to that tuple. Several source tuples may share one MessageId, which means they form a single message unit and are placed in the same tuple tree.
      One tuple tree (Message1):
      collector.emit(new Values(tuple1), Message1);
      collector.emit(new Values(tuple2), Message1);
      Two tuple trees:
      collector.emit(new Values(tuple1), Message1);
      collector.emit(new Values(tuple2), Message2);
  44. Topology: the Spout emits tuple1 to Bolt1 and tuple2 to Bolt2; Bolt1 feeds Bolt3, Bolt2 feeds Bolt4, and Bolt3 and Bolt4 both feed Bolt5.
      1. In the spout, Message1 is bound to tuple1 and tuple2 (same MessageId).
      2. tuple1 is sent to Bolt1 and tuple2 is sent to Bolt2.
      3. Bolt1 processes tuple1 and produces tuple3; Bolt2 processes tuple2 and produces tuple4.
      4. tuple3 from Bolt1 flows to Bolt3; tuple4 from Bolt2 flows to Bolt4.
      5. Bolt3 processes tuple3 and produces tuple5; Bolt4 processes tuple4 and produces tuple6.
      6. tuple5 from Bolt3 and tuple6 from Bolt4 both flow to the same Bolt5.
      7. Once Bolt5 has processed tuple5 and tuple6, Message1 is fully processed.
  50. Message1 is fully processed.
  51. Fully processed: the source tuple and every tuple derived from it have been processed by every bolt in the topology that they should reach. The "spout-tuple-1 processed" table has one entry per step: Spout emits tuple1; Bolt1 receives tuple1, processes tuple1, emits tuple2; Bolt2 receives tuple2, processes tuple2, emits tuple3; Bolt3 receives tuple3, processes tuple3. Only when every entry is Y is the spout tuple fully processed.
  52. With many derived tuples (tuple21 to tuple27, tuple31 to tuple35, and so on), what would the spout-tuple-1 processing table look like? A really huge table!
  53. Acker component: tracks the tuple tree of every tuple emitted by a spout. The diagram shows Spout, Bolt1, Bolt2, and Bolt3 sending emit and ack messages for tuple1, tuple2, and tuple3 to an AckerBolt, which keeps an ack_value and decides between ack() and fail(). The two API calls involved are 1. emit(tuple, ...) and 2. ack(tuple).
  54. Solution 1: zipper style
  62. Solution 1: progressive style
  70. How Storm implements the Acker: how does Storm implement reliability in an efficient way?
  71. A Storm topology has a set of special "acker" tasks that track the DAG of tuples for every spout tuple. When an acker sees that a DAG is complete, it sends a message to the spout task that created the spout tuple to ack the message.
      1. The acker can have many tasks, just like a spout or bolt.
      2. The DAG of tuples is a tuple tree,
      3. generated from a spout tuple (emitted by one of the spout tasks).
      4. The spout tuple is associated with a MessageId.
      5. When all tuples in the tuple tree are fully processed,
      6. the acker sends a message to the spout task from #3.
      7. The spout can then ack the message that belongs to that spout tuple.
  72. The best way to understand Storm's reliability implementation is to look at the lifecycle of tuples and tuple DAGs. When a tuple is created in a topology, whether in a spout or a bolt, it is given a random 64-bit id. These ids are used by ackers to track the tuple DAG for every spout tuple. Every tuple knows the ids of all the spout tuples for which it exists in their tuple trees. When you emit a new tuple in a bolt, the spout tuple ids from the tuple's anchors are copied into the new tuple. When a tuple is acked, it sends a message to the appropriate acker tasks with information about how the tuple tree changed. In particular it tells the acker "I am now completed within the tree for this spout tuple, and here are the new tuples in the tree that were anchored to me".
  73. For example, if tuples "D" and "E" were created based on tuple "C", then when "C" is acked, "C" is removed from the tree at the same time that "D" and "E" are added to it, so the tree can never be prematurely completed.
      Why is there no need to record ancestor tuple ids (not only the spout tuple id, but also the upstream input tuples)?
      1. A bolt does not send a message to the acker when it emits; it only does so when it acks.
      2. At ack time, the bolt knows the id of the input tuple being acked and the ids of all output tuples produced by its emits.
      3. So at ack time the bolt can first combine the input tuple id with all emitted output tuple ids, and only then send a message to the acker.
      4. When the acker receives the bolt's ack message, it combines its current ack val with the received value; the result reflects how the tuple tree has changed.
      5. Once a bolt has acked an input tuple, nothing needs to be kept for any tuple from that input tuple back up to the root tuple. The acker only has to keep track of the most recently emitted output tuples.
  74. Acker component: tracks the tuple tree of every tuple emitted by a spout, based on 1. emit(tuple, ...) and 2. ack(tuple). The AckerBolt keeps an ack_value for tuple1, tuple2, and tuple3 and decides between ack() and fail().
  76. ×: the parent tuple is complete, so the acker now needs to track its child tuples.
  80. A bit of algebra. XOR truth table: 0 ^ 0 = 0, 0 ^ 1 = 1, 1 ^ 0 = 1, 1 ^ 1 = 0.
      XORing a value with itself always gives 0:
      0000 ^ 0000 = 0000
      0001 ^ 0001 = 0000
      0010 ^ 0010 = 0000
      0011 ^ 0011 = 0000
      0100 ^ 0100 = 0000
      010100110110010011 ^ 010100110110010011 = 000000000000000000
      XORing two different values never gives 0:
      0000 ^ 0001 = 0001
      0001 ^ 1001 = 1000
      0010 ^ 0110 = 0100
      0011 ^ 0010 = 0001
      1100 ^ 0100 = 1000
      010100110110010011 ^ 010100111110010011 = 000000001000000000
  81. So is there a way to get back to 0?
      0000 ^ 0001 = 0001
      0001 ^ 1100 = 1101
      1101 ^ 0010 = 1111
      1111 ^ 1001 = 0110
      0110 ^ 0110 = 0000
      In symbols: 0 ^ X1 = X1, X1 ^ X2 = X3, X3 ^ X4 = X5, X5 ^ X6 = X7, X7 ^ X7 = 0, with X1 = 0001, X2 = 1100, X4 = 0010, X6 = 1001, and X7 = 0110. The last step works because XORing a value with itself always gives 0.
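The chain in slide 81 is easy to check mechanically. Here is a small Java sketch using the slide's own values; the class name XorChain and the helper xorAll are hypothetical names chosen for illustration.

```java
// Hypothetical helper for checking the XOR chain from the slide.
class XorChain {
    /** XORs all values together, starting from 0. */
    static long xorAll(long... values) {
        long acc = 0L;
        for (long v : values) {
            acc ^= v;
        }
        return acc;
    }

    public static void main(String[] args) {
        // X1, X2, X4, X6 from the slide; the running result is X7.
        long x7 = xorAll(0b0001, 0b1100, 0b0010, 0b1001);
        System.out.println(Long.toBinaryString(x7));   // prints "110", i.e. 0110

        // XORing X7 in once more brings the accumulator back to 0.
        System.out.println(xorAll(0b0001, 0b1100, 0b0010, 0b1001, x7)); // prints "0"
    }
}
```

Since XOR is commutative and associative, the values can be folded in any order and the final result is the same.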
  82. Spout → Bolt1 → Bolt2 → Bolt3, with tuple1 = 0001, tuple2 = 1010, and tuple3 = 0011.
      When a spout or bolt emits a tuple, it generates an ID for that tuple.
      Every tuple that a spout or bolt emits downstream must be received by a bolt.
      The last bolt emits no tuple, which marks the end of the topology.
      Each ID therefore appears once when the tuple is emitted and once when it is received:
      (0001 ^ 0001) ^ (1010 ^ 1010) ^ (0011 ^ 0011) = 0000 ^ 0000 ^ 0000 = 0000
  83. The spout emits a tuple with id = 0001; the acker tracks this spout tuple.
  84. Bolt1 receives the input tuple emitted by the spout but has not processed it yet, so it does not talk to the acker.
  85. Bolt1 emits a new tuple, 1010, and acks its input tuple, tuple1; this is when it talks to the acker.
  86. The acker keeps only the id of the newly created child tuple, tuple2; ancestor tuple ids are not recorded.
  87. Bolt2 receives tuple2, processes it, emits the child tuple tuple3, and calls ack(tuple2).
  88. The acker keeps only the id of the newly created child tuple, tuple3; ancestor tuple ids are not recorded.
  89. Bolt3 receives tuple3, processes it, emits no new tuple, and calls ack(tuple3).
  90. No new tuple was created, so the acker's ack_val = 0, which means the tuple tree is fully processed.
  91. The ack message is the pair (spout-tuple-id, tmp-ack-val), where
      tmp-ack-val = spout-tuple-id ^ (child-tuple-id1 ^ child-tuple-id2 ...)
      tmp-ack-val is the XOR of the id of the tuple being acked with the ids of all tuples newly created from it.
      Example: the spout produces spout-tuple-id (tuple1), Bolt1 produces bolt1-tuple-id (tuple2), Bolt2 produces bolt2-tuple-id (tuple3), and Bolt3 produces no tuple.
      The spout emits tuple1, and the acker records tuple1's id in order to track the spout tuple:
      tmp-ack-val = spout-tuple-id
      Bolt1 processes the spout's tuple1, emits tuple2, and acks tuple1:
      tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id)
                  = (spout-tuple-id ^ spout-tuple-id) ^ bolt1-tuple-id
                  = 0 ^ bolt1-tuple-id
                  = bolt1-tuple-id
      Bolt2 processes Bolt1's tuple2, emits tuple3, and acks tuple2:
      tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) ^ (bolt1-tuple-id ^ bolt2-tuple-id)
                  = (spout-tuple-id ^ spout-tuple-id) ^ (bolt1-tuple-id ^ bolt1-tuple-id) ^ bolt2-tuple-id
                  = 0 ^ 0 ^ bolt2-tuple-id
                  = bolt2-tuple-id
      Bolt3 processes Bolt2's tuple3, emits no tuple, and acks tuple3:
      tmp-ack-val = spout-tuple-id ^ (spout-tuple-id ^ bolt1-tuple-id) ^ (bolt1-tuple-id ^ bolt2-tuple-id) ^ bolt2-tuple-id
                  = (spout-tuple-id ^ spout-tuple-id) ^ (bolt1-tuple-id ^ bolt1-tuple-id) ^ (bolt2-tuple-id ^ bolt2-tuple-id)
                  = 0 ^ 0 ^ 0
                  = 0
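The derivation in slide 91 can be replayed in code. The sketch below follows the simplified model of these slides (one id per tuple, ack value initialised to the spout tuple id) and is not Storm's real acker; AckerSketch, init, ack, and ackVal are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of the slides' acker; not Storm's implementation.
class AckerSketch {
    // spout tuple id (root of a tuple tree) -> running ack value
    private final Map<Long, Long> pending = new HashMap<>();

    /** The spout emitted a new root tuple: start tracking its tree. */
    void init(long rootId) {
        pending.put(rootId, rootId);
    }

    /**
     * A bolt acked inputId after emitting childIds anchored to it.
     * Returns true once the tree rooted at rootId is fully processed.
     */
    boolean ack(long rootId, long inputId, long... childIds) {
        Long current = pending.get(rootId);
        if (current == null) {
            throw new IllegalStateException("unknown spout tuple id " + rootId);
        }
        // tmp-ack-val: id of the acked tuple XOR ids of its new children.
        long tmpAckVal = inputId;
        for (long child : childIds) {
            tmpAckVal ^= child;
        }
        long updated = current ^ tmpAckVal;
        if (updated == 0L) {
            pending.remove(rootId);
            return true;
        }
        pending.put(rootId, updated);
        return false;
    }

    /** Current ack value for a tree, or 0 if the tree is not (or no longer) tracked. */
    long ackVal(long rootId) {
        return pending.getOrDefault(rootId, 0L);
    }

    public static void main(String[] args) {
        long tuple1 = 0b0001, tuple2 = 0b1010, tuple3 = 0b0011;
        AckerSketch acker = new AckerSketch();
        acker.init(tuple1);
        System.out.println(acker.ack(tuple1, tuple1, tuple2)); // false, ack value is now tuple2
        System.out.println(acker.ack(tuple1, tuple2, tuple3)); // false, ack value is now tuple3
        System.out.println(acker.ack(tuple1, tuple3));         // true, ack value reached 0
    }
}
```

The acker stores a single long per spout tuple regardless of how large the tree grows, which is what makes this approach cheaper than the per-step table from the earlier slides.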
  92. With several acker tasks (Acker Task1, Acker Task2, Acker Task3): the tuple tree of tuple11, tuple12, and tuple13 is tracked by Acker Task1.
  93. The tuple tree of tuple21, tuple22, and tuple23 is tracked by Acker Task2.
  94. The tuple tree of tuple31, tuple32, and tuple33 is tracked by Acker Task3.
  95. Two questions remain:
      1. When a tuple needs to be acked, which acker should the message be sent to?
      2. How does the acker know which spout task is responsible for each spout tuple?
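The Storm documentation listed in the references (Guaranteeing message processing) answers both questions: Storm uses mod hashing to map a spout tuple id to an acker task, and when a spout task emits a new tuple it tells the appropriate acker that its task id is responsible for that spout tuple. The Java sketch below only illustrates that idea; the class AckerRouting, its methods, and the task id 4 are hypothetical, and the exact hashing Storm uses internally is not shown in these slides.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only; names and details are assumptions, not Storm's code.
class AckerRouting {
    /** Maps a spout tuple id to one of numAckers acker tasks by mod hashing. */
    static int ackerFor(long spoutTupleId, int numAckers) {
        // floorMod keeps the result in [0, numAckers) even for negative ids.
        return (int) Math.floorMod(spoutTupleId, (long) numAckers);
    }

    // State of one acker task: spout tuple id -> spout task that emitted it.
    private final Map<Long, Integer> ownerTask = new HashMap<>();

    /** Called when a spout task emits a new spout tuple. */
    void register(long spoutTupleId, int spoutTaskId) {
        ownerTask.put(spoutTupleId, spoutTaskId);
    }

    /** Called when the tree completes; returns the spout task to notify, or null if unknown. */
    Integer completed(long spoutTupleId) {
        return ownerTask.remove(spoutTupleId);
    }

    public static void main(String[] args) {
        int numAckers = 3;
        AckerRouting[] ackers = { new AckerRouting(), new AckerRouting(), new AckerRouting() };

        long spoutTupleId = 11L;
        int spoutTaskId = 4; // hypothetical spout task id
        int chosen = ackerFor(spoutTupleId, numAckers);
        ackers[chosen].register(spoutTupleId, spoutTaskId);

        // Every tuple in the tree carries the spout tuple id, so any bolt
        // can recompute the same acker index when it acks.
        Integer notify = ackers[ackerFor(spoutTupleId, numAckers)].completed(spoutTupleId);
        System.out.println("acker task " + chosen + " notifies spout task " + notify);
    }
}
```

No lookup table is needed on the bolt side: the acker index is a pure function of the spout tuple id that each tuple already carries.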
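  96. What to do when you want to use the reliability provided by the Storm Acker:
      1. Set Config.TOPOLOGY_ACKERS to 1 or more; by default there is one acker per worker.
      2. Specify a messageId when emitting a tuple, so that this particular spout tuple is tracked.
      3. If you care that every tuple in a tuple tree is processed successfully, anchor those tuples when you emit them.
      Note that the spout's ack(msgId) is different from a bolt's ack(tuple).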
  97. References:
      http://blog.csdn.net/zhangzhebjut/article/details/38467145
      http://storm.apache.org/releases/1.0.1/Guaranteeing-message-processing.html
      http://www.cnblogs.com/foreach-break/p/storm_at_least_once.html
      http://blog.jassassin.com/2014/10/22/storm/storm-ack/
