# GenStage and Flow - Jose Valim

Elixir Club 5

Published in: Software
### GenStage and Flow - Jose Valim

1. @elixirlang / elixir-lang.org
2. GenStage & Flow: github.com/elixir-lang/gen_stage
3. Prelude: From eager, to lazy, to concurrent, to distributed
4. Example problem: word counting
5. "roses are red\n violets are blue\n" → %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
6. Eager
7. File.read!("source") #=> "roses are red\n violets are blue\n"
8. File.read!("source") |> String.split("\n") #=> ["roses are red", "violets are blue"]
9. File.read!("source") |> String.split("\n") |> Enum.flat_map(&String.split/1) #=> ["roses", "are", "red", "violets", "are", "blue"]
10. File.read!("source") |> String.split("\n") |> Enum.flat_map(&String.split/1) |> Enum.reduce(%{}, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) #=> %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
11. Eager • Simple • Efficient for small collections • Inefficient for large collections with multiple passes
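The eager pipeline from the slides above can be tried without a file on disk. This is a minimal sketch that takes the text as a string instead of calling `File.read!/1`; the module name `EagerWordCount` is made up for illustration and is not part of the talk.

```elixir
defmodule EagerWordCount do
  # Counts words eagerly: each step materializes a full intermediate list
  # before the next step starts.
  def count(text) do
    text
    |> String.split("\n")
    |> Enum.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, map ->
      Map.update(map, word, 1, &(&1 + 1))
    end)
  end
end

IO.inspect(EagerWordCount.count("roses are red\n violets are blue\n"))
```

With a large input, the lists built by `String.split/2` and `Enum.flat_map/2` are each held in memory in full, which is the inefficiency the slide points at.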
12. File.read!("really large file") |> String.split("\n") |> Enum.flat_map(&String.split/1)
13. Lazy
14. File.stream!("source", :line) #=> #Stream<...>
15. File.stream!("source", :line) |> Stream.flat_map(&String.split/1) #=> #Stream<...>
16. File.stream!("source", :line) |> Stream.flat_map(&String.split/1) |> Enum.reduce(%{}, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) #=> %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
17. Lazy • Folds computations, goes item by item • Less memory usage at the cost of computation • Allows us to work with large or infinite collections
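The lazy version can be sketched the same way. The example below assumes the lines arrive as any enumerable (a list here, standing in for `File.stream!/2`); `LazyWordCount` is a hypothetical name. The second call shows the "infinite collections" point: `Stream.cycle/1` never ends, and only the lines actually requested are ever produced.

```elixir
defmodule LazyWordCount do
  # Stream.flat_map/2 builds no intermediate list; words are produced one
  # at a time and folded straight into the map by Enum.reduce/3.
  def count(lines) do
    lines
    |> Stream.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, map ->
      Map.update(map, word, 1, &(&1 + 1))
    end)
  end
end

lines = ["roses are red\n", "violets are blue\n"]

# Finite input, same result as the eager version.
IO.inspect(LazyWordCount.count(lines))

# Infinite input: take only the first four lines of an endless cycle.
lines
|> Stream.cycle()
|> Stream.take(4)
|> LazyWordCount.count()
|> IO.inspect()
```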
18. Concurrent
19. File.stream!("source", :line) #=> #Stream<...>
20. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) #=> #Flow<...>
21. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) |> Enum.into(%{}) #=> %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
22. Flow • We give up ordering and process locality for concurrency • Tools for working with bounded and unbounded data
23. Flow • It is not magic! There is an overhead when data flows through processes • Requires volume and/or CPU/IO-bound work to see benefits
24. Flow stats • ~1200 lines of code (LOC) • ~1300 lines of documentation (LOD)
25. Topics • 1200 LOC: How is flow implemented? • 1300 LOD: How to reason about flows?
26. GenStage
27. GenStage • It is a new behaviour • Exchanges data between stages transparently with back-pressure • Breaks into producers, consumers and producer_consumers
28. GenStage (diagram: producers and consumers connected into pipelines)
29. GenStage: Demand-driven. A is the producer, B is the consumer. 1. consumer subscribes to producer 2. consumer sends demand (asks 10) 3. producer sends events (sends max 10)
30. GenStage: Demand-driven. Stages A, B and C in a chain: each downstream stage asks 10 of the stage before it, and each upstream stage sends max 10.
31. GenStage: Demand-driven • It is a message contract • It pushes back-pressure to the boundary • GenStage is one impl of this contract
32. GenStage example
33. A: Counter (Producer), B: Printer (Consumer)
34. defmodule Producer do use GenStage; def init(counter) do {:producer, counter} end; def handle_demand(demand, counter) when demand > 0 do events = Enum.to_list(counter..counter+demand-1); {:noreply, events, counter + demand} end end
35. handle_demand by state and demand: state 0, demand 10 → {:noreply, [0, 1, …, 9], 10}; state 10, demand 5 → {:noreply, [10, 11, 12, 13, 14], 15}; state 15, demand 5 → {:noreply, [15, 16, 17, 18, 19], 20}
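The state/demand table above can be reproduced without starting any process. The sketch below copies the body of the slide's `handle_demand/2` into a plain module (no `use GenStage`, so no dependency is needed) and threads the state through three calls by hand; `CounterLogic` is a name invented for this example.

```elixir
defmodule CounterLogic do
  # Same logic as the Producer on the slide: emit `demand` consecutive
  # integers starting at `counter`, then advance the counter.
  def handle_demand(demand, counter) when demand > 0 do
    events = Enum.to_list(counter..(counter + demand - 1))
    {:noreply, events, counter + demand}
  end
end

# Replay the three rows of the table, passing each new state to the next call.
{:noreply, first, state} = CounterLogic.handle_demand(10, 0)
{:noreply, second, state} = CounterLogic.handle_demand(5, state)
{:noreply, third, state} = CounterLogic.handle_demand(5, state)

IO.inspect(first)
IO.inspect(second)
IO.inspect(third)
IO.inspect(state)
```

In a real stage, GenStage calls `handle_demand/2` and keeps the state for you; here the caller does both.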
36. defmodule Consumer do use GenStage; def init(:ok) do {:consumer, :the_state_does_not_matter} end; def handle_events(events, _from, state) do Process.sleep(1000); IO.inspect(events); {:noreply, [], state} end end
37. {:ok, counter} = GenStage.start_link(Producer, 0); {:ok, printer} = GenStage.start_link(Consumer, :ok); GenStage.sync_subscribe(printer, to: counter). Output: (wait 1 second) [0, 1, 2, ..., 499] (500 events) (wait 1 second) [500, 501, 502, ..., 999] (500 events)
38. Subscribe options • max_demand: the maximum amount of events to ask (default 1000) • min_demand: when reached, ask for more events (default 500)
39. max_demand: 10, min_demand: 0. 1. the consumer asks for 10 items 2. the consumer receives 10 items 3. the consumer processes 10 items 4. the consumer asks for 10 more 5. the consumer waits
40. max_demand: 10, min_demand: 5. 1. the consumer asks for 10 items 2. the consumer receives 10 items 3. the consumer processes 5 of 10 items 4. the consumer asks for 5 more 5. the consumer processes the remaining 5
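The two step lists above suggest a simple way to think about the options: the consumer works through what it received in batches of `max_demand - min_demand` and asks for that many more after each batch. The sketch below is only a model of those steps, written under that assumption; `DemandModel.batches/3` is a hypothetical helper and not a GenStage function.

```elixir
defmodule DemandModel do
  # Hypothetical model, not GenStage code: split the received events into
  # the batches the consumer would process between two requests for more.
  def batches(events, max_demand, min_demand) when max_demand > min_demand do
    Enum.chunk_every(events, max_demand - min_demand)
  end
end

received = Enum.to_list(1..10)

# max_demand: 10, min_demand: 0 -> one batch of 10, then ask for 10 more.
IO.inspect(DemandModel.batches(received, 10, 0))

# max_demand: 10, min_demand: 5 -> process 5, ask for 5 more, process 5.
IO.inspect(DemandModel.batches(received, 10, 5))
```

A non-zero `min_demand` lets the producer prepare the next events while the consumer is still busy with the second half of the current ones.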
41. GenStage dispatchers
42. Dispatchers • Per producer • Effectively receive the demand and send data • Allow a producer to dispatch to multiple consumers at once
43. DemandDispatcher (diagram: prod emits 1,2,3,4,5; the four consumers receive 1 / 2,5 / 3 / 4)
44. BroadcastDispatcher (diagram: prod emits 1,2,3; each of the four consumers receives 1,2,3)
45. PartitionDispatcher with rem(event, 4) (diagram: prod emits 1,2,3,4,5,6; the four consumers receive 1,5 / 2,6 / 3 / 4)
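The partition diagram above can be checked with plain `Enum` functions. This sketch only computes which partition each event would land in when the partition is chosen with `rem(event, 4)`, as on the slide; it does not start any stages or use the real PartitionDispatcher.

```elixir
# Group events 1..6 by the partition chosen with rem(event, 4).
partitions = Enum.group_by(1..6, fn event -> rem(event, 4) end)

IO.inspect(partitions)
# => %{0 => [4], 1 => [1, 5], 2 => [2, 6], 3 => [3]}
```

Events with the same remainder always reach the same consumer, which is the property Flow relies on when it partitions words before reducing.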
46. http://bit.ly/genstage
47. Topics • 1200 LOC: How is flow implemented? • Its core is an 80 LOC stage • 1300 LOD: How to reason about flows?
48. Flow
49. "roses are red\n violets are blue\n" → %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
50. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) #=> #Flow<...>
51. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) |> Enum.into(%{}) #=> %{"are" => 2, "blue" => 1, "red" => 1, "roses" => 1, "violets" => 1}
52. File.stream!("source", :line) |> Flow.from_enumerable() (diagram: Producer)
53. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) (diagram: Producer → DemandDispatcher → Stages 1 to 4)
54. (diagram: Producer, Stages 1 to 4; one stage receives "roses are red" and emits "roses", "are", "red"; another receives "violets are blue" and emits "violets", "are", "blue")
55. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) (diagram: Producer → Stages 1 to 4)
56. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) (diagram: Producer → Stages 1 to 4)
57. (diagram: Producer, Stages 1 to 4; the stage that receives "roses are red" holds %{"are" => 1, "red" => 1, "roses" => 1}; the stage that receives "violets are blue" holds %{"are" => 1, "blue" => 1, "violets" => 1})
58. (diagram: Producer, Stages 1 to 4; two stages end with separate maps, %{"are" => 1, "red" => 1, "roses" => 1} and %{"are" => 1, "blue" => 1, "violets" => 1}, so "are" is counted in two places)
59. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) (diagram: Producer → Stages 1 to 4 → Stages A to D)
60. (diagram: Producer sends "roses are red" and "violets are blue" to Stages 1 to 4, which split them into words; each word is routed to one of Stages A to D, so both occurrences of "are" reach the same stage, whose map goes from %{"are" => 1, "red" => 1} to %{"are" => 2, "red" => 1})
61. (diagram: Producer → DemandDispatcher → Mappers 1 to 4 → PartitionDispatcher → Reducers A to D)
62. File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) |> Enum.into(%{}) • reduce/3 collects all data into maps • when it is done, the maps are streamed • into/2 collects the state into a map
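The mapper/partition/reducer shape described above can be imitated with the standard library to see why partitioning makes the final merge safe. This is not Flow and not how Flow is implemented: it is a sketch that assumes words are routed by a hash (`:erlang.phash2/2` here), runs one `Task` per partition, and uses made-up names (`PartitionSketch`, four partitions).

```elixir
defmodule PartitionSketch do
  @partitions 4

  def count(lines) do
    lines
    # "Mapper" step: split every line into words.
    |> Enum.flat_map(&String.split/1)
    # "Partition" step: equal words hash to the same partition number.
    |> Enum.group_by(&:erlang.phash2(&1, @partitions))
    # "Reducer" step: one task per partition builds its own map.
    |> Task.async_stream(fn {_partition, words} ->
      Enum.reduce(words, %{}, fn word, map ->
        Map.update(map, word, 1, &(&1 + 1))
      end)
    end)
    # The per-partition maps share no keys, so they can simply be
    # poured into one map without summing counts again.
    |> Enum.flat_map(fn {:ok, map} -> Map.to_list(map) end)
    |> Enum.into(%{})
  end
end

IO.inspect(PartitionSketch.count(["roses are red\n", "violets are blue\n"]))
```

Without the partition step, two reducers could each hold a count for "are", and collecting them into one map would silently keep only one of the two.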
63. Windows and triggers • If reduce/3 runs until all data is processed… what happens on unbounded data? • See Flow.Window documentation.
64. Postlude: From eager, to lazy, to concurrent, to distributed
65. Enum (eager): File.read!("source") |> String.split("\n") |> Enum.flat_map(&String.split/1) |> Enum.reduce(%{}, fn word, map -> Map.update(map, word, 1, & &1 + 1) end)
66. Stream (lazy): File.stream!("source", :line) |> Stream.flat_map(&String.split/1) |> Enum.reduce(%{}, fn word, map -> Map.update(map, word, 1, & &1 + 1) end)
67. Flow (concurrent): File.stream!("source", :line) |> Flow.from_enumerable() |> Flow.flat_map(&String.split/1) |> Flow.partition() |> Flow.reduce(fn -> %{} end, fn word, map -> Map.update(map, word, 1, & &1 + 1) end) |> Enum.into(%{})
68. Flow features • Provides map and reduce operations, partitions, flow merge, flow join • Configurable batch size (max & min demand) • Data windowing with triggers and watermarks
69. Distributed? • Flow API has feature parity with frameworks like Apache Spark • However, there are no distribution or execution guarantees
70. CHEN, Y., ALSPAUGH, S., AND KATZ, R. Interactive analytical processing in big data systems: a cross-industry study of MapReduce workloads. "small inputs are common in practice: 40–80% of Cloudera customers' MapReduce jobs and 70% of jobs in a Facebook trace have ≤ 1GB of input"
71. GOG, I., SCHWARZKOPF, M., CROOKS, N., GROSVENOR, M. P., CLEMENT, A., AND HAND, S. Musketeer: all for one, one for all in data processing systems. "For between 40-80% of the jobs submitted to MapReduce systems, you'd be better off just running them on a single machine"
72. Distributed? • Single machine matters - try it! • The gap between concurrent and distributed in Elixir is small • Durability concerns will be tackled next
73. Thank-yous
74. Library authors • Give GenStage a try • Build your own producers • TIP: use @behaviour GenStage if you want to make it an optional dep
75. Inspirations • Akka Streams - back pressure contract • Apache Spark - map reduce API • Apache Beam - windowing model • Microsoft Naiad - stage notifications
76. The Team • The Elixir Core Team • Especially James Fish (fishcakez) • And Eric and Chris for therapy sessions
77. Built and designed at (company logo): consulting and software engineering
78. Consulting and software engineering: Elixir coaching, Elixir design review, custom development
79. @elixirlang / elixir-lang.org