3. Beam Programming Model
• Powerful framework for ETL and data transformation pipelines
• Batch and stream processing with the same code
• Java and Python SDKs
• Runs on multiple engines (Apache Spark, Apache Flink, Apache Hadoop MapReduce, etc.)
• Most importantly, runs on Google Cloud Dataflow: a fully managed service, i.e. no DevOps involved
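To make the "same code for batch and stream" point concrete, here is a minimal sketch using the Beam Python SDK. It assumes `apache-beam` is installed; the `visitor_id,page` log format and the names `parse_event`, `build_counts` and `run` are hypothetical and not taken from the slides.

```python
# Sketch only: requires `pip install apache-beam`. The event format and the
# helper names (parse_event, build_counts, run) are illustrative assumptions.

def parse_event(line):
    """Split a 'visitor_id,page' log line into a (page, 1) pair."""
    visitor_id, page = line.strip().split(",", 1)
    return page, 1


def build_counts(lines):
    """Attach the counting transforms to a PCollection of text lines.

    The same function can be applied to a bounded (batch) or an unbounded
    (streaming) source; an unbounded source would also need windowing first.
    """
    import apache_beam as beam  # lazy import: parse_event stays usable without Beam

    return (
        lines
        | "Parse" >> beam.Map(parse_event)
        | "CountPerPage" >> beam.CombinePerKey(sum)
    )


def run():
    """Run the pipeline locally on a small in-memory (bounded) sample."""
    import apache_beam as beam

    sample = ["v1,/home", "v2,/home", "v1,/cart"]
    with beam.Pipeline() as pipeline:  # no options given: local DirectRunner
        lines = pipeline | "Read" >> beam.Create(sample)
        build_counts(lines) | "Print" >> beam.Map(print)
```

Calling `run()` executes the pipeline on the local runner. Pointing the same pipeline at another engine or at Dataflow is a matter of pipeline options rather than changes to `build_counts`.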
Material in this presentation is taken from either https://beam.apache.org or https://cloud.google.com/dataflow/
3/17/18 MeasureCamp #12 3
11. Cool but why?
• Adobe’s Virtual Report Suites support custom cut-offs only in Workspace → NOT practical
• What about data outside Adobe Analytics?
  • Server / network logs
  • Kiosk / POS logs
• Powerful ETL framework - Alternative to Spark
• There is no other system in existence which provides this degree of flexibility and power, period
…according to Google*.
• No integrated machine learning library like Spark’s MLlib; however, you can use TensorFlow / Google Cloud ML or write separate ML applications in Spark
*: https://cloud.google.com/dataflow/blog/dataflow-beam-and-spark-comparison