Slides from my talk at Python Frederick in July 2017 on PySpark, big data, and Elastic MapReduce. The slides are brief and only give a quick overview. The meat of the presentation is in the GitHub repo linked on the last slide.
4. Big Data is when you apply large distributed computing platforms to a massive dataset.
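As a rough illustration of that idea, here is a minimal single-machine sketch of the split, map, and reduce pattern that distributed platforms apply across many nodes. It is not code from the talk or its repo. The function names (`partition`, `map_partition`, `merge_counts`, `word_count`) and the sample lines are hypothetical, and the partitions are processed one after another here, whereas a real cluster would send each one to a different worker.

```python
from collections import Counter
from functools import reduce


def partition(records, n_parts):
    """Split records into n_parts chunks (hypothetical stand-ins for cluster nodes)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_partition(lines):
    """Map step: count words in one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def word_count(records, n_parts=3):
    # On a real cluster each partition would be mapped on a separate worker;
    # here they run sequentially to keep the sketch self-contained.
    partials = [map_partition(part) for part in partition(records, n_parts)]
    return reduce(merge_counts, partials, Counter())


# Made-up sample data, for illustration only.
lines = [
    "big data on spark",
    "spark on emr",
    "big clusters",
]
totals = word_count(lines)
print(totals.most_common(3))
```

Because each partition is counted independently and the partial results are merged afterward, the final totals do not depend on how many partitions are used. That property is what lets the same job scale from one machine to many.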
5. EMR
Amazon Elastic MapReduce (EMR) is an Amazon Web Services (AWS) tool for big data processing and analysis. Amazon EMR offers an expandable, low-configuration service as an easier alternative to running an in-house computing cluster.
6. Apache Spark
Apache® Spark™ is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics.