5. www.edureka.co/apache-spark-scala-training
How Does Spark Fit into the Hadoop Ecosystem?
Spark is intended to enhance, not replace, the Hadoop stack.
Spark can read and write data in HDFS as well as in other storage systems such as Amazon S3 and NoSQL databases, and in file formats such as CSV.
Word Count Problem in Spark
[Figure: Spark Scala code for the Word Count problem]
[Figure: Spark Python code for the Word Count problem]
Processing data with Spark is much simpler than with MapReduce, and Spark gives you the flexibility to choose your preferred language: Scala, Java, Python, etc.
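The Scala and Python code from the original slides did not survive extraction. As a rough stand-in, the sketch below uses plain Python with no Spark installation to imitate the three stages a Spark word count typically chains together. flatMap splits lines into words, map pairs each word with 1, and reduceByKey sums the counts per word. The helper names (flat_map, to_pairs, reduce_by_key, word_count) are hypothetical and only mimic the behaviour of the corresponding RDD methods on a single machine.

```python
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def flat_map(lines: Iterable[str]) -> List[str]:
    """Mimics RDD.flatMap: split every line into words and flatten the result."""
    return [word for line in lines for word in line.split()]


def to_pairs(words: Iterable[str]) -> List[Tuple[str, int]]:
    """Mimics RDD.map: emit a (word, 1) pair for each word."""
    return [(word, 1) for word in words]


def reduce_by_key(pairs: Iterable[Tuple[K, V]], func: Callable[[V, V], V]) -> Dict[K, V]:
    """Mimics RDD.reduceByKey: merge all values sharing a key using func."""
    result: Dict[K, V] = {}
    for key, value in pairs:
        result[key] = func(result[key], value) if key in result else value
    return result


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three stages: split, pair with 1, sum per word."""
    return reduce_by_key(to_pairs(flat_map(lines)), lambda a, b: a + b)


if __name__ == "__main__":
    text = ["to be or not to be", "to do is to be"]
    for word, n in sorted(word_count(text).items()):
        print(word, n)
```

In PySpark the same chain is usually written as sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b), where sc is a SparkContext and path is a placeholder input location. Spark then runs each stage in parallel across the cluster's partitions.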
Why Spark for Big Data Analytics?
The following features make Spark the best fit for Big Data analytics:
Spark simplifies data analysis
Spark provides built-in libraries for advanced analytics
Spark speaks more than one language
Spark returns results faster than disk-based MapReduce
Spark runs on distributions from different Hadoop vendors
Spark: Best of Both Worlds
It's a common misconception that Spark is only for in-memory processing. From its inception, Spark was designed as a general execution engine that works both in memory and on disk. Almost all Spark operators fall back to external (disk-based) operations when the data does not fit in memory.
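To show what an "external" operation means, here is a minimal plain-Python sketch of an external merge sort. This is the classic way to sort more data than fits in memory. Fixed-size chunks are sorted in memory, each sorted run is spilled to a temporary file, and the runs are then stream-merged back from disk. This is an illustration of the general technique, not Spark's implementation. The function names and the chunk_size parameter, a stand-in for a memory budget, are hypothetical.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def _spill(run: List[int]) -> str:
    """Write one sorted run to a temporary file, one integer per line."""
    fd, path = tempfile.mkstemp(suffix=".run")
    with os.fdopen(fd, "w") as f:
        for value in run:
            f.write(f"{value}\n")
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a spilled run back from disk without loading it all at once."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_sort(values: Iterable[int], chunk_size: int = 1000) -> Iterator[int]:
    """Yield values in sorted order, buffering at most chunk_size of them in memory."""
    paths: List[str] = []
    buffer: List[int] = []
    try:
        for value in values:
            buffer.append(value)
            if len(buffer) >= chunk_size:
                buffer.sort()
                paths.append(_spill(buffer))  # spill a full, sorted run to disk
                buffer = []
        if buffer:
            buffer.sort()
            paths.append(_spill(buffer))
        # heapq.merge lazily k-way merges the runs, holding one value per run.
        yield from heapq.merge(*(_read_run(p) for p in paths))
    finally:
        for p in paths:
            os.remove(p)  # clean up the temporary run files
```

For example, list(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3)) sorts the input using three runs on disk. Spark's shuffle and aggregation machinery follows a similar spill-to-disk-and-merge idea, with its own memory management and file formats.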
Spark Libraries
Spark SQL: Spark’s module for working with structured data
MLlib: Spark’s machine learning library
GraphX: Spark’s API for graph computation
Spark Streaming: Spark’s API for processing streaming data
Spark is here to stay
Spark is not a "here today, gone tomorrow" technology. It is here to stay for the foreseeable future, and it is well worth getting your teeth into it to get value out of your data.
References
IBM backs Apache Spark for Big Data Analytics:
http://www.forbes.com/sites/paulmiller/2015/06/15/ibm-backs-apache-spark-for-big-data-analytics/
Why Cloudera is saying 'Goodbye, MapReduce' and 'Hello, Spark':
http://fortune.com/2015/09/09/cloudera-spark-mapreduce/
5 reasons to turn to Spark for Big Data Analytics:
http://www.infoworld.com/article/2897287/big-data/5-reasons-to-turn-to-spark-for-big-data-analytics.html
Spark officially sets a new record in large-scale sorting:
https://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
How eBay uses Spark to ignite Data Analytics:
http://www.ebaytechblog.com/2014/05/28/using-spark-to-ignite-data-analytics/
Spark is fast on disk too:
https://gigaom.com/2014/10/10/databricks-demolishes-big-data-benchmark-to-prove-spark-is-fast-on-disk-too/