This document provides an overview of Apache Spark. Key points:
- Spark is a cluster computing framework for processing large datasets in parallel, built on a programming model of Resilient Distributed Datasets (RDDs).
- Spark offers APIs in Scala, Java, and Python, with related projects for SQL (Spark SQL), streaming (Spark Streaming), machine learning (MLlib), and graph processing (GraphX).
- Spark 1.0 was recently released, focusing on API stability, performance improvements, and new features such as Java 8 lambda support and Spark SQL.