3. Steps to create Scala Object in Spark
Apache Spark 3
Create directory
mkdir sparkApp
Change directory
cd sparkApp
Create a file called simple.sbt
vi simple.sbt
(Paste the lines below into simple.sbt)
name := "Simple Project"
version := "1.0"
scalaVersion := "2.11.0"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
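The %% operator in the dependency line asks sbt to append the Scala binary version to the artifact name. As a sketch of what that resolves to with scalaVersion 2.11.x, the same dependency could be written with a single % and an explicit suffix. This is an equivalent alternative spelling, not an extra line to add:

```scala
// simple.sbt (alternative spelling of the same dependency, shown for illustration only)
// With scalaVersion 2.11.x, "%%" resolves to the artifact "spark-core_2.11".
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.6.0"
```

Keeping %% is usually more convenient, because the suffix then follows scalaVersion automatically.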
4. Steps to create Scala Object in Spark (Cont.)
Create the source directories (inside sparkApp)
mkdir src
cd src
mkdir main
cd main
mkdir scala
cd scala
5. Steps to create Scala Object in Spark (Cont.)
vi SimpleApp.scala
Paste the lines below
/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    // Replace YOUR_SPARK_HOME with the path to your Spark installation
    val logFile = "YOUR_SPARK_HOME/README.md"
    val conf = new SparkConf().setAppName("Simple Application")
    val sc = new SparkContext(conf)
    // Read the file with at least 2 partitions and cache it, since it is scanned twice
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop() // release the resources held by the SparkContext
  }
}
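To see what the two filter/count calls compute without starting Spark, the same logic can be tried on a plain Scala collection. This is only a minimal sketch: the object name LineCountSketch, the helper countContaining, and the sample lines are made up for illustration and are not part of SimpleApp.

```scala
// Plain-Scala sketch of the counting logic in SimpleApp (no Spark required).
// The object name, helper name, and sample lines are illustrative assumptions.
object LineCountSketch {
  // Counts the lines that contain the given substring, mirroring
  // logData.filter(line => line.contains(...)).count() on an RDD.
  def countContaining(lines: Seq[String], needle: String): Long =
    lines.count(line => line.contains(needle)).toLong

  def main(args: Array[String]): Unit = {
    val lines = Seq("apache spark", "big data", "scala")
    val numAs = countContaining(lines, "a")
    val numBs = countContaining(lines, "b")
    // prints: Lines with a: 3, Lines with b: 1
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
```

In the Spark version the lines come from sc.textFile and the counting is distributed across partitions, but the result has the same meaning.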
6. Build jar using SBT
Return to the sparkApp directory (the one containing simple.sbt) and run the command below to build the jar
sbt package
Once the jar is built, run the command below to submit the application
spark-submit --class "SimpleApp" --master local[4] target/scala-2.11/simple-project_2.11-1.0.jar
7. Assignments
1) Create an application for word count
2) Create an application to join two RDDs
3) Create an application to print the values from a key-value pair RDD
4) Create an application to get only the values from a key-value pair RDD
5) Create an application to print the min, max, and median of the values in a key-value pair RDD
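As a possible starting point for assignment 1, the sketch below counts words with flatMap, map, and reduceByKey. It is one way to approach the task, not a reference solution: the object name WordCountSketch, the helper countWords, and the sample sentences are hypothetical, and the input is an in-memory list rather than a file. It assumes the same spark-core dependency as simple.sbt.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical starting point for assignment 1; the object name, helper name,
// and sample input are illustrative assumptions.
object WordCountSketch {
  // Splits each line on whitespace and counts the occurrences of each word.
  def countWords(sc: SparkContext, lines: Seq[String]): Map[String, Int] =
    sc.parallelize(lines)
      .flatMap(line => line.split("\\s+"))
      .filter(word => word.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    // The master is left to spark-submit (--master), as in SimpleApp.
    val conf = new SparkConf().setAppName("Word Count Sketch")
    val sc = new SparkContext(conf)
    try {
      val counts = countWords(sc, Seq("to be or not to be", "to do"))
      counts.foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      sc.stop() // always release the SparkContext
    }
  }
}
```

To count words in a file instead, sc.parallelize(lines) could be swapped for sc.textFile with a path of your choice. Collecting to a Map is only reasonable for small results.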