A Machine Learning Server in Scala.
Building and Deploying ML Applications in production in a fraction of the time.
Slides from SF Scala meetup January 2015 at StumbleUpon.
4. A Classic Recommender Example…
You have a mobile app.
You need a Recommendation Engine: predict products that a customer will like, and show them.
[Diagram: App, Predict products, Predictive model]
• Algorithm: you don't need to write your own. Spark MLlib provides the ALS algorithm.
• Predictive model: based on users' previous behaviors.
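5. A Classic Recommender Example: prototyping…
import org.apache.spark.mllib.recommendation.{ALS, Rating}

def pseudocode(): Unit = {
  // Read training data (sc is an existing SparkContext)
  val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { …. })
  // Build a predictive model with an algorithm (rank = 10, iterations = 20, lambda = 0.01)
  val model = ALS.train(trainingData, 10, 20, 0.01)
  // Make predictions: the top 5 products for each user
  allUsers.foreach { user =>
    model.recommendProducts(user, 5)
  }
}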
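The elided `match { …. }` in the prototype turns each text line into a rating. A minimal sketch of that parsing step in plain Scala (no Spark), assuming each line holds a `user,item,rating` triple. The `Rating` case class here is a local stand-in that mirrors MLlib's `Rating(user, product, rating)`, and `parseLine` is a hypothetical helper, not part of any library:

```scala
object ParseRatings {
  // Local stand-in mirroring MLlib's Rating(user: Int, product: Int, rating: Double).
  final case class Rating(user: Int, product: Int, rating: Double)

  // Hypothetical helper: parse one "user,item,rating" line; None if the line is malformed.
  def parseLine(line: String): Option[Rating] =
    line.split(',').map(_.trim) match {
      case Array(user, item, rate) =>
        try Some(Rating(user.toInt, item.toInt, rate.toDouble))
        catch { case _: NumberFormatException => None }
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq("1,22,5.0", "1,62,4.0", "bad line")
    // Malformed lines are skipped instead of failing the whole job.
    lines.flatMap(line => parseLine(line).toList).foreach(println)
  }
}
```

Returning `Option` instead of throwing is a choice of this sketch: one bad line in the training file then costs one rating rather than the whole run.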
6. Beyond Prototyping
• How do you deploy a scalable service that responds to dynamic prediction queries?
• How do you persist the predictive model in a distributed environment?
• How do you make HBase, Spark and the algorithms talk to each other?
• How should you prepare, or transform, the data for model training?
• How do you update the model with new data without downtime?
• Where should you add business logic?
• How do you make the code configurable, reusable and maintainable?
• How do you build all of this with separation of concerns (SoC)?
7. A Classic Recommender Example: in production…
[Diagram]
Mobile App → Event Server (data storage). Data: user actions
Event Server (data storage) → Engine
Mobile App → Engine. Query via REST: user ID
Engine → Mobile App. Predicted result: a list of product IDs
8. PredictionIO
• PredictionIO is a machine learning server for building and deploying predictive engines in production in a fraction of the time.
• Built on Apache Spark, MLlib and HBase.
9. Event Server
[Architecture diagram repeated: Mobile App, Event Server (data storage), Engine]
10. Event Server: Collecting Data
• $ pio eventserver
• Event-based
client.create_event(
    event="rate",
    entity_type="user",
    entity_id="user_123",
    target_entity_type="item",
    target_entity_id="item_100",
    properties={ "rating" : 5.0 }
)
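To make the shape of an event concrete, here is a minimal Scala sketch of the same "rate" event as a plain data type, rendered to JSON by hand. The `Event` case class, the `toJson` helper and the camelCase JSON field names are assumptions of this sketch, chosen to mirror the arguments of the Python call; they are not taken from the PredictionIO SDK.

```scala
object EventSketch {
  // Illustrative model of one event; fields mirror the Python call's arguments.
  final case class Event(
    event: String,
    entityType: String,
    entityId: String,
    targetEntityType: Option[String] = None,
    targetEntityId: Option[String] = None,
    properties: Map[String, Double] = Map.empty
  )

  // Quote a string for JSON, escaping backslashes and double quotes only.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Hand-rolled JSON rendering, to keep the sketch dependency-free.
  def toJson(e: Event): String = {
    val required = Seq(
      "event" -> quote(e.event),
      "entityType" -> quote(e.entityType),
      "entityId" -> quote(e.entityId)
    )
    // Target fields are emitted only when present.
    val optional = Seq(
      e.targetEntityType.map(v => "targetEntityType" -> quote(v)),
      e.targetEntityId.map(v => "targetEntityId" -> quote(v))
    ).flatten
    val props = e.properties
      .map { case (k, v) => quote(k) + ":" + v }
      .mkString("{", ",", "}")
    val fields = (required ++ optional) :+ ("properties" -> props)
    fields.map { case (k, v) => quote(k) + ":" + v }.mkString("{", ",", "}")
  }

  // The event from the slide: user_123 rates item_100 with 5.0.
  val rateEvent: Event =
    Event("rate", "user", "user_123", Some("item"), Some("item_100"), Map("rating" -> 5.0))
}
```

The point of the event-based design is that every user action has the same envelope (who, did what, to what, with which properties), so new event types need no schema change.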
11. Engine
[Architecture diagram repeated: Mobile App, Event Server (data storage), Engine]
12. Engine: Building an Engine with Separation of Concerns (SoC)
• DASE - the “MVC” for Machine Learning
• Data: Data Source and Data Preparator
• Algorithm(s)
• Serving
• Evaluator
13. Engine: Functions of an Engine
A. Train deployable predictive model(s)
B. Respond to dynamic queries
C. Evaluation
14. Engine: A. Train predictive model(s)
class DataSource(…) extends PDataSource
  def readTraining(sc: SparkContext)
  ==> trainingData
class Preparator(…) extends PPreparator
  def prepare(sc: SparkContext, trainingData: TrainingData)
  ==> preparedData
class Algorithm1(…) extends PAlgorithm
  def train(preparedData: PreparedData)
  ==> Model
$ pio train
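The read, prepare, train chain can be sketched without Spark or PredictionIO. Everything below is an assumption made for illustration: the traits are simplified stand-ins for `PDataSource`, `PPreparator` and `PAlgorithm` (no `SparkContext`, plain `Seq` instead of RDDs), and the algorithm is a per-item mean rating rather than ALS.

```scala
object TrainPipeline {
  final case class Rating(user: String, item: String, rating: Double)
  final case class TrainingData(ratings: Seq[Rating])
  final case class PreparedData(ratings: Seq[Rating])
  final case class Model(meanRatingByItem: Map[String, Double])

  // Simplified stand-ins for the three training-side components.
  trait DataSource { def readTraining(): TrainingData }
  trait Preparator { def prepare(td: TrainingData): PreparedData }
  trait Algorithm { def train(pd: PreparedData): Model }

  // Stand-in data source: a fixed in-memory list instead of the Event Server.
  object InMemorySource extends DataSource {
    def readTraining(): TrainingData = TrainingData(Seq(
      Rating("u1", "i1", 5.0), Rating("u2", "i1", 3.0),
      Rating("u1", "i2", 4.0), Rating("u3", "i2", 9.0)))
  }

  // Stand-in preparator: drop ratings outside the 1 to 5 range.
  object RangeFilter extends Preparator {
    def prepare(td: TrainingData): PreparedData =
      PreparedData(td.ratings.filter(r => r.rating >= 1.0 && r.rating <= 5.0))
  }

  // Stand-in algorithm (NOT ALS): the mean rating of each item.
  object MeanRating extends Algorithm {
    def train(pd: PreparedData): Model =
      Model(pd.ratings.groupBy(_.item).map { case (item, rs) =>
        item -> (rs.map(_.rating).sum / rs.size)
      })
  }

  // Read once, prepare once, then train one model per algorithm.
  def trainAll(ds: DataSource, prep: Preparator, algos: Seq[Algorithm]): Seq[Model] = {
    val prepared = prep.prepare(ds.readTraining())
    algos.map(_.train(prepared))
  }
}
```

Because each stage only sees the previous stage's output type, a different data source or an extra algorithm can be swapped in without touching the others, which is the separation of concerns the deck is after.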
15. Engine: A. Train predictive model(s)
class DataSource(…) extends PDataSource {
  override def readTraining(sc: SparkContext): TrainingData = {
    // Read the events collected by the Event Server
    val eventsDb = Storage.getPEvents()
    val eventsRDD: RDD[Event] = eventsDb.find(….)(sc)
    // Convert each event into a rating
    val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
      val rating = try {
        val ratingValue: Double = event.event match {….}
        Rating(event.entityId, event.targetEntityId.get, ratingValue)
      } catch {…}
      rating
    }
    new TrainingData(ratingsRDD)
  }
}
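The event-to-rating conversion is the part of the data source that carries domain decisions, so a small standalone sketch may help. The mapping rules here are assumptions for illustration ("rate" events carry an explicit `rating` property, "buy" events count as an implicit 4.0), and the `Event` and `Rating` types are local stand-ins. The sketch also returns `Either` where the slide uses `try`/`catch`, which is a design choice of the sketch, not the slide's code.

```scala
object EventToRating {
  // Local stand-ins for the event and rating types.
  final case class Event(
    event: String,
    entityId: String,
    targetEntityId: Option[String],
    properties: Map[String, Double]
  )
  final case class Rating(user: String, item: String, rating: Double)

  // Assumed mapping: "rate" has an explicit rating, "buy" is an implicit 4.0,
  // anything else is rejected with a message describing why.
  def toRating(e: Event): Either[String, Rating] = {
    val value: Either[String, Double] = e.event match {
      case "rate" => e.properties.get("rating").toRight("rate event without a rating property")
      case "buy"  => Right(4.0)
      case other  => Left("unexpected event: " + other)
    }
    for {
      v    <- value
      item <- e.targetEntityId.toRight("event has no target entity")
    } yield Rating(e.entityId, item, v)
  }
}
```

A caller can then keep the `Right` values for training and log the `Left` messages, so one malformed event does not abort the whole read.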
16. Engine: A. Train predictive model(s)
class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm {
  def train(preparedData: PreparedData): Model1 = {
    val mllibRatings = preparedData….
    // Train with MLlib ALS using the engine's algorithm parameters
    val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda)
    new Model1(
      rank = m.rank,
      userFeatures = m.userFeatures,
      productFeatures = m.productFeatures
    )
  }
}
17. Engine: A. Train predictive model(s)
[Diagram]
Event Server → Data Source → TrainingData → Data Preparator → PreparedData
PreparedData → Algorithm 1 → Model 1
PreparedData → Algorithm 2 → Model 2
PreparedData → Algorithm 3 → Model 3
18. Engine: B. Respond to dynamic query
• Query (Input):
$ curl -H "Content-Type: application/json" \
    -d '{ "user": "1", "num": 4 }' \
    http://localhost:8000/queries.json
case class Query(
  user: String,
  num: Int
) extends Serializable
19. Engine: B. Respond to dynamic query
• Predicted Result (Output):
{"itemScores":[{"item":"22","score":4.072304374729956},
{"item":"62","score":4.058482414005789},
{"item":"75","score":4.046063009943821}]}
case class PredictedResult(
  itemScores: Array[ItemScore]
) extends Serializable
case class ItemScore(
  item: String,
  score: Double
) extends Serializable
20. Engine: B. Respond to dynamic query
Query via REST
class Algorithm1(…) extends PAlgorithm
  def predict(model: ALSModel, query: Query)
  ==> predictedResult
class Serving extends LServing
  def serve(query: Query, predictedResults: Seq[PredictedResult])
  ==> predictedResult
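Serving receives one predicted result per algorithm and must return a single one. Two possible policies are sketched below in plain Scala. Both are assumptions for illustration, not PredictionIO's built-in behavior, and the types are local stand-ins that use `Seq` instead of `Array` so results can be compared by value.

```scala
object ServingSketch {
  final case class Query(user: String, num: Int)
  final case class ItemScore(item: String, score: Double)
  final case class PredictedResult(itemScores: Seq[ItemScore])

  // Simplest policy: pass through the first algorithm's result unchanged.
  def serveFirst(query: Query, results: Seq[PredictedResult]): PredictedResult =
    results.head

  // Alternative policy: average each item's score across the algorithms that
  // returned it, then keep the `query.num` highest (ties broken by item id).
  def serveAverage(query: Query, results: Seq[PredictedResult]): PredictedResult = {
    val averaged = results
      .flatMap(_.itemScores)
      .groupBy(_.item)
      .map { case (item, scores) => ItemScore(item, scores.map(_.score).sum / scores.size) }
      .toSeq
      .sortWith { (a, b) =>
        a.score > b.score || (a.score == b.score && a.item.compareTo(b.item) < 0)
      }
      .take(query.num)
    PredictedResult(averaged)
  }
}
```

Serving is also a natural home for business logic such as filtering out items that are out of stock, since it runs after every algorithm and before the response goes out.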
21. Engine: B. Respond to dynamic query
class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm {
  def predict(model: ALSModel, query: Query): PredictedResult = {
    // Recommend for the resolved user; getOrElse handles the fallback case
    model….{ userInt =>
      val itemScores = model.recommendProducts(…).map(….)
      new PredictedResult(itemScores)
    }.getOrElse{….}
  }
}
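A matrix factorization model such as the one ALS produces scores a (user, item) pair as the dot product of the two feature vectors. The sketch below shows that scoring and the top-N selection in plain Scala, following the same `map { … }.getOrElse { … }` shape as the slide. The `Model` type with in-memory `Map`s is a stand-in assumed for illustration; the real model keeps its features in RDDs.

```scala
object PredictSketch {
  final case class ItemScore(item: String, score: Double)

  // In-memory stand-in for a factorization model: one feature vector per user and per item.
  final case class Model(
    userFeatures: Map[String, Array[Double]],
    productFeatures: Map[String, Array[Double]]
  )

  // Dot product of two feature vectors of equal length.
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Score every item for the user and keep the `num` highest.
  // A user the model has never seen gets an empty list.
  def predict(model: Model, user: String, num: Int): Seq[ItemScore] =
    model.userFeatures.get(user).map { uf =>
      model.productFeatures.toSeq
        .map { case (item, pf) => ItemScore(item, dot(uf, pf)) }
        .sortWith(_.score > _.score)
        .take(num)
    }.getOrElse(Seq.empty)
}
```

Returning an empty list for unknown users is one possible fallback; a production engine might instead return popular items.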
22. Engine: B. Respond to dynamic query
[Diagram]
Mobile App → Query (input) → Algorithm 1 (Model 1), Algorithm 2 (Model 2), Algorithm 3 (Model 3)
Algorithms → Predicted Results → Serving
Serving → Predicted Result (output) → Mobile App
24. Running in Production
• Install PredictionIO
$ bash -c "$(curl -s http://install.prediction.io/install.sh)"
• Start the Event Server
$ pio eventserver
• Deploy an Engine
$ pio build; pio train; pio deploy
• Update Engine Model with New Data
$ pio train; pio deploy
25. Deploy in Production
[Diagram]
Clients: Website, Mobile App, Email Campaign
Clients → Event Server (data storage): Data
Clients → Engine 1, Engine 2, Engine 3, Engine 4: Query via REST
Engines → Clients: Predicted Result
26. The Next Step
• Quickstart with an Engine Template!
• Follow on GitHub: github.com/predictionio/
• Learn PredictionIO: prediction.io/
• Learn Scala! Scala for the Impatient
• Contribute!