SlideShare una empresa de Scribd logo
1 de 43
Descargar para leer sin conexión
Dvir Volk - Redis Labs
Real-time Machine Learning with
Redis-ML and Apache Spark
Agenda
● Intro to Redis and Redis Labs - 5 min
● Using Redis-ML for Model Serving - why and
how - 10 min
● Building a recommendation system using
Spark-ML and Redis-ML - 10 min
● Q&A
2
Redis Labs – Home of Redis
Founded in 2011
HQ in Mountain View CA, R&D center in Tel-Aviv IL
The commercial company behind Open Source Redis
Provider of the Redis Enterprise (Redise
) technology,
platform and products
3
Redise
Cloud Private
Redis Labs Products
Redise
Cloud Redise
Pack ManagedRedise
Pack
SERVICES SOFTWARE
Fully managed Redise
service
in VPCs within AWS, MS
Azure, GCP & IBM Softlayer
Fully managed Redise
service
on hosted servers within
AWS, MS Azure, GCP, IBM
Softlayer, Heroku, CF &
OpenShift
Downloadable Redise
software for any enterprise
datacenter or cloud
environment
Fully managed Redise
Pack in
private data centers
&& &
A Brief Overview of Redis
● Started in 2009 by Salvatore Sanfilippo
● Most popular KV store
● In memory - disk backed
● Notable Users:
○ Twitter, Netflix, Uber, Groupon, Twitch
○ Many, many more...
Redis Main Differentiators
Simplicity
(through Data Structures)
Extensibility
(through Redis Modules)
Performance
ListsSorted Sets
Hashes Hyperlog-logs
Geospatial
Indexes
Bitmaps
SetsStrings
Bit field
6
A Quick Recap of Redis
Key
"I'm a Plain Text String!"
{ A: “foo”, B: “bar”, C: “baz” }
Strings / Bitmaps / BitFields
Hash Tables (objects!)
Linked Lists
Sets
Sorted Sets
Geo Sets
HyperLogLog
{ A , B , C , D , E }
[ A → B → C → D → E ]
{ A: 0.1, B: 0.3, C: 100, D: 1337 }
{ A: (51.5, 0.12), B: (32.1, 34.7) }
00110101 11001110 10101010
Simple Redis Example (string)
127.0.0.1:6379> SET spark summit
OK
127.0.0.1:6379> GET spark
"summit"
127.0.0.1:6379>
Simple Redis Example (hash)
127.0.0.1:6379> HMSET spark_hash org apache version 2.1.1
OK
127.0.0.1:6379> HGET spark_hash version
"2.1.1"
127.0.0.1:6379> HGETALL spark_hash
1) "org"
2) "apache"
3) "version"
4) "2.1.1"
Another Redis Example (ZSET)
127.0.0.1:6379> ZADD my_sorted_set 1 foo
(integer) 1
127.0.0.1:6379> ZADD my_sorted_set 5 bar
(integer) 1
127.0.0.1:6379> ZADD my_sorted_set 3 baz
(integer) 1
127.0.0.1:6379> ZRANGE my_sorted_set 0 2
1) "foo"
2) "baz"
3) "bar"
New Capabilities
Redis Modules: A New Hope
• Dynamic libraries loaded to redis
• Written in C/C++
• Use a C ABI/API isolating redis internals
• Use existing or add new data-structures
• Near Zero latency access to data
New Commands
New Data Types
Modules: A New Approach
Adapt your database to your data, not the other way around
Secure way to store data in
Redis via encrypt/decrypt with
various Themis primitives
Time Series Graph
Time series values aggregation
in Redis
Crypto Engine Wrapper
Graph database on Redis based
on Cypher language
Based on Generic Cell Rate
Algorithm (GCRA)
Rate Limiter
ReJSON
Secondary Index/RQL
JSON Engine on Redis.
Pre-released
Indexing + SQL -like syntax for
querying indexes.
Pre-released
Neural Redis Redis-ML RediSearch
Full Text Search Engine in RedisMachine Learning Model
Serving
Simple Neural Network Native
to Redis
Redis ML
Machine Learning Model Server
ML Models Serving Challenges
– Models are becoming bigger and more complex
– Can be challenging to deploy & serve
– Do not scale well, speed and size
– Can be very expensive
14
Spark-ML End-to-End Flow
Spark Training
Custom Server
Model saved to
Parquet file
Data Loaded
to Spark
Pre-computed
results
Batch Evaluation
?
ClientApp
A Simpler Machine Learning
Lifecycle
Any Training
Platform
Data loaded into Spark Model is saved in
Redis-ML
Redis-ML
Serving Client
Client
App
Client
App
Client
App
+
Spark Training
16
Redis-ML – ML Serving Engine
• Store training output as “hot model”
• Perform evaluation directly in Redis
• Enjoy the performance, scalability and HA of Redis
17
ML Models
Tree Ensembles
Linear Regression
Logistic Regression
Matrix + Vector Operations
More to come...
Redis-ML
18
Random Forest Model
• A collection of decision trees
• Supports classification &
regression
• Splitter Node can be:
◦ Categorical (e.g. day == “Sunday”)
◦ Numerical (e.g. age < 43)
• Decision is taken by the majority of
decision trees
19
Titanic Survival Predictor on a
Decision Tree
YES
Sex =
Male ?
Age <
9.5?
*Sibps >
2.5?
Survived
Died
SurvivedDied
*Sibps = siblings + spouses
NO
20
Titanic Survival Predictor on a Random Forest
YES
Sex =
Male ?
Age <
9.5?
*Sibps >
2.5?
Survived
Died
SurvivedDied
NO YES
Country=
US?
State =
CA?
Height>
1.60m?
Survived
Died
SurvivedDied
NO YES
Weight<
80kg?
I.Q<100?
Eye color
=blue?
Survived
Died
SurvivedDied
NO
Tree #1 Tree #2 Tree #3
Would John Survive The Titanic
• John’s features:
{male, 34, married + 2, US, CA, 1.78m, 78kg,
110iq, blue eyes}
• Tree#1 – Survived
• Tree#2 – Failed
• Tree#3 – Survived
• Random forest decision - Survived
22
Forest Data Type Example
> MODULE LOAD "./redis-ml.so"
OK
> ML.FOREST.ADD myforest 0 . CATEGORIC sex “male” .L LEAF 1 .R LEAF 0
OK
> ML.FOREST.RUN myforest sex:male
"1"
> ML.FOREST.RUN myforest sex:no_thanx
"0"
Using Redis-ML With Spark
scala> import com.redislabs.client.redisml.MLClient
scala> import com.redislabs.provider.redis.ml.Forest
scala> val jedis = new Jedis("localhost")
scala> val rfModel = pipelineModel.stages.last.asInstanceOf[RandomForest]
// Create a new forest instance
scala> val f = new Forest(rfModel.trees)
// Load the model to redis
scala> f.loadToRedis("forest-test", "localhost")
// Classify a feature vector
scala> jedis.getClient.sendCommand(MLClient.ModuleCommand.FOREST_RUN,
"forest-test", makeInputString (0))
scala> jedis.getClient.getStatusCodeReply
res53: String = 1
Real World Challenge
• Ad serving company
• Need to serve 20,000 ads/sec @ 50msec
data-center latency
• Runs 1k campaigns → 1K random forest
• Each forest has 15K trees
• On average each tree has 7 levels (depth)
• Would require < 1000 x c4.8xlarge
25
Redis ML with Spark ML
Classification Time Over Spark
40x Faster
26
Real World Example:
Movie Recommendation System
Spark Training
Overview
Redis-ML
+
28
Concept: One Forest For Each Movie
29
User Features:
(Age, Gender, Movie Ratings)
Movie_1 Forest
3
Movie_2 Forest 2
Movie_n Forest
5
.
.
.
The Tools
Transform:
30
Train:
Classify: +
Containers:
Using the Dockers
$ docker pull shaynativ/redis-ml
$ docker run --net=host shaynativ/redis-ml &
$
$ docker pull shaynativ/spark-redis-ml
$ docker run --net=host shaynativ/spark-redis-ml
Step 1: Get The Data
32
• Download and extract the MovieLens 100K Dataset
• The data is organized in separate files:
– Ratings: user id | item id | rating (1-5) | timestamp
– Item (movie) info: movie id | genre info fields (1/0)
– User info: user id | age | gender | occupation
• Our classifier should return the expected rating (from 1 to 5) a user would
give the movie in question
Step 2: Transform
33
• The training data for each movie should contain 1 line per user:
– class (rating from 1 to 5 the user gave to this movie)
– user info (age, gender, occupation)
– user ratings of other movies (movie_id:rating ...)
– user genre rating averages (genre:avg_score ...)
• Run gen_data.py to transform the files to the desired format
Step3: Train and Load to Redis
// Create a new forest instance
val rf = new
RandomForestClassifier().setFeatureSubsetStrategy("auto").setLabelCol("indexedLabel").setFeat
uresCol("indexedFeatures").setNumTrees(500)
…..
// Train model
val model = pipeline.fit(trainingData)
…..
val rfModel = model.stages(2).asInstanceOf[RandomForestClassificationModel]
// Load the model to redis
val f = new Forest(rfModel.trees)
f.loadToRedis(”movie-10", "127.0.0.1")
34
Step 3: Execute in Redis
Redis-ML
+
Spark
Training
Client App
Python Client Example
>> import redis
>> config = {"host":"localhost", "port":6379}
>> r = redis.StrictRedis(**config)
>> user_profile = r.get("user_shay_profile")
>> print(user_profile)
12:1.0,13:1.0,14:3.0,15:1.0,17:1.0,18:1.0,19:1.0,20:1.0,23:1.0,24:5.0,1.0,115:1.0,116:2.
0,117:2.0,119:1.0,120:4.0,121:2.0,122:2.0,
........
1360:1.0,1361:1.0,1362:1.0,
1701:6.0,1799:435.0,1801:0.2,1802:0.11,1803:0.04,1812:0.04,1813:0.07,1814:0.24,1815:0.09
,1816:0.32,1817:0.06
>> r.execute_command("ML.FOREST.RUN", "movie-10", user_profile)
'3'
Redis CLI Example
>keys *
127.0.0.1:6379> KEYS *
1) "movie-5"
2) "movie-1"
........
10) "movie-10"
11) "user_1_profile”
>ML.FOREST.RUN movie-10
12:1.0,13:1.0,,332:3.0,333:1.0,334:1.0,335:2.0,336:1.0,357:2.0,358:1.0,359
:
........
412:2.0,423:1.0,454:1.0,455:1.0,456:1.0,457:3.0,458:1.0,459:1.0,470:1”
"3”
Performance
Redis time: 0.635129ms, res=3
Spark time: 46.657662ms, res=3.0
---------------------------------------
Redis time: 0.644444ms, res=3
Spark time: 49.028983ms, res=3.0
---------------------------------------
Classification averages:
redis: 0.9401250000000001 ms
spark: 58.01970206666667 ms
ratio: 61.71488053893542
diffs: 0.0
Python Script
r = redis.StrictRedis()
user_profile = r.get("user-1-profile")
results = {}
for i in range(1, 11):
results[i] = r.execute_command("ML.FOREST.RUN", "movie-{}".format(i), user_profile)
sorted_results = sorted(results.items(), key=operator.itemgetter(1), reverse=True)
for k,v in sorted_results:
print "movie-{}:{}".format(k,v)
print "nRecommended movie: movie-{}".format(sorted_results[0][0])
The Results...
$ ./classify_user.py 1
Movies sorted by scores:
movie-4:3
movie-3:2
movie-6:2
movie-7:2
movie-8:2
movie-9:2
movie-1:1
movie-2:1
movie-5:1
movie-10:0
Recommended movie for user 1: movie-4
Summary
• Train with Spark, Serve with Redis
• 97% resource cost serving
• Simplify ML lifecycle
• Redise
(Cloud or Pack):
‒Scaling, HA, Performance
‒PAYG – cost optimized
‒Ease of use
Spark Training
Data loaded into Spark Model is saved in
Redis-ML
Redis-ML
Serving Client
Client
App
Client
App
Client
App
+
Resources
● Redis-ML: https://github.com/RedisLabsModules/redis-ml
● Spark-Redis-ML: https://github.com/RedisLabs/spark-redis-ml
● Databricks Notebook: http://bit.ly/sparkredisml
● Dockers: https://hub.docker.com/r/shaynativ/redis-ml/
https://hub.docker.com/r/shaynativ/spark-redis-ml/
Thank You.
dvir@redislabs.com

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Terraform modules restructured
Terraform modules restructuredTerraform modules restructured
Terraform modules restructured
 
A Hands-on Introduction on Terraform Best Concepts and Best Practices
A Hands-on Introduction on Terraform Best Concepts and Best Practices A Hands-on Introduction on Terraform Best Concepts and Best Practices
A Hands-on Introduction on Terraform Best Concepts and Best Practices
 
AWS를 활용해서 글로벌 게임 런칭하기 - 박진성 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul 2021
AWS를 활용해서 글로벌 게임 런칭하기 - 박진성 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul 2021AWS를 활용해서 글로벌 게임 런칭하기 - 박진성 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul 2021
AWS를 활용해서 글로벌 게임 런칭하기 - 박진성 AWS 솔루션즈 아키텍트 :: AWS Summit Seoul 2021
 
Splunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operatorSplunk: Druid on Kubernetes with Druid-operator
Splunk: Druid on Kubernetes with Druid-operator
 
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
 
AWS CDK Introduction
AWS CDK IntroductionAWS CDK Introduction
AWS CDK Introduction
 
Terraform
TerraformTerraform
Terraform
 
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsRedis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
 
Introduction to Tokyo Products
Introduction to Tokyo ProductsIntroduction to Tokyo Products
Introduction to Tokyo Products
 
Introduction to AWS Lambda and Serverless Applications
Introduction to AWS Lambda and Serverless ApplicationsIntroduction to AWS Lambda and Serverless Applications
Introduction to AWS Lambda and Serverless Applications
 
Deep Dive into AWS SAM
Deep Dive into AWS SAMDeep Dive into AWS SAM
Deep Dive into AWS SAM
 
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
[오픈소스컨설팅]클라우드기반U2L마이그레이션 전략 및 고려사항
 
Using Virtual Private Cloud (vpc)
Using Virtual Private Cloud (vpc)Using Virtual Private Cloud (vpc)
Using Virtual Private Cloud (vpc)
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
[WhaTap DevOps Day] 세션 1 : Observability Practice on AWS
[WhaTap DevOps Day] 세션 1 : Observability Practice on AWS[WhaTap DevOps Day] 세션 1 : Observability Practice on AWS
[WhaTap DevOps Day] 세션 1 : Observability Practice on AWS
 
A Brief Look at Serverless Architecture
A Brief Look at Serverless ArchitectureA Brief Look at Serverless Architecture
A Brief Look at Serverless Architecture
 
Building a Development Workflow for Serverless Applications - March 2017 AWS ...
Building a Development Workflow for Serverless Applications - March 2017 AWS ...Building a Development Workflow for Serverless Applications - March 2017 AWS ...
Building a Development Workflow for Serverless Applications - March 2017 AWS ...
 
Mule caching strategy with redis cache
Mule caching strategy with redis cacheMule caching strategy with redis cache
Mule caching strategy with redis cache
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Data Source API in Spark
Data Source API in SparkData Source API in Spark
Data Source API in Spark
 

Similar a Getting Ready to Use Redis with Apache Spark with Dvir Volk

RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-MLRedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
Redis Labs
 
How in memory technology will impact machine deep learning services (redis la...
How in memory technology will impact machine deep learning services (redis la...How in memory technology will impact machine deep learning services (redis la...
How in memory technology will impact machine deep learning services (redis la...
Avner Algom
 
Deploying Real-Time Decision Services Using Redis with Tague Griffith
Deploying Real-Time Decision Services Using Redis with Tague GriffithDeploying Real-Time Decision Services Using Redis with Tague Griffith
Deploying Real-Time Decision Services Using Redis with Tague Griffith
Databricks
 

Similar a Getting Ready to Use Redis with Apache Spark with Dvir Volk (20)

Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
Building a Large Scale Recommendation Engine with Spark and Redis-ML with Sha...
 
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-MLRedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
RedisConf17 - Redis Labs - Implementing Real-time Machine Learning with Redis-ML
 
Serving predictive models with Redis
Serving predictive models with RedisServing predictive models with Redis
Serving predictive models with Redis
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
 
Getting started with Amazon ElastiCache
Getting started with Amazon ElastiCacheGetting started with Amazon ElastiCache
Getting started with Amazon ElastiCache
 
How in memory technology will impact machine deep learning services (redis la...
How in memory technology will impact machine deep learning services (redis la...How in memory technology will impact machine deep learning services (redis la...
How in memory technology will impact machine deep learning services (redis la...
 
What's new with enterprise Redis - Leena Joshi, Redis Labs
What's new with enterprise Redis - Leena Joshi, Redis LabsWhat's new with enterprise Redis - Leena Joshi, Redis Labs
What's new with enterprise Redis - Leena Joshi, Redis Labs
 
Big Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with RedisBig Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with Redis
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
Deploying Real-Time Decision Services Using Redis with Tague Griffith
Deploying Real-Time Decision Services Using Redis with Tague GriffithDeploying Real-Time Decision Services Using Redis with Tague Griffith
Deploying Real-Time Decision Services Using Redis with Tague Griffith
 
Rails Tips and Best Practices
Rails Tips and Best PracticesRails Tips and Best Practices
Rails Tips and Best Practices
 
Deep Dive on Amazon Relational Database Service
Deep Dive on Amazon Relational Database ServiceDeep Dive on Amazon Relational Database Service
Deep Dive on Amazon Relational Database Service
 
Brk2051 sql server on linux and docker
Brk2051 sql server on linux and dockerBrk2051 sql server on linux and docker
Brk2051 sql server on linux and docker
 
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of ThingsDay 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
Day 4 - Big Data on AWS - RedShift, EMR & the Internet of Things
 
DBA Basics guide
DBA Basics guideDBA Basics guide
DBA Basics guide
 
Introduction Mysql
Introduction Mysql Introduction Mysql
Introduction Mysql
 
Mysql introduction
Mysql introduction Mysql introduction
Mysql introduction
 
Fast Data at Scale - AWS Summit Tel Aviv 2017
Fast Data at Scale - AWS Summit Tel Aviv 2017Fast Data at Scale - AWS Summit Tel Aviv 2017
Fast Data at Scale - AWS Summit Tel Aviv 2017
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
Real-time Analytics with Redis
Real-time Analytics with RedisReal-time Analytics with Redis
Real-time Analytics with Redis
 

Más de Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Spark Summit
 

Más de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
Indicium: Interactive Querying at Scale Using Apache Spark, Zeppelin, and Spa...
 

Último

Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
SayantanBiswas37
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
gajnagarg
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
HyderabadDolls
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 

Último (20)

Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 
20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf20240412-SmartCityIndex-2024-Full-Report.pdf
20240412-SmartCityIndex-2024-Full-Report.pdf
 
Computer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdfComputer science Sql cheat sheet.pdf.pdf
Computer science Sql cheat sheet.pdf.pdf
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...Top Call Girls in Balaghat  9332606886Call Girls Advance Cash On Delivery Ser...
Top Call Girls in Balaghat 9332606886Call Girls Advance Cash On Delivery Ser...
 
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Indore [ 7014168258 ] Call Me For Genuine Models We...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
Sonagachi * best call girls in Kolkata | ₹,9500 Pay Cash 8005736733 Free Home...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 

Getting Ready to Use Redis with Apache Spark with Dvir Volk

  • 1. Dvir Volk - Redis Labs Real-time Machine Learning with Redis-ML and Apache Spark
  • 2. Agenda ● Intro to Redis and Redis Labs - 5 min ● Using Redis-ML for Model Serving - why and how - 10 min ● Building a recommendation system using Spark-ML and Redis-ML - 10 min ● Q&A 2
  • 3. Redis Labs – Home of Redis Founded in 2011 HQ in Mountain View CA, R&D center in Tel-Aviv IL The commercial company behind Open Source Redis Provider of the Redis Enterprise (Redise ) technology, platform and products 3
  • 4. Redise Cloud Private Redis Labs Products Redise Cloud Redise Pack ManagedRedise Pack SERVICES SOFTWARE Fully managed Redise service in VPCs within AWS, MS Azure, GCP & IBM Softlayer Fully managed Redise service on hosted servers within AWS, MS Azure, GCP, IBM Softlayer, Heroku, CF & OpenShift Downloadable Redise software for any enterprise datacenter or cloud environment Fully managed Redise Pack in private data centers && &
  • 5. A Brief Overview of Redis ● Started in 2009 by Salvatore Sanfilippo ● Most popular KV store ● In memory - disk backed ● Notable Users: ○ Twitter, Netflix, Uber, Groupon, Twitch ○ Many, many more...
  • 6. Redis Main Differentiators Simplicity (through Data Structures) Extensibility (through Redis Modules) Performance ListsSorted Sets Hashes Hyperlog-logs Geospatial Indexes Bitmaps SetsStrings Bit field 6
  • 7. A Quick Recap of Redis Key "I'm a Plain Text String!" { A: “foo”, B: “bar”, C: “baz” } Strings / Bitmaps / BitFields Hash Tables (objects!) Linked Lists Sets Sorted Sets Geo Sets HyperLogLog { A , B , C , D , E } [ A → B → C → D → E ] { A: 0.1, B: 0.3, C: 100, D: 1337 } { A: (51.5, 0.12), B: (32.1, 34.7) } 00110101 11001110 10101010
  • 8. Simple Redis Example (string) 127.0.0.1:6379> SET spark summit OK 127.0.0.1:6379> GET spark "summit" 127.0.0.1:6379>
  • 9. Simple Redis Example (hash) 127.0.0.1:6379> HMSET spark_hash org apache version 2.1.1 OK 127.0.0.1:6379> HGET spark_hash version "2.1.1" 127.0.0.1:6379> HGETALL spark_hash 1) "org" 2) "apache" 3) "version" 4) "2.1.1"
  • 10. Another Redis Example (ZSET) 127.0.0.1:6379> ZADD my_sorted_set 1 foo (integer) 1 127.0.0.1:6379> ZADD my_sorted_set 5 bar (integer) 1 127.0.0.1:6379> ZADD my_sorted_set 3 baz (integer) 1 127.0.0.1:6379> ZRANGE my_sorted_set 0 2 1) "foo" 2) "baz" 3) "bar"
  • 11. New Capabilities Redis Modules: A New Hope • Dynamic libraries loaded to redis • Written in C/C++ • Use a C ABI/API isolating redis internals • Use existing or add new data-structures • Near Zero latency access to data New Commands New Data Types
  • 12. Modules: A New Approach Adapt your database to your data, not the other way around Secure way to store data in Redis via encrypt/decrypt with various Themis primitives Time Series Graph Time series values aggregation in Redis Crypto Engine Wrapper Graph database on Redis based on Cypher language Based on Generic Cell Rate Algorithm (GCRA) Rate Limiter ReJSON Secondary Index/RQL JSON Engine on Redis. Pre-released Indexing + SQL -like syntax for querying indexes. Pre-released Neural Redis Redis-ML RediSearch Full Text Search Engine in RedisMachine Learning Model Serving Simple Neural Network Native to Redis
  • 14. ML Models Serving Challenges – Models are becoming bigger and more complex – Can be challenging to deploy & serve – Do not scale well, speed and size – Can be very expensive 14
  • 15. Spark-ML End-to-End Flow Spark Training Custom Server Model saved to Parquet file Data Loaded to Spark Pre-computed results Batch Evaluation ? ClientApp
  • 16. A Simpler Machine Learning Lifecycle Any Training Platform Data loaded into Spark Model is saved in Redis-ML Redis-ML Serving Client Client App Client App Client App + Spark Training 16
  • 17. Redis-ML – ML Serving Engine • Store training output as “hot model” • Perform evaluation directly in Redis • Enjoy the performance, scalability and HA of Redis 17
  • 18. ML Models Tree Ensembles Linear Regression Logistic Regression Matrix + Vector Operations More to come... Redis-ML 18
  • 19. Random Forest Model • A collection of decision trees • Supports classification & regression • Splitter Node can be: ◦ Categorical (e.g. day == “Sunday”) ◦ Numerical (e.g. age < 43) • Decision is taken by the majority of decision trees 19
  • 20. Titanic Survival Predictor on a Decision Tree YES Sex = Male ? Age < 9.5? *Sibps > 2.5? Survived Died SurvivedDied *Sibps = siblings + spouses NO 20
  • 21. Titanic Survival Predictor on a Random Forest YES Sex = Male ? Age < 9.5? *Sibps > 2.5? Survived Died SurvivedDied NO YES Country= US? State = CA? Height> 1.60m? Survived Died SurvivedDied NO YES Weight< 80kg? I.Q<100? Eye color =blue? Survived Died SurvivedDied NO Tree #1 Tree #2 Tree #3
  • 22. Would John Survive The Titanic • John’s features: {male, 34, married + 2, US, CA, 1.78m, 78kg, 110iq, blue eyes} • Tree#1 – Survived • Tree#2 – Failed • Tree#3 – Survived • Random forest decision - Survived 22
  • 23. Forest Data Type Example > MODULE LOAD "./redis-ml.so" OK > ML.FOREST.ADD myforest 0 . CATEGORIC sex “male” .L LEAF 1 .R LEAF 0 OK > ML.FOREST.RUN myforest sex:male "1" > ML.FOREST.RUN myforest sex:no_thanx "0"
  • 24. Using Redis-ML With Spark scala> import com.redislabs.client.redisml.MLClient scala> import com.redislabs.provider.redis.ml.Forest scala> val jedis = new Jedis("localhost") scala> val rfModel = pipelineModel.stages.last.asInstanceOf[RandomForest] // Create a new forest instance scala> val f = new Forest(rfModel.trees) // Load the model to redis scala> f.loadToRedis("forest-test", "localhost") // Classify a feature vector scala> jedis.getClient.sendCommand(MLClient.ModuleCommand.FOREST_RUN, "forest-test", makeInputString (0)) scala> jedis.getClient.getStatusCodeReply res53: String = 1
  • 25. Real World Challenge • Ad serving company • Need to serve 20,000 ads/sec @ 50msec data-center latency • Runs 1k campaigns → 1K random forest • Each forest has 15K trees • On average each tree has 7 levels (depth) • Would require < 1000 x c4.8xlarge 25
  • 26. Redis ML with Spark ML Classification Time Over Spark 40x Faster 26
  • 27. Real World Example: Movie Recommendation System
  • 29. Concept: One Forest For Each Movie 29 User Features: (Age, Gender, Movie Ratings) Movie_1 Forest 3 Movie_2 Forest 2 Movie_n Forest 5 . . .
  • 31. Using the Dockers $ docker pull shaynativ/redis-ml $ docker run --net=host shaynativ/redis-ml & $ $ docker pull shaynativ/spark-redis-ml $ docker run --net=host shaynativ/spark-redis-ml
  • 32. Step 1: Get The Data 32 • Download and extract the MovieLens 100K Dataset • The data is organized in separate files: – Ratings: user id | item id | rating (1-5) | timestamp – Item (movie) info: movie id | genre info fields (1/0) – User info: user id | age | gender | occupation • Our classifier should return the expected rating (from 1 to 5) a user would give the movie in question
  • 33. Step 2: Transform 33 • The training data for each movie should contain 1 line per user: – class (rating from 1 to 5 the user gave to this movie) – user info (age, gender, occupation) – user ratings of other movies (movie_id:rating ...) – user genre rating averages (genre:avg_score ...) • Run gen_data.py to transform the files to the desired format
  • 34. Step3: Train and Load to Redis // Create a new forest instance val rf = new RandomForestClassifier().setFeatureSubsetStrategy("auto").setLabelCol("indexedLabel").setFeat uresCol("indexedFeatures").setNumTrees(500) ….. // Train model val model = pipeline.fit(trainingData) ….. val rfModel = model.stages(2).asInstanceOf[RandomForestClassificationModel] // Load the model to redis val f = new Forest(rfModel.trees) f.loadToRedis(”movie-10", "127.0.0.1") 34
  • 35. Step 3: Execute in Redis Redis-ML + Spark Training Client App
  • 36. Python Client Example >> import redis >> config = {"host":"localhost", "port":6379} >> r = redis.StrictRedis(**config) >> user_profile = r.get("user_shay_profile") >> print(user_profile) 12:1.0,13:1.0,14:3.0,15:1.0,17:1.0,18:1.0,19:1.0,20:1.0,23:1.0,24:5.0,1.0,115:1.0,116:2. 0,117:2.0,119:1.0,120:4.0,121:2.0,122:2.0, ........ 1360:1.0,1361:1.0,1362:1.0, 1701:6.0,1799:435.0,1801:0.2,1802:0.11,1803:0.04,1812:0.04,1813:0.07,1814:0.24,1815:0.09 ,1816:0.32,1817:0.06 >> r.execute_command("ML.FOREST.RUN", "movie-10", user_profile) '3'
  • 37. Redis CLI Example >keys * 127.0.0.1:6379> KEYS * 1) "movie-5" 2) "movie-1" ........ 10) "movie-10" 11) "user_1_profile” >ML.FOREST.RUN movie-10 12:1.0,13:1.0,,332:3.0,333:1.0,334:1.0,335:2.0,336:1.0,357:2.0,358:1.0,359 : ........ 412:2.0,423:1.0,454:1.0,455:1.0,456:1.0,457:3.0,458:1.0,459:1.0,470:1” "3”
  • 38. Performance Redis time: 0.635129ms, res=3 Spark time: 46.657662ms, res=3.0 --------------------------------------- Redis time: 0.644444ms, res=3 Spark time: 49.028983ms, res=3.0 --------------------------------------- Classification averages: redis: 0.9401250000000001 ms spark: 58.01970206666667 ms ratio: 61.71488053893542 diffs: 0.0
  • 39. Python Script r = redis.StrictRedis() user_profile = r.get("user-1-profile") results = {} for i in range(1, 11): results[i] = r.execute_command("ML.FOREST.RUN", "movie-{}".format(i), user_profile) sorted_results = sorted(results.items(), key=operator.itemgetter(1), reverse=True) for k,v in sorted_results: print "movie-{}:{}".format(k,v) print "nRecommended movie: movie-{}".format(sorted_results[0][0])
  • 40. The Results... $ ./classify_user.py 1 Movies sorted by scores: movie-4:3 movie-3:2 movie-6:2 movie-7:2 movie-8:2 movie-9:2 movie-1:1 movie-2:1 movie-5:1 movie-10:0 Recommended movie for user 1: movie-4
  • 41. Summary • Train with Spark, Serve with Redis • 97% resource cost serving • Simplify ML lifecycle • Redise (Cloud or Pack): ‒Scaling, HA, Performance ‒PAYG – cost optimized ‒Ease of use Spark Training Data loaded into Spark Model is saved in Redis-ML Redis-ML Serving Client Client App Client App Client App +
  • 42. Resources ● Redis-ML: https://github.com/RedisLabsModules/redis-ml ● Spark-Redis-ML: https://github.com/RedisLabs/spark-redis-ml ● Databricks Notebook: http://bit.ly/sparkredisml ● Dockers: https://hub.docker.com/r/shaynativ/redis-ml/ https://hub.docker.com/r/shaynativ/spark-redis-ml/