SlideShare una empresa de Scribd logo
1 de 21
26 Trillion App Recommendation
using100 Lines of Spark Code
Ayman Farahat
● Motivation
● Spark Implementation
○ Collabrative Filtering
○ Data Frames
○ BLAS-3
● Results and lessons learnt.
Overview
● App discovery is a challenging problem due to the exponential
growth in number of apps
● Over 1.5 million apps available through both market places
(i.e. Itunes and Google Play store)
● Develop app recommendation engine using various user
behavior signals
○ Explicit Signal (App rating)
○ Implicit Signal (frequency/duration of app usage)
Motivation
● Data available through Flurry SDK is rich in both coverage
and depth
● Collected session length for Apps used on IOS platform in
period between Sept 1-15 2015 .
● Restricted analysis to Apps used by 100 or more users
○ ~496 million Users
○ ~53,793 Apps
Flurry Data and Summary
● User Count : 496,508,312
● App Count : 153,773
● App 100+ : 53,793
● Train time : 52 minutes
● Predict time : 8 minutes
Data Summary
● Utilize a collaborative filtering based App recommendation
● Run collaborative filtering that works at scale to generate:
○ Low dimension user features
○ Low dimension App features
○ Compute user x App rating for all possible
combinations (26.7 Trillion)
● Used spark framework to efficiently train and recommend.
Our Approach
● Projects the users and Apps (in our case) into a lower
dimensional space
Collaborative Filtering Model
● Used out of sample prediction accuracy on 20+ Apps Users
● The MSE was minimum with number of factors fixed at 60
Model Fitting and Parameter Optimization
● Join operation can greatly benefit from caching.
● Filter out Apps that have less than 100 users
cleandata = allapps.join(cleanapps)
● Do a replicated join in Spark
#only keep the apps that had 100 or more user
cleanapps = myapps.filter(lambda x :x[1] > MAXAPPS).map(lambda x: int(x[0]))
#persist the apps data
apps = sc.broadcast(set(cleanapps.collect()))
# filter by the data set: I have simulated a replicated join
cleandata = allapps.filter(lambda x: x[1] in apps.value)
Data Frames
● In spark you can use a dataframe directly
Record = Row("userId", "iuserId", "appId", "value")
MAXAPPS = 100
#transform allapps to a df
allappsdf = allapps.map(lambda x: Record(*x)).toDF()
# register the DF and issue SQL queries
sqlContext.registerDataFrameAsTable(allappdf, "table1")
#here I am grouing by the AppID
df2 = sqlContext.sql("SELECT appId as appId2, avg(value), count(*) from table1 group by appId")
topappsdf = df2.filter(df2.c2 >MAXAPPS)
#DF join
cleandata = allappsdf.join(topappsdf, allapps.appId == topappdf.appId2)
Data Frames
● The number of possible user x App combinations is very large
Default prediction : PredictAll
○predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2]))
○ Prediction is simply matrix multiplication of user “i” and App “j”
● Never completes and most of time spent on reshuffle.
● The users are not partioned so can be on all Nodes.
● The Apps are not partioned so can be on all Nodes.
● Reshuffle is extremely slow.
BLAS 3
● The key is that the Number of Apps << Number of users
● Exploit the low number of Apps to optimize the prediction time
BLAS 3
● The App features being smaller in size can be stored in
primary memory (BLAS 3)
● We broadcast the Apps to all executors, which reduces the
overall reshuffling of data
● use BLAS-3 matrix multiplication available within numpy
which is highly optimized
BLAS 3
Basic linear algbera system for solving problems of the form
D = a A * b B + c C
Highly optimized for matrix multiplication.
BLAS 3
import numpy
from numpy import *
myModel=MatrixFactorizationModel.load(sc, "BingBong”)
m1 = myModel.productFeatures()
m2 = m1.map(lambda (product,feature) : feature).collect()
m3 = matrix(m2).transpose()
pf = sc.broadcast(m3)
uf = myModel.userFeatures().coalesce(100)
#get predictions on all user
f1 = uf.map(lambda (userID, features): (userID, squeeze(asarray(matrix(array(features)) * pf.value))))
BLAS 3
Evaluation :Predicted Score
Predicted Score : Positive
Predicted Score : Negative
Evaluation of Recommendation
● Identify users with high(low) scores
● Design of experiment :
● High score x Recommendation
● High score x Placebo
● Low score x Recommendation
● High score x Placebo
Future Work
● Spark econometrics library (std. error, robust std. errors.. )
● Online experiments to measure value of recommendation .
● Experiments with various implicit ratings :
● number of sessions
● days used
● Log of days used
26 Trillion App Recomendations using 100 Lines of Spark Code - Ayman Farahat

Más contenido relacionado

La actualidad más candente

Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Spark Summit
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudSri Ambati
 
Spark Summit EU talk by Reza Karimi
Spark Summit EU talk by Reza KarimiSpark Summit EU talk by Reza Karimi
Spark Summit EU talk by Reza KarimiSpark Summit
 
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Databricks
 
Spark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkSpark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkDatabricks
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Spark Summit
 
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16BigMine
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Modern Data Stack France
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Databricks
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesRomi Kuntsman
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustSpark Summit
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Spark Summit
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Spark Summit
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RDatabricks
 
Spark and the Future of Advanced Analytics by Thomas Dinsmore
Spark and the Future of Advanced Analytics by Thomas DinsmoreSpark and the Future of Advanced Analytics by Thomas Dinsmore
Spark and the Future of Advanced Analytics by Thomas DinsmoreSpark Summit
 
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Spark Summit
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksDatabricks
 
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...Databricks
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material Bryan Yang
 

La actualidad más candente (20)

Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
 
Spark Summit EU talk by Reza Karimi
Spark Summit EU talk by Reza KarimiSpark Summit EU talk by Reza Karimi
Spark Summit EU talk by Reza Karimi
 
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
Spark Summit EU 2016 Keynote - Simplifying Big Data in Apache Spark 2.0
 
Spark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with SparkSpark Application Carousel: Highlights of Several Applications Built with Spark
Spark Application Carousel: Highlights of Several Applications Built with Spark
 
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
Practical Large Scale Experiences with Spark 2.0 Machine Learning: Spark Summ...
 
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
 
Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)Spark ML par Xebia (Spark Meetup du 11/06/2015)
Spark ML par Xebia (Spark Meetup du 11/06/2015)
 
Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015Spark streaming State of the Union - Strata San Jose 2015
Spark streaming State of the Union - Strata San Jose 2015
 
Multi dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframesMulti dimension aggregations using spark and dataframes
Multi dimension aggregations using spark and dataframes
 
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael ArmbrustStructuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
Structuring Spark: DataFrames, Datasets, and Streaming by Michael Armbrust
 
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
Netflix's Recommendation ML Pipeline Using Apache Spark: Spark Summit East ta...
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and RSpark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
Spark Summit EU 2015: Combining the Strengths of MLlib, scikit-learn, and R
 
Spark and the Future of Advanced Analytics by Thomas Dinsmore
Spark and the Future of Advanced Analytics by Thomas DinsmoreSpark and the Future of Advanced Analytics by Thomas Dinsmore
Spark and the Future of Advanced Analytics by Thomas Dinsmore
 
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
Petabyte Scale Anomaly Detection Using R & Spark by Sridhar Alla and Kiran Mu...
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
Ray: A Cluster Computing Engine for Reinforcement Learning Applications with ...
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
 

Destacado

Spark Summit EU talk by Simon Whitear
Spark Summit EU talk by Simon WhitearSpark Summit EU talk by Simon Whitear
Spark Summit EU talk by Simon WhitearSpark Summit
 
Spark Summit EU talk by Larisa Sawyer
Spark Summit EU talk by Larisa SawyerSpark Summit EU talk by Larisa Sawyer
Spark Summit EU talk by Larisa SawyerSpark Summit
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit
 
Сервис бронирования отелей в последний момент LastRoom
Сервис бронирования отелей в последний момент LastRoomСервис бронирования отелей в последний момент LastRoom
Сервис бронирования отелей в последний момент LastRoomТимур Низами
 
Sayed Kassem Gallup Report
Sayed Kassem Gallup ReportSayed Kassem Gallup Report
Sayed Kassem Gallup ReportSayed Kassem
 
Iskander Business Partner Studie: Verständlichkeit in der Unternehmenskommuni...
Iskander Business Partner Studie: Verständlichkeit in der Unternehmenskommuni...Iskander Business Partner Studie: Verständlichkeit in der Unternehmenskommuni...
Iskander Business Partner Studie: Verständlichkeit in der Unternehmenskommuni...Iskander Business Partner GmbH
 
Marketing in a Social World - FinanceConnect 2015
Marketing in a Social World - FinanceConnect 2015Marketing in a Social World - FinanceConnect 2015
Marketing in a Social World - FinanceConnect 2015LinkedIn India
 
La paritaria nacional es un fraude
La paritaria nacional es un fraudeLa paritaria nacional es un fraude
La paritaria nacional es un fraudeLaura Marrone
 
1. Stosowanie przepisów prawa w gospodarowaniu
1. Stosowanie przepisów prawa w gospodarowaniu1. Stosowanie przepisów prawa w gospodarowaniu
1. Stosowanie przepisów prawa w gospodarowaniuLukas Pobocha
 
Ten Things Agencies Should Accentuate When It Comes To Social Media
Ten Things Agencies Should Accentuate When It Comes To Social MediaTen Things Agencies Should Accentuate When It Comes To Social Media
Ten Things Agencies Should Accentuate When It Comes To Social MediaAbhishek Shah
 
NYU 5-Day Lean Launchpad Syllabus
NYU 5-Day Lean Launchpad SyllabusNYU 5-Day Lean Launchpad Syllabus
NYU 5-Day Lean Launchpad SyllabusNew York University
 
Lista de-precios-compugreiff-enero-09-2013
Lista de-precios-compugreiff-enero-09-2013Lista de-precios-compugreiff-enero-09-2013
Lista de-precios-compugreiff-enero-09-2013xxxxx
 

Destacado (15)

Spark Summit EU talk by Simon Whitear
Spark Summit EU talk by Simon WhitearSpark Summit EU talk by Simon Whitear
Spark Summit EU talk by Simon Whitear
 
Spark Summit EU talk by Larisa Sawyer
Spark Summit EU talk by Larisa SawyerSpark Summit EU talk by Larisa Sawyer
Spark Summit EU talk by Larisa Sawyer
 
Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van Hovell
 
Сервис бронирования отелей в последний момент LastRoom
Сервис бронирования отелей в последний момент LastRoomСервис бронирования отелей в последний момент LastRoom
Сервис бронирования отелей в последний момент LastRoom
 
Sayed Kassem Gallup Report
Sayed Kassem Gallup ReportSayed Kassem Gallup Report
Sayed Kassem Gallup Report
 
tablas y Secciones
tablas y Seccionestablas y Secciones
tablas y Secciones
 
Iskander Business Partner Studie: Verständlichkeit in der Unternehmenskommuni...
Iskander Business Partner Studie: Verständlichkeit in der Unternehmenskommuni...Iskander Business Partner Studie: Verständlichkeit in der Unternehmenskommuni...
Iskander Business Partner Studie: Verständlichkeit in der Unternehmenskommuni...
 
Marketing in a Social World - FinanceConnect 2015
Marketing in a Social World - FinanceConnect 2015Marketing in a Social World - FinanceConnect 2015
Marketing in a Social World - FinanceConnect 2015
 
Cosma Paradores
Cosma ParadoresCosma Paradores
Cosma Paradores
 
La paritaria nacional es un fraude
La paritaria nacional es un fraudeLa paritaria nacional es un fraude
La paritaria nacional es un fraude
 
1. Stosowanie przepisów prawa w gospodarowaniu
1. Stosowanie przepisów prawa w gospodarowaniu1. Stosowanie przepisów prawa w gospodarowaniu
1. Stosowanie przepisów prawa w gospodarowaniu
 
Inclusion europe.
Inclusion europe.Inclusion europe.
Inclusion europe.
 
Ten Things Agencies Should Accentuate When It Comes To Social Media
Ten Things Agencies Should Accentuate When It Comes To Social MediaTen Things Agencies Should Accentuate When It Comes To Social Media
Ten Things Agencies Should Accentuate When It Comes To Social Media
 
NYU 5-Day Lean Launchpad Syllabus
NYU 5-Day Lean Launchpad SyllabusNYU 5-Day Lean Launchpad Syllabus
NYU 5-Day Lean Launchpad Syllabus
 
Lista de-precios-compugreiff-enero-09-2013
Lista de-precios-compugreiff-enero-09-2013Lista de-precios-compugreiff-enero-09-2013
Lista de-precios-compugreiff-enero-09-2013
 

Similar a 26 Trillion App Recomendations using 100 Lines of Spark Code - Ayman Farahat

Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleJim Dowling
 
Download It
Download ItDownload It
Download Itbutest
 
Use Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data ClustersUse Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data ClustersDatabricks
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaGoDataDriven
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaAlluxio, Inc.
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKzmhassan
 
Lecture07 abap on line
Lecture07 abap on lineLecture07 abap on line
Lecture07 abap on lineMilind Patil
 
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...Amazon Web Services
 
Hadoop cluster performance profiler
Hadoop cluster performance profilerHadoop cluster performance profiler
Hadoop cluster performance profilerIhor Bobak
 
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta CachingReal-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta CachingDatabricks
 
Prometheus Everything, Observing Kubernetes in the Cloud
Prometheus Everything, Observing Kubernetes in the CloudPrometheus Everything, Observing Kubernetes in the Cloud
Prometheus Everything, Observing Kubernetes in the CloudSneha Inguva
 
Workshop: Building a Streaming Data Platform on AWS
Workshop: Building a Streaming Data Platform on AWSWorkshop: Building a Streaming Data Platform on AWS
Workshop: Building a Streaming Data Platform on AWSAmazon Web Services
 

Similar a 26 Trillion App Recomendations using 100 Lines of Spark Code - Ayman Farahat (20)

Serverless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData SeattleServerless ML Workshop with Hopsworks at PyData Seattle
Serverless ML Workshop with Hopsworks at PyData Seattle
 
Download It
Download ItDownload It
Download It
 
Use Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data ClustersUse Machine Learning to Get the Most out of Your Big Data Clusters
Use Machine Learning to Get the Most out of Your Big Data Clusters
 
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei ZahariaDeep learning and streaming in Apache Spark 2.2 by Matei Zaharia
Deep learning and streaming in Apache Spark 2.2 by Matei Zaharia
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
 
Final Presentation.pptx
Final Presentation.pptxFinal Presentation.pptx
Final Presentation.pptx
 
CS267_Graph_Lab
CS267_Graph_LabCS267_Graph_Lab
CS267_Graph_Lab
 
Blue book
Blue bookBlue book
Blue book
 
The hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at HelixaThe hidden engineering behind machine learning products at Helixa
The hidden engineering behind machine learning products at Helixa
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
p850-ries
p850-riesp850-ries
p850-ries
 
Lecture07 abap on line
Lecture07 abap on lineLecture07 abap on line
Lecture07 abap on line
 
Unit 1
Unit  1Unit  1
Unit 1
 
Vedic Calculator
Vedic CalculatorVedic Calculator
Vedic Calculator
 
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
Monitoring as Code: Getting to Monitoring-Driven Development - DEV314 - re:In...
 
Hadoop cluster performance profiler
Hadoop cluster performance profilerHadoop cluster performance profiler
Hadoop cluster performance profiler
 
les07.pdf
les07.pdfles07.pdf
les07.pdf
 
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta CachingReal-Time Forecasting at Scale using Delta Lake and Delta Caching
Real-Time Forecasting at Scale using Delta Lake and Delta Caching
 
Prometheus Everything, Observing Kubernetes in the Cloud
Prometheus Everything, Observing Kubernetes in the CloudPrometheus Everything, Observing Kubernetes in the Cloud
Prometheus Everything, Observing Kubernetes in the Cloud
 
Workshop: Building a Streaming Data Platform on AWS
Workshop: Building a Streaming Data Platform on AWSWorkshop: Building a Streaming Data Platform on AWS
Workshop: Building a Streaming Data Platform on AWS
 

Más de Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

Más de Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Último

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Último (20)

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 

26 Trillion App Recomendations using 100 Lines of Spark Code - Ayman Farahat

  • 1. 26 Trillion App Recommendation using100 Lines of Spark Code Ayman Farahat
  • 2. ● Motivation ● Spark Implementation ○ Collabrative Filtering ○ Data Frames ○ BLAS-3 ● Results and lessons learnt. Overview
  • 3. ● App discovery is a challenging problem due to the exponential growth in number of apps ● Over 1.5 million apps available through both market places (i.e. Itunes and Google Play store) ● Develop app recommendation engine using various user behavior signals ○ Explicit Signal (App rating) ○ Implicit Signal (frequency/duration of app usage) Motivation
  • 4. ● Data available through Flurry SDK is rich in both coverage and depth ● Collected session length for Apps used on IOS platform in period between Sept 1-15 2015 . ● Restricted analysis to Apps used by 100 or more users ○ ~496 million Users ○ ~53,793 Apps Flurry Data and Summary
  • 5. ● User Count : 496,508,312 ● App Count : 153,773 ● App 100+ : 53,793 ● Train time : 52 minutes ● Predict time : 8 minutes Data Summary
  • 6. ● Utilize a collaborative filtering based App recommendation ● Run collaborative filtering that works at scale to generate: ○ Low dimension user features ○ Low dimension App features ○ Compute user x App rating for all possible combinations (26.7 Trillion) ● Used spark framework to efficiently train and recommend. Our Approach
  • 7. ● Projects the users and Apps (in our case) into a lower dimensional space Collaborative Filtering Model
  • 8. ● Used out of sample prediction accuracy on 20+ Apps Users ● The MSE was minimum with number of factors fixed at 60 Model Fitting and Parameter Optimization
  • 9. ● Join operation can greatly benefit from caching. ● Filter out Apps that have less than 100 users cleandata = allapps.join(cleanapps) ● Do a replicated join in Spark #only keep the apps that had 100 or more user cleanapps = myapps.filter(lambda x :x[1] > MAXAPPS).map(lambda x: int(x[0])) #persist the apps data apps = sc.broadcast(set(cleanapps.collect())) # filter by the data set: I have simulated a replicated join cleandata = allapps.filter(lambda x: x[1] in apps.value) Data Frames
  • 10. ● In spark you can use a dataframe directly Record = Row("userId", "iuserId", "appId", "value") MAXAPPS = 100 #transform allapps to a df allappsdf = allapps.map(lambda x: Record(*x)).toDF() # register the DF and issue SQL queries sqlContext.registerDataFrameAsTable(allappdf, "table1") #here I am grouing by the AppID df2 = sqlContext.sql("SELECT appId as appId2, avg(value), count(*) from table1 group by appId") topappsdf = df2.filter(df2.c2 >MAXAPPS) #DF join cleandata = allappsdf.join(topappsdf, allapps.appId == topappdf.appId2) Data Frames
  • 11. ● The number of possible user x App combinations is very large Default prediction : PredictAll ○predictions = model.predictAll(testdata).map(lambda r: ((r[0], r[1]), r[2])) ○ Prediction is simply matrix multiplication of user “i” and App “j” ● Never completes and most of time spent on reshuffle. ● The users are not partioned so can be on all Nodes. ● The Apps are not partioned so can be on all Nodes. ● Reshuffle is extremely slow. BLAS 3
  • 12. ● The key is that the Number of Apps << Number of users ● Exploit the low number of Apps to optimize the prediction time BLAS 3
  • 13. ● The App features being smaller in size can be stored in primary memory (BLAS 3) ● We broadcast the Apps to all executors, which reduces the overall reshuffling of data ● use BLAS-3 matrix multiplication available within numpy which is highly optimized BLAS 3
  • 14. Basic linear algbera system for solving problems of the form D = a A * b B + c C Highly optimized for matrix multiplication. BLAS 3
  • 15. import numpy from numpy import * myModel=MatrixFactorizationModel.load(sc, "BingBong”) m1 = myModel.productFeatures() m2 = m1.map(lambda (product,feature) : feature).collect() m3 = matrix(m2).transpose() pf = sc.broadcast(m3) uf = myModel.userFeatures().coalesce(100) #get predictions on all user f1 = uf.map(lambda (userID, features): (userID, squeeze(asarray(matrix(array(features)) * pf.value)))) BLAS 3
  • 17. Predicted Score : Positive
  • 18. Predicted Score : Negative
  • 19. Evaluation of Recommendation ● Identify users with high(low) scores ● Design of experiment : ● High score x Recommendation ● High score x Placebo ● Low score x Recommendation ● High score x Placebo
  • 20. Future Work ● Spark econometrics library (std. error, robust std. errors.. ) ● Online experiments to measure value of recommendation . ● Experiments with various implicit ratings : ● number of sessions ● days used ● Log of days used

Notas del editor

  1. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  2. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  3. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  4. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  5. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  6. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  7. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  8. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  9. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  10. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  11. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  12. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  13. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?
  14. The challenge is that native ads are supposed to allow a similar experience than content; but at the same time should not mislead users. “In some instances 16-35% of ads are confused for creative.” You mean confused for content?