Recommendation Engine Powered by Hadoop: Pranab Ghosh, pkghosh@yahoo.com, August 11th 2011 Meetup
About me: Started with numerical computation on mainframes, followed by many years of C and C++ systems and real-time programming, then many years of Java, JEE and enterprise apps. Worked for Oracle, HP, Yahoo, Motorola, and many startups and mid-size companies. Currently a Big Data consultant using Hadoop and other cloud-related technologies. Interested in distributed computation, Big Data, NoSQL databases and data mining.
Hadoop: The power of functional programming and parallel processing join hands to create Hadoop. It is basically a parallel processing framework running on a cluster of commodity machines. The functional programming is stateless, because the processing of each row of data does not depend on any other row or any state. The parallel processing is divide and conquer: the data gets partitioned, and each partition gets processed by a separate mapper or reducer task.
More About Hadoop: Data locality, at least for the mapper: code gets shipped to where the data partition resides. Data is replicated, partitioned and stored in the Hadoop Distributed File System (HDFS). Mapper output: {k -> v}. Reducer input: {k -> List(v)}. Reducer output: {k -> v}. There is a many-to-many shuffle between mapper output and reducer input, which means a lot of network IO. A simple paradigm, but it solves a surprisingly wide array of problems.
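The map, shuffle and reduce flow described above can be sketched in a few lines of plain Python. This is a single-process illustration only, not Hadoop code; the function names, the "user,item,rating" record layout and the sample records are assumptions made for the example.

```python
from collections import defaultdict


def run_map_reduce(records, mapper, reducer):
    """Simulate map -> shuffle -> reduce in one process."""
    shuffled = defaultdict(list)
    # Map phase: each record is processed independently and emits (k, v) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle: collect every value under its key, giving {k -> List(v)}.
            shuffled[key].append(value)
    # Reduce phase: each key with its list of values yields (k, v) pairs.
    output = {}
    for key in sorted(shuffled):
        for out_key, out_value in reducer(key, shuffled[key]):
            output[out_key] = out_value
    return output


def rating_mapper(record):
    # Hypothetical input layout: "user,item,rating"
    _user, item, rating = record.split(",")
    yield item, float(rating)


def mean_reducer(item, ratings):
    # Average rating per item
    yield item, sum(ratings) / len(ratings)


records = ["u1,i1,4", "u2,i1,2", "u1,i2,5"]
result = run_map_reduce(records, rating_mapper, mean_reducer)
print(result)
```

On a real cluster the shuffle step is where the network IO happens, because values for one key may come from mappers on many machines.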
Recommendation Engine: Does not require an introduction. You know it if you have visited Amazon or Netflix. We love it when they get it right and hate it otherwise. Very computationally intensive, which makes it ideal for Hadoop processing. In memory based recommendation engines, the entire data set is used directly, e.g., collaborative filtering and content based recommendation. In model based recommendation, a model is first built by training on the data and predictions are then made from it, e.g., Bayesian, clustering.
Content Based Recommendation: A memory based system, based purely on the attributes of an item. An item with p attributes is considered a point in a p dimensional space. Uses a nearest neighbor approach: similar items are found using a distance measure in the p dimensional space. Useful for addressing the cold start problem, i.e., when a new item is introduced in the inventory. Computationally intensive, so not very useful for real time recommendation.
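As a rough illustration of the nearest neighbor idea, the sketch below treats each item as a point with p = 3 numeric attributes and ranks the other items by Euclidean distance. The item names, attribute values and the choice of Euclidean distance are assumptions for the example; the talk does not say which distance measure is used for content based recommendation.

```python
import math


def euclidean_distance(a, b):
    """Distance between two points in p dimensional attribute space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_items(target_id, items, k=2):
    """Return the ids of the k items closest to the target item."""
    target = items[target_id]
    distances = [
        (euclidean_distance(target, attrs), item_id)
        for item_id, attrs in items.items()
        if item_id != target_id
    ]
    return [item_id for _dist, item_id in sorted(distances)[:k]]


# Hypothetical items, each with three normalized attributes
items = {
    "laptop_a": (0.9, 0.2, 0.5),
    "laptop_b": (0.8, 0.3, 0.5),
    "phone_a": (0.1, 0.9, 0.4),
}
neighbors = nearest_items("laptop_a", items, k=1)
print(neighbors)
```

Comparing every item against every other item is quadratic in the number of items, which is one reason the approach is computationally intensive.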
Model Based Recommendation: Based on the traditional machine learning approach. In contrast to memory based algorithms, it creates a learning model using the ratings as training data. The model is built offline as a batch process and saved. The model needs to be rebuilt when a significant change in the data is detected. Once the trained model is available, making a recommendation is quick. Effective for real time recommendation.
Collaborative Filtering: In a collaborative filtering based recommendation engine, recommendations are based not only on the user’s own ratings but also on ratings by other users for the same item and some other items. Hence the name collaborative filtering. Requires social data, i.e., users’ interest level for an item. It could be explicit, e.g., a product rating, or implicit, based on the user’s interaction and behavior on a site. A more appropriate name might be user intent based recommendation engine. There are two approaches: in user based, similar users are found first; in item based, similar items are found first.
Item Based or User Based? Item based CF is generally preferred. The similarity relationship between items is relatively static and stable, because items naturally map into many genres. User based CF is less preferred, because we humans are more complex than a laptop or smart phone (although some marketing folks may disagree). As we grow and go through life experiences, our interests change. Our similarity relationship with other humans, in terms of common interests, is more dynamic and changes over time.
Utility Matrix: A matrix of users and items. Each cell contains a value indicative of the user’s interest level for that item, e.g., a rating. The purpose of the recommendation engine is to predict the values for the empty cells based on the available cell values. The denser the matrix, the better the quality of recommendation, but generally the matrix is sparse. If I have rated item A and I need a recommendation, enough users must have rated A as well as other items.
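One simple way to hold a sparse utility matrix in memory is a dictionary of rows that stores only the filled cells. The sketch below is an illustration under that assumption; the users, items and ratings are made up and are not the matrix from the talk.

```python
# Hypothetical utility matrix: {user: {item: rating}}, missing cells are simply absent
utility = {
    "u1": {"i1": 4, "i3": 2},
    "u2": {"i1": 5, "i2": 3},
    "u3": {"i2": 1},
}
all_items = ["i1", "i2", "i3"]


def density(matrix, item_count):
    """Fraction of user, item cells that hold a rating."""
    filled = sum(len(row) for row in matrix.values())
    return filled / (len(matrix) * item_count)


def empty_cells(matrix, items):
    """Cells the recommendation engine would have to predict."""
    return [(u, i) for u in sorted(matrix) for i in items if i not in matrix[u]]


print(density(utility, len(all_items)))
print(empty_cells(utility, all_items))
```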
Example Utility Matrix
Rating Prediction Example: Let’s say we are interested in predicting r35, i.e., the rating of item i5 for user u3. Item based CF: r35 = (c52 x r32 + c54 x r34) / (c52 + c54), where items i2 and i4 are similar to i5. User based CF: r35 = (c31 x r15 + c32 x r25) / (c31 + c32), where users u1 and u2 are similar to u3. Here cij = similarity coefficient between items i and j or users i and j, and rij = rating of item j by user i.
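Both formulas above are the same similarity-weighted average, so one small function covers them. The sketch below is illustrative; the function name and the numeric values standing in for c52, r32, c54 and r34 are assumptions.

```python
def predict_rating(similar):
    """Weighted average of known ratings.

    similar: list of (similarity coefficient, known rating) pairs,
    e.g. [(c52, r32), (c54, r34)] for item based CF.
    """
    weight_sum = sum(c for c, _r in similar)
    if weight_sum == 0:
        return None  # no usable neighbors
    return sum(c * r for c, r in similar) / weight_sum


# Hypothetical values: c52 = 0.8, r32 = 4.0, c54 = 0.2, r34 = 2.0
r35 = predict_rating([(0.8, 4.0), (0.2, 2.0)])
print(r35)
```

With these numbers the item most similar to i5 dominates, so the prediction lands closer to 4.0 than to 2.0.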
Rating Estimation: In the previous slide, we assumed rating data for each item, user pair was already available through some rating mechanism, a.k.a. explicit rating. However, there may not be a product rating feature available on a site. Even if the rating feature is there, many users may not use it. Even if many users rate, explicit ratings by users tend to be biased. We need a way to estimate ratings based on user behavior on the site and some heuristic, a.k.a. implicit rating.
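To make the idea of an implicit rating concrete, here is a minimal sketch that maps click stream events to a rating and keeps the strongest signal seen for a user, item pair. The event names and scores are purely hypothetical and are not the heuristic used in the talk.

```python
# Hypothetical event-to-rating mapping; these events and scores are
# illustrative assumptions, not values from the talk.
EVENT_RATING = {
    "viewed_product_page": 1,
    "added_to_wish_list": 3,
    "added_to_cart": 4,
    "purchased": 5,
}


def implicit_rating(events):
    """Estimate a rating as the strongest signal seen for one user, item pair."""
    scores = [EVENT_RATING[e] for e in events if e in EVENT_RATING]
    return max(scores) if scores else None


rating = implicit_rating(["viewed_product_page", "added_to_cart"])
print(rating)
```

Taking the maximum is only one possible design; a real heuristic might also weigh how often or how recently each event occurred.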
Heuristics for Rating: An Example
Similarity Computation: For item based CF, the first step is finding similar items. For user based CF, the first step is finding similar users. We will use the Pearson correlation coefficient. It indicates how well a set of data points lies on a straight line. In a 2 dimensional space of 2 items, the ratings of the 2 items by a user form a data point. There are other similarity measures, e.g., Euclidean distance and cosine distance.
Pearson Correlation Coefficient:
c(i,j) = cov(i,j) / (stddev(i) * stddev(j))
cov(i,j) = sum((r(u,i) - av(r(i))) * (r(u,j) - av(r(j)))) / n
stddev(i) = sqrt(sum((r(u,i) - av(r(i))) ** 2) / n)
stddev(j) = sqrt(sum((r(u,j) - av(r(j))) ** 2) / n)
The covariance can also be expressed in this alternative form, which we will be using:
cov(i,j) = sum(r(u,i) * r(u,j)) / n - av(r(i)) * av(r(j))
where
c(i,j) = Pearson correlation coefficient between products i and j
cov(i,j) = covariance of ratings for products i and j
stddev(i) = std deviation of ratings for product i
stddev(j) = std deviation of ratings for product j
r(u,i) = rating by user u for product i
av(r(i)) = average rating for product i over all users that rated it
sum = sum over all users
n = number of data points
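The formulas above translate directly into a short function. This sketch uses the alternative covariance form and, as a simplifying assumption, computes the means and standard deviations over only those users who rated both products, so that n is the same in every term. The function name and sample ratings are made up for the example.

```python
import math


def pearson(ratings_i, ratings_j):
    """Pearson correlation between two products.

    ratings_i, ratings_j: {user: rating} for products i and j.
    Only users who rated both products are used (an assumption of this sketch).
    """
    users = sorted(set(ratings_i) & set(ratings_j))
    n = len(users)
    if n == 0:
        return None
    xs = [ratings_i[u] for u in users]
    ys = [ratings_j[u] for u in users]
    mean_i = sum(xs) / n
    mean_j = sum(ys) / n
    # Alternative covariance form: sum(r(u,i) * r(u,j)) / n - av(r(i)) * av(r(j))
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_i * mean_j
    std_i = math.sqrt(sum((x - mean_i) ** 2 for x in xs) / n)
    std_j = math.sqrt(sum((y - mean_j) ** 2 for y in ys) / n)
    if std_i == 0 or std_j == 0:
        return None  # correlation is undefined when one product has constant ratings
    return cov / (std_i * std_j)


same_trend = pearson({"u1": 1, "u2": 2, "u3": 3}, {"u1": 2, "u2": 4, "u3": 6})
opposite_trend = pearson({"u1": 1, "u2": 2, "u3": 3}, {"u1": 3, "u2": 2, "u3": 1})
print(same_trend, opposite_trend)
```

The alternative covariance form matters for Map Reduce because the sum of rating products can be accumulated one user at a time, while the means and standard deviations are computed separately.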
Map Reduce: We are going to have 2 MR jobs working in tandem for item based CF. Additional preprocessing MR jobs are also necessary to process click stream data. The first MR calculates the correlation for all item pairs, based on rating data; essentially, it finds similar items. The second MR takes the output of the first MR and the rating data for the user in question. Its output is a list of items ranked by predicted rating.
Correlation Map Reduce: It takes two kinds of input. The first kind has an item id pair and the mean and std dev values of the ratings for each of the two items. This is generated by another preprocessor MR. The second kind has the item ratings for all users. This is generated by another preprocessor MR analyzing click stream data. Each row is for one user, along with a variable number of product ratings by that user.
Correlation Mapper Input
Correlation Mapper Output: The mapper produces two kinds of output. The first kind contains {pid1,pid2,0 -> m1,s1,m2,s2}: the mean and std dev for a pid pair. The second kind contains {pid1,pid2,1 -> r1xr2}: the product of the ratings for the pid pair by some user. We append 0 and 1 to the mapper output key for secondary sorting, which ensures that for a given pid pair the reducer receives the value of the first kind of record followed by multiple values of the second kind.
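A mapper along these lines could be sketched as follows. It is plain Python rather than Hadoop code, and the two input line layouts (an "S" prefix for stats lines and an "R" prefix for user rating lines) are hypothetical, since the exact input format is not given in the text.

```python
from itertools import combinations


def correlation_mapper(record):
    """Emit composite-key records for one input line.

    Assumed input layouts (hypothetical):
      stats line:  "S,pid1,pid2,m1,s1,m2,s2"
      rating line: "R,user,pid:rating,pid:rating,..."
    """
    fields = record.split(",")
    if fields[0] == "S":
        pid1, pid2 = fields[1], fields[2]
        m1, s1, m2, s2 = map(float, fields[3:7])
        # Tag 0 sorts first, so the reducer sees the stats before the products.
        yield (pid1, pid2, 0), (m1, s1, m2, s2)
    else:
        # Sort by pid so every pair is emitted under one canonical key order.
        ratings = sorted(
            (pid, float(r)) for pid, r in (f.split(":") for f in fields[2:])
        )
        for (pid1, r1), (pid2, r2) in combinations(ratings, 2):
            # Tag 1: product of this user's ratings for the pid pair.
            yield (pid1, pid2, 1), r1 * r2


stats_out = list(correlation_mapper("S,p1,p2,3.5,1.0,4.0,0.5"))
rating_out = list(correlation_mapper("R,u1,p2:4,p1:3,p3:5"))
print(stats_out)
print(rating_out)
```

A user who rated m products produces m(m-1)/2 pair records, which is where most of the mapper output volume comes from.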
Correlation Mapper Output
Correlation Reducer: The partitioner is based on the first two tokens of the key (pid1,pid2), so that the values for the same pid pair go to the same reducer. The grouping comparator is also on the first two tokens of the key (pid1,pid2), so that all the mapper output for the same pid pair is treated as one group and passed to the reducer in one call. The reducer output is the pid pair and the corresponding correlation coefficient: {pid1,pid2 -> c12}. For a pid pair, the reducer has at its disposal all the data for the Pearson correlation computation.
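The effect of the secondary sort and the grouping comparator can be imitated in plain Python by sorting on the full key and grouping on its first two tokens. The sketch below is an illustration, not Hadoop code; the function names and sample values are assumptions, and it assumes the means and std devs were computed over the same users that contribute rating products.

```python
import math
from itertools import groupby


def correlation_reducer(pid_pair, values):
    """values: the stats tuple first, then the rating products (secondary sort order)."""
    values = iter(values)
    m1, s1, m2, s2 = next(values)
    products = list(values)
    n = len(products)
    if n == 0 or s1 == 0 or s2 == 0:
        return None
    # Alternative covariance form: sum(r1 * r2) / n - m1 * m2
    cov = sum(products) / n - m1 * m2
    return pid_pair, cov / (s1 * s2)


def shuffle_and_reduce(mapper_output):
    # Sorting on the full key puts the tag-0 record ahead of the tag-1 records.
    ordered = sorted(mapper_output, key=lambda kv: kv[0])
    results = {}
    # Grouping on (pid1, pid2) plays the role of the grouping comparator.
    for pair, group in groupby(ordered, key=lambda kv: kv[0][:2]):
        reduced = correlation_reducer(pair, (value for _key, value in group))
        if reduced is not None:
            results[reduced[0]] = reduced[1]
    return results


# Hypothetical mapper output for ratings (1, 2, 3) of p1 and (2, 4, 6) of p2
mapper_output = [
    (("p1", "p2", 1), 2.0),
    (("p1", "p2", 1), 8.0),
    (("p1", "p2", 0), (2.0, math.sqrt(2 / 3), 4.0, math.sqrt(8 / 3))),
    (("p1", "p2", 1), 18.0),
]
correlations = shuffle_and_reduce(mapper_output)
print(correlations)
```

Because the stats record always arrives first, the reducer can stream through the rating products without buffering them; the list here is only for brevity.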
Correlation Reducer Output
Prediction Map Reduce: This is the second MR. It takes the item correlation data, which is the output of the first MR, and the rating data for the target user. We run this MR to make rating predictions and ultimately recommendations for a user. The user rating data is passed to Hadoop as so-called “side data”. The mapper output consists of the pid of an item as the key and, as the value, the rating of the related item multiplied by the correlation coefficient, together with the correlation coefficient itself: {pid1 -> rating(pid3) x c13, c13}.
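A prediction mapper in this spirit might look like the sketch below. The side data is modeled as a plain dictionary, and the rule that only items the user has not yet rated become prediction targets is an assumption of this example, as are the function name and sample values.

```python
def prediction_mapper(correlation_record, user_ratings):
    """Emit (target pid, (weighted rating, correlation)) for unrated items.

    correlation_record: (pid_a, pid_b, c_ab) from the correlation job.
    user_ratings: side data, {pid: rating} for the target user.
    """
    pid_a, pid_b, c = correlation_record
    # A correlation record is symmetric, so try it in both directions.
    for target, related in ((pid_a, pid_b), (pid_b, pid_a)):
        if related in user_ratings and target not in user_ratings:
            yield target, (user_ratings[related] * c, c)


# Hypothetical side data: the target user has rated only p3
user_ratings = {"p3": 4.0}
emitted = list(prediction_mapper(("p1", "p3", 0.5), user_ratings))
print(emitted)
```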
Prediction Mapper Input
Prediction Mapper Output
Prediction Reducer: The reducer gets a pid as the key and a list of tuples as the value. Each tuple consists of the weighted rating of a related item and the corresponding correlation coefficient: {pid1 -> [(rating(pid3) x c31, c31), (rating(pid5) x c51, c51), …]}. The reducer sums up the weighted ratings and divides the sum by the sum of the correlation values. This is the final predicted rating for an item. The reducer output is an item pid and the predicted rating for the item. All that remains is to sort the predicted ratings and use the top n items for making recommendations.
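The reducer arithmetic and the final top n selection can be sketched as below. The function names and sample values are assumptions, and the guard against a zero correlation sum is an addition of this sketch rather than something stated in the talk.

```python
def prediction_reducer(pid, weighted):
    """weighted: list of (rating x correlation, correlation) tuples for one item."""
    total_weighted = sum(w for w, _c in weighted)
    total_corr = sum(c for _w, c in weighted)
    if total_corr == 0:
        return None  # no usable correlation mass for this item
    return pid, total_weighted / total_corr


def top_n(predictions, n):
    """Sort (pid, predicted rating) pairs by rating, highest first."""
    return sorted(predictions, key=lambda p: p[1], reverse=True)[:n]


predicted = prediction_reducer("p1", [(2.0, 0.5), (1.5, 0.5)])
ranked = top_n([("p1", 3.5), ("p2", 4.2), ("p3", 1.0)], 2)
print(predicted, ranked)
```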
Realtime Prediction: We would like to make a recommendation when there is a significant event, e.g., an item gets put in a shopping cart. But Hadoop is an offline batch processing system. How do we circumvent that? We have to precompute and cache the results. There are 2 MR jobs: the Correlation MR to calculate item correlation and the Prediction MR to predict ratings. We should rerun the 2 MR jobs as necessary when a significant change in user item ratings is detected.
Pre Computation: As mentioned earlier, item correlation is relatively stable and only needs to be recomputed when there is a significant change in the utility matrix. The Correlation MR for item similarity should be run only after a significant overall change in the utility matrix has been detected since the last run. For a given user, which is basically a row in the utility matrix, if a significant change is detected, e.g., a new rating by the user for a product is available, we should rerun the rating Prediction MR for that user.
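The precompute-and-cache approach could be organized roughly as in the sketch below. Everything here is hypothetical: the class and function names, the 5% change threshold, and the in-memory dictionary standing in for whatever cache a real deployment would use.

```python
def needs_correlation_rerun(changed_cells, total_cells, threshold=0.05):
    """True when the fraction of changed utility matrix cells reaches the threshold.

    The threshold value is an illustrative assumption.
    """
    if total_cells == 0:
        return False
    return changed_cells / total_cells >= threshold


class PredictionCache:
    """Holds precomputed recommendations per user for real time lookup."""

    def __init__(self):
        self._by_user = {}
        self._stale = set()

    def store(self, user, recommendations):
        # Called with the output of a finished Prediction MR run for this user.
        self._by_user[user] = list(recommendations)
        self._stale.discard(user)

    def mark_rating_changed(self, user):
        # A new rating for this user means the cached predictions are out of date.
        self._stale.add(user)

    def get(self, user):
        # Serve whatever was last computed; never blocks on Hadoop.
        return self._by_user.get(user, [])

    def users_to_recompute(self):
        return sorted(self._stale)


cache = PredictionCache()
cache.store("u1", ["i2", "i5"])
cache.mark_rating_changed("u1")
print(cache.get("u1"), cache.users_to_recompute())
```

The point of the design is that the site reads only from the cache, while the batch jobs refresh it in the background for the users and item pairs that actually changed.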
Cold Start Problem: How do we make a recommendation when a new item is introduced in the inventory or a new user visits the site? For a new item, although we have no user interest data available, we can use content based recommendation. Essentially, it’s a similarity computation based on the attributes of the item only. For a new user (cold user?) the problem is much harder, unless detailed user profile data is available.
Some Temporal Issues: When does an item have enough rating data to be accurately recommendable? How to define the threshold? When are there enough user ratings to be able to get good recommendations? How to define the threshold? How to deal with old ratings, as users’ interests shift with passing time? When is there enough data in the utility matrix to bootstrap the recommendation system?
Resources: My 2 part blog post on this topic at http://pkghosh.wordpress.com; “Programming Collective Intelligence” by Toby Segaran, O’Reilly; “Mining of Massive Datasets” by Anand Rajaraman and Jeffrey Ullman.

Big Data Cloud Meetup - Jan 24 2013 - ZettasetBigDataCloud
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookBigDataCloud
 
What Does Big Data Mean and Who Will Win
What Does Big Data Mean and Who Will WinWhat Does Big Data Mean and Who Will Win
What Does Big Data Mean and Who Will WinBigDataCloud
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBigDataCloud
 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud
 

Más de BigDataCloud (20)

Webinar - Comparative Analysis of Cloud based Machine Learning Platforms
Webinar - Comparative Analysis of Cloud based Machine Learning PlatformsWebinar - Comparative Analysis of Cloud based Machine Learning Platforms
Webinar - Comparative Analysis of Cloud based Machine Learning Platforms
 
Crime Analysis & Prediction System
Crime Analysis & Prediction SystemCrime Analysis & Prediction System
Crime Analysis & Prediction System
 
REAL-TIME RECOMMENDATION SYSTEMS
REAL-TIME RECOMMENDATION SYSTEMS REAL-TIME RECOMMENDATION SYSTEMS
REAL-TIME RECOMMENDATION SYSTEMS
 
Cloud Computing Services
Cloud Computing ServicesCloud Computing Services
Cloud Computing Services
 
Google Enterprise Cloud Platform - Resources & $2000 credit!
Google Enterprise Cloud Platform - Resources & $2000 credit!Google Enterprise Cloud Platform - Resources & $2000 credit!
Google Enterprise Cloud Platform - Resources & $2000 credit!
 
Big Data in the Cloud - Solutions & Apps
Big Data in the Cloud - Solutions & AppsBig Data in the Cloud - Solutions & Apps
Big Data in the Cloud - Solutions & Apps
 
Big Data Analytics in Motorola on the Google Cloud Platform
Big Data Analytics in Motorola on the Google Cloud PlatformBig Data Analytics in Motorola on the Google Cloud Platform
Big Data Analytics in Motorola on the Google Cloud Platform
 
Streak + Google Cloud Platform
Streak + Google Cloud PlatformStreak + Google Cloud Platform
Streak + Google Cloud Platform
 
Using Advanced Analyics to bring Business Value
Using Advanced Analyics to bring Business Value Using Advanced Analyics to bring Business Value
Using Advanced Analyics to bring Business Value
 
Creating Business Value from Big Data, Analytics & Technology.
Creating Business Value from Big Data, Analytics & Technology.Creating Business Value from Big Data, Analytics & Technology.
Creating Business Value from Big Data, Analytics & Technology.
 
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher ManningDeep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
Deep Learning for NLP (without Magic) - Richard Socher and Christopher Manning
 
Recommendation Engines - An Architectural Guide
Recommendation Engines - An Architectural GuideRecommendation Engines - An Architectural Guide
Recommendation Engines - An Architectural Guide
 
Why Hadoop is the New Infrastructure for the CMO?
Why Hadoop is the New Infrastructure for the CMO?Why Hadoop is the New Infrastructure for the CMO?
Why Hadoop is the New Infrastructure for the CMO?
 
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, Pivotal
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, PivotalHadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, Pivotal
Hadoop : A Foundation for Change - Milind Bhandarkar Chief Scientist, Pivotal
 
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDBBig Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
Big Data Cloud Meetup - Jan 29 2013 - Mike Stonebraker & Scott Jarr of VoltDB
 
Big Data Cloud Meetup - Jan 24 2013 - Zettaset
Big Data Cloud Meetup - Jan 24 2013 - ZettasetBig Data Cloud Meetup - Jan 24 2013 - Zettaset
Big Data Cloud Meetup - Jan 24 2013 - Zettaset
 
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at FacebookA Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
A Survey of Petabyte Scale Databases and Storage Systems Deployed at Facebook
 
What Does Big Data Mean and Who Will Win
What Does Big Data Mean and Who Will WinWhat Does Big Data Mean and Who Will Win
What Does Big Data Mean and Who Will Win
 
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of SybaseBig Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
Big Data Analytics in a Heterogeneous World - Joydeep Das of Sybase
 
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentationBigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
BigDataCloud meetup Feb 16th - Microsoft's Saptak Sen's presentation
 

Último

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 

Último (20)

Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 

Recommendation Engine Powered by Hadoop - Pranab Ghosh

  • 1. Recommendation Engine Powered by Hadoop. Pranab Ghosh, pkghosh@yahoo.com. August 11th 2011 Meetup
  • 2. About me: Started with numerical computation on mainframes, followed by many years of C and C++ systems and real-time programming, then many years of Java, JEE and enterprise applications. Worked for Oracle, HP, Yahoo, Motorola, and many startups and mid-size companies. Currently a Big Data consultant using Hadoop and other cloud-related technologies. Interested in distributed computation, Big Data, NoSQL databases and data mining.
  • 3. Hadoop: Functional programming and parallel processing join hands to create Hadoop. It is essentially a parallel processing framework running on a cluster of commodity machines. The programming model is stateless and functional, because the processing of each row of data does not depend on any other row or on any shared state. Parallelism is divide and conquer: the data is partitioned, and each partition is processed by a separate mapper or reducer task.
  • 4. More About Hadoop: Data locality, at least for the mapper: code is shipped to where the data partition resides. Data is replicated, partitioned and stored in the Hadoop Distributed File System (HDFS). Mapper output: {k -> v}. Reducer input: {k -> List(v)}. Reducer output: {k -> v}. There is a many-to-many shuffle between mapper output and reducer input, which causes a lot of network IO. A simple paradigm, but it solves a surprisingly wide array of problems.
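
The key/value flow on this slide can be sketched in a few lines of Python. This is a single-process simulation for illustration only, not Hadoop code; the names `run_map_reduce`, `rating_mapper` and `mean_reducer`, and the sample records, are assumptions introduced here.

```python
from collections import defaultdict

def run_map_reduce(records, mapper, reducer):
    """Simulate one MapReduce job in a single process."""
    shuffled = defaultdict(list)
    # Map phase: each record is processed independently and emits (k, v) pairs.
    for record in records:
        for key, value in mapper(record):
            # Shuffle phase: every value is grouped under its key.
            shuffled[key].append(value)
    # Reduce phase: each key is reduced together with its full list of values.
    output = {}
    for key in sorted(shuffled):
        for out_key, out_value in reducer(key, shuffled[key]):
            output[out_key] = out_value
    return output

def rating_mapper(record):
    # Hypothetical input record: (user, item, rating). Emits item -> rating.
    _user, item, rating = record
    yield item, rating

def mean_reducer(item, ratings):
    # Receives item -> List(rating) and emits item -> mean rating.
    yield item, sum(ratings) / len(ratings)

ratings = [("u1", "i1", 4), ("u2", "i1", 2), ("u1", "i2", 5)]
item_means = run_map_reduce(ratings, rating_mapper, mean_reducer)
print(item_means)  # {'i1': 3.0, 'i2': 5.0}
```

In a real cluster the map and reduce calls run on different machines, and the grouping step is the network-heavy shuffle mentioned on the slide.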
  • 5. Recommendation Engine: Needs no introduction; you know it if you have visited Amazon or Netflix. We love it when they get it right and hate it otherwise. Very computationally intensive, which makes it a good fit for Hadoop. In memory-based recommendation engines, the entire data set is used directly, e.g., collaborative filtering and content-based recommendation. In model-based recommendation, a model is first built by training on the data and predictions are then made from it, e.g., Bayesian, clustering.
  • 6. Content Based Recommendation: A memory-based system, based purely on the attributes of an item. An item with p attributes is treated as a point in a p-dimensional space. Uses a nearest-neighbor approach: similar items are found by measuring distance in the p-dimensional space. Useful for addressing the cold start problem, i.e., when a new item is introduced into the inventory. Computationally intensive, so not very useful for real-time recommendation.
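
As a minimal sketch of the nearest-neighbor idea, the snippet below treats each item as a point and ranks the other items by Euclidean distance. The item names, the three attribute values per item, and the assumption that attributes are already numeric and scaled to a comparable range are all illustrative, not from the talk.

```python
import math

def euclidean_distance(a, b):
    """Distance between two points in p-dimensional attribute space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_items(target_id, items, k=2):
    """Return the ids of the k items closest to the target item."""
    target = items[target_id]
    distances = [
        (euclidean_distance(target, attrs), item_id)
        for item_id, attrs in items.items()
        if item_id != target_id
    ]
    return [item_id for _dist, item_id in sorted(distances)[:k]]

# Hypothetical inventory: each item is a point in a 3-dimensional space.
items = {
    "laptop_a": (0.9, 0.2, 0.5),
    "laptop_b": (0.8, 0.3, 0.5),
    "phone_a": (0.2, 0.9, 0.1),
    "phone_b": (0.1, 0.8, 0.2),
}
print(nearest_items("laptop_a", items, k=1))  # ['laptop_b']
```

The brute-force scan compares the target with every other item, which is one way to see why the slide calls the approach computationally intensive.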
  • 7. Model Based Recommendation: Based on the traditional machine learning approach. In contrast to memory-based algorithms, it builds a learned model using the ratings as training data. The model is built offline as a batch process and saved. The model needs to be rebuilt when a significant change in the data is detected. Once the trained model is available, making a recommendation is quick, so the approach is effective for real-time recommendation.
  • 8. Collaborative Filtering: In a collaborative filtering based recommendation engine, recommendations are based not only on the user's own ratings but also on ratings by other users for the same item and for other items, hence the name. It requires social data, i.e., each user's interest level in an item. This can be explicit, e.g., a product rating, or implicit, inferred from the user's interaction and behavior on a site. A more appropriate name might be user intent based recommendation engine. There are two approaches: in user-based CF, similar users are found first; in item-based CF, similar items are found first.
  • 9. Item Based or User Based? Item-based CF is generally preferred. The similarity relationship between items is relatively static and stable, because items naturally map into genres. User-based CF is less preferred, because we humans are more complex than a laptop or smartphone (although some marketing folks may disagree). As we grow and go through life experiences, our interests change, so our similarity to other people in terms of common interests is more dynamic and changes over time.
  • 10. Utility Matrix: A matrix of users and items. Each cell contains a value indicating the user's interest level in that item, e.g., a rating. The purpose of a recommendation engine is to predict the values of the empty cells from the available cell values. The denser the matrix, the better the quality of recommendation, but in practice the matrix is generally sparse. If I have rated item A and need a recommendation, enough other users must have rated A as well as other items.
  • 11. Example Utility Matrix
  • 12. Rating Prediction Example: Say we want to predict r35, i.e., the rating of item i5 by user u3. Item-based CF: r35 = (c52 x r32 + c54 x r34) / (c52 + c54), where items i2 and i4 are similar to i5. User-based CF: r35 = (c31 x r15 + c32 x r25) / (c31 + c32), where users u1 and u2 are similar to u3. Here cij is the similarity coefficient between items i and j (or users i and j), and rij is the rating of item j by user i.
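
Both formulas on this slide are the same similarity-weighted average, so one small function covers them. The sketch below assumes the similarities and ratings are held in plain dictionaries; the function name `weighted_rating` and the sample numbers are made up for illustration.

```python
def weighted_rating(ratings, similarities):
    """Similarity-weighted average of known ratings.

    ratings:      {neighbor: known rating}
    similarities: {neighbor: similarity coefficient c to the target}
    For item-based CF the neighbors are items rated by the target user;
    for user-based CF they are users who rated the target item.
    """
    common = [n for n in similarities if n in ratings]
    weight_sum = sum(similarities[n] for n in common)
    if weight_sum == 0:
        return None  # no usable neighbors, so no prediction
    return sum(similarities[n] * ratings[n] for n in common) / weight_sum

# Item-based prediction of r35 with hypothetical values:
# r32 = 4, r34 = 2, c52 = 0.8, c54 = 0.4
r35 = weighted_rating({"i2": 4, "i4": 2}, {"i2": 0.8, "i4": 0.4})
print(round(r35, 3))  # 3.333
```

The more similar item i2 pulls the prediction toward its rating of 4. Note that a Pearson coefficient can be negative; the sketch follows the slide's formula as written and does not treat that case specially.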
  • 13. Rating Estimation: In the previous slide we assumed that rating data for each user, item pair was already available through some rating mechanism, a.k.a. explicit rating. However, a site may not have a product rating feature. Even if it does, many users may not use it, and even when many users do rate, explicit ratings tend to be biased. We need a way to estimate ratings from user behavior on the site plus some heuristics, a.k.a. implicit rating.
  • 14. Heuristics for Rating: An Example
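
The heuristics table on this slide is an image and is not reproduced in the transcript. As a purely hypothetical illustration of the idea of implicit rating, the sketch below maps click stream events to rating values and keeps the strongest signal seen for each user, item pair. The event names and scores are assumptions made up here, not the heuristics from the talk.

```python
# Hypothetical event-to-rating table. The event names and values are
# illustrative assumptions, not the heuristics shown in the talk.
EVENT_RATING = {"viewed": 1, "searched": 2, "added_to_cart": 3, "purchased": 5}

def estimate_ratings(click_stream):
    """Turn (user, item, event) records into {(user, item): implicit rating}."""
    ratings = {}
    for user, item, event in click_stream:
        if event not in EVENT_RATING:
            continue  # ignore events the heuristic does not know about
        key = (user, item)
        # Keep the strongest signal observed for this user and item.
        ratings[key] = max(ratings.get(key, 0), EVENT_RATING[event])
    return ratings

click_stream = [
    ("u1", "p1", "viewed"),
    ("u1", "p1", "purchased"),
    ("u2", "p1", "added_to_cart"),
    ("u1", "p2", "scrolled"),
]
implicit = estimate_ratings(click_stream)
```

Here `implicit` holds a rating of 5 for ("u1", "p1") and 3 for ("u2", "p1"), while the unknown "scrolled" event produces no rating.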
  • 15. Similarity Computation: For item-based CF, the first step is finding similar items; for user-based CF, the first step is finding similar users. We will use the Pearson correlation coefficient, which indicates how closely a set of data points lies along a straight line. In the 2-dimensional space of 2 items, the ratings of the 2 items by one user form a data point. There are other similarity measures, e.g., Euclidean distance and cosine distance.
  • 16. Pearson Correlation Coefficient:
      c(i,j) = cov(i,j) / (stddev(i) * stddev(j))
      cov(i,j) = sum((r(u,i) - av(r(i))) * (r(u,j) - av(r(j)))) / n
      stddev(i) = sqrt(sum((r(u,i) - av(r(i))) ** 2) / n)
      stddev(j) = sqrt(sum((r(u,j) - av(r(j))) ** 2) / n)
    The covariance can also be expressed in this alternative form, which we will be using:
      cov(i,j) = sum(r(u,i) * r(u,j)) / n - av(r(i)) * av(r(j))
    where
      c(i,j) = Pearson correlation coefficient between products i and j
      cov(i,j) = covariance of ratings for products i and j
      stddev(i) = std deviation of ratings for product i
      stddev(j) = std deviation of ratings for product j
      r(u,i) = rating by user u for product i
      av(r(i)) = average rating for product i over all users that rated it
      sum = sum over all users
      n = number of data points
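
A minimal in-memory sketch of these formulas, using the alternative covariance form. One simplifying assumption is made here that the slide does not state: the means and standard deviations are taken over only the users who rated both products, so that every term uses the same n. The function name `pearson` and the sample ratings are illustrative.

```python
import math

def pearson(ratings_i, ratings_j):
    """Pearson correlation between two products.

    ratings_i, ratings_j: {user: rating} for product i and product j.
    Assumption: all statistics use only users who rated both products.
    """
    users = [u for u in ratings_i if u in ratings_j]
    n = len(users)
    if n == 0:
        return None
    mean_i = sum(ratings_i[u] for u in users) / n
    mean_j = sum(ratings_j[u] for u in users) / n
    # cov(i,j) = sum(r(u,i) * r(u,j)) / n - av(r(i)) * av(r(j))
    cov = sum(ratings_i[u] * ratings_j[u] for u in users) / n - mean_i * mean_j
    var_i = sum(ratings_i[u] ** 2 for u in users) / n - mean_i ** 2
    var_j = sum(ratings_j[u] ** 2 for u in users) / n - mean_j ** 2
    if var_i <= 0 or var_j <= 0:
        return None  # constant ratings: correlation is undefined
    return cov / math.sqrt(var_i * var_j)

product_a = {"u1": 1, "u2": 2, "u3": 3}
product_b = {"u1": 2, "u2": 4, "u3": 6}
product_c = {"u1": 6, "u2": 4, "u3": 2}
c_ab = pearson(product_a, product_b)  # ratings rise together: close to +1
c_ac = pearson(product_a, product_c)  # ratings move oppositely: close to -1
```

The alternative covariance form matters for MapReduce because the sum of rating products can be accumulated one user at a time, without knowing the means in advance.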
  • 17. Map Reduce: We use 2 MR jobs working in tandem for item-based CF. Additional preprocessing MR jobs are also needed to process click stream data. The first MR calculates the correlation for all item pairs from the rating data; essentially, it finds similar items. The second MR takes the output of the first MR and the rating data for the user in question. Its output is a list of items ranked by predicted rating.
  • 18. Correlation Map Reduce: It takes two kinds of input. The first kind has an item id pair and the mean and std dev of the ratings for each of the two items; it is generated by a preprocessor MR. The second kind has the item ratings for all users; it is generated by another preprocessor MR that analyzes click stream data. Each row is for one user and holds a variable number of product ratings by that user.
  • 19. Correlation Mapper Input
  • 20. Correlation Mapper Output: The mapper produces two kinds of output. The first kind is {pid1,pid2,0 -> m1,s1,m2,s2}: the mean and std dev for each item of a pid pair. The second kind is {pid1,pid2,1 -> r1 x r2}: the product of the two ratings for the pid pair by some user. We append 0 and 1 to the mapper output key for secondary sorting, which ensures that for a given pid pair the reducer receives the value of the first kind followed by the multiple values of the second kind.
  • 21. Correlation Mapper Output
  • 22. Correlation Reducer: The partitioner is based on the first two tokens of the key (pid1,pid2), so that the values for the same pid pair go to the same reducer. The grouping comparator is also on the first two tokens of the key (pid1,pid2), so that all the mapper output for the same pid pair is treated as one group and passed to the reducer in one call. For a pid pair, the reducer therefore has at its disposal all the data needed for the Pearson correlation computation. The reducer output is the pid pair and the corresponding correlation coefficient: {pid1,pid2 -> c12}.
  • 23. Correlation Reducer Output
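
The correlation job described on the preceding slides can be simulated in one process to show how the secondary sort works. This is a sketch under stated assumptions, not the author's Hadoop code: sorting on the full key stands in for the shuffle, `itertools.groupby` on the first two key tokens stands in for the grouping comparator, and every pid pair is assumed to have a statistics record. All function names and the sample data are introduced here.

```python
import math
from itertools import combinations, groupby

def stats_mapper(stats_row):
    # First kind of output: {pid1,pid2,0 -> m1,s1,m2,s2}
    pid1, pid2, m1, s1, m2, s2 = stats_row
    yield (pid1, pid2, 0), (m1, s1, m2, s2)

def rating_mapper(user_ratings):
    # Second kind of output: {pid1,pid2,1 -> r1 x r2}, one per pid pair per user.
    for pid1, pid2 in combinations(sorted(user_ratings), 2):
        yield (pid1, pid2, 1), user_ratings[pid1] * user_ratings[pid2]

def correlation_reducer(pid_pair, values):
    values = iter(values)
    # Thanks to the secondary sort, the statistics record arrives first.
    m1, s1, m2, s2 = next(values)
    products = list(values)
    n = len(products)
    if n == 0 or s1 == 0 or s2 == 0:
        return pid_pair, None
    cov = sum(products) / n - m1 * m2
    return pid_pair, cov / (s1 * s2)

def run_correlation_job(stats_rows, rating_rows):
    emitted = []
    for row in stats_rows:
        emitted.extend(stats_mapper(row))
    for row in rating_rows:
        emitted.extend(rating_mapper(row))
    # Sorting on the full key puts the 0-suffixed record ahead of the 1-suffixed ones.
    emitted.sort(key=lambda kv: kv[0])
    result = {}
    # Grouping on (pid1, pid2) only, so each pair reaches the reducer in one call.
    for pair, group in groupby(emitted, key=lambda kv: kv[0][:2]):
        pid_pair, c = correlation_reducer(pair, (v for _k, v in group))
        result[pid_pair] = c
    return result

# Hypothetical data: three users who all rated p1 and p2.
rating_rows = [{"p1": 1, "p2": 2}, {"p1": 2, "p2": 4}, {"p1": 3, "p2": 6}]
stats_rows = [("p1", "p2", 2.0, math.sqrt(2 / 3), 4.0, math.sqrt(8 / 3))]
correlations = run_correlation_job(stats_rows, rating_rows)
```

For this data `correlations[("p1", "p2")]` comes out at approximately 1.0, since the p2 ratings are exactly twice the p1 ratings.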
  • 24. Prediction Map Reduce: This is the second MR. It takes the item correlation data, which is the output of the first MR, and the rating data for the target user. We run this MR to predict ratings and ultimately make recommendations for a user. The user rating data is passed to Hadoop as so-called "side data". The mapper output has the pid of an item as the key; the value is the rating of a related item multiplied by the correlation coefficient, together with the correlation coefficient itself: {pid1 -> rating(pid3) x c13, c13}.
  • 25. Prediction Mapper Input
  • 26. Prediction Mapper Output
  • 27. Prediction Reducer: The reducer gets a pid as the key and a list of tuples as the value. Each tuple consists of the weighted rating of a related item and the corresponding correlation coefficient: {pid1 -> [(rating(pid3) x c31, c31), (rating(pid5) x c51, c51), ...]}. The reducer sums the weighted ratings and divides the sum by the sum of the correlation values. This is the final predicted rating for the item. The reducer output is the item pid and its predicted rating. All that remains is to sort the predicted ratings and use the top n items for making recommendations.
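
The prediction job can be sketched the same way, in a single process. Two details are assumptions added for this illustration rather than statements from the slides: each correlation row is used in both directions, and items the user has already rated are not predicted. The function names and sample values are hypothetical.

```python
from collections import defaultdict

def prediction_mapper(correlation_row, user_ratings):
    """Emit {pid -> (rating(related) x c, c)} for items related to rated ones.

    user_ratings plays the role of the side data for the target user.
    """
    pid1, pid2, c = correlation_row
    for target, related in ((pid1, pid2), (pid2, pid1)):
        # Assumption: only predict items the user has not rated yet.
        if related in user_ratings and target not in user_ratings:
            yield target, (user_ratings[related] * c, c)

def prediction_reducer(pid, tuples):
    """Divide the sum of weighted ratings by the sum of correlations."""
    weighted_sum = sum(weighted for weighted, _c in tuples)
    corr_sum = sum(c for _weighted, c in tuples)
    if corr_sum == 0:
        return pid, None
    return pid, weighted_sum / corr_sum

def recommend(correlations, user_ratings, top_n=2):
    grouped = defaultdict(list)
    for row in correlations:
        for pid, value in prediction_mapper(row, user_ratings):
            grouped[pid].append(value)
    predictions = [prediction_reducer(pid, vals) for pid, vals in grouped.items()]
    ranked = sorted(
        (p for p in predictions if p[1] is not None),
        key=lambda p: p[1],
        reverse=True,
    )
    return ranked[:top_n]

# Hypothetical correlation output and ratings for one target user.
correlations = [("p1", "p3", 0.8), ("p1", "p5", 0.4), ("p2", "p3", 0.5)]
user_ratings = {"p3": 4, "p5": 2}
top_items = recommend(correlations, user_ratings)
```

With these numbers p2 is predicted at 4.0 and p1 at about 3.33, so `top_items` lists p2 first.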
  • 28. Realtime Prediction: We would like to make a recommendation when there is a significant event, e.g., an item is put in a shopping cart. But Hadoop is an offline batch processing system. How do we get around that? We have to pre-compute and cache the results. There are 2 MR jobs: the Correlation MR to calculate item correlation and the Prediction MR to predict ratings. We should re-run the 2 MR jobs as necessary when a significant change in the user-item ratings is detected.
  • 29. Pre Computation: As mentioned earlier, item correlation is relatively stable and only needs to be recomputed when there is a significant change in the utility matrix. The Correlation MR for item similarity should be run only after a significant overall change in the utility matrix has been detected since the last run. A given user is basically a row in the utility matrix; if a significant change is detected in that row, e.g., a new rating by the user for a product, we should re-run the rating Prediction MR for that user.
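
One way to picture this re-run policy is a small tracker that counts changed cells in the utility matrix and remembers which users have new ratings. The slides do not define what counts as a significant change, so the fraction-of-cells threshold, the class name `RecomputeTracker` and its methods are all hypothetical.

```python
class RecomputeTracker:
    """Track utility matrix changes to decide which MR job to re-run.

    Assumption: "significant overall change" is modeled as the fraction of
    changed cells reaching a threshold; the talk does not specify a rule.
    """

    def __init__(self, total_cells, threshold=0.25):
        self.total_cells = total_cells  # must be greater than zero
        self.threshold = threshold
        self.changed_cells = 0
        self.dirty_users = set()

    def record_rating(self, user):
        # A new or updated rating changes one cell in this user's row.
        self.changed_cells += 1
        self.dirty_users.add(user)

    def correlation_job_due(self):
        return self.changed_cells / self.total_cells >= self.threshold

    def users_needing_prediction(self):
        return sorted(self.dirty_users)

    def mark_correlation_run(self):
        self.changed_cells = 0

    def mark_prediction_run(self, user):
        self.dirty_users.discard(user)

tracker = RecomputeTracker(total_cells=8, threshold=0.25)
tracker.record_rating("u1")
due_after_one = tracker.correlation_job_due()   # 1 of 8 cells changed
tracker.record_rating("u2")
due_after_two = tracker.correlation_job_due()   # 2 of 8 cells changed
```

The Prediction MR is cheap to re-run per user, while the Correlation MR waits until enough of the matrix has changed, which matches the split described on the slide.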
  • 30. Cold Start Problem: How do we make recommendations when a new item is introduced into the inventory or a new user visits the site? For a new item, although we have no user interest data, we can use content-based recommendation; essentially, it is a similarity computation based only on the attributes of the item. For a new user (cold user?) the problem is much harder, unless detailed user profile data is available.
  • 31. Some Temporal Issues: When does an item have enough rating data to be accurately recommendable, and how do we define that threshold? When does a user have enough ratings to get good recommendations, and how do we define that threshold? How do we deal with old ratings as users' interests shift over time? When is there enough data in the utility matrix to bootstrap the recommendation system?
  • 32. Resources: My 2-part blog post on this topic at http://pkghosh.wordpress.com; "Programming Collective Intelligence" by Toby Segaran, O'Reilly; "Mining of Massive Datasets" by Anand Rajaraman and Jeffrey Ullman.