SlideShare una empresa de Scribd logo
1 de 53
Descargar para leer sin conexión
Recommendations
  in Production
     Alex MacCaw
Netflix Prize
Amazon.com
Facebook
Last.fm
StumbleUpon
Google Suggest
iTunes
Rotten Tomatoes
Yelp
Google Search
Chicken or Egg
• Google Reader
• IMDB
Acts As
Recommendable
Types of
   recommendations

• Content Based
• User Based
• Item Based
Programming Collective Intelligence
Has Many Through Relationship
User            Has Many Through
                                              Book

  Has Many                              Has Many




             UserBooks
              Can have score (rating)
User

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books
end
Gives you


User#similar_users
User#recommended_books
Book#similar_books
The algorithms
• Manhattan Distance
• Euclidean distance
• Cosine
• Pearson correlation coefficient
• Jaccard
• Levenshtein
How does it work?
Strategy

• Map data into Euclidean Space
• Calculate similarity
• Use similarities to recommend
The Black   John Tucker
          Knight      Must Die
James       4            5

Jonah       3            2

George      5            3

 Alex       4            2
5.00

                   3.75
The Black Knight
                   2.50

                   1.25

                     0
                          0   1.25   2.50   3.75   5.00



                      John Tucker Must Die
5.00

                   3.75
The Black Knight
                   2.50

                   1.25

                     0
                          0   1.25   2.50   3.75   5.00



                      John Tucker Must Die
item id
{
                        user id
    1 => {
       1 => 1.0,
       2 => 0.0,                  score
       ...
    },
    ...
}
[[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
Problem 1

It was far too slow to calculate on the fly
                 (obvious)
SELECT   * FROM quot;usersquot;   WHERE (quot;usersquot;.quot;idquot; = 2)
SELECT   * FROM quot;booksquot;
SELECT   * FROM quot;usersquot;
SELECT   quot;user_booksquot;.*   FROM quot;user_booksquot; WHERE (quot;user_booksquot;.user_id IN (1,2,3,4,5,6,7,8,9,10))
SELECT   * FROM quot;booksquot;   WHERE (quot;booksquot;.quot;idquot; IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5))
SELECT   * FROM quot;booksquot;   WHERE (quot;booksquot;.quot;idquot; IN (20,3,19,6))




                                              All books             All user_books
Solution
      Cache the dataset
        Build offline


rake recommendations:build
SELECT   *   FROM   quot;user_booksquot; WHERE (quot;user_booksquot;.user_id = 2)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 5)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 4)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 8)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 7)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 2)
SELECT   *   FROM   quot;booksquot; WHERE (quot;booksquot;.quot;idquot; = 1)
Problem 2

Fetching the dataset took too
 long since it was so massive
Solution


Split up the cache by item
Rails.cache.write(
   quot;aar_books_1quot;, scores
 )
Problem 3

The dataset was so big it
     crashed Ruby!
Solution


Get rid of ActiveRecord

Only deal with integers
items = options[:on_class].connection.select_values(
  quot;SELECT id from #{options[:on_class].table_name}quot;
  ).collect(&:to_i)
Problem 4


It still crashed Ruby!
{
    1 => {
       1 => 1.0,
       2 => 0.0,
       ...
    },
    ...
}
Solution

Remove unnecessary
 cruft from dataset
{
    1 => {
       1 => 1.0,
       ...
    },
    ...
}
Problem 5

It was too slow
Solution


Re-write the slow bits in C
Details

• RubyInline
• Implemented Pearson
• Monkey patched original Ruby methods
• Very fast
InlineC = Module.new do
    inline do |builder|
      builder.c '
      #include <math.h>
      #include quot;ruby.hquot;
      double c_sim_pearson(VALUE items) {



Ruby Object
InlineC = Module.new do
        inline do |builder|
          builder.c '
          #include <math.h>
          #include quot;ruby.hquot;
          double c_sim_pearson(VALUE items) {




No Floats :(
Hash Lookup

if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
  prefs1_item = 0.0;
} else {
  prefs1_item = NUM2DBL(prefs1_item_ob);
}
Conversion

return num / den;
Design Designs

• Not too many relationships
• Not to many ‘items’
• Similarity matrix for items, not users
Changing data
Scaling Even Further


• K Means clustering
• Split cluster by category
Adding ratings
ActiveRecord::Schema.define(:version => 1) do
  create_table quot;booksquot;, :force => true do |t|
    t.string   quot;namequot;
    t.datetime quot;created_atquot;
    t.datetime quot;updated_atquot;
  end

 create_table quot;user_booksquot;, :force => true do |t|
   t.integer quot;user_idquot;,    :null => false
   t.integer quot;book_idquot;,    :null => false
   t.integer quot;ratingquot;,     :default => 0
 end

  create_table   quot;usersquot;, :force => true do |t|
    t.string     quot;namequot;
    t.datetime   quot;created_atquot;
    t.datetime   quot;updated_atquot;
  end
end
class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books, :score => :rating
end
That’s it
Improvements?


• Better API
• Perform calculations over a cluster (EC2)
  using Map/Nanite
class AARN < Nanite::Actor
  expose :sim_pearson

  def sim_pearson(item1, item2)
    Optimizations.c_sim_pearson(item1, item2)
  end
end
Questions?
              http://eribium.org/blog

           twitter : maccman
       email/jabber: maccman@gmail.com



http://github.com/maccman/acts_as_recommendable
             http://rubyurl.com/kUpk

Más contenido relacionado

Similar a Acts As Recommendable

development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...ActsAsCon
 
Relaxing With CouchDB
Relaxing With CouchDBRelaxing With CouchDB
Relaxing With CouchDBleinweber
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)MongoSF
 
Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Dave Bouwman
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsSerge Smetana
 
Symfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsSymfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsIgnacio Martín
 
Automated Frontend Testing
Automated Frontend TestingAutomated Frontend Testing
Automated Frontend TestingNeil Crosby
 
Automated Testing with Ruby
Automated Testing with RubyAutomated Testing with Ruby
Automated Testing with RubyKeith Pitty
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Railsfreelancing_god
 
Esoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in RubyEsoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in Rubymametter
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanWei-Yuan Chang
 
JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformationLars Marius Garshol
 
Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Alexey Grigorev
 
Building unit tests correctly
Building unit tests correctlyBuilding unit tests correctly
Building unit tests correctlyDror Helper
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Michael Peacock
 

Similar a Acts As Recommendable (20)

development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
development, ruby, conferences, frameworks, ruby on rails, confreaks, actsasc...
 
Relaxing With CouchDB
Relaxing With CouchDBRelaxing With CouchDB
Relaxing With CouchDB
 
Sphinx on Rails
Sphinx on RailsSphinx on Rails
Sphinx on Rails
 
Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)Schema design with MongoDB (Dwight Merriman)
Schema design with MongoDB (Dwight Merriman)
 
Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)Rocky Mountain URISA Talk (June 2008)
Rocky Mountain URISA Talk (June 2008)
 
Performance Optimization of Rails Applications
Performance Optimization of Rails ApplicationsPerformance Optimization of Rails Applications
Performance Optimization of Rails Applications
 
Symfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worldsSymfony & Javascript. Combining the best of two worlds
Symfony & Javascript. Combining the best of two worlds
 
Cooking with Chef
Cooking with ChefCooking with Chef
Cooking with Chef
 
Automated Frontend Testing
Automated Frontend TestingAutomated Frontend Testing
Automated Frontend Testing
 
Automated Testing with Ruby
Automated Testing with RubyAutomated Testing with Ruby
Automated Testing with Ruby
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Rails
 
Esoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in RubyEsoteric, Obfuscated, Artistic Programming in Ruby
Esoteric, Obfuscated, Artistic Programming in Ruby
 
Python fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuanPython fundamentals - basic | WeiYuan
Python fundamentals - basic | WeiYuan
 
Couch db and_the_web
Couch db and_the_webCouch db and_the_web
Couch db and_the_web
 
JSLT: JSON querying and transformation
JSLT: JSON querying and transformationJSLT: JSON querying and transformation
JSLT: JSON querying and transformation
 
Cucumber
CucumberCucumber
Cucumber
 
MongoDB
MongoDBMongoDB
MongoDB
 
Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)Duplicates everywhere (Kiev)
Duplicates everywhere (Kiev)
 
Building unit tests correctly
Building unit tests correctlyBuilding unit tests correctly
Building unit tests correctly
 
Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012Dealing with Continuous Data Processing, ConFoo 2012
Dealing with Continuous Data Processing, ConFoo 2012
 

Último

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Último (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Acts As Recommendable