SlideShare una empresa de Scribd logo
1 de 36
Descargar para leer sin conexión
BIG DATA FOR EBOOKS AND EREADERS 
INMAR GIVONI, PHD 
VP, BIG DATA, KOBO 
IGIVONI@KOBO.COM
A BIT ABOUT KOBO
A LOT OF BITS OF KOBO DATA
A LOT OF BITS OF KOBO DATA
TECHNOLOGY/TOOLS USED BY KOBO BIG DATA 
•Processing/Streaming: Hadoop, Storm, Flume 
•Storage/streaming: Hive, sql, redis, couchbase 
•Search: Solr+plugins 
•Languages: Python, Java, C++, R
ABOUT KOBO’S BIG DATA TEAM
BIG DATA IS LIKE TEENAGE SEX: EVERYONE TALKS ABOUT IT, NOBODY REALLY KNOWS HOW TO DO IT, EVERYONE THINKS EVERYONE ELSE IS DOING IT, SO EVERYONE CLAIMS THEY ARE DOING IT... 
DAN ARIELY
SEARCH
RECOMMENDATIONS
RELATED ITEMS THROUGH CONTENT ANALYSIS
SIMILARITY ANALYSIS
ADULT CONTENT CLASSIFICATION
BEYOND THE BOOK
•FIND INTERESTING THINGS IN A BOOK 
•RELATE THEM TO INTERESTING CONTENT 
FROM THE INTERNET
1. Key-term detection 
2. Disambiguation 
3. Filtering
KEY-TERM EXTRACTION 
•Approach: TF-IDF top terms 
•Only to Wikipedia terms 
•Greedy up-to-5-gram matching
DISAMBIGUATING KEY-TERMS 
•Problem : Disambiguation of item to Wikipedia articles mapping 
•Solution : choose Wikipedia articles that make sense collectively [1] 
[1] Local and Global Algorithms for Disambiguation to Wikipedia, ACL 2011
Gandalf 
Hobbit 
Bilbo Baggins
Gandalf 
Hobbit 
Bilbo Baggins
Gandalf 
Hobbit 
Bilbo Baggins
WIKIPEDIA TERM SIMILARITY 
•Using Google Similarity Distance
IN MORE TECHNICAL TERMS… 
•Max-weight k-clique in a k-partite graph problem. 
•We used a greedy approach
SOME SCREEN SHOTS
SOME SCREEN SHOTS
SOME SCREEN SHOTS
PAGE LAYOUT OPTIMIZATION
•MANY WAYS OF SHOWING CONTENT ON THE KOBO WEBPAGE 
•FIND BEST LAYOUT
MAIN PROBLEMS 
•Automatic generation of layouts 
•Finding optimal layout
LAYOUT GENERATION 
•Local search 
•Start from a handful of expert generated layouts 
•Use widget statistics to make informed swap/exchange steps
OPTIMIZATION FRAMEWORK – EXPLORE/EXPLOIT 
•Several actions an agent can take 
•Agent takes action, and observes payoff
•Pick a policy for choosing action with good trade-off 
•If you always take the best one so far, may be missing on better options 
•If you always explore new things you losses accumulate 
OPTIMIZATION FRAMEWORK – EXPLORE/EXPLOIT
MULTI-ARMED BANDIT MODEL 
•Maximize total expected reward 
•Bayesian framework (Agarwal et al [1]) 
•Context Sensitive Variant (Li et al [2]) 
[1] Explore/Exploit schemes for web content optimization, Yahoo Research, ICDM ‘09 
[2] A contextual-bandit approach to personalized news article recommendation, Yahoo Research, WWW ‘10
FEATURE VECTOR CONSTRUCTION 
•User information: 
•Purchase history, by genre, by price sensitivity, freshness 
•Browsing history 
•Geo 
•Extensions to 
•Optimize for profit (not CTR) 
•Account for different widget categories
UPCOMING PROJECTS
THANK YOU! 
QUESTIONS?

Más contenido relacionado

La actualidad más candente

Using AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at ScaleUsing AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at Scale
Data Works MD
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on Rails
Avi Kedar
 
Rise of the hybrids
Rise of the hybridsRise of the hybrids
Rise of the hybrids
Oron Ben Zvi
 

La actualidad más candente (20)

Becoming a more productive Rails Developer
Becoming a more productive Rails DeveloperBecoming a more productive Rails Developer
Becoming a more productive Rails Developer
 
Scaling High Traffic Web Applications
Scaling High Traffic Web ApplicationsScaling High Traffic Web Applications
Scaling High Traffic Web Applications
 
Using AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at ScaleUsing AWS, Terraform, and Ansible to Automate Splunk at Scale
Using AWS, Terraform, and Ansible to Automate Splunk at Scale
 
Stackato v3
Stackato v3Stackato v3
Stackato v3
 
Choosing Javascript Libraries to Adopt for Development
Choosing Javascript Libraries to Adopt for DevelopmentChoosing Javascript Libraries to Adopt for Development
Choosing Javascript Libraries to Adopt for Development
 
Irb Tips and Tricks
Irb Tips and TricksIrb Tips and Tricks
Irb Tips and Tricks
 
CrossWorlds: Unleash the Power of Domino for Connections Development
CrossWorlds: Unleash the Power of Domino for Connections Development CrossWorlds: Unleash the Power of Domino for Connections Development
CrossWorlds: Unleash the Power of Domino for Connections Development
 
Markup languages and warp-speed documentation
Markup languages and warp-speed documentationMarkup languages and warp-speed documentation
Markup languages and warp-speed documentation
 
Georgia Tech Drupal Users Group - Local Drupal Development
Georgia Tech Drupal Users Group - Local Drupal DevelopmentGeorgia Tech Drupal Users Group - Local Drupal Development
Georgia Tech Drupal Users Group - Local Drupal Development
 
eMusic: WordPress in the Enterprise
eMusic: WordPress in the EnterpriseeMusic: WordPress in the Enterprise
eMusic: WordPress in the Enterprise
 
Tech Stack Ideas
Tech Stack IdeasTech Stack Ideas
Tech Stack Ideas
 
A tale of 3 databases
A tale of 3 databasesA tale of 3 databases
A tale of 3 databases
 
Should I use a document database?
Should I use a document database?Should I use a document database?
Should I use a document database?
 
Scaling
ScalingScaling
Scaling
 
DevOps Columbus Meetup Kickoff - Infrastructure as Code
DevOps Columbus Meetup Kickoff - Infrastructure as CodeDevOps Columbus Meetup Kickoff - Infrastructure as Code
DevOps Columbus Meetup Kickoff - Infrastructure as Code
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on Rails
 
Rupher = Ruby + Gopther
Rupher = Ruby + GoptherRupher = Ruby + Gopther
Rupher = Ruby + Gopther
 
Rupher
RupherRupher
Rupher
 
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane EcosystemDownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
DownTheRabbitHole.js – How to Stay Sane in an Insane Ecosystem
 
Rise of the hybrids
Rise of the hybridsRise of the hybrids
Rise of the hybrids
 

Similar a [Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders

The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
Lucidworks
 
AWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner Vogels
AWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner VogelsAWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner Vogels
AWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner Vogels
Amazon Web Services
 
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
MongoDB
 
Abernathy supervisors meeting 2013
Abernathy supervisors meeting 2013Abernathy supervisors meeting 2013
Abernathy supervisors meeting 2013
Andrew Abernathy
 

Similar a [Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders (20)

DevDay 2013 - Building Startups and Minimum Viable Products
DevDay 2013 - Building Startups and Minimum Viable ProductsDevDay 2013 - Building Startups and Minimum Viable Products
DevDay 2013 - Building Startups and Minimum Viable Products
 
Untangling spring week2
Untangling spring week2Untangling spring week2
Untangling spring week2
 
Big Data Day LA 2015 - Building a Big Data Culture in the Entertainment Indus...
Big Data Day LA 2015 - Building a Big Data Culture in the Entertainment Indus...Big Data Day LA 2015 - Building a Big Data Culture in the Entertainment Indus...
Big Data Day LA 2015 - Building a Big Data Culture in the Entertainment Indus...
 
Sept. 28, 2011 webcast become an expert google searcher in an hour stephan ...
Sept. 28, 2011 webcast become an expert google searcher in an hour   stephan ...Sept. 28, 2011 webcast become an expert google searcher in an hour   stephan ...
Sept. 28, 2011 webcast become an expert google searcher in an hour stephan ...
 
Social dev camp_2011
Social dev camp_2011Social dev camp_2011
Social dev camp_2011
 
Machine Learning with Hadoop Boston hug 2012
Machine Learning with Hadoop Boston hug 2012Machine Learning with Hadoop Boston hug 2012
Machine Learning with Hadoop Boston hug 2012
 
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4jRobotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
Robotics, Search and AI with Solr, MyRobotLab, and Deeplearning4j
 
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
The Intersection of Robotics, Search and AI with Solr, MyRobotLab, and Deep L...
 
Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)Make Life Suck Less (Building Scalable Systems)
Make Life Suck Less (Building Scalable Systems)
 
JustEnoughDevOpsForDataScientists
JustEnoughDevOpsForDataScientistsJustEnoughDevOpsForDataScientists
JustEnoughDevOpsForDataScientists
 
Big Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeBig Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = Awesome
 
Practical Cyber: Lessons from 500,000 Miles of Security Evangelism
Practical Cyber: Lessons from 500,000 Miles of Security EvangelismPractical Cyber: Lessons from 500,000 Miles of Security Evangelism
Practical Cyber: Lessons from 500,000 Miles of Security Evangelism
 
Top to Bottom: SEO for Small Businesses
Top to Bottom: SEO for Small BusinessesTop to Bottom: SEO for Small Businesses
Top to Bottom: SEO for Small Businesses
 
AWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner Vogels
AWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner VogelsAWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner Vogels
AWS Enterprise Day | Closing Keynote - Data Without Limits, Dr Werner Vogels
 
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
How Appboy’s Marketing Automation for Apps Platform Grew 40x on the ObjectRoc...
 
Abernathy supervisors meeting 2013
Abernathy supervisors meeting 2013Abernathy supervisors meeting 2013
Abernathy supervisors meeting 2013
 
Do This, Don't Do That: A Primer on Sitecore Development
Do This, Don't Do That: A Primer on Sitecore DevelopmentDo This, Don't Do That: A Primer on Sitecore Development
Do This, Don't Do That: A Primer on Sitecore Development
 
Optimize Your Funnel By Getting Inside Your Buyer's Head
Optimize Your Funnel By Getting Inside Your Buyer's HeadOptimize Your Funnel By Getting Inside Your Buyer's Head
Optimize Your Funnel By Getting Inside Your Buyer's Head
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
Haifa
HaifaHaifa
Haifa
 

Más de Rakuten Group, Inc.

Más de Rakuten Group, Inc. (20)

コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
コードレビュー改善のためにJenkinsとIntelliJ IDEAのプラグインを自作してみた話
 
楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり楽天における安全な秘匿情報管理への道のり
楽天における安全な秘匿情報管理への道のり
 
What Makes Software Green?
What Makes Software Green?What Makes Software Green?
What Makes Software Green?
 
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product At...
 
DataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組みDataSkillCultureを浸透させる楽天の取り組み
DataSkillCultureを浸透させる楽天の取り組み
 
大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開大規模なリアルタイム監視の導入と展開
大規模なリアルタイム監視の導入と展開
 
楽天における大規模データベースの運用
楽天における大規模データベースの運用楽天における大規模データベースの運用
楽天における大規模データベースの運用
 
楽天サービスを支えるネットワークインフラストラクチャー
楽天サービスを支えるネットワークインフラストラクチャー楽天サービスを支えるネットワークインフラストラクチャー
楽天サービスを支えるネットワークインフラストラクチャー
 
楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割楽天の規模とクラウドプラットフォーム統括部の役割
楽天の規模とクラウドプラットフォーム統括部の役割
 
Rakuten Services and Infrastructure Team.pdf
Rakuten Services and Infrastructure Team.pdfRakuten Services and Infrastructure Team.pdf
Rakuten Services and Infrastructure Team.pdf
 
The Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdfThe Data Platform Administration Handling the 100 PB.pdf
The Data Platform Administration Handling the 100 PB.pdf
 
Supporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdfSupporting Internal Customers as Technical Account Managers.pdf
Supporting Internal Customers as Technical Account Managers.pdf
 
Making Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdfMaking Cloud Native CI_CD Services.pdf
Making Cloud Native CI_CD Services.pdf
 
How We Defined Our Own Cloud.pdf
How We Defined Our Own Cloud.pdfHow We Defined Our Own Cloud.pdf
How We Defined Our Own Cloud.pdf
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
Travel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech infoTravel & Leisure Platform Department's tech info
Travel & Leisure Platform Department's tech info
 
OWASPTop10_Introduction
OWASPTop10_IntroductionOWASPTop10_Introduction
OWASPTop10_Introduction
 
Introduction of GORA API Group technology
Introduction of GORA API Group technologyIntroduction of GORA API Group technology
Introduction of GORA API Group technology
 
100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情100PBを越えるデータプラットフォームの実情
100PBを越えるデータプラットフォームの実情
 
社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー社内エンジニアを支えるテクニカルアカウントマネージャー
社内エンジニアを支えるテクニカルアカウントマネージャー
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

[Rakuten TechConf2014] [C-2] Big Data for eBooks and eReaders