GOTO Berlin Conference 2013

A use case of online machine learning using Jubatus
2013/10/18
NTT DATA Corporation System Platforms Sector
OSS Professional Services
Toru Shimogaki

Copyright © 2013 NTT DATA Corporation
Who is Toru Shimogaki
- "The Elephant Wizard": team lead of “NTT DATA OSS Professional Services”
- Deep and broad experience deploying open source software technologies for enterprise customers
  - 10+ years as a contributor (e.g. pg_bulkload)
  - 6+ years leading the Japanese Hadoop community
  - Co-author (2nd edition, in Japan)

About Jubatus (1/2)
- OSS machine learning platform developed by NTT Research Laboratories and Preferred Infrastructure, Inc.
- Motivation: "Take the right action at the right time, at the right place"

(Source: Hadoop Summit 2013)

About Jubatus (2/2)
- Distributed processing framework and streaming machine learning libraries
  - Classification, regression, recommendation, graph mining, anomaly detection
- In particular, the Jubatus classifier has a small footprint and responds with very low latency, so it scales easily to many simultaneous requests.

(Source: Hadoop Summit 2013)

Background
- “SUUMO”: online service for the real-estate business
  - Improve usability for smartphone access
  - Guide users who do not know how to search for a residence
  - Reach first-time customers and irregular, short-visit users efficiently

“SUUMO” Real-Estate Service: Customer Reach and Brand Awareness
Compared with its competitors, “SUUMO” has the largest customer reach and the highest brand awareness.

(Charts: customer reach in million unique users and brand awareness in %, SUUMO vs. competitors A, B and C. SUUMO: 9.9 million unique users per month; aided SUUMO awareness 77.2%; awareness of the SUUMO character 87.6%.)

Source: © 2013 RECRUIT SUMAI CO., LTD. All Rights Reserved

Web service for smartphones (beta version)
- Repeatedly presents two candidate choices; the system learns the user's preferences from each answer
  - Existing search services simply list candidates after filtering.
  - The new service presents two typical, salient choices, and the user simply picks the preferred one.
  - The question is asked 10 times, with a countdown.
  - Resulting recommendation: candidates presented according to the acquired preferences.

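The slide's loop (show two choices, ask 10 times) behaves like the halving search that the deck names as the goal of its MDS search space: if every answer discards half of the remaining candidates, 10 answers can single out one of up to 2^10 = 1024 candidates. The minimal Python sketch below assumes the candidates are already sorted along one axis (for example a 1-D MDS coordinate) and simulates a user who prefers whichever option is closer to a hidden ideal point; `narrow_by_choices` and `simulated_user` are hypothetical names, not SUUMO or Jubatus code. To keep each answer an exact halving, it shows the two items on either side of the split point, whereas the service itself presents "typical and salient" choices, which this sketch does not model.

```python
from typing import Callable, Sequence, Tuple


def narrow_by_choices(
    candidates: Sequence[float],
    choose_left: Callable[[float, float], bool],
    max_questions: int = 10,
) -> Tuple[int, int, int]:
    """Narrow a sorted list of 1-D positions by asking two-way questions.

    Each round shows the two items on either side of the split point and
    keeps the half the user chose. Returns (lo, hi, asked), where
    candidates[lo:hi] is the remaining range.
    """
    lo, hi, asked = 0, len(candidates), 0
    while hi - lo > 1 and asked < max_questions:
        mid = (lo + hi) // 2
        asked += 1
        if choose_left(candidates[mid - 1], candidates[mid]):
            hi = mid  # keep the left half [lo, mid)
        else:
            lo = mid  # keep the right half [mid, hi)
    return lo, hi, asked


def simulated_user(ideal: float) -> Callable[[float, float], bool]:
    """Hypothetical user who prefers whichever option is closer to `ideal`."""
    return lambda a, b: abs(a - ideal) <= abs(b - ideal)


if __name__ == "__main__":
    positions = [i / 10 for i in range(1024)]  # 1024 sorted 1-D coordinates
    lo, hi, asked = narrow_by_choices(positions, simulated_user(ideal=37.3))
    print(asked, positions[lo:hi])
```

With 1,024 evenly spaced positions and an ideal point of 37.3, the loop stops after 10 questions with only the candidate at 37.3 left. With a longer list, the `max_questions` cap leaves a small range instead of a single item (4 candidates for a list of 4,096).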
Inside this SUUMO web service
- This SUUMO web service is implemented with a combination of algorithms:
  - Multidimensional scaling (MDS)
  - Jubatus classifier (Passive Aggressive algorithm)
  - etc.

Building the search space with MDS
- Build a search space for each station with multidimensional scaling (MDS)
  - Daily batch processing in R
- Goal: O(log n) search (like binary search)
  - MDS converts multi-dimensional house vectors into lower-dimensional ones while preserving the distance relations among houses
  - House attributes: rent, walking time from the station (minutes on foot), size, deposit, age of the building, ...

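A minimal sketch of classical (Torgerson) MDS down to one dimension, written in plain Python so it runs without R. Assumptions: the house attributes are already standardized numeric vectors (the deck does not say how rent, walking time, size and so on are scaled), a single MDS axis is used so that candidates can be sorted for a binary-search-style walk, and the dominant eigenvector is found by power iteration. `classical_mds_1d` is a hypothetical helper; the deck's actual job runs as a daily batch in R (where `stats::cmdscale` implements classical MDS), and its exact method and target dimensionality are not stated.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classical_mds_1d(points: Sequence[Vector], iterations: int = 200, seed: int = 0) -> List[float]:
    """Embed points on one axis so pairwise distances are preserved as well as possible.

    Classical MDS: double-centre the squared distance matrix to get the Gram
    matrix B = -1/2 * J D^2 J, then return sqrt(lambda_1) * v_1 for its top
    eigenpair. Power iteration finds that eigenpair (B is positive
    semi-definite when the distances are Euclidean).
    """
    n = len(points)
    if n == 0:
        return []
    d2 = [[squared_distance(p, q) for q in points] for p in points]
    row_mean = [sum(row) / n for row in d2]  # also the column means (D^2 is symmetric)
    grand_mean = sum(row_mean) / n
    b = [
        [-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]

    def times_b(v: List[float]) -> List[float]:
        return [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = times_b(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all points identical: every coordinate is 0
            return [0.0] * n
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, times_b(v)))  # Rayleigh quotient
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]
```

Sorting houses by the returned coordinate (`sorted(range(n), key=coords.__getitem__)`) gives the ordered list that a halving search can walk. The sign of the axis is arbitrary, so only the order, not its direction, carries meaning. Each power-iteration step costs O(n²), which suits a per-station daily batch better than a per-request call.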
Learning user preferences using Jubatus
- Classify the user's real-estate preferences with Jubatus
  - Goal: reflect user actions in the results in real time (which R cannot do)
  - Uses the Passive Aggressive algorithm
  - If an area's score falls below a threshold, the area is removed from the search area
  - Classification runs with low latency and is easy to scale

(Figure: search area at the initial state and after 1, 2, 6 and 10 clicks)

- With this approach, the service can:
  - Keep isolated (discrete) candidates that the MDS-space search would exclude but that the user may still find interesting because of unexpected attributes
  - Keep enough diversity to present “salient” candidates: distant choices are needed to estimate the user's preferences without losing features

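A minimal sketch of the online update and the threshold rule, assuming each two-way click becomes one positive example (the chosen house) and one negative example (the rejected one), that houses and areas are described by sparse feature dictionaries with hypothetical feature names, and that the PA-I variant of Passive Aggressive (Crammer et al., 2006) is used. This is plain Python written for illustration, not Jubatus's implementation or client API; `learn_from_click` and `prune_areas` are hypothetical helpers. Each click costs one pass over the active features, which is why the model can be updated while the user is still answering, unlike a batch job.

```python
from typing import Dict

Features = Dict[str, float]


class PassiveAggressive:
    """Binary PA-I classifier over sparse features."""

    def __init__(self, c: float = 1.0) -> None:
        self.c = c  # aggressiveness: upper bound on each step size tau
        self.w: Dict[str, float] = {}

    def score(self, x: Features) -> float:
        """Linear score w . x; positive means 'looks like what the user picks'."""
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def update(self, x: Features, y: int) -> None:
        """One online step for an example with label y in {+1, -1}."""
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        sq_norm = sum(v * v for v in x.values())
        if loss == 0.0 or sq_norm == 0.0:
            return  # passive: the margin is already satisfied
        tau = min(self.c, loss / sq_norm)  # aggressive, but capped by c (PA-I)
        for k, v in x.items():
            self.w[k] = self.w.get(k, 0.0) + tau * y * v


def learn_from_click(model: PassiveAggressive, chosen: Features, rejected: Features) -> None:
    """Hypothetical mapping of one two-way choice to two training examples."""
    model.update(chosen, +1)
    model.update(rejected, -1)


def prune_areas(
    model: PassiveAggressive, areas: Dict[str, Features], threshold: float
) -> Dict[str, Features]:
    """Keep only the areas whose score is at or above the threshold."""
    return {name: x for name, x in areas.items() if model.score(x) >= threshold}


if __name__ == "__main__":
    model = PassiveAggressive(c=1.0)
    learn_from_click(model, chosen={"short_walk": 1.0}, rejected={"large_floor_area": 1.0})
    areas = {
        "A": {"short_walk": 1.0},
        "B": {"large_floor_area": 1.0},
        "C": {"short_walk": 1.0, "large_floor_area": 1.0},
    }
    print(sorted(prune_areas(model, areas, threshold=-0.5)))  # ['A', 'C']
```

After one click, area A scores 1.0, B scores -1.0 and C scores 0.0, so a threshold of -0.5 drops only B. Using a threshold rather than keeping only the top scorers leaves mixed areas such as C in play, in the spirit of the slide's point about keeping diverse candidates.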
Wrap up
- Introduced a use case of online machine learning using Jubatus
- A beta service for smartphones is now released on SUUMO, one of the largest residential real-estate services in Japan
- Today I explained:
  - Building a multidimensional scaling search space
  - Using the Jubatus classifier to learn user preferences adaptively
- Future work
  - Learn from similar users' activity using logs
  - Semantic analysis of sentences entered by users

NTT DATA Corporation System Platforms Sector
OSS Professional Services
URL:
http://oss.nttdata.co.jp/hadoop/
mail: hadoop@kits.nttdata.co.jp
