Applications of Machine Learning Algorithms at Rakuten (楽天における機械学習アルゴリズムの活用)
Yu Hirate, Dr. Eng.
Rakuten Institute of Technology Tokyo,
Rakuten, Inc.
Yu Hirate (平手勇宇)
• Principal Scientist, Rakuten Institute of Technology; Manager, Intelligence domain research group
• Bio
  • 2005-2008: CS division, Graduate School of Science and Engineering, Waseda University.
  • 2006-2009: Research Associate, Media Network Center, Waseda University.
  • 2009-present: Rakuten Institute of Technology. Working on projects to extract knowledge from large-scale data using data mining and machine learning technologies.
Masaya Mori
Global head
• Established in 2006.
• Launched R.I.T. NY in 2010.
• Launched R.I.T. Paris in 2014.
• Launched R.I.T. Singapore / Boston in 2015.
Strategic R&D organization for Rakuten Group
100+ researchers in 5 locations
3 research groups for adapting to Internet growth
Reality
• HCI
• AR / VR
• Image Processing
Power
• Distributed Computing
• HPC
• IoT
Intelligence
• Machine Learning
• Deep Learning
• NLP
• Data Mining
AI
• Optimizing A/B testing
• Item Classification
• User Segmentation
• Coupon Distribution
• Recommender System
• Economy Prediction / Demand Prediction
• Review Analysis
• Anomaly Detection / Fraud Detection
• Image Recognition
Huge!
Unstructured!
241 million items
[Figure: number of items (millions) by date]
https://item.rakuten.co.jp/kawahara/345812/
Why are we working on this problem? (Key Benefits)
‣ To organize our catalog in accordance with customer
expectations
‣ To precisely search our catalog for products and their variants
‣ To measure and enforce merchant KPIs
What are we doing? (Key Tasks)
‣ Product Genre Classification
‣ Attribute Extraction from Product Information
‣ Merchant and Item Review Analysis
How are we doing it? (Key Technologies)
‣ Large-Scale Gradient Boosted Decision Trees
‣ Deep Learning (RNNs, CNNs, others)
‣ Computing a Massive Number of NLP Features
Product Catalog
Businesses
Each product can be assigned a category and attributes. For instance:
• Category: Grocery & food
• Subcategory: Wine
Each (sub)category has a number of relevant attributes with a list of valid values
Challenge: this structured information is not always present or correct
Goal: automatically predict category and attributes from text and/or images
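The category and attribute structure described above can be sketched as a small data model. This is only an illustration under stated assumptions: the names `SCHEMA`, `Product` and `find_problems`, and the `Color` / `Country` attributes with their valid values, are hypothetical and not taken from the Rakuten catalog.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical schema: valid attribute values per (category, subcategory).
# The attribute names and values are made up for illustration.
SCHEMA = {
    ("Grocery & food", "Wine"): {
        "Color": {"Red", "White", "Rose"},
        "Country": {"France", "Italy", "Japan"},
    },
}


@dataclass
class Product:
    title: str
    category: str | None = None
    subcategory: str | None = None
    attributes: dict[str, str] = field(default_factory=dict)


def find_problems(product: Product) -> list[str]:
    """Return the structured fields that are missing or hold invalid values."""
    key = (product.category, product.subcategory)
    if key not in SCHEMA:
        return [f"unknown or missing category: {key}"]
    problems = []
    for name, valid_values in SCHEMA[key].items():
        value = product.attributes.get(name)
        if value is None:
            problems.append(f"missing attribute: {name}")
        elif value not in valid_values:
            problems.append(f"invalid value for {name}: {value}")
    return problems
```

Records flagged by a check like this are the ones where a predicted category or attribute would be most useful.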
https://item.rakuten.co.jp/kawahara/345812/
Text Data (Item Title, Item Description)
• Extracting Words
• Classifier based on Deep Learning Algorithm (CNN)
• Prec@1 92%, Prec@10 99%
• Tested on Ichiba L3 categories (1.5K categories)
Image Data
• Classifier based on Deep Learning Algorithm (CNN)
• Prec@1 57%, Prec@3 75%
• Tested on PriceMinister Image Data
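The slide does not define Prec@k. A minimal sketch, assuming it means the share of items whose correct category appears among the classifier's top k predictions (with one correct category per item); the function name and the sample data are hypothetical:

```python
def precision_at_k(ranked_predictions, true_labels, k):
    """Fraction of items whose true label appears in the top-k predictions."""
    if not true_labels:
        raise ValueError("no items to evaluate")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical classifier output: categories ranked by score for each item.
ranked = [
    ["Wine", "Beer", "Sake"],
    ["Smartphone", "Audio", "Camera"],
    ["Beer", "Wine", "Sake"],
    ["Camera", "Audio", "Smartphone"],
]
truth = ["Wine", "Audio", "Sake", "Books"]

print(precision_at_k(ranked, truth, 1))  # 0.25
print(precision_at_k(ranked, truth, 3))  # 0.75
```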
Hobby and Entertainment > Books and Magazine > Business
Electronics > Audio > Earphone / Headphone
Electronics > Smartphone > AC Adaptor / Battery
Detect prospective applicants from Ichiba purchasers
by using their purchase trends and demographics
[Diagram labels: Ichiba Active Users; Extract; Prospective Applicants; a finance service]
Ichiba Active Users
Overlap: 7,413 Positive Samples
7,417 (Negative) Samples
About 50% of contractors of the Fintech service were Ichiba Active Users.
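Extracted user behavior model: important factors
[Figure: bar chart of factor importance, axis from 0 to 0.3]
Top 20 factors selected from 141 factors
• genre_41_100890_ / Flowers, Garden & DIY / DIY & Tools
• genre_72_111078_ / Kids, Baby & Maternity / Kids
• genre_50_110983_ / Shoes / Men's Shoes
• Age-05-[35-40]
• genre_93_101077_ / Sports & Outdoors / Golf
• Area-01-Kanto
• Area-00-Others
• genre_113_101126_ / Car & Motorcycle Accessories / Car Accessories
• Age-08-[50-*]
• Age-03-[25-30]
• Age-00-none
• gms
• Gender-00-none
• basket_max_price
• frequency
• basket_average_price
• average_unit_price
• Age-02-[20-25]
• Gender-02-female
• Gender-01-male
Annotations on the chart:
• Ichiba genre / Cars & Motorcycles / Car & Motorcycle Accessories
• Ichiba genre / Sports & Outdoors / Golf Equipment
• Ichiba genre / Shoes / Men's Shoes
• Ichiba genre / Kids, Baby & Maternity / Kids
• Ichiba genre / Garden, DIY & Tools / DIY & Tools
• Average unit price of purchased items
• Average purchase amount per order
• Purchase frequency
• Maximum purchase amount per order
• Total purchase amount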
Prospective Users
• Score >= 0.8
• About 300,000 users
Control Group
• Randomly Selected
• About 300,000 users
Send the Ichiba Mail Magazine to both groups
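The two groups can be sketched as follows. The `Score >= 0.8` threshold comes from the slide; the function name, the user ids and the choice to draw the control group from all users regardless of score are assumptions made for illustration.

```python
import random


def split_groups(scores, threshold=0.8, seed=0):
    """Pick prospective users by score and an equally sized random control group."""
    prospective = sorted(user for user, s in scores.items() if s >= threshold)
    rng = random.Random(seed)
    # Assumption: the control group is sampled from all users, whatever their score.
    control = rng.sample(sorted(scores), k=len(prospective))
    return prospective, control


# Hypothetical model scores per user id.
scores = {"u1": 0.91, "u2": 0.35, "u3": 0.80, "u4": 0.12, "u5": 0.66, "u6": 0.95}
prospective, control = split_groups(scores)
print(prospective)  # ['u1', 'u3', 'u6']
```

Fixing the seed keeps the control group reproducible between runs.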
Funnel: Mail Deliver, Open Mail, Click Contents (Visit Service Page)
Click rate went up by 49.23% compared with the control group.
Figures shown on the chart: +3.52%, +49.23%
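The slide does not say how the increase was computed. A minimal sketch, assuming it is the relative change of a rate versus the control group; the counts below are hypothetical and are not the figures from the experiment:

```python
def relative_uplift(treated_rate, control_rate):
    """Relative change of a rate versus the control group, in percent."""
    if control_rate <= 0:
        raise ValueError("control rate must be positive")
    return (treated_rate - control_rate) / control_rate * 100.0


# Hypothetical counts for illustration only.
control_clicks, control_delivered = 1_000, 300_000
treated_clicks, treated_delivered = 1_500, 300_000

uplift = relative_uplift(
    treated_clicks / treated_delivered,
    control_clicks / control_delivered,
)
print(f"{uplift:+.2f}%")  # +50.00%
```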
我们真的很有诚意了。你说我一个老总都亲自跑了好几趟了。
Machine translation
Viki is a Rakuten group company that provides a video streaming service.
Volunteers edit the subtitles and their translations.
https://www.viki.com/?locale=ja
• Translate sentences from Chinese to English
• Extracted 10,000 Chinese-English sentence pairs to evaluate commercial APIs and IBot, e.g.,
  • 我一个老总都亲自跑了好几趟了
  • I’m a director and yet I’ve made so many trips
• Extracted another 2.1 million sentence pairs to train IBot’s model
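Keeping the evaluation pairs separate from the training pairs can be sketched as a simple hold-out split. The slides do not describe how the pairs were extracted, so the random shuffle, the function name and the toy corpus are assumptions.

```python
import random


def split_pairs(pairs, n_eval, seed=0):
    """Shuffle sentence pairs and hold out n_eval of them for evaluation."""
    if n_eval > len(pairs):
        raise ValueError("not enough pairs for the evaluation set")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_eval], shuffled[n_eval:]


# Tiny hypothetical corpus of (Chinese, English) subtitle pairs.
corpus = [(f"zh-{i}", f"en-{i}") for i in range(10)]
eval_pairs, train_pairs = split_pairs(corpus, n_eval=3)

# No pair may appear in both sets, otherwise the evaluation is contaminated.
assert not set(eval_pairs) & set(train_pairs)
print(len(eval_pairs), len(train_pairs))  # 3 7
```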
• Applying Attentional Recurrent Neural Networks (RNN)
  • Neural Machine Translation by Jointly Learning to Align and Translate [Bahdanau, Cho & Bengio, ICLR 2015]
  • 658 citations (Google Scholar)
• Train RNN with 2.1 million Chinese-English sentence pairs
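The attention step of the cited paper can be sketched in plain Python. This shows only the additive alignment score, the softmax weights and the context vector for one decoder step; it is not IBot's implementation, and the parameter values are hand-picked toy numbers (in a real model they are learned).

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def additive_attention(decoder_state, encoder_states, W_s, W_h, v):
    """Return (weights, context) for one decoder step.

    score_j = v . tanh(W_s s + W_h h_j)
    weights = softmax(scores)
    context = sum_j weights_j * h_j
    """
    projected_state = matvec(W_s, decoder_state)
    scores = []
    for h in encoder_states:
        projected_h = matvec(W_h, h)
        hidden = [math.tanh(a + b) for a, b in zip(projected_state, projected_h)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)
    ]
    return weights, context


# Toy parameters and encoder states, chosen by hand for illustration.
W_s = [[1.0, 0.0], [0.0, 1.0]]
W_h = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = additive_attention([0.5, -0.5], encoder_states, W_s, W_h, v)
print([round(w, 3) for w in weights])
```

The weights sum to 1, so the context vector is a weighted average of the encoder states, which is what lets the decoder focus on different source words at each output step.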
• Evaluated on 10,000 Chinese-English sentence pairs

System            BLEU (%)   METEOR (%)
Google API        12         20
Microsoft API     12         20
IBM Watson API    3          12
RIT (Aug 24)      10         15
RIT (Sep 7)       14         19
RIT (Sep 21)      22         24
RIT (Nov 28)      36         30
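For readers unfamiliar with the BLEU column, here is a minimal sketch of corpus-level BLEU with clipped n-gram precision and a brevity penalty. The slides do not state which BLEU implementation or tokenization was used, so the whitespace tokenization, the single reference per sentence and the absence of smoothing are assumptions, and the hypothesis sentence is made up.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in percent, one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


reference = "I 'm a director and yet I 've made so many trips".split()
hypothesis = "I 'm a director and I 've made many trips".split()
print(round(corpus_bleu([hypothesis], [reference]), 1))
```

A perfect match scores 100, and a hypothesis sharing no 4-gram with the reference scores 0 under this unsmoothed variant.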