Duplicated Cooking Recipe Determination
Using Multimodal Information
March 19, 2020
● Nguyen The Tung¹, Yuki Nakayama²
¹ Nara Institute of Science and Technology
² Rakuten Institute of Technology, Rakuten, Inc.
2
Duplicate Recipe Detection
■ Task description:
  ➢ Given a new recipe, decide whether or not it duplicates a recipe in the database.
■ Method:
  ➢ Step 1: From the database $R_{DB}$ and the new recipe $r$, generate the duplicate candidates $R_c \subset R_{DB}$.
    • We use the method of a previous intern [Oguni+ 2018].
  ➢ Step 2: For each pair $(r, r')$ with $r' \in R_c$, decide whether the pair is a duplicate or not.
    • This step is our work here: duplicated recipe determination.
[Oguni+ 2018] Masaki Oguni, Lasguido Nio, Yu Hirate, and Yohei Seki. Method for Detecting Near-duplicate Recipes Based on Nearest Neighbor Search for Features of Cooking Instructions and Food Images (in Japanese).
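The two steps can be sketched as below. This is only an illustration under stated assumptions: `generate_candidates` is a brute-force stand-in for the NGT-based search of [Oguni+ 2018], `is_duplicate` is a placeholder for the pairwise classifier, and all function names and toy vectors are hypothetical.

```python
import math


def generate_candidates(new_vec, database, n_candidates=2):
    """Step 1 stand-in: rank database recipes by Euclidean distance to the new one.

    The slides use an approximate nearest neighbor search (NGT); brute force is
    used here only to keep the sketch self-contained.
    """
    ranked = sorted(database, key=lambda rid: math.dist(new_vec, database[rid]))
    return ranked[:n_candidates]


def find_duplicates(new_vec, database, is_duplicate, n_candidates=2):
    """Step 2: run the pairwise duplicate decision on every candidate."""
    candidates = generate_candidates(new_vec, database, n_candidates)
    return [rid for rid in candidates if is_duplicate(new_vec, database[rid])]


# Toy data: recipe id -> feature vector (hypothetical values).
toy_db = {"r1": [0.0, 0.0], "r2": [1.0, 0.0], "r3": [5.0, 5.0]}
# Placeholder decision rule; the slides train an MLP for this step instead.
toy_rule = lambda a, b: math.dist(a, b) < 0.5
print(find_duplicates([0.1, 0.0], toy_db, toy_rule))
```

Keeping the candidate generator and the pairwise decision as separate functions mirrors the split between the previous work (Step 1) and this work (Step 2).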
3
Related work and Proposal
■ Deriving a Recipe Similarity Measure for Recommending Healthful Meals [van Pinxteren+ 2011]
  ➢ Features: cooking instructions + ingredients
  ➢ A human makes the decision based on the features
■ Clustering for Closely Similar Recipes to Extract Spam Recipes in User-generated Recipe Sites [Hanai+ 2015]
  ➢ Features: ingredients
  ➢ Clusters recipes into groups
■ Our proposal: treat the task as classification
  ➢ Detect duplicate recipes with a Multi-Layer Perceptron
  ➢ The classifier uses similarity scores of the cooking instructions (text), ingredients (text), user ID, and the result photo (image)
4
Previous work’s pipeline (Step 1)
[Figure: flowchart of the pipeline of [Oguni+ 2018]. Boxes: New Recipe; Cooking Instruction, Ingredients, Food Image; Extract Text Vector; Extract Image Vector; Database with Nearest Neighbor Search (NGT) (appears twice); Extract candidates of original recipes; Extract ingredients of original recipes; Calculate ingredient similarity; Duplicate candidates.]
NGT: Neighborhood Graph and Tree [Iwasaki 2010]
5
How to extract text vector: SCDV [Mekala+ 2017]
■ Sparse Composite Document Vectors (SCDV)
■ The output vector (dimension = 10,000) represents the cooking instructions of each recipe.
■ This dimension is too high → we use Principal Component Analysis (PCA) to reduce it to 2,000.
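For context, the SCDV construction of [Mekala+ 2017] followed by the PCA projection can be summarized as below. This paraphrases the published method and is not stated on the slide. The symbols $d$ (word-vector dimension), $K$ (number of soft clusters), $\vec{\mu}$ (mean document vector), and $W$ (PCA projection matrix) are notation introduced here; the slide gives only the resulting dimensions, so $Kd = 10{,}000$ is an assumption that the published construction was used unchanged.

```latex
\begin{aligned}
\vec{wcv}_{ik} &= \vec{wv}_i \cdot P(c_k \mid w_i), \qquad k = 1, \dots, K \\
\vec{wtv}_i &= \mathrm{idf}(w_i) \cdot \bigoplus_{k=1}^{K} \vec{wcv}_{ik} \;\in\; \mathbb{R}^{Kd} \\
\vec{x}_D &= \mathrm{sparsify}\Big( \sum_{w_i \in D} \vec{wtv}_i \Big), \qquad Kd = 10{,}000 \\
\vec{z}_D &= W^{\top} \big( \vec{x}_D - \vec{\mu} \big), \qquad W \in \mathbb{R}^{10{,}000 \times 2{,}000}
\end{aligned}
```

Here $\vec{wv}_i$ is the word vector of word $w_i$, $P(c_k \mid w_i)$ is its soft cluster membership, $\bigoplus$ denotes concatenation, and $\vec{z}_D$ is the 2,000-dimensional instruction vector of recipe text $D$.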
6
How to extract image vector: Inception-V3 [Szegedy+ 2016]
■ A Convolutional Neural Network model for generic object recognition
■ We extract 2,048-dimensional vectors from the images
■ We use an Inception-v3 model pre-trained for the ImageNet competition
[Figure: Inception-v3 network diagram; input: food image; output: 2,048-dimensional vector]
7
How to calculate ingredients similarity
Ing_A* = {にんにく, 米, ねぎ, いくら, 砂糖}
Ing_B* = {ニンニク, ライス, ネギ, いくら, 砂糖}
(garlic, rice, green onion, salmon roe, sugar; A and B write the first three differently)
* A is a near-duplicate recipe candidate; B is an original recipe candidate.
1. Find the ingredients common to Ing_A and Ing_B (intersection set).
   Intersection = Ing_A ∩ Ing_B = {いくら, 砂糖}
2. Create new sets by removing the common ingredients.
   Ing_A' = {にんにく, 米, ねぎ}, Ing_B' = {ニンニク, ライス, ネギ}
3. Convert each ingredient to katakana.
   Ing_A'_k = {ニンニク, コメ, ネギ}, Ing_B'_k = {ニンニク, ライス, ネギ}
4. Find the ingredients common to Ing_A'_k and Ing_B'_k and add them to Intersection.
   Ing_A'_k ∩ Ing_B'_k = {ニンニク, ネギ}
5. Create new sets by removing the common ingredients.
   Ing_A'' = {米}, Ing_B'' = {ライス}
   Intersection = {いくら, 砂糖, ニンニク, ネギ}
8
How to calculate ingredients similarity
6. For each ingredient in Ing_A'', search for similar ingredients using a word2vec model trained on 1.16 million recipes (the training data).
   If an ingredient of Ing_B'' appears in the top 3 search results, we consider the two to be the same ingredient and add it to Intersection.
   ・Similar words for "米": Similar ingredients = {こめ, コメ, ライス, レンジ, 五穀米}
   Ing_A'' ∩ Ing_B'' = {米 (ライス)}
   Intersection = {いくら, 砂糖, ニンニク, ネギ, 米}
7. Create new sets by removing the common ingredients.
   Ing_A''' = ∅, Ing_B''' = ∅, ingredient difference = |Ing_A'''| + |Ing_B'''|
8. Make the Union set and calculate the Jaccard similarity.
   Union = Ing_A''' ∪ Ing_B''' ∪ Intersection
   $\text{ingredient similarity} = \dfrac{|\text{Intersection}|}{|\text{Union}|}$
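Steps 1 to 8 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `KATAKANA_READING` and `SIMILAR_TOP3` are hypothetical toy tables standing in for a real reading converter and the word2vec model, and the handling of two empty ingredient lists is an arbitrary choice that the slides do not specify.

```python
# Toy stand-ins (assumptions): a kanji-to-katakana reading table and the
# top-3 word2vec neighbors of an ingredient.
KATAKANA_READING = {"米": "コメ"}
SIMILAR_TOP3 = {"米": ["こめ", "コメ", "ライス"]}


def to_katakana(word):
    """Shift hiragana code points to katakana; kanji use the toy reading table."""
    if word in KATAKANA_READING:
        return KATAKANA_READING[word]
    return "".join(chr(ord(ch) + 0x60) if "ぁ" <= ch <= "ゖ" else ch for ch in word)


def ingredient_similarity(ing_a, ing_b):
    """Return (Jaccard similarity, ingredient difference) for two ingredient lists."""
    a, b = set(ing_a), set(ing_b)
    # Steps 1-2: exact matches.
    intersection = a & b
    a, b = a - intersection, b - intersection
    # Steps 3-5: matches after converting both sides to katakana.
    a_k = {to_katakana(x): x for x in a}
    b_k = {to_katakana(x): x for x in b}
    common_k = a_k.keys() & b_k.keys()
    intersection |= common_k
    a = {x for k, x in a_k.items() if k not in common_k}
    b = {x for k, x in b_k.items() if k not in common_k}
    # Step 6: an ingredient of B among the top-3 neighbors counts as a match.
    for x in sorted(a):
        hit = next((y for y in SIMILAR_TOP3.get(x, [])[:3] if y in b), None)
        if hit is not None:
            intersection.add(x)
            a.discard(x)
            b.discard(hit)
    # Steps 7-8: leftover ingredients form the difference; Jaccard over the union.
    difference = len(a) + len(b)
    union = a | b | intersection
    similarity = len(intersection) / len(union) if union else 0.0  # empty case: arbitrary
    return similarity, difference


ing_a = ["にんにく", "米", "ねぎ", "いくら", "砂糖"]
ing_b = ["ニンニク", "ライス", "ネギ", "いくら", "砂糖"]
print(ingredient_similarity(ing_a, ing_b))
```

On the slide's example every ingredient ends up matched, so the leftover sets are empty and the similarity is 1.0.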
9
Detection Model using Multi-Layer Perceptron (MLP)
■ An MLP is a simple yet powerful classification model used in various tasks.
■ Duplicate detection can be viewed as classification with two classes: duplicate and non-duplicate.
[Figure: MLP diagram; input features → duplicate / non-duplicate]
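To make the classification view concrete, here is a minimal forward pass of a one-hidden-layer MLP in plain Python. The slides do not give the architecture or the trained weights, so the layer sizes, the ReLU activation, and `TOY_PARAMS` (hand-picked, not trained) are all assumptions chosen for illustration.

```python
import math


def affine(weights, bias, v):
    """Compute W·v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_predict(features, params):
    """Return (label, [P(duplicate), P(non-duplicate)]) for one recipe pair."""
    hidden = relu(affine(params["W1"], params["b1"], features))
    probs = softmax(affine(params["W2"], params["b2"], hidden))
    label = "duplicate" if probs[0] >= probs[1] else "non-duplicate"
    return label, probs


# Hand-picked toy parameters: 4 distance-like inputs, 2 hidden units, 2 classes.
TOY_PARAMS = {
    "W1": [[1.0, 1.0, 1.0, 1.0], [-1.0, -1.0, -1.0, -1.0]],
    "b1": [0.0, 1.0],
    "W2": [[-1.0, 1.0], [1.0, -1.0]],
    "b2": [0.0, 0.0],
}

print(mlp_predict([0.1, 0.1, 0.0, 0.0], TOY_PARAMS))  # small distances
print(mlp_predict([2.0, 3.0, 1.0, 1.0], TOY_PARAMS))  # large distances
```

In practice the weights would be learned from the labeled recipe pairs rather than set by hand.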
10
Experiment: Dataset
■ 1,847 recipe pairs extracted from the database
  ➢ Each pair is annotated with duplication labels for instruction, ingredients, and image.
  ➢ If any one of those labels is "duplicate" → the pair is regarded as a duplicate.
■ Training dataset
  ➢ 1,547 recipe pairs (1,255 duplicate / 292 non-duplicate)
■ Development dataset
  ➢ 100 recipe pairs (50 duplicate / 50 non-duplicate)
■ Test dataset
  ➢ 200 recipe pairs (100 duplicate / 100 non-duplicate)
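The labeling rule and the split sizes can be checked with a few lines of Python. The function and variable names are hypothetical; the counts are copied from the slide.

```python
def pair_label(instruction_dup, ingredient_dup, image_dup):
    """A pair is a duplicate if any of the three annotated aspects is a duplicate."""
    if any((instruction_dup, ingredient_dup, image_dup)):
        return "duplicate"
    return "non-duplicate"


# (duplicate, non-duplicate) pair counts per split, as listed on the slide.
SPLITS = {"train": (1255, 292), "dev": (50, 50), "test": (100, 100)}

total_pairs = sum(dup + non for dup, non in SPLITS.values())
train_dup_share = SPLITS["train"][0] / sum(SPLITS["train"])

print(pair_label(False, True, False))
print(total_pairs, round(train_dup_share, 2))
```

The counts add up to the 1,847 annotated pairs. By these numbers, roughly 81% of the training pairs are duplicates, while the development and test sets are balanced.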
11
Features extraction
■ Instruction vector: SCDV → PCA reduces the dimension to 2,000.
■ Image vector: dimension 2,048 (Inception-v3).
■ The features below are used as the input of the MLP; we trained a model that classifies each input as duplicate or non-duplicate.

Feature type   Description
Instruction    Euclidean distance between the two vectors
Image          Euclidean distance between the two vectors
Ingredient     1.0 − ingredient similarity (p.8)
User           Identity function: $\text{score} = \begin{cases} 0, & uid_1 = uid_2 \\ 1, & uid_1 \neq uid_2 \end{cases}$
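The four input features for one recipe pair could be assembled as in the sketch below. The dictionary keys (`instruction_vec`, `image_vec`, `user_id`) and the function name are assumptions made for illustration, and the ingredient similarity is passed in as an already computed number.

```python
import math


def pair_features(recipe_a, recipe_b, ingredient_similarity):
    """Build the 4-dimensional MLP input for a recipe pair."""
    return [
        # Instruction: Euclidean distance between the two text vectors.
        math.dist(recipe_a["instruction_vec"], recipe_b["instruction_vec"]),
        # Image: Euclidean distance between the two image vectors.
        math.dist(recipe_a["image_vec"], recipe_b["image_vec"]),
        # Ingredient: 1.0 minus the Jaccard-based ingredient similarity.
        1.0 - ingredient_similarity,
        # User: 0 when both recipes come from the same user, 1 otherwise.
        0.0 if recipe_a["user_id"] == recipe_b["user_id"] else 1.0,
    ]


# Toy vectors; the real ones have 2,000 and 2,048 dimensions.
a = {"instruction_vec": [0.0, 0.0], "image_vec": [1.0, 1.0, 1.0], "user_id": "u1"}
b = {"instruction_vec": [3.0, 4.0], "image_vec": [1.0, 1.0, 1.0], "user_id": "u2"}
print(pair_features(a, b, ingredient_similarity=0.8))
```

All four features are distance-like: smaller values mean the two recipes are more alike.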
12
Results: Our method vs. Previous method
■ Our method

  Positive label               Negative label
  Recall  Precision  F1        Recall  Precision  F1
  0.99    0.92       0.95      0.91    0.99       0.95

  Confusion matrix (rows: prediction, columns: ground truth)
                               Ground truth
                               Duplicate  Non-duplicate
  Prediction  Duplicate        99         9
              Non-duplicate    1          91

■ Previous method [Oguni+ 2018]
  ➢ Picks the top k pairs with the highest similarity scores and labels them "duplicate"; the remaining pairs are labeled "non-duplicate".
  ➢ For every value of k, the proposal outperforms the previous method.

       Positive label                     Negative label
  k    Recall@k  Precision@k  F1@k        Precision@k  Recall@k  F1@k
  20   0.15      0.75         0.25        0.53         0.95      0.68
  40   0.34      0.85         0.49        0.59         0.94      0.72
  60   0.47      0.78         0.59        0.62         0.87      0.73
  80   0.54      0.68         0.60        0.60         0.74      0.67
  100  0.64      0.64         0.64        0.64         0.64      0.64
  120  0.67      0.56         0.61        0.59         0.47      0.52
  140  0.73      0.52         0.61        0.55         0.33      0.41
  160  0.82      0.51         0.63        0.55         0.22      0.31
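The scores reported for our method follow from the four confusion-matrix counts, reading the rows as predictions and the columns as ground truth (the orientation consistent with the reported recall and precision). The helper below is a small sketch with hypothetical names; it assumes all denominators are non-zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Recall, precision, and F1 for both labels, plus accuracy."""

    def recall_precision_f1(hits, wrong_predictions, misses):
        precision = hits / (hits + wrong_predictions)
        recall = hits / (hits + misses)
        f1 = 2 * precision * recall / (precision + recall)
        return recall, precision, f1

    return {
        # Positive label = duplicate.
        "positive": recall_precision_f1(tp, fp, fn),
        # Negative label = non-duplicate: the roles of FP and FN swap.
        "negative": recall_precision_f1(tn, fn, fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Counts from the confusion matrix: 99 duplicates found, 9 false alarms,
# 1 missed duplicate, 91 correctly rejected pairs.
scores = binary_metrics(tp=99, fp=9, fn=1, tn=91)
for name in ("positive", "negative"):
    print(name, [round(x, 2) for x in scores[name]])
print("accuracy", scores["accuracy"])
```

Rounded to two decimals, this gives 0.99 / 0.92 / 0.95 for the positive label, 0.91 / 0.99 / 0.95 for the negative label, and an accuracy of 0.95, matching the table.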
13
Results: Effectiveness of features
■ Both User and Image information improve performance compared to the Instruction + Ingredient set.
■ The Jaccard coefficient represents ingredient similarity better than the ingredient difference does.
■ We achieved the highest result when using all the features.

                                                           Positive label             Negative label
Features                                                   Recall  Precision  F1      Recall  Precision  F1
Instruction + Ingredient (ingredient difference at p.8)    0.89    0.60       0.71    0.40    0.78       0.53
Instruction + Ingredient                                   0.85    0.66       0.74    0.56    0.79       0.66
Instruction + Ingredient + User                            0.94    0.90       0.92    0.90    0.94       0.92
Instruction + Ingredient + Image                           0.97    0.91       0.94    0.90    0.97       0.93
Instruction + Ingredient + User + Image                    0.99    0.92       0.95    0.91    0.99       0.95
14
Correct prediction examples
Ingredients: 酒粕, 砂糖, 珈琲焼酎, 牛乳.
Sake lees, sugar, coffee shochu, milk.
Instruction: 鍋にすべての材料と水100g(分量外)を入れ、あたためながら10分ほどとかしまぜてできあがり。
Put all the ingredients and 100 g of water (not listed in the ingredients) into a pot and heat for about 10 minutes, stirring until dissolved.

Ingredients: 酒粕, タイム, 砂糖.
Sake lees, thyme, sugar.
Instruction: 鍋にすべての材料と水200g(分量外)を入れ、あたためながら10分ほどとかしまぜてできあがり。
Put all the ingredients and 200 g of water (not listed in the ingredients) into a pot and heat for about 10 minutes, stirring until dissolved.

Prediction: Duplicate
Ground truth: Duplicate
15
Correct prediction examples
Ingredients: ブロッコリー, ☆マヨネーズ, ☆プレーンヨーグルト, ☆塩コショウ
Broccoli, ☆ mayonnaise, ☆ plain yogurt, ☆ salt and pepper.
Instruction: ブロッコリーは洗い、(水分は拭き取らずに)軸の太い部分は十字に切り込みを入れラップでふんわり包みます。レンジに約3分かけて取り出し少し冷めてから切り分けます(冷凍保存も可)。☆を混ぜ低カロリーマヨネーズを作り、付けていただきます。
Wash the broccoli and, without wiping off the water, cut a cross into the thick part of the stem and wrap it loosely in plastic wrap. Microwave it for about 3 minutes, take it out, let it cool a little, and cut it into pieces (it can also be frozen). Mix the ☆ ingredients to make a low-calorie mayonnaise and serve it with the broccoli.

Ingredients: ブロッコリー, 塩, *だし汁, *醤油, *辛子.
Broccoli, salt, * dashi stock, * soy sauce, * mustard.
Instruction: ブロッコリーは小房に切って、塩を加えた熱湯で茹でてザルにあげる。*は合わせておく。1のブロッコリーと*を混ぜ合わせ、器に盛る。完成！
Cut the broccoli into florets, boil them in salted water, and drain them in a colander. Combine the * ingredients. Mix the broccoli from step 1 with the * mixture and serve in a bowl. Done!

Prediction: Non-duplicate
Ground truth: Non-duplicate
16
Wrong prediction example
Ingredients: コーラス, バナナ小, プレーンヨーグルト, 水羊羹, きな粉.
Chorus (a soft drink), small banana, plain yogurt, mizu-yokan (soft red bean jelly), kinako (roasted soybean flour).
Instruction: バナナは小さくちぎり、上記材料と一緒に全てミキサーにかけ、ジュース状になったら出来上がりです。
Tear the banana into small pieces, put it into a blender with the ingredients above, and blend until it turns into juice.

Ingredients: バナナ, プレーンヨーグルト, みかん, オリゴ糖, りんごジュース.
Banana, plain yogurt, mandarin orange, oligosaccharide, apple juice.
Instruction: バナナは小さくちぎり、上記材料と一緒に全てミキサーにかけ、ジュース状になったら出来上がりです。
Tear the banana into small pieces, put it into a blender with the ingredients above, and blend until it turns into juice.

Prediction: Duplicate
Ground truth: Non-duplicate
17
Summary
■ Conclusion
  ➢ We implemented a duplicate recipe detection system based on an MLP.
  ➢ Our proposal significantly outperforms the previous work, reaching 95% accuracy.
  ➢ Image and user information contribute to the task of predicting duplicate recipe pairs.
■ Future Work
  ➢ Investigate other kinds of features.
  ➢ Expand our dataset.
19
Architecture with Threshold
[Figure: flowchart of the threshold-based architecture. Boxes: Posted Recipe (near-duplicate recipe candidate); Cooking Instruction, Ingredients, Food Image; Extract Text Vector; Extract Image Vector; Database with Nearest Neighbor Search (NGT) (appears twice); Text similarity > threshold; Image similarity > threshold; Extract candidates of original recipes; Extract ingredients of original recipes; Calculate ingredient similarity; Ingredient similarity > threshold; System judges posted recipe as near-duplicate recipe.]
Threshold annotations in the figure: α = 0.9 and β = 0.94.
Appendix
20
Experiment on Tsukuba dataset
■ The paper is not yet published → the results are not confirmed.
■ The Tsukuba dataset contains many conflicting samples (examples below).

Method                          Features                                                                    Recall  Precision  F1
Tsukuba team (Random Forest)    Instruction (n-gram mover distance) + ingredients (ingredient difference)   0.9     0.77       0.83
Our method                      Instruction + Ingredient                                                    0.64    0.31       0.42

Pair labeled Duplicate:
Ingredients: Clam, water, miso.
Instruction: Remove the salt in the pan, add well-washed clams and water, and set it on medium heat. When the clam opens, it will come out and throw it away. When you put out the fire and melt the miso, it's done
https://recipe.rakuten.co.jp/recipe/1190013605
Ingredients: Clam, water, miso.
Instruction: The clams are sanded out, rub the shells and wash well. Put the clam and water in the pan and bring it to a boil. When the clam shell opens, the scoop will come out. Stop the fire, melt the miso and let it stand for a while.
https://recipe.rakuten.co.jp/recipe/1280001778

Pair labeled Non-duplicate:
Ingredients: Mackerel, salt.
Instruction: Shake the mackerel and let it sit for 10 minutes. Wipe off moisture. Bake for 10 minutes on the grill
https://recipe.rakuten.co.jp/recipe/1090029263
Ingredients: Mackerel, salt.
Instruction: Lower 2 persimmons and finish 2. Sprinkle salt in the bowl and leave in the refrigerator for 30 minutes. Wipe the water from the mackerel and bake for about 9 minutes on the grilled fish.
https://recipe.rakuten.co.jp/recipe/1240026533
21
Using raw subtraction vector as input features
■ Instruction (raw): element-wise absolute difference between the text vectors of the two recipes in a pair.
■ Image (raw): same as above, but using the image vectors.
■ The resulting vectors were used as input features.
■ We observe no significant difference compared to the feature set using Euclidean distance. On the other hand, computation time and storage requirements increased by a large amount.

                                                                 Positive label             Negative label
Features                                              Accuracy   Recall  Precision  F1      Recall  Precision  F1
Instruction + Ingredient + User + Image               95%        0.99    0.92       0.95    0.91    0.99       0.95
Instruction (raw) + Ingredient + User + Image (raw)   95%        1       0.91       0.95    0.9     1          0.95
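The difference between the two feature sets can be sketched as follows. The function names are hypothetical, the zero vectors are placeholders, and the input sizes are derived here from the dimensions quoted in the slides (2,000 for text, 2,048 for images) rather than reported by the authors.

```python
import math


def abs_difference(u, v):
    """Element-wise absolute difference between two equal-length vectors."""
    return [abs(a - b) for a, b in zip(u, v)]


def distance_features(text_a, text_b, img_a, img_b, ingredient_dist, user_flag):
    """Compact variant: one Euclidean distance per modality (4 inputs)."""
    return [math.dist(text_a, text_b), math.dist(img_a, img_b), ingredient_dist, user_flag]


def raw_features(text_a, text_b, img_a, img_b, ingredient_dist, user_flag):
    """Raw variant: the full difference vectors are concatenated."""
    return (
        abs_difference(text_a, text_b)
        + abs_difference(img_a, img_b)
        + [ingredient_dist, user_flag]
    )


# Placeholder vectors with the dimensions quoted in the slides.
text_a, text_b = [0.0] * 2000, [1.0] * 2000
img_a, img_b = [0.0] * 2048, [0.5] * 2048

print(len(distance_features(text_a, text_b, img_a, img_b, 0.2, 1.0)))
print(len(raw_features(text_a, text_b, img_a, img_b, 0.2, 1.0)))
```

Under these assumptions the raw variant feeds 4,050 values per pair into the classifier instead of 4, which is consistent with the reported increase in computation time and storage.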

 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

Duplicated Cooking Recipe Determination Using Multimodal Information

  • 1. Duplicated Cooking Recipe Determination Using Multimodal Information
    March 19, 2020
    ● Nguyen The Tung (1), Yuki Nakayama (2)
    (1) Nara Institute of Science and Technology
    (2) Rakuten Institute of Technology, Rakuten, Inc.
  • 2. Duplicate Recipe Detection
    Task description:
      - Given a new recipe, decide whether it duplicates a recipe already in the database.
    Method:
      - Step 1: From the database $R_{DB}$ and the new recipe $r$, generate the duplicate candidates $R_{c} \subset R_{DB}$. For this step we use the work of a previous intern [Oguni+ 2018].
      - Step 2: Decide whether each pair $(r, r')$, $r' \in R_{c}$, is a duplicate or not. This is our work in this paper: duplicated recipe determination.
    [Oguni+ 2018] Masaki Oguni, Lasguido Nio, Yu Hirate, and Yohei Seki. Method for Detecting Near-duplicate Recipes Based on Nearest Neighbor Search for Features of Cooking Instructions and Food Images (in Japanese).
  • 3. Related work and Proposal
    - Deriving a Recipe Similarity Measure for Recommending Healthful Meals [van Pinxteren+ 2011]
        - Features: cooking instructions + ingredients
        - A human makes the decision based on the features
    - Clustering for Closely Similar Recipes to Extract Spam Recipes in User-generated Recipe Sites [Hanai+ 2015]
        - Features: ingredients
        - Clusters recipes into groups
    - Our proposal: treat the task as classification
        - Detect duplicate recipes with a Multi-Layer Perceptron
        - The classifier uses similarity scores for the cooking instructions (text), ingredients (text), user ID, and the result photo (image)
  • 4. Previous work's pipeline (Step 1) [Oguni+ 2018]
    [Pipeline diagram. Box labels: New Recipe (Ingredients, Cooking Instruction, Food Image); Extract Text Vector; Extract Image Vector; Database and Nearest Neighbor Search (NGT), once for the text vector and once for the image vector; Extract candidates of original recipes; Extract ingredients of original recipes; Calculate ingredient similarity; Duplicate candidates.]
    NGT: Neighborhood Graph and Tree [Iwasaki 2010]
  • 5. How to extract text vector: SCDV [Mekala+ 2017]
    - Sparse Composite Document Vectors (SCDV)
    - The output vector (dimension = 10,000) represents the cooking instructions of each recipe.
    - This dimension is too high, so we use Principal Component Analysis (PCA) to reduce it to 2,000.
  • 6. How to extract image vector: Inception-V3 [Szegedy+ 2016]
    - A Convolutional Neural Network model for recognizing generic objects
    - We extract 2,048-dimensional vectors from the images
    - We use the Inception-v3 model pre-trained for the ImageNet competition
    [Diagram: input image → Inception-v3 → output 2,048-dimensional vector]
  • 7. How to calculate ingredients similarity
    Example (A is a near-duplicate recipe candidate; B is an original recipe candidate):
      Ing_A = {にんにく, 米, ねぎ, いくら, 砂糖}
      Ing_B = {ニンニク, ライス, ネギ, いくら, 砂糖}
      (Both sets list garlic, rice, green onion, salmon roe, and sugar, written in different scripts.)
    1. Find the ingredients common to Ing_A and Ing_B (the intersection set).
       Intersection = Ing_A ∩ Ing_B = {いくら, 砂糖}
    2. Create new sets by removing the common ingredients.
       Ing_A' = {にんにく, 米, ねぎ}, Ing_B' = {ニンニク, ライス, ネギ}
    3. Convert each ingredient to katakana.
       Ing_A'_k = {ニンニク, コメ, ネギ}, Ing_B'_k = {ニンニク, ライス, ネギ}
    4. Find the ingredients common to Ing_A'_k and Ing_B'_k and add them to Intersection.
       Ing_A'_k ∩ Ing_B'_k = {ニンニク, ネギ}
    5. Create new sets by removing the common ingredients.
       Ing_A'' = {米}, Ing_B'' = {ライス}
       Intersection = {いくら, 砂糖, ニンニク, ネギ}
  • 8. How to calculate ingredients similarity (continued)
    6. For each ingredient of Ing_A'', search for similar ingredients with a word2vec model trained on 1.16 million recipes (the training data). If an ingredient of Ing_B'' appears in the top 3 search results, we treat the two as the same ingredient and add it to Intersection.
       Similar words for "米": Similar ingredients = {こめ, コメ, ライス, レンジ, 五穀米}
       Ing_A'' ∩ Ing_B'' = {米 (ライス)}
       Intersection = {いくら, 砂糖, ニンニク, ネギ, 米}
    7. Create new sets by removing the common ingredients.
       Ing_A''' = ∅, Ing_B''' = ∅
       ingredient difference = Ing_A''' + Ing_B'''
    8. Build the union set and calculate the Jaccard similarity.
       Union = Ing_A''' ∪ Ing_B''' ∪ Intersection
       $\text{ingredient similarity} = \frac{|\mathit{Intersection}|}{|\mathit{Union}|}$
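The eight matching steps on slides 7 and 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: `KANJI_READINGS` and `NEIGHBORS` are hypothetical lookup tables standing in for a real kana converter and for the trained word2vec model, and the "ingredient difference" is read here as the number of ingredients left unmatched on both sides.

```python
# Hypothetical stand-ins: a real system would use a reading converter and
# query the trained word2vec model instead of these hard-coded tables.
KANJI_READINGS = {"米": "コメ"}
NEIGHBORS = {"米": ["こめ", "コメ", "ライス", "レンジ", "五穀米"]}


def to_katakana(word):
    """Convert hiragana to katakana by code point; look up kanji readings."""
    if word in KANJI_READINGS:
        return KANJI_READINGS[word]
    # Hiragana U+3041..U+3096 maps to katakana by adding 0x60.
    return "".join(chr(ord(c) + 0x60) if "ぁ" <= c <= "ゖ" else c for c in word)


def ingredient_similarity(ing_a, ing_b, top_n=3):
    """Return (Jaccard similarity, ingredient difference) for two recipes."""
    a, b = set(ing_a), set(ing_b)

    # Steps 1-2: exact matches, then remove them from both sides.
    matched = len(a & b)
    a, b = a - b, b - a

    # Steps 3-5: match again after converting everything to katakana.
    kata_a = {to_katakana(x): x for x in a}
    kata_b = {to_katakana(x): x for x in b}
    common = kata_a.keys() & kata_b.keys()
    matched += len(common)
    a = {orig for kata, orig in kata_a.items() if kata not in common}
    b = {orig for kata, orig in kata_b.items() if kata not in common}

    # Step 6: accept a pair if a B-side ingredient is among the top-n neighbours.
    for x in sorted(a):
        hit = next((n for n in NEIGHBORS.get(x, [])[:top_n] if n in b), None)
        if hit is not None:
            matched += 1
            a.discard(x)
            b.discard(hit)

    # Steps 7-8: leftovers form the difference; union = matched + leftovers.
    difference = len(a) + len(b)
    union = matched + difference
    similarity = matched / union if union else 0.0
    return similarity, difference


recipe_a = ["にんにく", "米", "ねぎ", "いくら", "砂糖"]
recipe_b = ["ニンニク", "ライス", "ネギ", "いくら", "砂糖"]
print(ingredient_similarity(recipe_a, recipe_b))  # (1.0, 0)
```

With the slide's example, all five ingredients end up in the intersection, so the similarity is 5/5 and nothing is left over.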
  • 9. Detection Model using Multi-Layer Perceptron (MLP)
    - The MLP is a simple yet powerful classification model used in various tasks.
    - Duplicate detection can be viewed as classification with two classes: duplicate and non-duplicate.
    [Diagram: input features → MLP → duplicate / non-duplicate]
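To make the classification view concrete, here is a minimal forward pass of a one-hidden-layer MLP with a two-way softmax output. Everything specific in it is an assumption made for illustration: the slides do not state the network size, activation, or training setup, so the hidden width of 8, the ReLU activation, and the random (untrained) weights below are hypothetical.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, params):
    """Return [P(duplicate), P(non-duplicate)] for one recipe pair."""
    hidden = relu(dense(x, params["W1"], params["b1"]))
    return softmax(dense(hidden, params["W2"], params["b2"]))


# Assumed sizes: 4 input scores, 8 hidden units, 2 output classes.
rng = random.Random(0)
HIDDEN = 8
params = {
    "W1": [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(HIDDEN)],
    "b1": [0.0] * HIDDEN,
    "W2": [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(2)],
    "b2": [0.0, 0.0],
}

# Made-up input: instruction score, image score, ingredient score, user score.
features = [0.8, 1.2, 0.0, 1.0]
p_duplicate, p_non_duplicate = mlp_forward(features, params)
print(p_duplicate, p_non_duplicate)
```

Because the weights are untrained, the printed probabilities carry no meaning; the point is only the shape of the computation from input scores to a two-class decision.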
  • 10. Experiment: Dataset
    - 1,847 recipe pairs extracted from the database
        - Each pair is annotated with duplication labels for the instructions, the ingredients, and the image.
        - If any one of those labels is "duplicate", the pair is regarded as a duplicate.
    - Training dataset: 1,547 recipe pairs (1,255 duplicate / 292 non-duplicate)
    - Development dataset: 100 recipe pairs (50 duplicate / 50 non-duplicate)
    - Test dataset: 200 recipe pairs (100 duplicate / 100 non-duplicate)
  • 11. Feature extraction
    - Instruction vector: SCDV, with PCA used to reduce the dimension to 2,000.
    - Image vector: dimension 2,048 (Inception v3).
    - The features in the table are used as the input of the MLP; we trained a model that classifies each input as duplicate or non-duplicate.

    | Feature type | Description |
    | Instruction  | Euclidean distance between the two vectors |
    | Image        | Euclidean distance between the two vectors |
    | Ingredient   | 1.0 − ingredient similarity (slide 8) |
    | User         | Identity function: score = 0 if uid1 = uid2, 1 if uid1 ≠ uid2 |
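A sketch of how the four input scores for one recipe pair could be assembled from the descriptions on this slide. The function name, argument names, and the toy vectors are hypothetical; the real vectors would have 2,000 and 2,048 dimensions.

```python
import math


def pair_features(instr_a, instr_b, image_a, image_b,
                  ingredient_similarity, uid_a, uid_b):
    """Build the 4-element feature vector for one recipe pair."""
    return [
        math.dist(instr_a, instr_b),      # Instruction: Euclidean distance
        math.dist(image_a, image_b),      # Image: Euclidean distance
        1.0 - ingredient_similarity,      # Ingredient: 1.0 - Jaccard similarity
        0.0 if uid_a == uid_b else 1.0,   # User: 0 if same user, 1 otherwise
    ]


# Toy 2-dimensional vectors keep the example readable.
features = pair_features(
    instr_a=[0.0, 0.0], instr_b=[3.0, 4.0],
    image_a=[1.0, 1.0], image_b=[1.0, 1.0],
    ingredient_similarity=0.75,
    uid_a="user_1", uid_b="user_1",
)
print(features)  # [5.0, 0.0, 0.25, 0.0]
```

All four scores point the same way: smaller values mean the two recipes are more alike.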
  • 12. Results: Our method vs. Previous method
    Our method:

    | Positive label: Recall | Precision | F1 | Negative label: Recall | Precision | F1 |
    | 0.99 | 0.92 | 0.95 | 0.91 | 0.99 | 0.95 |

    Confusion matrix (rows: prediction, columns: ground truth):

    |                          | Truth: Duplicate | Truth: Non-duplicate |
    | Predicted: Duplicate     | 99 | 9  |
    | Predicted: Non-duplicate | 1  | 91 |

    Previous method [Oguni+ 2018]:
    - Pick the top k pairs with the highest similarity scores and label them "duplicate"; the remaining pairs are "non-duplicate".
    - For every value of k, our proposal outperforms the previous method.

    | k   | Positive label: Recall@k | Precision@k | F1@k | Negative label: Recall@k | Precision@k | F1@k |
    | 20  | 0.15 | 0.75 | 0.25 | 0.53 | 0.95 | 0.68 |
    | 40  | 0.34 | 0.85 | 0.49 | 0.59 | 0.94 | 0.72 |
    | 60  | 0.47 | 0.78 | 0.59 | 0.62 | 0.87 | 0.73 |
    | 80  | 0.54 | 0.68 | 0.60 | 0.60 | 0.74 | 0.67 |
    | 100 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 | 0.64 |
    | 120 | 0.67 | 0.56 | 0.61 | 0.59 | 0.47 | 0.52 |
    | 140 | 0.73 | 0.52 | 0.61 | 0.55 | 0.33 | 0.41 |
    | 160 | 0.82 | 0.51 | 0.63 | 0.55 | 0.22 | 0.31 |
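The reported scores for our method can be re-derived from the confusion matrix. The sketch below assumes the matrix is read as 99 duplicates predicted as duplicate, 9 non-duplicates predicted as duplicate, 1 duplicate predicted as non-duplicate, and 91 non-duplicates predicted as non-duplicate; that reading is an interpretation, chosen because it reproduces the recall and precision values on the slide.

```python
def precision_recall_f1(tp, fp, fn):
    """Per-class precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts for the positive (duplicate) class, as read from the slide.
tp, fp, fn, tn = 99, 9, 1, 91

positive = precision_recall_f1(tp, fp, fn)
# For the negative class the roles swap: true negatives become the hits.
negative = precision_recall_f1(tn, fn, fp)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print([round(v, 2) for v in positive])  # [0.92, 0.99, 0.95]
print([round(v, 2) for v in negative])  # [0.99, 0.91, 0.95]
print(accuracy)                         # 0.95
```

The 95% accuracy quoted in the summary follows from the same counts: 190 of the 200 test pairs are classified correctly.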
  • 13. Results: Effectiveness of features
    - Both User and Image information improve performance compared to the Instruction + Ingredient set.
    - The Jaccard coefficient represents ingredient similarity better than the ingredient difference does.
    - We achieved the highest result when using all the features.

    | Features | Positive label: Recall | Precision | F1 | Negative label: Recall | Precision | F1 |
    | Instruction + Ingredient (ingredient difference from slide 8) | 0.89 | 0.60 | 0.71 | 0.40 | 0.78 | 0.53 |
    | Instruction + Ingredient                | 0.85 | 0.66 | 0.74 | 0.56 | 0.79 | 0.66 |
    | Instruction + Ingredient + User         | 0.94 | 0.90 | 0.92 | 0.90 | 0.94 | 0.92 |
    | Instruction + Ingredient + Image        | 0.97 | 0.91 | 0.94 | 0.90 | 0.97 | 0.93 |
    | Instruction + Ingredient + User + Image | 0.99 | 0.92 | 0.95 | 0.91 | 0.99 | 0.95 |
  • 14. Correct prediction examples
    Recipe 1
      Ingredients: 酒粕, 砂糖, 珈琲焼酎, 牛乳 (sake lees, sugar, coffee shochu, milk)
      Instruction: 鍋にすべての材料と水100g(分量外)を入れ、あたためながら10分ほどとかしまぜてできあがり。
      (Put all the ingredients and 100 g of water, which is not counted in the ingredient amounts, into a pot and stir for about 10 minutes while heating until dissolved.)
    Recipe 2
      Ingredients: 酒粕, タイム, 砂糖 (sake lees, thyme, sugar)
      Instruction: 鍋にすべての材料と水200g(分量外)を入れ、あたためながら10分ほどとかしまぜてできあがり.
      (Put all the ingredients and 200 g of water, which is not counted in the ingredient amounts, into a pot and stir for about 10 minutes while heating until dissolved.)
    Prediction: Duplicate
    Ground truth: Duplicate
  • 15. Correct prediction examples
    Recipe 1
      Ingredients: ブロッコリー, ☆マヨネーズ, ☆プレーンヨーグルト, ☆塩コショウ (broccoli, ☆mayonnaise, ☆plain yogurt, ☆salt and pepper)
      Instruction: ブロッコリーは洗い、(水分は拭き取らずに)軸の太い部分は十字に切り込みを入れラップでふんわり包みます. レンジに約3分かけて取り出し少し冷めてから切り分けます(冷凍保存も可)☆を混ぜ低カロリーマヨネーズを作り、付けていただきます。
      (Wash the broccoli and, without wiping off the moisture, cut a cross into the thick part of the stem and wrap it loosely in plastic wrap. Microwave it for about 3 minutes, take it out, let it cool a little, and cut it up; it can also be frozen. Mix the ☆ ingredients to make a low-calorie mayonnaise and serve it as a dip.)
    Recipe 2
      Ingredients: ブロッコリー, 塩, *だし汁, *醤油, *辛子 (broccoli, salt, *dashi stock, *soy sauce, *Japanese mustard)
      Instruction: ブロッコリーは小房に切って、塩を加えた熱湯で茹でてザルにあげる。*は合わせておく。1のブロッコリーと*を混ぜ合わせ、器に盛る。完成!
      (Cut the broccoli into florets, boil them in salted water, and drain them in a colander. Mix the * ingredients together. Combine the broccoli from step 1 with the * mixture and serve in a dish. Done!)
    Prediction: Non-duplicate
    Ground truth: Non-duplicate
  • 16. Wrong prediction example
    Recipe 1
      Ingredients: コーラス, バナナ小, プレーンヨーグルト, 水羊羹, きな粉 (Chorus, small banana, plain yogurt, mizu yokan, kinako)
      Instruction: バナナは小さくちぎり、上記材料と一緒に全てミキサーにかけ、ジュース状になったら出来上がりです.
      (Tear the banana into small pieces, put it in a blender together with all the ingredients above, and blend until it becomes juice-like.)
    Recipe 2
      Ingredients: バナナ, プレーンヨーグルト, みかん, オリゴ糖, りんごジュース (banana, plain yogurt, mandarin orange, oligosaccharide, apple juice)
      Instruction: バナナは小さくちぎり、上記材料と一緒に全てミキサーにかけ、ジュース状になったら出来上がりです.
      (Tear the banana into small pieces, put it in a blender together with all the ingredients above, and blend until it becomes juice-like.)
    Prediction: Duplicate
    Ground truth: Non-duplicate
  • 17. Summary
    Conclusion
      - We implemented a duplicate recipe detection system based on an MLP.
      - Our proposal significantly outperforms the previous work, reaching 95% accuracy.
      - Image and user information contribute to predicting duplicate recipe pairs.
    Future work
      - Investigate other kinds of features.
      - Expand our dataset.
  • 18.
  • 19. Architecture with Threshold (Appendix)
    [Pipeline diagram. Box labels: Posted Recipe (near-duplicate recipe candidate) with Ingredients, Cooking Instruction, Food Image; Extract Text Vector; Extract Image Vector; Database and Nearest Neighbor Search (NGT), once for text and once for image; Text similarity > threshold; Image similarity > threshold; Extract candidates of original recipes; Extract ingredients of original recipes; Calculate ingredient similarity; Ingredient similarity > threshold; System judges the posted recipe as a near-duplicate recipe.]
    Thresholds shown in the diagram: α = 0.9 and β = 0.94.
  • 20. Experiment on Tsukuba dataset (Appendix)
    - The paper is not yet published, so the results are not confirmed.
    - The Tsukuba dataset contains many conflicting samples (examples below).

    | Method | Features | Recall | Precision | F1 |
    | Tsukuba team (Random Forest) | Instruction (n-gram mover distance) + ingredients (ingredient difference) | 0.9 | 0.77 | 0.83 |
    | Our method | Instruction + Ingredient | 0.64 | 0.31 | 0.42 |

    Pair labeled Duplicate:
      Ingredients: Clam, water, miso. Instruction: Remove the salt in the pan, add well-washed clams and water, and set it on medium heat. When the clam opens, it will come out and throw it away. When you put out the fire and melt the miso, it's done. https://recipe.rakuten.co.jp/recipe/1190013605
      Ingredients: Clam, water, miso. Instruction: The clams are sanded out, rub the shells and wash well. Put the clam and water in the pan and bring it to a boil. When the clam shell opens, the scoop will come out. Stop the fire, melt the miso and let it stand for a while. https://recipe.rakuten.co.jp/recipe/1280001778
    Pair labeled Non-duplicate:
      Ingredients: Mackerel, salt. Instruction: Shake the mackerel and let it sit for 10 minutes. Wipe off moisture. Bake for 10 minutes on the grill. https://recipe.rakuten.co.jp/recipe/1090029263
      Ingredients: Mackerel, salt. Instruction: Lower 2 persimmons and finish 2. Sprinkle salt in the bowl and leave in the refrigerator for 30 minutes. Wipe the water from the mackerel and bake for about 9 minutes on the grilled fish. https://recipe.rakuten.co.jp/recipe/1240026533
  • 21. Using raw subtraction vector as input features (Appendix)
    - Instruction (raw): the element-wise absolute difference between the text vectors of the two recipes in the pair.
    - Image (raw): the same as above, but using the image vectors.
    - The resulting vectors were used as input features.
    - We observe no significant difference compared to the feature set that uses the Euclidean distance, while computation time and storage requirements increased considerably.

    | Features | Accuracy | Positive label: Recall | Precision | F1 | Negative label: Recall | Precision | F1 |
    | Instruction + Ingredient + User + Image             | 95% | 0.99 | 0.92 | 0.95 | 0.91 | 0.99 | 0.95 |
    | Instruction (raw) + Ingredient + User + Image (raw) | 95% | 1    | 0.91 | 0.95 | 0.9  | 1    | 0.95 |
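The difference between the two feature sets can be illustrated with a short sketch. The function names are hypothetical, and the input-size arithmetic assumes the raw variant simply concatenates the two difference vectors with the ingredient and user scores, which the slides do not state explicitly.

```python
import math


def raw_difference(u, v):
    """Element-wise absolute difference: one feature per vector dimension."""
    return [abs(a - b) for a, b in zip(u, v)]


def euclidean_distance(u, v):
    """A single scalar summarising the same pair of vectors."""
    return math.dist(u, v)


u, v = [1.0, 5.0], [4.0, 1.0]
print(raw_difference(u, v))      # [3.0, 4.0]
print(euclidean_distance(u, v))  # 5.0

# Input size per recipe pair under the stated dimensions
# (text: 2,000 after PCA, image: 2,048), plus ingredient and user scores.
TEXT_DIM, IMAGE_DIM = 2000, 2048
scalar_input_size = 4
raw_input_size = TEXT_DIM + IMAGE_DIM + 2
print(scalar_input_size, raw_input_size)  # 4 4050
```

Under these assumptions the raw variant feeds roughly a thousand times more values per pair into the classifier, which is consistent with the increase in computation time and storage noted on the slide.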