Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction


  1. 1. Simple and Effective Knowledge-Driven Query Expansion for QA-Based Product Attribute Extraction. Keiji Shinzato¹, Naoki Yoshinaga², Yandi Xia¹, Wei-Te Chen¹. 1) Rakuten Institute of Technology, Rakuten Group, Inc. 2) Institute of Industrial Science, the University of Tokyo. ACL 2022 short paper.
  2. 2. 1 About the Speaker • Keiji Shinzato • Lead Scientist, Rakuten Institute of Technology Americas • Background • 2004 – 2006: Doctoral course, Japan Advanced Institute of Science and Technology (Torisawa Lab) • 2006 – 2011: Project Assistant Professor / Researcher, Graduate School of Informatics, Kyoto University (Kurohashi Lab) • 2011 – 2018: Rakuten Institute of Technology, Rakuten Group, Inc. • 2018 – present: Rakuten USA, Rakuten Institute of Technology Americas • Hobbies and interests • Cooking • Craft beer
  3. 3. 2 Goal: Organizing Enormous Products in E-commerce • Business contribution • Sophisticated product search and recommendation. • Better understanding of customers on the marketplace. Example product: Large Elegant Leather Bag - BLK. "Crafted from sleek spazzolato leather (black). This is an elegant carryall that's perfect for your essentials. 10"H x 13"W x 6"D." Attribute value extraction yields: Color = Black, Material = Leather, Height = 10 inch, Width = 13 inch, Depth = 6 inch. The bag image is designed by pch.vector / Freepik
  4. 4. 3 From NER-Based to QA-Based Attribute Value Extraction • The existing Named Entity Recognition (NER)-based approach to attribute value extraction suffers from a data sparseness problem. • The number of classes (attributes) in attribute value extraction can exceed one thousand. • The Question Answering (QA)-based approach alleviates the data sparseness problem [Xu+, 2019; Wang+, 2020]. BERT-QA [Wang+, 2020]: the context and query are packed as "Adidas Running Shoes - 8.5 / White[SEP]Brand", fed to a BERT QA model, and the answer span "Adidas" is extracted from the context.
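The QA formulation above can be sketched in a few lines: the product title is the context, the attribute name is the query, and the model scores a start/end token span over the title. The helper names and toy logits below are illustrative, not the authors' code.

```python
# QA-style attribute extraction sketch: pack context + query BERT-style,
# then decode the best-scoring answer span from start/end logits.

def build_input(title_tokens, attribute_tokens):
    """[CLS] title [SEP] attribute [SEP], as in BERT-QA."""
    return ["[CLS]"] + title_tokens + ["[SEP]"] + attribute_tokens + ["[SEP]"]

def decode_span(start_logits, end_logits, max_len=8):
    """Pick the highest-scoring (start, end) pair with end >= start."""
    best, best_score = (0, 0), float("-inf")
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best

title = "adidas running shoes - 8.5 / white".split()
tokens = build_input(title, ["brand"])
# Toy logits that peak on "adidas" (position 1, after [CLS]).
start = [0.1, 5.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
end   = [0.1, 4.0, 0.3, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
s, e = decode_span(start, end)
print(tokens[s:e + 1])  # ['adidas']
```

In the real model the logits come from a BERT head over the packed sequence; the decoding step is the same idea.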
  6. 6. 5 Attribute Value Extraction is Still Difficult [Chart: number of instances per attribute on the AliExpress dataset, log scale; the 2,162 attributes form a long-tailed distribution] Problems • Rare attributes • The number of instances is less than 10 for 85% of the attributes • Ambiguous attributes • function 1, suitable, sort, etc. How can we obtain effective query representations for rare and ambiguous attributes?
  10. 10. 9 Knowledge-Driven Query Expansion for QA-based AE (1/3) • Exploit attribute values in the training data as run-time knowledge to induce a better query representation. The knowledge (attribute-value pairs) is mined from training examples (e.g., "Zipp Battery 12V 14AH SLA…" with attributes Nominal capacity and Brand) and is inherently imperfect. Input to BERT-QA is Title[SEP]Attribute[SEP]Values, e.g., "CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP]nominal capacity[SEP]14ah[SEP]40ah"; the model tags the answer span ("100 Ah") in the context.
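The expansion step itself is simple: collect attribute-to-values mappings from the training tuples, then append the known values to the attribute at query time. The function names below are illustrative, but the query format matches the slide's Title[SEP]Attribute[SEP]Values example.

```python
# Knowledge-driven query expansion sketch: attribute -> seen values,
# appended to the query with [SEP] separators.
from collections import defaultdict

def build_knowledge(training_tuples):
    """Map each attribute to the set of values observed in training."""
    knowledge = defaultdict(set)
    for title, attribute, value in training_tuples:
        knowledge[attribute].add(value.lower())
    return knowledge

def expand_query(attribute, knowledge, max_values=10):
    """Query = attribute[SEP]value1[SEP]value2..., per the slide's figure."""
    values = sorted(knowledge.get(attribute, set()))[:max_values]
    return "[SEP]".join([attribute] + values)

train = [
    ("zipp battery 12v 14ah sla...", "nominal capacity", "14ah"),
    ("solar storage battery 40ah", "nominal capacity", "40ah"),
]
kb = build_knowledge(train)
print(expand_query("nominal capacity", kb))
# nominal capacity[SEP]14ah[SEP]40ah
```

An attribute with no recorded values falls back to the plain attribute name, which is exactly the "imperfect knowledge" situation the next slides address.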
  12. 12. 11 Knowledge-Driven Query Expansion for QA-based AE (2/3, 3/3) • Train knowledge-based QA models while mimicking the imperfection of the knowledge at test time. • Knowledge dropout: Prevent models from naively matching values in the query against ones in the context. • Knowledge token mixing: Prevent models from relying more on the values than on the attribute. • We treat the availability of value knowledge as a domain, and perform multi-domain learning of the QA-based model with and without our value-based query expansion. With knowledge available, the query is Title[SEP][Seen]Attribute[SEP]Values (e.g., "CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Seen]nominal capacity[SEP]14ah[SEP]40ah"); under knowledge dropout the values are deleted and the marker flips: "CATL 3.2V 100Ah battery LiFePo4 prismatic battery[SEP][Unseen]nominal capacity".
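One plausible reading of the training tricks above, sketched as a query-corruption function (this is my interpretation of the slides, not the authors' code): with some probability the appended values are deleted, and a [Seen]/[Unseen] marker records whether the knowledge survived, so the model must still attend to the attribute name.

```python
# Sketch of knowledge dropout + [Seen]/[Unseen] marking at training time.
# p_drop, corrupt_query, and the marker placement are assumptions.
import random

def corrupt_query(attribute, values, p_drop=0.5, rng=random):
    """Mimic imperfect test-time knowledge while training."""
    if values and rng.random() >= p_drop:
        # Knowledge kept: attribute marked [Seen], values appended.
        return "[SEP]".join(["[Seen]" + attribute] + values)
    # Knowledge dropped (or none exists): values deleted, marked [Unseen].
    return "[Unseen]" + attribute

rng = random.Random(0)
print(corrupt_query("nominal capacity", ["14ah", "40ah"], rng=rng))
```

Training on a mix of both forms is what lets a single model handle attributes with and without value knowledge at inference time.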
  18. 18. 17 Experimental Settings • Perform experiments using the cleaned AE-pub dataset. • We construct the cleaned AE-pub dataset from the public AliExpress dataset [Xu+, 2019] by removing 736 near-duplicated tuples. • Each entry consists of a tuple of <product title, attribute, value>. • Split the cleaned AE-pub dataset into train/dev/test sets with a ratio of 7:1:2.

Statistics of the cleaned AE-pub dataset:

                                      Train     Dev.     Test
  # of tuples                         76,823    10,975   21,950
  # of tuples with NULL               15,097    2,201    4,259
  # of unique attribute-value pairs   11,819    2,680    4,431
  # of unique attributes              1,801     635      872
  # of unique values                  9,317     2,258    3,671
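The dataset construction above can be sketched as a dedup-then-split step. The slide does not specify the authors' near-duplicate criterion, so plain exact-duplicate removal stands in here, and the seed and helper name are assumptions.

```python
# Sketch of deduplication and a 7:1:2 train/dev/test split over
# <title, attribute, value> tuples.
import random

def dedup_and_split(tuples, seed=0):
    """Drop exact duplicate tuples, shuffle, and split 7:1:2."""
    unique = list(dict.fromkeys(tuples))  # order-preserving dedup
    rng = random.Random(seed)
    rng.shuffle(unique)
    n = len(unique)
    n_train, n_dev = int(n * 0.7), int(n * 0.1)
    return (unique[:n_train],
            unique[n_train:n_train + n_dev],
            unique[n_train + n_dev:])

data = [(str(i), "attr", "val") for i in range(10)] + [("0", "attr", "val")]
train, dev, test = dedup_and_split(data)
print(len(train), len(dev), len(test))  # 7 1 2
```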
  19. 19. 18 Baselines • Dictionary matching • SUOpenTag [Xu+, 2019] • AVEQA [Wang+, 2020] • BERT-QA [Wang+, 2020] [Architecture diagrams of SUOpenTag, AVEQA, and BERT-QA shown on the slide]
  21. 21. 20 Performance on Cleaned AE-pub Dataset

  Models                         Macro P (%)     Macro R (%)     Macro F1        Micro P (%)     Micro R (%)     Micro F1
  Dictionary                     33.20 (±0.00)   30.37 (±0.00)   31.72 (±0.00)   73.39 (±0.00)   73.77 (±0.00)   73.58 (±0.00)
  SUOpenTag [Xu+, 2019]          30.92 (±1.44)   28.04 (±1.48)   29.41 (±1.44)   86.53 (±0.78)   79.11 (±0.35)   82.65 (±0.20)
  AVEQA [Wang+, 2020]            41.93 (±1.05)   39.65 (±0.96)   40.76 (±0.98)   86.95 (±0.27)   81.99 (±0.13)   84.40 (±0.09)
  BERT-QA [Wang+, 2020]          42.77 (±0.36)   40.85 (±0.22)   41.79 (±0.28)   87.14 (±0.54)   82.16 (±0.21)   84.58 (±0.24)
  BERT-QA +vals                  39.48 (±0.37)   35.60 (±0.44)   37.44 (±0.38)   88.82 (±0.22)   81.77 (±0.14)   85.15 (±0.14)
  BERT-QA +vals +drop            41.61 (±0.83)   38.22 (±0.80)   39.84 (±0.81)   88.46 (±0.26)   82.02 (±0.37)   85.12 (±0.14)
  BERT-QA +vals +mixing          46.67 (±0.33)   43.32 (±0.50)   44.93 (±0.39)   88.30 (±0.69)   82.46 (±0.30)   85.28 (±0.26)
  BERT-QA +vals +drop +mixing    47.74 (±0.54)   44.82 (±0.75)   46.23 (±0.64)   87.84 (±0.39)   82.61 (±0.07)   85.14 (±0.19)

  Takeaways: • BERT-QA +vals +drop +mixing outperformed the baseline methods. • BERT-QA +vals learns to find strings that are similar to ones retrieved from the training data. • Knowledge dropout and knowledge token mixing improve both macro and micro F1 performance.
  24. 24. 23 Impact on Rare and Ambiguous Attributes • Categorize the attributes to which query expansion was applied according to the number of training examples and the appropriateness of the attribute names. • Exploit the embedding of the CLS token obtained from BERT to measure the appropriateness. • Compute the cosine similarity between the attribute embedding and the averaged value embeddings. • Regard the attribute name as ambiguous if the similarity is low. • Divide the attributes into four groups according to the median frequency and the median similarity to values.

Macro F1 gains of BERT-QA +vals +drop +mixing over the BERT-QA model, by number of training examples (median: 8):

  Cosine similarity (median: 0.929)   [1, 8)          [8, ∞)          All
  [0.411, 0.929)                      49.15 (+7.54)   57.89 (+6.15)   53.51 (+6.86)
  [0.929, 1.0]                        50.94 (+8.14)   71.04 (+3.02)   62.10 (+5.29)
  All                                 49.99 (+7.82)   64.84 (+4.50)   57.81 (+6.08)

  Takeaways: • Query expansion can generate more informative queries than ambiguous attributes alone. • Query expansion is more effective for rare attributes than for frequent ones. • The model can devote more of its parameters to solving the task itself, since the internal knowledge induced from the training data is given as run-time input.
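The ambiguity measure above can be sketched without a model: cosine similarity between an attribute's vector and the mean of its values' vectors (the slides use BERT [CLS] embeddings; small hand-made vectors stand in here, and the function names are illustrative).

```python
# Ambiguity check sketch: an attribute whose embedding is far from the
# mean of its value embeddings is treated as ambiguous.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def mean_vector(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def is_ambiguous(attr_vec, value_vecs, threshold=0.929):
    """Low attribute-to-values similarity => ambiguous name (e.g. 'function 1').
    The 0.929 default is the median similarity reported on the slide."""
    return cosine(attr_vec, mean_vector(value_vecs)) < threshold

# A clear attribute points the same way as its values; an ambiguous one does not.
print(is_ambiguous([0.0, 1.0], [[1.0, 0.1], [1.0, -0.1]]))  # True
```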
  29. 29. 28 Example Outputs

  Context: aeronova bicycle carbon mtb handlebar mountain bikes flat handlebar mtb integrated handlebars with stem bike accessories
    Query attribute: function 1; values: skiing goggles, carbon road bicycle handlebar, cycling glasses, bicycle mask, gas mask, …
    Gold: carbon mtb handlebar
    BERT-QA: bicycle carbon mtb handlebar (wrong); BERT-QA w/ query expansion: carbon mtb handlebar (correct)

  Context: lfp 3.2v 100ah lifepo4 prismatic cell deep cycle diy lithium ion battery 72v 60v 48v 24v 100ah 200ah ev solar storage battery
    Query attribute: nominal capacity; values: 14ah, 40ah, 17.4ah
    Gold: 100ah
    BERT-QA: 3.2v 100ah (wrong); BERT-QA w/ query expansion: 100ah (correct)

  Context: camel outdoor softshell men's hiking jacket wind-proof thermal jacket for camping ski thick warm coats
    Query attribute: suitable; values: men, camping, kids, saltwater/fresh water, women, 4-15y, mtb cycling shoes, …
    Gold: men
    BERT-QA: men (correct); BERT-QA w/ query expansion: camping (wrong)
  30. 30. 29 Conclusions • Knowledge-driven query expansion for QA-based product attribute extraction. • We construct the knowledge from the training data, and use it to induce better query representations. • Two tricks to mimic the imperfection of the knowledge: knowledge dropout and knowledge token mixing. • Our query expansion is effective, especially for rare and ambiguous attributes.
  31. 31. 30 Points Not Covered in the Paper • Gap between the evaluation setup and real-world usage • Evaluation (including prior work): the gold attribute is given • Real-world usage: the gold attribute is unknown • Practicality of the QA-based model • The model must be run multiple times, once per attribute • We need to know in advance which attributes we want to extract values for • On some e-commerce sites, the candidate attributes can be narrowed down by consulting the master data
  32. 32. 31 The Future of Attribute Value Extraction • Attribute value extraction tends to be framed as NER • Problems with NER-based methods • Extracted values need normalization (D&G → Dolce & Gabbana) • When annotating attribute values, it is hard to define the gold standard • Automatically generating training data from existing product data introduces incorrect annotations • Product title: JOURNAL STANDARD jeans, standard fit • Attributes: <brand, JOURNAL STANDARD>, <pants leg width, standard> • For some attributes, the set of values rarely grows (e.g., color, country of origin) • Approaches other than NER • Solve it as classification • Extreme Multi-Label Classification with Label Masking for Product Attribute Value Extraction • Solve it as generation • Work in progress
