PR-298
Sunghoon Joo (주성훈), Samsung SDS
2021. 1. 17.
https://www.flaticon.com/kr/authors/freepik
https://arxiv.org/pdf/2008.09093.pdf
PARADE: Passage Representation Aggregation for Document Reranking
Canjia Li (1,3)*, Andrew Yates (2), Sean MacAvaney (4), Ben He (1,3), Yingfei Sun (1)
(1) University of Chinese Academy of Sciences, Beijing, China
(2) Max Planck Institute for Informatics, Saarbrücken, Germany
(3) Institute of Software, Chinese Academy of Sciences, Beijing, China
(4) IR Lab, Georgetown University, Washington, DC, USA
licanjia17@mails.ucas.ac.cn, ayates@mpi-inf.mpg.de, sean@ir.cs.georgetown.edu, {benhe, yfsun}@ucas.ac.cn
1. Research Background
Information Retrieval (IR)
[Figure: an inverted index (https://giyatto.tistory.com/2) and the retrieval task (https://devopedia.org/information-retrieval), which has two stages: 1) retrieval stage, 2) re-ranking stage]
• Re-ranking reorders the first-stage retrieval results so that more relevant documents appear at the top.
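The two stages can be sketched as follows. This is a toy illustration only: the inverted index is a plain dictionary, the first stage counts matching query terms instead of using BM25, and `toy_score` is a hypothetical stand-in for a neural re-ranker. None of these names come from the slides.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def retrieve(index, query, k):
    """Stage 1: rank candidates by the number of distinct query terms they contain."""
    counts = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            counts[doc_id] += 1
    ranked = sorted(counts, key=lambda d: (-counts[d], d))
    return ranked[:k]

def rerank(candidates, query, docs, score_fn):
    """Stage 2: reorder only the retrieved candidates with a costlier scorer."""
    return sorted(candidates, key=lambda d: -score_fn(query, docs[d]))

def toy_score(query, text):
    """Hypothetical scorer: fraction of document tokens that match the query."""
    q = set(query.lower().split())
    tokens = text.lower().split()
    return sum(t in q for t in tokens) / len(tokens)

docs = {
    "d1": "neural ranking models for document retrieval",
    "d2": "document retrieval",
    "d3": "cooking recipes for pasta",
}
index = build_inverted_index(docs)
candidates = retrieve(index, "document retrieval", k=2)
reranked = rerank(candidates, "document retrieval", docs, toy_score)
```

The expensive scorer only sees the `k` candidates returned by the first stage, which is why a costly model such as BERT is practical in the second stage.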
BERT-based re-ranking
Ying, Chengxuan, and Chen Huo. "An Adaptive Early Stopping Strategy for Query-based Passage Re-ranking." (2020)
• BERT (http://arxiv.org/abs/1810.04805) has been applied successfully to many NLP tasks, which motivated its use for the IR re-ranking task
• Rodrigo Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT
• The high computational cost of Transformer self-attention limits the length of the input text
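[Figure: the document is split into passages P1, P2, …, Pn; each passage is fed to BERT together with the query as "Q [SEP] Pi", giving per-passage relevance scores rel1, rel2, …, reln.]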
How to determine the document relevance score
• Average the relevance scores of the top-3 sentences/passages to obtain the document's relevance score
• Zhuyun Dai and Jamie Callan. 2019. Deeper text understanding for IR with contextual neural language modeling
• Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, and Jimmy Lin. 2019.
• Aggregate the sentence/passage relevance scores
1. Take the order of sentences/passages into account: a non-relevant passage lowers rel(q, D)
• Hui Fang, Tao Tao, and Chengxiang Zhai. 2011. Diagnostic evaluation of information retrieval models. ACM Trans. Inf. Syst., 29(2).
2. Passage-level cumulative gain
• Zhijing Wu, Jiaxin Mao, Yiqun Liu, Jingtao Zhan, Yukun Zheng, Min Zhang, and Shaoping Ma. 2020. Leveraging passage-level cumulative gain for document ranking. In WWW. ACM.
• Longformer (4096 tokens)
• Beltagy et al., 2020. Longformer: The long-document transformer. CoRR, abs/2004.05150.
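The score-level aggregation listed above (averaging the top-3 passage scores) can be sketched in a few lines. The function names and the example scores are illustrative assumptions; taking the maximum passage score is included only for comparison.

```python
def top_k_mean(passage_scores, k=3):
    """Average the k highest passage scores (fewer if the document is short)."""
    if not passage_scores:
        raise ValueError("need at least one passage score")
    top = sorted(passage_scores, reverse=True)[:k]
    return sum(top) / len(top)

def max_passage(passage_scores):
    """Use the single best passage score as the document score."""
    return max(passage_scores)

# Made-up per-passage relevance scores for one document.
scores = [0.2, 0.9, 0.4, 0.6]
doc_score_top3 = top_k_mean(scores, k=3)
doc_score_max = max_passage(scores)
```

Both functions operate on scalar scores, so any information in the passage representations beyond the score is lost before aggregation.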
Objective & Approach : a hybrid retrieval approach
• We propose PARADE, an end-to-end document reranking model.
• PARADE predicts a document’s relevance by learning passage-level relevance
representations that are aggregated in a way that preserves document-level
context.
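[Figure: PARADE overview. Each passage P1, P2, …, Pn of the document is paired with the query as "Q [SEP] Pi" and encoded by BERT; an aggregation step combines the passage outputs into rel(q, D).]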
2. Methods
Representing a Document as Passages
$D = \{P_1, \dots, P_n\}$
$p_i^{cls} = \mathrm{BERT}(q, P_i)$
[Figure: the document is split into passages of 150 words with a stride of 100 words (P1, P2, P3, …), so consecutive passages overlap; each passage is encoded by BERT together with the query, giving $p_1^{cls}, \dots, p_n^{cls}$.]
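The passage splitting above (150-word windows, 100-word stride) can be sketched as a sliding window. This is a minimal sketch: whitespace tokenization and keeping the final shorter passage are assumptions, and the function name is hypothetical.

```python
def split_into_passages(text, window=150, stride=100):
    """Split text into overlapping passages of `window` words, advancing by `stride`."""
    words = text.split()
    if not words:
        return []
    passages = []
    start = 0
    while True:
        passages.append(" ".join(words[start:start + window]))
        if start + window >= len(words):
            break  # this window already reaches the end of the document
        start += stride
    return passages

# A synthetic 400-word document: w0 w1 ... w399.
document = " ".join(f"w{i}" for i in range(400))
passages = split_into_passages(document)
```

With a 150-word window and a 100-word stride, each passage shares its last 50 words with the start of the next one.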
Aggregating Passage Relevance Representations
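[Figure: each passage Pi is encoded by BERT into $p_i^{cls}$; the aggregating layer combines the passage representations into $d^{cls}$, and a single-layer feed-forward network maps $d^{cls}$ to rel(q, D).]
$D = \{P_1, \dots, P_n\}$
$p_i^{cls} = \mathrm{BERT}(q, P_i)$
$D^{cls} = \{p_1^{cls}, \dots, p_n^{cls}\}$
Aggregating layer: $D^{cls} \rightarrow d^{cls} \in \mathbb{R}^d$
Single-layer feed-forward network: $d^{cls} \rightarrow rel(q, D)$
1) PARADE_Max
$d^{cls}[j] = \max(p_1^{cls}[j], \dots, p_n^{cls}[j])$
2) PARADE_Attn
$w_1, \dots, w_n = \mathrm{softmax}(W p_1^{cls}, \dots, W p_n^{cls})$
$d^{cls} = \sum_{i=1}^{n} w_i \, p_i^{cls}$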
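The two aggregators above can be written out in plain Python to make the shapes concrete. This is a sketch under assumptions: the passage vectors are made-up 2-dimensional lists, and `w` is a fixed vector standing in for the learned projection W that maps each passage vector to one scalar.

```python
import math

def aggregate_max(passage_vecs):
    """Max-style pooling: element-wise maximum over the passage vectors."""
    return [max(column) for column in zip(*passage_vecs)]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_attn(passage_vecs, w):
    """Attention-style pooling: softmax-weighted sum of the passage vectors."""
    # One scalar logit per passage: the dot product of w with the passage vector.
    logits = [sum(wj * pj for wj, pj in zip(w, p)) for p in passage_vecs]
    weights = softmax(logits)
    dim = len(passage_vecs[0])
    return [sum(wi * p[j] for wi, p in zip(weights, passage_vecs)) for j in range(dim)]

# Three made-up passage representations with d = 2.
p_cls = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
d_max = aggregate_max(p_cls)
# With w = 0 every logit is equal, so the weights are uniform (a plain average).
d_attn_uniform = aggregate_attn(p_cls, w=[0.0, 0.0])
```

Both functions return a single vector of the same dimension as the inputs, which is what the final feed-forward layer consumes.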
3) PARADE_Transformer
[Figure, adapted and redrawn from http://jalammar.github.io/illustrated-transformer/: the input sequence $emb^{cls}, p_1^{cls}, p_2^{cls}, \dots, p_n^{cls}$ plus positional embeddings passes through MultiHead Attention, Add & LayerNorm, a 2-layer Feed Forward block, and Add & LayerNorm; the resulting $d^{cls}$ is fed to a single-layer feed-forward network that outputs rel(q, D).]
$D = \{P_1, \dots, P_n\}$
$p_i^{cls} = \mathrm{BERT}(q, P_i)$
$D^{cls} = \{p_1^{cls}, \dots, p_n^{cls}\}$
$D^{cls} \rightarrow d^{cls} \in \mathbb{R}^d$
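The blocks named in the Transformer aggregator can be written as the usual encoder-layer equations. This is a sketch, not taken from the slides: the layer index $l$, the number of layers $L$, and reading $d^{cls}$ from the position of $emb^{cls}$ in the last layer are assumptions.

```latex
\begin{aligned}
x^{0} &= \left(emb^{cls},\, p_1^{cls},\, \dots,\, p_n^{cls}\right) + \text{positional embeddings} \\
h^{l} &= \mathrm{LayerNorm}\!\left(x^{l} + \mathrm{MultiHead}(x^{l})\right) \\
x^{l+1} &= \mathrm{LayerNorm}\!\left(h^{l} + \mathrm{FFN}(h^{l})\right) \\
d^{cls} &= x^{L}[0]
\end{aligned}
```

Unlike max or attention pooling, every passage representation can attend to every other one here before the document representation is read out.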
Dataset
• Robust04, GOV2
• Robust04 is a newswire collection used by the TREC 2004 Robust track.
• GOV2 is a Web collection crawled from government Websites.
301 0 FBIS3-10082 1
301 0 FBIS3-10169 0
301 0 FBIS3-10243 1
301 0 FBIS3-10319 0
301 0 FBIS3-10397 1
301 0 FBIS3-10491 1
301 0 FBIS3-10555 0
301 0 FBIS3-10622 1
301 0 FBIS3-10634 0
301 0 FBIS3-10635 0
301 0 FBIS3-10721 1
301 0 FBIS3-10805 1
…
qrels.robust2004.txt
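The lines above follow the standard TREC qrels layout: topic id, iteration, document id, relevance label. A minimal parser might look like this; the function name and the in-memory sample are illustrative.

```python
from collections import defaultdict

def parse_qrels(lines):
    """Return {topic_id: {doc_id: relevance}} from TREC-style qrels lines."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines such as "…"
        topic, _iteration, doc_id, relevance = parts
        qrels[topic][doc_id] = int(relevance)
    return qrels

sample = [
    "301 0 FBIS3-10082 1",
    "301 0 FBIS3-10169 0",
    "301 0 FBIS3-10243 1",
    "…",
]
qrels = parse_qrels(sample)
# Documents judged relevant for topic 301 in this sample.
relevant = sorted(d for d, r in qrels["301"].items() if r > 0)
```

These labels serve as the ground truth when training and evaluating a re-ranker on Robust04.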
Training
• BERT initialization (MS MARCO passage ranking)
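[Figure: training pipeline. An initial ranking (BM25) over the document set yields 1000 documents (the BM25 result). For each document, the inputs "Q [SEP] P1", "Q [SEP] P2", … are encoded by BERT into $p_1^{cls}, \dots, p_n^{cls}$; the aggregating layer produces $d^{cls}$, and a single-layer feed-forward network outputs rel(q, D). Seen as a whole, the model maps "Q [SEP] D" to rel(q, D), which is compared with the ground truth (GT) through the loss $L_{CE}$.]
• PARADE training (Robust04, GOV2)
3 epochs, 2.5 hours on a single Google TPU v3-8.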
3. Experimental Results
Research Questions
• On the trade-off between speed and accuracy:
RQ1: How can BERT’s efficiency be improved while maintaining its effectiveness?
• On the length of the input document:
RQ2: How does the number of document passages preserved influence effectiveness?
• On whether PARADE improves effectiveness across different initial ranking methods:
RQ3: Is it beneficial to rerank documents from a more effective initial ranking method? In particular, is reranking BM25+RM3 better than reranking BM25?
* Birch: takes a weighted sum of the top-3 passage scores and the document score from the initial ranking. It uses BERT-Large and takes the initial ranking score into account (https://www.aclweb.org/anthology/D19-3004/)
PARADE's reranking effectiveness
[Figure: document set → initial ranking (BM25) → 100 documents → re-ranking → 100 documents]
* BERT-MaxP: the document score is the maximum score over 4 passages (https://arxiv.org/pdf/1905.09217.pdf)
Experiments on PARADE's computational efficiency
$z_t$, $z_s$: logits from the teacher and student models
$L_{CE}$: cross-entropy loss for the student model
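One common way to combine these quantities in knowledge distillation is a weighted sum of the student's cross-entropy loss and the squared distance between teacher and student logits. The sketch below assumes that form; the weight `alpha`, the two-class logits, and the function names are illustrative assumptions, not values from the slides.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy of the logits against an integer class label."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def distillation_loss(z_t, z_s, label, alpha=0.5):
    """Assumed form: alpha * L_CE(student) + (1 - alpha) * ||z_t - z_s||^2."""
    l_ce = cross_entropy(z_s, label)
    l_mse = sum((t - s) ** 2 for t, s in zip(z_t, z_s))
    return alpha * l_ce + (1 - alpha) * l_mse

# Made-up logits over (non-relevant, relevant); the true label is "relevant" (1).
z_t = [0.0, 2.0]
loss_matched = distillation_loss(z_t, [0.0, 2.0], label=1)
loss_mismatched = distillation_loss(z_t, [2.0, 0.0], label=1)
```

When the student reproduces the teacher's logits the second term vanishes and only the cross-entropy term remains, so the mismatched student is penalized on both terms.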
Effectiveness versus the number of input passages
The longer the input, the better the effectiveness: more of the document's content is preserved.
Attention should be used for the passage representations.
Considering both efficiency and effectiveness, it is advantageous to train with many passages and to use fewer passages at inference time.
• Trade-off efficiency for effectiveness
[Figure annotation: computational cost increases]
Is PARADE effective as a re-ranking method?
[Figure: document collection D → initial ranking (BM25, BM25+RM3) → 1000 documents → re-ranking (PARADE) → 1000 documents]
PARADE moved the documents in D_QE and D_BM25 to higher ranks.
D_QE: relevant documents retrieved by BM25+RM3 but not by BM25
D_BM25: relevant documents retrieved by BM25 but not by BM25+RM3
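The two sets defined above are plain set differences restricted to relevant documents. A minimal sketch with made-up document ids and a hypothetical helper name:

```python
def only_in_first(relevant, run_a, run_b):
    """Relevant documents retrieved by run_a but missed by run_b."""
    return (set(run_a) - set(run_b)) & set(relevant)

# Made-up judgments and retrieval results for a single query.
relevant_docs = {"d1", "d2", "d3", "d4"}
bm25_run = ["d1", "d2", "d9"]
bm25_rm3_run = ["d1", "d3", "d8"]

d_qe = only_in_first(relevant_docs, bm25_rm3_run, bm25_run)    # found only with RM3 expansion
d_bm25 = only_in_first(relevant_docs, bm25_run, bm25_rm3_run)  # found only by plain BM25
```

Tracking where the documents in each set end up after re-ranking shows whether the re-ranker benefits from what each initial ranker contributes.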
4. Conclusion
• We introduced PARADE, a language-model-based end-to-end document reranking model.
State of the art on the TREC Robust04 dataset
First place in the second round of the TREC-COVID challenge
• Knowledge distillation of PARADE made it possible to reduce the number of parameters while improving effectiveness.
• We showed that re-ranking can improve the quality of the results returned by existing initial ranking methods such as BM25 and RM3.
• We showed that aggregating passage representations with a Transformer is a good way to compute a document's relevance score.
Thank you.

complete construction, environmental and economics information of biomass com...
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
welding defects observed during the welding
welding defects observed during the weldingwelding defects observed during the welding
welding defects observed during the welding
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Internet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptxInternet of things -Arshdeep Bahga .pptx
Internet of things -Arshdeep Bahga .pptx
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 

PARADE: Passage Representation Aggregation for Document Reranking

  • 1. PR-298. 주성훈, Samsung SDS, 2021. 1. 17. Paper: PARADE: Passage Representation Aggregation for Document Reranking (https://arxiv.org/pdf/2008.09093.pdf). Authors: Canjia Li (1,3), Andrew Yates (2), Sean MacAvaney (4), Ben He (1,3), Yingfei Sun (1). Affiliations: (1) University of Chinese Academy of Sciences, Beijing, China; (2) Max Planck Institute for Informatics, Saarbrücken, Germany; (3) Institute of Software, Chinese Academy of Sciences, Beijing, China; (4) IR Lab, Georgetown University, Washington, DC, USA. Contact: licanjia17@mails.ucas.ac.cn, ayates@mpi-inf.mpg.de, sean@ir.cs.georgetown.edu, {benhe, yfsun}@ucas.ac.cn. Image source: https://www.flaticon.com/kr/authors/freepik
  • 3. 1. Research Background: Information Retrieval (IR). Figures: Inverted Index, Retrieval task (sources: https://giyatto.tistory.com/2, https://devopedia.org/information-retrieval). The pipeline has two stages: 1) a retrieval stage and 2) a re-ranking stage. • Re-ranking reorders the first-stage retrieval results so that more relevant documents appear nearer the top.
  • 4. 1. Research Background: BERT-based re-ranking. Ying, Chengxuan, and Chen Huo. "An Adaptive Early Stopping Strategy for Query-based Passage Re-ranking." (2020). • BERT (http://arxiv.org/abs/1810.04805) has been applied successfully to many NLP tasks, which motivated its use for IR re-ranking (Nogueira and Kyunghyun Cho. 2019. Passage Re-ranking with BERT). • The high computational cost of Transformer self-attention limits the length of the input text.
  • 5. 1. Research Background: ways to determine the document relevance score. Diagram: each pair Q [SEP] P_i (P_1, P_2, …, P_n of the document) is scored by BERT, giving rel_1, rel_2, …, rel_n, which are combined into rel(q, D). • Average the relevance scores of the top 3 sentences/passages to obtain the document relevance score (Zhuyun Dai and Jamie Callan. 2019. Deeper text understanding for IR with contextual neural language modeling; Zeynep Akkalyoncu Yilmaz, Wei Yang, Haotian Zhang, and Jimmy Lin. 2019.). • Aggregate the sentence/passage relevance scores: 1. Take the order of sentences/passages into account, so that an incoming non-relevant passage lowers rel(q, D) (Hui Fang, Tao Tao, and Chengxiang Zhai. 2011. Diagnostic evaluation of information retrieval models. ACM Trans. Inf. Syst., 29(2).). 2. Passage-level cumulative gain (Zhijing Wu, Jiaxin Mao, Yiqun Liu, Jingtao Zhan, Yukun Zheng, Min Zhang, and Shaoping Ma. 2020. Leveraging passage-level cumulative gain for document ranking. In WWW. ACM.). • Longformer (4096 tokens) (Beltagy et al., 2020. Longformer: The long-document transformer. CoRR, abs/2004.05150.).
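The top-3 averaging rule in the first bullet can be sketched as follows. This is a minimal illustration, not code from the cited papers: the function name `top_k_mean` and the sample scores are hypothetical, and the per-passage scores are assumed to have been produced already by some passage-level ranker.

```python
def top_k_mean(passage_scores, k=3):
    """Document score = mean of the k highest passage relevance scores.

    `passage_scores` is assumed to hold one relevance score per passage;
    how those scores are computed is outside this sketch.
    """
    if not passage_scores:
        raise ValueError("a document needs at least one passage score")
    top = sorted(passage_scores, reverse=True)[:k]
    return sum(top) / len(top)


# Hypothetical scores for a four-passage document.
doc_score = top_k_mean([0.1, 0.9, 0.4, 0.8], k=3)
```

Documents with fewer than `k` passages are averaged over the passages they have, which is one possible convention among several.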
  • 6. 1. Research Background: Objective & Approach: a hybrid retrieval approach. • We propose PARADE, an end-to-end document reranking model. • PARADE predicts a document's relevance by learning passage-level relevance representations that are aggregated in a way that preserves document-level context. Diagram: each pair Q [SEP] P_i (P_1, P_2, …, P_n of the document) is encoded by BERT, and an aggregation step produces rel(q, D).
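  • 8. 2. Methods: Representing a Document as Passages. The document is split into passages, $D = \{P_1, \dots, P_n\}$, and each passage is encoded together with the query: $p_i^{cls} = \mathrm{BERT}(q, P_i)$. Diagram: passages P1, P2, P3, … are 150 words long and start every 100 words (stride of 100 words), so consecutive passages overlap; BERT maps them to $p_1^{cls}, p_2^{cls}, p_3^{cls}, \dots, p_n^{cls}$.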
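The windowing on this slide (150-word passages, 100-word stride) can be sketched as below. This is an illustration under assumptions: whitespace tokenization, the function name `split_into_passages`, and the optional `max_passages` cap are not from the slides.

```python
def split_into_passages(text, passage_len=150, stride=100, max_passages=None):
    """Split a document into overlapping word windows.

    Each passage holds up to `passage_len` words and starts `stride`
    words after the previous one, so neighbours share
    `passage_len - stride` words.
    """
    words = text.split()  # assumption: plain whitespace tokenization
    passages = []
    start = 0
    while start < len(words):
        passages.append(" ".join(words[start:start + passage_len]))
        if start + passage_len >= len(words):
            break  # this window already reached the end of the document
        start += stride
    if max_passages is not None:
        passages = passages[:max_passages]
    return passages


# A synthetic 350-word document yields windows starting at words 0, 100, 200.
doc = " ".join(f"w{i}" for i in range(350))
passages = split_into_passages(doc)
```

Each resulting passage would then be paired with the query as `Q [SEP] P_i` before encoding.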
  • 9. 2. Methods: Aggregating Passage Relevance Representations. Given $D = \{P_1, \dots, P_n\}$ and $p_i^{cls} = \mathrm{BERT}(q, P_i)$, the aggregating layer maps $D^{cls} = \{p_1^{cls}, \dots, p_n^{cls}\}$ to a single vector $d^{cls} \in \mathbb{R}^d$, and a single-layer feed-forward network maps $d^{cls}$ to $rel(q, D)$. 1) PARADE_Max: $d^{cls}[j] = \max(p_1^{cls}[j], \dots, p_n^{cls}[j])$. 2) PARADE_Attn: $w_1, \dots, w_n = \mathrm{softmax}(W p_1^{cls}, \dots, W p_n^{cls})$ and $d^{cls} = \sum_{i=1}^{n} w_i \, p_i^{cls}$.
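The two aggregation rules on this slide can be written out in a few lines. The sketch below uses plain Python lists in place of real BERT outputs; the function names, the toy vectors, and the fixed projection `w` are hypothetical (in the model, $W$ would be learned).

```python
import math


def aggregate_max(passage_vecs):
    """PARADE_Max-style pooling: element-wise max over passage vectors."""
    return [max(column) for column in zip(*passage_vecs)]


def aggregate_attn(passage_vecs, w):
    """PARADE_Attn-style pooling: softmax-weighted sum of passage vectors.

    `w` stands in for the projection W; it maps each passage vector to
    one scalar score, and the scores are normalized with a softmax.
    """
    scores = [sum(wj * pj for wj, pj in zip(w, p)) for p in passage_vecs]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(passage_vecs[0])
    pooled = [
        sum(weights[i] * passage_vecs[i][j] for i in range(len(passage_vecs)))
        for j in range(dim)
    ]
    return pooled, weights


# Toy 2-dimensional "CLS" vectors for three passages.
vecs = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
d_max = aggregate_max(vecs)
d_attn, attn_weights = aggregate_attn(vecs, w=[0.0, 0.0])
```

With a zero projection the attention weights are uniform, so the attention pooling reduces to a plain average of the passage vectors.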
  • 10. 2. Methods: Aggregating Passage Relevance Representations. 3) PARADE_Transformer. As before, $D = \{P_1, \dots, P_n\}$, $p_i^{cls} = \mathrm{BERT}(q, P_i)$, $D^{cls} = \{p_1^{cls}, \dots, p_n^{cls}\}$, and $D^{cls} \rightarrow d^{cls} \in \mathbb{R}^d$. Diagram (redrawn, adapted from http://jalammar.github.io/illustrated-transformer/): the inputs $emb^{cls}, p_1^{cls}, p_2^{cls}, \dots, p_n^{cls}$ plus positional embeddings pass through MultiHead Attention, Add & LayerNorm, a 2-layer Feed Forward block, and Add & LayerNorm to give $d^{cls}$; a single-layer feed-forward network then outputs $rel(q, D)$.
  • 11. 2. Methods: Dataset: Robust04, GOV2. Robust04 is a newswire collection used by the TREC 2004 Robust track. GOV2 is a Web collection crawled from government websites. Excerpt from qrels.robust2004.txt:
      301 0 FBIS3-10082 1
      301 0 FBIS3-10169 0
      301 0 FBIS3-10243 1
      301 0 FBIS3-10319 0
      301 0 FBIS3-10397 1
      301 0 FBIS3-10491 1
      301 0 FBIS3-10555 0
      301 0 FBIS3-10622 1
      301 0 FBIS3-10634 0
      301 0 FBIS3-10635 0
      301 0 FBIS3-10721 1
      301 0 FBIS3-10805 1
      …
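The qrels lines on this slide follow the usual TREC layout of four whitespace-separated fields: topic id, an iteration field, document id, and relevance label. A minimal parsing sketch follows; the function name `parse_qrels` and the choice to skip malformed lines are assumptions made for illustration.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines into {topic_id: {doc_id: relevance}}."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic_id, _iteration, doc_id, relevance = parts
        qrels[topic_id][doc_id] = int(relevance)
    return qrels


# The first three lines of the excerpt shown on the slide.
sample = [
    "301 0 FBIS3-10082 1",
    "301 0 FBIS3-10169 0",
    "301 0 FBIS3-10243 1",
]
qrels = parse_qrels(sample)
relevant_docs = [d for d, r in qrels["301"].items() if r > 0]
```

To read the full file, the same function can be given an open file handle, since iterating over a file yields its lines.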
  • 12. 2. Methods: Training. • BERT initialization (MS MARCO passage ranking): BERT takes Q [SEP] D and outputs rel(q, D). • PARADE training (Robust04, GOV2): 3 epochs, 2.5 hours on a single Google TPU v3-8. Diagram: the document set is ranked by the initial ranker (BM25) and the top 1000 documents are kept; the BM25 result is illustrated with the same lines as the Robust04 qrels excerpt on slide 11 (301 0 FBIS3-10082 1, 301 0 FBIS3-10169 0, …); each pair Q [SEP] P1, Q [SEP] P2, Q [SEP] P3, …, Q [SEP] Pn is encoded by BERT into $p_1^{cls}, p_2^{cls}, p_3^{cls}, \dots, p_n^{cls}$; the aggregating layer produces $d^{cls}$; a single-layer feed-forward network outputs $rel(q, D)$, which enters the loss $L_{CE}$ together with the ground truth (GT).
  • 14. 3. Experimental Results: Research Questions. • On the trade-off between speed and accuracy, RQ1: How can BERT's efficiency be improved while maintaining its effectiveness? • On the length of the input document, RQ2: How does the number of document passages preserved influence effectiveness? • On whether PARADE improves results across different initial ranking methods, RQ3: Is it beneficial to rerank documents from a more effective initial ranking method? In particular, is reranking BM25+RM3 better than reranking BM25?
  • 15. 3. Experimental Results: PARADE's reranking performance. Diagram labels: Document set, Initial ranking (BM25), Re-ranking, 100, 100. *Birch: takes a weighted sum of the top-3 passage scores and the document score from the initial ranking; it uses BERT-Large and takes the initial ranking score into account (https://www.aclweb.org/anthology/D19-3004/). *BERT-MaxP: the document score is the maximum score over 4 passages (https://arxiv.org/pdf/1905.09217.pdf).
  • 16. 3. Experimental Results: Experiments on PARADE's computational efficiency.
  • 17. 3. Experimental Results: $z^t$, $z^s$: logits from the teacher and student models. $L_{CE}$: cross-entropy loss for the student model.
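The slide names the ingredients of the distillation objective ($z^t$, $z^s$, $L_{CE}$) but the transcript does not preserve the formula itself. The sketch below shows one common way to combine them, $L = \alpha \, L_{CE} + (1 - \alpha) \, \lVert z^t - z^s \rVert^2$. The squared-distance term, the mixing weight `alpha`, and all function names are assumptions for illustration rather than something recoverable from the slide.

```python
import math


def cross_entropy(logits, label):
    """Cross-entropy of a softmax over `logits` against an integer label."""
    top = max(logits)  # log-sum-exp with max subtraction for stability
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[label]


def distillation_loss(student_logits, teacher_logits, label, alpha=0.5):
    """Assumed form: alpha * L_CE(student) + (1 - alpha) * ||z_t - z_s||^2."""
    ce = cross_entropy(student_logits, label)
    sq_dist = sum((t - s) ** 2 for t, s in zip(teacher_logits, student_logits))
    return alpha * ce + (1.0 - alpha) * sq_dist


# Toy binary relevance logits; the student disagrees slightly with the teacher.
loss = distillation_loss([0.2, 1.1], [0.0, 1.5], label=1, alpha=0.5)
```

Setting `alpha=1.0` recovers ordinary supervised training of the student, while smaller values pull the student's logits toward the teacher's.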
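  • 18. 3. Experimental Results: Performance comparison by number of input passages. The longer the input, the better the performance, because more of the document's content is preserved. Attention should be used for the passage representations.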
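  • 19. 3. Experimental Results: Performance comparison by number of input passages. Considering both efficiency and effectiveness, it is advantageous to train with many passages and use fewer passages at inference. • Trade-off efficiency for effectiveness. Figure label: increasing computation.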
  • 20. 3. Experimental Results: Is PARADE effective as a re-ranking method? Diagram labels: D, Initial ranking (BM25, BM25+RM3), 1000, Re-ranking (PARADE), 1000. The documents in $D_{QE}$ and $D_{BM25}$ were moved to higher ranks by PARADE. $D_{QE}$: relevant documents retrieved by BM25+RM3 but not by BM25. $D_{BM25}$: relevant documents retrieved by BM25 but not by BM25+RM3.
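The two document sets defined on this slide are plain set differences restricted to relevant documents. A small sketch with made-up document ids (the function name `split_relevant` and all ids are hypothetical):

```python
def split_relevant(bm25_run, bm25_rm3_run, relevant):
    """Return (D_QE, D_BM25) as defined on the slide.

    D_QE:   relevant documents retrieved by BM25+RM3 but not by BM25.
    D_BM25: relevant documents retrieved by BM25 but not by BM25+RM3.
    """
    bm25, rm3, rel = set(bm25_run), set(bm25_rm3_run), set(relevant)
    d_qe = (rm3 - bm25) & rel
    d_bm25 = (bm25 - rm3) & rel
    return d_qe, d_bm25


# Hypothetical runs for a single query.
d_qe, d_bm25 = split_relevant(
    bm25_run=["d1", "d2", "d3"],
    bm25_rm3_run=["d2", "d3", "d4", "d5"],
    relevant=["d1", "d3", "d4"],
)
```

Tracking where the members of each set land after reranking is one way to check whether the reranker promotes documents that only one initial ranker found.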
  • 22. 4. Conclusions. • Introduced PARADE, a language-model-based end-to-end document reranking model: state of the art on the TREC Robust04 dataset and 1st place in the second round of the TREC-COVID challenge. • Knowledge distillation applied to PARADE made it possible to reduce the number of parameters while improving performance. • Showed that re-ranking can improve the quality of the results returned by existing initial ranking methods such as BM25 and RM3. • Showed that Transformer-based passage representation aggregation is a good way to obtain a document's relevance score. Thank you.