
Adversarial and reinforcement learning-based approaches to information retrieval


Traditionally, machine learning based approaches to information retrieval have taken the form of supervised learning-to-rank models. Recently, other machine learning approaches, such as adversarial learning and reinforcement learning, have started to find interesting applications in retrieval systems. At Bing, we have been exploring some of these methods in the context of web search. In this talk, I will share two of our recent papers in this area that we presented at SIGIR 2018.



  1. 1. ADVERSARIAL AND REINFORCEMENT LEARNING BASED APPROACHES TO INFORMATION RETRIEVAL Bhaskar Mitra Principal Applied Scientist, Microsoft AI & Research Joint work with Daniel Cohen, Katja Hofmann, W. Bruce Croft, Corby Rosset, Damien Jose, Gargi Ghosh, and Saurabh Tiwary SIGIR 2018 | Ann Arbor, Michigan
  2. 2. Today’s topics: two SIGIR 2018 short papers Awarded SIGIR 2018 Best Short Paper https://arxiv.org/abs/1805.03403 https://arxiv.org/abs/1804.04410
  3. 3. Cross Domain Regularization for Neural Ranking Models Using Adversarial Learning Daniel Cohen, Bhaskar Mitra, Katja Hofmann, W. Bruce Croft https://arxiv.org/abs/1805.03403
  4. 4. Clever Hans was a horse claimed to have been capable of performing arithmetic and other intellectual tasks. “If the eighth day of the month comes on a Tuesday, what is the date of the following Friday?” Hans would answer by tapping his hoof. In fact, the horse was purported to have been responding directly to involuntary cues in the body language of the human trainer, who had the faculties to solve each problem. The trainer was entirely unaware that he was providing such cues. (source: Wikipedia)
  5. 5. Duet model for document ranking (2017) Latent representation learning models (e.g., duet and DSSM) “memorize” relationships between terms and entities
  6. 6. Example query: uk prime minister (slide compares results today / on recent data vs. in older (1990s) TREC data)
  7. 7. Cross domain performance is an important requirement in many IR scenarios, e.g., 1. Bing (across markets) 2. Enterprise search (across tenants)
  8. 8. What corpus statistics do they depend on? BM25 depends on the inverse document frequency of terms, whereas Duet depends on embeddings containing noisy co-occurrence information (one standard BM25 formulation is shown below)
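
For reference, a textbook BM25 formulation (not taken from these slides), which makes explicit where the inverse document frequency enters; k_1 and b are the usual free parameters, |d| the document length, avgdl the average document length, N the number of documents in the corpus, and n(t) the number of documents containing term t:

    % Standard BM25 scoring function; IDF(t) is the corpus statistic BM25 depends on.
    \mathrm{BM25}(q, d) = \sum_{t \in q} \mathrm{IDF}(t) \cdot
        \frac{tf(t, d)\,(k_1 + 1)}{tf(t, d) + k_1 \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)},
    \qquad
    \mathrm{IDF}(t) = \log \frac{N - n(t) + 0.5}{n(t) + 0.5}
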
  9. 9. Problem setup: train on multiple domains (domain A, domain B, domain C) and evaluate on a held-out test domain (domain X)
  10. 10. The distributed sub-model of duet projects the query and document to a latent space for matching, with additional fully-connected layers to estimate relevance; the hidden layers may encode domain specific statistics. (Architecture: query and doc each pass through convolution and pooling layers, followed by a hadamard product and dense layers producing the score y.) How do we encourage the model to only learn features that generalize across multiple domains?
  11. 11. We train the model on multiple domains. During training, an adversarial discriminator inspects the hidden states of the model and tries to predict the source corpus of the training sample. (Architecture: the same convolution and pooling layers, hadamard product, and dense layers producing y, with an adversarial discriminator (dense) attached to a hidden state z.) The duet model, in addition to optimizing for the ranking loss, also tries to “fool” the adversarial discriminator, and in the process learns more domain independent representations (a minimal sketch follows below).
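
A minimal PyTorch-style sketch of this setup (illustrative only, not the authors' implementation; the module names RankerBody and DomainDiscriminator, and all layer sizes, are hypothetical): the ranker exposes a hidden state z that a small classifier tries to map back to the source domain.

    # Illustrative sketch, not the paper's code: a ranker exposing a hidden state z
    # that a domain discriminator tries to classify by source corpus.
    import torch.nn as nn

    class RankerBody(nn.Module):
        """Maps (query, document) features to a hidden state z and a relevance score y."""
        def __init__(self, input_dim=256, hidden_dim=128):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
            self.scorer = nn.Linear(hidden_dim, 1)

        def forward(self, qd_features):
            z = self.encoder(qd_features)   # hidden state inspected by the discriminator
            y = self.scorer(z)              # relevance score
            return y, z

    class DomainDiscriminator(nn.Module):
        """Predicts which training domain a sample came from, given the hidden state z."""
        def __init__(self, hidden_dim=128, num_domains=3):
            super().__init__()
            self.classifier = nn.Sequential(
                nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, num_domains))

        def forward(self, z):
            return self.classifier(z)       # logits over source domains
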
  12. 12. Additional regularization for the ranking loss
  13. 13. Additional regularization for the ranking loss: here q denotes the query, d+ a relevant document, d− a non-relevant document, θ_adv the parameters of the adversarial discriminator, and θ_rank the parameters of the ranking model
  14. 14. Additional regularization for the ranking loss (a hedged sketch of the combined objective follows below)
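
The loss equations on slides 12 to 14 were rendered as images and are not recoverable from this transcript; as a hedged sketch only (see https://arxiv.org/abs/1805.03403 for the exact formulation), adversarial regularization of this kind typically combines the pairwise ranking loss with a subtracted discriminator loss on the hidden state z, weighted by a hyperparameter λ:

    % Hedged sketch of the combined objective, not copied from the paper.
    \mathcal{L}(q, d^{+}, d^{-};\; \theta_{rank}, \theta_{adv}) =
        \mathcal{L}_{rank}\big(q, d^{+}, d^{-};\; \theta_{rank}\big)
        \; - \; \lambda \, \mathcal{L}_{adv}\!\left(\mathrm{domain} \mid z(q, d);\; \theta_{adv}\right)
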
  15. 15. Gradient reversal: reverse the gradient from the discriminator when back-propagating through the ranking model (same architecture, with the adversarial discriminator attached to the hidden state z; a minimal implementation sketch follows below)
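
A minimal gradient reversal layer in PyTorch (an illustrative sketch, not the authors' code): the forward pass is the identity, while the backward pass negates and scales the gradient, so updating the ranker increases the discriminator's domain-classification loss.

    # Gradient reversal: identity on the forward pass, negated (scaled) gradient on backward.
    import torch

    class GradientReversal(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            # Flip the sign of the gradient flowing from the discriminator into the ranker.
            return -ctx.lambd * grad_output, None

    # Usage sketch (names from the previous snippet): the discriminator sees z through the
    # reversal layer, so its loss can simply be added to the ranking loss during training.
    # z_reversed = GradientReversal.apply(z, 1.0)
    # domain_logits = discriminator(z_reversed)
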
  16. 16. Results: Yahoo Webscope L4 topics In-domain (large) ≫ Out-of-domain + adversarial ≫ Out-of-domain ≫ In-domain (small)
  17. 17. Results: cross collection Out-of-domain + Adversarial ≫ Out-of-domain
  18. 18. There are other challenges with depending too heavily on co-occurrence patterns
  19. 19. Adversarial regularization may also be useful for mitigating such issues
  20. 20. Optimizing Query Evaluations using Reinforcement Learning for Web Search Corby Rosset, Damien Jose, Gargi Ghosh, Bhaskar Mitra, and Saurabh Tiwary https://arxiv.org/abs/1804.04410
  21. 21. Large scale IR systems trade off search result quality and query response time. In Bing, we have a candidate generation stage followed by multiple rank and prune stages. Typically, we apply machine learning in the re-ranking stages. In this work, we explore reinforcement learning for effective and efficient candidate generation
  22. 22. In Bing, the index is distributed over multiple machines. For candidate generation, on each machine the documents are linearly scanned using a match plan
  23. 23. When a query comes in, it is automatically categorized and a pre-defined match plan is selected. A match plan consists of a sequence of match rules and corresponding stopping criteria. A match rule defines the condition that a document should satisfy to be selected as a candidate. The stopping criterion decides when the index scan using a particular match rule should terminate, and whether the matching process should continue with the next match rule, conclude, or reset to the beginning of the index
  24. 24. Match plans influence the trade-off between effectiveness and efficiency. E.g., long queries with rare intents may require expensive match plans that consider body text and search deeper into the index. In contrast, for popular navigational queries a shallow scan against the URL and title metastreams may be sufficient
  25. 25. E.g., Query: halloween costumes Match rule: mrA → (halloween ∈ A|U|B|T ) ∧ (costumes ∈ A|U|B|T ) Query: facebook login Match rule: mrB → (facebook ∈ U|T ) (a toy evaluation of such rules is sketched below)
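
A toy illustration in Python of evaluating such match rules (the stream labels A, U, B, T are from the slide; the function and the data layout are hypothetical, not Bing's index format):

    # Toy match-rule check: every required term must occur in at least one allowed metastream.
    def matches(doc_streams, required_terms, allowed_streams):
        return all(
            any(term in doc_streams.get(s, "").lower() for s in allowed_streams)
            for term in required_terms
        )

    doc = {"U": "facebook.com/login", "T": "Facebook - log in or sign up"}
    # mrB: (facebook ∈ U|T)
    print(matches(doc, ["facebook"], ["U", "T"]))   # True
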
  26. 26. During execution, two accumulators are tracked: u, the number of blocks accessed from disk, and v, the cumulative number of term matches across all inspected documents. A stopping criterion sets thresholds for each; when either threshold is met, the scan using that particular match rule terminates. Matching may then continue with a new match rule, terminate, or restart from the beginning of the index (see the loop sketched below)
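
A simplified sketch of the scan loop with the two accumulators and their thresholds (illustrative only; the index layout, and counting one term match per selected document, are simplifying assumptions):

    # Scan index blocks with one match rule until either accumulator threshold is met.
    def run_match_rule(index_blocks, rule, u_max, v_max, u=0, v=0):
        candidates = []
        for block in index_blocks:              # each block is a list of documents
            if u >= u_max or v >= v_max:
                break                           # stopping criterion for this match rule
            u += 1                              # one more block accessed from disk
            for doc in block:
                if rule(doc):
                    candidates.append(doc)
                    v += 1                      # simplification: one match counted per document
        # The caller then continues with the next match rule, terminates, or restarts the scan.
        return candidates, u, v
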
  27. 27. Typically, these match plans are hand-crafted and statically assigned to different query categories. In this work, we cast match planning as a reinforcement learning task
  28. 28. Reinforcement learning: an agent interacts with an environment by taking actions, observing states, and receiving rewards
  29. 29. Reinforcement learning (for Bing candidate generation): the environment is the index, the actions are the match rules, the state is given by the accumulators (u, v), and the reward is relevance discounted by the number of index blocks accessed
  30. 30. Reinforcement learning (for Bing candidate generation): learn a policy πθ : S → A which maximizes the cumulative discounted reward R = Σ_t γ^t r_t, where γ is the discount rate (same agent, index, match-rule actions, accumulator state, and relevance-based reward as above)
  31. 31. Reinforcement learning (for Bing candidate generation): we use table based Q-learning, with a discrete state space over <ut, vt> and an action space consisting of the match rules (a tabular Q-learning sketch follows below)
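
A generic tabular Q-learning sketch in Python (illustrative, not the paper's training code): states are discretized (u, v) pairs, actions are match rules, and the update is the standard Q-learning rule with learning rate alpha and discount gamma.

    import random
    from collections import defaultdict

    Q = defaultdict(float)   # Q-table keyed by (state, action); state = discretized (u, v)

    def choose_action(state, actions, epsilon=0.1):
        """Epsilon-greedy selection of the next match rule."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
        """Standard tabular Q-learning update."""
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
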
  32. 32. Reinforcement learning (for Bing candidate generation): the reward function is based on g(di), the relevance of the ith document estimated from the subsequent L1 ranker score, considering only the top n documents
  33. 33. Reinforcement learning (for Bing candidate generation): final reward. If no new documents are selected, we assign a small negative reward (an illustrative sketch of the reward shape follows below)
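
The reward definitions on slides 32 and 33 were images; the following is only an illustrative shape consistent with the surrounding text (relevance g(d_i) of the selected documents, discounted by the blocks accessed u_t, with a small negative reward when nothing new is matched). The constants β and ε are hypothetical; see https://arxiv.org/abs/1804.04410 for the actual definition:

    % Illustrative shape only, not the paper's exact reward.
    r_t =
      \begin{cases}
        \dfrac{\sum_{i=1}^{n} g(d_i)}{1 + \beta\, u_t} & \text{if new documents are selected at step } t \\[6pt]
        -\epsilon & \text{otherwise}
      \end{cases}
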
  34. 34. Results
  35. 35. Conclusions: Traditionally, ML models consume more time and resources to improve the quality of retrieved results. In this work, we argue that ML based approaches can also help improve our response time. Milliseconds saved can translate to material cost savings in the query serving infrastructure, or can be re-purposed by upstream systems to provide a better end-user experience
  36. 36. THANK YOU! Blog post: https://www.microsoft.com/en-us/research/blog/adversarial-and-reinforcement-learning-based-approaches-to-information-retrieval/
