Solr implementation

•Descargar como DOC, PDF•

0 recomendaciones•1,916 vistas

OpenSource Connections

Más contenido relacionado

Más de OpenSource Connections

Encores

OpenSource Connections

Test driven relevancy

OpenSource Connections

Smarter search drives value to your business. Delivering search that matches users to the right content is what you care about. But organizations often get stuck getting there. It turns out that you need quite a number of very different ingredients to deliver tremendous search. It can make your head spin! To help you think through where your team is on its road to smarter search, Pugh introduces the maturity model used by OpenSource Connections and walks you through a very concrete method to inventory needed skills and translate that into search roles for your team. He shows how to measure your capabilities in key areas of search to drive better ROI from search.

How To Structure Your Search Team for Success

OpenSource Connections

The right path to making search relevant - Taxonomy Bootcamp London 2019

OpenSource Connections

Payloads and OCR with Solr

OpenSource Connections

Haystack 2019 Lightning Talk - The Future of Quepid - Charlie Hull

OpenSource Connections

Haystack 2019 Lightning Talk - State of Apache Tika - Tim Allison

OpenSource Connections

Haystack 2019 Lightning Talk - Relevance on 17 million full text documents - ...

OpenSource Connections

Haystack 2019 Lightning Talk - Solr Cloud on Kubernetes - Manoj Bharadwaj

OpenSource Connections

Haystack 2019 Lightning Talk - Quaerite a Search relevance evaluation toolkit...

OpenSource Connections

Over the past year, the POLITICO team has developed a recommendation system for our users, which recommends not only news content to read but also news topics to subscribe to. This talk will discuss our development path, including dead-ends and performance trade-offs. In the end, the team produced a system based on search technology (in our case, Elasticsearch) and refined by machine learning techniques to achieve a balance between personalization and serendipity.

Haystack 2019 - Search-based recommendations at Politico - Ryan Kohl

OpenSource Connections

With the advent of deep learning and algorithms like word2vec and doc2vec, vectors-based representations are increasingly being used in search to represent anything from documents to images and products. However, search engines work with documents made of tokens, and not vectors, and are typically not designed for fast vector matching out of the box. In this talk, I will give an overview of how vectors can be derived from documents to produce a semantic representation of a document that can be used to implement semantic / conceptual search without hurting performance. I will then describe a few different techniques for efficiently searching vector-based representations in an inverted index, including LSH, vector quantization and k-means tree, and compare their performance in terms of speed and relevancy. Finally, I will describe how each technique can be implemented efficiently in a lucene-based search engine such as Solr or Elastic Search.

Haystack 2019 - Search with Vectors - Simon Hughes

OpenSource Connections

To optimally interpret most natural language queries, it is necessary to understand the phrases, entities, commands, and relationships represented or implied within the search. Knowledge graphs serve as useful instantiations of ontologies which can help represent this kind of knowledge within a domain. In this talk, we'll walk through techniques to build knowledge graphs automatically from your own domain-specific content, how you can update and edit the nodes and relationships, and how you can seamlessly integrate them into your search solution for enhanced query interpretation and semantic search. We'll have some fun with some of the more search-centric use cased of knowledge graphs, such as entity extraction, query expansion, disambiguation, and pattern identification within our queries: for example, transforming the query "bbq near haystack" into { filter:["doc_type":"restaurant"], "query": { "boost": { "b": "recip(geodist(38.034780,-78.486790),1,1000,1000)", "query": "bbq OR barbeque OR barbecue" } } } We'll also specifically cover use of the Semantic Knowledge Graph, a particularly interesting knowledge graph implementation available within Apache Solr that can be auto-generated from your own domain-specific content and which provides highly-nuanced, contextual interpretation of all of the terms, phrases and entities within your domain. We'll see a live demo with real world data demonstrating how you can build and apply your own knowledge graphs to power much more relevant query understanding within your search engine.

Haystack 2019 - Natural Language Search with Knowledge Graphs - Trey Grainger

OpenSource Connections

For e-commerce applications, matching users with the items they want is the name of the game. If they can't find what they want then how can they buy anything?! Typically this functionality is provided through search and browse experience. Search allows users to type in text and match against the text of the items in the inventory. Browse allows users to select filters and slice-and-dice the inventory down to the subset they are interested in. But with the shift toward mobile devices, no one wants to type anymore - thus browse is becoming dominant in the e-commerce experience. But there's a problem! What if your inventory is not categorized? Perhaps your inventory is user generated or generated by external providers who don't tag and categorize the inventory. No categories and no tags means no browse experience and missed sales. You could hire an army of taxonomists and curators to tag items - but training and curation will be expensive. You can demand that your providers tag their items and adhere to your taxonomy - but providers will buck this new requirement unless they see obvious and immediate benefit. Worse, providers might use tags to game the system - artificially placing themselves in the wrong category to drive more sales. Worst of all, creating the right taxonomy is hard. You have to structure a taxonomy to realistically represent how your customers think about the inventory. Eventbrite is investigating a tantalizing alternative: using a combination of customer interactions and machine learning to automatically tag and categorize our inventory. As customers interact with our platform - as they search for events and click on and purchase events that interest them - we implicitly gather information about how our users think about our inventory. Search text effectively acts like a tag and a click on an event card is a vote for that clicked event is representative of that tag. We are able to use this stream of information as training data for a machine learning classification model; and as we receive new inventory, we can automatically tag it with the text that customers will likely use when searching for it. This makes it possible to better understand our inventory, our supply and demand, and most importantly this allows us to build the browse experience that customers demand. In this talk I will explain in depth the problem space and Eventbrite's approach in solving the problem. I will describe how we gathered training data from our search and click logs, and how we built and refined the model. I will present the output of the model and discuss both the positive results of our work as well as the work left to be done. Those attending this talk will leave with some new ideas to take back to their own business.

Haystack 2019 - Search Logs + Machine Learning = Auto-Tagging Inventory - Joh...

OpenSource Connections

Recently Elasticsearch has introduced a number of ways to improve search relevance of your documents based on numeric features. In this talk I will present the newly introduced field types of "rank_feature", "rank_features" ,"dense_field", and "sparse_vector" and discuss in what situations and how they can be used to boost scores of your documents. I will also talk about the inner workings of queries based on these fields, and related performance considerations.

Haystack 2019 - Improving Search Relevance with Numeric Features in Elasticse...

OpenSource Connections

With an increasing amount of relevancy factors, relevancy fine-tuning becomes more complex as changing the impact of factors produces increasingly more unintended side effects. In recent years, there has been a lot of discussion about how learning algorithms can replace manual relevancy fine-tuning in order to manage this complexity. However, discussions about the challenge of relevancy should additionally consider architectural aspects. Especially microservice-based architectures provide many ways to encapsulate and to separate complexities of search solutions, which facilitates optimizing the search as well as locating and fixing problems. Generally, relevancy factors can be assigned to three different groups, each handled at a different stage of the search request processing. The first group contains contextual factors that depend on certain characteristics of a query, such as query-related boosts lifting up top-sellers for queries or category-related boosts to distinguish products from their accessories. Such contextual factors can be handled as a step of the preprocessing of queries. The respective boosting information can simply be appended to the query before it is actually sent to the search engine. Ideally, the normalization of the query is done beforehand. The second group contains factors that are considered for all queries in more or less the same way, e. g. a ranking function basing on keyword occurrences, product topicality or sales in total. Factors related to this group can be handled directly by configuring the search engine. The third group contains situational factors. For instance, a certain product might be a good match for a certain query in general, but for situational circumstances it should not appear among the top five products (e. g. because it is out of stock). Such situational factors can be handled by resorting result sets, after they were returned by the search engine. The handling of the different factors within successive stages of search request processing will be discussed from an architectural perspective. Implications for applying learning algorithms and the implementation of a personalized search will be considered.

Haystack 2019 - Architectural considerations on search relevancy in the conte...

OpenSource Connections

Does your search application include a custom query syntax with various search operators such as Booleans, proximity, term or phrase frequency, capitalization, quoted text or as-is operator, and other advanced operators? Although most search applications offer a natural language-oriented search box, some advanced applications may also offer a custom query syntax for advanced users or automated tasks. The Lucene "classic" query operators that are supported by the Solr edismax query parser (Boolean, phrase with slop, wildcard, etc.) cover a good amount of use cases, but they only get you so far. In this talk, we will explore various strategies to support a custom and advanced query syntax in Solr, covering a spectrum of options from leveraging the out-of-the-box Solr query DSL, to a custom Solr query parser, and hybrid solutions in between. We will identify the options' pros and cons, discuss relevancy considerations, and illustrate the options in Java.

Haystack 2019 - Custom Solr Query Parser Design Option, and Pros & Cons - Ber...

OpenSource Connections

For a relevance engineer one of the most difficult tasks in the tuning process is to convince others in the organization that this is a joint effort. Even the brightest search guru doesn't get very far when working in isolation, so establishing cross-collaboration through the organization is essential. But how to get there? On top of that, in a large organization a relevance engineer often works on multiple seemingly unrelated search projects. The challenge is not to get drowned in building custom solutions for each project, but to design generic and re-usable strategies which solve many problems at once. In this session we'll discuss how to build a widely supported basis for search quality improvements in an organization. It is full of practical tips and examples which could help you in establishing a cross-functional culture that is optimal for relevance tuning. It also zooms in on an holistic approach of solving multiple equivalent search issues at once.

Haystack 2019 - Establishing a relevance focused culture in a large organizat...

OpenSource Connections

Haystack 2019 - Solving for Satisfaction: Introduction to Click Models - Eliz...

OpenSource Connections

The New York Times has had search for a long time but 2018 was the year in which the company engaged with relevance in a deep way. The aim of this talk is to share what we've learned as we've increased our search sophistication and some of the challenges we still face. Some of the techniques we've adopted in this past year include offline metrics testing, reflective testing, and user engagement metrics. We now have a process in place to quickly get mappings changes out to production. As a team we now also have a vocabulary for talking about relevance and can use it to discuss trade-offs and goals in conjunction with our metrics. We hope this talk is of use to those who've put off working on search relevance due to fear, uncertainty, or ambivalence. We will talk about how we went from working on everything but search relevance to finally pulling back the curtain on the search system. We hope what we've learned can help others get started.

2019 Haystack - How The New York Times Tackles Relevance - Jeremiah Via

OpenSource Connections

Más de OpenSource Connections (20)