Presented by David Giffin, Software Engineer, Etsy - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
Search at Etsy poses significant challenges. Our marketplace is filled with millions of unique, short-lived items, and people try to find them over 13 million times a day. In this session we'll discuss many of the solutions we've engineered to meet these challenges, including the evolution of indexing at Etsy, how HBase and Hadoop have taken indexing from hours to minutes, how and why we use BitTorrent for Solr replication, how we track search performance, our approach to shaving crucial milliseconds off every search, and an overview of our continuous deployment strategy, web / search config integration, and A/B testing and analytics.
47. HBase + Hadoop Indexing
•Solr Document Converter
•Solr Requires POSIX Disk
•Index Copied Back to HDFS
[Diagram labels: Solr, Output Format, Disk, HDFS]
Thursday, May 10, 12
48. HBase + Hadoop Indexing
•Not Great with Multi-Core Configs
•Added Solr Multi-Core Support
•Solr Config Issues
•Added ENV support for Configs
•Uses “new” style Hadoop API
•Added Support for both Old and New
49. HBase + Hadoop Indexing
SolrInputDocumentWritable
public class SolrInputDocumentWritable extends SolrInputDocument
implements org.apache.hadoop.io.Writable {
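The slide shows only the class declaration. As a rough sketch of what implementing Writable involves, the example below serializes a document's fields to a DataOutput and reads them back. It is not the code from the talk: the `Writable` interface here is a local stand-in that mirrors the two methods of org.apache.hadoop.io.Writable so the sketch compiles without Hadoop, and `SimpleDocumentWritable` and `roundTrip` are hypothetical names that model a document as single-valued string fields rather than extending SolrInputDocument.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
// so this sketch compiles without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical, simplified document: single-valued, non-null string fields only.
class SimpleDocumentWritable implements Writable {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void setField(String name, String value) { fields.put(name, value); }
    String getField(String name) { return fields.get(name); }
    int size() { return fields.size(); }

    @Override
    public void write(DataOutput out) throws IOException {
        // Field count first, then name/value pairs.
        // writeUTF limits each string to 65535 encoded bytes.
        out.writeInt(fields.size());
        for (Map.Entry<String, String> e : fields.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Clear first: a Writable instance may be reused for many records.
        fields.clear();
        int n = in.readInt();
        for (int i = 0; i < n; i++) {
            String name = in.readUTF();
            fields.put(name, in.readUTF());
        }
    }

    // Serializes a document to bytes and reads it back into a new instance.
    static SimpleDocumentWritable roundTrip(SimpleDocumentWritable doc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            doc.write(new DataOutputStream(bytes));
            SimpleDocumentWritable copy = new SimpleDocumentWritable();
            copy.readFields(new DataInputStream(
                    new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A wrapper of this kind lets documents travel between map and reduce tasks as ordinary Hadoop values; a real version would also need to handle multi-valued fields, non-string types, and boosts.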
54. HBase + Hadoop Indexing
IndexerActionMain
55. HBase + Hadoop Indexing
Deployinator
56. HBase + Hadoop Indexing
IndexCompare
57. HBase + Hadoop Indexing
$ ./compare
ERROR: please provide two index directories
example: ./compare -p 0.1 -i user_id ./index ./index-1332867952588
options:
-p --percent= percent of the index to check
-i --id= primary key id field in the index
-h --hash= comparison or hash field in the index
<index> <index>
58. HBase + Hadoop Indexing
$ ./compare /search/data/person/index-1332867952588/ /search/data/person/index-1335378487672
id field: user_id
hash field: hash
percentage: 0.0010
files: /search/data/person/index-1332867952588/ /search/data/person/index-1335378487672
/search/data/person/index-1332867952588 contains 1515512 docs
/search/data/person/index-1335378487672 contains 14837972 docs
1516 of 1516 documents are the same
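The output above suggests that the tool samples a percentage of one index and checks that each sampled document's hash field matches the document with the same id in the other index. A minimal sketch of that idea follows. It is an assumption about the approach, not the tool's code: each index is modeled as an in-memory map from id to hash instead of a Lucene index, and the class name `IndexCompareSketch`, the `seed` parameter, and the round-up rule for the sample size are all hypothetical. (0.1 percent of 1515512 is about 1515.5, which is consistent with the 1516 documents checked above.)

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;

class IndexCompareSketch {
    // Samples `percent` percent of the ids in `left` and counts how many have
    // an identical hash in `right`. Returns {matching, sampled}.
    static int[] compare(Map<String, String> left, Map<String, String> right,
                         double percent, long seed) {
        List<String> ids = new ArrayList<>(left.keySet());
        Collections.sort(ids);                       // stable starting order
        Collections.shuffle(ids, new Random(seed));  // reproducible random sample
        // Assumed rounding rule: round the sample size up.
        int sampleSize = (int) Math.ceil(ids.size() * percent / 100.0);
        sampleSize = Math.min(sampleSize, ids.size());
        int same = 0;
        for (String id : ids.subList(0, sampleSize)) {
            String hash = left.get(id);
            // A document missing from `right` counts as a mismatch.
            if (hash != null && hash.equals(right.get(id))) {
                same++;
            }
        }
        return new int[] {same, sampleSize};
    }
}
```

Comparing a stored hash per document instead of every field keeps the check cheap enough to run against each freshly built index before it is put into service.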
59. HBase + Hadoop Indexing
Copy and Merge
60. HBase + Hadoop Indexing
Open Source
71. Replication
Fork of TTorrent: https://github.com/etsy/ttorrent
Multi-File Support
Large File Support
Fork BitTorrent: Coming Soon