SlideShare una empresa de Scribd logo
1 de 18
Descargar para leer sin conexión
Search @ Flipkart
Umesh Prasad
Thejus VM
Empowering Consumers discover and find products
Solr/ Lucene Meetup 2 @ Bangalore
Date : July 27, 2013
Outline
● Search Architecture @ Flipkart
● Challenges for E-commerce
○ Diverse Catalogue
○ Availability, Uptime and performance
○ High frequency updates
● Solutions
○ Caching and warm up
○ External Source Fields (Sort, Facet, Filter)
○ Relevance optimizations
Flipkart Search Architecture
Technologies Used
The E-commerce Search Challenge
● Diverse catalogue
○ ~13 million products, ~900 categories
○ What fields to Search
○ How to rank (within category/across categories). Ranking Facets ?
○ tf-idf and vector space model doesn't help
● Performance
○ 99.99 % availability
○ ~1000 qps
○ ~75 ms for Search, ~5 ms for Autosuggest
○ Prefetching data (Conflicts with liveliness)
● High rate of updates
○ Multiple data sources (aggregate, index, commit, replicate)
○ Temporal fields (Price/Availability/SLAs/Offers)
○ Lucene doesn't support partial updates
Addressing - Performance / Latency
● Make Search Faster
○ Use Filters, score only if needed, lazy field loads,
smaller indexes aka sharding
● Caching
○ Solr caches (Type/Sizing/Tuning/Warming)
○ Custom caches
○ Cache warmup on replication and startup
Solr Search Flow
And High Latency Cache
Cache hit is 10X -
50X faster.
Solr Caches
● QueryCache
○ Key = <Lucene Query, Filters, SortFields>
○ Value = Docset(Bitset) / DocList (bitset with score)
○ Caching only a results Window
○ Use : Pagination/repeat queries
● FilterCache
○ Key = Query
○ Value = Docset (maxDoc)
○ Matching / Faceting
● FieldValueCache
○ Key = FieldName
○ Value = <Term,DocSet>
○ Faceting
● DocumentCache
○ Key = docId
○ Value = Fields
Expensive Features
● Facet on Queries
○ Facet.queries
● Grouping
○ ngroups (counting number of groups )
○ facet counting of groups (makes 2nd query)
○ No Cache for Group
● Solution : High Latency Cache
○ Key = All Request Params
○ Value = Full response object
○ Re-generate
How replication Impacts Caching ?
Challenge 3 : High Rate of Updates
● Two Solutions
○ Near real time Indexing / Searching
○ External Fields
● NRT Indexing and searching
○ Softcommits => solr caches invalidated
○ Lot of churn : Document deleted and re-added.
○ No autowarm for document cache
● External Fields
○ Resonates with Horizontal partition (Document level
partitioning)
○ Great for Ephemeral fields (Price/availability/slas)
○ Supports faceting / filter / sorting
External Fields and Relevance Tuning
Sorting on 500 plus Dynamic Fields
● 10 million products * 4 bytes = 38.1 MB
● 38.1 MB * 500 fields = 17.0 GB of Heap Memory
● On replication : 17 * 2 = 34 GB Heap for just FieldCache
BOOM
External Fields
Relevance and Scoring
● Search Page(Query based scoring)
○ Handcrafted boosts to capture retail specific signals
○ User feedback based ranking
○ Turn off - query norm, tf, idf on specific fields
● Browse Page(Non Query based Scoring)
○ Challenge - How do we rank in order to maximize
diversity and still show relevant products
Query Classification
● Rank category for a given query
● Signals
○ Text Scoring
○ Retail signals
○ Click stream data
● Rules Specified over classifications for better
customer experience
Q & A

Más contenido relacionado

La actualidad más candente

Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 
Rapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixRapid Data Analytics @ Netflix
Rapid Data Analytics @ Netflix
Data Con LA
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Sease
 

La actualidad más candente (20)

Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Precision and Recall
Precision and RecallPrecision and Recall
Precision and Recall
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
 
Elasticsearch V/s Relational Database
Elasticsearch V/s Relational DatabaseElasticsearch V/s Relational Database
Elasticsearch V/s Relational Database
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsRedis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
 
The Case for Graphs in Supply Chains
The Case for Graphs in Supply ChainsThe Case for Graphs in Supply Chains
The Case for Graphs in Supply Chains
 
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
 
Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020Data platform architecture principles - ieee infrastructure 2020
Data platform architecture principles - ieee infrastructure 2020
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Learn to Rank search results
Learn to Rank search resultsLearn to Rank search results
Learn to Rank search results
 
Rapid Data Analytics @ Netflix
Rapid Data Analytics @ NetflixRapid Data Analytics @ Netflix
Rapid Data Analytics @ Netflix
 
Accelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflowAccelerate Your ML Pipeline with AutoML and MLflow
Accelerate Your ML Pipeline with AutoML and MLflow
 
How to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank ProjectHow to Build your Training Set for a Learning To Rank Project
How to Build your Training Set for a Learning To Rank Project
 
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
Evaluating Your Learning to Rank Model: Dos and Don’ts in Offline/Online Eval...
 
아이템 추천의 다양성을 높이기 위한 후처리 방법(논문 리뷰)
아이템 추천의 다양성을 높이기 위한 후처리 방법(논문 리뷰)아이템 추천의 다양성을 높이기 위한 후처리 방법(논문 리뷰)
아이템 추천의 다양성을 높이기 위한 후처리 방법(논문 리뷰)
 
4.4 text mining
4.4 text mining4.4 text mining
4.4 text mining
 
A Practical Enterprise Feature Store on Delta Lake
A Practical Enterprise Feature Store on Delta LakeA Practical Enterprise Feature Store on Delta Lake
A Practical Enterprise Feature Store on Delta Lake
 
Inverted index
Inverted indexInverted index
Inverted index
 

Destacado

The parsers & test upload
The parsers & test uploadThe parsers & test upload
The parsers & test upload
Anupam Jain
 
Recommendations play @flipkart
Recommendations play @flipkartRecommendations play @flipkart
Recommendations play @flipkart
hava101
 
Nice Docs Finish First - Designing Search Ranking for Fairness at Etsy: Prese...
Nice Docs Finish First - Designing Search Ranking for Fairness at Etsy: Prese...Nice Docs Finish First - Designing Search Ranking for Fairness at Etsy: Prese...
Nice Docs Finish First - Designing Search Ranking for Fairness at Etsy: Prese...
Lucidworks
 
Events, Signals, and Recommendations
Events, Signals, and RecommendationsEvents, Signals, and Recommendations
Events, Signals, and Recommendations
Lucidworks
 

Destacado (20)

Building tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systemsBuilding tiered data stores using aesop to bridge sql and no sql systems
Building tiered data stores using aesop to bridge sql and no sql systems
 
E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres E commerce data migration in moving systems across data centres
E commerce data migration in moving systems across data centres
 
The parsers & test upload
The parsers & test uploadThe parsers & test upload
The parsers & test upload
 
Recommendations play @flipkart
Recommendations play @flipkartRecommendations play @flipkart
Recommendations play @flipkart
 
Strategic recommendations for flipkart
Strategic recommendations for flipkartStrategic recommendations for flipkart
Strategic recommendations for flipkart
 
Nice Docs Finish First - Designing Search Ranking for Fairness at Etsy: Prese...
Nice Docs Finish First - Designing Search Ranking for Fairness at Etsy: Prese...Nice Docs Finish First - Designing Search Ranking for Fairness at Etsy: Prese...
Nice Docs Finish First - Designing Search Ranking for Fairness at Etsy: Prese...
 
Aesop change data propagation
Aesop change data propagationAesop change data propagation
Aesop change data propagation
 
Events, Signals, and Recommendations
Events, Signals, and RecommendationsEvents, Signals, and Recommendations
Events, Signals, and Recommendations
 
Etsy Search: How We Index and Query 26 Million One-of-a-kind Items
Etsy Search: How We Index and Query 26 Million One-of-a-kind ItemsEtsy Search: How We Index and Query 26 Million One-of-a-kind Items
Etsy Search: How We Index and Query 26 Million One-of-a-kind Items
 
Evolving Search Relevancy: Presented by James Strassburg, Direct Supply
Evolving Search Relevancy: Presented by James Strassburg, Direct SupplyEvolving Search Relevancy: Presented by James Strassburg, Direct Supply
Evolving Search Relevancy: Presented by James Strassburg, Direct Supply
 
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, LucidworksYour Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
Your Big Data Stack is Too Big!: Presented by Timothy Potter, Lucidworks
 
Netflix Global Search - Lucene Revolution
Netflix Global Search - Lucene RevolutionNetflix Global Search - Lucene Revolution
Netflix Global Search - Lucene Revolution
 
It's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, LucidworksIt's Just Search: Presented by Erik Hatcher, Lucidworks
It's Just Search: Presented by Erik Hatcher, Lucidworks
 
Search At AstraZeneca. An Agile AppStore (search-based apps) Created On A Ric...
Search At AstraZeneca. An Agile AppStore (search-based apps) Created On A Ric...Search At AstraZeneca. An Agile AppStore (search-based apps) Created On A Ric...
Search At AstraZeneca. An Agile AppStore (search-based apps) Created On A Ric...
 
Fusion 3 Overview Webinar
Fusion 3 Overview Webinar Fusion 3 Overview Webinar
Fusion 3 Overview Webinar
 
Solr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg DonovanSolr & Lucene @ Etsy by Gregg Donovan
Solr & Lucene @ Etsy by Gregg Donovan
 
Coffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, Flax
Coffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, FlaxCoffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, Flax
Coffee, Danish & Search: Presented by Alan Woodward & Charlie Hull, Flax
 
Webinar: Ecommerce, Rules, and Relevance
Webinar: Ecommerce, Rules, and RelevanceWebinar: Ecommerce, Rules, and Relevance
Webinar: Ecommerce, Rules, and Relevance
 
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
Autocomplete Multi-Language Search Using Ngram and EDismax Phrase Queries: Pr...
 
Webinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks FusionWebinar: Replace Google Search Appliance with Lucidworks Fusion
Webinar: Replace Google Search Appliance with Lucidworks Fusion
 

Similar a Search@flipkart

Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
DataWorks Summit
 

Similar a Search@flipkart (20)

Query optimization in Apache Tajo
Query optimization in Apache TajoQuery optimization in Apache Tajo
Query optimization in Apache Tajo
 
Procella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at YoutubeProcella: A fast versatile SQL query engine powering data at Youtube
Procella: A fast versatile SQL query engine powering data at Youtube
 
Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"Approximate "Now" is Better Than Accurate "Later"
Approximate "Now" is Better Than Accurate "Later"
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking system
 
Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...Lessons learned from designing a QA Automation for analytics databases (big d...
Lessons learned from designing a QA Automation for analytics databases (big d...
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Centernet
CenternetCenternet
Centernet
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Data Enginering from Google Data Warehouse
Data Enginering from Google Data WarehouseData Enginering from Google Data Warehouse
Data Enginering from Google Data Warehouse
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
 
Presto Bangalore Meetup1 Repertoire@Myntra
Presto Bangalore Meetup1 Repertoire@MyntraPresto Bangalore Meetup1 Repertoire@Myntra
Presto Bangalore Meetup1 Repertoire@Myntra
 
Druid
DruidDruid
Druid
 
Improve Presto Architectural Decisions with Shadow Cache
 Improve Presto Architectural Decisions with Shadow Cache Improve Presto Architectural Decisions with Shadow Cache
Improve Presto Architectural Decisions with Shadow Cache
 
Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...
Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...
Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...
 
Enabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with AlluxioEnabling Presto Caching at Uber with Alluxio
Enabling Presto Caching at Uber with Alluxio
 
Journey through high performance django application
Journey through high performance django applicationJourney through high performance django application
Journey through high performance django application
 
Efficient Query Processing Infrastructures
Efficient Query Processing InfrastructuresEfficient Query Processing Infrastructures
Efficient Query Processing Infrastructures
 
Data Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet ReaderData Infra Meetup | ByteDance's Native Parquet Reader
Data Infra Meetup | ByteDance's Native Parquet Reader
 
Faceted Search And Result Reordering
Faceted Search And Result ReorderingFaceted Search And Result Reordering
Faceted Search And Result Reordering
 

Último

Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
AldoGarca30
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
Kamal Acharya
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ssuser89054b
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 

Último (20)

Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
1_Introduction + EAM Vocabulary + how to navigate in EAM.pdf
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
Navigating Complexity: The Role of Trusted Partners and VIAS3D in Dassault Sy...
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Hospital management system project report.pdf
Hospital management system project report.pdfHospital management system project report.pdf
Hospital management system project report.pdf
 
Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptxOrlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
Orlando’s Arnold Palmer Hospital Layout Strategy-1.pptx
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 
School management system project Report.pdf
School management system project Report.pdfSchool management system project Report.pdf
School management system project Report.pdf
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 

Search@flipkart

  • 1. Search @ Flipkart Umesh Prasad Thejus VM Empowering Consumers discover and find products Solr/ Lucene Meetup 2 @ Bangalore Date : July 27, 2013
  • 2. Outline ● Search Architecture @ Flipkart ● Challenges for E-commerce ○ Diverse Catalogue ○ Availability, Uptime and performance ○ High frequency updates ● Solutions ○ Caching and warm up ○ External Source Fields (Sort, Facet, Filter) ○ Relevance optimizations
  • 3.
  • 6. The E-commerce Search Challenge ● Diverse catalogue ○ ~13 million products, ~900 categories ○ What fields to Search ○ How to rank (within category/across categories). Ranking Facets ? ○ tf-idf and vector space model doesn't help ● Performance ○ 99.99 % availability ○ ~1000 qps ○ ~75 ms for Search, ~5 ms for Autosuggest ○ Prefetching data (Conflicts with liveliness) ● High rate of updates ○ Multiple data sources (aggregate, index, commit, replicate) ○ Temporal fields (Price/Availability/SLAs/Offers) ○ Lucene doesn't support partial updates
  • 7. Addressing - Performance / Latency ● Make Search Faster ○ Use Filters, score only if needed, lazy field loads, smaller indexes aka sharding ● Caching ○ Solr caches (Type/Sizing/Tuning/Warming) ○ Custom caches ○ Cache warmup on replication and startup
  • 8. Solr Search Flow And High Latency Cache Cache hit is 10X - 50X faster.
  • 9. Solr Caches ● QueryCache ○ Key = <Lucene Query, Filters, SortFields> ○ Value = Docset(Bitset) / DocList (bitset with score) ○ Caching only a results Window ○ Use : Pagination/repeat queries ● FilterCache ○ Key = Query ○ Value = Docset (maxDoc) ○ Matching / Faceting ● FieldValueCache ○ Key = FieldName ○ Value = <Term,DocSet> ○ Faceting ● DocumentCache ○ Key = docId ○ Value = Fields
  • 10. Expensive Features ● Facet on Queries ○ Facet.queries ● Grouping ○ ngroups (counting number of groups ) ○ facet counting of groups (makes 2nd query) ○ No Cache for Group ● Solution : High Latency Cache ○ Key = All Request Params ○ Value = Full response object ○ Re-generate
  • 12. Challenge 3 : High Rate of Updates ● Two Solutions ○ Near real time Indexing / Searching ○ External Fields ● NRT Indexing and searching ○ Softcommits => solr caches invalidated ○ Lot of churn : Document deleted and re-added. ○ No autowarm for document cache ● External Fields ○ Resonates with Horizontal partition (Document level partitioning) ○ Great for Ephemeral fields (Price/availability/slas) ○ Supports faceting / filter / sorting
  • 13. External Fields and Relevance Tuning
  • 14. Sorting on 500 plus Dynamic Fields ● 10 million products * 4 bytes = 38.1 MB ● 38.1 MB * 500 fields = 17.0 GB of Heap Memory ● On replication : 17 * 2 = 34 GB Heap for just FieldCache BOOM
  • 16. Relevance and Scoring ● Search Page(Query based scoring) ○ Handcrafted boosts to capture retail specific signals ○ User feedback based ranking ○ Turn off - query norm, tf, idf on specific fields ● Browse Page(Non Query based Scoring) ○ Challenge - How do we rank in order to maximize diversity and still show relevant products
  • 17. Query Classification ● Rank category for a given query ● Signals ○ Text Scoring ○ Retail signals ○ Click stream data ● Rules Specified over classifications for better customer experience
  • 18. Q & A