OCTOBER 11-14, 2016 • BOSTON, MA
Near Real time Indexing
Building Real Time Search Index For E-Commerce
Umesh Prasad
Tech Lead @ Flipkart
Thejus V M
Data Architect @ Flipkart
Agenda
• Search @ Flipkart
• Need for Real Time Search
• SolrCloud Solution
• Our approach
• Q & A
Traffic @ Flipkart
• Peak Traffic
– ~ 800K active users
– ~ 160K requests per second
• Search Traffic
– ~ 40K searches per second (Service)
– ~ 10K searches per second (Solr)
• Latency
– Median : 11 ms
– 99th percentile : 1.1 second
Search @ Flipkart
• Catalogue
– ~ 50 main categories
– ~ 5000 sub-categories
– ~ 231 million documents
– ~ 90 million SKUs
– ~ 160 million listings
• E-commerce Marketplace
– ~ 100K Sellers
– Local Sellers
– Regional Availability
– Logistics Constraints
E-commerce Search
• Heavy usage of drill down filters
• Heavy usage of faceting
• Only top results matter
• Results grouped/collapsed by products
• Serviceability and delivery experience MATTERS
Agenda
• Search @ Flipkart
• Need for Real Time Search
• SolrCloud Solution
• Our approach
• Q & A
Sorry, Stock Over !!?
Damn !! Is Offer Over ??
What !! All Steal Deals Gone ??
Product / Listing : Important Attributes
[Diagram: a Product (aka SKU) and its Listings, with attributes supplied by separate services: Seller Rating Service, Catalogue Service, Promise Service, Availability Service, Offer Service, Pricing Service]
Summary : Lucene Document
• Product/SKU (Parent Document)
– Listing (Child Document)
• Query : Mostly SKU Attributes (Free Text)
• Filters : SKU + Listing Attributes (Drill Down)
• Ranking : SKU + Listing Attributes (Explicit/Relevance)
• Index Time Join aka Block Join (Best Performance)
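The parent/child layout above can be illustrated with a small sketch. It relies on one property of Lucene's block join: the documents of a block are indexed contiguously with the parent last, so the parent of a listing is the first parent doc id at or after it. The field names, sample data, and helper functions (`index_blocks`, `parent_of`) are hypothetical and only model the idea; this is not Lucene code.

```python
from bisect import bisect_left

def index_blocks(products):
    """Lay out each block as [listing, ..., product]; the parent comes last."""
    docs, parent_ids = [], []
    for sku, listings in products:
        for listing in listings:
            docs.append({"type": "listing", **listing})
        docs.append({"type": "product", "sku": sku})
        parent_ids.append(len(docs) - 1)   # ascending list of parent doc ids
    return docs, parent_ids

def parent_of(child_doc_id, parent_ids):
    """The parent of a child is the first parent doc id at or after it."""
    return parent_ids[bisect_left(parent_ids, child_doc_id)]

# Hypothetical sample data: two products with their listings.
docs, parent_ids = index_blocks([
    ("P1", [{"seller": "s1", "price": 100}, {"seller": "s2", "price": 90}]),
    ("P2", [{"seller": "s3", "price": 250}]),
])

# Filter on a listing attribute, then join the matches up to their products.
matching_listings = [i for i, d in enumerate(docs)
                     if d["type"] == "listing" and d["price"] <= 100]
matching_products = sorted({docs[parent_of(i, parent_ids)]["sku"]
                            for i in matching_listings})
```

Because the join is a positional lookup rather than a key lookup, it is cheap at query time; the cost is that the whole block has to be written together.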
Out Of Stock, but Why Show?
Index has Stale Availability Data
234K Products
Challenge 1 : High Update Rates
Signal              updates/sec (normal)   updates/sec (peak)   updates/hr
text / catalogue    ~10                    ~100                 ~100K
pricing             ~100                   ~1K                  ~10 million
availability        ~100                   ~10K                 ~10 million
offer               ~100                   ~10K                 ~10 million
seller rating       ~10                    ~1K                  ~1 million
signal 6            ~10                    ~100                 ~1 million
signal 7            ~100                   ~10K                 ~10 million
signal 8            ~100                   ~10K                 ~10 million
Challenge 2 : Micro Services
[Diagram: Ingestion pipeline. Labels: Catalogue, Pricing, Availability, Offers, ... services; Updates Stream 1, Updates Stream 2, Updates Stream 3; Change Propagation; Document Builder; Documents {L1,L2 … P1}; Solr/Lucene]
● Lucene doesn’t support Partial Updates
● Update = Delete + Add
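A toy model may help show why "Update = Delete + Add" is expensive at these update rates. The class below is a hypothetical, much simplified append-only index, not Lucene's API: every change to one field writes a complete new document version and leaves a deleted one behind.

```python
class AppendOnlyIndex:
    """Toy append-only index: documents are never modified in place."""

    def __init__(self):
        self.docs = []     # every version ever written, in write order
        self.live = []     # live[i] turns False once doc i is deleted
        self.latest = {}   # key -> doc id of the current version

    def add(self, key, fields):
        self.docs.append(dict(fields))
        self.live.append(True)
        self.latest[key] = len(self.docs) - 1

    def update(self, key, changes):
        old_id = self.latest[key]
        merged = {**self.docs[old_id], **changes}  # the full document is rebuilt
        self.live[old_id] = False                  # delete the old version
        self.add(key, merged)                      # add the new version

index = AppendOnlyIndex()
index.add("L1", {"price": 100, "available": True})
for _ in range(3):                                 # three availability flips
    index.update("L1", {"available": False})
```

After three single-field updates the index holds four versions of the listing, three of them deleted. In a real index those deleted documents are only reclaimed when segments are merged.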
Agenda
• Search @ Flipkart
• Need for Real Time Search
• SolrCloud Solution
• Our approach
• Q & A
SolrCloud for NRT
[Diagram: the Ingestion pipeline sends a Batch of documents to the Shard Leader. Labels on the leader: Update Log (For Document Versioning), Forward to Replica, Auto commit, Soft Commit. Six Shard Replicas are shown, each labelled Re-open searcher.]
SolrCloud Evaluation
• Update = Delete + Add
– Block Join Index ⇒ Update Whole Block (Product + Listings)
• Updated Document gets streamed to all replicas in sync
– Reduces indexing throughput
• Soft commit is Not Free
– Soft commit ⇒ In Memory Segment
– Lots of Merges
– Huge document churn / deletes
– All caches still need to be re-generated
– Filter Cache misses especially hurt performance
Agenda
• Search @ Flipkart
• Need for Real Time Search
• SolrCloud Solution
• Our approach
• Q & A
Lucene Index
[Diagram: one Lucene Segment holding three example documents]
Documents:
  ProductA   brand : Apple     availability : T   price : 45000
  ProductB   brand : Samsung   availability : T   price : 23000
  ProductC   brand : Apple     availability : F   price : 5000
Document ID Mappings:
  0 ProductA
  1 ProductB
  2 ProductC
Posting List (Inverted Index), Terms → Sparse Bitsets:
  brand : Apple      → 0 , 2
  brand : Samsung    → 1
  availability : T   → 0 , 1
DocValues (columnar data):
  Price : 45000 23000 5000
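The two structures on this slide can be rebuilt from the three example products in a few lines. This is a sketch of the idea only: plain Python lists stand in for Lucene's compressed postings and DocValues, and the `intersect` helper is a hypothetical name.

```python
from collections import defaultdict

docs = [
    {"id": "ProductA", "brand": "Apple", "availability": "T", "price": 45000},
    {"id": "ProductB", "brand": "Samsung", "availability": "T", "price": 23000},
    {"id": "ProductC", "brand": "Apple", "availability": "F", "price": 5000},
]

postings = defaultdict(list)   # (field, term) -> ascending doc ids
price_column = []              # doc id -> price, a DocValues-style column

for doc_id, doc in enumerate(docs):
    for field in ("brand", "availability"):
        postings[(field, doc[field])].append(doc_id)
    price_column.append(doc["price"])

def intersect(a, b):
    """Intersect two ascending doc id lists with a two-pointer walk."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Matching uses the inverted index; sorting uses the column.
matches = intersect(postings[("brand", "Apple")], postings[("availability", "T")])
ranked = sorted(matches, key=lambda d: price_column[d], reverse=True)
```

Matching walks the posting lists, while sorting reads the price column by doc id. That division is why a fast-changing value such as price or availability is a candidate for a separate forward store.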
A Typical Search Flow
[Diagram: Query → Query Rewrite → Matching, Ranking, Faceting, Stats, Other Components → Results. Data sources: Lucene Segment with Inverted Index (Posting List) and Forward Index (Doc Values), plus the NRT Store.]
Example query : samsung mobiles, Offer : exchange offer, price desc
After rewrite : category : mobiles, brand : samsung, Offer : exchange offer
NRT Forward Index - Considerations
● Lookup efficiency
– 50th percentile : ~10K matches
– 99th percentile : ~1 million matches
● Data on Java heap
– Memory efficiency
NRT Forward Index - Naive Implementation
[Diagram: a Lookup Engine sits between the Lucene Segment and the NRT Forward Index]
Lucene Segment (DocId → ProductId):
  0 ProductB
  1 ProductA
  2 ProductC
  3 ProductD
NRT Forward Index (keyed by ProductId):
  ProductId   Availability   Price
  ProductA    True           100
  ProductB    False          150
  ProductC    False          200
  ProductD    True           250
Lookup : DocId : 3, field: price → ProductId(3) = ProductD → <ProductD,price> → 250
Latency : ~10 secs for ~1 Million lookups
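The naive lookup path on this slide amounts to two hops per matching document, both keyed by a string. A minimal sketch with the slide's values follows; the variable and function names are hypothetical. The slide does not say where the time goes, so attributing the cost to per-document string resolution and hashing is an assumption.

```python
# Lucene segment: doc id -> product id (order taken from the slide).
segment_product_ids = ["ProductB", "ProductA", "ProductC", "ProductD"]

# NRT forward index keyed by the product id string.
nrt_by_product = {
    "ProductA": {"availability": True, "price": 100},
    "ProductB": {"availability": False, "price": 150},
    "ProductC": {"availability": False, "price": 200},
    "ProductD": {"availability": True, "price": 250},
}

def naive_lookup(doc_id, field):
    product_id = segment_product_ids[doc_id]   # hop 1: doc id -> string key
    return nrt_by_product[product_id][field]   # hop 2: hash the key, then the field

price_of_doc_3 = naive_lookup(3, "price")
```

With around one million matches at the 99th percentile, both hops run a million times per query.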
NRT Store - Forward Index Optimized
[Diagram: the Lookup Engine resolves a DocId through a DocId - NrtId mapping into the NRT Forward Index]
Lucene Segment (DocId → ProductId):
  0 ProductB
  1 ProductA
  2 ProductC
  3 ProductD
DocId - NrtId mapping:
  0 → 3
  1 → 0
  2 → 1
  3 → 2
NRT Forward Index (Segment Independent):
  NrtId   ProductId   Price   Availability   Status
  0       ProductA    100     T              01
  1       ProductC    200     F              10
  2       ProductD    250     F              01
  3       ProductB    150     T              00
Lookup : DocId : 3, Field : price → NrtId(3) = 2 → Price(2) = 250
Latency : ~100 ms for ~1 Million lookups
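The optimized layout replaces the string key with two array reads: a per-segment DocId to NrtId mapping and one column per field indexed by NrtId. The sketch below uses the slide's values. Python's `array` module stands in for packed primitive arrays, and the function names are hypothetical.

```python
from array import array

doc_to_nrt = array("i", [3, 0, 1, 2])      # per segment: doc id -> NRT id
price = array("i", [100, 200, 250, 150])   # segment independent, by NRT id
available = [True, False, False, True]     # segment independent, by NRT id

def lookup_price(doc_id):
    """Two array reads, no string keys and no hashing."""
    return price[doc_to_nrt[doc_id]]

def update_price(nrt_id, new_price):
    """In-place write; the Lucene segment is not touched."""
    price[nrt_id] = new_price

prices_by_doc = [lookup_price(d) for d in range(len(doc_to_nrt))]
```

Only the mapping depends on the segment, so it has to be rebuilt when a segment changes, while the columns keep accepting updates between commits.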
NRT Store Filter - PostFilter
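PostFilter(Price:[100 TO 150])
[Diagram: the NRT Filter resolves each candidate DocId through the DocId - NrtId mapping into the NRT Forward Index]
Lucene Segment (DocId → ProductId):
  0 ProductB
  1 ProductA
  2 ProductC
  3 ProductD
DocId - NrtId mapping:
  0 → 3
  1 → 0
  2 → 1
  3 → 2
NRT Forward Index (Segment Independent):
  NrtId   ProductId   Price   Availability   Status
  0       ProductA    100     T              01
  1       ProductC    200     F              10
  2       ProductD    250     F              01
  3       ProductB    150     T              00
NRT Filter : DocId : 3 → NrtId(3) = 2 → Price(2) = 250, outside [100 TO 150] → Don’t Delegate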
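The post-filter step can be sketched as a generator over the documents that already matched the main query. It passes a document on ("delegate") only when its current price in the forward index lies within the range. The data is the slide's; the function name is hypothetical and this is not the Solr PostFilter API.

```python
doc_to_nrt = [3, 0, 1, 2]        # segment doc id -> NRT id
price = [100, 200, 250, 150]     # by NRT id

def post_filter(candidate_doc_ids, low, high):
    """Yield only the candidates whose current price lies in [low, high]."""
    for doc_id in candidate_doc_ids:
        if low <= price[doc_to_nrt[doc_id]] <= high:
            yield doc_id         # delegate to the next stage
        # otherwise the document is dropped: don't delegate

kept = list(post_filter([0, 1, 2, 3], 100, 150))
```

The filter reads the forward index at query time, so it always sees the latest value. It pays one lookup per candidate, which matches the "higher latency" noted for real-time filtering later in the deck.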
NRT Store - Invert index
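[Diagram: NRT Forward Store → NRT Inverter → NRT DocIdSet Cache, next to the Lucene Segment]
Lucene Segment (DocId → ProductId):
  0 ProductB
  1 ProductA
  2 ProductC
  3 ProductD
NRT DocIdSet Cache:
  Availability : T → 0 3
  Offer : O1 → 2 3
Lookup : Offer:O1 → DocIdSet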
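Inverting the forward store means scanning a column once and caching the set of matching documents, so a filter becomes a set lookup instead of a per-document check. The sketch below reuses the availability column and mapping from the forward index slides and stores the result as a bitset over segment doc ids. In that data the available products are ProductA and ProductB, which sit at NRT ids 0 and 3 and at segment doc ids 1 and 0. Whether the deck's cache is keyed by NRT id or by doc id is not stated, so the doc id choice here is an assumption, as are all names.

```python
doc_to_nrt = [3, 0, 1, 2]                  # segment doc id -> NRT id
available = [True, False, False, True]     # by NRT id

def invert(column, doc_to_nrt):
    """Return a bitset (as an int) of segment doc ids whose value is truthy."""
    bits = 0
    for doc_id, nrt_id in enumerate(doc_to_nrt):
        if column[nrt_id]:
            bits |= 1 << doc_id
    return bits

def doc_ids(bits):
    """Expand a bitset into an ascending list of doc ids."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

cache = {"availability:T": invert(available, doc_to_nrt)}

available[1] = True                        # real-time update: ProductC is back in stock
stale = doc_ids(cache["availability:T"])   # the cache has not seen the update yet
cache["availability:T"] = invert(available, doc_to_nrt)
fresh = doc_ids(cache["availability:T"])
```

The gap between `stale` and `fresh` is the trade-off of a cached set: filtering lags the forward index until the next rebuild.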
Solr Integration Points
• ValueSources
• Filtering
– Custom Filter Implementation for cached DocIdSet
– Custom PostFilter
• Query
– Wrapper over Filter
• Custom FacetComponent
Near Real Time Solr Architecture
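[Diagram: the Catalogue, Pricing, Availability, Offers, Seller Quality and Others services feed the Ingestion pipeline. Lucene Updates go to the Solr Master, followed by Commit + Replicate + Reopen on Solr. NRT Updates go through Kafka to the NRT Forward Index and the NRT Inverted store inside Solr, with Redis used for Bootstrap. Matching, Ranking and Faceting in Solr read both Lucene and the NRT stores.]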
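The two update paths in this architecture can be sketched as a small router. Fast-changing signals are written straight into the NRT store, and everything else is queued for the regular Lucene path (commit, replicate, reopen). The deck does not list which fields take which path, so the contents of `NRT_FIELDS`, the event shape, and the in-memory list standing in for a Kafka topic are all assumptions.

```python
NRT_FIELDS = {"price", "availability", "offer"}   # assumption: the fast-changing signals

def route(event, nrt_store, lucene_queue):
    """Apply NRT signals immediately; queue other changes for the Lucene path."""
    if event["field"] in NRT_FIELDS:
        nrt_store.setdefault(event["product_id"], {})[event["field"]] = event["value"]
        return "nrt"
    lucene_queue.append(event)   # visible only after commit + replicate + reopen
    return "lucene"

nrt_store, lucene_queue = {}, []
events = [   # stand-in for messages consumed from a stream
    {"product_id": "ProductC", "field": "availability", "value": True},
    {"product_id": "ProductD", "field": "price", "value": 240},
    {"product_id": "ProductA", "field": "title", "value": "Apple phone"},
]
paths = [route(e, nrt_store, lucene_queue) for e in events]
```

In this sketch the first two events are queryable as soon as `route` returns, while the title change waits for the next Lucene commit.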
Accomplishments
• Real time sorting
• Real time filtering : PostFilter
– Higher latency
• Near real time filtering : cached DocIdSet
– No consistency between lookup and filtering
• Independent of Lucene commits
• Query latency comparable to DocValues
– Consistent 99% performance
Accomplishments @ Flipkart
● Real time consumption for ~150 Signals
● 2X reduction in out-of-stock products shown
● Production instances of ~50K updates/second real time
Thank you & Questions

Near RealTime search @Flipkart

  • 1. OCTOBER 11-14, 2016 • BOSTON, MA
  • 2. Near Real Time Indexing: Building Real Time Search Index For E-Commerce. Umesh Prasad, Tech Lead @ Flipkart; Thejus V M, Data Architect @ Flipkart
  • 3. Agenda • Search @ Flipkart • Need for Real Time Search • SolrCloud Solution • Our approach • Q & A
  • 6. Traffic @ Flipkart • Peak Traffic – ~ 800K active users – ~ 160K requests per second • Search Traffic – ~ 40K searches per second (Service) – ~ 10K searches per second (Solr ) • Latency – Median : 11 ms – 99th percentile : 1.1 second
  • 7. Search @ Flipkart • Catalogue – ~ 50 main categories – ~ 5000 sub-categories – ~ 231 million documents – ~ 90 million SKUs – ~ 160 million listings • E-commerce Marketplace – ~ 100K Sellers – Local Sellers – Regional Availability – Logistics Constraints
  • 8. E-commerce Search • Heavy usage of drill down filters • Heavy usage of faceting • Only top results matter • Results grouped/collapsed by products • Serviceability and delivery experience MATTERS
  • 9. Agenda • Search @ Flipkart • Need for Real Time Search • SolrCloud Solution • Our approach • Q & A
  • 11. Damn !! Is Offer Over ??
  • 12. What !! All Steal Deals Gone ??
  • 13. Product / Listing: Important Attributes. Services: Seller Rating Service, Catalogue Service, Promise Service, Availability Service, Offer Service, Pricing Service. Entities: Product aka SKU, Listings
  • 14. Summary : Lucene Document • Product/SKU (Parent Document) – Listing (Child Document) • Query : Mostly SKU Attributes (Free Text) • Filters : SKU + Listing Attributes (Drill Down) • Ranking : SKU + Listing Attributes (Explicit/Relevance) • Index Time Join aka Block Join (Best Performance)
  • 15. Out Of Stock, but Why Show? Index has Stale Availability Data: 234K Products
  • 16. Challenge 1 : High Update Rates
        Signal              updates/sec (normal)   updates/sec (peak)   updates/hr
        text / catalogue    ~10                    ~100                 ~100K
        pricing             ~100                   ~1K                  ~10 million
        availability        ~100                   ~10K                 ~10 million
        offer               ~100                   ~10K                 ~10 million
        seller rating       ~10                    ~1K                  ~1 million
        signal 6            ~10                    ~100                 ~1 million
        signal 7            ~100                   ~10K                 ~10 million
        signal 8            ~100                   ~10K                 ~10 million
  • 17. Challenge 2 : Micro Services. Ingestion pipeline: Catalogue, Pricing, Availability, Offers ... (Updates Stream 1, Updates Stream 2, Updates Stream 3) → Change Propagation → Document Builder → Documents {L1,L2 … P1} → Solr/Lucene ● Lucene doesn’t support Partial Updates ● Update = Delete + Add
  • 18. Agenda • Search @ Flipkart • Need for Real Time Search • SolrCloud Solution • Our approach • Q & A
  • 20. SolrCloud Evaluation • Update = Delete + Add – Block Join Index ⇒ Update Whole Block (Product + Listings) • Updated Document gets streamed to all replicas in sync – Reduces indexing throughput • Soft commit is Not Free – Soft commit ⇒ In Memory Segment – Lots of Merges – Huge document churn / deletes – All caches still need to be re-generated – Filter Cache miss especially hurts performance
  • 21. Agenda • Search @ Flipkart • Need for Real Time Index • SolrCloud Solution • Our approach • Q & A
  • 22. Lucene Index / Lucene Segment
        Documents: ProductA (brand : Apple, availability : T, price : 45000); ProductB (brand : Samsung, availability : T, price : 23000); ProductC (brand : Apple, availability : F, price : 5000)
        Document ID Mappings: 0 → ProductA, 1 → ProductB, 2 → ProductC
        Posting List (Inverted Index; Terms → Sparse Bitsets): brand : Apple → 0, 2; brand : Samsung → 1; availability : T → 0, 1
        DocValues (columnar data): Price → 45000, 23000, 5000
  • 23. A Typical Search Flow. Stages: Query, Query Rewrite, Matching, Ranking, Faceting, Stats, Results. Data sources: Lucene Segment (Posting List, Doc Values), Inverted Index, Forward Index, NRT Store, Other Components. Example terms: samsung mobiles; Offer : exchange offer; price desc; category : mobiles; brand : samsung; Offer : exchange offer
  • 24. NRT Forward Index - Considerations ● Lookup efficiency – 50th percentile : ~10K matches – 99th percentile : ~1 million matches ● Data on Java heap – Memory efficiency
  • 25. NRT Forward Index - Naive Implementation
        Lucene Segment (DocId → ProductId): 0 → ProductB, 1 → ProductA, 2 → ProductC, 3 → ProductD
        NRT Forward Index (keyed by ProductId):
            ProductId   Availability   Price
            ProductA    True           100
            ProductB    False          150
            ProductC    False          200
            ProductD    True           250
        Lookup Engine: DocId : 3, field : price → ProductId(3) = ProductD → <ProductD, price> → 250
        Latency : ~10 secs for ~1 Million lookups
  • 26. NRT Store - Forward Index Optimized
        Lucene Segment (DocId → ProductId): 0 → ProductB, 1 → ProductA, 2 → ProductC, 3 → ProductD
        DocId - NrtId: 0 → 3, 1 → 0, 2 → 1, 3 → 2
        NRT Forward Index (Segment Independent):
            NrtId   ProductId   Price   Availability   Status
            0       ProductA    100     T              01
            1       ProductC    200     F              10
            2       ProductD    250     F              01
            3       ProductB    150     T              00
        Lookup Engine: DocId : 3, Field : price → NrtId(3) = 2 → Price(2) = 250
        Latency : ~100 ms for ~1 Million lookups
  • 27. NRT Store Filter - PostFilter
        PostFilter(Price:[100 TO 150])
        Lucene Segment (DocId → ProductId): 0 → ProductB, 1 → ProductA, 2 → ProductC, 3 → ProductD
        DocId - NrtId: 0 → 3, 1 → 0, 2 → 1, 3 → 2
        NRT Forward Index (Segment Independent): same NrtId, Price, Availability and Status columns as slide 26
        DocId : 3 → NrtId(3) = 2 → Price(2) = 250 → Don’t Delegate
  • 28. NRT Filter: NRT Store - Invert index
        Components: NRT Forward Store → NRT Inverter → NRT DocIdSet Cache
        Lucene Segment (DocId → ProductId): 0 → ProductB, 1 → ProductA, 2 → ProductC, 3 → ProductD
        NRT DocIdSet Cache: Availability : T → 0, 3; Offer : O1 → 2, 3
        Filter Offer:O1 → DocIdSet
  • 29. Solr Integration Points • ValueSources • Filtering – Custom Filter Implementation for cached DocIdSet – Custom PostFilter • Query – Wrapper over Filter • Custom FacetComponent
  • 30. Near Real Time Solr Architecture. Sources: Catalogue, Pricing, Availability, Offers, Seller Quality, Others. Pipeline: Ingestion pipeline, Kafka, Redis Bootstrap. Update paths: NRT Updates; Lucene Updates (Solr Master: Commit + Replicate + Reopen). Solr: Lucene, NRT Forward Index, NRT Inverted store, Matching, Ranking, Faceting
  • 31. Accomplishments • Real time sorting • Real time filtering : PostFilter – Higher latency • Near real time filtering : cached DocIdSet – No consistency between lookup and filtering • Independent of Lucene commits • Query latency comparable to DocValues – Consistent 99% performance
  • 32. Accomplishments @ Flipkart ● Real time consumption for ~150 Signals ● Out-of-stock products shown in results reduced by 2X ● Production instances of ~50K real-time updates/second
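
The NRT store described in slides 26 to 28 can be illustrated with a small sketch. This is not the Flipkart implementation (which plugs into Solr/Lucene in Java); it is a minimal Python model that assumes the values shown on slide 26. All names here (`NrtForwardIndex`, `upsert`, `build_doc_to_nrt`, `price_post_filter`, `invert_availability`) are hypothetical and chosen only for illustration. The sketch shows the three ideas: a segment-independent forward index stored as dense arrays, a per-segment DocId to NrtId mapping that turns each lookup into two array reads, and filters derived from the store so that an update becomes visible without a Lucene commit.

```python
from array import array


class NrtForwardIndex:
    """Segment-independent store: one dense column per field, indexed by NrtId."""

    def __init__(self):
        self.nrt_id_of = {}        # product id -> NrtId (assigned on first sight)
        self.price = array("l")    # price[nrt_id]
        self.available = []        # available[nrt_id]

    def upsert(self, product_id, price, available):
        """Apply an update in place; no Lucene document is rewritten."""
        nrt_id = self.nrt_id_of.get(product_id)
        if nrt_id is None:
            nrt_id = len(self.price)
            self.nrt_id_of[product_id] = nrt_id
            self.price.append(price)
            self.available.append(available)
        else:
            self.price[nrt_id] = price
            self.available[nrt_id] = available
        return nrt_id


def build_doc_to_nrt(segment_product_ids, store):
    """Built once per segment (re)open: position = DocId, value = NrtId."""
    return array("l", (store.nrt_id_of[p] for p in segment_product_ids))


def price_post_filter(candidate_doc_ids, doc_to_nrt, store, low, high):
    """Post-filter: keep only already-matched docs whose live price is in range."""
    return [d for d in candidate_doc_ids
            if low <= store.price[doc_to_nrt[d]] <= high]


def invert_availability(doc_to_nrt, store):
    """Inverter: derive a cacheable DocId set from the forward store."""
    return [d for d, n in enumerate(doc_to_nrt) if store.available[n]]


# Values from slide 26: NrtIds 0..3 are ProductA, ProductC, ProductD, ProductB.
store = NrtForwardIndex()
store.upsert("ProductA", 100, True)
store.upsert("ProductC", 200, False)
store.upsert("ProductD", 250, False)
store.upsert("ProductB", 150, True)

# Lucene segment order from slide 26: DocIds 0..3.
segment = ["ProductB", "ProductA", "ProductC", "ProductD"]
doc_to_nrt = build_doc_to_nrt(segment, store)

price_of_doc3 = store.price[doc_to_nrt[3]]
in_range = price_post_filter(range(4), doc_to_nrt, store, 100, 150)
available_docs = invert_availability(doc_to_nrt, store)

# A price and availability update for ProductD is visible immediately;
# the segment and the DocId to NrtId mapping are untouched.
store.upsert("ProductD", 120, True)
after_update = price_post_filter(range(4), doc_to_nrt, store, 100, 150)
```

With the slide's values, DocId 3 resolves to NrtId 2 and price 250, so the post-filter for Price:[100 TO 150] skips it until the update lowers its price. A set built by the inverter is a snapshot, which is consistent with the "no consistency between lookup and filtering" caveat on slide 31: a cached set can lag behind the forward store until it is rebuilt.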