Building Search@Airbnb
Mousom Dhar Gupta
Total Guests
20,000,000+
Countries
190
Cities
34,000+
Castles
600+
Listings Worldwide
1,200,000+
Search
Technical Stack
____________________________
Dropwizard as a service framework (incl. Jetty, Jersey, Jackson).
ZooKeeper (via SmartStack) for service discovery.
Lucene for index storage and simple retrieval.
In-house forward index, real-time indexing, ranking, and advanced filtering.
Search Overview
[Diagram: Web App → Search1, Search2, … SearchN (~30 replicas of the same index). Each search node is labeled: 150 Search Threads, Lucene Index, data, JVM.]
[Diagram: search → Lucene (×8) → Combiner → Filtering and Ranking]
Shards
____________________________
Each box has 8 shards of the Lucene index
Latency is 50% lower than with a single-shard index
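The sharding slide does not show how a query is spread across the 8 shards, so the following is only a minimal sketch of the usual scatter-gather pattern: query every shard in parallel, then combine the partial results. `Hit`, `Shard`, and `ShardedSearcher` are hypothetical names for illustration, not types from the talk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical types for illustration; none of these names come from the talk.
record Hit(long listingId, double score) {}

interface Shard {
    // Returns this shard's best hits for the query, at most `limit` of them.
    List<Hit> search(String query, int limit);
}

final class ShardedSearcher {
    private final List<Shard> shards;
    private final ExecutorService pool;

    ShardedSearcher(List<Shard> shards) {
        this.shards = List.copyOf(shards);
        this.pool = Executors.newFixedThreadPool(shards.size());
    }

    // Scatter the query to every shard in parallel, then combine the partial results.
    List<Hit> search(String query, int limit) {
        List<Callable<List<Hit>>> tasks = new ArrayList<>();
        for (Shard shard : shards) {
            tasks.add(() -> shard.search(query, limit));
        }
        List<Hit> combined = new ArrayList<>();
        try {
            for (Future<List<Hit>> future : pool.invokeAll(tasks)) {
                combined.addAll(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard search failed", e.getCause());
        }
        // Combiner step: keep the globally best `limit` hits.
        combined.sort(Comparator.comparingDouble(Hit::score).reversed());
        return List.copyOf(combined.subList(0, Math.min(limit, combined.size())));
    }

    void close() {
        pool.shutdown();
    }
}
```

Each shard holds a fraction of the listings, so a single query's work is divided across threads. That is one plausible reason for the latency reduction the slide reports.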
Challenges
____________________________
Bootstrap (creating the index from scratch)
Ensuring consistency of the index with ground truth data in real time
Indexing
What’s in the Lucene index?
____________________________
Positions of listings indexed using Lucene’s spatial module (RecursivePrefixTreeStrategy)
Categorical and numerical properties like room type and maximum occupancy
Full text (descriptions, reviews, etc.)
~40 fields per listing from a variety of data sources, all updated in real time
Realtime Update
[Diagram: source databases (fraud, …, calendar, master) → SpinalTap → Medusa (backed by DataStore) → Search 1, Search 2, … Search N]
Tails binary update logs from MySQL servers (5.6+).
Converts changes in any of the tables into actionable objects called “Mutations” (inserts, updates, deletes).
Broadcasts them to Medusa using Kafka.
SpinalTap
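The slides do not show what a Mutation looks like, so the following is a minimal sketch of one possible shape. It assumes each row change carries optional before and after images of the row. All type and field names (`Mutation`, `Insert`, `Update`, `Delete`, `MutationFactory`) are hypothetical, and the Kafka publishing step is left out.

```java
import java.util.Map;

// Hypothetical model of a "Mutation"; field and type names are assumptions.
interface Mutation {
    String table();
    long rowId();
}

record Insert(String table, long rowId, Map<String, Object> row) implements Mutation {}

record Update(String table, long rowId,
              Map<String, Object> before, Map<String, Object> after) implements Mutation {}

record Delete(String table, long rowId, Map<String, Object> row) implements Mutation {}

final class MutationFactory {
    // Builds a Mutation from the before/after images of one row change.
    // A null `before` means the row is new; a null `after` means it was removed.
    static Mutation fromRowChange(String table, long rowId,
                                  Map<String, Object> before, Map<String, Object> after) {
        if (before == null && after == null) {
            throw new IllegalArgumentException("row change needs a before or after image");
        }
        if (before == null) {
            return new Insert(table, rowId, after);
        }
        if (after == null) {
            return new Delete(table, rowId, before);
        }
        return new Update(table, rowId, before, after);
    }
}
```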
Source of truth for search index data.
Listens to updates from SpinalTap and builds new IndexData by querying ~15 MySQL tables from three different databases.
Persists everything in a DataStore and broadcasts the latest version to all search nodes.
Uses ZooKeeper for leader election.
Medusa
What’s in the forward index?
____________________________
Holds all the metadata about a listing required by scoring and filtering.
We also have complicated business rules to calculate Price, Availability, InstantBook, etc., which need a ton of metadata.
~50 fields built from multiple data sources and updated in real time.
    public final class ForwardIndexData {
        private final CalendarData calendarData;
        private final PricingData pricingData;
        private final HostInfo hostInfo;
        // ...
    }

    public final class CalendarData {
        private final DateRanges reservationDates;
        private final SeasonalValues startDayOfWeeks;
        // ...
    }

    private final class SeasonalValues<T> {
        private final DateRange startDate;
        private final T value;
        // ...
    }
Forward Index
Availability
____________________________
Depends on the profile of the guest.
The check-in date must be one of the valid start days of the week.
Must satisfy seasonal minimum nights.
There must be enough preparation time for the host.
Busy dates are imported from external calendars to avoid booking conflicts.
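The availability rules above can be sketched as a single predicate. This is a simplified illustration, not the actual rule engine. `ListingCalendar` and `AvailabilityRules` are hypothetical names, the minimum-nights value is a single number rather than a seasonal one, and the guest-profile rule is omitted.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Set;

// Hypothetical, simplified calendar data; the real CalendarData has more fields.
record ListingCalendar(Set<DayOfWeek> validStartDays,
                       int minNights,
                       int preparationDays,
                       Set<LocalDate> busyDates) {}

final class AvailabilityRules {
    // Returns true if the stay [checkIn, checkOut) passes every rule.
    static boolean isAvailable(ListingCalendar cal, LocalDate today,
                               LocalDate checkIn, LocalDate checkOut) {
        long nights = ChronoUnit.DAYS.between(checkIn, checkOut);
        if (nights < 1 || nights < cal.minNights()) {
            return false;
        }
        // Check-in must fall on one of the allowed start days of the week.
        if (!cal.validStartDays().contains(checkIn.getDayOfWeek())) {
            return false;
        }
        // The host needs `preparationDays` days of notice before check-in.
        if (ChronoUnit.DAYS.between(today, checkIn) < cal.preparationDays()) {
            return false;
        }
        // Every night must be free (reservations or dates imported from external calendars).
        for (LocalDate d = checkIn; d.isBefore(checkOut); d = d.plusDays(1)) {
            if (cal.busyDates().contains(d)) {
                return false;
            }
        }
        return true;
    }
}
```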
Pricing
____________________________
Depends on the number of guests and the number of nights.
How near or far away the check-in date is.
How long the trip is, and whether the host has weekly and monthly pricing.
Whether there is a special price override for these nights.
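As a rough illustration of how these pricing inputs might combine, the sketch below sums nightly prices, applies per-night overrides, and then applies a length-of-stay discount. The 7-night and 28-night thresholds, the discount fields, and the names `ListingPricing` and `PriceCalculator` are all assumptions. Guest count and lead-time adjustments are left out.

```java
import java.time.LocalDate;
import java.util.Map;

// Hypothetical, simplified pricing data; discounts are fractions in [0, 1).
record ListingPricing(long nightlyCents,
                      double weeklyDiscount,
                      double monthlyDiscount,
                      Map<LocalDate, Long> overrideCents) {}

final class PriceCalculator {
    // Total price in cents for the stay [checkIn, checkOut).
    static long totalCents(ListingPricing p, LocalDate checkIn, LocalDate checkOut) {
        long total = 0;
        long nights = 0;
        for (LocalDate d = checkIn; d.isBefore(checkOut); d = d.plusDays(1)) {
            // A special price override for a night replaces the base nightly price.
            total += p.overrideCents().getOrDefault(d, p.nightlyCents());
            nights++;
        }
        // Assumed thresholds: 28+ nights is "monthly", 7+ nights is "weekly".
        double discount = nights >= 28 ? p.monthlyDiscount()
                        : nights >= 7 ? p.weeklyDiscount()
                        : 0.0;
        return Math.round(total * (1.0 - discount));
    }
}
```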
Instant Book
____________________________
Depends on the number of guests and the number of nights.
Depends on the guest's profile, such as positive reviews and whether the guest has a profile photo.
Depends on how much preparation time the host has, etc.
Needs to store objects with 50-100 fields as values, keyed by listing ID.
Should avoid the cost of serialization/deserialization on every fetch.
Data must be available in memory for fast lookup, but also persisted on disk.
Highly concurrent: the writer must not block the readers (one writer, but more than 100 reader threads).
Requirements
Why did we need our custom Forward Index?
    // Forward Index
    public interface ForwardIndex<V> {

        Map<Long, V> asMap();

        void put(long id, V value);

        void putAll(Map<Long, V> values);

        void remove(long id);

        void commit();

    }
Forward Index Interface
    // Writer
    forwardIndex.put(listingId, listingData);
    // ...
    // Write to disk and make the changes visible to readers.
    forwardIndex.commit();

    // Reader
    // Fetch forward index data from the in-memory map.
    Map<Long, ListingData> fwdIndex = forwardIndex.asMap();
    ListingData data = fwdIndex.get(listingId);

    // Use it to evaluate business rules.
    checkAvailability(data, searchRequest);
    calculatePrice(data, searchRequest);
[Diagram: non-blocking in-memory HashMap, DiskStore]
    // Forward Index
    public class ForwardIndexStore<V> implements ForwardIndex<V> {
        private final DB<V> diskStore;
        private final Cache<V> cache;  // must be usable as a Map<Long, V> for asMap() below

        // ...

        @Override
        public Map<Long, V> asMap() {
            // Read-only view handed to reader threads.
            return Collections.unmodifiableMap(cache);
        }

        @Override
        public void put(long id, V value) {
            diskStore.put(id, value);
            cache.put(id, value);
        }

        // ...

        @Override
        public void commit() {
            diskStore.commit();
            cache.commit();
        }
    }
Forward Index Implementation
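The slides elide how the cache keeps a single writer from blocking more than 100 reader threads. One common way to get that property is copy-on-write snapshots: the writer stages changes privately, and commit() publishes a new immutable map through a volatile reference. The sketch below shows that technique only. It is not the talk's `Cache` or `DB` implementation, the class name `SnapshotForwardIndex` is hypothetical, and disk persistence is omitted.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Same interface as on the slide, repeated so the sketch is self-contained.
interface ForwardIndex<V> {
    Map<Long, V> asMap();
    void put(long id, V value);
    void putAll(Map<Long, V> values);
    void remove(long id);
    void commit();
}

// Sketch: a single writer stages changes; commit() publishes a new immutable snapshot.
final class SnapshotForwardIndex<V> implements ForwardIndex<V> {
    // Readers only ever read this reference; volatile makes each commit visible to them.
    private volatile Map<Long, V> snapshot = Collections.emptyMap();
    // Writer-only state (assumes exactly one writer thread).
    private final Map<Long, V> pendingPuts = new HashMap<>();
    private final Set<Long> pendingRemoves = new HashSet<>();

    @Override
    public Map<Long, V> asMap() {
        return snapshot; // no locking on the read path
    }

    @Override
    public void put(long id, V value) {
        pendingRemoves.remove(id);
        pendingPuts.put(id, value);
    }

    @Override
    public void putAll(Map<Long, V> values) {
        values.forEach((id, value) -> put(id, value));
    }

    @Override
    public void remove(long id) {
        pendingPuts.remove(id);
        pendingRemoves.add(id);
    }

    @Override
    public void commit() {
        // Copy the current snapshot, apply staged changes, then swap the reference.
        Map<Long, V> next = new HashMap<>(snapshot);
        next.keySet().removeAll(pendingRemoves);
        next.putAll(pendingPuts);
        snapshot = Collections.unmodifiableMap(next);
        pendingPuts.clear();
        pendingRemoves.clear();
    }
}
```

The trade-off in this sketch is that every commit copies the whole map, so it suits batched commits better than per-update commits. Values are stored as live objects, so fetches involve no serialization or deserialization.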
Ranking Problem
____________________________
Not a text search problem
Users are almost never searching for a specific item; rather, they’re looking to “discover”
The most common component of a query is location
Highly personalized – the user is a part of the query
Optimizing for conversion (Search -> Inquiry -> Booking)
Evolution through continuous experimentation
Ranking
Ranking Components
____________________________
Relevance
Quality
Bookability
Personalization
Desirability of location
etc.
Ranking
Several hundred signals are used to build machine learning models:
Properties of the listing (reviews, location, etc.)
Behavioral signals (mined from request logs)
Image quality and clickability (computer vision)
Host behavior (response time/rate, cancellations, etc.)
Host preferences model
[Diagram labels: DB snapshots, Logs]
Life of a Query
[Diagram: pipeline stages Query Understanding, Retrieval, Populator, First Pass Scorer, Second Pass Ranking, Result Generation, AirEvents, with Filtering by Price and Availability.
Steps listed: Geocoding, Configuring retrieval options, Choosing ranking models.
Scoring signals listed: Quality, Bookability, Relevance.
Result counts shown along the pipeline: 25 results, 2000 results, 25 results.]
Second Pass Ranking
____________________________
Traditional ranking works like this: [formula not preserved in the extracted text] then sort by [formula not preserved].
In contrast, second pass operates on the entire list at once: [formula not preserved].
This makes it possible to implement features such as result diversity.
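The formulas on this slide were not preserved, so the following is only a sketch of the contrast it describes. A first pass scores each result independently and sorts. A second pass looks at the whole list, here by greedily discounting candidates from a neighborhood that has already been picked. The neighborhood-based diversity penalty and the names `Candidate` and `Rankers` are assumptions for illustration, not the actual model.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical candidate: a first-pass score plus one attribute to diversify on.
record Candidate(long listingId, String neighborhood, double score) {}

final class Rankers {
    // First pass: every candidate is scored on its own, then the list is sorted.
    static List<Candidate> firstPass(List<Candidate> candidates) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(Candidate::score).reversed());
        return sorted;
    }

    // Second pass: greedily pick the next result given what was already picked,
    // discounting a score by `penalty` per earlier pick from the same neighborhood.
    static List<Candidate> secondPass(List<Candidate> candidates, int limit, double penalty) {
        List<Candidate> remaining = new ArrayList<>(firstPass(candidates));
        List<Candidate> picked = new ArrayList<>();
        Map<String, Integer> seen = new HashMap<>();
        while (!remaining.isEmpty() && picked.size() < limit) {
            Candidate best = null;
            double bestAdjusted = Double.NEGATIVE_INFINITY;
            for (Candidate c : remaining) {
                double adjusted = c.score() - penalty * seen.getOrDefault(c.neighborhood(), 0);
                if (best == null || adjusted > bestAdjusted) {
                    best = c;
                    bestAdjusted = adjusted;
                }
            }
            remaining.remove(best);
            picked.add(best);
            seen.merge(best.neighborhood(), 1, Integer::sum);
        }
        return picked;
    }
}
```

With a penalty of zero the second pass returns the same order as the first pass, which makes the diversity effect easy to compare in an experiment.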