Search Logs
+ Machine Learning
= Automatic Tagging
John Berryman
Hi! I'm John Berryman
@JnBrymn
This is my name
with all the
unnecessary
letters removed.
● Degree in Aerospace Engineering
● Moved into Search Technology
● Wrote a book (... well 40% of one)
I got a haircut. Life
has been different
ever since.
That's me on
the cover.
● Discovery Engineer @ Eventbrite
(Search/Recommendations)
What is "tagging"?
… and why would you want it?
First, let's talk about e-commerce search
● Search is ubiquitous.
○ Search makes the internet accessible
○ Search is the backbone of many products
○ Search is embedded in most products
● E-commerce is powered by search
● Browse is an important aspect of
the experience. You filter inventory
based upon tags.
● Mobile users prefer browse over
text search.
● Everyone is moving to mobile.
These are tags!
These are
also tags!
What is "tagging"?
… and why would you want it?
It's the ability to CATEGORIZE and
UNDERSTAND your inventory.
Because it powers the emerging
dominant e-commerce interaction.
How can you tag your inventory?
● Use curators to tag content:
○ Benefits: control over tagging, uniform tagging approach
○ Drawbacks: curation approach must be defined, curators must be trained, curators are
expensive
● Require tagging from content creators:
○ Benefits: content creators know their content the best, scales well
○ Drawback: content creators may not cooperate if they see no advantage for themselves
● Encourage customers to tag content:
○ Benefits: customers are the ones buying content and their idea of tags matters most
○ Drawbacks: there's even less likelihood for customers to cooperate
… but what if nobody wants to tag your content?
An Interesting Observation
● Every day millions of people search for events on Eventbrite.
● They issue more than 500K distinct queries in a month.
● But the most common 1,000 queries account for 41% of all search traffic.
● The common queries look like tags!
Can we use logged searches as a
training set to build a tagging model?
○ 5k run
○ back to school
○ job fair
○ 4th of july party
○ baby
○ real estate
○ car show
○ pool party
○ golf
○ gospel
○ speed dating
○ boat party
○ photography
○ dog
○ data science
○ business
○ kids
○ networking
○ christian
○ free
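The "top N queries cover X% of traffic" figure above can be checked with a few lines of Python. This is a minimal sketch, not the code behind the 41% number: the function name head_coverage, the lowercase/strip normalization, and the toy log are all assumptions made for illustration.

```python
from collections import Counter


def head_coverage(queries, top_n):
    """Return the fraction of all searches made with the top_n most common queries."""
    # Assumption: queries that differ only in case or padding count as the same query.
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    head = sum(n for _, n in counts.most_common(top_n))
    return head / total


# Toy search log (hypothetical data, not Eventbrite's).
search_log = [
    "job fair", "golf", "job fair", "5k run",
    "Job Fair", "golf", "pool party", "salsa night",
]
print(head_coverage(search_log, top_n=2))
```

On a real log the same function could be run for several values of top_n to see how quickly the head of the distribution flattens into the long tail.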
Search Logs
+ Machine Learning
= Automatic Tagging
John Berryman
Initial Approach
● Given – we have 3 tables:
○ search log
○ click log
○ event table
● Step 1: Find the most common 500 queries
● Step 2: Find all the events clicked after a user search using a common query
● Step 3: Collect the name and description of those events
● Create a training set:
○ X = input = title and body text of events
○ y = output = query string used to find them a.k.a. tags
● Train a model to predict y based on new X
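The three steps above can be sketched in plain Python. The row layouts (tuples of search_id/query and search_id/event_id, and a dict of events), the function name build_training_set, and the toy data are assumptions for illustration; the real tables and the notebook code may look different. Model training itself is left out, since the slides do not specify the model here.

```python
from collections import Counter, defaultdict


def build_training_set(search_log, click_log, events, top_n=500):
    """Join the three tables into (text, tags) training pairs.

    search_log: list of (search_id, query) rows            (assumed layout)
    click_log:  list of (search_id, event_id) rows         (assumed layout)
    events:     dict of event_id -> (name, description)    (assumed layout)
    """
    # Step 1: the most common queries become the tag vocabulary.
    query_counts = Counter(query for _, query in search_log)
    common = {q for q, _ in query_counts.most_common(top_n)}
    query_by_search = {sid: q for sid, q in search_log if q in common}

    # Step 2: find the events clicked after a search that used a common query.
    tags_by_event = defaultdict(set)
    for search_id, event_id in click_log:
        query = query_by_search.get(search_id)
        if query is not None and event_id in events:
            tags_by_event[event_id].add(query)

    # Step 3: X is the event text, y is the list of queries that led to it.
    X, y = [], []
    for event_id in sorted(tags_by_event):
        name, description = events[event_id]
        X.append(f"{name} {description}")
        y.append(sorted(tags_by_event[event_id]))
    return X, y


# Toy tables (hypothetical data).
search_log = [
    (1, "job fair"), (2, "golf"), (3, "job fair"),
    (4, "golf"), (5, "job fair"), (6, "underwater basket weaving"),
]
click_log = [(1, "e1"), (2, "e2"), (3, "e1"), (5, "e2"), (6, "e2")]
events = {
    "e1": ("Tech Career Expo", "Meet employers hiring now"),
    "e2": ("Charity Golf Classic", "18 holes for a cause"),
}

X, y = build_training_set(search_log, click_log, events, top_n=2)
print(X)
print(y)
```

Because one event can be reached from several queries, each y entry is a list of tags, which makes this a multi-label problem rather than a single-class one.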
tagging_with_searches_1.ipynb
emergency backup plan
Problems with this Approach
● Near synonym tags:
○ memorial day
○ memorial day weekend events
○ memorial day weekend
● Small tag vocabulary
● Each event only gets 1 or 2 tags. Sometimes 0.
Improved Approach
● A session may contain several queries. These queries are often related:
○ Spelling corrections
○ Word synonyms
○ Query Refinements or generalizations
● Idea:
○ Let's group statistically significant query strings together.
○ Then we can train the neural network based on the query string groups
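One way to sketch the grouping idea: count how often two queries appear in the same session, link the pairs that co-occur often enough, and treat each connected group as one tag. The raw co-occurrence threshold below is a crude stand-in for a real significance test, and the function name cluster_queries, the min_cooccurrence parameter, and the toy sessions are assumptions, not the method used in the notebooks.

```python
from collections import Counter
from itertools import combinations


def cluster_queries(sessions, min_cooccurrence=2):
    """Map each query to a canonical label for its group.

    sessions: list of lists of query strings, one inner list per session.
    Two queries are linked when they share at least min_cooccurrence sessions
    (an assumed, simplistic stand-in for a statistical significance test).
    """
    query_counts = Counter()  # number of sessions containing each query
    pair_counts = Counter()   # number of sessions containing each pair
    for session in sessions:
        unique = sorted(set(session))
        query_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    # Union-find: linked queries end up sharing a root.
    parent = {q: q for q in query_counts}

    def find(q):
        while parent[q] != q:
            parent[q] = parent[parent[q]]  # path halving
            q = parent[q]
        return q

    for (a, b), n in pair_counts.items():
        if n >= min_cooccurrence:
            parent[find(a)] = find(b)

    # Label each group with the member seen in the most sessions
    # (ties broken alphabetically).
    groups = {}
    for q in query_counts:
        groups.setdefault(find(q), []).append(q)
    canonical = {}
    for members in groups.values():
        label = min(members, key=lambda q: (-query_counts[q], q))
        for q in members:
            canonical[q] = label
    return canonical


# Toy sessions (hypothetical data).
sessions = [
    ["blockchai", "blockchain"],
    ["block chain", "blockchain"],
    ["blockchai", "blockchain"],
    ["block chain", "blockchain", "golf"],
    ["golf"],
]
print(cluster_queries(sessions))
```

The canonical labels can then replace the raw query strings as y when building the training set. Note that linking is transitive in this sketch, so a single strong pair can chain otherwise unrelated queries into one group.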
query_string_clusters
and
tagging_with_searches_2
emergency backup plan
emergency backup plan
Things to Notice
Benefits
● Far fewer near-synonyms (bitcoin, block chain, blockchai → blockchain)
● More sample data
○ v1 model - 500 most popular queries - 33% of query traffic
○ v2 model - 2649 most popular queries collapsed down to 681 - 52% of query traffic
● Broader tags
Drawback
● Some of the clusters pull in very loosely related words
○ ai → blockchain
○ ozio → rosebar
Tagging-Related Applications
● Power Faceted Search
● Infer relationships between tags
● Provide organizers with tag
recommendations
● Better understand supply and
demand
● Apply tags to users for better
recommendations
● Search Synonyms (e.g.
misspellings)
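As an illustration of the first application, faceted search needs per-tag counts over the events that match the tags already selected. The sketch below assumes the tagging model's output is available as a dict of event_id to a set of tags; that layout, the function name facet_counts, and the toy data are hypothetical.

```python
from collections import Counter


def facet_counts(event_tags, selected=()):
    """Count the remaining tags among events that carry every selected tag.

    event_tags: dict of event_id -> set of predicted tags (assumed layout).
    selected:   tags the user has already filtered on.
    """
    selected = set(selected)
    counts = Counter()
    for tags in event_tags.values():
        if selected <= tags:  # event matches all selected facets
            counts.update(tags - selected)
    return counts


# Toy model output (hypothetical data).
event_tags = {
    "e1": {"networking", "free"},
    "e2": {"networking", "business"},
    "e3": {"golf", "free"},
}
print(facet_counts(event_tags))
print(facet_counts(event_tags, selected=["networking"]))
```

In production a search engine would normally compute these counts itself; the point here is only that predicted tags are enough to drive the browse experience.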
Future Work
● Better coverage
○ Currently reach 50% of our traffic with 2,500 queries.
Long tail is long! > 500K distinct queries in a month
○ Model is biased toward short-tail labels - everything's a "day party"
○ Can't cover searches for an event that isn't in our inventory.
● Create real pipeline
● Build out all the cool ideas on the last slide
Questions?
… better yet, Ideas?
Final notes:
● My Jupyter notebooks are here:
○ First implementation
○ Query collapsing
○ Second implementation
○ Third implementation
Data Nerds
● Want to learn data science with
others? You should try Data Nerds.
● Do you like spending time around
people that love learning? Penny
University is the peer-to-peer
learning community for you!
● I just shared my talk
https://twitter.com/JnBrymn
This slide intentionally left blank.
DON'T FORGET
● Tweet the slides out just before the talk
● Open the notebooks
○ do
■ cd ~/Personal/data_science/tagging_events/
■ jupyter notebook
■ open the 3 notebooks in event_tagging_strategies
○ or just use gists: one, two, three
● Bump up the font size on the notebooks
● Remove the menus
● Clear cells


Editor's notes

  1. I hope so (that we use logged searches as a training set to build a tagging model) because the title of the talk is ^
  2. "...but" - this is the situation that Eventbrite was in