Improving your site search results takes both human ingenuity and some machine learning. Learn hands-on tactics to improve your site search results, then learn how to structure a machine learning element to augment them. This presentation was delivered live and in person at the 2020 Paris Search Y Conference.
3. STATE OF SITE SEARCH
Search is much larger than search engines.
@JPSHERMAN
4. BIGGER THAN GOOGLE?
Now, what about those non-search engine searches?
Amazon, Facebook, Sohu, Weibo, Reddit, Instagram, Twitter, eBay…
Web Search
App Search
GOOGLE’S 2 TRILLION PER YEAR SEARCH VOLUME
Maintaining site search will:
● Increase Conversions
● Reduce Abandonment
● Reinforce Expertise
● Deliver a Good User & Brand Experience
5. SEARCH AS A BEHAVIOR IS FRACTURED
THERE ARE MORE WAYS TO SEARCH THAN EVER.
Search isn’t just a search engine. It’s in an application, in IoT, in smart devices.
Findability Is:
● Understanding “How”
● Understanding Selection
● Understanding Behavior
● Understanding Intent
6. IF THEY’RE SEARCHING ON YOUR SITE...
THEY THINK YOU HAVE WHAT THEY’RE LOOKING FOR.
IF THEY DON’T FIND IT, THEY WILL LEAVE YOU.
If a user cannot find what they’re looking for, they know that Google is less than a second away.
● They think you have what they want
● They’re probably right
● If it’s not findable, they’re gone.
7. IF THEY FIND IT, DO BALLOONS DROP?
THAT’S THE EXPECTATION.
NO.
USERS REMEMBER THEIR SITE SEARCH EXPERIENCE
USERS ARE NOT KIND.
Clever girl...
A poor search experience is remembered.
● Some trust is lost
● They’ll go to Google
● They may find what they’re looking for
● Let’s hope your competitor doesn’t rank.
9. SEARCH BEHAVIOR: HOW… NOT WHAT
USERS SCAN WITH PURPOSE AND INTENT
Passive Search vs. Active Search
Users apply criteria as they scan through your results:
● They have acceptance and rejection criteria
● They spend less than a second scanning a snippet
● Perception of Value is Critical
10. SITE SEARCH BEHAVIORAL SCIENCE
INFORMATION SCENT TRAILS
USERS LOOK FOR “INFORMATION SCENT TRAILS”
USERS SCAN FOR PATTERNS
● They include elements of, or related to, their intent
● They look at textual and image proximities
● Active vs. Passive Scanning
● Value Signals
11. INFORMATION SCENT TRAILS
A QUICK EXAMPLE
An intent-based word cloud (categories: TYPES, PROPERTIES).
● Users scan
● When words match intent
● Acceptance & Rejection Criteria
● One will lead to an information scent trail
12. USER PERCEPTION OF VALUE
WITH INTENT, USERS LOOK FOR VALUE
Results for “Road Bikes”
● sigh.
● They all look alike
● Which one is good?
13. USER PERCEPTION OF VALUE
WITH INTENT, USERS LOOK FOR VALUE
Results for “Road Bikes”
● Value applied as metadata.
● Triggers for behavior
● Which one is better?
14. THINGS HUMANS CAN DO TO IMPROVE RESULTS
SPOILER ALERT: IT’S A LOT OF THE STUFF WE ALREADY DO
Actionable Tasks to Improve Site Search Results:
● Keyword Metadata
● Synonym Lists
● Boosted Results
● SERP Features
● Clickstream Data
● Personalization
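Several of these tasks are mechanical enough to sketch. Synonym lists, for example, typically expand a query before it hits the index; a minimal illustration (the synonym map itself is hypothetical):

```python
# Hypothetical synonym list, like one you might maintain in a search platform config.
SYNONYMS = {
    "bike": {"bicycle", "cycle"},
    "tire": {"tyre"},
}

def expand(query_terms):
    """Expand each query term with its synonyms before matching against the index."""
    expanded = set()
    for term in query_terms:
        expanded.add(term)
        expanded |= SYNONYMS.get(term, set())
    return expanded

print(sorted(expand(["bike", "tire"])))  # ['bicycle', 'bike', 'cycle', 'tire', 'tyre']
```

Real engines (Solr, Elasticsearch) do this with analyzer-level synonym filters, but the effect is the same: queries match content written in vocabulary the user didn’t type.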
15. THINGS HUMANS CAN DO TO IMPROVE RESULTS
IMPROVE THE SERP DESIGN
Actionable Tasks to Improve Site Search Results:
● SERP Features
● Autocomplete / Autosuggest
● Facets
● KeyMatch
● Knowledge Graph
● Natural Results
16. THINGS HUMANS CAN DO TO IMPROVE RESULTS
LOCATION CAN BE A STRONG SIGNAL OF INTENT
Actionable Tasks to Improve Site Search Results:
● Personalization
Keyword: “Bike Tires”
● Saint-Brieuc Bay → Road Bike Tires
● Portes du Soleil → Mountain Bike Tires
Location Bias Can Deliver Intent
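The location bias above can be expressed as a boost applied before the query reaches the engine; a toy sketch (the location-to-intent map is hypothetical, and a real system would boost rather than rewrite):

```python
# Hypothetical mapping from user location to an intent-biasing term.
LOCATION_INTENT = {
    "Saint-Brieuc Bay": "road",       # coastal terrain -> road cycling intent
    "Portes du Soleil": "mountain",   # alpine resort -> mountain biking intent
}

def bias_query(query, location):
    """Prepend an intent term inferred from location, if we have one for it."""
    intent = LOCATION_INTENT.get(location)
    return f"{intent} {query}" if intent else query

print(bias_query("bike tires", "Portes du Soleil"))  # mountain bike tires
```

In practice this would be a soft boost (e.g. a weighted clause) so that road-bike results still surface for alpine users who explicitly ask for them.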
17. THINGS HUMANS CAN DO TO IMPROVE RESULTS
MEASURE HUMAN BEHAVIOR
Users apply criteria as they scan through your results:
● Measure consumption & conversion
● Measure dwell time
● Measure time from query to conversion
18. THINGS HUMANS CAN DO TO IMPROVE RESULTS
SPOILER ALERT: IT’S A LOT OF THE STUFF WE ALREADY DO
Design your SERP for the user.
● SERP Design
● Accessibility for people with visual impairments
● Snippet Design
● Features
● Disambiguation
19. SO.. UH… WHAT’S THE POINT?
Site Search is a massive behavior across the web.
1. Simple changes to the search platform & content will pay off
2. Users who search your site think you have what they want
3. Metadata and what is displayed in the SERP influences CTR
4. Boost key content so it ranks quickly in your site search
5. Consider Design
6. Consider Accessibility
Don’t Be Google. Google has to figure out “everything”. You don't.
Be Better Than Google.
YOU CAN DO A LOT TO MAKE SITE SEARCH BETTER, BUT THERE’S MORE
29. THE PLATFORM OF THE ENGINE
SEARCH PLATFORM
LUCENE IS FREE, OPEN-SOURCE & POWERFUL.
Lucene is a Java-based, free and open-source search engine library. It comes in several different flavors:
● Apache Nutch
● Apache Solr
● Compass
● CrateDB
● DocFetcher
● Elasticsearch
● KinoSearch
● Swiftype
30. THE MACHINE LEARNING APPLICATION
DATA SET
THESE ARE YOUR QUERIES OR CONTENT YOU WANT TO LEARN FROM… FOR EXAMPLE: “ARE THESE CATS OR DOGS?”
DATA SETS, TRAINING SETS AND HOW TO MEASURE.
31. THE MACHINE LEARNING APPLICATION
TRAINING SET
PARTITION YOUR DATA SET INTO 90% TRAINING, 10% EVALUATION.
DATA SET → TRAINING SET + EVALUATION SET
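The 90/10 partition is a few lines of code; a minimal sketch (the query list is a stand-in for real site-search data):

```python
import random

def partition(records, eval_fraction=0.10, seed=42):
    """Shuffle and split a data set into training and evaluation partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    cut = int(len(shuffled) * eval_fraction)
    return shuffled[cut:], shuffled[:cut]  # (~90% training, ~10% evaluation)

queries = [f"query-{i}" for i in range(1000)]  # stand-in for real queries/content
train, evaluate = partition(queries)
print(len(train), len(evaluate))  # 900 100
```

Holding the 10% out of training is what makes the later evaluation honest: the model is scored on data it never saw.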
32. THE MACHINE LEARNING APPLICATION
TO QUANTIFY “DOG-NESS” OR “CAT-NESS”, A POWERFUL FORMULA IS THE OKAPI BM25 FORMULA.
Good resources on data science formulas:
● https://www.datasciencecentral.com/profiles/blogs/140-machine-learning-formulas
● Solr in Action (Trey Grainger): https://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021
● Practical BM25 (Elastic): https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
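To make the formula concrete, here is a rough sketch of Okapi BM25 over a toy corpus (k1 and b are the standard free parameters, left at common defaults rather than tuned values):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25: per-term smoothed IDF times a saturated term frequency."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n  # average document length in tokens
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)         # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        tf = doc.count(term)                             # raw term frequency
        denom = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * (tf * (k1 + 1)) / denom
    return score

corpus = [
    "the cat sat on the mat".split(),
    "dogs and cats living together".split(),
    "a photo of a dog".split(),
]
print(bm25_score(["cat"], corpus[0], corpus))
```

The saturation in the denominator is the point: the tenth occurrence of “cat” adds far less “cat-ness” than the first, which is why BM25 outperforms raw term counting.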
33. THE MACHINE LEARNING APPLICATION
DATA OUTPUT AND FRACTIONAL SCORES.
HOW CLOSE TO “CAT-NESS” IS THIS?
Example fractional scores: 0.99732134, 0.87569821, 0.62587471, 0.0000111
34. THE MACHINE LEARNING APPLICATION
WHAT IF THE QUERY ISN’T “CATS” BUT “FUZZY ANIMALS”?
THIS IS A CORE VALUE OF STRUCTURED MARKUP AS AN ATTRIBUTE SIGNAL
Example fractional scores: 0.9632115, 0.9178585, 0.9244844, 0.9371025
@dawnieando is an inspiration to all of us.
35. THE RECIPE FOR MACHINE LEARNING
LEARN TO RANK PLUGIN (LTR)
IT’S NOT AS HARD AS YOU MAY THINK.
LTR is a Lucene-compatible plugin that allows the application of machine learning. It uses a wide variety of ranking signals:
● QUERY INDEPENDENT: Looks only at the body of indexed content
● QUERY DEPENDENT: Looks at both the query and the document, most often a TF-IDF score
● QUERY LEVEL FEATURES: Looks only at the query
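The three feature families can be illustrated with toy extractors; this is a sketch of the concepts, not the plugin’s actual API (function names and the simple TF-IDF weighting are illustrative):

```python
import math

def query_independent(doc):
    """Looks only at the document: e.g. its length in tokens."""
    return len(doc)

def query_dependent(query, doc, corpus):
    """Looks at query and document together: a simple TF-IDF score."""
    n = len(corpus)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((n + 1) / (df + 1)) + 1    # smoothed IDF
        score += doc.count(term) * idf
    return score

def query_level(query):
    """Looks only at the query: e.g. its number of terms."""
    return len(query)

corpus = [["red", "hat", "openshift"], ["troubleshooting", "openshift", "pods"]]
features = [
    query_independent(corpus[0]),
    query_dependent(["openshift"], corpus[0], corpus),
    query_level(["openshift"]),
]
```

An LTR model is then trained over vectors like `features`, learning how much weight each signal deserves for a given query/document pair.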
36. HOW MACHINES PREDICT INTENT
A SUPPORT ORGANIZATION’S PRIMARY TASK IS TO HELP
THIS IS THE CORE OF FINDABILITY:
- TO DELIVER THE RIGHT INFORMATION
- AT THE RIGHT TIME
- TO RESOLVE AN ISSUE
- QUICKLY
- THROUGH ANY CHANNEL
THROUGH THE CONTEXT OF SUPPORT
37. CONFIGURING SYSTEM FOR EXPERIMENTS
STARTING SMALL:
- WE SELECTED A SINGLE PRODUCT: RED HAT OPENSHIFT
- WE SELECTED A SINGLE USE-CASE: TROUBLESHOOTING
- WE SELECTED THE CONTENT: SOLUTION CONTENT TYPE
MORE DATA SETS, TRAINING SETS AND EVALUATION SETS
38. CONFIGURING SYSTEM FOR EXPERIMENTS
STARTING SMALL:
- SOLUTION CONTENT IS 30% OF CONTENT.
(Content types: Documentation, Videos, Articles, Security, Product, Solution)
39. CONFIGURING SYSTEM FOR EXPERIMENTS
APPROXIMATELY 100K INDIVIDUAL PIECES OF CONTENT:
- 90K WENT TO THE TRAINING PARTITION
- 10K WENT TO EVALUATION
40. RUN THE RELEVANCY ALGORITHM
RUN THE EVALUATION
- FIRST CHECK: DOES THIS LOOK RIGHT?
- CONFIRM/CORRECT THE SAMPLE
- DEFINE “SUCCESS”: WHAT PERCENT EQUALS “GOOD”?
LET’S ASSUME FAILURE. WHAT CAN BE DONE?
THIS REQUIRES SOME HUMAN INTERVENTION
41. RELEVANCY TUNING: THE MACHINE PARTS
FIXING THE MACHINE FIRST
- CHECK “WEIGHTS”:
- SIGNAL WEIGHT
- CTR WEIGHT
- IMPRESSION WEIGHT
- SYNONYM WEIGHT
- INTERNAL LINK WEIGHT
- SERP IMPRESSIONS
42. RELEVANCY TUNING: THE METADATA PARTS
TUNING THE UNDERLYING STRUCTURE OF THE DATA
- CHECK “WEIGHTS”:
- TITLE, DESCRIPTION & KEYWORD METADATA
- STRUCTURED MARKUP (SCHEMA)
- PAGE COMPONENTS: ABSTRACT, CONCLUSION, COMMENTS
- ENTITIES / TAXONOMIES / ONTOLOGIES
43. RELEVANCY TUNING: THE CONTENT PARTS
PRACTICAL EXAMPLE OF WHY CONTENT IS KING
- CHECK CONTENT:
- IS YOUR CONTENT DESCRIPTIVE?
- DOES IT HAVE DIVERSE LANGUAGE/WORDS?
- IS IT ORGANIZED?
- ARE THERE DIFFERENT CONTENT TYPES?
THIS IS “GOOD SEO FOR CONTENT”
44. TUNING THE MACHINE FOR IMPROVEMENT
- TRAINING DATA: CONTROL GROUP
- EVALUATION DATA: EXPERIMENTAL GROUP
- USE CTR AS A SUCCESS SIGNAL
- DETERMINE SIGNIFICANCE
- CONFIRM WITH METRICS
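Determining significance between the control and experimental CTRs can be done with a two-proportion z-test; a minimal sketch (the click and impression counts below are made up for illustration):

```python
import math

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test on CTRs; returns (z statistic, two-sided p-value)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)  # pooled CTR under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# control (old ranking) vs. experiment (tuned relevancy) -- hypothetical counts
z, p = two_proportion_z(1200, 10000, 1350, 10000)
print(round(z, 2), round(p, 4))
```

If p falls below your chosen threshold (e.g. 0.05), the CTR lift is unlikely to be noise, and the “confirm with metrics” step can proceed with downstream conversion data.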
45. PUTTING IT ALL TOGETHER
CONTENT QUALITY
- WRITE GREAT, DESCRIPTIVE CONTENT
METADATA & IA
- DON’T IGNORE YOUR METADATA, SCHEMA & STRUCTURE
MACHINE RELEVANCY
- THE MACHINE WILL ATTEMPT TO UNDERSTAND, BUT TUNING REQUIRES HUMANS
MEASUREMENT
- DEFINE WHAT SUCCESS IS, SEGMENT & MEASURE CTR
MACHINE LEARNING & INTENTION DETECTION IS A BALANCING ACT
46. THANK YOU SO MUCH
PLEASE FEEL FREE TO TALK TO ME.
BECAUSE NO ONE DOES THIS ALONE… THANK YOU:
- JASON BARNARD
- MANIKANDAN SIVANESAN
- JIM SCARBOROUGH
- DAWN ANDERSON
- MARIANNE SWEENY
- TREY GRAINGER
- CHARLIE HULL
- JAIMIE ALBERICO
- HAMLET BATISTA
- BRITNEY MULLER
- MARTHA VAN BERKEL
- GRANT INGERSOLL
- JR OAKES
- MICHAEL KING
- MARK TRAPHAGEN
- BARRY ADAMS
- JENN HOFFMAN
- JENNY HALASZ