Improving your site search results takes both human ingenuity and some machine learning. Learn hands-on tactics to improve your site search results, then learn how to structure a machine learning element to augment them. This presentation was delivered live and in person at the 2020 Paris Search Y Conference.
3. STATE OF SITE SEARCH
Search is much larger than search engines.
@JPSHERMAN
4. BIGGER THAN GOOGLE?
Now, what about those non-search engine searches?
Amazon, Facebook, Sohu, Weibo, Reddit, Instagram, Twitter, eBay…
Web Search
App Search
GOOGLE’S 2 TRILLION PER YEAR SEARCH VOLUME
Maintaining site search will:
● Increase Conversions
● Reduce Abandonment
● Reinforce Expertise
● Deliver a Good User & Brand Experience
5. SEARCH AS A BEHAVIOR IS FRACTURED
THERE ARE MORE WAYS TO SEARCH THAN EVER.
Search isn’t just a search engine. It’s in an application, in IoT, in smart devices.
Findability Is:
● Understanding “How”
● Understanding Selection
● Understanding Behavior
● Understanding Intent
6. IF THEY’RE SEARCHING ON YOUR SITE...
THEY THINK YOU HAVE WHAT THEY’RE LOOKING FOR.
IF THEY DON’T FIND IT, THEY WILL LEAVE YOU.
If a user cannot find what they’re looking for, they know that Google is less than a second away.
● They think you have what they want
● They’re probably right
● If it’s not findable, they’re gone.
7. IF THEY FIND IT, DO BALLOONS DROP?
THAT’S THE EXPECTATION.
NO.
USERS REMEMBER THEIR SITE SEARCH EXPERIENCE
USERS ARE NOT KIND.
Clever girl...
A poor search experience is remembered.
● Some trust is lost
● They’ll go to Google
● They may find what they’re looking for
● Let’s hope your competitor doesn’t rank.
9. SEARCH BEHAVIOR: HOW… NOT WHAT
USERS SCAN WITH PURPOSE AND INTENT
Passive Search vs. Active Search
Users apply criteria as they scan through your results:
● They have acceptance and rejection criteria
● They spend less than a second scanning a snippet
● Perception of Value is Critical
10. SITE SEARCH BEHAVIORAL SCIENCE
INFORMATION SCENT TRAILS
USERS LOOK FOR “INFORMATION SCENT TRAILS”
USERS SCAN FOR PATTERNS
● They include elements of, or related to, their intent
● They look at textual and image proximities
● Active vs. Passive Scanning
● Value Signals
11. INFORMATION SCENT TRAILS
A QUICK EXAMPLE
An intent-based word cloud (categories: TYPES, PROPERTIES).
● Users scan
● When words match intent
● Acceptance & Rejection Criteria
● One will lead to an information scent trail
12. USER PERCEPTION OF VALUE
WITH INTENT, USERS LOOK FOR VALUE
Results for “Road Bikes”
● sigh.
● They all look alike
● Which one is good?
13. USER PERCEPTION OF VALUE
WITH INTENT, USERS LOOK FOR VALUE
Results for “Road Bikes”
● Value applied as metadata.
● Triggers for behavior
● Which one is better?
14. THINGS HUMANS CAN DO TO IMPROVE RESULTS
SPOILER ALERT: IT’S A LOT OF THE STUFF WE ALREADY DO
Actionable Tasks to Improve Site Search Results:
● Keyword Metadata
● Synonym Lists
● Boosted Results
● SERP Features
● Clickstream Data
● Personalization
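Several of these tasks are mechanical enough to sketch. Synonym lists, for example, typically expand a query before it hits the index; a minimal illustration (the synonym map itself is hypothetical):

```python
# Hypothetical synonym list, like one you might maintain in a search platform config.
SYNONYMS = {
    "bike": {"bicycle", "cycle"},
    "tire": {"tyre"},
}

def expand(query_terms):
    """Expand each query term with its synonyms before matching against the index."""
    expanded = set()
    for term in query_terms:
        expanded.add(term)
        expanded |= SYNONYMS.get(term, set())
    return expanded

print(sorted(expand(["bike", "tire"])))  # ['bicycle', 'bike', 'cycle', 'tire', 'tyre']
```

Real engines (Solr, Elasticsearch) do this with analyzer-level synonym filters, but the effect is the same: queries match content written in vocabulary the user didn’t type.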
15. THINGS HUMANS CAN DO TO IMPROVE RESULTS
IMPROVE THE SERP DESIGN
Actionable Tasks to Improve Site Search Results:
● SERP Features
● Autocomplete / Autosuggest
● Facets
● KeyMatch
● Knowledge Graph
● Natural Results
16. THINGS HUMANS CAN DO TO IMPROVE RESULTS
LOCATION CAN BE A STRONG SIGNAL OF INTENT
Actionable Tasks to Improve Site Search Results:
● Personalization
Keyword: “Bike Tires”
● Saint-Brieuc Bay → Road Bike Tires
● Portes du Soleil → Mountain Bike Tires
Location Bias Can Deliver Intent
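The location bias above can be expressed as a boost applied before the query reaches the engine; a toy sketch (the location-to-intent map is hypothetical, and a real system would boost rather than rewrite):

```python
# Hypothetical mapping from user location to an intent-biasing term.
LOCATION_INTENT = {
    "Saint-Brieuc Bay": "road",       # coastal terrain -> road cycling intent
    "Portes du Soleil": "mountain",   # alpine resort -> mountain biking intent
}

def bias_query(query, location):
    """Prepend an intent term inferred from location, if we have one for it."""
    intent = LOCATION_INTENT.get(location)
    return f"{intent} {query}" if intent else query

print(bias_query("bike tires", "Portes du Soleil"))  # mountain bike tires
```

In practice this would be a soft boost (e.g. a weighted clause) so that road-bike results still surface for alpine users who explicitly ask for them.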
17. THINGS HUMANS CAN DO TO IMPROVE RESULTS
MEASURE HUMAN BEHAVIOR
Users apply criteria as they scan through your results:
● Measure consumption & conversion
● Measure dwell time
● Measure time from query to conversion
18. THINGS HUMANS CAN DO TO IMPROVE RESULTS
SPOILER ALERT: IT’S A LOT OF THE STUFF WE ALREADY DO
Design your SERP for the user.
● SERP Design
● Accessibility for people with visual impairments
● Snippet Design
● Features
● Disambiguation
19. SO.. UH… WHAT’S THE POINT?
Site Search is a massive behavior across the web.
1. Simple changes to the search platform & content will pay off
2. Users who search your site think you have what they want
3. Metadata and what is displayed in the SERP influences CTR
4. Boost key content so it ranks quickly in your site search
5. Consider Design
6. Consider Accessibility
Don’t Be Google. Google has to figure out “everything”. You don't.
Be Better Than Google.
YOU CAN DO A LOT TO MAKE SITE SEARCH BETTER, BUT THERE’S MORE
29. THE PLATFORM OF THE ENGINE
SEARCH PLATFORM
LUCENE IS FREE, OPEN-SOURCE & POWERFUL.
Lucene is a Java-based, free and open-source search engine library. It comes in several different flavors:
● Apache Nutch
● Apache Solr
● Compass
● CrateDB
● DocFetcher
● Elasticsearch
● KinoSearch
● Swiftype
30. THE MACHINE LEARNING APPLICATION
DATA SET
THESE ARE YOUR QUERIES OR CONTENT YOU WANT TO LEARN FROM… FOR EXAMPLE: “ARE THESE CATS OR DOGS?”
DATA SETS, TRAINING SETS AND HOW TO MEASURE.
31. THE MACHINE LEARNING APPLICATION
TRAINING SET
PARTITION YOUR DATA SET INTO 90% TRAINING, 10% EVALUATION.
DATA SET → TRAINING SET + EVALUATION SET
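The 90/10 partition is a few lines of code; a minimal sketch (the query list is a stand-in for real site-search data):

```python
import random

def partition(records, eval_fraction=0.10, seed=42):
    """Shuffle and split a data set into training and evaluation partitions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    cut = int(len(shuffled) * eval_fraction)
    return shuffled[cut:], shuffled[:cut]  # (~90% training, ~10% evaluation)

queries = [f"query-{i}" for i in range(1000)]  # stand-in for real queries/content
train, evaluate = partition(queries)
print(len(train), len(evaluate))  # 900 100
```

Holding the 10% out of training is what makes the later evaluation honest: the model is scored on data it never saw.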
32. THE MACHINE LEARNING APPLICATION
TO QUANTIFY “DOG-NESS” OR “CAT-NESS”, A POWERFUL FORMULA IS THE OKAPI BM25 FORMULA.
Good resources on data science formulas:
● https://www.datasciencecentral.com/profiles/blogs/140-machine-learning-formulas
● Solr in Action (Trey Grainger): https://www.amazon.com/Solr-Action-Trey-Grainger/dp/1617291021
● Practical BM25 (Elastic): https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
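To make the formula concrete, here is a rough sketch of Okapi BM25 over a toy corpus (k1 and b are the standard free parameters, left at common defaults rather than tuned values):

```python
import math

def bm25_score(query_terms, doc, corpus, k1=1.2, b=0.75):
    """Okapi BM25: per-term smoothed IDF times a saturated term frequency."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n  # average document length in tokens
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)         # document frequency
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        tf = doc.count(term)                             # raw term frequency
        denom = tf + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf * (tf * (k1 + 1)) / denom
    return score

corpus = [
    "the cat sat on the mat".split(),
    "dogs and cats living together".split(),
    "a photo of a dog".split(),
]
print(bm25_score(["cat"], corpus[0], corpus))
```

The saturation in the denominator is the point: the tenth occurrence of “cat” adds far less “cat-ness” than the first, which is why BM25 outperforms raw term counting.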
33. THE MACHINE LEARNING APPLICATION
DATA OUTPUT AND FRACTIONAL SCORES.
HOW CLOSE TO “CAT-NESS” IS THIS?
Example fractional scores: 0.99732134, 0.87569821, 0.62587471, 0.0000111
34. THE MACHINE LEARNING APPLICATION
WHAT IF THE QUERY ISN’T “CATS” BUT “FUZZY ANIMALS”?
THIS IS A CORE VALUE OF STRUCTURED MARKUP AS AN ATTRIBUTE SIGNAL
Example fractional scores: 0.9632115, 0.9178585, 0.9244844, 0.9371025
@dawnieando is an inspiration to all of us.
35. THE RECIPE FOR MACHINE LEARNING
LEARN TO RANK PLUGIN (LTR)
IT’S NOT AS HARD AS YOU MAY THINK.
LTR is a Lucene-compatible plugin that allows the application of machine learning. It uses a wide variety of ranking signals:
● QUERY INDEPENDENT: Looks only at the body of indexed content
● QUERY DEPENDENT: Looks at both the query and the document, most often a TF-IDF score
● QUERY LEVEL FEATURES: Looks only at the query
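The three feature families can be illustrated with toy extractors; this is a sketch of the concepts, not the plugin’s actual API (function names and the simple TF-IDF weighting are illustrative):

```python
import math

def query_independent(doc):
    """Looks only at the document: e.g. its length in tokens."""
    return len(doc)

def query_dependent(query, doc, corpus):
    """Looks at query and document together: a simple TF-IDF score."""
    n = len(corpus)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)  # document frequency
        idf = math.log((n + 1) / (df + 1)) + 1    # smoothed IDF
        score += doc.count(term) * idf
    return score

def query_level(query):
    """Looks only at the query: e.g. its number of terms."""
    return len(query)

corpus = [["red", "hat", "openshift"], ["troubleshooting", "openshift", "pods"]]
features = [
    query_independent(corpus[0]),
    query_dependent(["openshift"], corpus[0], corpus),
    query_level(["openshift"]),
]
```

An LTR model is then trained over vectors like `features`, learning how much weight each signal deserves for a given query/document pair.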
36. HOW MACHINES PREDICT INTENT
A SUPPORT ORGANIZATION’S PRIMARY TASK IS TO HELP
THIS IS THE CORE OF FINDABILITY:
- TO DELIVER THE RIGHT INFORMATION
- AT THE RIGHT TIME
- TO RESOLVE AN ISSUE
- QUICKLY
- THROUGH ANY CHANNEL
THROUGH THE CONTEXT OF SUPPORT
37. CONFIGURING SYSTEM FOR EXPERIMENTS
STARTING SMALL:
- WE SELECTED A SINGLE PRODUCT: RED HAT OPENSHIFT
- WE SELECTED A SINGLE USE-CASE: TROUBLESHOOTING
- WE SELECTED THE CONTENT: SOLUTION CONTENT TYPE
MORE DATA SETS, TRAINING SETS AND EVALUATION SETS
38. CONFIGURING SYSTEM FOR EXPERIMENTS
STARTING SMALL:
- SOLUTION CONTENT IS 30% OF CONTENT.
(Content types: Documentation, Videos, Articles, Security, Product, Solution)
39. CONFIGURING SYSTEM FOR EXPERIMENTS
APPROXIMATELY 100K INDIVIDUAL PIECES OF CONTENT:
- 90K WENT TO THE TRAINING PARTITION
- 10K WENT TO EVALUATION
40. RUN THE RELEVANCY ALGORITHM
RUN THE EVALUATION
- FIRST CHECK: DOES THIS LOOK RIGHT?
- CONFIRM/CORRECT THE SAMPLE
- DEFINE “SUCCESS”: WHAT PERCENT EQUALS “GOOD”?
LET’S ASSUME FAILURE. WHAT CAN BE DONE?
THIS REQUIRES SOME HUMAN INTERVENTION
41. RELEVANCY TUNING: THE MACHINE PARTS
FIXING THE MACHINE FIRST
- CHECK “WEIGHTS”:
- SIGNAL WEIGHT
- CTR WEIGHT
- IMPRESSION WEIGHT
- SYNONYM WEIGHT
- INTERNAL LINK WEIGHT
- SERP IMPRESSIONS
42. RELEVANCY TUNING: THE METADATA PARTS
TUNING THE UNDERLYING STRUCTURE OF THE DATA
- CHECK “WEIGHTS”:
- TITLE, DESCRIPTION & KEYWORD METADATA
- STRUCTURED MARKUP (SCHEMA)
- PAGE COMPONENTS: ABSTRACT, CONCLUSION, COMMENTS
- ENTITIES / TAXONOMIES / ONTOLOGIES
43. RELEVANCY TUNING: THE CONTENT PARTS
PRACTICAL EXAMPLE OF WHY CONTENT IS KING
- CHECK CONTENT:
- IS YOUR CONTENT DESCRIPTIVE?
- DOES IT HAVE DIVERSE LANGUAGE/WORDS?
- IS IT ORGANIZED?
- ARE THERE DIFFERENT CONTENT TYPES?
THIS IS “GOOD SEO FOR CONTENT”
44. TUNING THE MACHINE FOR IMPROVEMENT
- TRAINING DATA: CONTROL GROUP
- EVALUATION DATA: EXPERIMENTAL GROUP
- USE CTR AS A SUCCESS SIGNAL
- DETERMINE SIGNIFICANCE
- CONFIRM WITH METRICS
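Determining significance between the control and experimental CTRs can be done with a two-proportion z-test; a minimal sketch (the click and impression counts below are made up for illustration):

```python
import math

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test on CTRs; returns (z statistic, two-sided p-value)."""
    p_a, p_b = clicks_a / views_a, clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)  # pooled CTR under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# control (old ranking) vs. experiment (tuned relevancy) -- hypothetical counts
z, p = two_proportion_z(1200, 10000, 1350, 10000)
print(round(z, 2), round(p, 4))
```

If p falls below your chosen threshold (e.g. 0.05), the CTR lift is unlikely to be noise, and the “confirm with metrics” step can proceed with downstream conversion data.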
45. PUTTING IT ALL TOGETHER
CONTENT QUALITY
- WRITE GREAT, DESCRIPTIVE CONTENT
METADATA & IA
- DON’T IGNORE YOUR METADATA, SCHEMA & STRUCTURE
MACHINE RELEVANCY
- THE MACHINE WILL ATTEMPT TO UNDERSTAND, BUT TUNING REQUIRES HUMANS
MEASUREMENT
- DEFINE WHAT SUCCESS IS, SEGMENT & MEASURE CTR
MACHINE LEARNING & INTENTION DETECTION IS A BALANCING ACT
46. THANK YOU SO MUCH
PLEASE FEEL FREE TO TALK TO ME.
BECAUSE NO ONE DOES THIS ALONE… THANK YOU:
- JASON BARNARD
- MANIKANDAN SIVANESAN
- JIM SCARBOROUGH
- DAWN ANDERSON
- MARIANNE SWEENY
- TREY GRAINGER
- CHARLIE HULL
- JAIMIE ALBERICO
- HAMLET BATISTA
- BRITNEY MULLER
- MARTHA VAN BERKEL
- GRANT INGERSOLL
- JR OAKES
- MICHAEL KING
- MARK TRAPHAGEN
- BARRY ADAMS
- JENN HOFFMAN
- JENNY HALASZ