7. Mobile shift
                       Desktop  Tablet  Mobile
Avg. words per query   2.73     2.88    3.05
Avg. chars per query   17.44    18.02   18.93
Song, Ma, Wang, Wang. Exploring and exploiting user search behavior on mobile and tablet devices to improve search relevance. WWW 2013
Mobile query categories are less skewed (Image 42%, Adult 23.5%, Navigational 15%) than
desktop ones (Navigational 37%, Image 19.9%, Commerce 7.7%)
The top domains also differ:
Mobile Desktop
youtube.com facebook.com
wikipedia.org yahoo.com
answers.yahoo.com wikipedia.org
ehow.com youtube.com
imdb.com walmart.com
8. Q&A in search engines?
Fagni, Perego, Silvestri, Orlando. Caching and prefetching query results by exploiting historical usage data. TOIS 2006
9. The web search perspective
Web search today is really fast, without necessarily being intelligent
› A search engine without any understanding
Trends
› Convergence of search and online media
• End of the 10 blue links
› Personal, social search
• Search over my world
• Search using my profile
› New interfaces
• Contextual, interactive
› Search that anticipates
› Solve tasks not queries
10. Search is really fast, without necessarily being intelligent
Could Watson explain why the answer is Toronto?
11. We came to bury the 10 blue links
8/31/2015
Meaningless query
12. We came to bury the 10 blue links
Meaningful query
15. Search that anticipates
Google Now
Star Trek computer
• Jason Douglas: Structured Data at Google, SemTechBiz SF 2013
16. Interactive Voice Search
Apple’s Siri
› Question-Answering
• Variety of backend sources including Wolfram Alpha and various Yahoo! services
› Task completion
• E.g. schedule an event
Google Now
Facebook’s M
19. Web search by 2009
Large classes of queries are solved to perfection
Improvements in web search are harder and harder to come by
› Relevance models, hyperlink structure and interaction data
› Combination of features using machine learning
› Heavy investment in computational power
• real-time indexing, instant search, datacenters and edge services
Search ranking features
› Text matching (including anchor text)
› Page authority (Pagerank)
› User behavior signals
› Other features: context, history (still not very well understood)
20. Language issues
› Multiple interpretations
• jaguar
• paris hilton
› Secondary meaning
• george bush (and I mean the beer brewer in Arizona)
› Subjectivity
• reliable digital camera
• paris hilton sexy
› Imprecise or overly precise searches
• jim hendler
Complex needs
› Missing information
• brad pitt zombie
• florida man with 115 guns
• 35 year old computer scientist living in barcelona
› Category queries
• countries in africa
• barcelona nightlife
› Transactional or computational queries
• 120 dollars in euros
• digital camera under 300 dollars
• world temperature in 2020
Poorly solved information needs remain
Many of these queries would not be asked by users, who have learned over time what search technology can and cannot do.
21. Semantic Search: a definition
Semantic search is a retrieval paradigm where
› User intent and resources are represented using semantic models
• Not just symbolic representations
› Semantic models are exploited in the matching and ranking of resources
Often a hybrid of document and data retrieval
› Documents with metadata
• Metadata may be embedded inside the document
• I’m looking for documents that mention countries in Africa.
› Data retrieval
• Structured data, but searchable text fields
• I’m looking for directors, who have directed movies where the synopsis mentions dinosaurs.
Wide range of semantic search systems
› Employ different semantic models, possibly at different steps of the search process and in order to support different tasks
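The "directors whose movies mention dinosaurs" example can be sketched as a lookup over structured records that carry a searchable text field. This is a minimal illustration, not any engine's implementation: the `movies` records, the `tokenize` helper and the function name are assumptions, and matching is by exact token only.

```python
import re

# Toy structured records with a free-text field (illustrative data only).
movies = [
    {"title": "Jurassic Park", "director": "Steven Spielberg",
     "synopsis": "A theme park populated by cloned dinosaurs breaks down."},
    {"title": "The Land Before Time", "director": "Don Bluth",
     "synopsis": "Young dinosaurs search for the Great Valley."},
    {"title": "Fight Club", "director": "David Fincher",
     "synopsis": "An office worker starts an underground fight club."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def directors_with_synopsis_term(records, term):
    """Data retrieval (return directors) driven by text matching (synopsis)."""
    term = term.lower()
    return sorted({m["director"] for m in records
                   if term in tokenize(m["synopsis"])})


print(directors_with_synopsis_term(movies, "dinosaurs"))
```

The answer is a set of entities (directors), while the evidence comes from text: the hybrid of document and data retrieval described above.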
22. Semantic Search – a process view
› Query Construction
• Keywords
• Forms
• NL
• Formal language
› Query Processing
• IR-style matching & ranking
• DB-style precise matching
• KB-style matching & inferences
› Result Presentation
• Query visualization
• Document and data presentation
• Summarization
› Query Refinement
• Implicit feedback
• Explicit feedback
• Incentives
Document Representation
Knowledge Representation
Semantic Models
Resources
Documents
23. Yahoo’s Knowledge Graph
[Figure: example fragment of the knowledge graph]
› Entities: Chicago Cubs, Chicago, Barack Obama, Carlos Zambrano, Brad Pitt, Angelina Jolie, Steven Soderbergh, George Clooney, Ocean’s Twelve, E/R, Fight Club, Dust Brothers
› Relations: plays for, plays in, lives in, partner, directs, casts in, takes place in, music by, 10% off tickets for
Nicolas Torzec: Making knowledge reusable at Yahoo!: a Look at the Yahoo! Knowledge Base (SemTech 2013)
24. The role of Information Extraction in Semantic Search
Making sense of
› Content
• Web, News, Twitter, email, etc.
› User behavior
• Not just queries, also interaction
› NER, NEC, NEL, Time expressions, topic, event and relation extraction
Mapping to an abstract representation
› Linguistic models
• Taxonomies, thesauri, dictionaries of entity names
• Natural language structures extracted from text, e.g. using dependency parsing
• Inference along linguistic relations, e.g. broader/narrower terms, textual entailment
› Conceptual models
• Ontologies capture entities in the world and their relationships
• Words and phrases in text or records in a database are identified as representations of ontological elements
• Inference along ontological relations, e.g. logical entailment
27. Document processing
Goal
› Provide a higher level representation of information in some conceptual space
› Conceptual space is different for Semantic Web and NLP based search engines
Limited document understanding in traditional search
› Page structure such as fields, templates
› Understanding of anchors, other HTML elements
› Limited NLP
In Semantic Search, more advanced text processing and/or reliance on
explicit metadata
› Information sources are not only text but also databases and web services
28. Example: microformats and RDFa
Microformat (hCard)
<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
RDFa
<p typeof="contact:Info" about="http://example.org/staff/jo">
  <span property="contact:fn">Jo Smith</span>.
  <span property="contact:title">Web hacker</span> at
  <a rel="contact:org" href="http://example.org">Example.org</a>.
  You can contact me <a rel="contact:email"
    href="mailto:jo@example.org">via email</a>.
</p> ...
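To show how a consumer might read the hCard markup above, here is a minimal sketch using Python's standard `html.parser`. The `HCardParser` class and the `HCARD_PROPS` set are hypothetical names; the sketch handles only the four properties in the example and assumes the markup has no void elements such as `<br>`, so it is not a full microformats parser.

```python
from html.parser import HTMLParser

HCARD_HTML = """
<div class="vcard">
  <a class="email fn" href="mailto:jfriday@host.com">Joe Friday</a>
  <div class="tel">+1-919-555-7878</div>
  <div class="title">Area Administrator, Assistant</div>
</div>
"""

# Assumption: only these hCard properties are of interest.
HCARD_PROPS = {"fn", "email", "tel", "title"}


class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # per open element: hCard properties filled from its text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        props = sorted(set((attrs.get("class") or "").split()) & HCARD_PROPS)
        href = attrs.get("href") or ""
        # For email, the value lives in the mailto: link, not in the text.
        if "email" in props and href.startswith("mailto:"):
            self.card["email"] = href[len("mailto:"):]
            props.remove("email")
        self._stack.append(props)

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            for prop in self._stack[-1]:
                self.card[prop] = text


parser = HCardParser()
parser.feed(HCARD_HTML)
card = parser.card
print(card)
```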
29. schema.org
Agreement on a shared set of schemas for common types of web content
› Bing, Google, and Yahoo! as initial founders (June, 2011)
• Yandex joins schema.org in Nov, 2011
› Similar in intent to sitemaps.org
• Use a single format to communicate the same information to all three search engines
schema.org covers areas of interest to all search engines
› Business listings (local), creative works (video), recipes, reviews and more
› Microdata, RDFa, JSON-LD syntax
Collaborative effort
› Growing number of 3rd party contributions
› schema.org discussions at public-vocabs@w3.org
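As an illustration of the JSON-LD syntax, the sketch below builds a schema.org `Recipe` description in Python and wraps it in the script element that pages typically embed. The recipe values and the `to_jsonld_script` helper are made up for the example; the type and property names (`Recipe`, `recipeIngredient`, `aggregateRating`) are taken from the schema.org vocabulary.

```python
import json

# Illustrative data; only the vocabulary terms come from schema.org.
recipe = {
    "@context": "https://schema.org",
    "@type": "Recipe",
    "name": "Banana bread",
    "author": {"@type": "Person", "name": "Jo Smith"},
    "recipeIngredient": ["3 bananas", "2 cups flour"],
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": 4.5,
        "reviewCount": 12,
    },
}


def to_jsonld_script(data):
    """Serialize a dict as a JSON-LD block ready to embed in an HTML page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


snippet = to_jsonld_script(recipe)
print(snippet)
```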
30. Summary
If we want to…
› Answer queries, not just show links
› Personalize search
› Take context into account
› Anticipate user needs
… we need to understand users, content and the world at large!
Search engines have changed considerably
› Queries have changed
• Users seek more information
• Vertical search (travel, local, images, videos, news, etc.)
• Will move towards a more task-oriented scenario (mobile context shift)
Semantics help tail queries
› Head queries solved mostly by clickthrough data
35. Search over graph data
Unstructured or hybrid search over RDF/graph data
› Supporting end-users
• Users who cannot express their need in SPARQL
› Dealing with large-scale data
• Giving up query expressivity for scale
› Dealing with heterogeneity
• Users who are unaware of the schema of the data
• No single schema to the data
– Example: 2.6m classes and 33k properties in Billion Triples 2009
Entity search
› Queries where the user is looking for a single entity named or described in the query
› e.g. kaz vaporizer, hospice of cincinnati, mst3000
Elbassuoni, Blanco. Keyword Search over RDF graphs. CIKM 2011
Blanco, Mika, Vigna. Effective and Efficient entity search in RDF data. ISWC 2011
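A very small sketch of keyword search over graph data follows: each subject is treated as a bag of words drawn from its literal values, so no schema knowledge or SPARQL is needed. The triples, the `ex:` identifiers and the count-based scoring are assumptions for illustration and do not reproduce the ranking models of the cited papers.

```python
import re
from collections import defaultdict

# Toy triples (subject, predicate, object); identifiers are hypothetical.
triples = [
    ("ex:Moneyball", "rdfs:label", "Moneyball"),
    ("ex:Moneyball", "rdf:type", "Movie"),
    ("ex:Moneyball", "ex:starring", "Brad Pitt"),
    ("ex:Brad_Pitt", "rdfs:label", "Brad Pitt"),
    ("ex:Brad_Pitt", "rdf:type", "Actor"),
    ("ex:Oakland_Athletics", "rdfs:label", "Oakland Athletics"),
    ("ex:Oakland_Athletics", "rdf:type", "Sports team"),
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(triples):
    """Inverted index: term -> subjects whose values contain the term."""
    index = defaultdict(set)
    for subject, _predicate, obj in triples:
        for term in tokenize(obj):
            index[term].add(subject)
    return index


def search(index, query):
    """Rank subjects by the number of distinct query terms they match."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for subject in index.get(term, ()):
            scores[subject] += 1
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index(triples)
print(search(index, "brad pitt movie"))
```

Ignoring predicates trades query expressivity for simplicity, which mirrors the "giving up query expressivity for scale" point above.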
36. Entity-seeking queries make up 40-50% of the query volume
› Jeffrey Pound, Peter Mika, Hugo Zaragoza: Ad-hoc object retrieval in the web of data. WWW 2010
› Thomas Lin, Patrick Pantel, Michael Gamon, Anitha Kannan, Ariel Fuxman: Active objects: actions for entity-centric search. WWW 2012
› Show a summary of the most likely information needs
› Including related entities for navigation
› Roi Blanco, Berkant Barla Cambazoglu, Peter Mika, Nicolas Torzec: Entity Recommendations in Web Search. ISWC 2013
Application: entity displays in web search
37. Semantic understanding of queries
Entities play an important role
› [Pound et al, WWW 2010], [Lin et al WWW 2012]
› ~70% of queries contain a named entity (entity mention queries)
• brad pitt height
› ~50% of queries have an entity focus (entity seeking queries)
• brad pitt attacked by fans
› ~10% of queries are looking for a class of entities
• brad pitt movies
Entity mention query = <entity> {+ <intent>}
› Intent is typically an additional word or phrase to
• Disambiguate, most often by type e.g. brad pitt actor
• Specify action or aspect e.g. brad pitt net worth, toy story trailer
38. Patterns in logs are hard to see
Sample of sessions from June, 2011 containing the term “moneyball”
oakland as bradd pitt movie moneyball movies.yahoo.com oakland as wikipedia.org
captain america movies.yahoo.com moneyball trailer movies.yahoo.com
money moneyball movies.yahoo.com
moneyball movies.yahoo.com movies.yahoo.com en.wikipedia.org movies.yahoo.com peter brand
peter brand oakland nymag.com moneyball the movie www.imdb.com
moneyball trailer movies.yahoo.com moneyball trailer
brad pitt brad pitt moneyball brad pitt moneyball movie brad pitt moneyball brad pitt moneyball oscar
www.imdb.com
relay for life calvert ocunty www.relayforlife.org trailer for moneyball movies.yahoo.com
moneyball.movie-trailer.com
moneyball en.wikipedia.org movies.yahoo.com map of africa www.africaguide.com
money ball movie www.imdb.com money ball movie trailer moneyball.movie-trailer.com
brad pitt new www.zimbio.com www.usaweekend.com www.ivillage.com www.ivillage.com brad pitt
news news.search.yahoo.com moneyball trailer moneyball trailer www.imdb.com www.imdb.com
› What are users trying to do?
39. Semantic annotations help to generalize…
Sample session: oakland as bradd pitt movie moneyball trailer movies.yahoo.com oakland as wikipedia.org
Annotation labels: Sports team, Movie, Actor
40. … and understand user needs
Query: moneyball trailer
› moneyball: Movie (object of the query)
› trailer: what the user wants to do with it
41. Semantic analysis of query logs
Multiple approaches
› Dictionary tagging
• Match entities in a fixed dictionary
• Scalable, high recall, not very precise
› Entity retrieval
• Retrieve from an index of the knowledge base
› Post-retrieval methods
• Annotate a document corpus with entities
• Retrieve documents and aggregate annotations
Applications
› Usage mining
• L. Hollink, P. Mika and R. Blanco. Web Usage Mining with Semantic Analysis. WWW 2013
› Related-entity recommendations
• R. Blanco, B. Cambazoglu, P. Mika, N. Torzec: Entity Recommendations in Web Search. ISWC 2013
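Dictionary tagging, the first approach listed above, can be sketched as a greedy longest-match lookup against a fixed dictionary. The dictionary entries, the `tag_query` function and the `max_len` parameter are assumptions for illustration; a production tagger would also have to deal with misspellings and ambiguous names, which is where the precision problems come from.

```python
# Hypothetical fixed dictionary: surface form -> entity type.
entity_dictionary = {
    "moneyball": "Movie",
    "brad pitt": "Actor",
    "oakland as": "Sports team",
}


def tag_query(query, dictionary, max_len=3):
    """Greedy longest-match tagging of a query against a fixed dictionary."""
    tokens = query.lower().split()
    annotations = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                annotations.append((phrase, dictionary[phrase]))
                i += n
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return annotations


print(tag_query("brad pitt moneyball trailer", entity_dictionary))
```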
42. Usage mining
Site owners would like to find usage patterns
› Reducing abandonment
› Competitive analysis
Problem: patterns are lost in the data
› 64% of queries are unique within a year
› Even the most frequent patterns have low support
43. Solving the sparseness problem through annotations
Frequent patterns of annotations are more general and less noisy
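The effect can be illustrated with a toy count: raw queries are each seen once, but replacing entity mentions by their type reveals a shared pattern. The query list, the `entity_types` mapping and the naive substring replacement in `generalize` are assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical annotations: entity mention -> type label.
entity_types = {
    "moneyball": "<Movie>",
    "captain america": "<Movie>",
    "toy story": "<Movie>",
}

queries = [
    "moneyball trailer",
    "captain america trailer",
    "toy story trailer",
    "moneyball cast",
]


def generalize(query):
    """Replace the first known entity mention with its type label."""
    for name, label in entity_types.items():
        if name in query:
            return query.replace(name, label)
    return query


raw_support = Counter(queries)
pattern_support = Counter(generalize(q) for q in queries)

print(raw_support.most_common(1))      # every raw query occurs once
print(pattern_support.most_common(1))  # the annotated pattern has higher support
```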
44. Two matching approaches
Match by keywords
› Closer to text retrieval
• Match individual keywords
• Score and aggregate
• https://github.com/yahoo/Glimmer/
› Example: brad → (actor) (boxer) (city); pitt → (actor) (boxer) (lake)
Match by aliases
› Closer to entity linking
• Find potential mentions of entities (spots) in query
• Score candidates for each spot
› Example: brad pitt → (actor) (boxer)
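The difference between the two approaches can be sketched on the query "brad pitt". The tiny `entities` table and both functions are hypothetical; real systems score candidates with retrieval or linking models rather than simple counts.

```python
from collections import Counter

# Hypothetical mini knowledge base: entity -> known aliases.
entities = {
    "Brad Pitt (actor)": ["brad pitt"],
    "Brad Pitt (boxer)": ["brad pitt"],
    "Brad (city)": ["brad"],
    "Pitt (lake)": ["pitt"],
}


def match_by_keywords(query):
    """Text-retrieval style: match each keyword separately, then aggregate."""
    scores = Counter()
    for keyword in query.lower().split():
        for entity, aliases in entities.items():
            if any(keyword in alias.split() for alias in aliases):
                scores[entity] += 1
    return scores


def match_by_aliases(query):
    """Entity-linking style: find spots (spans equal to an alias) and their candidates."""
    tokens = query.lower().split()
    spots = {}
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            span = " ".join(tokens[i:j])
            candidates = sorted(e for e, aliases in entities.items() if span in aliases)
            if candidates:
                spots[span] = candidates
    return spots


print(match_by_keywords("brad pitt"))
print(match_by_aliases("brad pitt"))
```

Keyword matching retrieves the city and the lake with a lower aggregate score, while alias matching lists candidates per spot and leaves the choice to a later scoring step.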
45. … back to query understanding
Query: moneyball trailer
› moneyball: Movie (object of the query)
› trailer: what the user wants to do with it
46. Fast Entity Linking in Queries
Use aliases to “entity pages” (Wikipedia, IMDB, local, etc.) as a source of information for entity-query aliases
Chunk the query into the most likely segmentation
Be fast by avoiding entity-to-entity decisions when scoring
Add context externally using semantic relatedness of keywords and entities
Compression:
› Minimal perfect hashes + Golomb coding
› All of Wikipedia + aliases from 1 year of query logs + a word2vec (w2v) model from 1 year of query sessions: < 3GB
Blanco, Ottaviano, Meij. Fast and space-efficient entity linking for queries. WSDM 2015
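For readers unfamiliar with Golomb coding, here is a textbook sketch (unary quotient plus truncated-binary remainder) operating on bit strings. It is not the compression code of the system described above; the function names and the choice of the parameter `m` are assumptions for illustration.

```python
def golomb_encode(n: int, m: int) -> str:
    """Encode a non-negative integer n with Golomb parameter m as a bit string."""
    if n < 0 or m < 1:
        raise ValueError("need n >= 0 and m >= 1")
    q, r = divmod(n, m)
    bits = "1" * q + "0"          # quotient in unary, terminated by a 0
    b = (m - 1).bit_length()      # ceil(log2(m)); 0 when m == 1
    if b == 0:
        return bits               # m == 1: the remainder is always 0
    cutoff = (1 << b) - m         # remainders below cutoff use b - 1 bits
    if r < cutoff:
        bits += format(r, "b").zfill(b - 1)
    else:
        bits += format(r + cutoff, "b").zfill(b)
    return bits


def golomb_decode(bits: str, m: int) -> int:
    """Decode a single Golomb codeword produced by golomb_encode."""
    q = 0
    while bits[q] == "1":
        q += 1
    pos = q + 1                   # skip the terminating 0
    b = (m - 1).bit_length()
    if b == 0:
        return q
    cutoff = (1 << b) - m
    r = int(bits[pos:pos + b - 1], 2) if b > 1 else 0
    if r >= cutoff:               # long form: read b bits and undo the offset
        r = int(bits[pos:pos + b], 2) - cutoff
    return q * m + r


print(golomb_encode(7, 3), golomb_decode(golomb_encode(7, 3), 3))
```

Small values get short codewords, which is why this family of codes suits skewed integer distributions.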
47. Problem definition
Given
› Query $q$ consisting of an ordered list of tokens $t_i$
› Segment $s$ from a segmentation $\mathbf{s}$, taken from all possible segmentations $S_q$
› Entity $e$ from a set of candidate entities $\mathbf{e}$, taken from the complete set $E$
Find
› For all possible segmentations and candidate entities
› Select best entity for segment independently of other segments
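The problem statement can be sketched as follows: enumerate every segmentation of the query, pick the best entity for each segment independently, and keep the best-scoring segmentation. The score table, the `UNLINKED` constant and the use of a sum of log scores to compare segmentations are assumptions made for this sketch, not the scoring function of the paper.

```python
import math

# Hypothetical log-scores for (segment, entity) pairs.
segment_scores = {
    "brad pitt": {"Brad Pitt (actor)": math.log(0.9), "Brad Pitt (boxer)": math.log(0.1)},
    "brad": {"Brad (city)": math.log(0.2)},
    "pitt": {"Pitt (lake)": math.log(0.1)},
    "moneyball": {"Moneyball (film)": math.log(0.8)},
}
UNLINKED = math.log(0.05)  # assumed score of a segment with no candidate entity


def segmentations(tokens):
    """Yield every split of tokens into contiguous segments (2^(n-1) for n tokens)."""
    if not tokens:
        yield []
        return
    for i in range(1, len(tokens) + 1):
        head = " ".join(tokens[:i])
        for rest in segmentations(tokens[i:]):
            yield [head] + rest


def best_entity(segment):
    """Choose the entity for one segment without looking at other segments."""
    candidates = segment_scores.get(segment)
    if not candidates:
        return None, UNLINKED
    entity = max(candidates, key=candidates.get)
    return entity, candidates[entity]


def link(query):
    best_total, best_links = None, []
    for segs in segmentations(query.lower().split()):
        linked = [best_entity(s) for s in segs]
        total = sum(score for _, score in linked)
        if best_total is None or total > best_total:
            best_total = total
            best_links = [(s, e) for s, (e, _) in zip(segs, linked)]
    return best_links


print(link("brad pitt moneyball"))
```

Because each segment is resolved independently, no entity-to-entity comparisons are needed at scoring time.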
48. Intuitions
Keyphraseness
› How likely is a segment to be an entity mention?
› e.g. how common is “in” (unlinked) vs. “in” (linked) in the text
Commonness
› How likely is it that a linked segment refers to a particular entity?
› e.g. how often does “brad pitt” refer to Brad Pitt (actor) vs. Brad Pitt (boxer)
Assume: also given annotated collections $c_i$ with segments of text linked to entities from $E$.
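Both intuitions reduce to ratios of counts over an annotated collection. The sketch below uses the common count-based definitions: the linked share of a segment's occurrences, and an entity's share of the segment's linked occurrences. The toy `occurrences` list is invented for illustration, and the paper's exact estimators may differ.

```python
# Toy annotated collection: (segment text, linked entity or None if unlinked).
occurrences = [
    ("brad pitt", "Brad Pitt (actor)"),
    ("brad pitt", "Brad Pitt (actor)"),
    ("brad pitt", "Brad Pitt (actor)"),
    ("brad pitt", "Brad Pitt (boxer)"),
    ("brad pitt", None),
    ("in", None),
    ("in", None),
    ("in", None),
    ("in", "Indiana"),
]


def keyphraseness(segment):
    """Fraction of the segment's occurrences that are linked to some entity."""
    total = sum(1 for s, _ in occurrences if s == segment)
    linked = sum(1 for s, e in occurrences if s == segment and e is not None)
    return linked / total if total else 0.0


def commonness(entity, segment):
    """Among linked occurrences of the segment, the fraction pointing to entity."""
    linked = [e for s, e in occurrences if s == segment and e is not None]
    return linked.count(entity) / len(linked) if linked else 0.0


print(keyphraseness("brad pitt"), keyphraseness("in"))
print(commonness("Brad Pitt (actor)", "brad pitt"))
```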
50. Context-aware extension
[Equations not recovered; annotations from the slide:]
› Estimated by word2vec representation
› Segment and query are assumed independent of each other given the entity
› Segment and query are assumed independent of each other
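The slide's equations did not survive extraction. As a sketch only, Bayes' rule combined with the two independence assumptions stated above gives the following factorization, where $e$ is an entity, $s$ a segment and $q$ the query. Which factor the word2vec representation estimates is not recoverable from the text; a plausible reading is the query-dependent term.

```latex
\begin{align}
P(e \mid s, q) &= \frac{P(e)\,P(s, q \mid e)}{P(s, q)} \\
               &= \frac{P(e)\,P(s \mid e)\,P(q \mid e)}{P(s)\,P(q)} \\
               &= P(e \mid s)\,\frac{P(q \mid e)}{P(q)}
\end{align}
```

The second line applies both assumptions (numerator: independence given the entity; denominator: unconditional independence), and the third uses $P(e)\,P(s \mid e)/P(s) = P(e \mid s)$, so context enters as a separate multiplicative factor on top of the context-free score.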
51. Results: effectiveness
Significant improvement over external baselines and internal system
› Measured on public Webscope dataset Yahoo Search Query Log to Entities
[Results table not recovered; systems compared:]
› Search over Bing, top Wikipedia result
› State-of-the-art in literature
› A trivial search engine over Wikipedia
› Our method: Fast Entity Linker (FEL)
› FEL + context
52. Results: efficiency
Two orders of magnitude faster than state-of-the-art
› Simplifying assumptions at scoring time
› Adding context independently
› Dynamic pruning
Small memory footprint
› Compression techniques, e.g. 10x reduction in word2vec storage
54. Mobile search challenges and opportunities
Interaction
› Question-answering
› Support for interactive retrieval
› Spoken-language access
› Task completion
Contextualization
› Personalization
› Geo
› Context (work/home/travel)
• Try getaviate.com
55. Task completion
We would like to help our users in task completion
› But we have trained our users to talk in nouns
• Retrieval performance decreases by adding verbs to queries
› We need to understand what the available actions are
Modeling actions
› Understand what actions can be taken on a page
› Help users in mapping their query to potential actions
› Applications in web search, email etc.
Schema.org v1.2, including Actions, published April 16, 2014
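As a sketch of what modeling actions can look like, the example below attaches a schema.org `potentialAction` to a thing and lists the actions a client could offer. The movie data, the target URL and the `available_actions` helper are hypothetical; `potentialAction`, `WatchAction` and `target` are terms from the schema.org Actions vocabulary.

```python
import json

# Illustrative description of a thing and an action that can be taken on it.
movie = {
    "@context": "https://schema.org",
    "@type": "Movie",
    "name": "Moneyball",
    "potentialAction": {
        "@type": "WatchAction",
        "target": "https://example.org/watch/moneyball",  # placeholder URL
    },
}


def available_actions(thing):
    """Return (action type, target) pairs declared on a JSON-LD object."""
    actions = thing.get("potentialAction", [])
    if isinstance(actions, dict):  # a single action may be given without a list
        actions = [actions]
    return [(a["@type"], a.get("target")) for a in actions]


print(json.dumps(movie, indent=2))
print(available_actions(movie))
```

A query such as "moneyball trailer" could then be mapped to one of the actions declared for the matched entity.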
56. The end
Many thanks to the Semantic Search team in London
› Peter Mika
› Edgar Meij
› Hugues Bouchard
Joint work with many collaborators: Sebastiano Vigna, Laura Hollink, Giuseppe Ottaviano, Nicolas Torzec, among others.
roi@yahoo-inc.com