Which Vertical Search Engines are Relevant?
Understanding Vertical Relevance Assessments for Web Queries

Ke Zhou¹, Ronan Cummins², Mounia Lalmas³, Joemon M. Jose¹
¹University of Glasgow
²University of Greenwich
³Yahoo! Labs Barcelona

WWW 2013, Rio de Janeiro
  
Motivation

Aggregated Search
• Diverse search verticals (image, video, news, etc.) are available on the web.
• Aggregating (embedding) vertical results into “general web” results has become the de facto standard in commercial web search engines.
[Figure: vertical search engines, general web search, vertical selection]
  
Evaluation of Aggregated Search
• Evaluation is based solely on vertical selection.
• Compare the system prediction set against the user annotation set.
• Annotations are gathered
– Explicitly (assessing)
– Implicitly (deriving from search logs)
[Figure: an assessor judges Systems A, B and C for the topic “yoga poses”, giving the ranking System C > System B > System A]
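The slide does not say which measure is used to compare the two sets. As a minimal sketch, assuming set-based precision, recall and F1 over vertical names (our assumption, not necessarily the authors' choice), the comparison for one query could be computed as follows; the vertical names in the example are hypothetical.

```python
from __future__ import annotations


def set_agreement(predicted: set[str], annotated: set[str]) -> dict[str, float]:
    """Compare a system's predicted vertical set with the user annotation set.

    Assumption: set-based precision/recall/F1; the slides do not name a metric.
    """
    hits = len(predicted & annotated)  # verticals chosen by both system and users
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(annotated) if annotated else 0.0
    if hits == 0:
        return {"precision": precision, "recall": recall, "f1": 0.0}
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical query "yoga poses": the system embeds three verticals, users wanted two.
print(set_agreement({"image", "video", "news"}, {"image", "video"}))
```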
  
Assessor: which vertical search engines are relevant?
• The definition of the relevance of a vertical, given a query, remains complex.
– Different works make different assumptions.
– The underlying assumptions may have a major effect on the evaluation of a SERP.
• We want to understand different vertical assessment processes and investigate their impact.
  
Problem and Previous Work

(RQ1) Assumptions: user perspective
• Pre-retrieval:
– Vertical Orientation: before issuing the query, the user thinks about which verticals might provide better results.
• Post-retrieval:
– After viewing the search results, the user considers which vertical provides better results.
• Influencing factors
– Vertical orientation (type preference)
– Within-vertical ranking
• Serendipity
– Visual attractiveness
[Figure: pre-retrieval user-need …… post-retrieval user perspective]
  
(RQ2) Assumptions: dependency of relevance
• Inter-dependent approach:
– The quality of verticals is relative, and verticals are dependent on each other.
• Web-anchor approach:
– The quality of the “general web” serves as a reference criterion for deciding relevance.
• Context
– Does the context (results returned from other verticals) affect a user’s perception of the relevance of the vertical of interest?
• Utility vs. Effort
[Figure: inter-dependent approach vs. web-anchor approach]
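The two designs imply different ways of turning judgements into per-vertical scores. The sketch below is one hypothetical aggregation, not the study's procedure: under the web-anchor design each judgement compares a vertical with the “general web” results, while under the inter-dependent design each judgement compares two verticals; in both cases a vertical's score is its share of wins.

```python
from __future__ import annotations

from collections import Counter


def web_anchor_scores(judgements: list[tuple[str, bool]]) -> dict[str, float]:
    """Score = share of judgements preferring the vertical over the web anchor.

    Each judgement is (vertical, preferred_over_web).
    """
    wins: Counter[str] = Counter()
    seen: Counter[str] = Counter()
    for vertical, preferred in judgements:
        seen[vertical] += 1
        wins[vertical] += int(preferred)
    return {v: wins[v] / seen[v] for v in seen}


def inter_dependent_scores(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Score = share of vertical-vs-vertical comparisons the vertical won.

    Each pair is (winner, loser).
    """
    wins: Counter[str] = Counter()
    seen: Counter[str] = Counter()
    for winner, loser in pairs:
        wins[winner] += 1
        seen[winner] += 1
        seen[loser] += 1
    return {v: wins[v] / seen[v] for v in seen}


# Hypothetical judgements for one search task.
print(web_anchor_scores([("image", True), ("image", False), ("news", True)]))
print(inter_dependent_scores([("image", "news"), ("image", "video"), ("news", "video")]))
```

One plausible reading of the utility vs. effort point: the web-anchor design needs one judgement per vertical per assessor, while an exhaustive pairwise design would need n(n-1)/2 comparisons, i.e. 55 pairs for 11 verticals.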
  
(RQ3) Assumptions: assessment grade
• Binary (pairwise) preference
• Multi-grade preference
• SERP (three possible slots)
– ToP: top of the page
– MoP: middle of the page
– BoP: bottom of the page
– NS: not shown
• Is the binary (pairwise) preference information provided by a population of users able to predict the “perfect” embedding position of a vertical?
[Figure: binary preference (ToP or NS) vs. multi-grade preference (ToP, MoP, BoP or NS), each shown on a SERP down to the end of the SERP]
  
Experimental Design

Experimental Design Overview
• Manipulation (Independent) Variables
– Search Tasks
– Verticals of Interest
– User Perspective (Study 1: RQ1)
– Dependency of Relevance (Study 2: RQ2)
– Assessment Grade (Study 3: RQ3)
• Dependent Variables
– Inter-assessor Agreement
• Measured by Fleiss’ Kappa (KF)
– Vertical Relevance Correlation
• Measured by Spearman Correlation
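Both dependent variables are standard statistics. The sketch below computes them from scratch; the input layout (one row per task-vertical pair, one column per assessment category, each cell counting how many assessors chose that category) is an assumption for illustration, not the study's data format.

```python
from __future__ import annotations

from statistics import mean


def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number of assessors.

    counts[i][j] = number of assessors who put item i into category j.
    Undefined (ZeroDivisionError) if every rating falls into one category.
    """
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of assessors")
    total = len(counts) * n_raters
    # Overall share of assignments falling into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    # Per-item agreement: share of assessor pairs that agree on the item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = mean(p_i)
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


def _average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho as the Pearson correlation of tie-averaged ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined (ZeroDivisionError) for constant input


# Hypothetical: 3 task-vertical pairs, 4 assessors, categories (relevant, not relevant).
print(fleiss_kappa([[4, 0], [3, 1], [1, 3]]))
# Hypothetical per-vertical relevance scores under two assessment conditions.
print(spearman([0.9, 0.4, 0.7, 0.1], [0.8, 0.5, 0.3, 0.2]))
```

If SciPy or statsmodels is available, `scipy.stats.spearmanr` and `statsmodels.stats.inter_rater.fleiss_kappa` provide library versions of the same statistics.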
  
Experiment Design Details
• Crowdsourcing Data Collection
– We hire crowdsourced workers on Amazon Mechanical Turk to make assessments.
• Verticals
– Cover 11 diverse verticals employed by three major commercial search engines.
– Use existing commercial vertical search engines.
• Search Tasks
– 44 tasks cover a variety of (vertical) intents.
– Come from an existing aggregated search collection (TREC).
• Quality Control
– 4 assessment points per manipulation
– Trap HITs (assessment page with results of other queries)
– Trap search tasks (assessment page with an explicit vertical request)
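The slides list trap HITs and trap search tasks but not how failing workers were handled. A minimal sketch, assuming each trap has one expected answer (for example NS for a page showing results of an unrelated query, or ToP for a task that explicitly requests a vertical) and that a single failed trap excludes all of a worker's judgements; both rules and the data layout are hypothetical.

```python
from __future__ import annotations


def filter_workers(
    judgements: list[tuple[str, str, str]], trap_answers: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Drop every judgement from workers who failed any trap, then drop the traps.

    judgements: (worker_id, item_id, label) tuples.
    trap_answers: trap item_id -> the label a careful worker should give.
    """
    failed = {
        worker
        for worker, item, label in judgements
        if item in trap_answers and label != trap_answers[item]
    }
    return [
        (worker, item, label)
        for worker, item, label in judgements
        if worker not in failed and item not in trap_answers
    ]


# Hypothetical data: "trap-1" shows results of an unrelated query, so NS is expected.
data = [
    ("w1", "trap-1", "NS"),
    ("w1", "q1-image", "ToP"),
    ("w2", "trap-1", "ToP"),  # w2 fails the trap
    ("w2", "q1-image", "NS"),
]
print(filter_workers(data, {"trap-1": "NS"}))
```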
  
Experimental Design: Study 1 (RQ1: User Perspective)
  
Experimental Results

Study 1 Results: pre-retrieval vs. post-retrieval
• Both pre-retrieval and post-retrieval inter-assessor agreements are moderate, and assessors find both similarly difficult to assess.
• Vertical relevance is moderately (but significantly) correlated (0.53) between pre-retrieval and post-retrieval.
• The highly relevant verticals derived from pre-retrieval and post-retrieval assessments overlap significantly.
– Almost 60% overlap on at least 2 out of the 3 top verticals
• There is a bias towards visually salient verticals in post-retrieval search utility.
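Assuming the overlap statistic is computed per search task (the slide does not spell out the procedure), it can be sketched as below: take the top three verticals under each assessment condition and count the tasks where at least two of them coincide. Task IDs, vertical names and scores are hypothetical; a score could be, for example, the share of assessors who judged a vertical relevant.

```python
from __future__ import annotations


def top_k(scores: dict[str, float], k: int = 3) -> set[str]:
    """The k highest-scoring verticals; ties broken alphabetically for determinism."""
    return set(sorted(scores, key=lambda v: (-scores[v], v))[:k])


def overlap_rate(
    cond_a: dict[str, dict[str, float]],
    cond_b: dict[str, dict[str, float]],
    k: int = 3,
    min_shared: int = 2,
) -> float:
    """Share of tasks whose top-k verticals under two conditions share >= min_shared."""
    tasks = cond_a.keys() & cond_b.keys()
    if not tasks:
        raise ValueError("no task is assessed under both conditions")
    hits = sum(
        1 for t in tasks if len(top_k(cond_a[t], k) & top_k(cond_b[t], k)) >= min_shared
    )
    return hits / len(tasks)


pre = {
    "t1": {"image": 0.9, "video": 0.8, "news": 0.7, "wiki": 0.1},
    "t2": {"news": 0.9, "blog": 0.8, "answer": 0.7, "image": 0.1},
}
post = {
    "t1": {"image": 0.9, "wiki": 0.8, "video": 0.7, "news": 0.2},
    "t2": {"image": 0.9, "video": 0.8, "answer": 0.7, "news": 0.1},
}
print(overlap_rate(pre, post))
```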
  
Study 1 Results: topical relevance vs. pre-retrieval orientation
[Figure: Orientation and Topical relevance (R/N labels; nDCG(vi) vs. nDCG(w)) against Post-retrieval search utility]
• Vertical relevance between pre-retrieval orientation and post-retrieval is moderately correlated (0.53).
• Vertical relevance between topical relevance and post-retrieval is weakly correlated (0.36).
• Pre-retrieval orientation has a greater impact on post-retrieval search utility than post-retrieval topical relevance does.
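The figure labels nDCG(vi) and nDCG(w) suggest that a vertical's topical relevance (R or N) was derived by comparing the nDCG of the vertical's results with the nDCG of the general web results. The exact rule is not recoverable from the slides, so the comparison below, including the cutoff k and linear gains, is an assumption for illustration.

```python
from __future__ import annotations

import math


def dcg(gains: list[float]) -> float:
    """Discounted cumulative gain with linear gains and a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains: list[float], k: int | None = None) -> float:
    """DCG of the first k results divided by the DCG of the ideal reordering."""
    cut = gains if k is None else gains[:k]
    ideal = sorted(gains, reverse=True)[: len(cut)]
    best = dcg(ideal)
    return dcg(cut) / best if best > 0 else 0.0


def topical_label(vertical_gains: list[float], web_gains: list[float], k: int = 5) -> str:
    """Hypothetical rule: 'R' if nDCG@k of the vertical is at least that of the web."""
    return "R" if ndcg(vertical_gains, k) >= ndcg(web_gains, k) else "N"


# Hypothetical graded judgements (0-3) for the top results of a vertical and of the web.
print(topical_label([3, 2, 2, 0], [1, 3, 0, 2]))
```

Because nDCG is normalized per result list, this rule compares ranking quality rather than absolute relevance; comparing unnormalized DCG would be an equally plausible reading.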
Experimental Design: Study 2 (RQ2: Dependency of Relevance)
  
Study 2 (Dependency of Relevance) Results
• Both inter-assessor agreements are moderate, and there is little difference in user agreement between the two approaches.
• The vertical relevance correlation between the inter-dependent and web-anchor approaches is moderate (0.573).
• The overlap of the top three relevant verticals between the two approaches is quite high.
– More than 70% overlap on 2 out of the 3 top verticals.
• The web-anchor approach provides a better trade-off between utility and effort.
  
Study 2 (Dependency of Relevance) Results
• Not much difference is observed when using different anchors (different observed topical relevance levels).
• Context matters
– The context of other verticals can diminish the utility of a vertical.
– Examples: (“Answer”, “Wiki”), (“Books”, “Scholar”), etc.
  
Experimental Design: Study 3 (RQ3: Assessment Grade)
  
Study 3 (Assessment Grade) Results
• Deriving the “perfect” embedding position from multi-graded assessments
• Thresholding for binary assessment (user type simulation)
– Risk-seeking
– Risk-medium
– Risk-averse
[Figure: vertical majority preference mapped to ToP, MoP, BoP or NS under risk-seeking, risk-medium and risk-averse thresholds (threshold values between 1.0 and 0.0)]
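The slides do not list the threshold values (the figure's tick values are not recoverable), so the sketch below uses illustrative thresholds and only shows the mechanism: the share of assessors who prefer a vertical over the web in binary (pairwise) judgements is mapped to one of the four slots. That a risk-seeking simulated user uses lower thresholds, and therefore embeds verticals higher on weaker evidence, is our assumption.

```python
from __future__ import annotations

# Illustrative lower bounds for (ToP, MoP, BoP); NOT the values used in the study.
# Assumption: a risk-seeking user embeds a vertical higher on weaker evidence.
THRESHOLDS: dict[str, tuple[float, float, float]] = {
    "risk-seeking": (0.50, 0.30, 0.10),
    "risk-medium": (0.70, 0.50, 0.30),
    "risk-averse": (0.90, 0.70, 0.50),
}


def embedding_slot(fraction_preferring: float, user_type: str) -> str:
    """Map the share of assessors preferring a vertical over the web to a SERP slot."""
    if not 0.0 <= fraction_preferring <= 1.0:
        raise ValueError("fraction_preferring must lie in [0, 1]")
    top, middle, bottom = THRESHOLDS[user_type]
    if fraction_preferring >= top:
        return "ToP"
    if fraction_preferring >= middle:
        return "MoP"
    if fraction_preferring >= bottom:
        return "BoP"
    return "NS"


# Hypothetical: 60% of assessors preferred the vertical over the web results.
for user_type in THRESHOLDS:
    print(user_type, embedding_slot(0.6, user_type))
```

The slots derived this way could then be compared with the slots derived from the multi-graded assessments, for example with a Spearman correlation over the slot order ToP > MoP > BoP > NS.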
Study 3 (Assessment Grade) Results
• Inter-assessor agreement is moderate, and different users have different risk levels.
• Vertical Relevance Correlation
– Most of the binary approaches correlate significantly with the multi-graded ground truth, although most correlations are modest.
– The risk-medium thresholding approach performs best.
  
Final take-away
• Study 1
– Assessing for aggregated search is difficult.
– Highly relevant verticals overlap significantly between the pre-retrieval and post-retrieval user perspectives.
– Vertical (type) orientation is more important than topical relevance.
• Study 2
– The anchor-based approach might be better than the inter-dependent approach with respect to the utility-effort trade-off.
– Context matters.
• Study 3
– The binary approach can be used to determine the “perfect” embedding position of the verticals, and it performs relatively well with relatively few assessments.
  
Conclusions
• We compared different vertical relevance assessment processes and analyzed their impact.
• Our work has implications regarding "how" and "what" evaluation design decisions affect the actual evaluation.
• This work also suggests that previous evaluation efforts in this area need to be re-interpreted.
  
Questions?
• Thanks!
  


Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries

  • 1. Which Vertical Search Engines are Relevant? Understanding Vertical Relevance Assessments for Web Queries. Ke Zhou¹, Ronan Cummins², Mounia Lalmas³, Joemon M. Jose¹. ¹University of Glasgow, ²University of Greenwich, ³Yahoo! Labs Barcelona. WWW 2013, Rio de Janeiro
  • 2–3. Aggregated Search (Motivation) • Diverse search verticals (image, video, news, etc.) are available on the web. • Aggregating (embedding) vertical results into “general web” results has become the de facto standard in commercial web search engines. [Diagram: vertical search engines, vertical selection, general web search.]
  • 4. Evaluation of Aggregated Search (Motivation)
      •  Evaluation is based solely on vertical selection.
      •  The system prediction set is compared against the user annotation set.
      •  Annotations are gathered:
          –  explicitly (assessing), or
          –  implicitly (derived from search logs).
      [Figure: an assessor annotates verticals for the topic “yoga poses”; Systems A, B and C are ranked System C > System B > System A]
  • 5–6. Assessor: which vertical search engines are relevant? (Motivation)
      •  The definition of the relevance of a vertical, given a query, remains complex.
          –  Different work makes different assumptions.
          –  The underlying assumptions may have a major effect on the evaluation of a SERP.
      •  We want to understand different vertical assessment processes and investigate their impact.
  • 7–10. (RQ1) Assumptions: user perspective (Problem and Previous Work)
      •  Pre-retrieval:
          –  Vertical orientation: before issuing the query, the user thinks about which verticals might provide better results.
      •  Post-retrieval:
          –  After viewing the search results, the user considers which vertical provides better results.
      •  Influencing factors:
          –  Vertical orientation (type preference)
          –  Within-vertical ranking
          –  Serendipity
          –  Visual attractiveness
      [Figure: pre-retrieval user need vs. post-retrieval user perspective]
  • 11–15. (RQ2) Assumptions: dependency of relevance (Problem and Previous Work)
      •  Inter-dependent approach:
          –  The quality of verticals is relative: each vertical is judged in relation to the others.
      •  Web-anchor approach:
          –  The quality of the “general web” results serves as the reference criterion for deciding relevance.
      •  Context:
          –  Does the context (results returned from other verticals) affect a user’s perception of the relevance of the vertical of interest?
      •  Utility vs. effort
      [Figure: inter-dependent approach vs. web-anchor approach]
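The utility-vs-effort contrast between the two approaches can be made concrete by counting judgments. The Python sketch below is a minimal illustration under stated assumptions: it assumes the inter-dependent approach compares every pair of candidates (verticals plus the web results), which may differ from the study's exact protocol, and it ranks verticals under the web-anchor approach by the fraction of assessors who preferred each one over the web. The function names and the vote data are hypothetical.

```python
from itertools import combinations


def pairs_web_anchor(verticals: list[str]) -> list[tuple[str, str]]:
    """Web-anchor: each vertical is judged once against the "general web" results."""
    return [(v, "web") for v in verticals]


def pairs_inter_dependent(verticals: list[str]) -> list[tuple[str, str]]:
    """Inter-dependent (assumed protocol): every pair of candidates, web included."""
    return list(combinations(verticals + ["web"], 2))


def rank_by_web_anchor(prefs: dict[str, list[bool]]) -> list[tuple[str, float]]:
    """Rank verticals by the fraction of assessors who preferred them over the web."""
    scores = {v: sum(votes) / len(votes) for v, votes in prefs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    verticals = [f"v{i}" for i in range(11)]  # 11 verticals, as in the study design
    print(len(pairs_web_anchor(verticals)))       # 11 judgments per query
    print(len(pairs_inter_dependent(verticals)))  # C(12, 2) = 66 judgments per query
    # Hypothetical votes from 4 assessors (True = vertical preferred over web).
    prefs = {"Image": [True, True, True, False], "News": [False, False, True, False]}
    print(rank_by_web_anchor(prefs))              # [('Image', 0.75), ('News', 0.25)]
```

Under this assumption the web-anchor approach grows linearly with the number of verticals, while the all-pairs variant grows quadratically, which is one way to read the effort side of the trade-off.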
  • 16–19. (RQ3) Assumptions: assessment grade (Problem and Previous Work)
      •  Binary (pairwise) preference: one possible slot on the SERP (ToP), otherwise not shown (NS).
      •  Multi-grade preference: three possible slots on the SERP, otherwise not shown.
          –  ToP: top of the page
          –  MoP: middle of the page
          –  BoP: bottom of the page
          –  NS: not shown
      •  Can binary (pairwise) preference information provided by a population of users predict the “perfect” embedding position of a vertical?
      [Figure: binary preference (ToP or NS) vs. multi-grade preference (ToP, MoP, BoP or NS) on a SERP]
  • 20–21. Experimental Design Overview (Experimental Design)
      •  Manipulation (independent) variables:
          –  Search tasks
          –  Verticals of interest
          –  User perspective (Study 1: RQ1)
          –  Dependency of relevance (Study 2: RQ2)
          –  Assessment grade (Study 3: RQ3)
      •  Dependent variables:
          –  Inter-assessor agreement, measured by Fleiss’ kappa (K_F)
          –  Vertical relevance correlation, measured by Spearman correlation
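Fleiss’ kappa is the standard chance-corrected agreement statistic when each item is judged by the same number of raters. The following is a minimal plain-Python sketch of the textbook formula, assuming each assessed item (for example, a query-vertical pair) receives the same number of crowd judgments; the category layout and example counts are hypothetical, not the study's data. In practice, statsmodels offers an equivalent fleiss_kappa in statsmodels.stats.inter_rater.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of assessors
    who put item i into category j. Every item must have the same number of
    assessors (at least 2)."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of assessors")
    n_cats = len(counts[0])

    # Observed agreement for each item, then averaged over items.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category.
    p_cats = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_cats)
    if p_e == 1:
        raise ValueError("kappa is undefined when every judgment is in one category")
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical: 4 query-vertical pairs, 4 assessors each,
    # categories [relevant, not relevant].
    table = [[4, 0], [0, 4], [2, 2], [3, 1]]
    print(round(fleiss_kappa(table), 4))  # 0.4074
```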
  • 22–25. Experiment Design Details (Experimental Design)
      •  Crowdsourced data collection:
          –  We hire crowdsourced workers on Amazon Mechanical Turk to make assessments.
      •  Verticals:
          –  A variety of 11 verticals, as employed by three major commercial search engines.
          –  Existing commercial vertical search engines are used.
      •  Search tasks:
          –  44 tasks covering a variety of (vertical) intents.
          –  Taken from an existing aggregated search collection (TREC).
      •  Quality control:
          –  4 assessment points per manipulation
          –  Trap HITs (assessment pages showing results of other queries)
          –  Trap search tasks (assessment pages with an explicit vertical request)
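Both kinds of trap listed above have a known correct answer, so they can be checked mechanically. A minimal sketch, assuming that a page of results for another query should never be judged relevant and that an explicitly requested vertical should always be; the TrapJudgment layout, the passing_workers helper and the zero-failure tolerance are hypothetical choices, not details reported in the study.

```python
from dataclasses import dataclass


@dataclass
class TrapJudgment:
    worker_id: str
    kind: str                # "trap_hit" or "trap_task" (hypothetical labels)
    judged_relevant: bool    # what the worker answered
    expected_relevant: bool  # the known correct answer for the trap


def passing_workers(judgments: list[TrapJudgment], max_failures: int = 0) -> set[str]:
    """Workers whose number of failed trap checks is at most max_failures."""
    failures: dict[str, int] = {}
    for j in judgments:
        failures.setdefault(j.worker_id, 0)
        if j.judged_relevant != j.expected_relevant:
            failures[j.worker_id] += 1
    return {w for w, n in failures.items() if n <= max_failures}


if __name__ == "__main__":
    judgments = [
        # Trap HIT: results of another query, so "relevant" is the wrong answer.
        TrapJudgment("w1", "trap_hit", judged_relevant=False, expected_relevant=False),
        TrapJudgment("w2", "trap_hit", judged_relevant=True, expected_relevant=False),
        # Trap task: the query explicitly asks for this vertical.
        TrapJudgment("w1", "trap_task", judged_relevant=True, expected_relevant=True),
        TrapJudgment("w2", "trap_task", judged_relevant=True, expected_relevant=True),
    ]
    print(passing_workers(judgments))  # {'w1'}
```

Judgments from workers outside the returned set would then be discarded before agreement and correlation are computed.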
  • 26. Experimental Design: Study 1 (Experimental Design)
      •  Same independent and dependent variables as in the overview; the variable manipulated in this study is user perspective (RQ1).
  • 27–30. Study 1 Results: pre-retrieval vs. post-retrieval (Experimental Results)
      •  Inter-assessor agreement is moderate for both pre-retrieval and post-retrieval assessment, and assessors find both similarly difficult.
      •  Vertical relevance is moderately (but significantly) correlated (0.53) between pre-retrieval and post-retrieval.
      •  Highly relevant verticals derived from pre-retrieval and post-retrieval assessments overlap significantly.
          –  Almost 60% overlap on at least 2 of the top 3 verticals.
      •  There is a bias towards visually salient verticals in post-retrieval search utility.
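Spearman’s rank correlation, the measure behind the 0.53 figure, is the Pearson correlation of ranks. A minimal plain-Python sketch, assuming each vertical's relevance under a perspective is summarised as a single score (for example, the fraction of assessors who judged it relevant); tied scores receive average ranks. The example scores are hypothetical, and scipy.stats.spearmanr computes the same statistic.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-vertical relevance scores for one query.
    pre = [0.9, 0.7, 0.4, 0.2, 0.1]
    post = [0.8, 0.3, 0.6, 0.1, 0.2]
    print(spearman(pre, post))  # 0.8
```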
  • 31–33. Study 1 Results: topical relevance vs. pre-retrieval orientation (Experimental Results)
      •  Vertical relevance from pre-retrieval orientation and from post-retrieval assessment is moderately correlated (0.53).
      •  Vertical relevance from topical relevance and from post-retrieval assessment is weakly correlated (0.36).
      •  Pre-retrieval orientation therefore has a greater impact on post-retrieval search utility than post-retrieval topical relevance does.
      [Figure: topical relevance (R/N judgments; nDCG(vi) vs. nDCG(w)) and orientation as inputs to post-retrieval search utility]
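The figure on these slides expresses topical relevance through nDCG of a vertical's results, nDCG(vi), and of the web results, nDCG(w). The slides do not spell out the exact rule, so the Python sketch below assumes one plausible reading: with binary R/N labels, a vertical counts as topically relevant when its nDCG@k is at least that of the web results. The cut-off k, the all-relevant ideal ranking used as a shared normaliser, and the helper names are hypothetical.

```python
import math


def ndcg_at_k(labels: list[int], k: int) -> float:
    """Binary nDCG@k (1 = R, 0 = N). The ideal ranking is assumed to be k
    relevant results, so vertical and web lists share the same normaliser."""
    dcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(labels[:k]))
    idcg = sum(1 / math.log2(rank + 2) for rank in range(k))
    return dcg / idcg


def topically_relevant(vertical_labels: list[int], web_labels: list[int], k: int = 3) -> bool:
    """Hypothetical rule: the vertical is topically relevant if nDCG(vi) >= nDCG(w)."""
    return ndcg_at_k(vertical_labels, k) >= ndcg_at_k(web_labels, k)


if __name__ == "__main__":
    vertical = [0, 1, 0]  # N R N
    web = [1, 1, 0]       # R R N
    print(round(ndcg_at_k(vertical, 3), 3), round(ndcg_at_k(web, 3), 3))  # 0.296 0.765
    print(topically_relevant(vertical, web))  # False
```

A shared normaliser is used so that a list with a single relevant result at rank 1 does not score the same as a list that is entirely relevant, which would happen if each list were normalised by its own ideal reordering.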
  • 34. Experimental Design: Study 2 (Experimental Design)
      •  Same independent and dependent variables as in the overview; the variable manipulated in this study is dependency of relevance (RQ2).
  • 35–38. Study 2 (Dependency of Relevance) Results (Experimental Results)
      •  Inter-assessor agreement is moderate for both approaches, with little difference between them.
      •  The vertical relevance correlation between the inter-dependent and web-anchor approaches is moderate (0.573).
      •  The overlap of the top three relevant verticals between the two approaches is quite high.
          –  More than 70% overlap on 2 of the top 3 verticals.
      •  The web-anchor approach provides a better trade-off between utility and effort.
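The top-three overlap figures (here and in Study 1) can be computed from per-task vertical scores. A minimal sketch, assuming the reported statistic is the fraction of search tasks for which the two approaches' top-3 verticals share at least 2 members; that interpretation, the alphabetical tie-breaking and the example scores are assumptions.

```python
def top_k(scores: dict[str, float], k: int = 3) -> set[str]:
    """The k highest-scoring verticals; ties are broken alphabetically."""
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return set(ranked[:k])


def overlap_rate(
    tasks_a: list[dict[str, float]],
    tasks_b: list[dict[str, float]],
    k: int = 3,
    min_shared: int = 2,
) -> float:
    """Fraction of tasks whose top-k verticals under A and B share >= min_shared."""
    if len(tasks_a) != len(tasks_b) or not tasks_a:
        raise ValueError("need the same, non-zero number of tasks for A and B")
    hits = sum(
        len(top_k(a, k) & top_k(b, k)) >= min_shared for a, b in zip(tasks_a, tasks_b)
    )
    return hits / len(tasks_a)


if __name__ == "__main__":
    # Hypothetical per-task vertical scores under the two assessment approaches.
    inter_dependent = [
        {"Image": 0.9, "Video": 0.8, "News": 0.7, "Wiki": 0.1},
        {"Books": 0.9, "Scholar": 0.8, "Wiki": 0.7, "Answer": 0.6, "Image": 0.1},
    ]
    web_anchor = [
        {"Image": 0.95, "Video": 0.2, "News": 0.6, "Wiki": 0.5},
        {"Books": 0.7, "Scholar": 0.05, "Wiki": 0.1, "Answer": 0.9, "Image": 0.8},
    ]
    print(overlap_rate(inter_dependent, web_anchor))  # 0.5
```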
  • 39–40. Study 2 (Dependency of Relevance) Results, continued (Experimental Results)
      •  Little difference is observed when using different anchors (anchors with different observed levels of topical relevance).
      •  Context matters:
          –  The context of other verticals can diminish the utility of a vertical.
          –  Examples: (“Answer”, “Wiki”), (“Books”, “Scholar”), etc.
  • 41. Experimental Design: Study 3 (Experimental Design)
      •  Same independent and dependent variables as in the overview; the variable manipulated in this study is assessment grade (RQ3).
  • 42. Study 3 (Assessment Grade) Results (Experimental Results)
      •  Deriving the “perfect” embedding position from multi-graded assessments.
      •  Thresholding binary assessments (simulating user types):
          –  Risk-seeking
          –  Risk-medium
          –  Risk-averse
      [Figure: each vertical’s majority preference (from 1.0 down to 0.0) mapped to ToP, MoP, BoP or NS under risk-seeking, risk-medium and risk-averse thresholds]
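The thresholding idea can be sketched as follows: the fraction of assessors who preferred a vertical over the web results (the majority preference) is mapped to an embedding slot, and each simulated user type uses a different set of cut-offs, with risk-averse users demanding stronger agreement before a vertical is placed high on the page. The slides do not list the actual thresholds, so the values below, and the majority-vote rule for deriving the “perfect” position from multi-graded votes, are hypothetical.

```python
from collections import Counter

# Hypothetical cut-offs on the majority preference p, i.e. the fraction of
# assessors who preferred the vertical over the web results:
# (minimum p for ToP, minimum p for MoP, minimum p for BoP); below that, NS.
THRESHOLDS = {
    "risk-seeking": (0.50, 0.30, 0.10),
    "risk-medium": (0.75, 0.50, 0.25),
    "risk-averse": (1.00, 0.75, 0.50),
}
SLOTS = ["ToP", "MoP", "BoP", "NS"]


def position_from_binary(p: float, user_type: str) -> str:
    """Map a binary majority preference to an embedding slot for one user type."""
    for slot, cutoff in zip(SLOTS, THRESHOLDS[user_type]):
        if p >= cutoff:
            return slot
    return "NS"


def position_from_multigrade(votes: list[str]) -> str:
    """Hypothetical 'perfect' position: the most frequent graded vote, with ties
    resolved towards the higher slot on the page."""
    counts = Counter(votes)
    return max(SLOTS, key=lambda s: (counts[s], -SLOTS.index(s)))


if __name__ == "__main__":
    p = 3 / 4  # 3 of 4 assessors preferred the vertical over the web
    for user_type in THRESHOLDS:
        print(user_type, position_from_binary(p, user_type))
    # risk-seeking ToP, risk-medium ToP, risk-averse MoP
    print(position_from_multigrade(["MoP", "ToP", "MoP", "NS"]))  # MoP
```

Positions derived this way for each user type could then be compared against the multi-graded ground truth, for example with a rank correlation over the ordinal slots.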
  • 43–44. Study 3 (Assessment Grade) Results, continued (Experimental Results)
      •  Inter-assessor agreement is moderate, and different users have different risk levels.
      •  Vertical relevance correlation:
          –  Most of the binary approaches correlate significantly with the multi-graded ground truth, though mostly only modestly.
          –  The risk-medium thresholding approach performs best.
  • 45–47. Final Takeaways
      •  Study 1
          –  Assessing for aggregated search is difficult.
          –  Highly relevant verticals overlap significantly between the pre-retrieval and post-retrieval user perspectives.
          –  Vertical (type) orientation is more important than topical relevance.
      •  Study 2
          –  The anchor-based approach might be better than the inter-dependent approach with respect to the utility-effort trade-off.
          –  Context matters.
      •  Study 3
          –  The binary approach can be used to determine the “perfect” embedding position of verticals, and it performs relatively well with relatively few assessments.
  • 48. Conclusions
      •  We compared different vertical relevance assessment processes and analyzed their impact.
      •  Our work has implications for “how” and “what” evaluation design decisions affect the actual evaluation.
      •  This work also creates a need to re-interpret previous evaluation efforts in this area.