Clustering Search Query Log Data to Improve Search

Sophy Bishop & Ravi Mynampaty

                                     Copyright © President & Fellows of Harvard College.
Agenda

     Background
     Five W’s of Clustering
      •   What, why, who, how, when
     Is it really repeatable?
     Questions
About Information Management Services (IMS)

Analytics, Lifecycle Mgmt., Metadata Mgmt., Taxonomy Dev., Search
  - Standards
  - Best Practices
  - User Needs
  - Service Models
Inspired by…

Chapters 8 & 9
About this talk…

   Case study on how we are improving search and browse by performing
    clustering exercises on our search query data
   Not rocket science
   High-level overview
   You can follow this method, adding your own insights and tweaks

   You can kick this off at work next week
What is clustering?

A process for organizing and analyzing search log
data that:
   Is repeatable, low-cost, scalable, simple

   Yields actionable results
   Supports constant incremental improvement
    to search
What’s clustering good for?

   Ensuring results for high-frequency queries

   Improving metadata and taxonomy

   Informing and validating decisions about site IA

   Informing editorial/curatorial activities

   Providing feedback for search suggestions
      o   Autosuggest, synonym lists, no-hits page suggestions
   More on each of these later...
So how do I cluster search queries?

A simple, repeatable cycle of steps:
   1. Create a query report
   2. Cluster the queries
   3. Determine how many clusters to analyze
   4. Analyze the clusters
   5. Draw conclusions and ACT, then repeat
Step 1: Create a query report

We started with the site with the most traffic
  • Upper-bound limit
  • One year’s data by quarter
  • Cut off the long tail (queries with frequency < 10)
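A minimal Python sketch of Step 1, assuming the raw log has been exported as (date, query) pairs. The function name `quarterly_query_report` is hypothetical, calendar quarters are an assumption (a fiscal year such as FY12 may start in another month), and the cutoff of 10 is the one from the slide. It counts exact query strings per quarter and drops the tail:

```python
from collections import Counter, defaultdict
from datetime import date


def quarterly_query_report(rows, min_freq=10):
    """Build one frequency-sorted query report per calendar quarter.

    rows: iterable of (datetime.date, query string) pairs from the search log.
    Queries seen fewer than min_freq times in a quarter are dropped (the tail).
    """
    by_quarter = defaultdict(Counter)
    for day, query in rows:
        query = query.strip()
        if not query:
            continue  # skip empty searches
        label = f"{day.year}-Q{(day.month - 1) // 3 + 1}"
        by_quarter[label][query] += 1
    return {
        label: [(q, n) for q, n in counts.most_common() if n >= min_freq]
        for label, counts in sorted(by_quarter.items())
    }


if __name__ == "__main__":
    # Toy log for illustration only, not data from the talk.
    log = ([(date(2012, 1, 5), "branding")] * 12
           + [(date(2012, 2, 1), "brand equity")] * 3
           + [(date(2012, 4, 2), "branding")] * 10)
    print(quarterly_query_report(log))
```

Exact-string counting is deliberate at this stage; merging near-duplicates is left to the clustering step.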
HBS Working Knowledge FY12 Use Snapshot
Overall Traffic
   Page Views:             6,439,485
   Visits:                 3,635,746
   Unique visitors:        2,734,620
   On-site searches:          174,425
   Views per Visit:              1.77
   Local Search visit rate:        5%
   Organic Search visit rate:     46%
Step 2: Cluster the queries
Step 2 (cont’d): Three levels of clustering

Level       Method                  Example
Narrow      Simple normalization    Eliminate grammatical, spelling, typo, and punctuation differences
Mid-level   Group by subject        management, finance, decision making
Broad       Group by facet          topic, name, date, content type
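The narrow level lends itself to automation. A minimal sketch, assuming that lowercasing, accent folding, and turning punctuation into spaces are acceptable normalizations for your site (`normalize` and `merge_narrow` are hypothetical names). Spelling, typo, and word-split variants (e.g., “score card” vs. “scorecard”) still need a spell-checker or a human pass:

```python
import re
import unicodedata
from collections import Counter


def normalize(query):
    """Lowercase, strip accents, replace punctuation with spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", query)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())  # punctuation -> space
    return " ".join(text.split())


def merge_narrow(report):
    """Merge (query, count) pairs whose normalized forms are identical."""
    merged = Counter()
    for query, count in report:
        merged[normalize(query)] += count
    return merged.most_common()


if __name__ == "__main__":
    print(merge_narrow([("Branding", 5), ("branding.", 3), ("brand", 2)]))
```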
Step 2 (cont’d): Levels  Tasks Enabled

Level       Improve your   Ensure           Improve        Improve
            base for       representation   Metadata/Index Search
            query          of major         /Taxonomy      Suggestions
            analysis       clusters on your
                           site
Narrow           X                               X             X
(simple)

Mid-level                        X               X             X
(group by
subject)
Broad                            X               X
(group by
facet)
Step 2 (cont’d): Narrow Clustering Example
Step 2 (cont’d): Mid-level Example
Cluster: brand
Query                                   Frequency
branding                                245
brand                                   160
brand management                         73
consumer branding                        57
global brand                             32
service brands                           24
brand image retail bank                  17
employer branding                        16
brand management professional services   16
global branding                          13
b2b branding                             13
importance of branding                   12
brand 2002                               12
brand equity                             11
brand image                              11
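One way to rough out a mid-level cluster before the human review, sketched in Python under stated assumptions: a query joins the first hypothetical seed term that begins one of its words, and cluster totals are the summed query frequencies. The data are the rows in the table above; if the slide showed only part of the cluster, 712 is the total of the visible rows only. Prefix matching is a stand-in for judgment and will misfire (it would put “brandeis” under “brand”):

```python
from collections import Counter

# (query, frequency) rows from the "brand" cluster table above.
BRAND_ROWS = [
    ("branding", 245), ("brand", 160), ("brand management", 73),
    ("consumer branding", 57), ("global brand", 32), ("service brands", 24),
    ("brand image retail bank", 17), ("employer branding", 16),
    ("brand management professional services", 16), ("global branding", 13),
    ("b2b branding", 13), ("importance of branding", 12), ("brand 2002", 12),
    ("brand equity", 11), ("brand image", 11),
]


def assign_cluster(query, seeds):
    """Return the first seed term that begins any word of the query, else None."""
    words = query.lower().split()
    for seed in seeds:
        if any(word.startswith(seed) for word in words):
            return seed
    return None


def cluster_totals(rows, seeds):
    """Sum query frequencies per seed cluster; unmatched queries are left out."""
    totals = Counter()
    for query, freq in rows:
        cluster = assign_cluster(query, seeds)
        if cluster is not None:
            totals[cluster] += freq
    return totals


if __name__ == "__main__":
    print(cluster_totals(BRAND_ROWS, ["brand", "customer"]))
```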
Step 2 (cont’d): Mid-level Example
Cluster: customer
[Bar chart: frequencies of the queries in the “customer” cluster, from 333 down to 10]
Step 2 (cont’d): Broad Clustering Example
Step 2 (cont’d): List of facets we used
Facet                         Example
content type                  case studies, cases, working papers, articles, newspaper
date                          2011, world in 2030
demographic characteristics   women, Gen Y, gender, baby boomers
event                         economic crisis
format                        podcast, video
geographic area               india, japan, mount everest
industry                      global wine industry
job type/role                 independent director, entrepreneur, ceo, phd economist
organization name             ikea, zara, toyota
person name                   michael porter, kanter, sebenius
product name / brand name     ipad
product/commodity             coffee, wine, cement
topic                         covers the majority of keywords
work                          faculty work, e.g., publication name, title of a case
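Broad clustering means deciding which facet a query belongs to. A rough first pass can be automated with lookup lists. The sketch below assumes you maintain such lists (the ones here are hypothetical, seeded only from examples in the table above) and adds a four-digit-year check for the date facet; anything unmatched falls back to topic, which the table notes covers most keywords:

```python
import re

# Hypothetical lookup lists, seeded from the facet examples above; a real run
# would draw on the site's taxonomy, authority files, or entity tags.
FACET_LOOKUPS = {
    "content type": {"case studies", "cases", "working papers", "articles"},
    "format": {"podcast", "video"},
    "geographic area": {"india", "japan"},
    "organization name": {"ikea", "zara", "toyota"},
    "person name": {"michael porter", "kanter", "sebenius"},
}
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")  # e.g. 2011, 2030


def tag_facets(query, lookups=FACET_LOOKUPS):
    """Return the facets a query matches (whole-word phrases), or ["topic"]."""
    padded = f" {' '.join(query.lower().split())} "
    facets = [facet for facet, phrases in lookups.items()
              if any(f" {phrase} " in padded for phrase in phrases)]
    if YEAR.search(query):
        facets.append("date")
    return facets or ["topic"]


if __name__ == "__main__":
    for q in ["toyota india", "world in 2030", "leadership"]:
        print(q, tag_facets(q))
```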
Step 3: Choose the number of clusters to analyze

Clusters analyzed   Analyze top hits   Improve metadata/taxonomy/index   Supply search suggestions
50                  X
150                 X                  X
300+                X                  X                                 X
A small number of clusters can cover a lot of your data

Number of top clusters   % of total queries
Top 20 clusters          14
Top 30 clusters          18
Top 50 clusters          26
Top 100 clusters         37
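Coverage figures like these can be recomputed for your own data by ranking clusters by total frequency and dividing each running total by the number of queries. A sketch, assuming you already have per-cluster totals; the slide does not say which denominator was used (all queries, or only those above the frequency cutoff), so it is a parameter here, and `top_n_coverage` is a hypothetical name:

```python
from itertools import accumulate


def top_n_coverage(cluster_totals, total_queries, cutoffs=(20, 30, 50, 100)):
    """Percent of total_queries covered by the N biggest clusters, for each N."""
    ranked = sorted(cluster_totals.values(), reverse=True)
    running = list(accumulate(ranked))  # running[i] = sum of the top i+1 clusters
    coverage = {}
    for n in cutoffs:
        covered = running[min(n, len(running)) - 1] if running and n > 0 else 0
        coverage[n] = round(100 * covered / total_queries, 1)
    return coverage


if __name__ == "__main__":
    # Toy totals for illustration only.
    print(top_n_coverage({"a": 50, "b": 30, "c": 20}, 200, cutoffs=(1, 2, 5)))
```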
Now you have your clusters…

What do you do with them?



           TAKE ACTION!
Analyze Top (“Short Head”) Clusters

Clustering has created a condensed and reliable
list of your top search queries
   Are they what you thought they would be?
   Does the information on your site accurately
    represent the top searches?
   Are you fulfilling user needs?
Use your clusters: Improve Site Navigation

Examine the short head of clusters:
   For each cluster, add up the frequencies of its queries
   Reorder clusters by cumulative frequency, descending
   Ensure the top clusters are accounted for in your navigation
   Use cluster topics as browse/navigation headers and footers for your website
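The last two steps are easy to script once per-cluster totals exist. The sketch takes its totals from the WK Top Clusters table that follows; the navigation labels are purely hypothetical, chosen only to show the output, and both function names are illustrative:

```python
# Cluster totals from the "WK Top Clusters" table.
WK_TOP = {
    "innovation": 867, "balanced scorecard": 794, "leadership": 570,
    "cases": 545, "social media": 508, "negotiation": 470,
    "knowledge management": 457, "ethics": 448, "apple": 430,
    "corporate social responsibility": 398,
}

# Hypothetical navigation labels, for illustration only.
NAV_LABELS = {"innovation", "leadership", "negotiation",
              "knowledge management", "corporate social responsibility"}


def rank_clusters(totals):
    """Order (cluster, total) pairs by cumulative frequency, highest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def missing_from_navigation(totals, nav_labels, top_n=10):
    """Top-N clusters with no matching navigation label (case-insensitive)."""
    labels = {label.lower() for label in nav_labels}
    return [name for name, _ in rank_clusters(totals)[:top_n]
            if name.lower() not in labels]


if __name__ == "__main__":
    print(missing_from_navigation(WK_TOP, NAV_LABELS))
```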
WK Top Clusters
Cluster                           Frequency
innovation                        867
balanced scorecard                794
leadership                        570
cases                             545
social media                      508
negotiation                       470
knowledge management              457
ethics                            448
apple                             430
corporate social responsibility   398
Use your clusters: Improve Taxonomy

•   Missing categories in browse taxonomy
    •   “Balanced Scorecard”
    •   “Ethics”
    •   “Social media”

•   Second-level topics in the WK context
Mid-level clustering:
Informs editorial/curatorial activities
   “Featured Topics”
     o  What topics to highlight this week/month/year
     o  News items to focus on
     o  What research guides to create
     o  How to formulate queries for the topics
Use your clusters: Improve Synonym Handling

   The clustered list provides synonyms for the taxonomy
   Requires human judgment and standards/guidelines for synonyms
    (in our case, synonyms are exact)
   Map each synonym set to one “like term” in the search engine

    Example:
      Balanced Scorecard, BSC, Balanced score card,
      kaplan and norton -> Balanced Scorecard
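A minimal sketch of exact-synonym handling, assuming one preferred term per synonym set as the slide describes. `build_lookup`, `canonicalize`, and `solr_synonym_line` are hypothetical helpers. The last one only illustrates that some engines take such mappings as configuration (Apache Solr’s synonyms file, for instance, accepts explicit mappings of the form `a, b => c`). The talk does not say how the mapping was loaded into its engine:

```python
# Exact-synonym set from the slide: every variant maps to one preferred term.
SYNONYMS = {
    "balanced scorecard": ["bsc", "balanced score card", "kaplan and norton"],
}


def build_lookup(synonyms):
    """Invert {preferred: [variants]} into {term: preferred}, including the preferred term itself."""
    lookup = {}
    for preferred, variants in synonyms.items():
        for term in [preferred, *variants]:
            lookup[term.lower()] = preferred
    return lookup


LOOKUP = build_lookup(SYNONYMS)


def canonicalize(query, lookup=LOOKUP):
    """Rewrite a whole query to its preferred term if it is a known variant."""
    key = " ".join(query.lower().split())
    return lookup.get(key, query)


def solr_synonym_line(preferred, variants):
    """Format one explicit mapping line in Solr synonyms-file syntax."""
    return f"{', '.join(variants)} => {preferred}"


if __name__ == "__main__":
    print(canonicalize("Kaplan and Norton"))
    print(solr_synonym_line("balanced scorecard", SYNONYMS["balanced scorecard"]))
```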
Use your clusters: Improve no-hits page
Time Commitment
•   2 hours to 2 weeks

•    Variables include:
    •   What kind of information you want to gather
    •   How broad or narrow you want your clusters
    •   How many queries you analyze

•   In our case ~2 person-weeks
    •   We had Sophy Bishop
    •   Intern, MSLIS student
Results vs. Time Invested

Time invested   Analyze top clusters   Update taxonomy   Create new metadata   Determine new search suggestions
2 hours         X                      X
6 hours         X                      X                 X
One week        X                      X                 X                     X
Next Steps: Autosuggest
   Your top clusters probably make up a large
    percentage of what people are looking for
      o Use them to establish or supplement
         autosuggest!

    Example: suggestions for “innovation”
      o   innovation and leadership
      o   disruptive innovation
      o   innovation management
      o   open innovation
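A toy version of cluster-driven autosuggest, sketched under the assumption that candidates are the clustered queries and ranking is by raw frequency. Matching at any word boundary, not just the start of the query, is what lets a query like “disruptive innovation” surface for “innovation”. The slide gives no counts for the innovation suggestions, so the frequencies below are a subset of the brand table from Step 2; `suggest` is a hypothetical name:

```python
# Subset of (query, frequency) pairs from the "brand" cluster table in Step 2.
QUERY_FREQS = {
    "branding": 245, "brand": 160, "brand management": 73,
    "global brand": 32, "brand management professional services": 16,
    "global branding": 13, "brand equity": 11,
}


def suggest(freqs, typed, limit=5):
    """Return up to `limit` queries containing the typed text at a word
    boundary, most frequent first (ties broken alphabetically)."""
    typed = " ".join(typed.lower().split())
    if not typed:
        return []
    needle = " " + typed
    matches = [q for q in freqs if needle in " " + q]
    matches.sort(key=lambda q: (-freqs[q], q))
    return matches[:limit]


if __name__ == "__main__":
    print(suggest(QUERY_FREQS, "glob"))
    print(suggest(QUERY_FREQS, "brand m"))
```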
Next Steps: New Access Structures

   Needed an obvious way to search podcasts
    o   Put in best bets for now
   A lot of people searching for article titles
    o   Considering simple interface/approach for select
        field-specific search, e.g. “title”
   Consider adding other facets to browse
    taxonomy where we have entities tagged
    o   “company name”, “job type/class”, etc.
Next Steps

   SEO input
    o   Advise authors to use top cluster terms in titles,
        abstracts, and keywords
    o   Report on clusters in our monthly analytics reports
        to faculty (“Top search topics/subjects in May 2012
        were…”; “Searchers found your works with the
        following queries”)

   Repeat the process on other sites/content
Summary
   Establish a plan/process, but be willing to tweak it
    as you go

   Keep it very simple
   Play with your data: the more we played, the better
    we understood which benefits each level of clustering
    and effort could deliver
   Tune the process/results
     o Build staging/working prototypes
     o Repeat the process on other sites

   TAKE ACTION!
Thank you!



               Questions?


       sophybishop@gmail.com @sophreads

      searchguy@hbs.edu @ravimynampaty

Más contenido relacionado

Similar a Clustering Search Log Data

5 Strategies to Market in the Digital Age - 2012 Event Marketer Summit
5 Strategies to Market in the Digital Age - 2012 Event Marketer Summit5 Strategies to Market in the Digital Age - 2012 Event Marketer Summit
5 Strategies to Market in the Digital Age - 2012 Event Marketer SummitDavid Rogers
 
Applying Design Thinking Principles in Product Management
Applying Design Thinking Principles in Product ManagementApplying Design Thinking Principles in Product Management
Applying Design Thinking Principles in Product ManagementSVPMA
 
Peru Marketing Symposium 2012 David Rogers
Peru Marketing Symposium 2012 David RogersPeru Marketing Symposium 2012 David Rogers
Peru Marketing Symposium 2012 David RogersDavid Rogers
 
ReformIS Capability Statement
ReformIS Capability StatementReformIS Capability Statement
ReformIS Capability Statementjpmoynihan
 
Why care about brand management?
Why care about brand management?Why care about brand management?
Why care about brand management?Brandworkz
 
Fall 2012 Info Session Slides
Fall 2012 Info Session SlidesFall 2012 Info Session Slides
Fall 2012 Info Session SlidesJamie Thai
 
Software Product Management in Web 2.0
Software Product Management in Web 2.0Software Product Management in Web 2.0
Software Product Management in Web 2.0Suhas Kelkar
 
Citrix systems lnkd ms v3
Citrix systems lnkd ms v3Citrix systems lnkd ms v3
Citrix systems lnkd ms v3cbmoore14
 
MRSC company presentation
MRSC company presentationMRSC company presentation
MRSC company presentationJo Fone
 
MRSC company presentation (U.S.)
MRSC company presentation (U.S.)MRSC company presentation (U.S.)
MRSC company presentation (U.S.)Jo Fone
 
Brand rjvntr brochure
Brand rjvntr brochureBrand rjvntr brochure
Brand rjvntr brochureRoy Wollen
 
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw... Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...Christian Posse
 
Brand Asset Valuation
Brand Asset ValuationBrand Asset Valuation
Brand Asset ValuationChappy_02
 
Consumer Engagement in the Digital Age
Consumer Engagement in the Digital AgeConsumer Engagement in the Digital Age
Consumer Engagement in the Digital AgeGregory Birgé
 
Market xcel profile
Market xcel profileMarket xcel profile
Market xcel profileAlwin Samuel
 
Attractive branding Portfolio2010
Attractive branding Portfolio2010Attractive branding Portfolio2010
Attractive branding Portfolio2010udimenda
 

Similar a Clustering Search Log Data (20)

5 Strategies to Market in the Digital Age - 2012 Event Marketer Summit
5 Strategies to Market in the Digital Age - 2012 Event Marketer Summit5 Strategies to Market in the Digital Age - 2012 Event Marketer Summit
5 Strategies to Market in the Digital Age - 2012 Event Marketer Summit
 
Applying Design Thinking Principles in Product Management
Applying Design Thinking Principles in Product ManagementApplying Design Thinking Principles in Product Management
Applying Design Thinking Principles in Product Management
 
David rogers ingles_bloque_5_y_6
David rogers ingles_bloque_5_y_6David rogers ingles_bloque_5_y_6
David rogers ingles_bloque_5_y_6
 
Energize 2013 slides
Energize 2013 slidesEnergize 2013 slides
Energize 2013 slides
 
Peru Marketing Symposium 2012 David Rogers
Peru Marketing Symposium 2012 David RogersPeru Marketing Symposium 2012 David Rogers
Peru Marketing Symposium 2012 David Rogers
 
ReformIS Capability Statement
ReformIS Capability StatementReformIS Capability Statement
ReformIS Capability Statement
 
Why care about brand management?
Why care about brand management?Why care about brand management?
Why care about brand management?
 
Ddu for ap ms edit
Ddu for ap ms   editDdu for ap ms   edit
Ddu for ap ms edit
 
Fall 2012 Info Session Slides
Fall 2012 Info Session SlidesFall 2012 Info Session Slides
Fall 2012 Info Session Slides
 
Software Product Management in Web 2.0
Software Product Management in Web 2.0Software Product Management in Web 2.0
Software Product Management in Web 2.0
 
Agile Prod Mgmt v. Proj Mgmt
Agile Prod Mgmt v. Proj MgmtAgile Prod Mgmt v. Proj Mgmt
Agile Prod Mgmt v. Proj Mgmt
 
Citrix systems lnkd ms v3
Citrix systems lnkd ms v3Citrix systems lnkd ms v3
Citrix systems lnkd ms v3
 
MRSC company presentation
MRSC company presentationMRSC company presentation
MRSC company presentation
 
MRSC company presentation (U.S.)
MRSC company presentation (U.S.)MRSC company presentation (U.S.)
MRSC company presentation (U.S.)
 
Brand rjvntr brochure
Brand rjvntr brochureBrand rjvntr brochure
Brand rjvntr brochure
 
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw... Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 
Brand Asset Valuation
Brand Asset ValuationBrand Asset Valuation
Brand Asset Valuation
 
Consumer Engagement in the Digital Age
Consumer Engagement in the Digital AgeConsumer Engagement in the Digital Age
Consumer Engagement in the Digital Age
 
Market xcel profile
Market xcel profileMarket xcel profile
Market xcel profile
 
Attractive branding Portfolio2010
Attractive branding Portfolio2010Attractive branding Portfolio2010
Attractive branding Portfolio2010
 

Más de Ravi Mynampaty

Build Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to OmegaBuild Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to OmegaRavi Mynampaty
 
Let Search Power Your Intranet!
Let Search Power Your Intranet!Let Search Power Your Intranet!
Let Search Power Your Intranet!Ravi Mynampaty
 
How we spiked the HBS water supply with Solr
How we spiked the HBS water supply with Solr How we spiked the HBS water supply with Solr
How we spiked the HBS water supply with Solr Ravi Mynampaty
 
Building a Solr-driven Web Portal
Building a Solr-driven Web PortalBuilding a Solr-driven Web Portal
Building a Solr-driven Web PortalRavi Mynampaty
 
Developing a Search & Findability Practice for the Enterprise
Developing a Search & Findability Practice for the EnterpriseDeveloping a Search & Findability Practice for the Enterprise
Developing a Search & Findability Practice for the EnterpriseRavi Mynampaty
 
Clustering as presented at UX Poland 2013
Clustering as presented at UX Poland 2013Clustering as presented at UX Poland 2013
Clustering as presented at UX Poland 2013Ravi Mynampaty
 
How We Incrementally Improved Search
How We Incrementally Improved SearchHow We Incrementally Improved Search
How We Incrementally Improved SearchRavi Mynampaty
 
What to Feed Your Search Engine: The Evolution of Search Analytics at HBS
What to Feed Your Search Engine:  The Evolution of Search Analytics at HBSWhat to Feed Your Search Engine:  The Evolution of Search Analytics at HBS
What to Feed Your Search Engine: The Evolution of Search Analytics at HBSRavi Mynampaty
 
Business owner findability interview questions
Business owner findability interview questionsBusiness owner findability interview questions
Business owner findability interview questionsRavi Mynampaty
 
Developing & Implementing Findability Standards
Developing & Implementing Findability StandardsDeveloping & Implementing Findability Standards
Developing & Implementing Findability StandardsRavi Mynampaty
 

Más de Ravi Mynampaty (13)

Build Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to OmegaBuild Your Own World Class Directory Search From Alpha to Omega
Build Your Own World Class Directory Search From Alpha to Omega
 
Let Search Power Your Intranet!
Let Search Power Your Intranet!Let Search Power Your Intranet!
Let Search Power Your Intranet!
 
How we spiked the HBS water supply with Solr
How we spiked the HBS water supply with Solr How we spiked the HBS water supply with Solr
How we spiked the HBS water supply with Solr
 
Building a Solr-driven Web Portal
Building a Solr-driven Web PortalBuilding a Solr-driven Web Portal
Building a Solr-driven Web Portal
 
Developing a Search & Findability Practice for the Enterprise
Developing a Search & Findability Practice for the EnterpriseDeveloping a Search & Findability Practice for the Enterprise
Developing a Search & Findability Practice for the Enterprise
 
Clustering as presented at UX Poland 2013
Clustering as presented at UX Poland 2013Clustering as presented at UX Poland 2013
Clustering as presented at UX Poland 2013
 
Unix for Librarians
Unix for LibrariansUnix for Librarians
Unix for Librarians
 
How We Incrementally Improved Search
How We Incrementally Improved SearchHow We Incrementally Improved Search
How We Incrementally Improved Search
 
Findability Standards
Findability StandardsFindability Standards
Findability Standards
 
What to Feed Your Search Engine: The Evolution of Search Analytics at HBS
What to Feed Your Search Engine:  The Evolution of Search Analytics at HBSWhat to Feed Your Search Engine:  The Evolution of Search Analytics at HBS
What to Feed Your Search Engine: The Evolution of Search Analytics at HBS
 
Better Search UX
Better Search UXBetter Search UX
Better Search UX
 
Business owner findability interview questions
Business owner findability interview questionsBusiness owner findability interview questions
Business owner findability interview questions
 
Developing & Implementing Findability Standards
Developing & Implementing Findability StandardsDeveloping & Implementing Findability Standards
Developing & Implementing Findability Standards
 

Último

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 

Último (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service

Clustering Search Log Data

  • 1. Clustering Search Query Log Data to Improve Search. Sophy Bishop & Ravi Mynampaty. Copyright © President & Fellows of Harvard College.
  • 2. Agenda  Background  Five W’s of Clustering • What, why, who, how, when  Is it really repeatable?  Questions
  • 3. About Information Management Services (IMS): Analytics, Lifecycle Mgmt., Metadata Mgmt., Search, and Taxonomy Dev., centered on Standards, Best Practices, User Needs, and Service Models.
  • 5. About this talk: A case study of how we are improving search and browse by clustering our search query data. Not rocket science; a high-level overview. You can follow this method with your own insights and tweaks, and you can kick it off next week at your work.
  • 6. What is clustering? A process for organizing and analyzing search log data that is repeatable, low-cost, scalable, and simple; yields actionable results; and supports constant incremental improvement to search.
  • 7. What’s clustering good for? Ensuring results for high-frequency queries; improving metadata and taxonomy; informing and validating site IA decisions; informing editorial/curatorial activities; and providing feedback for search suggestions (autosuggest, synonym lists, no-hits page suggestions). More on this later.
  • 8. So how do I cluster search queries? A simple set of steps: create query report → cluster queries → determine # queries to analyze → analyze clusters → draw conclusions and ACT.
  • 9. Step 1: Create a query report. We started with the site with the most traffic: upper-bound limit; one year’s data by quarter; cut off the tail at frequency < 10.
  • 10. Step 1 (cont’d): HBS Working Knowledge FY12 use snapshot, overall traffic: page views 6,439,485; visits 3,635,746; unique visitors 2,734,620; on-site searches 174,425; views per visit 1.77; local search visit rate 5%; organic search visit rate 46%.
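Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling. It assumes the raw log is available as (date, query) pairs, uses calendar quarters (a fiscal-year report such as WK's FY12 would shift the mapping), and applies the frequency-10 cutoff within each quarter. `quarter_of`, `query_report`, and `MIN_FREQUENCY` are hypothetical names; Python 3.9+ is assumed for the type hints.

```python
from collections import Counter
from datetime import date

# Slide 9: cut off the tail at frequency < 10 (applied per quarter here, an assumption).
MIN_FREQUENCY = 10


def quarter_of(day: date) -> str:
    """Label a date by calendar quarter, e.g. date(2011, 10, 5) -> '2011-Q4'."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"


def query_report(log: list[tuple[date, str]],
                 min_freq: int = MIN_FREQUENCY) -> dict[str, Counter]:
    """Count raw queries per quarter, then drop queries seen fewer than min_freq times."""
    per_quarter: dict[str, Counter] = {}
    for day, query in log:
        per_quarter.setdefault(quarter_of(day), Counter())[query] += 1
    return {
        quarter: Counter({q: n for q, n in counts.items() if n >= min_freq})
        for quarter, counts in per_quarter.items()
    }


if __name__ == "__main__":
    # Made-up log entries, for illustration only.
    log = [(date(2011, 7, 4), "innovation")] * 12 + [(date(2011, 7, 5), "ikea")] * 3
    print(query_report(log))  # {'2011-Q3': Counter({'innovation': 12})}
```

Counting raw strings first and normalizing later keeps the report auditable: you can always see which exact spellings users typed before any clustering decision was made.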
  • 11. Step 2: Cluster the queries
  • 12. Step 2 (cont’d): Three levels of clustering
      Level     | Method               | Example
      Narrow    | Simple normalization | Eliminate grammatical, spelling, typo, and punctuation differences
      Mid-level | Group by subject     | management, finance, decision making
      Broad     | Group by facet       | topic, name, date, content type
  • 13. Step 2 (cont’d): Levels  Tasks Enabled Level Improve your Ensure Improve Improve base for representation Metadata/Index Search query of major /Taxonomy Suggestions analysis clusters on your site Narrow X X X (simple) Mid-level X X X (group by subject) Broad X X (group by facet)
  • 14. Step 2 (cont’d): Narrow Clustering Example
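Narrow clustering is mostly string normalization. The sketch below is one possible approach, not the authors' procedure: it folds case, drops apostrophes, turns other punctuation into spaces, collapses whitespace, and applies a small hand-made typo table. `SPELLING_FIXES`, `normalize`, and `narrow_cluster` are hypothetical names and the typo entries are invented examples. It does not fold grammatical variants such as plurals, which the slides also merge; those still need a reviewer or a stemmer.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical typo fixes; in practice you build this list by reading your own report.
SPELLING_FIXES = {"leadershp": "leadership", "inovation": "innovation"}


def normalize(query: str) -> str:
    """Narrow clustering key: case, punctuation, whitespace, and known typos."""
    text = unicodedata.normalize("NFKC", query).casefold()
    text = re.sub(r"['\u2019]", "", text)  # "porter's" -> "porters"
    text = re.sub(r"[^\w\s]", " ", text)   # other punctuation -> space
    return " ".join(SPELLING_FIXES.get(word, word) for word in text.split())


def narrow_cluster(counts: Counter) -> Counter:
    """Merge the frequencies of raw queries that share a normalized key."""
    merged: Counter = Counter()
    for query, n in counts.items():
        key = normalize(query)
        if key:  # skip queries that were nothing but punctuation
            merged[key] += n
    return merged


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    raw = Counter({"Balanced Scorecard": 4, "balanced-scorecard": 2, "balanced scorecard ": 1})
    print(narrow_cluster(raw))  # Counter({'balanced scorecard': 7})
```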
  • 15. Step 2 (cont’d): Mid-level example, cluster “brand” (query, frequency): branding 245; brand 160; brand management 73; consumer branding 57; global brand 32; service brands 24; brand image retail bank 17; employer branding 16; brand management professional services 16; global branding 13; b2b branding 13; importance of branding 12; brand 2002 12; brand equity 11; brand image 11.
  • 17. Step 2 (cont’d): Mid-level example, cluster “customer” [chart of query frequencies, shown beside the “brand” cluster list]
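Mid-level grouping by subject can start as simple word-prefix matching against stems that a reviewer picks. This is a minimal sketch assuming queries have already been normalized; `SUBJECT_STEMS`, `subject_of`, and `group_by_subject` are hypothetical names, and the example input is the “brand” list from slide 15.

```python
from collections import Counter
from typing import Optional

# Hypothetical subject stems chosen by a reviewer. A query joins the first
# subject whose stem begins one of its words ("brands", "branding" -> brand).
SUBJECT_STEMS = {"brand": "brand", "customer": "customer"}


def subject_of(query: str, stems: dict[str, str]) -> Optional[str]:
    for word in query.split():
        for stem, subject in stems.items():
            if word.startswith(stem):
                return subject
    return None  # unassigned; left for manual review


def group_by_subject(counts: Counter, stems: dict[str, str]) -> dict[str, Counter]:
    clusters: dict[str, Counter] = {}
    for query, n in counts.items():
        subject = subject_of(query, stems)
        if subject is not None:
            clusters.setdefault(subject, Counter())[query] += n
    return clusters


if __name__ == "__main__":
    # Queries and frequencies from the "brand" example on slide 15.
    brand_queries = Counter({
        "branding": 245, "brand": 160, "brand management": 73,
        "consumer branding": 57, "global brand": 32, "service brands": 24,
        "brand image retail bank": 17, "employer branding": 16,
        "brand management professional services": 16, "global branding": 13,
        "b2b branding": 13, "importance of branding": 12, "brand 2002": 12,
        "brand equity": 11, "brand image": 11,
    })
    clusters = group_by_subject(brand_queries, SUBJECT_STEMS)
    print(sum(clusters["brand"].values()))  # 712
```

Prefix matching is deliberately crude: “brand” would also capture “brandy”, so a person should still review each cluster before acting on it.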
  • 18. Step 2 (cont’d): Broad Clustering Example
  • 19. Step 2 (cont’d): List of facets we used
      Facet                       | Example
      content type                | case studies, cases, working papers, articles, newspaper
      date                        | 2011, world in 2030
      demographic characteristics | women, Gen Y, gender, baby boomers
      event                       | economic crisis
      format                      | podcast, video
      geographic area             | india, japan, mount everest
      industry                    | global wine industry
      job type/role               | independent director, entrepreneur, ceo, phd economist
      organization name           | ikea, zara, toyota
      person name                 | michael porter, kanter, sebenius
      product name / brand name   | ipad
      product/commodity           | coffee, wine, cement
      topic                       | this covers the majority of keywords
      title of a work             | faculty work, e.g., publication name, case
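Broad clustering rolls clusters up by facet. A minimal sketch, assuming a small lookup table seeded with example values from the slide 19 facet list and falling back to “topic”, which the slide says covers most keywords. `FACET_LEXICON`, `facet_of`, and `group_by_facet` are hypothetical names; a real lexicon would come from your own taxonomy or entity tagging.

```python
from collections import Counter

# Hypothetical lookup seeded with example values from the slide 19 table.
FACET_LEXICON = {
    "case studies": "content type",
    "working papers": "content type",
    "podcast": "format",
    "video": "format",
    "india": "geographic area",
    "japan": "geographic area",
    "ikea": "organization name",
    "zara": "organization name",
    "toyota": "organization name",
    "michael porter": "person name",
    "ipad": "product name / brand name",
    "coffee": "product/commodity",
    "women": "demographic characteristics",
}
# Slide 19: "topic" covers the majority of keywords, so it is the fallback facet.
DEFAULT_FACET = "topic"


def facet_of(cluster: str) -> str:
    return FACET_LEXICON.get(cluster.casefold(), DEFAULT_FACET)


def group_by_facet(cluster_totals: dict[str, int]) -> dict[str, Counter]:
    """Roll each cluster's total up under the facet it belongs to."""
    facets: dict[str, Counter] = {}
    for cluster, total in cluster_totals.items():
        facets.setdefault(facet_of(cluster), Counter())[cluster] += total
    return facets
```

Grouping by facet this way quickly shows which entity types (companies, people, formats) searchers ask for, which is the kind of signal behind the new access structures discussed later in the deck.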
  • 20. Step 3: Choose # clusters to analyze
      Clusters analyzed | Analyze top hits | Improve metadata/taxonomy/index | Supply search suggestions
      50                | X                |                                 |
      150               | X                | X                               |
      300+              | X                | X                               | X
  • 21. A small # of clusters can cover a lot of your data
      Number of top clusters | % of total queries
      Top 20                 | 14
      Top 30                 | 18
      Top 50                 | 26
      Top 100                | 37
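A coverage table like the one on slide 21 can be computed from cluster totals. This sketch assumes the denominator is the total query volume in the report, including queries that did not land in any cluster; the deck does not state exactly how its percentages were derived. `coverage_percent` and `coverage_table` are hypothetical names.

```python
def coverage_percent(cluster_totals: dict[str, int], total_queries: int, top_n: int) -> float:
    """Share (%) of all reported query volume captured by the top_n clusters."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    ranked = sorted(cluster_totals.values(), reverse=True)
    return 100.0 * sum(ranked[:top_n]) / total_queries


def coverage_table(cluster_totals: dict[str, int], total_queries: int,
                   sizes: tuple[int, ...] = (20, 30, 50, 100)) -> dict[int, int]:
    """Rounded coverage for several cut-offs, e.g. top 20/30/50/100 clusters."""
    return {n: round(coverage_percent(cluster_totals, total_queries, n)) for n in sizes}
```

If `top_n` exceeds the number of clusters, slicing simply returns all of them, so the table saturates instead of failing.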
  • 22. Now you have your clusters… What do you do with them? TAKE ACTION!
  • 23. Analyze top (“short head”) clusters. Clustering has created a condensed and reliable list of your top search queries. Are they what you thought they would be? Does the information on your site accurately represent the top searches? Are you fulfilling user needs?
  • 24. Use your clusters: Improve site navigation. Examine the short head of clusters: for each cluster, add up the frequencies of its queries; reorder clusters by cumulative frequency, descending; ensure the top clusters are accounted for in your navigation; use cluster topics as browse/navigation headers or footers for your website.
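The first three steps on slide 24 map directly onto a sum, a sort, and a set lookup. A minimal sketch: `rank_clusters` and `missing_from_navigation` are hypothetical names, navigation labels are assumed to be plain strings, and “accounted for” is simplified to a case-insensitive exact label match, which a person would loosen in practice.

```python
from collections import Counter


def rank_clusters(clusters: dict[str, Counter]) -> list[tuple[str, int]]:
    """Sum each cluster's query frequencies and sort by that total, descending."""
    totals = {name: sum(queries.values()) for name, queries in clusters.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def missing_from_navigation(ranked: list[tuple[str, int]],
                            nav_labels: list[str], top_n: int = 10) -> list[str]:
    """Top clusters with no matching navigation label (case-insensitive exact match)."""
    labels = {label.casefold() for label in nav_labels}
    return [name for name, _ in ranked[:top_n] if name.casefold() not in labels]
```

For example, with the slide 25 totals and hypothetical navigation labels of only “Innovation” and “Leadership”, the check would flag “balanced scorecard”, the same kind of gap slide 26 reports for the browse taxonomy.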
  • 25. WK top clusters
      Cluster                        | Frequency
      innovation                     | 867
      balanced scorecard             | 794
      leadership                     | 570
      cases                          | 545
      social media                   | 508
      negotiation                    | 470
      knowledge management           | 457
      ethics                         | 448
      apple                          | 430
      corporate social responsibility | 398
  • 26. Use your clusters: Improve taxonomy. Missing categories in the browse taxonomy: “Balanced Scorecard”, “Ethics”, “Social media” (second-level topics in the WK context).
  • 29. Mid-level clustering informs editorial/curatorial activities. “Featured Topics”: what topics to highlight this week/month/year; news items to focus on; what research guides to create; how to formulate queries for the topics.
  • 30. Use your clusters: Improve synonym handling. The clustered list provides synonyms for the taxonomy. This requires human judgment and standards/guidelines for synonyms (in our case, synonyms are exact). Map each set to one "like term" in the search engine. Example: Balanced Scorecard, BSC, Balanced score card, kaplan and norton -> Balanced Scorecard.
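Mapping variants to one preferred “like term” amounts to a lookup table keyed on a normalized form. A minimal sketch, not tied to any particular search engine's synonym format: `SYNONYM_RINGS`, `build_synonym_map`, and `canonical` are hypothetical names, and only the unambiguous Balanced Scorecard variants from slide 30 are included. The conflict check reflects the slide's point that synonyms need standards: one variant may not point at two preferred terms.

```python
# Preferred term -> exact variants (from the slide 30 example).
SYNONYM_RINGS = {
    "Balanced Scorecard": ["balanced scorecard", "bsc", "balanced score card"],
}


def _key(text: str) -> str:
    """Case- and whitespace-insensitive lookup key."""
    return " ".join(text.casefold().split())


def build_synonym_map(rings: dict[str, list[str]]) -> dict[str, str]:
    mapping: dict[str, str] = {}
    for preferred, variants in rings.items():
        for variant in variants + [preferred]:
            key = _key(variant)
            if key in mapping and mapping[key] != preferred:
                raise ValueError(f"{variant!r} is already mapped to {mapping[key]!r}")
            mapping[key] = preferred
    return mapping


def canonical(query: str, mapping: dict[str, str]) -> str:
    """Return the preferred term for a known variant, else the query unchanged."""
    return mapping.get(_key(query), query)
```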
  • 31. Use your clusters: Improve no-hits page
  • 32. Time commitment: 2 hours to 2 weeks. Variables include what kind of information you want to gather, how broad or narrow you want your clusters, and how many queries you analyze. In our case, about 2 person-weeks; we had Sophy Bishop, an intern and MSLIS student.
  • 33. Results vs. time invested
      Time      | Analyze top clusters | Update taxonomy | Create new metadata | Determine new search suggestions
      2 hours   | X                    | X               |                     |
      6 hours   | X                    | X               | X                   |
      One week  | X                    | X               | X                   | X
  • 34. Next steps: Autosuggest. Your top clusters probably make up a large percentage of what people are looking for, so use them to establish or supplement autosuggest. Example suggestions for “innovation”: innovation and leadership; disruptive innovation; innovation management; open innovation.
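Cluster-based suggestions like the “innovation” example can be drawn from the cluster's own queries, ranked by frequency. A minimal sketch, assuming a cluster is held as a query-to-frequency `Counter`; `suggest` is a hypothetical name, substring matching is a simplification (real autosuggest usually matches on prefixes as the user types), and the counts in the usage example are invented.

```python
from collections import Counter


def suggest(term: str, cluster: Counter, k: int = 4) -> list[str]:
    """Return up to k queries from a cluster that contain `term`, most frequent first.

    The bare term itself is excluded, since the user has already typed it.
    """
    needle = " ".join(term.casefold().split())
    candidates = [
        (query, n) for query, n in cluster.items()
        if needle in query.casefold() and query.casefold() != needle
    ]
    # Highest frequency first; alphabetical order breaks ties deterministically.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in candidates[:k]]


if __name__ == "__main__":
    # Invented counts, for illustration only.
    innovation = Counter({
        "innovation": 500, "innovation and leadership": 40,
        "disruptive innovation": 35, "innovation management": 30,
        "open innovation": 25, "innovation in healthcare": 5,
    })
    print(suggest("Innovation", innovation))
```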
  • 35. Next steps: New access structures. We needed an obvious way to search podcasts; we put in best bets for now. A lot of people search for article titles; we are considering a simple interface/approach for select field-specific search, e.g., “title”. Consider adding other facets to the browse taxonomy where we have entities tagged, e.g., “company name” and “job type/class”.
  • 36. Next Steps  SEO Optimization Input o Advise authors to use top cluster terms in Titles, Abstracts, Keywords o Report on clusters in our monthly analytics reports to faculty (“Top search topics/subjects in May 2012 were…” ; “Searchers found your works with following queries”)  Repeat process on other sites/content
  • 37. Summary  Established plan/process, but be willing to tweak as you go  Keep it very simple.  Play with your data – the more we played, the better we understood what benefits could be realized by levels of clustering and effort  Tuning process/results o Build staging/working prototypes o Repeat process on other sites  TAKE ACTION!
  • 38. Thank you! Questions? sophybishop@gmail.com, @sophreads; searchguy@hbs.edu, @ravimynampaty