Good afternoon, and many thanks for attending the last session on the last
day of this conference. The focus of this presentation is the many excellent
features contained in MOSS 2007 search. My goal is to show you why these
features are excellent so that you will make use of them. If you do,
you will be able to walk the halls of your organization with your head held
high and fear no “search sucks” cracks as you do.




I am a pointy-head and not a propeller-head. While there are technical
references in this presentation, the orientation will be more behavioral and
less technical. There are terrific technical resources contained in the
Resources section and the occasional snippet of code did make its way into
the main section.




UC Berkeley Study on How Much Information: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/

Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002.
Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks.
           How big is five exabytes? If digitized with full formatting, the seventeen million books in the Library
           of Congress contain about 136 terabytes of information; five exabytes of information is equivalent in
           size to the information contained in 37,000 new libraries the size of the Library of Congress book
           collections.
           Hard disks store most new information. Ninety-two percent of new information is stored on magnetic
           media, primarily hard disks. Film represents 7% of the total, paper 0.01%, and optical media 0.002%.
           The United States produces about 40% of the world's new stored information, including 33% of the
           world's new printed information, 30% of the world's new film titles, 40% of the world's information
           stored on optical media, and about 50% of the information stored on magnetic media.
           How much new information per person? According to the Population Reference Bureau, the world
           population is 6.3 billion, thus almost 800 MB of recorded information is produced per person each
           year. It would take about 30 feet of books to store the equivalent of 800 MB of information on paper.
We estimate that the amount of new information stored on paper, film, magnetic, and optical media has about
doubled in the last three years.
           Information explosion? We estimate that new stored information grew about 30% a year between
           1999 and 2002.
           Paperless society? The amount of information printed on paper is still increasing, but the vast
           majority of original information on paper is produced by individuals in office documents and postal
           mail, not in formally published titles such as books, newspapers and journals.
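As a quick sanity check, the headline figures can be reproduced with a few lines of Python. This is a minimal sketch that assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes, 1 MB = 10^6 bytes); the inputs are the numbers quoted above.

```python
# Sanity-check the Berkeley "How Much Information?" figures quoted above.
# Assumes decimal (SI) units: 1 EB = 1e18 bytes, 1 TB = 1e12, 1 MB = 1e6.
EB, TB, MB = 10**18, 10**12, 10**6

new_info_bytes = 5 * EB          # new information produced in 2002
loc_books_bytes = 136 * TB       # Library of Congress book collections
world_population = 6.3e9         # Population Reference Bureau figure

libraries = new_info_bytes / loc_books_bytes
per_person_mb = new_info_bytes / world_population / MB
three_year_growth = 1.30 ** 3    # 30% annual growth compounded over 3 years

print(f"Libraries of Congress: {libraries:,.0f}")
print(f"MB per person per year: {per_person_mb:,.0f}")
print(f"Growth factor over three years: {three_year_growth:.2f}")
```

Run as-is, it reports about 36,765 libraries, about 794 MB per person, and a three-year growth factor of about 2.20, which lines up with the "37,000 libraries", "almost 800 MB" and "about doubled" statements.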

Hosted websites [UC Berkeley How Much Information Project]
          •July 1993: 1,776,000
          •July 2005: 353,084,187
Size of the Web [Indexable Web: Gulli & Signorini 2005]
          •1997: 200 million Web pages
          •2005: 11.5 billion pages




Information Re/volution: Michael Wesch; Kansas State University
http://www.youtube.com/user/mwesch
All of his work is very good


How we manage information is also different because searchers are squishy: some just want to find “it”, others want it to find
them, and others want to change it, create it, manipulate it, share it…
•They are searching because they don’t know
•Language and perception are different
         •Some people think women put their stuff in a purse, others a pocketbook, and others a handbag.
         •“Animal” is a mammal, a Sesame Street character, and an uncouth person
•Enterprise information is individualized.
         •Gates Foundation has different issues than PACCAR
         •Providence Healthcare has different types of content than King County Library
         •Codeplex has a different user type [or a more standard one] than Microsoft Virtual Earth




Search engines use bots to crawl pages and send compressed data back to the index after applying linguistic rules such as stemming
[taking the word down to its most basic root] and stop words [dropping common articles and other terms stipulated by the company].
This index is then inverted so that lookup is done on the basis of record contents rather than the document ID, a method of data
storage and retrieval completely different from that of a relational database. A complete copy of the Web page
may be stored in the search engine’s cache. With brute-force calculation, the system pulls each record from the inverted index
[a mapping of words to where they appear in document text]. This is recall: all documents in the corpus with text instances that
match your term(s).
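To make recall concrete, here is a minimal Python sketch of an inverted index. The stop-word list and the suffix-stripping `crude_stem` function are illustrative assumptions (not the Porter stemmer or any product's linguistics), and `recall_set` uses OR semantics: any document containing any query term is recalled.

```python
from collections import defaultdict

# Illustrative stop words; real engines ship language-specific lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}

def crude_stem(word: str) -> str:
    """Toy suffix stripper standing in for a real stemmer (an assumption,
    not the Porter algorithm): 'searching' and 'searched' become 'search'."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text: str) -> list[str]:
    """Strip punctuation, lower-case, drop stop words, then stem."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [crude_stem(w) for w in words if w and w not in STOP_WORDS]

def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each term to the documents, and token positions, where it occurs."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term][doc_id].append(position)
    return index

def recall_set(index: dict[str, dict[str, list[int]]], query: str) -> set[str]:
    """Every document containing at least one query term."""
    matches: set[str] = set()
    for term in tokenize(query):
        matches.update(index.get(term, {}))
    return matches
```

Lookup goes from word to documents, never from document to words, which is the "inverted" part; nothing here is normalized or keyed by a unique record ID.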


Search engine indexes are not like relational databases. There is no such thing as normalization, no unique identifiers, and only the
loosest of structures.


The “secret sauce” for each search engine is the set of algorithms that sort the recall results in a meaningful order. This is precision:
the proportion of documents in the recall set that are relevant to your query term(s). All search engines use a common set of values to
refine precision. If the search term is used in the title of the document, in heading text, formatted in any way, or used in link text,
the document is considered more relevant to the query. If the query term(s) are used frequently throughout the document,
the document is considered more relevant.
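A minimal Python sketch of field-weighted ranking over a recall set, plus precision measured on the top results. The field names and the weights in `FIELD_WEIGHTS` are assumptions for illustration; the text only says that title, heading, formatting, and link-text matches count for more.

```python
from dataclasses import dataclass, field

# Hypothetical boosts: matches in these fields count more than body matches.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "link_text": 2.0, "body": 1.0}

@dataclass
class Doc:
    doc_id: str
    fields: dict[str, str] = field(default_factory=dict)

def score(doc: Doc, query_terms: list[str]) -> float:
    """Sum over fields of (query-term occurrences in the field) x (field weight)."""
    terms = [t.lower() for t in query_terms]
    total = 0.0
    for name, text in doc.fields.items():
        words = text.lower().split()
        weight = FIELD_WEIGHTS.get(name, 1.0)
        total += weight * sum(words.count(t) for t in terms)
    return total

def rank(recall: list[Doc], query_terms: list[str]) -> list[str]:
    """Sort the recall set so the highest-scoring documents come first."""
    ordered = sorted(recall, key=lambda d: score(d, query_terms), reverse=True)
    return [d.doc_id for d in ordered]

def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Share of the top-k results that a person judged relevant."""
    top = ranked[:k]
    return sum(1 for d in top if d in relevant) / len(top) if top else 0.0
```

Ranking never adds or removes documents; it only decides which recalled documents the searcher sees first, which is why precision is measured on the top of the list.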


Another example is Term Frequency - Inverse Document Frequency [TF-IDF] weighting. Here the raw term frequency (TF) of a
term in a document [how often it occurs in that document] is multiplied by the term's inverse document frequency (IDF) weight
[usually the logarithm of the number of documents in the entire corpus divided by the number of documents containing the term].
Terms that are frequent in a document but rare across the corpus therefore weigh the most. [caveat emptor: high-level, low-level,
level-playing-field math are not my strong suits].
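In symbols, the weight of term t in document d is tf(t, d) × log(N / df(t)), where N is the number of documents in the corpus and df(t) is the number of documents containing t. A minimal Python sketch follows; the natural logarithm is an arbitrary choice here, since systems differ on the log base and on smoothing.

```python
import math
from collections import Counter

def tf_idf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return weight[doc_id][term] = tf(term, doc) * log(N / df(term)).

    tf: raw count of the term in the document.
    df: number of documents that contain the term at least once.
    N:  number of documents in the corpus.
    """
    n_docs = len(docs)
    df = Counter(term for terms in docs.values() for term in set(terms))
    weights: dict[str, dict[str, float]] = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        weights[doc_id] = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
    return weights
```

A term that appears in every document gets log(1) = 0, so very common words contribute nothing even if they were not removed as stop words.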




There is a fundamental difference between Web search and Enterprise search.


Web Search:
•Web search is generic search. One size fits all. Features serve the technology to better enable it to serve the masses.
•Search technology has to work for the broadest document set, those 11 billion plus pages
         •Keys off strong linking [the # and the structure]
         •Links are “editorial” – endorsement of destination content through “vote”
•Millions of publishers that are not required to adhere to any specific standards
•Site structure is not often tied to content or context
•Search engines are constantly fighting attempts to game their technology in the Web search space. Black hat techniques
like cloaking, link farms, spamming, keyword stuffing, Sybil attacks and the like are a blight. They manipulate the results and
reduce user confidence in the system.
•Search technology is changing and refining its operation to rely on both internal [document level] and external [site level] data.
Examples of this would be: IBM’s narrative distiller, MSN link text analysis, Google Scout that finds related hyperlinks, and
Yahoo!’s document segmentation.
         Important to note: The PageRank algorithm is a pre-query calculation. It is a value that is assigned as a result of the
         search engine’s indexing of the entire Web and the associated value has no relationship to the user’s information
         need. There have been a number of additions and enhancements to lend some contextual credence to the
         relevance ranking of the results.
Enterprise Search:
•Bounded corpus of content
•Produced and maintained by a limited set of authors
•No strong linking strategy – links mostly for navigation [not editorial]
•Information is related in ways that lie outside the document content
•Hierarchical structure intended – part of corporate culture
•Publishing guidelines can be established to enforce metadata standards that tune a search appliance and improve relevance
through enforced semantic relationships.
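The "pre-query" point about PageRank is easiest to see in code. Below is a minimal power-iteration PageRank in Python, assuming the textbook damping factor of 0.85 and a toy link graph; it is a sketch of the published idea, not any search engine's production implementation.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a link graph {page: [pages it links to]}.

    Each link passes a share of the linking page's rank to its target
    (1 link = 1 vote); pages with no outgoing links spread rank evenly.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank across all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank
```

The function never receives a query: the scores are computed once from the link graph and are identical for every search, which is why contextual signals had to be layered on top.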
In the early days of search engines, Advanced Search was a means for those who could phrase their
queries in Boolean or SQL syntax to get more refined results. As search engines became
more sophisticated, the need for such coding ability diminished.


Usability studies show that most customers avoid Advanced Search because they assume that it is
too advanced for them. A better method is to offer means for the searcher to refine their own
search using facets based on document type, subject or location.




From MOSS 2007 search Under the Hood PPT by Adir Ron
Search Query Execution:
•The query engine passes the query through a language-specific wordbreaker.
•After wordbreaking, the resulting words are passed through a stemmer to generate language-
specific inflected forms of a given word.
•When the query engine executes a property value query, the index is checked first to get a list of
possible matches.
•If the user does not have permission to a matching document, the query engine filters that
document out of the list that is returned.
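The four steps above can be sketched as a small Python pipeline. Everything here is hypothetical: `word_break` is a crude English tokenizer, the `INFLECTIONS` table stands in for a language-specific stemmer that generates inflected forms, and the index and ACL are plain dictionaries rather than MOSS structures.

```python
import re

# Hypothetical inflection table standing in for a language-specific stemmer.
# Per the notes, the stemmer *generates* inflected forms of each query word,
# so the sketch expands words instead of reducing them to a root.
INFLECTIONS = {
    "run": {"run", "runs", "running", "ran"},
    "report": {"report", "reports", "reported", "reporting"},
}

def word_break(query: str) -> list[str]:
    """Crude English word breaker: split on anything that is not a letter or digit."""
    return [w for w in re.split(r"[^0-9a-z]+", query.lower()) if w]

def expand(word: str) -> set[str]:
    """Return the word plus any inflected forms known for it."""
    return INFLECTIONS.get(word, {word})

def execute_query(query: str, index: dict[str, set[str]],
                  acl: dict[str, set[str]], user: str) -> set[str]:
    """index maps term -> doc IDs; acl maps doc ID -> users allowed to read it."""
    candidates: set[str] = set()
    for word in word_break(query):
        for form in expand(word):
            candidates |= index.get(form, set())  # check the index for possible matches
    # Security trimming: filter out documents the user may not open.
    return {doc for doc in candidates if user in acl.get(doc, set())}
```

Trimming last means two users typing the same query can see different result sets, which is the behavior the final step describes.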

Search Architecture
http://www.sharepointblogs.com/heliosa/archive/2007/03/07/enterprise-search-architecture-in-sharepoint-technologies-2007.aspx
• Index Engine: Processes the chunks of text and properties filtered from content sources, storing
them in the content index and property store.
• Query Engine: Executes keyword and SQL syntax queries against the content index and search
configuration data.
• Protocol Handlers: Opens content sources in their native protocols and exposes documents and
other items to be filtered.
• IFilters: Opens documents and other content source items in their native formats and filters into
chunks of text and properties.
• Property Store: Stores a table of properties and associated values.
• Wordbreakers: Used by the query and index engines to break compound words and phrases into
individual words or tokens.
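How the indexing-side components hand work to one another can be sketched in Python. The classes and functions below are hypothetical stand-ins named after the roles listed above (protocol handler, IFilter, wordbreaker, index engine, property store); they are not the MOSS interfaces.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class FilteredItem:
    url: str
    text_chunks: list[str]
    properties: dict[str, str]

def protocol_handler(source: dict[str, dict]) -> list[tuple[str, dict]]:
    """Opens a content source (here an in-memory dict keyed by URL) and
    exposes its items for filtering."""
    return list(source.items())

def ifilter(url: str, raw: dict) -> FilteredItem:
    """Splits a raw item into chunks of text and a set of properties."""
    chunks = [c for c in raw.get("content", "").split("\n") if c.strip()]
    props = {k: v for k, v in raw.items() if k != "content"}
    return FilteredItem(url, chunks, props)

def word_breaker(chunk: str) -> list[str]:
    """Breaks a chunk of text into lower-case word tokens."""
    return re.findall(r"\w+", chunk.lower())

class IndexEngine:
    """Stores tokens in a content index and properties in a property store."""

    def __init__(self) -> None:
        self.content_index: dict[str, set[str]] = defaultdict(set)  # token -> URLs
        self.property_store: dict[str, dict[str, str]] = {}         # URL -> properties

    def add(self, item: FilteredItem) -> None:
        for chunk in item.text_chunks:
            for token in word_breaker(chunk):
                self.content_index[token].add(item.url)
        self.property_store[item.url] = item.properties

def crawl(source: dict[str, dict], engine: IndexEngine) -> None:
    """Protocol handler -> IFilter -> wordbreaker -> index engine."""
    for url, raw in protocol_handler(source):
        engine.add(ifilter(url, raw))
```

Keeping text and properties in separate stores mirrors the split above between the content index (keyword queries) and the property store (property value queries).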




SPS 2003 used SQL search: a different database structure, closer to a classic relational database model
MOSS 2007 uses indexed search: an inverted index based on words rather than records, plus scopes, structured business
data search, and people search


MOSS 2007
•Click Distance: Browsing distance from authoritative sites: shorter tends to be more relevant
•Anchor Text: Hyperlinks act as annotations on their target
•URL Depth: URLs higher in the hierarchy tend to be more relevant
•URL Matching: Direct matches on text in URLs
•Metadata Extraction: Automatically extract titles and authors from document text
•Automatic Language Detection: Helps bias toward results in your language
•File Type Biasing: For example, PPT docs tend to be more relevant than XLS
•Text Analysis: Traditional text ranking based on matching terms, term frequencies, word variants,
etc.


SPS 2003
•Collection frequency: The number of documents a term appears in compared to total number of
documents. Search terms that occur in only a few documents are likely to be more useful than
terms that occur in many documents.
•Term frequency: The number of occurrences of the search term in a document. The more
frequently a search term appears in a document, the more important it is likely to be for
ranking that document.
•Document length: The length of the searched document. A term that occurs the same number of
times in a short document as in a long one is likely to be more important to the short document.
•Term Position: The position of a word within a document, for example, presence of a term in the
document’s title. A term that appears in a particular component of the document, such as the title,
is more likely to be important for ranking that document.
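The notes list the SPS 2003 factors but not the formula that combines them. Okapi BM25, a widely used ranking function, models three of the four (collection frequency, term frequency, document length), so the Python sketch below uses it purely as an illustration; it is not a claim about what SPS 2003 actually computed. Term position would need an extra boost for matches in fields such as the title. k1 = 1.2 and b = 0.75 are common textbook defaults, and the IDF variant with "+ 1" inside the log (which keeps weights positive) is one of several in use.

```python
import math

def bm25(query_terms: list[str], doc_terms: list[str], corpus: list[list[str]],
         k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of one document (doc_terms, which should also be in corpus)."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        # Collection frequency: terms found in few documents get a larger weight.
        n_containing = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - n_containing + 0.5) / (n_containing + 0.5) + 1)
        # Term frequency, saturating as it grows and scaled by document length:
        # the same count is worth more in a shorter-than-average document.
        tf = doc_terms.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return score
```

With equal counts, a short document outscores a long one, and a rare term outscores a common one, which are the document length and collection frequency effects described above.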
This is where you manage the components that control search performance and the search
experience
Because search is a shared service, you only have to configure it in one location
MOSS 2007 enables testing the configuration to ensure performance
Where you put the content is not necessarily where your customers will look for it




Better management and control
Better resource management, both hardware and personnel
Agile index changes




Text Analysis [internal]: Traditional text ranking based on such factors as matching terms, term frequencies, and
word variants.


Dynamic and Static ranking: Like other search technology, MOSS 2007 Search incorporates both internal [text on
the page, term frequency, page layout and formatting, etc.] and external metadata to more closely match the user’s
request. However, MOSS 2007 Search incorporates cutting-edge technology from Microsoft Search to push beyond
the 1 link = 1 vote model of quality/relevance used by PageRank.
        •Click Distance [external]: Browsing distance from authoritative sites (shorter distances tend to be more
        relevant).
        •Anchor Text [external]: Hyperlinks act as annotations on their target. In addition, they tend to be highly
        descriptive.
        •URL Depth [external]: URLs higher in the hierarchy tend to be more relevant.
        •URL Matching [external]: Direct matches on text that's in URLs.
        •Metadata Extraction [internal]: Automatically extracts titles and authors from document text if they are
        missing.
        •Automatic Language Detection [internal]: Helps create a preference for results in your language.
        •File Type Biasing [internal]: Certain file types tend to be more relevant (for example, PPT files are often
        more relevant than XLS files).
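The two URL-based signals are simple to compute. A minimal Python sketch, assuming click distance is measured as the fewest link hops from a set of authoritative pages (a breadth-first search) and URL depth as the number of path segments; how MOSS 2007 converts these raw signals into a rank contribution is not described here.

```python
from collections import deque
from urllib.parse import urlparse

def click_distance(links: dict[str, list[str]], authoritative: list[str]) -> dict[str, int]:
    """Fewest clicks from any authoritative page to each reachable page (BFS).
    Pages that cannot be reached are left out."""
    distance = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance

def url_depth(url: str) -> int:
    """Number of path segments; pages higher in the hierarchy have smaller depth."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])
```

Both are static signals: they depend on site structure, not on the query, so they can be computed at crawl time and combined with the query-dependent text analysis later.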




You must turn on stemming and PDF indexing




Project Description from Codeplex http://www.codeplex.com/FacetedSearch

MOSS Faceted Search is a set of web parts that provide an intuitive way to refine search results by
category (facet).

The facets are implemented using the SharePoint API and stored within the native SharePoint METADATA
store. The solution demonstrates the following key features:
•Grouping search results by facet
•Displaying the total number of hits per facet value
•Refining search results by facet value
•Updating the facet menu based on refined search criteria
•Displaying the search criteria as breadcrumbs
•Ability to exclude the chosen facet from the search criteria
•Flexibility of the faceted search configuration and its consistency with MOSS administration
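The features in this list can be modelled in a few lines of Python. This is a hypothetical in-memory sketch, not the Codeplex project's code (which is built on web parts and the SharePoint API): results are plain dictionaries and a facet is just a key.

```python
from collections import Counter

def facet_counts(results: list[dict], facet: str) -> Counter:
    """Total number of hits per facet value, e.g. per file type."""
    return Counter(r[facet] for r in results if facet in r)

def refine(results: list[dict], criteria: dict[str, str]) -> list[dict]:
    """Keep only results that match every chosen facet value."""
    return [r for r in results if all(r.get(k) == v for k, v in criteria.items())]

def exclude(criteria: dict[str, str], facet: str) -> dict[str, str]:
    """Drop one chosen facet from the search criteria."""
    return {k: v for k, v in criteria.items() if k != facet}

def breadcrumbs(criteria: dict[str, str]) -> str:
    """Render the current criteria as a breadcrumb trail."""
    return " > ".join(f"{k}: {v}" for k, v in criteria.items())
```

Recomputing `facet_counts` on the refined list is what keeps the facet menu in step with the current criteria.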




3/23/2009




Estimated dev time to create your own FLD (federated location definition) file is 3 days (from MS internal sources)
Better to pass the query through and have the destination do the relevance ranking (saves bandwidth) than
to access the destination index (though you lose the destination's proprietary relevance ranking)
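The pass-through approach in the second note can be sketched in Python. The `Location` interface, where each destination is a callable that takes a query and a result limit and returns results already ranked by its own relevance model, is a hypothetical simplification, not the FLD format.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical interface: a destination takes (query, limit) and returns at most
# `limit` results, already ordered by its own relevance model.
Location = Callable[[str, int], list[str]]

def federate(query: str, locations: dict[str, Location], top_n: int = 3) -> dict[str, list[str]]:
    """Pass the query through to every destination in parallel. Only each
    destination's top_n results come back, and their order is left untouched,
    so the destination's own relevance ranking is preserved."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(loc, query, top_n) for name, loc in locations.items()}
        return {name: future.result() for name, future in futures.items()}
```

Results are kept grouped per destination rather than merged, because scores from different relevance models are not directly comparable.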


Day Software Delivers Standardized Connectivity for Open Text Livelink
http://www.econtentmag.com/Articles/ArticleReader.aspx?ArticleID=19280


Using SharePoint 2007 to Index Lotus Notes
http://meiyinglim.blogspot.com/2007/01/using-sharepoint-2007-to-index-lotus.html




Microsoft Knowledge Network: Stored on a separate server


Version 1.0 is an add-on product for the Enterprise version of stand-alone Search and for both versions
of the full product


Refinement/scoping available


Initial results are presented with identity masked: the KN server takes the user's request and sends it to the
person, who can accept or reject the request through the KN server without their identity ever being
revealed.




The Business Data Catalog (BDC) crawls and integrates data from other applications [email servers,
line-of-business applications, external databases, customer relationship management apps] and puts it
into a cache for crawl by the search server.


Accesses these repositories with a connector http://msdn.microsoft.com/en-us/library/ms563661.aspx


Available in MOSS 2007 Search Enterprise edition and both versions of MOSS 2007 Full Product








Short term: FAST will remain an independent entity that Microsoft will continue to support on non-Windows
platforms, with a connector for MOSS 2007. The next release will see 2 versions of FAST ESP: a stand-alone
successor and a SharePoint edition that will incorporate the connector and add new features that require
less customization.

Relevance by using the underlying semantic relationships
•Categorization
•Transformation (lemmatization)
•Presentation

FAST Platform
          •unity (federation of results from outside resources)
          •admomentum (search-driven monetization with ad serving)
          •recommendations (recommendation engine similar to Amazon/Netflix, based on behavior of the user
          base: cookie based, item to item, people to items)
          •featured content (search-driven content merchandising)
          •fast unity (search-driven portal experiences)

Core Capabilities
         •phrasing and anti-phrasing: strips out the extraneous terms
         •clustering: comprehension through association
                    •can be taxonomy based or based on the Open Directory Project
         •flexible relevancy model: boost or block search results dynamically on a per-query basis
         •a whole equalizer with a whole set of knobs: reissues the query with different weights based on the user's
         choices; this is ranking more than filtering, so it does not change the number of results, only their display order
         •can work in conjunction with faceted search
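The "ranking more than filtering" point can be shown with a hypothetical Python sketch: reweighting the signals reorders the same result set, while boost/block rules act per query. Signal names, weights, and both functions are assumptions; FAST ESP's actual configuration is not shown in these notes.

```python
from collections.abc import Container

def equalize(results: dict[str, dict[str, float]], weights: dict[str, float]) -> list[str]:
    """Order documents by a weighted sum of their signals. Changing the weights
    changes the order only; every document is still returned."""
    def weighted(doc_id: str) -> float:
        return sum(weights.get(signal, 0.0) * value
                   for signal, value in results[doc_id].items())
    return sorted(results, key=weighted, reverse=True)

def boost_and_block(ranked: list[str], boost: Container[str] = frozenset(),
                    block: Container[str] = frozenset()) -> list[str]:
    """Per-query rules: drop blocked documents, move boosted ones to the top."""
    kept = [d for d in ranked if d not in block]
    return [d for d in kept if d in boost] + [d for d in kept if d not in boost]
```

Turning an equalizer knob from text relevance to freshness flips the order of the same three documents without changing the count; only an explicit block rule removes anything.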




Search Scopes
Represent a collection of documents mapped to a single element [e.g. authored by, specific directory, file type,
metadata type]; no longer tied to an index crawl, so changes are effective immediately.
By default, the scope plug-in will create scopes for the following:
        •Display URL
        •Site (domain, sub-domain, host-name)
        •Author
        •All content (used to include all content)
        •Global query exclusions (used to exclude content)

Results Collapsing
Results collapsing can group duplicated or similar results together, so that they are displayed as one entry in the
search result set. This entry includes a link to display the expanded results for that collapsed result set entry. Search
administrators can collapse results for the following content item groups:
         •Duplicates and derivatives of documents
         •Windows SharePoint Services discussion messages for the same topic
         •Microsoft Exchange Server public folder messages for the same conversation topic
         •Current versions of the same document
         •Different language versions of the same document
         •Content from the same site
By default, results collapsing is turned on in Enterprise Search. The search administrator can configure it, however,
either through the Search Administration UI or the Search Administration object model.
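Exact-duplicate collapsing, the simplest case in the list above, can be sketched in Python by fingerprinting normalized text. This is an assumption-laden illustration: it uses a SHA-1 hash of lower-cased, punctuation-free text, catches only copies that are identical after that normalization, and says nothing about how MOSS 2007 detects derivatives, versions, or language variants.

```python
import hashlib
import re

def fingerprint(text: str) -> str:
    """SHA-1 of the text after lower-casing and dropping punctuation and extra
    whitespace, so trivially different copies share a fingerprint."""
    normalized = " ".join(re.findall(r"\w+", text.lower()))
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def collapse(results: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """results: ranked (url, text) pairs. Returns one entry per fingerprint,
    in the rank order of its best copy, with the other copies attached so the
    UI can offer an 'expand' link."""
    groups: dict[str, list[str]] = {}
    for url, text in results:
        groups.setdefault(fingerprint(text), []).append(url)
    return [(urls[0], urls[1:]) for urls in groups.values()]
```

The best-ranked copy represents the group, and the attached list is what the expand link would display.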

Security Trimmed Results: users don’t see what they are not allowed to see

Best Bets: editorially programmed results or what you want them to want to see




Report Center
        •Dashboard-style data presentation
        •Keys off a document library of reports
        •Can import KPIs

KPIs are a central way of presenting business intelligence for an organization: high-level goals for the
organization or site.
KPIs increase the speed and efficiency of evaluating progress against key business goals and reduce
the amount of data to analyze.
KPIs connect to business data from various sources and consolidate data against the KPI, not the repository.

Each KPI gets a single value from a data source, either from a single property or by calculating
averages across the selected data, and then compares that value against a pre-selected value. Data
sources include:
         •Excel workbooks: The data comes from an Excel workbook.
         •SQL Server 2005 Analysis Services: The data comes from database stores known as cubes,
         through connections in a data connection library.
         •Manually entered information: The data is from a static list, rather than based on
         underlying data sources. This is used less frequently: for test purposes prior to deployment,
         or on occasions when regular data sources are unavailable but you still want to provide
         performance indicators.
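The "single value compared against a pre-selected value" step can be sketched in Python. The averaging rule comes from the text; the goal and warning thresholds, the higher-or-lower-is-better switch, and the green/yellow/red labels are assumptions for illustration rather than the SharePoint KPI list API.

```python
from statistics import mean

def kpi_status(values: list[float], goal: float, warning: float,
               higher_is_better: bool = True) -> tuple[float, str]:
    """Reduce the selected data to one value (here, the average) and compare it
    with pre-selected thresholds to get a traffic-light style status."""
    value = mean(values)

    def meets(threshold: float) -> bool:
        return value >= threshold if higher_is_better else value <= threshold

    if meets(goal):
        status = "green"
    elif meets(warning):
        status = "yellow"
    else:
        status = "red"
    return value, status
```

However much underlying data there is, the dashboard shows one value and one status per KPI, which is where the reduction in data to analyze comes from.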




Sometimes configuring search can seem like that big ticking box from Acme…




Frank Lloyd Wright said something along the lines of it being easier to take an eraser to the drafting
table than a sledgehammer to the construction site.




Don’t boil the ocean.
A small segment of your content satisfies a significant portion of your customers' searches.
Search logs, customer feedback, and server logs will reveal which segment that is.
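One way to find that segment is to rank queries by frequency and see how few of them cover most of the traffic. A minimal Python sketch, assuming the log has already been reduced to one query string per search; the 80% cut-off is an arbitrary illustration.

```python
from collections import Counter

def head_queries(log: list[str], share: float = 0.8) -> list[tuple[str, int]]:
    """Smallest set of most frequent queries that together account for at
    least `share` of all searches in the log (case and spacing ignored)."""
    counts = Counter(q.strip().lower() for q in log if q.strip())
    total = sum(counts.values())
    head: list[tuple[str, int]] = []
    covered = 0
    for query, n in counts.most_common():
        if covered >= share * total:
            break
        head.append((query, n))
        covered += n
    return head
```

The content behind those few queries is where tuning effort such as Best Bets, metadata clean-up, and scopes pays off first.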








HILLTOP
Performed on a small subset of the corpus that best represents nature of the whole
Ranked according to the number of non-affiliated “experts” that point to it, i.e. experts not in the same site or directory
Affiliation is transitive [if A=B and B=C then A=C]
Beauty of Hilltop is that unlike PageRank, it is query-specific and reinforces the relationship between the authority and the user’s
query. You don’t have to be big or have a thousand links from auto parts sites to be an “authority”
Segmentation of corpus into broad topics
         Subset that is then extrapolated to Web as a whole
Selection of authority sources within these topic areas
         Authorities have lots of non-related pages on the same subject pointing to them
         Quality of links more important than quantity of links
Determination of HUBS (pages that point to many authority sources)
Pre-query calculations applied at query time
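Two Hilltop ideas from this list, transitive affiliation and counting only non-affiliated experts, can be sketched in Python with a union-find structure. This is a simplification under stated assumptions: affiliations are supplied by hand (the paper's own affiliation test, expert selection, and key-phrase matching are not modelled), and a target's score is simply the number of distinct unaffiliated groups pointing at it.

```python
class Affiliations:
    """Union-find over hosts, so that affiliation is transitive:
    if A is affiliated with B and B with C, then A is affiliated with C."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, host: str) -> str:
        self.parent.setdefault(host, host)
        while self.parent[host] != host:
            self.parent[host] = self.parent[self.parent[host]]  # path halving
            host = self.parent[host]
        return host

    def union(self, a: str, b: str) -> None:
        self.parent[self.find(a)] = self.find(b)

def authority_score(target: str, experts_linking: list[str], aff: Affiliations) -> int:
    """Count experts pointing at target, counting each affiliated group once and
    ignoring experts affiliated with the target itself."""
    groups = {aff.find(e) for e in experts_linking}
    groups.discard(aff.find(target))
    return len(groups)
```

Three affiliated auto-parts sites count as one vote, which is the point of not needing a thousand links from auto parts sites to be an authority.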
TOPIC SENSITIVE PR
•Consolidation of Hypertext Induced Topic Selection [HITS] and PageRank
•Pre-query calculation of factors based on subset of corpus: context of term use in document, context of term use in history of
queries and context of term use by user submitting query
•Computes PR based on a set of representational topics [augments PR with content analysis]
•Topics derived from the Open Directory Project
•Uses a set of ranking vectors: Pre-query selection of topics + at-query comparison of the similarity of query to topics
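The query-time half of topic-sensitive PageRank can be sketched in Python: per-topic rank vectors are assumed to have been computed before any query (for example with one biased PageRank run per topic), and at query time they are blended according to how likely each topic is given the query. The naive-Bayes estimate of P(topic | query), the smoothing constant, and all sample numbers are assumptions for illustration.

```python
def topic_probabilities(query_terms: list[str], term_probs: dict[str, dict[str, float]],
                        priors: dict[str, float], smoothing: float = 1e-6) -> dict[str, float]:
    """P(topic | query) by Bayes' rule: proportional to
    P(topic) * product over query terms of P(term | topic)."""
    raw = {}
    for topic, prior in priors.items():
        p = prior
        for term in query_terms:
            p *= term_probs[topic].get(term, smoothing)
        raw[topic] = p
    total = sum(raw.values())
    return {t: p / total for t, p in raw.items()}

def topic_sensitive_score(doc: str, query_terms: list[str],
                          topic_ranks: dict[str, dict[str, float]],
                          term_probs: dict[str, dict[str, float]],
                          priors: dict[str, float]) -> float:
    """Blend precomputed per-topic rank vectors using the query's topic mix."""
    weights = topic_probabilities(query_terms, term_probs, priors)
    return sum(w * topic_ranks[t].get(doc, 0.0) for t, w in weights.items())
```

For a query like "jaguar mac" the computers topic dominates, so a computer site outranks a sports site even though "jaguar" alone is ambiguous.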




During the age of early explorers, map makers would insert this phrase when
they reached the edge of their known world.


The “dragons” on the following slides are known issues that Ascentium
developers have discovered in working with MOSS 2007 search or found
through my own research. Few diamonds are flawless. I find it best to
address the shortcomings upfront and have solutions in hand to
mitigate customer pain.








•Advanced auto-classification, taxonomy management and compound term metadata tagging
technology
•Only statistical metadata generation, auto-classification and taxonomy management vendor in the
world that uses concept extraction and compound term processing
•Proven to deliver the highest precision without the loss of recall
•Only tagging and classification solution fully integrated with MOSS, Microsoft Office, Exchange and
Microsoft Enterprise Search
•Automatically classifies content at the time of creation or ingestion
•Generates compound term metadata (concepts) and stores it in SharePoint properties
•Automatic classification within MS Office applications, metadata stored in the document
•Taxonomy Manager: supports multiple taxonomies
•Priced by server: $95K per production server, $47.5K per staging/test server
•Highly scalable
•Vertical applications (Legal, Finance, eDiscovery, Services, Oil & Gas, Manufacturing, Government,
Education, Life Sciences & Healthcare, Energy & Utilities)
•Horizontal applications (ECM, Document Management, Compliance & Risk Management, Records
Management, Enterprise Search, Portals, Intranets & Information Rich Web Sites)




Notes:
         •The weights used in the product were carefully tested. Changes to the weights may also
         have a negative effect on relevance.
         •After you set property.weight you must call the property.Update() method to save the
         change.




Used in custom Web parts to execute queries against the enterprise search service
http://msdn.microsoft.com/en-us/library/ms544561.aspx





More related content

Trending (18)

Web Mining
An Introduction to Entities in Semantic Search (David Amerland)
Searching the web general
Питер Мика "Making the web searchable" (Yandex)
Searching over the past, present and future (Roi Blanco)
Web search engines and search technology (Stefanos Anastasiadis)
Mining Web content for Enhanced Search (Roi Blanco)
SearchInFocus: Exploratory Study on Query Logs and Actionable Intelligence (Marina Santini)
Introduction to Big Data (Roi Blanco)
Ultra search
NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Insti...
A NOVEL APPROACH FOR INFORMATION RETRIEVAL TECHNIQUE FOR WEB USING NLP (ijnlc)
Books and Webs: Pulling the Down Rows (Peter Brantley)
Beyond document retrieval using semantic annotations (Roi Blanco)
Web Content Mining
Wiser Pku Lecture@Life Science School Pku (guest8ed46d)
Search Solutions 2011: Successful Enterprise Search By Design (Marianne Sweeny)
NISO/DCMI Webinar: Cooperative Authority Control: The Virtual International A...
 

Destacado

Analytics: latest developments & guidance
Analytics: latest developments & guidanceAnalytics: latest developments & guidance
Analytics: latest developments & guidanceScreen Pages
 
Screen Pages: introduction & benchmarks
Screen Pages: introduction & benchmarksScreen Pages: introduction & benchmarks
Screen Pages: introduction & benchmarksScreen Pages
 
GroundTruth Map Kibera East Coast Tour
GroundTruth Map Kibera East Coast TourGroundTruth Map Kibera East Coast Tour
GroundTruth Map Kibera East Coast Tourmikel_maron
 
There is a Kitfox
There is a KitfoxThere is a Kitfox
There is a Kitfoxmikel_maron
 

Destacado (8)

Analytics: latest developments & guidance
Analytics: latest developments & guidanceAnalytics: latest developments & guidance
Analytics: latest developments & guidance
 
Screen Pages: introduction & benchmarks
Screen Pages: introduction & benchmarksScreen Pages: introduction & benchmarks
Screen Pages: introduction & benchmarks
 
Snm brain
Snm brainSnm brain
Snm brain
 
GroundTruth Map Kibera East Coast Tour
GroundTruth Map Kibera East Coast TourGroundTruth Map Kibera East Coast Tour
GroundTruth Map Kibera East Coast Tour
 
иип км школа. начало работы
иип  км школа. начало работыиип  км школа. начало работы
иип км школа. начало работы
 
There is a Kitfox
There is a KitfoxThere is a Kitfox
There is a Kitfox
 
Tbl hc3v4
Tbl hc3v4Tbl hc3v4
Tbl hc3v4
 
Presentation on fundraising
Presentation on fundraisingPresentation on fundraising
Presentation on fundraising
 

Similar a Enterprise Search Share Point2009 Best Practices Final

A web content mining application for detecting relevant pages using Jaccard ...
A web content mining application for detecting relevant pages  using Jaccard ...A web content mining application for detecting relevant pages  using Jaccard ...
A web content mining application for detecting relevant pages using Jaccard ...IJECEIAES
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notesAnandh Arumugakan
 
lawTechCamp - Knowledge Management Panel
lawTechCamp - Knowledge Management PanellawTechCamp - Knowledge Management Panel
lawTechCamp - Knowledge Management Panellawtechcamp
 
chapter 1-Overview of Information Retrieval.ppt
chapter 1-Overview of Information Retrieval.pptchapter 1-Overview of Information Retrieval.ppt
chapter 1-Overview of Information Retrieval.pptSamuelKetema1
 
Perception Determined Constructing Algorithm for Document Clustering
Perception Determined Constructing Algorithm for Document ClusteringPerception Determined Constructing Algorithm for Document Clustering
Perception Determined Constructing Algorithm for Document ClusteringIRJET Journal
 
Sweeny ux-seo om-cap 2014_v3
Sweeny ux-seo om-cap 2014_v3Sweeny ux-seo om-cap 2014_v3
Sweeny ux-seo om-cap 2014_v3Marianne Sweeny
 
Rscd 2017 bo f data lifecycle data skills for libs
Rscd 2017 bo f data lifecycle data skills for libsRscd 2017 bo f data lifecycle data skills for libs
Rscd 2017 bo f data lifecycle data skills for libsSusanMRob
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information RetrievalCarsten Eickhoff
 
Information Retrieval (for beginners)
Information Retrieval (for beginners)Information Retrieval (for beginners)
Information Retrieval (for beginners)James Melzer
 
Structured and Unstructured Information Extraction Using Text Mining and Natu...
Structured and Unstructured Information Extraction Using Text Mining and Natu...Structured and Unstructured Information Extraction Using Text Mining and Natu...
Structured and Unstructured Information Extraction Using Text Mining and Natu...rahulmonikasharma
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us? Andrea Volpini
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic WaveKaniska Mandal
 
Search Analytics for Content Strategists
Search Analytics for Content StrategistsSearch Analytics for Content Strategists
Search Analytics for Content StrategistsLouis Rosenfeld
 
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...Dan Keldsen
 
Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)Anna Fensel
 
Using Search Analytics to Diagnose What’s Ailing your Information Architecture
Using Search Analytics to Diagnose What’s Ailing your Information ArchitectureUsing Search Analytics to Diagnose What’s Ailing your Information Architecture
Using Search Analytics to Diagnose What’s Ailing your Information ArchitectureLouis Rosenfeld
 

Similar a Enterprise Search Share Point2009 Best Practices Final (20)

A web content mining application for detecting relevant pages using Jaccard ...
A web content mining application for detecting relevant pages  using Jaccard ...A web content mining application for detecting relevant pages  using Jaccard ...
A web content mining application for detecting relevant pages using Jaccard ...
 
CS8080 IRT UNIT I NOTES.pdf
CS8080 IRT UNIT I  NOTES.pdfCS8080 IRT UNIT I  NOTES.pdf
CS8080 IRT UNIT I NOTES.pdf
 
CS8080_IRT__UNIT_I_NOTES.pdf
CS8080_IRT__UNIT_I_NOTES.pdfCS8080_IRT__UNIT_I_NOTES.pdf
CS8080_IRT__UNIT_I_NOTES.pdf
 
CS6007 information retrieval - 5 units notes
CS6007   information retrieval - 5 units notesCS6007   information retrieval - 5 units notes
CS6007 information retrieval - 5 units notes
 
Starting a search application
Starting a search applicationStarting a search application
Starting a search application
 
lawTechCamp - Knowledge Management Panel
lawTechCamp - Knowledge Management PanellawTechCamp - Knowledge Management Panel
lawTechCamp - Knowledge Management Panel
 
chapter 1-Overview of Information Retrieval.ppt
chapter 1-Overview of Information Retrieval.pptchapter 1-Overview of Information Retrieval.ppt
chapter 1-Overview of Information Retrieval.ppt
 
Perception Determined Constructing Algorithm for Document Clustering
Perception Determined Constructing Algorithm for Document ClusteringPerception Determined Constructing Algorithm for Document Clustering
Perception Determined Constructing Algorithm for Document Clustering
 
File000162
File000162File000162
File000162
 
Sweeny ux-seo om-cap 2014_v3
Sweeny ux-seo om-cap 2014_v3Sweeny ux-seo om-cap 2014_v3
Sweeny ux-seo om-cap 2014_v3
 
Rscd 2017 bo f data lifecycle data skills for libs
Rscd 2017 bo f data lifecycle data skills for libsRscd 2017 bo f data lifecycle data skills for libs
Rscd 2017 bo f data lifecycle data skills for libs
 
Introduction to Information Retrieval
Introduction to Information RetrievalIntroduction to Information Retrieval
Introduction to Information Retrieval
 
Information Retrieval (for beginners)
Information Retrieval (for beginners)Information Retrieval (for beginners)
Information Retrieval (for beginners)
 
Structured and Unstructured Information Extraction Using Text Mining and Natu...
Structured and Unstructured Information Extraction Using Text Mining and Natu...Structured and Unstructured Information Extraction Using Text Mining and Natu...
Structured and Unstructured Information Extraction Using Text Mining and Natu...
 
What do we want computers to do for us?
What do we want computers to do for us? What do we want computers to do for us?
What do we want computers to do for us?
 
Riding The Semantic Wave
Riding The Semantic WaveRiding The Semantic Wave
Riding The Semantic Wave
 
Search Analytics for Content Strategists
Search Analytics for Content StrategistsSearch Analytics for Content Strategists
Search Analytics for Content Strategists
 
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
 
Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)Towards Semantic APIs for Research Data Services (Invited Talk)
Towards Semantic APIs for Research Data Services (Invited Talk)
 
Using Search Analytics to Diagnose What’s Ailing your Information Architecture
Using Search Analytics to Diagnose What’s Ailing your Information ArchitectureUsing Search Analytics to Diagnose What’s Ailing your Information Architecture
Using Search Analytics to Diagnose What’s Ailing your Information Architecture
 

Enterprise Search Share Point2009 Best Practices Final
by Marianne Sweeny

  • 1. Good afternoon, and many thanks for attending the last session on the last day of this conference. The focus of this presentation is the many excellent features contained in MOSS 2007 search. My goal is to show you why these features are excellent so that you will make use of them. If you do, you will be able to walk the halls of your organization with your heads held high and fear no “search sucks” cracks as you do.
  • 2. I am a pointy-head and not a propeller-head. While there are technical references in this presentation, the orientation will be more behavioral and less technical. There are terrific technical resources in the Resources section, and the occasional snippet of code did make its way into the main section.
  • 3.
  • 4. UC Berkeley Study on How Much Information: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/ Print, film, magnetic, and optical storage media produced about 5 exabytes of new information in 2002. Ninety-two percent of the new information was stored on magnetic media, mostly in hard disks. How big is five exabytes? If digitized with full formatting, the seventeen million books in the Library of Congress contain about 136 terabytes of information; five exabytes of information is equivalent in size to the information contained in 37,000 new libraries the size of the Library of Congress book collections. Hard disks store most new information. Ninety-two percent of new information is stored on magnetic media, primarily hard disks. Film represents 7% of the total, paper 0.01%, and optical media 0.002%. The United States produces about 40% of the world's new stored information, including 33% of the world's new printed information, 30% of the world's new film titles, 40% of the world's information stored on optical media, and about 50% of the information stored on magnetic media. How much new information per person? According to the Population Reference Bureau, the world population is 6.3 billion, thus almost 800 MB of recorded information is produced per person each year. It would take about 30 feet of books to store the equivalent of 800 MB of information on paper. We estimate that the amount of new information stored on paper, film, magnetic, and optical media has about doubled in the last three years. Information explosion? We estimate that new stored information grew about 30% a year between 1999 and 2002. Paperless society? The amount of information printed on paper is still increasing, but the vast majority of original information on paper is produced by individuals in office documents and postal mail, not in formally published titles such as books, newspapers and journals.
Hosted websites [UC Berkeley How Much Information Project]: •July 1993: 1,776,000 •July 2005: 353,084,187. Size of the Web [Indexable Web: Gulli & Signorini 2005]: •1997: 200 million Web pages •2005: 11.5 billion pages
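The figures on this slide can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units, where 1 TB = 10^12 bytes and 1 EB = 10^18 bytes. The study may have used slightly different conventions, and the variable names are illustrative only.

```python
# Back-of-envelope checks on slide 4's figures.
# Assumption: decimal (SI) units, 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
TB = 10**12
EB = 10**18
MB = 10**6

new_info_2002 = 5 * EB        # new stored information produced in 2002
loc_books = 136 * TB          # digitized Library of Congress book collection
world_population = 6.3e9      # Population Reference Bureau figure quoted by the study

# How many Library-of-Congress-sized collections fit in 5 EB?
libraries_equivalent = new_info_2002 / loc_books

# New recorded information per person per year, in MB.
mb_per_person = new_info_2002 / world_population / MB

# Three years of ~30% annual growth roughly doubles the total.
three_year_growth = 1.30 ** 3

# Compound annual growth of the indexable Web, 1997 -> 2005 (8 years).
pages_1997, pages_2005 = 200e6, 11.5e9
web_annual_growth = (pages_2005 / pages_1997) ** (1 / 8) - 1

if __name__ == "__main__":
    print(f"Library of Congress collections in 5 EB: {libraries_equivalent:,.0f}")
    print(f"MB per person per year: {mb_per_person:,.0f}")
    print(f"Three years at 30%/yr: {three_year_growth:.2f}x")
    print(f"Indexable Web growth per year: {web_annual_growth:.0%}")
```

The results agree with the study's rounded figures. There are roughly 36,800 collections ("37,000"), about 794 MB per person ("almost 800 MB"), and about 2.2x growth over three years ("about doubled"). The Gulli & Signorini numbers imply the indexable Web grew about 66% a year.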
  • 5. Information R/evolution: Michael Wesch, Kansas State University, http://www.youtube.com/user/mwesch. All of his work is very good. How we manage information is different because searchers are squishy: some just want to find “it”, others want it to find them, and others want to change it, create it, manipulate it, share it… •They are searching because they don’t know •Language and perception are different •Some people think women put their stuff in a purse, others a pocketbook, and others a handbag. •“Animal” is a mammal, a Sesame Street character, and an uncouth person •Enterprise information is individualized. •Gates Foundation has different issues than PACCAR •Providence Healthcare has different types of content than King County Library •Codeplex has a different user type [or a more standard one] than Microsoft Virtual Earth
  • 6. Search engines use bots to crawl pages and send compressed data back to the index after linguistic processing such as stemming [reducing a word to its most basic root] and stop-word removal [dropping common articles and other words stipulated by the company]. The index is then inverted so that lookup is done on the basis of record contents rather than document ID. This is a completely different method of storage and retrieval from relational databases. A complete copy of the Web page may be stored in the search engine’s cache. With brute-force calculation, the system pulls each matching record from the inverted index [a mapping of words to where they appear in document text]. This is recall: all documents in the corpus with text instances that match your term(s). Search engine indexes are not like relational databases. There is no such thing as normalization, no unique identifiers, and only the loosest of structures. The “secret sauce” for each search engine is its set of algorithms that sort the recall results in a meaningful order. This is precision: the proportion of documents from recall that are relevant to your query term(s). Search engines generally use a common set of signals to refine precision. If the search term is used in the title of the document, in heading text, formatted in any way, or used in link text, the document is considered more relevant to the query. If the query term(s) are used frequently throughout the document, the document is considered more relevant. Another example is Term Frequency - Inverse Document Frequency [TF-IDF] weighting. Here the raw term frequency (TF) of a term in a document [how often it occurs in that document] is multiplied by the term's inverse document frequency (IDF) weight. IDF is typically the logarithm of the number of documents in the entire corpus divided by the number of documents containing the term, so rare terms weigh more. [caveat emptor: high-level, low-level, level-playing-field math are not my strong suits].
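The crawl, index and query flow described above can be sketched as a toy in Python. This version makes several assumptions:

- It uses a tiny hand-made stop-word list.
- A crude suffix stripper stands in for a real stemmer.
- The names `STOP_WORDS`, `crude_stem`, `build_inverted_index` and `search` are hypothetical.

Looking up a term in the inverted index gives recall. Ordering by TF × IDF is one simple way to approximate precision. This is not how any particular engine, MOSS included, actually ranks results.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; real engines ship curated per-language lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def crude_stem(word: str) -> str:
    """Strip a few English suffixes; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenize(text: str) -> list[str]:
    """Lowercase, split into words, drop stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def build_inverted_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Invert doc_id -> text into term -> {doc_id: term frequency}."""
    index: dict[str, Counter] = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(query: str, index: dict[str, Counter], n_docs: int) -> list[tuple[str, float]]:
    """Recall via postings lookup, then order by summed TF x IDF."""
    scores: Counter = Counter()
    for term in tokenize(query):
        postings = index.get(term)
        if not postings:
            continue  # term never indexed: contributes nothing to recall
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document ID for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = {
        "d1": "Search engines crawl pages and index the pages",
        "d2": "The library stores books",
        "d3": "Enterprise search indexes documents",
    }
    index = build_inverted_index(docs)
    print(search("search index", index, len(docs)))  # d2 is never recalled
    print(search("pages", index, len(docs)))
```

In the sample, the stemmer lets "indexes" in d3 match the query word "index". Only d1 and d3 are recalled for "search index", and d2 is never scored.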
  • 7. There is a fundamental difference between Web search and Enterprise search. Web Search: •Web search is generic search. One size fits all. Features serve the technology to better enable it to serve the masses. •Search technology has to work for the broadest document set, those 11 billion plus pages •Keys off strong linking [the number of links and their structure] •Links are “editorial”: a link endorses the destination content as a “vote” •Millions of publishers that are not required to adhere to any specific standards •Site structure is not often tied to content or context •Search engines are constantly fighting attempts to game their technology in the Web search space. Black hat techniques like cloaking, link farms, spamming, keyword stuffing, Sybil attacks and the like are a blight. They manipulate the results and reduce user confidence in the system •The technology is changing and refining its operation to rely on both internal [document level] and external [site level] data. Examples of this would be IBM’s narrative distiller, MSN link text analysis, Google Scout (which finds related hyperlinks), and Yahoo!’s document segmentation. Important to note: the PageRank algorithm is a pre-query calculation. Its value is assigned as a result of the search engine’s indexing of the entire Web and has no relationship to the user’s information need. There have been a number of additions and enhancements to lend some contextual credence to the relevance ranking of the results. Enterprise Search: •Bounded corpus of content •Produced and maintained by a limited set of authors •No strong linking strategy; links are mostly for navigation [not editorial] •Information is related in ways that key outside of document content •Hierarchical structure is intended and is part of corporate culture •Publishing guidelines can be established to enforce metadata standards, which tune a search appliance and improve relevance through enforced semantic relationships.
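The following sketch makes concrete the point that PageRank is a pre-query calculation. It is a textbook power-iteration PageRank over a hypothetical four-page link graph. It assumes the commonly cited damping factor of 0.85 and spreads the rank of pages with no outlinks evenly. Google's production algorithm differs and is not public. No query appears anywhere in the function: the scores exist before anyone searches.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Query-independent link score computed by power iteration."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outlink passes an equal share of this page's rank ("vote").
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical link graph; note that no query is involved anywhere.
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Page "c" ends up on top because three pages "vote" for it. Page "d", which nothing links to, keeps only the random-jump share. This holds regardless of what a user later searches for.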
  • 8. In the early days of search engines, Advanced Search was a means for those who could phrase their queries in Boolean or SQL syntax to get more refined results. As search engines became more sophisticated, the need for such coding ability diminished. Usability studies show that most customers avoid Advanced Search because they assume that it is too advanced for them. A better method is to offer searchers a means to refine their own searches using facets based on document type, subject or location. 8
  • 9. From the “MOSS 2007 Search Under the Hood” PPT by Adir Ron. Search Query Execution: •The query engine passes the query through a language-specific wordbreaker. •After wordbreaking, the resulting words are passed through a stemmer to generate language-specific inflected forms of a given word. •When the query engine executes a property value query, the index is checked first to get a list of possible matches. •If the user does not have permission to a matching document, the query engine filters that document out of the list that is returned. Search Architecture http://www.sharepointblogs.com/heliosa/archive/2007/03/07/enterprise-search-architecture-in-sharepoint-technologies-2007.aspx •Index Engine: Processes the chunks of text and properties filtered from content sources, storing them in the content index and property store. •Query Engine: Executes keyword and SQL syntax queries against the content index and search configuration data. •Protocol Handlers: Open content sources in their native protocols and expose documents and other items to be filtered. •IFilters: Open documents and other content source items in their native formats and filter them into chunks of text and properties. •Property Store: Stores a table of properties and associated values. •Wordbreakers: Used by the query and index engines to break compound words and phrases into individual words or tokens. 9
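The query execution steps listed above can be sketched as follows. The wordbreaker, the inflection generator, the AND semantics across query words, and the access-control structure are all hypothetical simplifications for illustration; none of this is the MOSS 2007 object model.

```python
import re


def word_break(query):
    """Hypothetical wordbreaker: lowercase and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", query.lower()) if t]


def inflected_forms(word):
    """Hypothetical stemmer step: expand a word into a few English inflected forms."""
    return {word, word + "s", word + "ed", word + "ing"}


def execute_query(query, index, acl, user):
    """Wordbreak, expand inflections, check the index, then security-trim.

    `index` maps term -> set of doc IDs; `acl` maps doc ID -> users allowed to read it.
    """
    matches = None
    for word in word_break(query):
        docs_for_word = set()
        for form in inflected_forms(word):
            docs_for_word |= index.get(form, set())
        # Assumption: every query word must match (in some form), i.e. AND semantics.
        matches = docs_for_word if matches is None else matches & docs_for_word
    matches = matches or set()
    # Security trimming: silently drop documents the user is not permitted to see.
    return {doc for doc in matches if user in acl.get(doc, set())}


index = {"report": {"d1"}, "reports": {"d2"}, "budget": {"d1", "d2", "d3"}}
acl = {"d1": {"alice", "bob"}, "d2": {"alice"}, "d3": {"bob"}}
print(sorted(execute_query("Budget report", index, acl, "bob")))
```

Both users get the same index matches (d1 and d2); only the final trimming step decides that bob sees d1 alone.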
  • 10. SPS 2003 was SQL search: a different database structure, closer to a classic relational data model (RDM). MOSS 2007 is indexed search: an inverted index based on words, not records, with scopes, structured business data search, and people search. MOSS 2007 •Click Distance: Browsing distance from authoritative sites: shorter tends to be more relevant •Anchor Text: Hyperlinks act as annotations on their target •URL Depth: URLs higher in the hierarchy tend to be more relevant •URL Matching: Direct matches on text in URLs •Metadata Extraction: Automatically extract titles and authors from document text •Automatic Language Detection: Helps bias toward results in your language •File Type Biasing: For example, PPT docs tend to be more relevant than XLS •Text Analysis: Traditional text ranking based on matching terms, term frequencies, word variants, etc. SPS 2003 •Collection frequency: The number of documents a term appears in compared to the total number of documents. Search terms that occur in only a few documents are likely to be more useful than terms that occur in many documents. •Term frequency: The number of occurrences of the search term in a document. The more frequently a search term appears in a document, the more important it is likely to be for ranking that document. •Document length: The length of the searched document. A term that occurs the same number of times in a short document as in a long one is likely to be more important to the short document. •Term Position: The position of a word within a document, for example, presence of a term in the document’s title. A term that appears in a particular component of the document, such as the title, is more likely to be important for ranking that document. 10
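The first three SPS 2003 factors (collection frequency, term frequency and document length) are all combined in the widely used Okapi BM25 formula. The Python sketch below uses BM25, with the conventional k1 = 1.2 and b = 0.75, purely to show how those factors interact. It is not claimed to be the formula SPS 2003 or MOSS 2007 actually used, and it ignores term position.

```python
import math


def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len, k1=1.2, b=0.75):
    """Okapi BM25 score of one document (a list of terms) for a list of query terms."""
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        tf = doc_terms.count(term)  # term frequency in this document
        if tf == 0:
            continue
        df = doc_freq.get(term, 0)
        # Collection frequency: a term found in few documents gets a larger weight.
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        # Document length: the same tf counts for more in a shorter-than-average document.
        length_norm = k1 * (1 - b + b * doc_len / avg_doc_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score


short_doc = ["search", "index"]
long_doc = ["search"] + ["filler"] * 8
stats = {"doc_freq": {"search": 2}, "num_docs": 10, "avg_doc_len": 5.5}
print(bm25_score(["search"], short_doc, **stats) > bm25_score(["search"], long_doc, **stats))
```

One occurrence of "search" is worth more in the two-word document than in the nine-word one, which is the document-length effect described in the slide.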
  • 11. This is where you manage the components that control search performance and the search experience. Because search is a shared service, you only have to configure it in one location. MOSS 2007 enables testing the configuration to ensure performance. Where you put the content is not necessarily where your customers will look for it. 11
  • 12. •Better management and control •Better resource management, both hardware and personnel •Agile index changes 12
  • 13. Text Analysis [internal]: Traditional text ranking based on such factors as matching terms, term frequencies, and word variants. Dynamic and Static ranking: Like other search technology, MOSS 2007 Search incorporates both internal [text on the page, term frequency, page layout and formatting, etc.] and external metadata to more closely match the user’s request. However, MOSS 2007 Search incorporates cutting-edge technology from Microsoft Search to push beyond the “1 link = 1 vote” model of quality/relevance used by PageRank. •Click Distance [external]: Browsing distance from authoritative sites (shorter distances tend to be more relevant). •Anchor Text [external]: Hyperlinks act as annotations on their target. In addition, they tend to be highly descriptive. •URL Depth [external]: URLs higher in the hierarchy tend to be more relevant. •URL Matching [external]: Direct matches on text that's in URLs. •Metadata Extraction [internal]: Automatically extracts titles and authors from document text if they are missing. •Automatic Language Detection [internal]: Helps create a preference for results in your language. •File Type Biasing [internal]: Certain file types tend to be more relevant (for example, PPT files are often more relevant than XLS files). 13
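Click distance can be illustrated as a breadth-first search outward from the pages an administrator marks as authoritative. This Python sketch assumes a simple page-to-links mapping with hypothetical page names. How MOSS 2007 turns the distance into a rank boost is not described here, so the sketch stops at computing distances.

```python
from collections import deque


def click_distance(links, authoritative):
    """Fewest clicks from any authoritative page to each reachable page (breadth-first search)."""
    distance = {page: 0 for page in authoritative}
    queue = deque(authoritative)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in distance:  # first visit in BFS order is the shortest path
                distance[target] = distance[page] + 1
                queue.append(target)
    return distance


site = {
    "home": ["hr", "it"],
    "hr": ["benefits"],
    "benefits": ["forms"],
    "it": [],
}
print(click_distance(site, ["home"]))
```

Pages that cannot be reached from any authoritative page simply do not appear in the result, which a ranker could treat as "maximally distant".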
  • 14. You must turn on stemming and PDF indexing 14
  • 15. Project Description from CodePlex http://www.codeplex.com/FacetedSearch MOSS Faceted Search is a set of web parts that provide an intuitive way to refine search results by category (facet). The facets are implemented using the SharePoint API and stored within the native SharePoint metadata store. The solution demonstrates the following key features: •Grouping search results by facet •Displaying the total number of hits per facet value •Refining search results by facet value •Updating the facet menu based on refined search criteria •Displaying the search criteria as breadcrumbs •Ability to exclude the chosen facet from the search criteria •Flexibility of the faceted search configuration and its consistency with MOSS administration 15
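The core faceting behaviors listed above (hit counts per facet value, refining by a chosen value, updating the menu for the refined set, and removing a breadcrumb) can be sketched in a few lines of Python. The result records and facet names are hypothetical; the CodePlex project itself works through the SharePoint API, which this sketch does not use.

```python
from collections import Counter


def facet_counts(results, facet):
    """Total number of hits per value of one facet (e.g. per document type)."""
    return Counter(r[facet] for r in results if facet in r)


def refine(results, criteria):
    """Keep results matching every chosen facet value; `criteria` acts as the breadcrumb trail."""
    return [r for r in results if all(r.get(f) == v for f, v in criteria.items())]


results = [
    {"title": "Q1 plan", "type": "pdf", "department": "Finance"},
    {"title": "Q2 plan", "type": "docx", "department": "Finance"},
    {"title": "Handbook", "type": "pdf", "department": "HR"},
]
crumbs = {"type": "pdf"}
refined = refine(results, crumbs)
print(facet_counts(refined, "department"))  # menu updated for the refined set
crumbs.pop("type")  # excluding the chosen facet widens the results again
print(len(refine(results, crumbs)))
```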
  • 16. 3/23/2009 Estimated dev time to create your own FLD (federated location definition) file is 3 days (from MS internal). It is better to pass the query through and have the destination do the relevance ranking (saves bandwidth) than to access the destination index directly (you lose the proprietary relevance ranking, though). Day Software Delivers Standardized Connectivity for Open Text Livelink http://www.econtentmag.com/Articles/ArticleReader.aspx?ArticleID=19280 Using SharePoint 2007 to Index Lotus Notes http://meiyinglim.blogspot.com/2007/01/using-sharepoint-2007-to-index-lotus.html 16
  • 17. Microsoft Knowledge Network: Stored on a separate server. Version 1.0 is an add-on product for the Enterprise version of Stand-alone Search and for both versions of the Full Product. Refinement/scoping is available. Initial results are presented with identity masked: the KN server takes the user’s request and sends it to the person, who can accept or reject the request through the KN server without their identity ever being revealed. 17
  • 18.
  • 19. The Business Data Catalog (BDC) crawls and integrates data from other applications [email servers, line-of-business applications, external databases, customer relationship management apps] and puts it into a cache for crawl by the search server. It accesses these repositories with a connector. http://msdn.microsoft.com/en-us/library/ms563661.aspx Available in the MOSS 2007 Search Enterprise edition and in both versions of the MOSS 2007 Full Product 19
  • 20. Short term: FAST will remain an independent entity that Microsoft will continue to support on non-Windows platforms, with a connector for MOSS 2007. The next release will see 2 versions of FAST ESP: a stand-alone successor and a SharePoint edition that will incorporate the connector and add new features that require less customization. Relevance by using the underlying semantic relationships •Categorization •Transformation (lemmatization) •Presentation FAST Platform •unity (federation of results from outside resources) •admomentum (search-driven monetization with ad serving) •recommendations (recommendation engine similar to Amazon/Netflix, based on the behavior of the user base: cookie-based, item to item, people to items) •featured content (search-driven content merchandising) •fast unity (search-driven portal experiences) Core Capabilities •phrasing and anti-phrasing: strips out the extraneous terms •clustering: comprehension through association •can be taxonomy-based or based on the Open Directory Project •flexible relevancy model: boost/block search results, dynamically on a per-query basis •an equalizer with a whole set of knobs that reissues the query with different weights based on the choices made; this is ranking more than filtering, so it does not change the number of results, only the order of display •can work in conjunction with faceted search 20
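The equalizer point, that reweighting changes the order of results but not their number, can be shown with a small Python sketch. The signal names and weights are hypothetical and do not represent FAST ESP's actual configuration model.

```python
def rerank(results, weights):
    """Order results by a weighted sum of their ranking signals; nothing is filtered out."""
    def score(result):
        return sum(weights.get(name, 0.0) * value for name, value in result["signals"].items())

    return sorted(results, key=score, reverse=True)


results = [
    {"id": "a", "signals": {"freshness": 0.9, "text_match": 0.2}},
    {"id": "b", "signals": {"freshness": 0.1, "text_match": 0.8}},
]
by_text = rerank(results, {"text_match": 1.0})
by_freshness = rerank(results, {"freshness": 1.0})
print([r["id"] for r in by_text], [r["id"] for r in by_freshness])  # ['b', 'a'] ['a', 'b']
```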
  • 21. Search Scopes: represent a collection of documents mapped to a single element [e.g. authored by, specific directory, file type, metadata type]; they are no longer tied to an index crawl and are effective immediately. By default, the scope plug-in will create scopes for the following: •Display URL •Site (domain, sub-domain, host name) •Author •All content (used to include all content) •Global query exclusions (used to exclude content) Results Collapsing: Results collapsing can group duplicated or similar results together so that they are displayed as one entry in the search result set. This entry includes a link to display the expanded results for that collapsed result set entry. Search administrators can collapse results for the following content item groups: •Duplicates and derivatives of documents •Windows SharePoint Services discussion messages for the same topic •Microsoft Exchange Server public folder messages for the same conversation topic •Current versions of the same document •Different language versions of the same document •Content from the same site. By default, results collapsing is turned on in Enterprise Search. The search administrator can configure it, however, either through the Search Administration UI or the Search Administration object model. Security Trimmed Results: users don’t see what they are not allowed to see. Best Bets: editorially programmed results, or what you want them to want to see 21
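Results collapsing can be sketched as grouping results by a collapse key and showing one representative per group. This Python version assumes exact duplicates after case and whitespace normalization, detected with a SHA-1 hash; MOSS 2007 also collapses derivatives, versions and other groups listed above, which this sketch does not attempt. The URLs are hypothetical.

```python
import hashlib
import re


def collapse_key(text):
    """Assumed duplicate test: hash the body after normalizing case and whitespace."""
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def collapse(results):
    """Show each duplicate group once (best-ranked copy first); the rest sit behind an 'expand' link."""
    groups = {}
    for result in results:  # results are assumed to arrive already sorted by rank
        groups.setdefault(collapse_key(result["body"]), []).append(result)
    # Dicts preserve insertion order, so groups stay in the rank order of their top result.
    return [{"top": members[0], "collapsed": members[1:]} for members in groups.values()]


results = [
    {"url": "/policies/travel.docx", "body": "Travel Policy 2009"},
    {"url": "/archive/travel-copy.docx", "body": "travel  policy 2009 "},
    {"url": "/policies/expenses.docx", "body": "Expense policy"},
]
for entry in collapse(results):
    print(entry["top"]["url"], "+", len(entry["collapsed"]), "more")
```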
  • 22. 22
  • 23. 23
  • 24. Report Center •Dashboard-style data presentation •Keys off a document library of reports •Can import KPIs. KPIs are a central way of presenting business intelligence for an organization: they express high-level goals for the organization or site, increase the speed and efficiency of evaluating progress against key business goals, and reduce the amount of data for analysis. KPIs connect to business data from various sources and consolidate data against the KPI, not the repository. Each KPI gets a single value from a data source, either from a single property or by calculating averages across the selected data, and then compares that value against a pre-selected value. Data sources include: •Excel workbooks: The data comes from an Excel workbook. •SQL Server 2005 Analysis Services: The data comes from database stores known as cubes, via connections in a data connection library. •Manually entered information: The data is from a static list, rather than based on underlying data sources. This is used less frequently: for test purposes prior to deployment, or on occasions when regular data sources are unavailable but you still want to provide performance indicators. 24
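The KPI comparison described above (one value per KPI, compared against a preset target) might look like this in Python. The averaging step follows the text; the separate goal and warning thresholds and the green/yellow/red labels are assumptions for illustration, not the Report Center configuration model.

```python
def kpi_status(values, goal, warning, higher_is_better=True):
    """Reduce the source data to one value (an average) and compare it with preset targets."""
    value = sum(values) / len(values)
    if not higher_is_better:
        # Flip signs so the same ">=" comparisons work for "lower is better" KPIs.
        value, goal, warning = -value, -goal, -warning
    if value >= goal:
        return "green"
    if value >= warning:
        return "yellow"
    return "red"


print(kpi_status([90, 95, 100], goal=95, warning=80))  # average 95 meets the goal: green
print(kpi_status([2, 4], goal=2, warning=5, higher_is_better=False))  # average 3 misses 2 but beats 5: yellow
```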
  • 25. Sometimes configuring search can seem like that big ticking box from Acme… 25
  • 26. Frank Lloyd Wright said something along the lines of it being easier to take an eraser to the drafting table than a sledgehammer to the construction site. 26
  • 27. Don’t boil the ocean. A small segment of your content satisfies a significant portion of your customers’ searches. Search logs, customer feedback, and server logs will reveal this segment. 27
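One way to find that high-value segment in a search log is to count queries and take the most frequent ones until they cover a chosen share of all searches. This Python sketch assumes the log is simply a list of query strings; the 80% default is an arbitrary assumption.

```python
from collections import Counter


def head_queries(log, coverage=0.8):
    """Most frequent queries that together account for `coverage` of all searches."""
    counts = Counter(q.strip().lower() for q in log if q.strip())
    total = sum(counts.values())
    head, covered = [], 0
    for query, n in counts.most_common():
        if covered >= coverage * total:
            break
        head.append(query)
        covered += n
    return head


log = ["benefits"] * 5 + ["Holidays"] * 3 + ["vpn", "parking"]
print(head_queries(log))  # ['benefits', 'holidays'] cover 8 of 10 searches
```

Tuning the content behind those few head queries first is a much smaller job than tuning the whole corpus.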
  • 28. 28
  • 29. HILLTOP •Performed on a small subset of the corpus that best represents the nature of the whole •A page is ranked according to the number of non-affiliated “experts” that point to it, i.e. experts not on the same site or directory •Affiliation is transitive [if A=B and B=C then A=C] •The beauty of Hilltop is that, unlike PageRank, it is query-specific and reinforces the relationship between the authority and the user’s query. You don’t have to be big or have a thousand links from auto parts sites to be an “authority” •Segmentation of the corpus into broad topics •Subset that is then extrapolated to the Web as a whole •Selection of authority sources within these topic areas •Authorities have lots of non-related pages on the same subject pointing to them •Quality of links is more important than quantity of links •Determination of HUBS (pages that point to many authority sources) •Pre-query calculations applied at query time TOPIC SENSITIVE PR •Consolidation of Hyperlink-Induced Topic Search [HITS] and PageRank •Pre-query calculation of factors based on a subset of the corpus: context of term use in the document, context of term use in the history of queries, and context of term use by the user submitting the query •Computes PR based on a set of representative topics [augments PR with content analysis] •Topics derived from the Open Directory Project •Uses a set of ranking vectors: pre-query selection of topics + at-query comparison of the similarity of the query to the topics 29
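Hilltop's transitive affiliation rule maps naturally onto a union-find structure. This simplified Python sketch counts distinct, non-affiliated expert groups linking to a target and ignores experts affiliated with the target itself. The host names are hypothetical, and real Hilltop also weighs key phrases in the expert pages, which is omitted here.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Record that hosts a and b are affiliated."""
    parent[find(parent, a)] = find(parent, b)


def non_affiliated_experts(target, experts, affiliations):
    """Count distinct affiliation groups among experts linking to `target`.

    Affiliation is transitive (A~B and B~C imply A~C), which union-find gives for free.
    Experts affiliated with the target itself do not count.
    """
    parent = {}
    for a, b in affiliations:
        union(parent, a, b)
    target_root = find(parent, target)
    roots = {find(parent, e) for e in experts}
    roots.discard(target_root)
    return len(roots)


affiliations = [("a.com", "b.com"), ("b.com", "c.com"), ("target.com", "target-blog.com")]
experts = ["a.com", "c.com", "d.org", "target-blog.com"]
print(non_affiliated_experts("target.com", experts, affiliations))  # 2: {a, c} and d.org
```

Here a.com and c.com were never paired directly, yet they count as one vote because both are affiliated with b.com.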
  • 30. 30
  • 31. 31
  • 32. 32
  • 33. 33
  • 34. 34
  • 35. 35
  • 36. During the age of early explorers, map makers would insert this phrase when they reached the edge of their known world. The “dragons” on the following slides are known issues that Ascentium developers have discovered in working with MOSS 2007 search, or that I have found through my own research. Few diamonds are flawless. I find it best to address the shortcomings up front and have solutions in hand to mitigate customer pain. 36
  • 37. 37
  • 38. 38
  • 39. 39
  • 40. 40
  • 41. 41
  • 42. 42
  • 43. 43
  • 44. •Advanced auto-classification, taxonomy management and compound term metadata tagging technology •Only statistical metadata generation, auto-classification and taxonomy management vendor in the world that uses concept extraction and compound term processing •Proven to deliver the highest precision without the loss of recall •Only tagging and classification solution fully integrated with MOSS, Microsoft Office, Exchange and Microsoft Enterprise Search •Automatically classifies content at the time of creation or ingestion •Generates compound term metadata (concepts) and stores it in SharePoint properties •Automatic classification within MS Office applications, with metadata stored in the document •Taxonomy Manager: supports multiple taxonomies •Priced by server: $95K per production server, $47.5K per staging/test server •Highly scalable •Vertical applications (Legal, Finance, eDiscovery, Services, Oil & Gas, Manufacturing, Government, Education, Life Sciences & Healthcare, Energy & Utilities) •Horizontal applications (ECM, Document Management, Compliance & Risk Management, Records Management, Enterprise Search, Portals, Intranets & Information Rich Web Sites) 44
  • 45. Notes: •The weights used in the product were carefully tested; changing them may have a negative effect on relevance. •After you set property.Weight, you must call the property.Update() method to save the change. 45
  • 46. 46
  • 47. 47
  • 48. 48
  • 49. Used in custom Web parts to execute queries against the enterprise search service http://msdn.microsoft.com/en-us/library/ms544561.aspx 49